Docker Pull Resolver Problem
Say you're running Docker in swarm mode and your CI (e.g. GitLab Runner) runs as a service in that swarm.
Next to your CI service, you're running a registry inside the swarm too.
When your runner enters the CD stage, it will probably want to run docker push
and might end up like this.
docker push registry:5000/some_shit
Using default tag: latest
Error response from daemon: Get http://registry:5000/v2/: dial tcp: lookup registry on 10.50.0.2:53: no such host
Because you're clever, you attach to the runner service and start debugging like this
# system configuration
cat /etc/resolv.conf
search eu-central-1.compute.internal
nameserver 127.0.0.11
options timeout:2 attempts:5 ndots:0
# resolves correctly
dig registry
; <<>> DiG 9.11.2 <<>> registry
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33867
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;registry. IN A
;; ANSWER SECTION:
registry. 600 IN A 10.0.1.5
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Wed Jan 24 10:48:31 UTC 2018
;; MSG SIZE rcvd: 58
# other programs behave correctly too
curl http://registry:5000/v2/_catalog
{"repositories":["some_shit"]}
And now you don't understand the world anymore.
The solution to the puzzle lies in how you started your CI service.
docker service create \
...
--mount type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock \
...
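So the docker CLI inside the runner talks to the Docker daemon of your host system, and it is that daemon which performs the push. The daemon is not attached to the swarm network, so it resolves names with the host's resolver instead of the embedded DNS at 127.0.0.11 that the container uses. That is why the error message mentions a lookup on 10.50.0.2:53: the host's resolver knows nothing about service names in the swarm.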
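To see the difference for yourself, you can compare which nameservers the container and the host are configured with. The following is a minimal sketch; the helper name nameservers is made up for this note, and it only reads a resolv.conf-style file.

```shell
#!/bin/sh
# Hypothetical helper: print the nameservers listed in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no file is given.
nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage: run `nameservers` once inside the runner container and once on the host.
```

Inside the runner container this should print 127.0.0.11, as in the resolv.conf shown above. On the host you would expect the resolver from the error message (10.50.0.2 in this case), which is the one the daemon actually asks.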
The solution: publish the ports of your registry (-p 5000:5000) and use localhost:5000 for pull and push in your CI job.
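In a CI job this could look roughly like the sketch below. It assumes the registry service already publishes port 5000 and that the image was built under its registry:5000/... name; local_ref and push_local are hypothetical helper names, not part of docker.

```shell
#!/bin/sh
# Sketch of a push step that goes through the published port.
# Assumption: the registry service publishes 5000:5000 on the swarm nodes.

# Hypothetical helper: swap the registry host of an image reference
# for localhost:5000, e.g. registry:5000/foo -> localhost:5000/foo.
local_ref() {
  printf 'localhost:5000/%s\n' "${1#*/}"
}

# Hypothetical helper: retag the image for localhost:5000 and push it
# via the host daemon, which can reach the published port.
push_local() {
  target=$(local_ref "$1")
  docker tag "$1" "$target" && docker push "$target"
}

# Usage inside the job:
#   push_local registry:5000/some_shit
```

If the registry service is already running without a published port, something like docker service update --publish-add 5000:5000 registry should add it, assuming the service is actually named registry.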