docker agent [auto-] configuration for agent v6 #1381

Open
n0mer opened this issue Mar 2, 2018 · 6 comments
Comments

@n0mer

n0mer commented Mar 2, 2018

hello,

after migration from dd-agent v5 i've got 2 folders in /etc/datadog-agent/conf.d: docker.d and docker_daemon.d

The collector reports the following Loading Errors in docker_daemon:

  • Core Check Loader: Check docker_daemon not found in Catalog
  • JMX Check Loader: check is not a jmx check, or unable to determine if it's so
  • Python Check Loader: No module named docker_daemon

docker-daemon.yaml in v5 had very simple configuration:

init_config:

instances:
  - ## Daemon and system configuration
    url: "unix://var/run/docker.sock"
    new_tag_names: true

So, there are several post-migration questions:

  • do i need both docker.d and docker_daemon.d? If yes - what's the difference between them?
  • is this content in ./conf.d/docker.d/conf.yaml sufficient for v6:
init_config:

instances:
  - ## The agent honors the DOCKER_HOST, DOCKER_CERT_PATH and DOCKER_TLS_VERIFY
    url: "unix://var/run/docker.sock"
    new_tag_names: true

    collect_container_size: true
    collect_images_stats: true
    collect_image_size: true
    collect_disk_stats: true
    collect_exit_codes: true
  • what exactly can be wrong with the docker agent if it reports images, but does not report running containers?

image

host "dashboard" contents:

image
image

@n0mer
Author

n0mer commented Mar 2, 2018

got this after turning on DEBUG logging:

2018-03-02 02:21:40 CET | DEBUG | (loader.go:88 in Load) | Unable to load python module - datadog_checks.docker: No module named docker
2018-03-02 02:21:40 CET | DEBUG | (loader.go:88 in Load) | Unable to load python module - docker: No module named docker
2018-03-02 02:21:40 CET | DEBUG | (autoconfig.go:487 in getChecks) | Python Check Loader: unable to load the check 'docker': No module named docker
2018-03-02 02:21:40 CET | DEBUG | (autoconfig.go:476 in getChecks) | Core Check Loader: successfully loaded check 'docker'
2018-03-02 02:21:40 CET | WARN | (check.go:243 in Configure) | could not get a check instance with the new api: __init__() takes exactly 5 arguments (4 given)
2018-03-02 02:21:40 CET | WARN | (check.go:244 in Configure) | trying to instantiate the check with the old api, passing agentConfig to the constructor
2018-03-02 02:21:40 CET | WARN | (check.go:269 in Configure) | passing `agentConfig` to the constructor is deprecated, please use the `get_config` function from the 'datadog_agent' package (http_check).

@n0mer
Author

n0mer commented Mar 2, 2018

from agent.log

2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: /dev/vda1, ext4, /var/lib/docker/plugins
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: /dev/vda1, ext4, /var/lib/docker/aufs
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: , aufs, /var/lib/docker/aufs/mnt/fa07eca91c50fa767b5317e2b61f12400cbefffd2a74bf6c9901b8fd0f741a24
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: nsfs, nsfs, /run/docker/netns/default
2018-03-02 02:31:26 CET | WARN | (datadog_agent.go:135 in LogMessage) | (disk.py:104) | Unable to get disk metrics for /run/docker/netns/default: [Errno 13] Permission denied: '/run/docker/netns/default'
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: shm, tmpfs, /var/lib/docker/containers/1447584bef070dd23155717c9b2d0cf10c1f31c1eef97d8b78a21e277e872c14/shm
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: , aufs, /var/lib/docker/aufs/mnt/a4068e9ef162687262715c0ca56508082e8cb36477bf2a2fe64e3043e14e2153
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: shm, tmpfs, /var/lib/docker/containers/046ded549ef697e72181451cebf0fb8a08ea9c3ce89c8106400e789b4cacfeac/shm
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: , aufs, /var/lib/docker/aufs/mnt/bb5376de41cc3608551412cff7a46e0937f6ea39b714dff253aaa4f03fa6ab38
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: shm, tmpfs, /var/lib/docker/containers/2e5d2a551c553ef46d646d84a244a95ffdfedbbc624d529e2b62323160cd806c/shm
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: , aufs, /var/lib/docker/aufs/mnt/7912f4ec8c0235eb64c9c5c3bb19f563438b8572aa68a337dc83f662ebdf4663
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: shm, tmpfs, /var/lib/docker/containers/1fc09ad3b9f71ead0c60273c7ea7f5a3fc05cb2e24155a2e623f8140d2d4d92e/shm
2018-03-02 02:31:26 CET | INFO | (runner.go:246 in work) | Running check docker
2018-03-02 02:31:26 CET | DEBUG | (job.go:99 in waitForTick) | Enqueuing check docker for queue 15000000000
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (docker_util.go:292 in Containers) | Container id 1fc09ad3b9f7 has an empty cgroup, skipping
2018-03-02 02:31:26 CET | DEBUG | (docker_util.go:292 in Containers) | Container id 2e5d2a551c55 has an empty cgroup, skipping
2018-03-02 02:31:26 CET | DEBUG | (docker_util.go:292 in Containers) | Container id 046ded549ef6 has an empty cgroup, skipping
2018-03-02 02:31:26 CET | DEBUG | (docker_util.go:292 in Containers) | Container id 1447584bef07 has an empty cgroup, skipping
2018-03-02 02:31:26 CET | INFO | (runner.go:302 in work) | Done running check docker

@xvello xvello self-assigned this Mar 2, 2018
@xvello
Contributor

xvello commented Mar 2, 2018

Hi @n0mer ,

The docker_daemon check is deprecated and replaced by the docker check. Could you please describe your migration path? docker_daemon.d should not be automatically copied by the migration command.
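
Since docker_daemon is deprecated in favour of docker, a leftover docker_daemon.d folder can be spotted with a small script. This is a minimal sketch, assuming the conf.d layout mentioned in this thread (one `<check>.d` folder per check under /etc/datadog-agent/conf.d). The function name and the `DEPRECATED_CHECKS` set are made up for illustration and are not part of the agent.

```python
import os

# Assumption: only docker_daemon is treated as deprecated here,
# based on the comment above. Extend the set as needed.
DEPRECATED_CHECKS = {"docker_daemon"}


def find_deprecated_check_dirs(confd_path):
    """Return the sorted '<check>.d' folder names under confd_path
    whose check name is listed in DEPRECATED_CHECKS."""
    found = []
    for entry in os.listdir(confd_path):
        name, ext = os.path.splitext(entry)
        is_dir = os.path.isdir(os.path.join(confd_path, entry))
        if ext == ".d" and name in DEPRECATED_CHECKS and is_dir:
            found.append(entry)
    return sorted(found)


if __name__ == "__main__":
    confd = "/etc/datadog-agent/conf.d"
    if os.path.isdir(confd):
        print(find_deprecated_check_dirs(confd))
```

On the setup described in the first comment (both docker.d and docker_daemon.d present), this would list only docker_daemon.d.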

Your docker.d config looks OK. What could be happening is:

  • either the docker daemon times out because the container list command takes too long to respond, due to the collect_container_size: true option. As described in the documentation, this option basically runs a du -hs on every container and takes a long time if you have a high number of containers
  • or the cgroup detection logic failed because either /host/proc or /host/sys/fs/cgroup is improperly mounted. What is your host system? Could you please share your datadog-agent container inspect?
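
To make the cgroup point concrete, here is a minimal Python sketch of how a container ID can be recovered from the lines of /proc/<pid>/cgroup. It is not the agent's actual code (the log lines come from cgroup.go, which is Go); the regex and the function name are assumptions for illustration. It shows why a path such as /system.slice/docker.service yields no container ID, while a path containing a 64-character hex ID does.

```python
import re

# Assumption: a Docker container ID is 64 lowercase hex characters.
CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")


def parse_container_id(cgroup_line):
    """Return the container ID found in one /proc/<pid>/cgroup line, or None.

    Each line has the form 'hierarchy-id:controllers:path'.
    """
    parts = cgroup_line.strip().split(":", 2)
    if len(parts) != 3:
        return None
    match = CONTAINER_ID_RE.search(parts[2])
    return match.group(0) if match else None


# A line from the debug log earlier in the thread: the process sits in the
# docker.service slice, so there is no container ID to extract.
host_line = "8:memory:/system.slice/docker.service"

# Hypothetical line for a process inside a container; the ID is one of the
# container IDs that appear in the agent.log excerpt.
container_line = (
    "8:memory:/docker/"
    "1fc09ad3b9f71ead0c60273c7ea7f5a3fc05cb2e24155a2e623f8140d2d4d92e"
)

print(parse_container_id(host_line))
print(parse_container_id(container_line))
```

If every process of a container resolves to a path without an ID, the container ends up with no cgroup information at all, which would match the "has an empty cgroup, skipping" messages in the log.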

@n0mer
Author

n0mer commented Mar 2, 2018

@xvello Xavier, i executed the command

 DD_UPGRADE=true bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/datadog-agent/master/cmd/agent/install_script.sh)"

from https://github.com/DataDog/datadog-agent/blob/master/docs/agent/upgrade.md , and i got 2 folders docker.d and docker_daemon.d

@n0mer
Author

n0mer commented Mar 2, 2018

@xvello are those log messages (_exclude_disk, could not parse container id from path, and Container id ... has an empty cgroup, skipping) irrelevant, so they can be ignored?

so, i removed docker_daemon.d (so only docker.d is left) and set collect_container_size: false - still no luck.

Anyway, the docker agent is working with collect_container_size: true on another server, so this might not be the problem.

I opened support case #132559 , submitted flares with configs and logs. Brian B. is looking into it.

@n0mer
Author

n0mer commented Mar 2, 2018

@xvello i'm running the datadog agent directly on the host, not from a docker image

# ps auxww | grep dd-agent
dd-agent 29545  2.7  0.8 1077968 65860 ?       Ssl  17:17   0:05 /opt/datadog-agent/bin/agent/agent start -p /opt/datadog-agent/run/agent.pid
dd-agent 29546  0.3  0.2 780084 23100 ?        Ssl  17:17   0:00 /opt/datadog-agent/embedded/bin/trace-agent --config /etc/datadog-agent/datadog.yaml --pid /opt/datadog-agent/run/trace-agent.pid
dd-agent 29547  0.6  0.3  44944 28772 ?        Ssl  17:17   0:01 /opt/datadog-agent/embedded/bin/process-agent --config=/etc/datadog-agent/datadog.yaml --pid=/opt/datadog-agent/run/process-agent.pid
dd-agent 29644  6.2  2.0 3668924 164228 ?      Sl   17:17   0:12 java -Xmx200m -Xms50m -classpath /opt/datadog-agent/bin/agent/dist/jmx/jmxfetch-0.18.2-jar-with-dependencies.jar org.datadog.jmxfetch.App --ipc_host localhost --ipc_port 5001 --check_period 15000 --log_level DEBUG --reporter statsd:localhost:8125 collect

# mount | grep group
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
# mount | grep proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12273)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

i had tmpfs excluded in conf.d/disk.d/conf.yaml, now tmpfs is not excluded (please also notice use_mount: yes, i don't know whether it can affect the collection of metadata for running docker containers):

init_config:

instances:
  # The use_mount parameter will instruct the check to collect disk
  # and fs metrics using mount points instead of volumes
  - use_mount: yes
    # The (optional) excluded_filesystems parameter will instruct the check to
    # ignore disks using these filesystems. Note: On some linux distributions,
    # rootfs will be found and tagged as a device, add rootfs here to exclude.
    excluded_filesystems:
#      - tmpfs
#      - none
#      - shm
#      - nsfs
#      - tracefs
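
For illustration, here is a rough Python sketch of how an exclusion list such as excluded_filesystems can filter mounts by filesystem type. It is not the disk check's actual code; the function name and the tuple layout (device, fstype, mountpoint) are assumptions, and the sample mounts are taken from the _exclude_disk lines in the log. Note that with every list entry commented out, as in the config above, the YAML key parses to null, which the sketch treats as "exclude nothing".

```python
def filter_mounts(mounts, excluded_filesystems=None):
    """Keep only the mounts whose filesystem type is not excluded.

    mounts is a list of (device, fstype, mountpoint) tuples.
    excluded_filesystems may be None (e.g. a YAML key with no value).
    """
    excluded = set(excluded_filesystems or [])
    return [m for m in mounts if m[1] not in excluded]


# Sample mounts copied from the agent.log excerpt in this thread.
mounts = [
    ("/dev/vda1", "ext4", "/var/lib/docker/plugins"),
    ("/dev/vda1", "ext4", "/var/lib/docker/aufs"),
    ("nsfs", "nsfs", "/run/docker/netns/default"),
]

# Nothing excluded: all three mounts are kept.
print(filter_mounts(mounts, None))

# Excluding nsfs drops /run/docker/netns/default.
print(filter_mounts(mounts, ["nsfs"]))
```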

@xvello xvello removed their assignment Jul 8, 2022