Netdata: Distributed real-time performance and health monitoring

Netdata is a system for distributed real-time performance and health monitoring. It provides real-time insight into everything happening on the system it runs on (including applications such as web and database servers), using modern interactive web dashboards.

Netdata 1.8.0, released yesterday, focuses on metrics streaming improvements and container monitoring.

Streaming improvements

Bug fix: streaming slaves consuming 100% CPU

When running as a slave, netdata did not handle all error cases properly, which under certain conditions resulted in 100% CPU utilization of a single core. On FreeBSD and macOS slaves these conditions were always met, so using FreeBSD or macOS as netdata slaves was completely broken.

Bug fix: missing alarm notifications on netdata masters

Netdata was incorrectly mixing up cached alarm state data between the alarms of the mirrored hosts, so alarm notifications were not dispatched under certain conditions. This affected only netdata masters (i.e. netdata servers with more than one host database and with health monitoring enabled). The alarms were generated and were visible on the dashboards, but the notifications were not always sent.

Bug fix: streamed charts with duplicate names

There was a minor issue with charts that were created with name aliases. When these charts were streamed from netdata slaves to netdata masters, they ended up with duplicate chart names.

Container monitoring improvements

  • Container network interfaces are now moved to the container section and they are rendered from the container's viewpoint (i.e. sent = what the container sent) – no more veth* garbage on the dashboard.
  • The interfaces also appear as eth0 (or whatever the container sees) and they are inside the container section of the dashboard. netdata maps each veth* interface to the right container using plain cgroups features, so this works for all container managers (docker, lxc, etc.).
  • Eliminated the nested containers shown under certain versions of lxc.
  • Also, containers and VMs now have summary gauges on the dashboard.
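
The change of viewpoint described above follows from how a veth pair works: whatever the host-side veth* end receives is what the container transmitted, and vice versa. The following is a minimal sketch of that relabeling, not netdata's actual code; the function name and the dictionary keys are hypothetical and chosen only for illustration.

```python
def container_view(host_veth_counters):
    """Relabel host-side veth counters from the container's viewpoint.

    Hypothetical helper: `host_veth_counters` holds the byte counters
    observed on the host end of a veth pair. Traffic received on the host
    end was sent by the container, so the two directions are swapped.
    """
    return {
        "received": host_veth_counters["sent"],
        "sent": host_veth_counters["received"],
    }


# Example: the host end saw 10 bytes arrive and pushed 3 bytes out,
# so the container sent 10 bytes and received 3.
print(container_view({"received": 10, "sent": 3}))
```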

Generic enhancements

  • Netdata can now listen on UNIX domain sockets (.sock files). This allows a local web server and netdata to communicate while bypassing the network stack (in netdata, set bind to = unix:/path/to/netdata.sock – this option supports multiple arguments, so netdata can listen on multiple UNIX sockets and TCP sockets at the same time).
  • Netdata was assuming that the JSON representation of a chart would be at most 1024 bytes, and it was generating corrupted JSON output when any chart exceeded that limit. The limit has been removed.
  • Fixed a crash at startup that occurred when no usable disks were found.
  • The systemd netdata.service now allows setting a negative OOM score for netdata and restarts netdata if it crashes. The new netdata.service is not automatically installed when updating netdata. Either delete /etc/systemd/system/netdata.service and then update/re-install netdata, or copy the file by hand.
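
To illustrate the UNIX domain socket support mentioned above, here is a minimal sketch of a local client that sends an HTTP request over a .sock file instead of TCP. It uses only the Python standard library; the class and function names are hypothetical, and the socket path is simply the placeholder from the bind to example, so adjust it to your own configuration.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the default TCP connect with an AF_UNIX stream socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch(socket_path, url_path="/"):
    """Send a GET request over the UNIX socket; return (status, body bytes)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", url_path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Assuming netdata is listening on /path/to/netdata.sock and the calling user has permission to access that file, fetch("/path/to/netdata.sock", "/") would request the dashboard's root page. A local web server acting as a reverse proxy would do the equivalent through its own upstream configuration.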