...

Network tracing adds overhead, and its output can be quite large (JSON tens of kilobytes in size). Therefore, we must be careful about how often network traces are obtained and how they are published. For example, logging a network trace as a single message is not an option. Instead, EVE publishes network traces inside Tar/GZip archives, called "netdumps", by storing them persistently under the /persist/netdump directory (for now, EVE does not upload them to the cloud or any other remote destination). This is done by pillar's netdump package, which also adds further files to each archive to capture the config/state of device connectivity at the moment of publication. Combined, this information makes it possible to troubleshoot a connectivity issue (between the device and the controller or a data-store) even after it is no longer reproducible. Ideally, it should not be necessary to ask a customer for more (networking-specific) information to better understand the issue, let alone to run commands and send us the output (because netdump has already done this automatically).
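As an illustration of how the archives could be inspected once copied off a device, here is a minimal sketch that lists the files inside each netdump. The helper name list_netdump_contents is hypothetical, and the sketch assumes only what is stated above: the archives are tarballs stored in one directory. It makes no assumption about archive or member file names.

```python
import tarfile
from pathlib import Path


def list_netdump_contents(directory="/persist/netdump"):
    """Map each archive in `directory` to the sorted names of its members.

    Hypothetical helper for illustration; not part of EVE.
    """
    contents = {}
    for archive in sorted(Path(directory).iterdir()):
        # Skip anything that is not a readable tar archive.
        if not archive.is_file() or not tarfile.is_tarfile(archive):
            continue
        # "r:*" lets tarfile detect the compression (gzip for netdumps).
        with tarfile.open(archive, mode="r:*") as tar:
            contents[archive.name] = sorted(tar.getnames())
    return contents
```

Calling list_netdump_contents() with no argument reads the default directory named in the text; pass a different path when working on a copy.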

...

  • /ping request done by nim to verify connectivity for the latest DPC (testing of older DPCs is never traced). Packet capture is also enabled and the obtained pcap files are included in the published netdumps. To limit the overhead associated with tracing and packet capture, nim is only allowed to enable them and produce a netdump at most once per day (configurable by netdump.topic.postonboard.interval). However, before the device is onboarded, a much lower interval configured by netdump.topic.preonboard.interval (by default one hour) is applied to get more frequent diagnostics for initial connectivity troubleshooting.
  • /config and /info requests done by zedagent to obtain device configuration and publish info messages, respectively. Packet capture is not enabled in this case. Tracing follows the same interval as given by netdump.topic.postonboard.interval. For /info requests, tracing only covers publication of the ZInfoDevice message. Moreover, tracing is enabled only if the highest priority DPC is currently being applied and is reported by nim as working. Otherwise, we will eventually get a nim-fail* netdump, which should be sufficient for connectivity troubleshooting. The purpose of zedagent netdumps is to debug issues specific to /config and /info requests (which are essential to keep the device remotely manageable). Netdumps are published separately into topics zedagent-config-<ok|fail> and zedagent-info-<ok|fail>.
  • Every download request performed by downloader using the HTTP protocol is traced, and a netdump is published into the topic downloader-ok or downloader-fail, depending on the outcome of the download process. By default, this does not include packet captures. However, a limited PCAP can be enabled with the config option netdump.downloader.with.pcap. It is limited in the sense that it will not include TCP segments carrying a non-empty payload (i.e. packets with the downloaded data). The total PCAP size is also limited to 64MB (packets past this limit will not be included).
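The limits described in the list above can be sketched as follows. This is only an illustration, not the pillar implementation: TopicLimiter, topic_name and keep_in_limited_pcap are hypothetical names, the sketch assumes that the interval is tracked separately for each topic, that the post-onboarding default is one day, and that "64MB" means 64 * 1024 * 1024 bytes.

```python
from datetime import datetime, timedelta

# Default intervals as described in the text; the comments name the config
# options that override them.
PREONBOARD_INTERVAL = timedelta(hours=1)   # netdump.topic.preonboard.interval
POSTONBOARD_INTERVAL = timedelta(days=1)   # netdump.topic.postonboard.interval

# Assumption: "64MB" is read as 64 MiB.
PCAP_SIZE_LIMIT = 64 * 1024 * 1024


class TopicLimiter:
    """Hypothetical helper: allow one netdump per topic per interval."""

    def __init__(self):
        self._last_published = {}  # topic name -> time of last netdump

    def allow(self, topic, onboarded, now):
        """Return True (and record `now`) if a netdump may be produced."""
        interval = POSTONBOARD_INTERVAL if onboarded else PREONBOARD_INTERVAL
        last = self._last_published.get(topic)
        if last is not None and now - last < interval:
            return False
        self._last_published[topic] = now
        return True


def topic_name(base, succeeded):
    """Build a topic such as downloader-ok or zedagent-config-fail."""
    return f"{base}-{'ok' if succeeded else 'fail'}"


def keep_in_limited_pcap(tcp_payload_len, packet_len, captured_bytes):
    """Decide whether a packet belongs in the limited downloader PCAP.

    Packets carrying TCP payload are dropped, and nothing is added once
    the size limit would be exceeded.
    """
    if tcp_payload_len > 0:
        return False
    return captured_bytes + packet_len <= PCAP_SIZE_LIMIT
```

For example, a limiter that has just allowed a netdump for one topic refuses another for the same topic 30 minutes later, whether or not the device is onboarded, while a different topic is unaffected.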

...