...

  • No more logging to files, unless there is a component that we cannot make use standard syslog (e.g. hypervisor logs, lisp logs etc). Even the containers launched by EVE (e.g. wlan, wwan etc) should be made to use standard syslog.
  • Have a disk-backed queueing mechanism that keeps logs from being lost in the event of unexpected power failures or reboots. This includes both main message queues and more specific action queues.
  • Have a mechanism to save debug logs on the device disk along with sending them to cloud. This lets engineers access debug logs on the device when the remote log level is not set to accept debug logs. Or should we ignore the remote log level? If we decide to persist debug logs on the device, the logging infrastructure should take care of limiting the space occupied and also rotate logs without using any additional tools like linux logrotate.
  • In the event of an upgrade failure, the queueing mechanism should make sure not to lose the logs of the other partition (the failed partition, which holds the failure messages). These logs should be preserved and sent to cloud after the device comes back online.
  • Have a transformer that adds the partition attribute (partition name IMGA/IMGB), eve service name and version of EVE to log messages that are exported to cloud. This helps while debugging to grep for logs specific to a particular release, partition and service.
  • Ability to prioritize log queues when logs are sent to cloud. Logs with different priorities should be in different queues.
  • To prevent making too many API calls to cloud, logs should be exported to cloud in batches.
  • The logging in /opt/zededa/bin/watchdog-report.sh should be preserved so that we get the reboot-reason. (This can be done by having the agentlog append to the reboot-reason file and avoid having to grep the log files in watchdog-report.sh)
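
A minimal sketch of how the transformer, per-priority queues and batching requirements above could fit together. All type, field and function names here are hypothetical, the queues are held in memory only (the disk-backed part is not shown), and JSON is just an assumed wire format.

```go
package main

import (
	"encoding/json"
	"sort"
)

// LogEntry is a hypothetical in-memory form of one log message.
type LogEntry struct {
	Severity  int    `json:"severity"` // syslog severity: 0 (emerg) .. 7 (debug)
	Message   string `json:"msg"`
	Service   string `json:"service,omitempty"`
	Partition string `json:"partition,omitempty"`
	Version   string `json:"eveVersion,omitempty"`
}

// Transformer stamps entries with the attributes that make cloud-side
// filtering by release, partition and service possible.
type Transformer struct {
	Partition string // e.g. "IMGA" or "IMGB"
	Version   string // EVE version string
}

// Apply returns a copy of e with service, partition and version filled in.
func (t Transformer) Apply(e LogEntry, service string) LogEntry {
	e.Service = service
	e.Partition = t.Partition
	e.Version = t.Version
	return e
}

// Queues keeps one FIFO per severity so that urgent logs are exported first.
type Queues map[int][]LogEntry

// Push appends e to the queue for its severity.
func (q Queues) Push(e LogEntry) {
	q[e.Severity] = append(q[e.Severity], e)
}

// NextBatch removes and returns up to max entries, lowest severity number
// (most urgent) first, preserving arrival order within a severity.
func (q Queues) NextBatch(max int) []LogEntry {
	sevs := make([]int, 0, len(q))
	for s := range q {
		sevs = append(sevs, s)
	}
	sort.Ints(sevs)

	var batch []LogEntry
	for _, s := range sevs {
		for len(q[s]) > 0 && len(batch) < max {
			batch = append(batch, q[s][0])
			q[s] = q[s][1:]
		}
	}
	return batch
}

// Encode serializes a whole batch so it can go out in a single API call.
func Encode(batch []LogEntry) ([]byte, error) {
	return json.Marshal(batch)
}
```

With a sketch like this, an exporter would call NextBatch with a fixed batch size and send the result of Encode in one request, instead of making one API call per message.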

...

  1. Create a plugin written in C that interacts with the rsyslogd daemon. This plugin would get log messages from rsyslogd and pass them on to another shared library written in golang for EVE specific processing. The interface between the C code and the golang code should be kept simple, using only primitive data types. This approach makes it very easy to exert back pressure on the message queue when there are network or other failures due to which log messages cannot be delivered to cloud.
  2. If the above approach does not work or has issues, we can always implement the EVE specific functionality as a separate process and have rsyslogd forward messages to our new service over TCP (rsyslog's forwarding output module is omfwd). The problem with this approach is that messages in transit (from rsyslogd to the EVE forwarder service) will be lost when connectivity to cloud fails.

** Question for Roman - How do we build the plugin in our build environment?

Avoid making too many API calls to cloud for sending log messages

Rsyslog supports batching of log messages before sending them to an output plugin. Rsyslogd starts a transaction, sends a bunch of logs and then ends the transaction. The plugin has the option to selectively acknowledge a subset of log messages or reject all. Rsyslogd does not mark these log messages as completed until the EVE plugin acknowledges/accepts them. This mechanism can be used for putting back pressure on rsyslogd in the event of network or other failures.

Envisioned list of log sources on an EVE edge-node

  1. EVE specific agents. This includes EVE agents like zedagent, zedrouter, domainmgr, verifier, zedmanager, downloader, identitymgr, vaultmgr, baseosmgr, nim, nodeagent, ledmanager, wstunnelclient, lisp-ztr and python lisp control plane.
  2. External tools used by EVE. This includes dnsmasq, radvd, watchdog, dhcpcd.
  3. EVE specific scripts and short running executables. eg. client, diag, device-steps.sh
  4. Other containers running in EVE control domain (dom0). eg. wwan, wlan, guacd, sshd, vtpm (vtpm_server) and ntpd containers.
  5. Hypervisor and associated tools. eg. xen, xenstored, xenconsoled, xl, qemu.
  6. Kernel logs.

These log sources can broadly be grouped into these five categories based on how their logs are output (log destinations):

  1. Sources that send logs only to files. Such sources include hypervisor, qemu, lisp control plane, xen-tools.
    These logs will need special handling. We can start by having such logs dumped to /var or /persist and use the imfile module of rsyslogd to pick them up from there.
    We should later invent a mechanism like LD_PRELOAD or named pipes to make these sources send logs to rsyslogd without using files.
    There have been mixed opinions from the team. Having such services keep logging to files and then having the imfile module of rsyslogd scrape logs from those files is an option. This will mean that
    we might need aggressive log management (archiving old log files and aggressively deleting the oldest archives).
  2. Sources that are flexible and can be made to change their log destination easily. Such sources include EVE agents, short lived EVE executables and scripts.
    EVE agents, for example, can be changed (since they use logrus for their logging needs) to send their logs directly to syslog (/dev/log), or to stdout with the output then redirected to rsyslogd using the logger tool.
    After discussion with the team, the most preferred way for EVE services/executables is to have an env variable, the presence/absence of which will make EVE services log to syslog or stdout directly.
  3. Sources that send logs to both syslog and files, e.g. dnsmasq, dhcpcd, radvd, watchdog etc.
    No special handling will be required in this case.
  4. Sources that run inside containers and whose logs are collected by memlogd, e.g. wlan, wwan, sshd, guacd, ntpd, vtpm etc.
    Linuxkit has a module called memlogd that collects container logs in a circular buffer. Linuxkit's memlogd module is shipped with the logread tool, which can read logs from memlogd and output them to its stdout.
    Output of logread can be piped into logger and subsequently sent to rsyslogd.
  5. Kernel logs (easily collected by rsyslogd). Rsyslogd has a module called imklog that can read kernel logs into rsyslogd.
    ** Question: Can rsyslogd use imklog to read xen logs directly?