Author: Milan Lenco

Date: May 2nd, 2024

Status: Proposal draft

Motivation

The implementation of EVE networking is quite complicated. It combines various features of the Linux network stack, netfilter, dhcpcd, dnsmasq, etc. The way we use some of these components is not obvious: for example, dhcpcd is also used to apply static IP configuration, the iptables “mangle” table (not the “filter” table) plays the major role in traffic filtering, and the main IP routing table is not used at all. Even experienced Linux users have a hard time troubleshooting connectivity issues without understanding some of these implementation details.

Furthermore, the network state information is scattered across multiple sources, and collecting it requires several different shell commands (ip addr, ip rule, ip route, iptables, dhcpcd -U, conntrack -L, etc.). There are also pubsub topics that publish very useful state information, but their content cannot be understood without looking into the EVE source code, and it may even change between EVE releases.

The options to interact with EVE networking are rather limited and also scattered across multiple commands. For example, a user cannot trigger network connectivity verification on demand and instead has to wait for a timer to fire. Generating traffic to test packet flow is also rather difficult and often not possible with the tools available. While packet tracing is enabled simply by packaging tcpdump into the debug container, users still have to pick the right interface to listen on and write an appropriate filter to capture the intended flow.

Finally, preparing and testing the “override” network configuration for device bootstrapping is cumbersome and error-prone. The user cannot easily test different network configurations, make corrections, run and retry connectivity tests, etc.

Simplifying all of the above, while providing more visibility into and interactive access to EVE networking, is in our interest: it empowers users in their own troubleshooting efforts and thus reduces the number of support tickets we have to spend resources on.

Proposal

We propose to enhance the already available “eve” command and add a set of “eve net …” sub-commands for networking visibility and troubleshooting.

Requirements

This is a list of all nice-to-have features; the feasibility and implementation difficulty of some of them are not yet clear.

  • Should be accessible and fully functional over serial console
    • i.e. can be used even without device connectivity
    • preferably with output at most 79 characters wide
  • Should be accessible also over ssh and edge-view
    • to be used remotely after device is onboarded and connected to cloud
  • Should provide auto-completion when possible (no need to remember commands)
    • even for arguments like app name it should provide auto-completion
  • Should use colors if (and only if) supported by the terminal for better readability
  • Should not depend on the terminal for the support of scrolling up/down
  • Should be easy to use even without networking background
    • only the conceptual EVE networking model should need to be understood, e.g. what we mean by "network instance"
    • but we can allow enabling an expert-level CLI variant through an environment variable
  • Should be documentation-wise self-contained
    • no need to read docs from the help center or the eve repo
  • Should provide basic “show” commands
    • Show interface configuration, show routing table for a given NI, show ACLs for a given application adapter, etc.
  • Should provide a concise view on the networking configuration, state and topology
    • with pretty formatting, ASCII art, etc.
  • Should allow to visualize the configured route for a given src->dst flow
    • like traceroute, but incl. the ACL applied, ip rule used, interfaces traversed, routing tables and routes applied, etc., not just routing hops
  • Should provide a set of automatic checks to detect the most common issues
    • should provide suggestions for resolving those issues (next steps)
  • Should allow to interactively prepare and manage an “override network config” in an easy-to-use way
    • with verification, incl. connectivity tests and option to revert if they fail
    • this will for example allow to test and fine-tune network proxy config
    • should allow to export the prepared override config (as JSON)
  • Should allow the user to trigger some operations (without having to wait for some timer to fire etc.)
    • e.g. run connectivity verification, request new DHCP lease, retry image download, etc.
  • Should allow to watch for changes and operation progress
    • observe interface, ACL counters
    • observe traffic load
    • observe traffic flows (potentially filtered by app, path, dst, etc.)
    • observe progress of connectivity testing
    • observe config CRUD operations as they happen
    • observe app/eve image download process
    • observe zedcloud HTTP requests (ongoing + pending)
  • Should allow to trace packets easily
    • should not require to know the interface names as they appear in the Linux kernel or to write complicated filters
  • Should allow to generate and inject some traffic for testing (ICMP, ARP, DHCP, DNS etc.)
    • for both external networks and apps
    • show responses (if any), packet flow, etc.
  • Should allow to test connectivity to a datastore and download of an image
    • should allow to either download an entire image or just get headers and maybe first few bytes of every chunk
    • should allow to select (management) port