...

There will be no VETH links between VRFs of network instances. The current behavior, where applications from different networks cannot talk to each other directly, will be preserved (and enforced with stronger measures). Hairpinning through portmaps will remain the only communication option for network-separated applications. In the default VRF domain there will be one routing table per uplink interface. Using ip rules, each network instance will be matched with the RT of the uplink selected for that network by configuration/probing. Network instances that use different uplinks at a given moment will be completely isolated from each other, not sharing any RT along the routing path. Consequently, connections between uplink-separated NIs can only be established by hairpinning outside the edge device (through portmaps).
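
As a rough illustration, the sketch below (assuming the vishvananda/netlink Go package; function and interface names are hypothetical) shows how such an ip rule could map traffic of one network instance to the routing table of its currently selected uplink. The exact match criterion (e.g. the VETH peer leading from the NI VRF into the default VRF domain) would follow the final design.

```go
package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// addNIUplinkRule installs an ip rule that steers traffic arriving from one
// network instance (matched by its incoming interface in the default VRF
// domain, e.g. the VETH peer of the NI) into the per-uplink routing table
// selected for that NI by configuration/probing.
func addNIUplinkRule(niIngressIf string, uplinkRT int, priority int) error {
	rule := netlink.NewRule()
	rule.IifName = niIngressIf // hypothetical name, e.g. "veth-ni1"
	rule.Table = uplinkRT      // per-uplink routing table, e.g. 801 for eth0
	rule.Priority = priority
	if err := netlink.RuleAdd(rule); err != nil {
		return fmt.Errorf("ip rule for %s -> table %d: %w", niIngressIf, uplinkRT, err)
	}
	return nil
}
```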

...

A single strongSwan process will continue operating for all VPN network instances. For every VPN NI there will be a separate XFRM device created inside the NI VRF, linked with the corresponding IPsec connection configuration using the XFRM interface ID. A packet sent from an application will be routed by the VRF routing table via the associated XFRM device, which then determines which SAs to use for encryption. The encrypted (and encapsulated) packet then continues through the VETH pair into the default VRF domain, where it is routed out by the uplink RT. In the opposite direction, the SPI field of the encrypted packet will link to the XFRM device and thus to the VRF into which the decrypted packet should be inserted for forwarding (i.e. the VETH is skipped in this direction).
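
A minimal sketch of the per-NI XFRM device setup, again assuming the vishvananda/netlink package; the function, device and VRF names are hypothetical, and strongSwan's side of the linkage is the matching if_id value in the corresponding connection configuration.

```go
package main

import "github.com/vishvananda/netlink"

// createNIXfrmDevice creates the XFRM device for one VPN network instance,
// ties it to the IPsec connection via the XFRM interface ID (if_id) and
// enslaves it to the NI VRF, so that packets routed by the VRF routing table
// via this device are encrypted with the associated SAs.
func createNIXfrmDevice(niVRF netlink.Link, name string, xfrmIfID uint32) (netlink.Link, error) {
	xfrmi := &netlink.Xfrmi{
		LinkAttrs: netlink.LinkAttrs{Name: name}, // e.g. "xfrm-ni1" (hypothetical)
		Ifid:      xfrmIfID,                      // must match if_id in the strongSwan connection
	}
	if err := netlink.LinkAdd(xfrmi); err != nil {
		return nil, err
	}
	// Put the XFRM device inside the NI VRF so the VRF routing table can
	// route application traffic via it.
	if err := netlink.LinkSetMaster(xfrmi, niVRF); err != nil {
		return nil, err
	}
	if err := netlink.LinkSetUp(xfrmi); err != nil {
		return nil, err
	}
	return xfrmi, nil
}
```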

...

An alternative to VRFs is to isolate on all network levels: instead of using a per-NI VRF and CT zone, we could run each Linux bridge and its associated network interfaces, plus the external processes (dnsmasq, radvd, strongSwan, ...), in a separate network namespace. For every new network instance, zedrouter would create a new named network namespace (alternatively, it could start a new containerd task with its own network namespace), connected with the "default" network namespace using a VETH pair. Downlink interfaces would have to be moved into the target network namespace before they are put under the bridge. The bridging is currently done by the hypervisors (veth.sh for containers; qemu and xen allow specifying the bridge in the domU config). This would be removed and we would instead finalize the downlink interface configuration ourselves in doActivateTail() of the domain manager.
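
The sketch below (hypothetical names, assuming the vishvananda/netlink and vishvananda/netns packages) outlines the wiring for one network instance under this alternative: an already created named namespace is connected to the default one with a VETH pair, and a downlink interface is moved into it before being put under the bridge there.

```go
package main

import (
	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

// setupNINamespace wires one network instance into its own named network
// namespace: a VETH pair links the NI namespace with the default one, and the
// downlink (app-facing) interface is moved into the NI namespace, where the
// domain manager would then put it under the NI bridge.
func setupNINamespace(nsName, downlinkIfName string) error {
	niNS, err := netns.GetFromName(nsName) // namespace created beforehand, e.g. "ip netns add <nsName>"
	if err != nil {
		return err
	}
	defer niNS.Close()

	// VETH pair: one end stays in the default namespace, the peer is moved
	// into the NI namespace.
	veth := &netlink.Veth{
		LinkAttrs: netlink.LinkAttrs{Name: "veth-" + nsName},
		PeerName:  "veth-ni", // hypothetical peer name
	}
	if err := netlink.LinkAdd(veth); err != nil {
		return err
	}
	peer, err := netlink.LinkByName(veth.PeerName)
	if err != nil {
		return err
	}
	if err := netlink.LinkSetNsFd(peer, int(niNS)); err != nil {
		return err
	}

	// Move the downlink interface into the NI namespace before it is put
	// under the bridge there (instead of the hypervisor bridging it in the
	// default namespace).
	downlink, err := netlink.LinkByName(downlinkIfName)
	if err != nil {
		return err
	}
	return netlink.LinkSetNsFd(downlink, int(niNS))
}
```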

...

So while switching to another namespace, locking the thread, doing something quick and synchronous (e.g. listing conntracks), and immediately switching back is safe, running a long asynchronous task (e.g. a TCP server or packet capture) carries the risk of some goroutines escaping the namespace, which leads to obscure bugs.
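
A sketch of that "safe" pattern, assuming the vishvananda/netns package and a hypothetical helper name:

```go
package main

import (
	"runtime"

	"github.com/vishvananda/netns"
)

// inNINamespace locks the calling OS thread, switches it into the NI
// namespace, runs a short synchronous action, and switches back before
// unlocking. Spawning goroutines or long-running asynchronous work inside fn
// would defeat this, since those goroutines may end up on other OS threads
// that are not in the NI namespace.
func inNINamespace(nsName string, fn func() error) error {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	origNS, err := netns.Get() // namespace of the current thread
	if err != nil {
		return err
	}
	defer origNS.Close()

	niNS, err := netns.GetFromName(nsName)
	if err != nil {
		return err
	}
	defer niNS.Close()

	if err := netns.Set(niNS); err != nil {
		return err
	}
	defer func() { _ = netns.Set(origNS) }() // switch the thread back before it is reused

	return fn() // must be quick and synchronous (e.g. listing conntracks)
}
```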

In general, it is recommended to spawn a new child process for every network namespace that needs to be operated in. For this reason, this proposal will follow up on the “Bridge Manager” described here.

The main idea of Bridge Manager is to split the overloaded zedrouter and run the management of every network instance in a separate Go routine. In this proposal we would go even further and suggest running Bridge Manager (here called the "NetNS manager", because it would manage an entire namespace) as a child process of zedbox (or as a separate containerd task). The main communication mechanism of EVE - pubsub - is already prepared for interprocess messaging.
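
As a sketch of what launching such a per-namespace manager could look like (names are hypothetical; "ip netns exec" is used here for brevity, the actual implementation might enter the namespace differently):

```go
package main

import (
	"os"
	"os/exec"
)

// startNetNSManager launches the per-NI management process inside the NI's
// named network namespace, so that all of its OS threads and goroutines live
// in that namespace. It would then communicate with zedbox over pubsub like
// any other agent.
func startNetNSManager(nsName, managerBinary string, args ...string) (*exec.Cmd, error) {
	cmdArgs := append([]string{"netns", "exec", nsName, managerBinary}, args...)
	cmd := exec.Command("ip", cmdArgs...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}
```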

The following diagram shows how network instances could be isolated from each other using network namespaces. As can be seen, not only is the network configuration spread across namespaces, but the management plane is also split into multiple processes (all of which increases complexity and overhead, making this proposal less appealing).

...