...

An SR-IOV enabled Ethernet network adapter runs a hardware L2 classifier to sort incoming packets by MAC address or VLAN ID and assign them to the corresponding VF queues. Each assigned VF communicates, through the VFIO (kvm) or pciback/pcifront (xen) driver, with the (vendor-specific) VF driver installed on the VM to exchange descriptors containing destination addresses for packet data transfer. The IOMMU translates between I/O Virtual Addresses (IOVA) and physical memory addresses. This allows the virtual guest to program the device with guest physical addresses, which are then translated to host physical addresses by the IOMMU. The actual packet data transfer between the NIC and the VM memory is performed by DMA without interrupting the CPU.

Currently, EVE allows either assigning an entire physical network adapter directly to a VM, or sharing a single port among multiple VMs using a software switch based on a Linux bridge. While the former option reduces resource utilization efficiency, the latter software/interrupt-based approach cannot keep up with the bandwidth and latency requirements of VNFs.

To summarize, SR-IOV enabled network devices provide a high degree of scalability in virtualized environments, as well as improved I/O throughput and reduced CPU utilization. Even in cases where only a single VF is allocated on the physical device and dedicated to a single VM, the extra security features of SR-IOV, such as MAC/VLAN spoofing protection, make the solution superior to the traditional direct assignment of the entire physical PCI device to a VM.

...

On the face of it, it may appear that hardware-based SR-IOV must always outperform a software-based vswitch such as Linux bridge or OVS. But in situations where the traffic flows East-West within the same server (between applications on the same edge node), OVS (accelerated by DPDK) will likely win against SR-IOV. This is because with OVS-DPDK, traffic is routed/switched within the server and does not go back-and-forth between memory and the NIC. SR-IOV brings no advantage for east-west communication. Rather, it can become a bottleneck: traffic paths are longer, with the PCI bus in-between potentially limiting the bandwidth and NIC resources becoming over-utilized. In this case, it is better to route the traffic within the server using a technology like DPDK. There is a good study from Intel which compares SR-IOV with DPDK-backed vhost interfaces in detail.

To conclude, SR-IOV is suitable for accelerating North-South traffic (it beats DPDK in most benchmarks), but for service function chaining (SFC) on the same node it should be combined with high-performance software-based solutions such as DPDK, VPP, etc.

...

SR-IOV virtual functions are instantiated when the PF driver is informed (typically from dom0) about the (maximum) number of required VFs:
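For illustration, requesting 8 VFs on a PF exposed as eth0 would look like this (the interface name and count are hypothetical):

No Format
echo 8 > /sys/class/net/eth0/device/sriov_numvfs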

...

SR-IOV support can be added to EVE with only a few changes/additions to the API and the implementation. The API should allow configuring the maximum number of required VFs. We propose to make this part of the device model. For SR-IOV capable NICs we will introduce new values for PhyIoType: PhyIoNetEthPF and PhyIoNetEthVF, to be used in PhysicalIO.ptype and Adapter.type, respectively.

No Format
type PhyIoType int32

const (
  PhyIoType_PhyIoNoop    PhyIoType = 0
  PhyIoType_PhyIoNetEth  PhyIoType = 1
  …
  PhyIoType_PhyIoNetEthPF  PhyIoType = 8
  PhyIoType_PhyIoNetEthVF  PhyIoType = 9
)

To pass the number of required VFs, we could reuse the cbattr map and define a new attribute sriov-vf-count, expecting an integer value (which must be a power of two).
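A PhysicalIO entry of the device model could then look like this (JSON-style sketch; labels and values are hypothetical):

No Format
{
  "ptype": 8,                           // PhyIoNetEthPF
  "phylabel": "eth0",
  "logicallabel": "eth0",
  "assigngrp": "eth0",
  "cbattr": { "sriov-vf-count": "8" }
}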

...

On the application side, we need to allow configuring the MAC address and the VLAN ID for an assigned VF. Adapter from devcommon.proto will be extended with EthVF:

No Format
message Adapter {
  org.lfedge.eve.common.PhyIoType type = 1;  // "9" for VF
  string name = 2;
  EthVF eth_vf = 3;
}

message EthVF {
  string mac = 1;
  uint32 vlan_id = 2;  // VLAN ID; proto3 has no 16-bit integer type
}

For VF assignment, Adapter.type should be set to PhyIoNetEthVF and the MAC/VLAN optionally defined under Adapter.eth_vf. The underlying NIC of the VF (PhysicalIO of type PhyIoNetEthPF) should be referenced by Adapter.name. For other types of assignments, eth_vf should be left empty. Exceeding the number of available VFs will result in an error.
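For illustration, an Adapter requesting a VF with a fixed MAC and VLAN might look like this (JSON-style sketch; the values are hypothetical):

No Format
{
  "type": 9,              // PhyIoNetEthVF
  "name": "eth0",         // references the PF's PhysicalIO
  "eth_vf": {
    "mac": "02:16:3e:11:22:33",
    "vlan_id": 100
  }
}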

...

  1. Create VFs for PhyIoNetEthPF if requested through PhysicalIO.cbattr["sriov-vf-count"]:
    echo <vf-num> > /sys/class/net/<device-name>/device/sriov_numvfs
    Domainmgr may need to wait a few seconds for the PF driver to create the VFs.
    This could be deferred until there is at least one VF assignment requested.
  2. Once the VFs are created, domainmgr needs to collect information about them, most importantly their PCI addresses. This can be obtained from sysfs:

    No Format
    $ ls -l /sys/class/net/eth0/device/virtfn*
    /sys/class/net/eth0/device/virtfn0 -> ../0000:18:02.0
    /sys/class/net/eth0/device/virtfn1 -> ../0000:18:02.1
    /sys/class/net/eth0/device/virtfn2 -> ../0000:18:02.2

    Note that the PCI address uses BDF notation (bus:device.function). The VF ID is used as the function number.

  3. The list of VFs would be added to IoBundle (the bundle itself would be the PF):

    No Format
    type IoBundle struct {
      …
      VFs []EthVF
    }
    
    type EthVF struct {
      Index      uint8
      PciLong    string
      Ifname     string
      UsedByUUID uuid.UUID
      // etc.
    }

    Note that IoBundle.KeepInHost will always be true for an SR-IOV PF.

  4. domainmgr will manage IoBundle.VFs. Most importantly, it will record reservations using EthVF.UsedByUUID and prevent over-assignments (exceeding the number of available VFs); see the reservation sketch after this list.
  5. kvmContext.CreateDomConfig and xenContext.CreateDomConfig will need to change to use the right PCI address for the assignment: if the type of the assignment is PhyIoNetEthVF, they will take the PCI address from the VFs list, from the entry with the matching UsedByUUID.
  6. In doActivateTail(), domainmgr will additionally configure MAC addresses and VLANs before the domain is started. This can be done using the netlink package for Go (see the sketch after this list). Even though this is network configuration, it is not related to device connectivity or network instances, so there is no need to move it to zedrouter or nim and make things more complicated.
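A hedged sketch of the reservation bookkeeping from step 4 (the reserveVF helper is hypothetical; IoBundle, EthVF and UsedByUUID follow the definitions above, and github.com/satori/go.uuid is assumed):

No Format
import (
  "errors"

  uuid "github.com/satori/go.uuid"
)

// reserveVF records the reservation of a free VF for the given application,
// or fails when all VFs are already taken (over-assignment).
func (b *IoBundle) reserveVF(appUUID uuid.UUID) (*EthVF, error) {
  for i := range b.VFs {
    if uuid.Equal(b.VFs[i].UsedByUUID, uuid.Nil) {
      b.VFs[i].UsedByUUID = appUUID
      return &b.VFs[i], nil
    }
  }
  return nil, errors.New("all VFs of this PF are already assigned")
}

Similarly, a minimal sketch of the MAC/VLAN configuration from step 6, using github.com/vishvananda/netlink (the configureVF helper is hypothetical; the PF interface name and VF index would come from the IoBundle entry):

No Format
import (
  "net"

  "github.com/vishvananda/netlink"
)

// configureVF programs the MAC address and VLAN ID of a single VF
// through its parent PF, before the domain is started.
func configureVF(pfIfname string, vfIndex int, mac string, vlanID int) error {
  link, err := netlink.LinkByName(pfIfname) // the PF, e.g. "eth0"
  if err != nil {
    return err
  }
  hwAddr, err := net.ParseMAC(mac)
  if err != nil {
    return err
  }
  // Equivalent of "ip link set <pf> vf <idx> mac <mac>"
  if err := netlink.LinkSetVfHardwareAddr(link, vfIndex, hwAddr); err != nil {
    return err
  }
  // Equivalent of "ip link set <pf> vf <idx> vlan <vid>"
  return netlink.LinkSetVfVlan(link, vfIndex, vlanID)
}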

...

Lastly, we will need to add a grub option to grub.cfg, e.g. enable_sriov, which would automatically add intel_iommu=on and iommu=pt to the list of kernel command line parameters.
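With the option enabled, the kernel command line would effectively be extended as follows (a sketch; the exact option name and wiring are up for discussion):

No Format
intel_iommu=on iommu=pt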