In order to support network metadata, we have to rely on services installed in the user's VM. The most common tool for instance initialization is cloud-init. There is a set of images supporting it out of the box: It supports several datasources (one per cloud), and the OpenStack datasource is a natural candidate for us. It is open source and documented.

Cloud-init OpenStack DataSource requirements

Before it starts communicating with the OpenStack datasource, cloud-init checks the environment:

  • Maybe OpenStack if
    • non-x86 cpu architecture: because DMI data is buggy on some arches
  • Is OpenStack if x86 architecture and ANY of the following
    • /proc/1/environ: Nova-lxd contains product_name=OpenStack Nova
    • DMI product_name: either OpenStack Nova or OpenStack Compute
    • DMI chassis_asset_tag is OpenTelekomCloud, SAP CCloud VM, OpenStack Nova (since cloud-init 19.2) or OpenStack Compute (since cloud-init 19.2)
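The detection rules above can be modelled roughly as follows. This is a simplified sketch, not cloud-init's actual detection code. The `Verdict` enum, the function names and the set of x86 architecture strings are our own assumptions. The `/sys/class/dmi/id/` directory is the standard Linux sysfs location for DMI fields.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    FOUND = "found"
    MAYBE = "maybe"
    NOT_FOUND = "not found"


# Values taken from the detection rules listed above
PRODUCT_NAMES = {"OpenStack Nova", "OpenStack Compute"}
ASSET_TAGS = {"OpenTelekomCloud", "SAP CCloud VM", "OpenStack Nova", "OpenStack Compute"}
# Assumption: architecture names as reported by `uname -m` / platform.machine()
X86_ARCHES = {"x86_64", "i386", "i586", "i686"}


def read_dmi(field: str) -> Optional[str]:
    """Read a DMI field such as 'product_name' from sysfs (Linux only)."""
    try:
        with open(f"/sys/class/dmi/id/{field}") as f:
            return f.read().strip()
    except OSError:
        return None


def detect_openstack(
    arch: str,
    product_name: Optional[str] = None,
    chassis_asset_tag: Optional[str] = None,
    pid1_environ: str = "",
) -> Verdict:
    """Hypothetical model of the OpenStack detection rules."""
    if arch not in X86_ARCHES:
        # DMI data is unreliable on some non-x86 arches, so only "maybe"
        return Verdict.MAYBE
    if "product_name=OpenStack Nova" in pid1_environ:  # Nova-lxd case
        return Verdict.FOUND
    if product_name in PRODUCT_NAMES or chassis_asset_tag in ASSET_TAGS:
        return Verdict.FOUND
    return Verdict.NOT_FOUND
```

Inside a VM, a call would look like `detect_openstack(platform.machine(), read_dmi("product_name"), read_dmi("chassis_asset_tag"))`.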

We can set the SMBIOS product_name for our VMs so that cloud-init starts talking to the metadata endpoints.
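Assuming our VMs are started with QEMU directly, QEMU's `-smbios type=1,product=...` option can pass the product name. The helper below is hypothetical. It only builds that part of the argument list, and the rest of the command line is left out.

```python
import shlex
from typing import List


def smbios_args(product: str = "OpenStack Nova") -> List[str]:
    """Return QEMU arguments setting the SMBIOS type 1 (System Information) product name."""
    # QEMU option values are comma-separated; a literal comma is escaped by doubling it
    escaped = product.replace(",", ",,")
    return ["-smbios", f"type=1,product={escaped}"]


# Print a shell-safe fragment of the QEMU command line
print(shlex.join(["qemu-system-x86_64", *smbios_args()]))
# qemu-system-x86_64 -smbios 'type=1,product=OpenStack Nova'
```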

We should also take into account that cloud-init probes datasources in a fixed order. By default NoCloud (the drive we use now) has priority (the order is here).
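The effect of the probe order can be modelled as "first match wins". In cloud-init the order comes from the `datasource_list` setting, which can be overridden in a file under `/etc/cloud/cloud.cfg.d/`. The probe functions below are hypothetical stand-ins for the real detection logic.

```python
from typing import Callable, Dict, List, Optional


def select_datasource(order: List[str], probes: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the first datasource in `order` whose probe reports it is present."""
    for name in order:
        probe = probes.get(name)
        # Datasources without a probe are skipped; the first successful probe wins
        if probe is not None and probe():
            return name
    return None


order = ["NoCloud", "OpenStack"]

# Hypothetical probes: both the NoCloud drive and the OpenStack endpoints are present
print(select_datasource(order, {"NoCloud": lambda: True, "OpenStack": lambda: True}))  # NoCloud

# The NoCloud drive is removed, so the next datasource in the order is used
print(select_datasource(order, {"NoCloud": lambda: False, "OpenStack": lambda: True}))  # OpenStack
```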

So, with both datasources available:

root@1a831fa7-c50b-4693-a16e-fb8171f1b69e:~# grep Datasource /var/log/cloud-init-output.log
Cloud-init v. 20.4-0ubuntu1~20.10.1 finished at Tue, 09 Mar 2021 07:10:44 +0000. Datasource DataSourceNoCloud [seed=/dev/sr0][dsmode=net].  Up 22.97 seconds

With the NoCloud drive manually removed:

ubuntu@niceshamir:~$ grep Datasource /var/log/cloud-init-output.log
Cloud-init v. 20.4-0ubuntu1~20.10.1 finished at Tue, 09 Mar 2021 07:25:26 +0000. Datasource DataSourceOpenStack [net,ver=2].  Up 23.16 seconds
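The `grep` above can be turned into a small helper that reports which datasource won. The regular expression is an assumption based only on the two log lines shown here, not on a documented log format.

```python
import re
from typing import Optional

# Matches the "Datasource DataSourceXxx" part of cloud-init's "finished" line
_DS_RE = re.compile(r"Datasource (DataSource\w+)")


def active_datasource(log_text: str) -> Optional[str]:
    """Return the datasource class name from the last matching line, or None."""
    matches = _DS_RE.findall(log_text)
    return matches[-1] if matches else None


def active_datasource_from_file(path: str = "/var/log/cloud-init-output.log") -> Optional[str]:
    """Read the cloud-init output log and extract the active datasource."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return active_datasource(f.read())
```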

Cloud-init OpenStack DataSource endpoints

The OpenStack metadata service serves several endpoints under http://169.254.169.254/openstack/:

  • {version}/meta_data.json - contains (among other fields) public_keys, hostname and devices (disk, nic)
  • {version}/network_data.json - contains information about networks, the DNS service and links (which will be configured inside the VM)
  • {version}/user_data - contains the script to run inside the VM
  • {version}/vendor_data2.json - data independent of VM deployments (we can omit it for now)

  • - contains versions of OpenStack metadata
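A minimal meta_data.json could mirror the two attributes we already put into NoCloud meta-data. The sketch assumes, based on our reading of cloud-init's OpenStack reader, that `uuid` is read as instance-id and `hostname` as local-hostname. `public_keys` follows the OpenStack shape of a mapping from key-pair name to public key. The function name and all sample values are hypothetical.

```python
import json
from typing import Dict, Optional


def build_meta_data(instance_id: str, hostname: str,
                    public_keys: Optional[Dict[str, str]] = None) -> str:
    """Serialize a minimal OpenStack-style meta_data.json document."""
    doc = {
        "uuid": instance_id,   # assumed to become cloud-init's instance-id
        "hostname": hostname,  # assumed to become cloud-init's local-hostname
        "name": hostname,
    }
    if public_keys:
        # Key-pair name -> public key; omitted entirely when we have no keys
        doc["public_keys"] = dict(public_keys)
    return json.dumps(doc, indent=2, sort_keys=True)


print(build_meta_data("1a831fa7-c50b-4693-a16e-fb8171f1b69e", "niceshamir"))
```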

These endpoints must be reachable from the VM and must serve separate information to each VM.
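One way to serve separate information per VM is to key the documents by the VM's source address. The sketch below makes several assumptions. Each VM is assumed to have a unique, non-spoofable IP on the metadata network, and a real service would need a stronger identity check. Only JSON documents are served, so user_data is left out, and any version string is accepted. `VM_METADATA` is a hypothetical in-memory store.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple

# Hypothetical per-VM store, keyed by the VM's source IP address
VM_METADATA: Dict[str, Dict[str, dict]] = {
    "10.1.0.2": {
        "meta_data.json": {"uuid": "vm-a", "hostname": "vm-a"},
        "network_data.json": {"links": [], "networks": [], "services": []},
    },
}


def route(path: str, client_ip: str) -> Tuple[int, str]:
    """Resolve /openstack/<version>/<file> for the VM identified by client_ip."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "openstack":
        return 404, "not found"
    vm = VM_METADATA.get(client_ip)
    if vm is None or parts[2] not in vm:
        return 404, "not found"
    return 200, json.dumps(vm[parts[2]])


class MetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = route(self.path, self.client_address[0])
        data = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json" if status == 200 else "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str, port: int) -> None:
    """Run the metadata service until interrupted."""
    HTTPServer((host, port), MetadataHandler).serve_forever()
```

For a local test, one could add an entry keyed by `127.0.0.1` and call `serve("127.0.0.1", 8080)`. Inside the VM, the service would need to answer on 169.254.169.254, port 80.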

Cloud-init EC2 DataSource

We could also implement the EC2-compatible datasource described here: It would be used when an image has no OpenStack datasource inside and is forced to skip the OpenStack check (the CirrOS image, for example).
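EC2-style metadata is a plain-text tree under `/latest/meta-data/`, where a directory request returns one entry per line. The resolver below is a simplified sketch over a hypothetical nested dict. It holds our two current NoCloud attributes plus one SSH key. It does not reproduce every EC2 detail. For example, real EC2 lists `public-keys/` entries as `0=<key-name>`, while this sketch lists them as plain sub-directories.

```python
from typing import Dict, Optional, Union

Tree = Dict[str, Union[str, "Tree"]]

# Hypothetical minimal tree; paths are relative to /latest/meta-data/
META_DATA: Tree = {
    "instance-id": "vm-a",
    "local-hostname": "vm-a",
    "public-keys": {"0": {"openssh-key": "ssh-ed25519 AAAAexample user@host"}},
}


def resolve(tree: Tree, path: str) -> Optional[str]:
    """Return the response body for an EC2-style path such as 'public-keys/0/openssh-key'."""
    node: Union[str, Tree] = tree
    for part in [p for p in path.split("/") if p]:
        if not isinstance(node, dict) or part not in node:
            return None  # would be a 404
        node = node[part]
    if isinstance(node, dict):
        # Directory listing: one entry per line, sub-directories end with "/"
        return "\n".join(k + "/" if isinstance(v, dict) else k for k, v in node.items())
    return node
```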

  1. Why would we want to use the OpenStack schema? We don't do most of what is in its meta-data, such as having EVE (or Nova) generate and provide an SSH public key.

    Do all of the cloud-init clients (I understand different versions are used in different Linux distros) support the same set of datasources and associated schemas? Or are some more commonly supported?

    I realize that what we do now with NoCloud is mostly user-data (with only two attributes in meta-data: instance-id and local-hostname), but we need to understand whether providing the OpenStack or EC2 API endpoints means that we must provide more meta-data attributes for the clients to work correctly.

    1. The choice of schema is open for discussion. I chose and proposed it because a large number of images are built for it. Of course, images with OpenStack in their name imply that cloud-init is installed along with the set of datasources it supports. However, I suspect they are tested on this particular platform.

      Without changes to its configuration, cloud-init supports the whole set of datasources. But of course, there are other options for supporting a metadata service. For example, CirrOS comes with EC2-only support for obtaining public keys.

      Unused fields can be omitted and defined later as needed (network_data.json looks promising for defining the network setup inside the VM).

      1. The functionality we are currently missing, where network_data could be useful, is support for multiple network interfaces. Today we set those up in /etc/network/interfaces.d/, which means a different image for 1, 2 or 3 interfaces. Can we do that with network_data.json? If so, adding it makes sense.

        But we need to make sure that the various flavors that currently work with cloud-init (Ubuntu, CentOS, etc.) do not break on the subset of meta-data that we will provide.
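The multi-interface case from the thread could look roughly like this in network_data.json: one `link` per NIC, one DHCP `network` per link, and optional DNS `services`. This is a minimal sketch. The field subset, the `eth{i}` link ids and the function name are assumptions. The link id is only a reference inside the document, and as far as we understand, cloud-init maps links to NICs by `ethernet_mac_address`. Real OpenStack output carries more fields (for example `network_id` and `mtu`). Whether each target distribution's cloud-init accepts this subset still needs to be tested, as noted above.

```python
import json
from typing import List, Optional


def build_network_data(macs: List[str], dns_servers: Optional[List[str]] = None) -> dict:
    """Describe one DHCP-configured interface per MAC address (network_data.json shape)."""
    links, networks = [], []
    for i, mac in enumerate(macs):
        link_id = f"eth{i}"  # hypothetical id, referenced by the matching network entry
        links.append({"id": link_id, "type": "phy", "ethernet_mac_address": mac.lower()})
        networks.append({"id": f"network{i}", "type": "ipv4_dhcp", "link": link_id})
    services = [{"type": "dns", "address": addr} for addr in (dns_servers or [])]
    return {"links": links, "networks": networks, "services": services}


# Hypothetical VM with two NICs and one DNS server
print(json.dumps(build_network_data(["52:54:00:12:34:56", "52:54:00:12:34:57"], ["10.1.0.1"]), indent=2))
```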