...

<Describe how the problem is fixed. Include all affected components. Include diagrams for clarity. This should be the longest section in the document. Use the sections below to call out specifics related to each aspect of the overall system, and refer back to this section for context. Provide links to any relevant external information.>


The main problem to be solved in this design is to ensure that the existing programming model (i.e., the ESS API) for receiving models and model updates can be used when a service is running on an edge cluster. This problem is decomposed into two smaller problems:

  • How to enable network access from a service in the edge cluster to the agent's ESS API?
  • How to communicate the protocol, host, port, SSL cert and login credentials necessary to access the ESS API?

These two problems are discussed in the following sections.

Receiving models

The ESS API is the means by which a service polls for new and updated models. On edge devices, when the agent starts a service, it provides the service with a URL, login credentials, and an SSL certificate for accessing the ESS API. The URL and SSL certificate are the same for every service started by a given agent. The login credentials are unique to each service instance and determine which models that instance is able to receive. The URL is provided through the Open Horizon platform environment variables HZN_ESS_API_PROTOCOL, HZN_ESS_API_ADDRESS, and HZN_ESS_API_PORT. The login credentials and SSL cert are mounted into the service container at the locations indicated by two other environment variables, HZN_ESS_AUTH and HZN_ESS_CERT. Note that the SSL cert does not contain a private key; it is a client-side cert used to verify the ESS API server. The only truly sensitive information is the login credentials.
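The client side of this contract can be sketched in Go as follows. It is a minimal, non-authoritative sketch: it assumes a TCP-based protocol such as https (a socket-based protocol would need different handling), and it assumes HZN_ESS_CERT names the PEM certificate file itself. The type and function names are hypothetical, and the format of the credentials file is deliberately left unparsed.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"net"
	"os"
)

// ESSConfig is a hypothetical holder for what a service needs to reach the ESS API.
type ESSConfig struct {
	URL      string // base URL built from protocol, address and port
	AuthPath string // location of the mounted login credentials (HZN_ESS_AUTH)
	CertPath string // location of the mounted SSL certificate (HZN_ESS_CERT)
}

// essConfigFromEnv reads the HZN_ESS_* variables through lookup, which is
// os.LookupEnv in a real container and a fake in tests. Assumes a TCP-based
// protocol such as https; socket-based protocols are not handled here.
func essConfigFromEnv(lookup func(string) (string, bool)) (ESSConfig, error) {
	names := []string{"HZN_ESS_API_PROTOCOL", "HZN_ESS_API_ADDRESS", "HZN_ESS_API_PORT", "HZN_ESS_AUTH", "HZN_ESS_CERT"}
	vals := make(map[string]string, len(names))
	for _, n := range names {
		v, ok := lookup(n)
		if !ok || v == "" {
			return ESSConfig{}, fmt.Errorf("required environment variable %s is not set", n)
		}
		vals[n] = v
	}
	return ESSConfig{
		URL:      vals["HZN_ESS_API_PROTOCOL"] + "://" + net.JoinHostPort(vals["HZN_ESS_API_ADDRESS"], vals["HZN_ESS_API_PORT"]),
		AuthPath: vals["HZN_ESS_AUTH"],
		CertPath: vals["HZN_ESS_CERT"],
	}, nil
}

// loadESSCertPool builds a trust pool from the mounted certificate so the
// client can verify the ESS API server. Only a public certificate is read.
func loadESSCertPool(certPath string) (*x509.CertPool, error) {
	pemBytes, err := os.ReadFile(certPath)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no PEM certificate found in " + certPath)
	}
	return pool, nil
}

func main() {
	cfg, err := essConfigFromEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("ESS API not configured:", err)
		return
	}
	if _, err := loadESSCertPool(cfg.CertPath); err != nil {
		fmt.Println("cannot load ESS cert:", err)
		return
	}
	fmt.Println("ESS API at", cfg.URL)
}
```

Injecting the lookup function keeps the same code path usable in a device container and in an edge cluster pod, which is the point of preserving the programming model.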

...

This k8s service allows the edge cluster agent to continue to be the ESS API provider, and enables application containers within the cluster to access the API, even if the agent is moved from one pod to another.
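Why the k8s service survives a pod move can be illustrated with the standard in-cluster DNS name of a Kubernetes Service, which resolves to the Service's stable ClusterIP rather than to any particular pod. This is a sketch under an assumption: that the agent would advertise such a name through HZN_ESS_API_ADDRESS. The service name, namespace, and port below are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// essServiceAddress returns the in-cluster DNS name of a Kubernetes Service
// (<service>.<namespace>.svc.<clusterDomain>). The name resolves to the
// Service's stable ClusterIP, which routes to whichever pod currently runs
// the agent. clusterDomain is commonly "cluster.local" but is configurable.
func essServiceAddress(service, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain)
}

func main() {
	// "agent-service", "openhorizon-agent" and port 8443 are hypothetical values.
	host := essServiceAddress("agent-service", "openhorizon-agent", "cluster.local")
	fmt.Println(net.JoinHostPort(host, "8443"))
}
```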

Model Deployment

There are numerous "node type" checks throughout the anax code for the agent and the agbot, some of which disable the deployment of models to edge clusters. These checks should be removed where appropriate to re-enable model deployment. Removing them will allow the ESS to be started in the edge cluster agent and allow the agbot to route models to edge cluster nodes. Aside from these minor code updates, model deployment should work exactly as it does for device nodes: when an agreement is formed, the agbot instructs the MMS to deploy models to the node. Since the ESS will be enabled in the edge cluster agent, the MMS will carry out the agbot's routing instructions exactly as it does for agreements with device nodes.
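The shape of the change can be sketched as a before/after node-type gate. This is illustrative only: the constant and function names are hypothetical and do not reproduce the actual anax checks, which are spread across the agent and agbot code.

```go
package main

import "fmt"

// Hypothetical node type values, modeled on the device/cluster distinction in this design.
const (
	nodeTypeDevice  = "device"
	nodeTypeCluster = "cluster"
)

// modelDeploymentEnabledBefore shows the shape of a check to be removed:
// model deployment is gated on the node being a device.
func modelDeploymentEnabledBefore(nodeType string) bool {
	return nodeType == nodeTypeDevice
}

// modelDeploymentEnabledAfter shows the intended behavior: both node types
// start the ESS and receive model routing from the agbot.
func modelDeploymentEnabledAfter(nodeType string) bool {
	switch nodeType {
	case nodeTypeDevice, nodeTypeCluster:
		return true
	default:
		return false
	}
}

func main() {
	for _, nt := range []string{nodeTypeDevice, nodeTypeCluster} {
		fmt.Printf("%s: before=%v after=%v\n", nt, modelDeploymentEnabledBefore(nt), modelDeploymentEnabledAfter(nt))
	}
}
```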

Model Storage in the Agent

On edge devices, models deployed to an edge node are stored in root-protected storage on the host. For edge clusters, the models are stored in a k8s persistent volume that is available when the agent is installed. A persistent volume is required so that stored models survive if the agent is moved from one pod to another. The persistent volume must be large enough to accommodate the expected number and size of models needed by the edge cluster. Given the potential variability of storage requirements, the node owner must be able to provide this persistent volume to the agent install script. If the node owner does not provide one, the agent install script can create a default persistent volume, but that default is unlikely to meet the requirements of all use cases.
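A node owner sizing the persistent volume might start from a rough estimate like the one below. This is a hypothetical helper, not part of the agent install script: it sums the expected model sizes and multiplies by a headroom factor (an assumption, chosen by the node owner) to leave room for model updates and growth, then rounds up to whole GiB.

```go
package main

import (
	"fmt"
	"math"
)

const gib = 1 << 30

// requiredVolumeGi estimates a persistent volume size in GiB from expected
// model sizes in bytes, scaled by a headroom factor and rounded up.
// The headroom factor is an illustrative assumption, not a documented value.
func requiredVolumeGi(modelSizes []int64, headroom float64) int64 {
	var total int64
	for _, s := range modelSizes {
		total += s
	}
	return int64(math.Ceil(float64(total) * headroom / gib))
}

func main() {
	models := []int64{3 * gib / 2, 2 * gib, 512 << 20} // 1.5 GiB, 2 GiB, 512 MiB
	fmt.Printf("request at least %dGi\n", requiredVolumeGi(models, 2.0))
}
```

With the sample sizes (4 GiB in total) and a headroom factor of 2, the estimate is 8Gi.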

User Experience

<Describe which user roles are related to the problem AND the solution, e.g. admin, deployer, node owner, etc. If you need to define a new role in your design, make that very clear. Remember this is about what a user is thinking when interacting with the system before and after this design change. This section is not about a UI, it's more abstract than that. This section should explain all the aspects of the proposed feature that will surface to users.>

...