Submitted by: David Booz (booz@us.ibm.com)

Affiliation(s): IBM

Date of Submission: 19 Aug 2021

Sponsor User: IBM

<Please fill out the above fields, and the Overview, Design and User Experience sections below for an initial review of the proposed feature.>

Scope and Signoff: (to be filled out by Chair)

Status:

Overview

Support for independent and autonomous AI and ML model deployment was added to OpenHorizon a few years ago. Since then, support for edge nodes that are Kubernetes clusters has also been added. However, when edge cluster support was added, there was not enough time or resources to support model deployment to edge clusters. This design proposes to close that gap by introducing model deployment to edge clusters.

Model deployment in OpenHorizon touches several aspects of the system and involves several roles that interact with models at various stages of deployment. The goal of this feature is to enable model deployment to edge clusters without altering either the way the system handles models or the way the relevant roles interact with the system. Concretely, this design will:

Enable policy-based deployment of models to edge clusters, with no changes to the existing model policy schema.
Enable deployed applications to receive models and model updates using the same APIs used by applications that run on an edge device.

Design

Receiving models

On edge devices, when a service is started by the agent, it is provided with a URL, login credentials, and an SSL certificate for accessing the ESS API. The ESS API is the means by which the service polls for new and updated models. The URL and SSL certificate are the same for every service that is started on a given agent. The login credentials are unique to each service instance and identify which models the service is able to receive. The URL is provided through OpenHorizon platform environment variables (HZN_ESS_API_PROTOCOL, HZN_ESS_API_ADDRESS, HZN_ESS_API_PORT). The login credentials and SSL certificate are mounted into the service container at locations indicated by two other environment variables: HZN_ESS_AUTH and HZN_ESS_CERT. Note that the SSL certificate is a client-side certificate and does not contain a private key, so the only truly sensitive information is the login credentials.
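As a rough illustration of how a service on an edge device might consume these variables, the following Python sketch assembles a base URL and looks up the credential locations. It assumes the three API variables combine in the conventional protocol://address:port form, which this document does not spell out. The function names and the sample values are hypothetical.

```python
from typing import Mapping, Tuple


def ess_base_url(env: Mapping[str, str]) -> str:
    """Assemble an ESS API base URL from the platform environment variables.

    Assumption: the values combine as protocol://address:port. Check the
    values the agent actually provides before relying on this form.
    """
    protocol = env["HZN_ESS_API_PROTOCOL"]
    address = env["HZN_ESS_API_ADDRESS"]
    port = env["HZN_ESS_API_PORT"]
    return f"{protocol}://{address}:{port}"


def ess_credential_locations(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return the mounted locations of the login credentials and SSL cert."""
    return env["HZN_ESS_AUTH"], env["HZN_ESS_CERT"]


# Hypothetical values for illustration only; a real service would pass os.environ.
sample_env = {
    "HZN_ESS_API_PROTOCOL": "https",
    "HZN_ESS_API_ADDRESS": "ess.example",
    "HZN_ESS_API_PORT": "8443",
    "HZN_ESS_AUTH": "/ess-auth",
    "HZN_ESS_CERT": "/ess-cert",
}
base_url = ess_base_url(sample_env)
auth_location, cert_location = ess_credential_locations(sample_env)
```

Passing the environment in as a mapping keeps the helpers easy to test without touching the real process environment.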

On edge clusters, the service that is deployed is actually a k8s operator (built by the service developer). The operator is responsible for starting the real application containers. Because OpenHorizon has no visibility into the application containers, it is the responsibility of the OH-deployed operator to forward the HZN_ESS environment variables, login credentials and SSL cert to the relevant application containers. An operator deployed as an OH service does not need to perform this forwarding if the application does not require model deployment.
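The forwarding of the ESS API variables could look roughly like the Python sketch below, which turns the operator's own environment into the list of name/value entries that a Kubernetes container spec expects under env. The helper name and sample values are hypothetical, and only the three URL-related variables are covered here. How the credentials and certificate are passed on is a separate concern.

```python
from typing import Dict, List, Mapping

# The URL-related variables named in this design.
ESS_API_VARS = (
    "HZN_ESS_API_PROTOCOL",
    "HZN_ESS_API_ADDRESS",
    "HZN_ESS_API_PORT",
)


def forwarded_env(operator_env: Mapping[str, str]) -> List[Dict[str, str]]:
    """Build Kubernetes container env entries for the ESS API variables.

    Variables that are not set in the operator's environment are skipped,
    so an operator that does not use model deployment forwards nothing.
    """
    return [
        {"name": name, "value": operator_env[name]}
        for name in ESS_API_VARS
        if name in operator_env
    ]


# Hypothetical operator environment; a real operator would pass os.environ.
entries = forwarded_env(
    {
        "HZN_ESS_API_PROTOCOL": "https",
        "HZN_ESS_API_ADDRESS": "ess.example",
        "HZN_ESS_API_PORT": "8443",
        "UNRELATED": "ignored",
    }
)
```

The resulting list can be placed under the env key of each application container that needs access to the ESS API.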

There is a subtle but important difference in how the operator will interact with the HZN_ESS_AUTH and HZN_ESS_CERT environment variables. On edge clusters, each of these env vars will contain the name of a k8s secret holding the respective information: one secret for the login credentials and one for the SSL certificate. This differs from edge devices, where each env var contains the name of the folder where the information is mounted. The difference enables the operator to simply attach the secrets to any application containers that need them, in a way that is natural for k8s application developers. The OH agent will create these two secrets as part of deploying the operator.
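One way an operator might attach the two secrets is sketched below in Python, operating on a pod spec represented as a plain dictionary. The volume names, mount paths and function name are hypothetical choices, not part of this design. The only inputs taken from the design are the secret names read from HZN_ESS_AUTH and HZN_ESS_CERT.

```python
import copy
from typing import Any, Dict, Mapping


def attach_ess_secrets(
    pod_spec: Dict[str, Any],
    operator_env: Mapping[str, str],
    container_name: str,
    auth_mount: str = "/ess-auth",  # hypothetical mount path
    cert_mount: str = "/ess-cert",  # hypothetical mount path
) -> Dict[str, Any]:
    """Return a copy of pod_spec with the ESS secrets mounted in one container."""
    spec = copy.deepcopy(pod_spec)
    # On edge clusters these env vars hold k8s secret names, not folder names.
    sources = [
        ("ess-auth", operator_env["HZN_ESS_AUTH"], auth_mount),
        ("ess-cert", operator_env["HZN_ESS_CERT"], cert_mount),
    ]
    volumes = spec.setdefault("volumes", [])
    for volume_name, secret_name, _ in sources:
        volumes.append({"name": volume_name, "secret": {"secretName": secret_name}})
    for container in spec["containers"]:
        if container["name"] != container_name:
            continue
        mounts = container.setdefault("volumeMounts", [])
        for volume_name, _, mount_path in sources:
            mounts.append(
                {"name": volume_name, "mountPath": mount_path, "readOnly": True}
            )
    return spec


# Hypothetical pod spec and secret names for illustration.
original = {"containers": [{"name": "app", "image": "example/app:1.0"}]}
patched = attach_ess_secrets(
    original,
    {"HZN_ESS_AUTH": "ess-auth-secret", "HZN_ESS_CERT": "ess-cert-secret"},
    "app",
)
```

Working on a deep copy leaves the operator's original template untouched, so the same template can be reused for containers that do not need the secrets.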

Enabling the ESS API

User Experience

As an org admin, I want to write a model deployment policy that targets a service that is deployed to an edge cluster.

As a service developer, I want to receive deployed models by using the agent's ESS API when my application is running on an edge cluster.

As a node owner, I want the agent installed and configured on an edge cluster to automatically support deployment of models to services that run on my edge cluster.

Command Line Interface

<Describe any changes to the hzn CLI, including before and after command examples for clarity. Include which users will use the changed CLI. This section should flow very naturally from the User Experience section.>

None

External Components

<Describe any new or changed interactions with components that are not the agent or the management hub.>

None

Affected Components

<List all of the internal components (agent, MMS, Exchange, etc) which need to be updated to support the proposed feature. Include a link to the github epic for this feature (and the epic should contain the github issues for each component).>

Agent (the k8s agent container)

Agent install

Security

APIs

<Describe any new/changed/deprecated APIs, including before and after snippets for clarity. Include which components or users will use the APIs.>

None

Build, Install, Packaging

Documentation Notes

Edge cluster agent install

Test

<Summarize new automated tests that need to be added in support of this feature, and describe any special test requirements that you can foresee.>


Model Deployment to Edge Clusters
