<Please fill out the Overview, Design and User Experience sections for an initial review of the proposed feature.>

Overview

<Briefly describe the problem being solved, not how the problem is solved, just focus on the problem. Think about why the feature is needed, and what is the relevant context to understand the problem.>

Open Horizon generally treats each node as an entity with its own lifecycle, independent of all other nodes. But there are use cases, such as using sensors to monitor critical systems, where redundant monitoring must be in place so that at least one monitoring agent is always operating. This is similar to the principles of high availability and continuous availability commonly found in IT systems. Open Horizon already contains a little-known feature, called HA Groups, that enables nodes to be associated with each other in a group such that at least 1 copy of a service deployed to the group is always running. Further, Open Horizon ensures that when services are upgraded, the upgrade is rolled across the members of a node group such that at least 1 copy of the service is always running.

The problem with the existing HA Group support is that it is not dynamic. Nodes must be added to a group as part of registering them with the management hub. Nodes cannot be removed from a group. Nodes cannot be added to an existing group without unregistering the entire group and registering it again with the new group member. Node registration is meant to happen once in the lifetime of a node; a node should not need to be unregistered unless it is being decommissioned.

Design

<Describe how the problem is fixed. Include all affected components. Include diagrams for clarity. This should be the longest section in the document. Use the sections below to call out specifics related to each aspect of the overall system, and refer back to this section for context. Provide links to any relevant external information.>


A few design principles to get started:

  1. Nodes in an HA group MAY have different node policies.
  2. Adding a node to an HA Group MUST NOT terminate/restart running services.
  3. Nodes MAY be placed into an HA Group after node registration.
  4. A node MUST be in 0 or 1 HA Groups. A node MUST NOT be in more than 1 HA Group.
  5. A node specifies the other nodes in its HA Group by Id. All nodes in an HA Group MUST specify all the other nodes in the group. Open question: can we get the Exchange to enforce this part of the model?
  6. A user MUST have permission to modify all the nodes (resources) in an HA Group in order to form the group.
  7. A service that is deployed to all the nodes in an HA Group MUST be upgraded in a rolling restart in order to avoid a complete outage of the service.
  8. The node agent on nodes in an HA Group MUST be upgraded in a rolling restart in order to avoid a complete outage of the service and model deployment capability.
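
To make principles 4, 5 and 7 concrete, here is a minimal Python sketch, not the Open Horizon implementation. It assumes each node record carries the set of other node Ids it names as HA partners. The `peers_by_node` and `running` shapes, the `upgrade_one` callback and both function names are hypothetical and are not taken from the Exchange or agent code. The first function reports groups whose membership is not declared symmetrically. The second upgrades one node at a time and stops as soon as upgrading the next node would leave no running copy.

```python
from typing import Callable, Dict, List, Set


def find_ha_group_errors(peers_by_node: Dict[str, Set[str]]) -> List[str]:
    """Return violations of the rule that every member lists all other members.

    peers_by_node maps a node Id to the set of OTHER node Ids it names as HA
    partners (hypothetical shape). An empty set means "not in an HA Group".
    Because a node has a single peer set, it can belong to at most one group.
    """
    errors: List[str] = []
    for node, peers in sorted(peers_by_node.items()):
        if node in peers:
            errors.append(f"{node} lists itself as a peer")
        group = set(peers) | {node}
        for peer in sorted(peers - {node}):
            if peer not in peers_by_node:
                errors.append(f"{node} lists unknown node {peer}")
            elif peers_by_node[peer] != group - {peer}:
                errors.append(f"{peer} does not list the same group as {node}")
    return errors


def rolling_upgrade(running: Dict[str, bool],
                    upgrade_one: Callable[[str], bool]) -> List[str]:
    """Upgrade group members one at a time without a complete outage.

    running maps node Id -> whether the service is currently up on that node.
    upgrade_one(node) is a hypothetical callback that performs the upgrade and
    returns True if the service came back healthy. Returns the nodes upgraded.
    """
    upgraded: List[str] = []
    for node in sorted(running):
        others_up = any(up for other, up in running.items() if other != node)
        if not others_up:
            break  # taking this node down would leave zero running copies
        running[node] = False              # service stops during the upgrade
        running[node] = upgrade_one(node)  # record whether it came back
        if running[node]:
            upgraded.append(node)
    return upgraded
```

For example, a three-node group in which every node names the other two yields no errors, while a node that names a partner which does not name it back is reported. If the first upgrade in a two-node group fails, the sketch leaves the second node untouched so that one copy keeps running.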




User Experience

<Describe which user roles are related to the problem AND the solution, e.g. admin, deployer, node owner, etc. If you need to define a new role in your design, make that very clear. Remember this is about what a user is thinking when interacting with the system before and after this design change. This section is not about a UI, it's more abstract than that. This section should explain all the aspects of the proposed feature that will surface to users.>

  • As an org admin, I want to place two or more nodes into an HA Group so that my services will be continuously available.
  • As an org admin, I want to add a node to an HA Group without affecting currently deployed services.
  • As an org user, I want to place two or more nodes into an HA Group so that my services will be continuously available.
  • As an org user, I want to add a node to an HA Group without affecting currently deployed services.
  • As a device owner, I want to place two or more nodes into an HA Group so that my services will be continuously available.
  • As a service deployer, I want to deploy non-HA services to a subset of members in an HA Group to avoid compute resource consumption for services that don't need to be continuously available.
  • As a service deployer, I want to deploy a service ONLY to nodes in an HA Group.

Command Line Interface

<Describe any changes to the hzn CLI, including before and after command examples for clarity. Include which users will use the changed CLI. This section should flow very naturally from the User Experience section.>


External Components

<Describe any new or changed interactions with components that are not the agent or the management hub.>


Affected Components

<List all of the internal components (agent, MMS, Exchange, etc) which need to be updated to support the proposed feature. Include a link to the github epic for this feature (and the epic should contain the github issues for each component).>


Security

<Describe any related security aspects of the solution. Think about security of components interacting with each other, users interacting with the system, components interacting with external systems, permissions of users or components>


APIs

<Describe any new/changed/deprecated APIs, including before and after snippets for clarity. Include which components or users will use the APIs.>


Build, Install, Packaging

<Describe any changes to the way any component of the system is built (e.g. agent packages, containers, etc), installed (operators, manual install, batch install, SDO), configured, and deployed (consider the hub and edge nodes).>


Documentation Notes

<Describe the aspects of documentation that will be new/changed/updated. Be sure to indicate if this is new or changed doc, the impacted artifacts (e.g. technical doc, website, etc) and links to the related doc issue(s) in github.>


Test

<Summarize new automated tests that need to be added in support of this feature, and describe any special test requirements that you can foresee.>
