Overview
...
The design proposes to enhance the current concept of HA node groups by enabling organization administrators and node owners to create HA node groups at any time in the lifecycle of a node. Further, the design proposes to loosen the current restriction that all services deployed to a node in an HA group are deployed on all nodes in the group, enabling the use of heterogeneous node equipment within a group. Note that for the purposes of this design, the HA node group concept is intended to provide both HA and CA for the services running on those nodes. Following are
...
- HA Group membership is obtained from the node's new /hagroup resource in the Exchange. The existing HA support obtains this info from an internal representation of the node (in the code it's called the producer policy, and the info is also saved in the Agbot's agreement object in the DB). The HA Group membership should be removed from this internal representation and obtained from the node's /hagroup resource. The /hagroup resources will also need to be added to the resource cache in the Agbot.
- The existing HA support assumes that ALL services running on a node in an HA Group are supposed to be running on ALL nodes in the group. This assumption is no longer true with this design. The Agbot needs to perform some additional checking (while managing a rolling upgrade) to understand which nodes in the group the service should be running on, and to ensure that it is always running on at least one of them. The Agbot MUST NOT assume that the service being upgraded is intended to be running on all nodes in the group. A service is intended to be running on a node if the node policy is compatible with the service's policy and all deployment policies that reference the service.
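The additional check described in the last item can be sketched as follows. This is a minimal illustration, not the Agbot's implementation: the function names, the `is_compatible` callback (standing in for the node policy / service policy / deployment policy compatibility test) and the `running_on` set are all hypothetical.

```python
from typing import Callable, Iterable, Set


def intended_nodes(group_members: Iterable[str],
                   is_compatible: Callable[[str], bool]) -> Set[str]:
    """Return the group members on which the service is intended to run.

    is_compatible is a hypothetical stand-in for the policy compatibility
    check (node policy vs. service policy and deployment policies).
    """
    return {node for node in group_members if is_compatible(node)}


def can_upgrade(node: str,
                group_members: Iterable[str],
                is_compatible: Callable[[str], bool],
                running_on: Iterable[str]) -> bool:
    """Decide whether the service on `node` may be upgraded right now.

    The upgrade may proceed only if the service stays available on at
    least one other intended node in the group.
    """
    intended = intended_nodes(group_members, is_compatible)
    if node not in intended:
        # The service is not intended to run here, so nothing needs protecting.
        return True
    others_running = (intended & set(running_on)) - {node}
    return len(others_running) > 0
```

Note that under this rule a group with only one intended node could never upgrade the service; how that case should be handled is not covered by this sketch.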
...
When new resources are added to the system, the scope of change notification of those resources needs to be defined. Both the Agbot and the agent need to be aware of /hagroup resource creation/update/deletion.
Agent Upgrades
In addition to HA/CA rolling upgrade support for services, agent upgrades also need to be performed in a rolling fashion across all the nodes in an HA/CA node group. Agents are responsible for autonomously upgrading themselves based on node management policy (NMP) as defined by the administrator, so there is no central entity that is able to coordinate across agents within a group. The only entity in the system capable of assisting with the coordination is the Agbot. Agents in an HA Group will ask the Agbot if the agent can start the agent upgrade process. If the Agbot agrees, it will record (in the database) that the calling node is performing an upgrade, including which NMP is being processed by the node. With multiple Agbot instances, the database is needed to ensure that concurrent calls from different agents receive the correct response (i.e. only one agent is allowed to proceed with the upgrade). Subsequent calls from other nodes in the group will result in the agent being told to pause the upgrade. It is the agent's responsibility to poll the Agbot until it agrees that the upgrade may proceed. This will ensure that only one agent in an HA Group is upgrading at any point in time. The Agbot will use NMP status to know when a node upgrade has completed, allowing another node in the group to proceed.
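The Agbot-side bookkeeping described above might look roughly like the following sketch. It is illustrative only: the class and method names are hypothetical, and an in-memory dictionary stands in for the database, so it does not show the transactional behavior needed when multiple Agbot instances handle concurrent calls. The 200 and 409 return values mirror the responses listed in the Agbot APIs section.

```python
from typing import Dict, Tuple


class UpgradeCoordinator:
    """Tracks, per HA group, the single node allowed to upgrade its agent."""

    def __init__(self) -> None:
        # group name -> (node id, NMP being processed); stands in for the DB table.
        self._in_progress: Dict[str, Tuple[str, str]] = {}

    def request_upgrade(self, group: str, node: str, nmp: str) -> int:
        """Return 200 if `node` may upgrade now, 409 if another node is upgrading."""
        holder = self._in_progress.get(group)
        if holder is None or holder[0] == node:
            # Record that this node is upgrading and which NMP it is processing.
            self._in_progress[group] = (node, nmp)
            return 200
        return 409

    def nmp_status_changed(self, group: str, node: str, nmp: str,
                           completed: bool) -> None:
        """Release the group once NMP status shows the upgrade has completed."""
        if completed and self._in_progress.get(group) == (node, nmp):
            del self._in_progress[group]
```

In a real deployment the check-and-record step in `request_upgrade` would have to be a single atomic database operation; otherwise two Agbot instances could both answer 200 for the same group.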
...
To create, modify and delete HA Groups, use the following commands:
hzn exchange hagroup create <name> --nodeId node1 --nodeId node2 [ --nodeId node3 ]
hzn exchange hagroup update <name> --nodeId node1 --nodeId node2 [ --nodeId node3 ]
hzn exchange hagroup delete <name>
hzn exchange hagroup list [ <name> ]
To check whether a service will be deployed to all nodes in an HA Group, the hzn deploycheck command needs a new flag. This flag will cause the command to retrieve the list of nodes in the HA group and compare them all against the policy inputs:
hzn deploycheck --checkHA
...
Agent - Use of the Agbot API to direct agent upgrades.
Agbot - Awareness of a node's HA Group membership for making agreements, and an API for tracking rolling agent upgrades.
CLI - To list, add, remove nodes from an HA Group.
Exchange - To hold the new HA Group membership API.
...
Exchange APIs
The following new APIs are introduced in this design. Any user in an org can use these APIs (or the corresponding CLI). Org users can only create/modify/delete HA groups containing nodes that the user has permission to modify.
The HA Group object schema:
{
"name": "hagroup name",
"members": [ "node1234", ... ],
"updated": <update time stamp>
}
Create a new node group. The caller must have permission to modify all the nodes listed in the body (shown above). The Exchange will set a reference to this same object onto all the node resources listed in the body. The Exchange will return an error (409) if one of the nodes is already in an hagroup. To remove a node from a group, use the PUT API to provide the list of nodes that should be in the group.
POST /org/<org>/hagroups
Modify the group membership of an existing group. All the desired members of the group MUST be listed in the body. This API behaves like a full replace.
PUT /org/<org>/hagroups
List all the hagroups.
GET /org/<org>/hagroups
List all the members of an hagroup.
GET /org/<org>/hagroups/<name>
Delete an hagroup.
DELETE /org/<org>/hagroups/<name>
Node Resource:
In addition to the new hagroup resource, the node resource is also extended with a new field called "ha_group" that contains a reference to the HA group in which this node is a member. This field is updated for all nodes in an HA group when a new HA group is created, updated or deleted. Updates to this field are atomic with updates to the hagroup resource.
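The relationship between the hagroup resource and the node's "ha_group" field can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Exchange implementation: the class, method and exception names are hypothetical, `Conflict` stands in for an HTTP 409 response, and rejecting creation of an already existing group name is an assumption not stated in this design. Validation happens before any mutation, which approximates the atomic update of both resources.

```python
from typing import Dict, Iterable, List, Optional


class Conflict(Exception):
    """Stands in for an HTTP 409 response."""


class ExchangeModel:
    """Toy model of hagroup resources plus each node's ha_group reference."""

    def __init__(self, node_ids: Iterable[str]) -> None:
        self.nodes: Dict[str, Optional[str]] = {n: None for n in node_ids}
        self.groups: Dict[str, List[str]] = {}

    def _check_free(self, name: str, members: List[str]) -> None:
        # A node may only be in one HA group; membership in `name` itself is fine.
        for node in members:
            current = self.nodes[node]
            if current is not None and current != name:
                raise Conflict(f"{node} is already in hagroup {current}")

    def _apply(self, name: str, members: List[str]) -> None:
        # Clear old references, then set the new ones (full replace).
        for node in self.groups.get(name, []):
            self.nodes[node] = None
        for node in members:
            self.nodes[node] = name
        self.groups[name] = list(members)

    def create(self, name: str, members: List[str]) -> None:
        if name in self.groups:
            raise Conflict(f"hagroup {name} already exists")  # assumption
        self._check_free(name, members)
        self._apply(name, members)

    def replace(self, name: str, members: List[str]) -> None:
        if name not in self.groups:
            raise KeyError(name)
        self._check_free(name, members)
        self._apply(name, members)

    def delete(self, name: str) -> None:
        for node in self.groups.pop(name):
            self.nodes[node] = None
```

For example, after `create("g1", ["n1", "n2"])`, calling `replace("g1", ["n1", "n3"])` clears the reference on `n2` and sets it on `n3`, which is how a node is removed from a group with a full-replace update.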
Agbot APIs
To tell the Agbot that the node is ready to upgrade the agent.
GET /org/<org>/node/<node-id>/hagroupupgrade
200 - yes, go ahead with the upgrade
409 - no, another node is upgrading
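On the agent side, the polling behavior implied by these two responses could be sketched as below. The function name, the `ask_agbot` callable (which would wrap the actual HTTP call to the Agbot API), the poll interval and the `max_attempts` limit are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Optional


def wait_for_upgrade_slot(ask_agbot: Callable[[], int],
                          poll_seconds: float = 30.0,
                          sleep: Callable[[float], None] = time.sleep,
                          max_attempts: Optional[int] = None) -> bool:
    """Poll the Agbot until it allows this agent to upgrade.

    ask_agbot returns the HTTP status code from the Agbot API:
    200 means go ahead, 409 means another node in the group is upgrading.
    Returns True once the upgrade may proceed, or False if max_attempts
    polls were all answered with 409.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        status = ask_agbot()
        if status == 200:
            return True
        if status != 409:
            raise RuntimeError(f"unexpected Agbot response: {status}")
        attempts += 1
        sleep(poll_seconds)  # pause the upgrade and try again later
    return False
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without waiting for real time to elapse.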
...
- Need an Overview of how HA Groups work on the OH doc site. Hopefully the material from this doc can be used for that content.
- Need a new article describing how to use HA Groups. This would be aimed at administrators and node owners. It could be the same article for both roles (but we might change our mind on this AFTER we have tried to write it). This article would show the use of the CLI and would probably need to include or refer to a CLI reference section.
- Update/remove HA doc in the anax repo for the HA /attribute API.
...
- Support for this capability is being removed.
...
...