...
- HA Group membership is obtained from the new /hagroup resource in the Exchange. The existing HA support obtains this info from an internal representation of the node (in the code it's called the producer policy, and the info is also saved in the Agbot's agreement object in the DB). The HA Group membership should be removed from this internal representation and obtained from the /hagroup resource. The /hagroup resources will also need to be added to the resource cache in the Agbot.
- The existing HA support assumes that ALL services running on a node in an HA Group are supposed to be running on ALL nodes in the group. This assumption is no longer true with this design. While managing a rolling upgrade, the Agbot needs to perform additional checks to determine which nodes in the group the service should be running on, and to ensure that the service is always running on at least one of them. The Agbot MUST NOT assume that the service being upgraded is intended to be running on all nodes in the group. A service is intended to be running on a node if the node policy is compatible with the service's policy and all deployment policies that reference the service.
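The check described in the last point can be sketched as follows. This is only an illustration in Python, not the Agbot's implementation: the names `is_compatible`, `intended_nodes` and `can_upgrade` are hypothetical, and policy compatibility is reduced to an assumed "node offers every required property" test.

```python
from typing import Dict, List, Set

# Hypothetical stand-in: a policy is just the set of properties it offers or requires.
Policy = Set[str]


def is_compatible(node_policy: Policy, required: Policy) -> bool:
    """Assumed compatibility rule: the node offers every required property."""
    return required <= node_policy


def intended_nodes(group: List[str],
                   node_policies: Dict[str, Policy],
                   service_policy: Policy,
                   deployment_policies: List[Policy]) -> List[str]:
    """Nodes in the group on which the service is intended to run."""
    result = []
    for node in group:
        policy = node_policies[node]
        # Compatible with the service policy AND every referencing deployment policy.
        if is_compatible(policy, service_policy) and all(
                is_compatible(policy, dp) for dp in deployment_policies):
            result.append(node)
    return result


def can_upgrade(node: str, intended: List[str], running: Set[str]) -> bool:
    """Allow the upgrade only if another intended node keeps the service running."""
    if node not in intended:
        return True  # the service is not intended to run here, nothing to protect
    return any(n != node and n in running for n in intended)
```

With three nodes of which only two are compatible, `can_upgrade` refuses to take down the last intended node that is still running the service.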
...
In addition to rolling upgrade support for services, agent upgrades also need to be performed in a rolling fashion across all the nodes in an HA Group. Agents are responsible for autonomously upgrading themselves based on node management policy (NMP) as defined by the administrator; therefore there is no central entity that is able to coordinate across agents within a group. The only entity in the system capable of assisting with the coordination is the Agbot. An agent in an HA Group will ask the Agbot whether it can start the agent upgrade process. If the Agbot agrees, it will record (in the database) that the calling node is performing an upgrade, including which NMP is being processed by the node. With multiple Agbot instances, the database is needed to ensure that concurrent calls from different agents receive the correct response (i.e. only one agent is allowed to proceed with the upgrade). Subsequent calls from other nodes in the group will result in the agent being told to pause the upgrade. It is the agent's responsibility to poll the Agbot until it agrees that the upgrade may proceed. This will ensure that only one agent in an HA Group is upgrading at any point in time. The Agbot will use NMP status to know when a node upgrade has completed, allowing another node in the group to proceed.
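The grant/deny behaviour described above could look roughly like the sketch below. It is an assumption-laden illustration: `UpgradeCoordinator` and its methods are hypothetical names, an in-memory dictionary guarded by a lock stands in for the shared database, and re-granting a repeated request from the node that already holds the slot is an assumed choice, not something this design specifies.

```python
import threading
from typing import Dict, Tuple


class UpgradeCoordinator:
    """In-memory stand-in for the Agbot's database record of in-progress upgrades."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # plays the role of a database transaction
        # HA Group name -> (node id, NMP name) of the node currently upgrading
        self._upgrading: Dict[str, Tuple[str, str]] = {}

    def request_upgrade(self, group: str, node: str, nmp: str) -> bool:
        """Return True if the node may upgrade now, False if it must wait and poll."""
        with self._lock:
            current = self._upgrading.get(group)
            if current is None:
                self._upgrading[group] = (node, nmp)
                return True
            # Assumption: a repeated call from the current holder stays granted.
            return current == (node, nmp)

    def upgrade_completed(self, group: str, node: str) -> None:
        """Free the slot once NMP status shows that the node has finished."""
        with self._lock:
            current = self._upgrading.get(group)
            if current is not None and current[0] == node:
                del self._upgrading[group]
```

The check and the write happen under one lock so that two concurrent requests for the same group cannot both be granted, which is the property the database provides across multiple Agbot instances.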
...
To create, modify and delete HA Groups, use the following commands:
hzn exchange node hagroup create <name> --nodeId node1 --nodeId node2 [ --nodeId node3 --force ]
hzn exchange node hagroup update <name> --nodeId node1 --nodeId node2 [ --nodeId node3 ]
hzn exchange hagroup delete <name>
hzn exchange node hagroup list [ <name> ]
To check if a service will be deployed to all nodes in an HA Group, we need a new flag on the command. This flag will cause the command to retrieve the list of nodes in the HA group and compare them all against the policy inputs:
...
Agent - Use of the Agbot API to direct agent upgrades.
Agbot - Awareness of a node's HA Group membership for making agreements, and an API for tracking rolling agent upgrades.
CLI - To list, add, and remove nodes in an HA Group.
Exchange - To hold the new HA Group membership API.
...
Exchange APIs
The following new APIs are introduced in this design. Any user in an org can use these APIs (or the corresponding CLI). Org users can only create/modify/delete HA Groups containing nodes that the user has permission to modify.
The HA Group object schema:
{
"name": "hagroup name",
"members": [ "node1234", ... ]
}
Exchange APIs:
Create a new node group. The caller must have permission to modify all the nodes listed in the body (shown above). The Exchange will set a reference to this same object onto all the node resources listed in the body. The Exchange will return an error (409) if one of the nodes is already in an hagroup. If force=true is specified, the Exchange will set this membership onto all listed nodes and will remove listed nodes from any group they are already in. To remove a node from a group, use the PUT API to provide the list of nodes that should be in the group.
POST /org/<org>/node/<node-id>/hagroup?force=true
Modify the group membership of an existing group. All the desired members of the group MUST be listed in the body. This API behaves like a full replace. The force=true parameter has the same behavior as on POST.
PUT /org/<org>/node/<node-id>/hagroup?force=true
List all the hagroups.
GET /org/<org>/hagroup
List all the members of an hagroup.
GET /org/<org>/hagroup/<name>
Delete an hagroup.
DELETE /org/<org>/hagroup/<name>
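The create, replace, list and delete semantics described above can be modelled in a few lines. The sketch below is not the Exchange implementation: `HAGroupStore` and `Conflict` are hypothetical names, storage is an in-memory dictionary, and permission checks are left out.

```python
from typing import Dict, List, Optional, Set


class Conflict(Exception):
    """Stands in for the 409 response."""


class HAGroupStore:
    def __init__(self) -> None:
        self.groups: Dict[str, Set[str]] = {}  # group name -> member node ids

    def _group_of(self, node: str) -> Optional[str]:
        for name, members in self.groups.items():
            if node in members:
                return name
        return None

    def _assign(self, name: str, nodes: List[str], force: bool) -> None:
        wanted = set(nodes)
        clashes = {n for n in wanted if self._group_of(n) not in (None, name)}
        if clashes and not force:
            raise Conflict(f"already in another group: {sorted(clashes)}")
        for n in clashes:  # force=true: pull the node out of its old group
            self.groups[self._group_of(n)].discard(n)
        self.groups[name] = wanted

    def create(self, name: str, nodes: List[str], force: bool = False) -> None:
        # Behaviour for an already existing name is not specified in this
        # design; this sketch simply overwrites the membership.
        self._assign(name, nodes, force)

    def replace(self, name: str, nodes: List[str], force: bool = False) -> None:
        """Full replace: nodes missing from the list leave the group."""
        self._assign(name, nodes, force)

    def list_groups(self) -> List[str]:
        return sorted(self.groups)

    def members(self, name: str) -> List[str]:
        return sorted(self.groups[name])

    def delete(self, name: str) -> None:
        self.groups.pop(name, None)
```

Because the update is a full replace, removing a node is expressed by sending the membership list without that node, for example `store.replace("g2", ["n3"])`.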
Agbot APIs
To tell the Agbot that the node is ready to upgrade the agent.
GET /org/<org>/node/<node-id>/hagroupupgrade
200 - yes, go ahead with the upgrade
409 - no, another node is upgrading
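On the agent side, the polling responsibility could be sketched as below. This is a hedged illustration only: `wait_for_upgrade_slot` is a hypothetical helper, the actual HTTP call is abstracted behind the `ask_agbot` callable (which is assumed to return the status code), and the polling interval and attempt limit are made-up defaults.

```python
import time
from typing import Callable


def wait_for_upgrade_slot(ask_agbot: Callable[[], int],
                          poll_seconds: float = 30.0,
                          max_attempts: int = 20,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the Agbot answers 200; return False if the attempts run out."""
    for attempt in range(max_attempts):
        status = ask_agbot()
        if status == 200:
            return True   # go ahead with the upgrade
        if status != 409:
            raise RuntimeError(f"unexpected Agbot response: {status}")
        # 409: another node in the group is upgrading, so wait and ask again.
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays.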
...
- Need an Overview of how HA Groups work on the OH doc site. Hopefully the material from this doc can be used for that content.
- Need a new article describing how to use HA Groups, aimed at administrators and node owners. It could be the same article for both roles (but we might change our mind on this AFTER we have tried to write it). This article would show the use of the CLI and would probably need to include or refer to a CLI reference section.
- Update/remove HA doc in the anax repo for the HA /attribute API.
...
- Support for this capability is being removed.
...
...