...

Code Block
{
  "owner":"",                // The user that created this resource
  ''"label": "",               // (Optional) A short description of the policy
  "description": "",         // (Optional) A much longer description of the policy
  "constraints": [""],       // (Optional) Typical constraint expression used to select nodes
  "properties": [{}],        // (Optional) Typical property expressions used by nodes to select policy 
  "patterns": [""],          // (Optional) This policy applies to nodes using one of these patterns
  "enabled": boolean,        // Is this policy enabled or disabled, default false == disabled
  "agentUpgradePolicy": {},  // (Optional) Assertions on how the agent should update itself
  "lastUpdated":"<time-stamp>"
}
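
For illustration, here is a hypothetical NMP using this schema. All of the values (org, names, property, version, timestamps) are invented for the example:

Code Block
{
  "owner": "myorg/admin",
  "label": "Factory agent upgrade",
  "description": "Upgrade the agents on all factory nodes to at least version 2.28.0",
  "constraints": ["location == factory"],
  "properties": [{"name": "nmp-owner", "value": "ops-team"}],
  "patterns": [],
  "enabled": true,
  "agentUpgradePolicy": {
    "autoUpgrade": false,
    "atLeastVersion": "2.28.0",
    "start": "2021-09-01T12:00:00Z",
    "duration": 3600
  },
  "lastUpdated": "2021-08-15T10:00:00Z"
}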

...


The "pattern" field is NOT mutually exclusive with "properties" and "constraints". This allows a single NMP to apply to nodes which use patterns or policies. The patterns list may contain the string "*", indicating that the NMP applies to any node that is using a pattern.

Because patterns can be public, the patterns array can contain org-qualified pattern names. That is, the format of the strings in the patterns field is org/pattern-name, where the character '/' is used as a separator. If org is omitted, it defaults to the org of the NMP. (Ling: the nodes with patterns can also have node policies; it feels like the patterns attribute is redundant here. We could put "pattern == blah" in the constraints as a special constraint; pattern could be a special property of a node.)
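
For example, a patterns list mixing both forms might look like this (pattern names invented for illustration):

Code Block
"patterns": [
  "myorg/factory-pattern",   // org-qualified: pattern "factory-pattern" in org "myorg"
  "edge-gateway"             // no org qualifier: defaults to the org of the NMP
],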

The constraints field contains the same text-based constraint language found in deployment, service, model and node policy constraints. The same language parser, and therefore the same syntax, is supported. The properties field contains policy properties, using the same syntax as deployment, service, model and node policies. The properties and constraints are used to determine which policy-based nodes are subject to the intent of the NMP. This determination includes the same bi-directional policy compatibility calculations used for deployment, service, model and node policy.
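
As a sketch of how this matching works (property names and values invented), an NMP applies to a policy-based node when the NMP's constraints are satisfied by the node's properties, and the node's constraints (if any) are satisfied by the NMP's properties:

Code Block
// Hypothetical NMP:
"properties": [{"name": "nmp-owner", "value": "ops-team"}],
"constraints": ["location == factory"]

// A compatible node policy:
"properties": [{"name": "location", "value": "factory"}],
"constraints": ["nmp-owner == ops-team"]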

An important new concept with NMP is the ability to administratively enable and disable the policy. This allows an NMP to be published but not acted on by the agent until the policy is enabled. Retrofitting this concept into deployment policy is not within the scope of this design.
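
For example, an administrator could publish an NMP with the policy disabled and flip it on later, without changing anything else about the policy:

Code Block
// At publish time: the NMP exists in the exchange, but no agent acts on it.
"enabled": false

// Later, the administrator enables it; matching agents now act on the policy.
"enabled": true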


??? Question: Is there a flavor of NMP that deactivates itself after it is complete? How do we know when an NMP is complete? Do we need to know? ???

...

In a large-scale environment with thousands of nodes, an enabled NMP with empty constraints and/or patterns set to "*" could be quite disruptive to the management hub or even the entire system of nodes by attempting to upgrade all agents at approximately the same time. Publishing an NMP that meets these criteria requires the user to confirm that they understand the potentially disruptive nature of the policy before the NMP is published in the exchange.
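
For example, an NMP fragment like the following hypothetical sketch would match every node in the org, policy-based and pattern-based alike, and would therefore trigger this confirmation:

Code Block
"constraints": [],   // no constraints: matches every policy-based node
"patterns": ["*"],   // matches every node that is using a pattern
"enabled": true,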

Note that a default NMP is equivalent to having no NMP at all because the default for the enabled field is false.

NMP Constraint Collisions

Given the flexibility of policy within OH, it is possible that NMPs with conflicting actions could be compatible with the same node(s). There are myriad circumstances in which this could occur. Within this design, the conflict could occur in relation to the atLeastVersion setting of the agent upgrade policy. Future features which add additional policy to an NMP could create additional conflicting situations. Two or more NMPs cannot be in conflict if they are enacting different kinds of policy. This is important for the future when additional management policy kinds are added.

The agent is responsible for interpreting each NMP and acting on it accordingly; therefore an agent is only concerned with the NMPs that apply to it. If there are any conflicts, the agent will not enact the conflicting policy action. For example, NMP1 enables policy intent1 at time t1, NMP2 enables policy intent2 at time t2, and NMP3 enables policy intent1 at time t3 with a policy setting that conflicts with NMP1. For the purposes of conflict resolution, the only potential conflicts are between NMP1 and NMP3. Agents will implement the policy intents in NMP1 and NMP2. NMP3 is ignored because NMP3 conflicts over intent1 and was added after NMP1 was already enacted.
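
To make the example concrete, here is a hypothetical pair of NMP fragments that conflict: both select the same nodes and both express the agent upgrade intent, but they disagree on atLeastVersion. A node that has already enacted NMP1 would ignore NMP3 (constraints and versions invented for illustration):

Code Block
// NMP1, enabled at time t1 and already enacted:
"constraints": ["location == factory"],
"agentUpgradePolicy": { "autoUpgrade": false, "atLeastVersion": "2.28.0", "start": "now", "duration": 0 }

// NMP3, enabled at time t3 (after t1); conflicts with NMP1 over the same intent:
"constraints": ["location == factory"],
"agentUpgradePolicy": { "autoUpgrade": false, "atLeastVersion": "2.27.0", "start": "now", "duration": 0 }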

It is possible that conflicting actions could be introduced in close temporal proximity. This could result in actions that are immediately reversed. The agent should be prepared to handle these cases. For example, NMP1 and NMP2 are introduced/enabled a few seconds apart and both adjust the same policy setting (e.g. autoUpgrade). Some of the nodes that match NMP1 and NMP2 will be notified of both NMP1 and NMP2 at the same time. These nodes will enact the actions in NMP2, completely ignoring NMP1. Other nodes might only be notified of NMP1, and will enact the policy in NMP1. Very soon after that, these same nodes are notified of NMP2, and recognizing that NMP2 is newer than NMP1, will enact the action in NMP2, reversing what was done when NMP1 was enacted.

Here are the changes that might result in a management policy collision:

  1. new/changed node policy - this can occur at node registration (which is essentially creation of a node policy) or when there are changes to an existing node policy
  2. enablement of an existing NMP - an NMP can be published in the disabled state
  3. new/changed NMP that is also enabled

Agent Upgrade Policy

There are several discrete lifecycle steps associated with upgrading the agent software and/or configuration to a new version:

  • a stimulus to begin the process
  • download the new version
  • verify a hash of the downloaded package
  • install the new package (if there is one)
  • update any new or changed configuration (if any)
  • restart the agent process
  • report the new (or failed) version info to the node/status resource in the exchange

This is the implied lifecycle for a new kind of policy called "agentUpgradePolicy", defined within an NMP.

For any given node, the stimulus to initiate the "agentUpgradePolicy" occurs when an NMP is successfully enabled (or created already enabled) that matches the node (by policy or pattern). The node self-discovers this stimulus (through the exchange notification framework, i.e. the /changes API) and initiates the policy actions on its own. Packages are downloaded from the hub (CSS), the package hash is verified, and the package is installed and/or the config is updated. The agent is restarted (see below) and records its new package version in its exchange node/status resource.

The "agentUpgradePolicy" section in a NMP contains the specifics of the agent upgrade intent being expressed by the administrator, and conforms to the following JSON snippet:
"agentUpgradePolicy": {
      "autoUpgrade":  boolean, // Do the selected nodes auto upgrade or not, default false == no auto upgrade
      "atLeastVersion": "<version> | current", // specify the minimum agent version these nodes should have, default "current"
      "start": "<RFC3339 timestamp> | now", // when to start an upgrade, default "now"
      "duration": seconds // enable agents to randomize upgrade start time within start + duration, default 0
},
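
For instance, a hypothetical scheduled upgrade might look like the following (version and timestamp invented). Each matching agent would pick a random start point between 12:00 and 13:00 UTC:

Code Block
"agentUpgradePolicy": {
      "autoUpgrade": false,
      "atLeastVersion": "2.28.0",        // upgrade to at least version 2.28.0
      "start": "2021-09-01T12:00:00Z",   // the upgrade window opens at 12:00 UTC...
      "duration": 3600                   // ...and spans 3600 seconds (1 hour)
},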

The autoUpgrade flag instructs the agent whether or not to actively (periodically) check for new updates. Checking for new updates is performed based on the node's heartbeat configuration.

The atLeastVersion field indicates the minimum version the agents should upgrade to when instructed. Enabling autoUpgrade implies that atLeastVersion is set to current; therefore any other value is invalid, and the atLeastVersion field may be omitted when autoUpgrade == true. When atLeastVersion is set to current, the agent will periodically check for new versions and automatically upgrade to the newest version.

The start field contains an RFC3339 timestamp indicating when the upgrade should start. The timestamp is in UTC format so that affected nodes anywhere in the world are acting at approximately the same time. When autoUpgrade == true, the start field is ignored.

The duration field is specified in seconds and is added to the start time to form a window of time within which the agent will initiate an update. Each agent will randomly select a point within that window to begin the upgrade process. This prevents all affected agents from upgrading at the same time and possibly overwhelming the management hub. The combination of selecting nodes based on properties and constraints, plus the start time and duration, is intended to give org administrators sufficient control over mass upgrades. The duration field is ignored when autoUpgrade == true.

If an agent is instructed to upgrade to a version which it is already running, it will not perform the upgrade.

If an agent detects a policy conflict, it will log the conflict in its event log.

If an agent upgrade fails, the agent will be rolled back to the previous version. The failure status will be recorded in the node's /status resource in the exchange, and the upgrade will not be attempted again until a newer version of the package is made available. 
(Ling: rolling back is not supported yet in agent-install)

Question: Can agents be instructed to downgrade?

In the absence of any enabled NMPs, an agent never checks for an update.

...