Introduction

Sometimes a BaseOs upgrade fails because of a transient error condition. In general, EVE provides eventual consistency: it retries an operation after a failure, in case the failure condition has gone away. However, a BaseOs update with its associated reboot is quite disruptive, and repeating it in a loop is even more so. Hence it makes sense to require user intervention before a failed BaseOs upgrade is retried.

Currently, the Device Config API has no retry mechanism. The controller has to first remove the BaseOs configuration, wait for the device to sync up, and then configure the BaseOs again. This is not very user-friendly.

This document describes support for retrying a failed BaseOs upgrade.

Proposed Solution

Introduce a new device command, "baseos_upgrade_retry".

EVE API

diff --git a/api/proto/config/devconfig.proto b/api/proto/config/devconfig.proto
index c58376ab7..7dc9c59e2 100644
--- a/api/proto/config/devconfig.proto
+++ b/api/proto/config/devconfig.proto
@@ -83,6 +83,19 @@ message EdgeDevConfig {
   // if we set new epoch, EVE sends all info messages to controller
   // it captures when a new controller takes over and needs all the info be resent
   int64 controller_epoch = 25;
+
+  // Retry the BaseOs upgrade for the configured image ONLY if the image
+  // upgrade has failed. If the currently configured image is in FAILED state in the other
+  // partition, retry the image upgrade. ELSE - Do nothing. Just update the
+  // baseos_upgrade_retry counter in Info message.
+  DeviceOpsCmd baseos_upgrade_retry = 26;
}

 

diff --git a/api/proto/info/info.proto b/api/proto/info/info.proto
index 7bead8777..230452ac1 100644
--- a/api/proto/info/info.proto
+++ b/api/proto/info/info.proto
@@ -344,6 +344,13 @@ message ZInfoDevice {
 
   // Are we in the process of rebooting EVE?
   bool reboot_inprogress = 41;
+  // BaseOsUpgrade Retry Counter. This must be updated only when:
+  // 1) The configured BaseOs partition is set to UPDATED: mirror
+  // the current value of baseOs_upgrade_retry.counter
+  // 2) At the start of a BaseOs upgrade (either from a partition in error state
+  // or from UPDATED state of another version): copy over the current
+  // deviceConfig.baseOs_upgrade_retry_counter
+  uint32 baseOs_upgrade_retry_counter = 42;
 }

 

Note: Even when baseos_upgrade_retry turns out to be a no-op, the device sends an Info message to the controller to update its baseOs_upgrade_retry_counter.

EVE Support

  1. If the currently configured image is in FAILED state in the other partition, retry the image upgrade. (This is the intended use case.)
  2. Otherwise, do nothing except update the baseos_upgrade_retry counter in the Info message and send that Info message to the controller.
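The two cases above can be expressed as a small decision function. The Go sketch below is a hedged illustration only: RetryAction, its constants, and decideRetry are hypothetical names, and the idea that a new command is detected by a change in the counter value is an assumption that this page does not state explicitly.

```go
package main

import "fmt"

// RetryAction is a hypothetical result type for handling baseos_upgrade_retry.
type RetryAction int

const (
	NoNewCommand      RetryAction = iota // counter unchanged, nothing to handle
	RetryUpgrade                         // case 1: retry the failed image upgrade
	NoOpReportCounter                    // case 2: only report the counter in an Info message
)

// String makes the action readable when printed.
func (a RetryAction) String() string {
	switch a {
	case RetryUpgrade:
		return "RetryUpgrade"
	case NoOpReportCounter:
		return "NoOpReportCounter"
	default:
		return "NoNewCommand"
	}
}

// decideRetry picks the action for a received config.
//   - configured:             counter carried by baseos_upgrade_retry in the config
//   - lastReported:           counter the device last reported in its Info message
//   - failedInOtherPartition: true if the configured image is in FAILED state
//     in the other partition
func decideRetry(configured, lastReported uint32, failedInOtherPartition bool) RetryAction {
	if configured == lastReported {
		// Assumption: an unchanged counter means no new retry command was issued.
		return NoNewCommand
	}
	if failedInOtherPartition {
		return RetryUpgrade
	}
	return NoOpReportCounter
}

func main() {
	fmt.Println(decideRetry(2, 1, true))
	fmt.Println(decideRetry(2, 1, false))
	fmt.Println(decideRetry(2, 2, true))
}
```

In both the retry and the no-op case the device ends up reporting the new counter value, which is how the controller can learn that the command was processed.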