Support For Retrying BaseOs Upgrade

Introduction

Sometimes a BaseOs upgrade fails because of a transient error condition. In general, EVE provides eventual consistency: it retries an operation after a failure in case the failure condition has gone away. However, a BaseOs update with its associated reboot is quite disruptive, and repeating it in a loop is even more so. Hence it makes sense to require user intervention before a failed BaseOs upgrade is retried.

Currently, the Device Config API has no retry mechanism. The controller must first remove the BaseOs configuration, wait for the device to sync up, and then configure the BaseOs again. This is not very user-friendly.

This document describes the support to retry a failed BaseOs upgrade.

Proposed Solution

Introduce a new command "baseos_upgrade_retry" for devices.

EVE API

diff --git a/api/proto/config/devconfig.proto b/api/proto/config/devconfig.proto
index c58376ab7..7dc9c59e2 100644
--- a/api/proto/config/devconfig.proto
+++ b/api/proto/config/devconfig.proto
@@ -83,6 +83,19 @@ message EdgeDevConfig {
   // if we set new epoch, EVE sends all info messages to controller
   // it captures when a new controller takes over and needs all the info be resent
   int64 controller_epoch = 25;
+
+  // Retry the BaseOs upgrade for the configured image ONLY if the image
+  // upgrade has failed. If the currently configured image is in FAILED state in the other
+  // partition, retry the image upgrade. ELSE - Do nothing. Just update the
+  // baseos_upgrade_retry counter in Info message.
+  DeviceOpsCmd baseos_upgrade_retry = 26;
 }

diff --git a/api/proto/info/info.proto b/api/proto/info/info.proto
index 7bead8777..230452ac1 100644
--- a/api/proto/info/info.proto
+++ b/api/proto/info/info.proto
@@ -344,6 +344,13 @@ message ZInfoDevice {
   // Are we in the process of rebooting EVE?
   bool reboot_inprogress = 41;
+  // BaseOsUpgrade Retry Counter. This must be updated only when:
+  // 1) if the configured BaseOs partition is set to UPDATED, mirror
+  // the current value of baseOs_upgrade_retry.counter
+  // 2) At the start of a BaseOs upgrade (either from a partition in error state
+  // or from UPDATED state of another version), copy over current
+  // deviceConfig.baseOs_upgrade_retry_counter
+  uint32 baseOs_upgrade_retry_counter = 42;
 }

Note: Even in case of No-Op for upgrade_retry, the device sends an Info message to the controller to update its baseos_upgrade_retry_counter.
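
Because the device reports the counter even for a no-op, a controller could use it as an acknowledgement. The following Go sketch illustrates that idea only. The types retryCmd and deviceInfo and the functions requestRetry and retryProcessed are hypothetical stand-ins, not part of the EVE API, and the sketch assumes that a changed counter in DeviceOpsCmd is what the device treats as a new request.

```go
package main

import "fmt"

// retryCmd is a hypothetical stand-in for the DeviceOpsCmd carried in
// EdgeDevConfig.baseos_upgrade_retry; only the counter is modelled.
type retryCmd struct {
	Counter uint32
}

// deviceInfo is a hypothetical stand-in for the ZInfoDevice field
// baseOs_upgrade_retry_counter.
type deviceInfo struct {
	BaseOsUpgradeRetryCounter uint32
}

// requestRetry returns a copy of the command with the counter bumped.
// Assumption: a changed counter is what signals a new retry request.
func requestRetry(cmd retryCmd) retryCmd {
	cmd.Counter++
	return cmd
}

// retryProcessed reports whether the device has caught up with the
// configured counter, whether it retried or treated the command as a no-op.
func retryProcessed(cmd retryCmd, info deviceInfo) bool {
	return info.BaseOsUpgradeRetryCounter == cmd.Counter
}

func main() {
	cmd := retryCmd{Counter: 3}
	info := deviceInfo{BaseOsUpgradeRetryCounter: 3}

	cmd = requestRetry(cmd)
	fmt.Println(retryProcessed(cmd, info)) // false: no Info message yet

	// The device sends an Info message that mirrors the configured counter.
	info.BaseOsUpgradeRetryCounter = cmd.Counter
	fmt.Println(retryProcessed(cmd, info)) // true
}
```

Comparing counters rather than tracking a separate acknowledgement keeps the controller stateless with respect to the command: the configured value and the last reported value are enough to tell whether a request is still outstanding.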

EVE Support

If the currently configured image is in FAILED state in the other partition, retry the image upgrade (the intended use case).
Otherwise, do nothing except update the baseos_upgrade_retry counter in the Info message and send that Info message to the controller.
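
The device-side decision described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not EVE's implementation: partitionState, partition and handleRetry are hypothetical names, the set of states is reduced to what the rule needs, and the version strings are made up.

```go
package main

import "fmt"

// partitionState is a hypothetical, reduced set of partition states.
type partitionState int

const (
	stateUnused partitionState = iota
	stateUpdated
	stateFailed
)

// partition describes the image held in the other (non-running) partition.
type partition struct {
	Version string
	State   partitionState
}

// handleRetry decides what to do when a retry command arrives.
// It returns whether the upgrade should be retried and the counter value
// to report in the Info message. The counter is reported in both cases.
func handleRetry(configuredVersion string, other partition, cmdCounter uint32) (retry bool, infoCounter uint32) {
	retry = other.Version == configuredVersion && other.State == stateFailed
	return retry, cmdCounter
}

func main() {
	// Intended use case: the configured image failed in the other partition.
	other := partition{Version: "6.1.0", State: stateFailed}
	retry, counter := handleRetry("6.1.0", other, 5)
	fmt.Println(retry, counter) // true 5

	// No-op: the image is not in FAILED state, only the counter is reported.
	other.State = stateUpdated
	retry, counter = handleRetry("6.1.0", other, 6)
	fmt.Println(retry, counter) // false 6
}
```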
