EVE lifecycle management is based on eventual consistency. If a state change to some object (device, app instance, etc.) cannot be performed, EVE reports the current operational state, which may indicate that something is in progress (such as a download). It also reports an error if there is a failure, such as a download failing, or the required memory or adapter currently being used by some other app instance.
However, in the EVE info API these conditions are reported only as errors. There is no indication whether EVE will retry, and if so when, or whether it has already retried N times and still fails.
This is a proposal for how to extend EVE and the EVE API to convey this information to the controller.
Examples of what EVE could report:
- Download failure:
  - Some error string (e.g. from DNS, HTTP, TLS)
  - Will retry in N minutes (a timer which can be set with a configItem)
  - Has retried M times already
- Insufficient memory:
  - Some error string about needed vs. available memory
  - Will retry when some other app instance is halted and frees up memory
  - (A retry count may matter less here. Each time an app instance is halted, EVE checks whether there is now sufficient memory for this app instance, but that is a "check again" rather than a "retry" operation.)
- Adapter in use:
  - Some error string about app UUID XYZ using ethN (or a USB device)
  - Will retry when app UUID XYZ is halted
  - (As above, a retry count might not be useful.)
Some errors are not retriable. If the device model indicates that eth3 should exist but the hardware has no such interface, we report an error. Since we don't support hot plug of hardware, EVE is unlikely to ever retry, so such errors should not be marked as retriable.
Likewise, invalid configuration (for example, an IP address string in the API that doesn't parse) would not be marked as retriable.
The internal controller API already has a severity field for errors. It is never filled in from the EVE API, since the EVE API has no such field.
One approach is to introduce a severity field in the EVE API (with values such as ERROR, WARNING and NOTICE) and use NOTICE for errors that will be retried. The EVE API would also carry a retry condition (in X minutes, when resource Y is freed up) and perhaps a retry count.
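A minimal Go sketch of this approach, assuming the rule is simply "a non-empty retry condition means NOTICE, otherwise ERROR". `ErrorReport` and `newErrorReport` are hypothetical stand-ins, not EVE or eve-api types; the numeric severity values follow the enum proposed further down.

```go
package main

import "fmt"

// Severity mirrors the proposed severity levels (hypothetical Go type).
type Severity int

const (
	SeverityUnspecified Severity = iota
	SeverityNotice
	SeverityWarning
	SeverityError
)

// ErrorReport is a hypothetical stand-in for the reported error info.
type ErrorReport struct {
	Description    string
	Severity       Severity
	RetryCondition string // empty means EVE will not retry
}

// newErrorReport uses NOTICE for errors EVE will retry, and ERROR for
// errors it will not retry (missing hardware, unparsable configuration).
func newErrorReport(desc, retryCondition string) ErrorReport {
	sev := SeverityError
	if retryCondition != "" {
		sev = SeverityNotice
	}
	return ErrorReport{Description: desc, Severity: sev, RetryCondition: retryCondition}
}

func main() {
	fmt.Printf("%+v\n", newErrorReport("download failed: TLS handshake timeout", "will retry in 5m0s"))
	fmt.Printf("%+v\n", newErrorReport("device model lists eth3 but no such interface exists", ""))
}
```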
With such an approach we would also need to add a retry condition and a retry count to the internal controller API.
Furthermore, if the retry count reaches some large value (10?) or the time since the original error exceeds some limit (1 hour?), then EVE or the controller could raise the severity from NOTICE to WARNING and later to ERROR. Even then, the retry condition must still be reported unless EVE at some point gives up. The current suggestion is to have EVE do this raising of the severity.
Open question: whether to adopt a retry_count and other aspects of, e.g., the AWS IoT Core Device Shadow service documents in the EVE API. However, Device Shadow relates more to EVE informing the controller about pending changes and operations than to error/info reporting.
To incorporate this change, the ErrorInfo message in info.proto has been updated as follows:
```protobuf
message ErrorInfo {
  string description = 1;                 // error description
  google.protobuf.Timestamp timestamp = 2; // Timestamp at which error had occurred
  Severity severity = 3;                  // Severity of the error
  repeated DeviceEntity entities = 4;     // list of objects referenced by the description
  string retry_condition = 5;             // condition to retry
}
```
Severity is an enum:
```protobuf
enum Severity {
  SEVERITY_UNSPECIFIED = 0; // severity unspecified
  SEVERITY_NOTICE = 1;      // severity notice
  SEVERITY_WARNING = 2;     // severity warning
  SEVERITY_ERROR = 3;       // severity error
}
```
DeviceEntity pairs an entity type with an entity ID:
```protobuf
message DeviceEntity {
  Entity entity = 1;    // entity type
  string entity_id = 2; // entity uuid
}
```
Entity can be any of the following:
```protobuf
enum Entity {
  // Invalid Device Entity
  ENTITY_UNSPECIFIED = 0;
  // Base OS entity
  ENTITY_BASE_OS = 1;
  // System Adapter Entity
  ENTITY_SYSTEM_ADAPTER = 2;
  // Vault Entity
  ENTITY_VAULT = 3;
  // Attestation Entity
  ENTITY_ATTESTATION = 4;
  // App Instance Entity
  ENTITY_APP_INSTANCE = 5;
  // Port Entity
  ENTITY_PORT = 6;
  // Network Entity
  ENTITY_NETWORK = 7;
  // Network Instance Entity
  ENTITY_NETWORK_INSTANCE = 8;
  // ContentTree Entity
  ENTITY_CONTENT_TREE = 9;
  // Blob Entity
  ENTITY_CONTENT_BLOB = 10;
  // Volume Entity
  ENTITY_VOLUME = 11;
}
```
Please note that entities like ENTITY_SYSTEM_ADAPTER, ENTITY_VAULT, ENTITY_ATTESTATION and ENTITY_PORT have their own entity ID even though, unlike the others, they do not have UUIDs.