• Requirement: never have to visit a device because of software bugs or failures
    • Including a power failure while flashing the base image
    • The device must either fall back to the old image or be able to run another update
  • Dual partition boot (IMGA/IMGB)
    • GRUB patches for GPT priority boot
    • Additional partitions for identity (CONFIG) and app instances (PERSIST)
  • Policies and timers decide between falling back and committing to the new image
    • “Test” that the new base image can connect to EVC, etc.
    • Deployed app instances are not tested as part of this
  • Hardware watchdog plus Linux watchdog detect hangs and core dumps and trigger a reboot
  • Used this approach in development for 12 months without bricking a device
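The fallback-versus-commit policy above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the actual GRUB patches: the per-slot fields (`priority`, `tries_left`, `successful`), the function names, and the default of three attempts are all hypothetical, chosen only to show how a priority-based dual-partition scheme can fall back to the old image when the new one never passes its connectivity test.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Boot metadata for one image partition (hypothetical fields)."""
    name: str          # e.g. "IMGA" or "IMGB"
    priority: int      # higher value is tried first; 0 means never boot
    tries_left: int    # boot attempts remaining for an uncommitted image
    successful: bool   # True once the image has been committed


def pick_slot(slots):
    """Bootloader step: choose the slot to boot and charge one attempt."""
    candidates = [s for s in slots
                  if s.priority > 0 and (s.successful or s.tries_left > 0)]
    if not candidates:
        raise RuntimeError("no bootable slot")
    chosen = max(candidates, key=lambda s: s.priority)
    if not chosen.successful:
        # Charged before the image runs, so a hang still costs an attempt.
        chosen.tries_left -= 1
    return chosen


def stage_update(slots, target_name, tries=3):
    """Updater step: call only after flashing has fully completed.

    Until this runs, the old slot keeps the highest priority, so a power
    failure in the middle of flashing leaves the old image bootable.
    """
    target = next(s for s in slots if s.name == target_name)
    target.priority = max(s.priority for s in slots) + 1
    target.tries_left = tries
    target.successful = False


def after_boot(slot, controller_reachable):
    """Post-boot step: commit the image if the connectivity test passed."""
    if controller_reachable:
        slot.successful = True
    return slot.successful
```

In this sketch, a new image that never reaches the controller uses up its attempts across reboots, after which `pick_slot` returns the old, committed slot again. An image that passes the test once is marked successful and is no longer charged attempts. Deployed app instances play no part in the decision, matching the policy described in the list.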