State-preserving container orchestration in failover scenarios
Date: 2023-02
Abstract
Containers have been widely adopted for deploying high-availability applications
and services. This adoption is partly due to the fault-tolerance mechanisms
built into container orchestration frameworks such as Kubernetes. While
Kubernetes provides service replication as a fault-tolerance mechanism for
stateless applications, service replication does not satisfy the requirements of
stateful applications. Currently, this shortcoming is addressed by replicating
data in databases, which requires the stateful application to be tightly coupled
to the database and modified to support high availability. This thesis therefore
proposes a new Checkpoint/Restore (C/R) Kubernetes operator that achieves fault
tolerance for stateful applications without any modification of the application.
The operator takes a checkpoint at a configurable interval. When a fault occurs,
a new application container is automatically created from the most recent
checkpoint. We compare the proposed approach with a more conventional approach
in which the application state is pulled from the application and restored
through an API. We measure the overhead of both methods, as well as the service
interruption and the recovery time when faults occur. We find that the C/R
operator achieves a recovery time similar to that of the conventional approach,
but requires no modification of the application. The results suggest that C/R is
a promising technology for providing fault tolerance to stateful applications.
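The periodic checkpoint and restore-from-latest cycle described in the abstract can be illustrated with a small, self-contained simulation. This is a toy model under stated assumptions, not the thesis's operator: it does not call the Kubernetes API or perform real process-level checkpointing, application state is modeled as an in-memory dictionary, and a "checkpoint" is simply a copy of that dictionary. The names `Container`, `Checkpoint`, `CheckpointRestoreOperator`, `reconcile`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Container:
    """Hypothetical stand-in for an application container holding in-memory state."""
    state: dict = field(default_factory=dict)
    alive: bool = True


@dataclass
class Checkpoint:
    """Snapshot of a container's state at a given (simulated) time."""
    taken_at: float
    state: dict


class CheckpointRestoreOperator:
    """Toy reconcile loop (hypothetical): snapshot every `interval` seconds and,
    on a fault, replace the container with one restored from the newest checkpoint."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.checkpoints: list[Checkpoint] = []
        self.last_checkpoint_at: Optional[float] = None

    def reconcile(self, container: Container, now: float) -> Container:
        if not container.alive:
            # Fault detected: create a replacement container from the latest checkpoint.
            return self.restore()
        due = self.last_checkpoint_at is None or now - self.last_checkpoint_at >= self.interval
        if due:
            # Copy the state so later changes to the live container do not alter the snapshot.
            self.checkpoints.append(Checkpoint(now, dict(container.state)))
            self.last_checkpoint_at = now
        return container

    def restore(self) -> Container:
        if not self.checkpoints:
            return Container()  # no checkpoint yet: start with empty state
        latest = self.checkpoints[-1]  # checkpoints are appended in time order
        return Container(state=dict(latest.state))


def simulate(interval: float, fault_at: int) -> Container:
    """Run the app for `fault_at` simulated seconds, crash it, and return the replacement."""
    operator = CheckpointRestoreOperator(interval)
    app = Container(state={"counter": 0})
    for t in range(fault_at):
        app.state["counter"] += 1  # one unit of application work per second
        app = operator.reconcile(app, now=float(t))
    app.alive = False  # simulated fault
    return operator.reconcile(app, now=float(fault_at))


print(simulate(interval=5.0, fault_at=12).state)  # {'counter': 11}
```

With a 5-second interval, checkpoints are taken at t = 0, 5, and 10, so a fault at t = 12 restores the counter to 11: the update made after the last checkpoint is lost. Shortening the interval (e.g. `interval=1.0`) recovers the full state in this model, at the cost of taking checkpoints more often, which is the overhead side of the trade-off a configurable interval exposes.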