Why Backup Won't Work for Stateful Containers

September 13, 2021

There is a myth that containers are stateless. This is not true, since a container can hold state. That state is temporary, unique to the container, and resides on the host machine.

Container images consist of read-only layers. Each layer holds the filesystem changes produced by one instruction in the image's build file (for Docker, the Dockerfile), such as a configuration or installation command.

After each instruction in the build file is executed, the resulting system changes are stored as a layer on disk.

A temporary writable layer is created on top of the read-only image layers when the container is running. The writable layer is unique to every container on a specific host and survives container restarts.

This writable layer is not meant to hold stateful data, such as persistent application data, because it is discarded when the container is removed. Its sole purpose is to temporarily store the data of the running application in the container.
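The behaviour of the writable layer can be observed directly. The following is a minimal sketch, assuming Docker is installed and the `alpine` image can be pulled. The function name `writable_layer_demo`, the container name `layer-demo`, and the file path are hypothetical and chosen only for illustration.

```shell
# Sketch: data in the writable layer survives a restart,
# but is discarded once the container is removed.
# The function name, container name, image, and file path are assumptions.
writable_layer_demo() {
    local name="${1:-layer-demo}"

    # Start a long-running container from a small image.
    docker run -d --name "$name" alpine sleep 3600 || return 1

    # Write a file into the container's writable layer.
    docker exec "$name" sh -c 'echo "temporary state" > /root/state.txt' || return 1

    # Restart the container; the file should still be readable afterwards.
    docker restart -t 1 "$name" || return 1
    docker exec "$name" cat /root/state.txt || return 1

    # List the changes the writable layer holds on top of the image layers.
    docker diff "$name"

    # Remove the container, which also removes its writable layer.
    docker rm -f "$name"
}
```

Calling `writable_layer_demo` runs the steps in order. After the final `docker rm -f`, the file no longer exists anywhere, which is why the writable layer is unsuitable for persistent data.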

Table of contents

  1. Introduction
  2. Prerequisites
  3. Where containers store state
  4. Backing up and restoring a Docker container
  5. Reasons why backup solutions fail
  6. Possible alternative
  7. Conclusion

Prerequisites

To follow along, you need to have:

Where containers store state

As mentioned earlier, containers do not save the state in the container image.

Instead, they store the state in persistent external storage, such as block, object, or file storage services and systems.

In most cases, enterprises use a storage array that is integrated into the Kubernetes environment using Persistent Volume Claims (PVCs).
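On a single Docker host, the closest equivalent to this external storage is a named volume. The sketch below is an illustration only, assuming Docker is installed. The function name `volume_state_demo`, the volume name `app-data`, and the mount path `/data` are hypothetical.

```shell
# Sketch: keep state in a named volume instead of the writable layer.
# The function name, volume name, image, and mount path are assumptions.
volume_state_demo() {
    local volume="${1:-app-data}"

    # Create a named volume that Docker manages outside of any container.
    docker volume create "$volume" || return 1

    # A first container writes into the volume and is removed when it exits.
    docker run --rm -v "$volume":/data alpine \
        sh -c 'echo "persistent state" > /data/state.txt' || return 1

    # A second, unrelated container can still read the same data.
    docker run --rm -v "$volume":/data alpine cat /data/state.txt
}
```

Because the data lives in the volume rather than in either container, it outlives both of them. A Persistent Volume Claim plays a comparable role for pods in Kubernetes.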

Backing up and restoring a Docker container

Docker assists developers in automating the development and deployment process of an application.

Developers can also build a packaged environment that runs applications, which makes them more portable and lightweight.

Docker containers also help maintain application versions. Software that runs in Docker containers behaves the same way on any host with a compatible container runtime.

We will assume we have a container running in a local environment. We can take a snapshot or backup of that container so that we can undo changes or roll back to an earlier state in case of an emergency.

This section covers how we can back up and restore Docker containers using built-in Docker commands.

Backing up a Docker container

To back up a Docker container, we first need its ID. We can list all containers, including stopped ones, and get their IDs, as shown below:

$ docker ps -a

Then we copy the ID of the container that we want to back up. To take a snapshot of the Docker container, we execute the command below. The `-p` flag pauses the container while the snapshot is taken:

$ docker commit -p <CONTAINER_ID> <BACKUP_NAME>

For instance, we can pull the WordPress Docker image using the command below:

$ docker pull wordpress

The output will be:

[Image: WordPress image pull]

Once a container has been started from this image (for example, with `docker run`), we can list all our containers using the following command:

$ docker ps -a

The output will be:

[Image: Docker list all]

We can then take a snapshot of our container by running the command below:

$ docker commit -p 1571dbfe094f wordpress-backup

And the output will be:

$ docker commit -p 1571dbfe094f wordpress-backup
sha256:abe166f1f1ff6c59c978ab898dbc6f843c10c4a8415d7a2b012660420d205f8a

We can then store the image as a tar file in local storage, as illustrated below:

$ docker save --output wordpress-backup.tar wordpress-backup
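The two backup steps can be combined into one helper. This is a minimal sketch rather than a complete backup tool. The function name `backup_container` and its argument order are assumptions made for illustration; the `docker commit` and `docker save` calls are the same commands used above.

```shell
# Sketch: snapshot a container and export the resulting image to a tar file.
# The function name and argument order are assumptions for illustration.
backup_container() {
    local container_id="${1:-}"
    local backup_name="${2:-}"

    if [ -z "$container_id" ] || [ -z "$backup_name" ]; then
        echo "usage: backup_container <container-id> <backup-name>" >&2
        return 2
    fi

    # Pause the container while committing it to a new image.
    docker commit -p "$container_id" "$backup_name" || return 1

    # Write the committed image to <backup-name>.tar in the current directory.
    docker save -o "${backup_name}.tar" "$backup_name"
}
```

For example, `backup_container 1571dbfe094f wordpress-backup` would write `wordpress-backup.tar`. Note that `docker commit` does not include data held in volumes mounted into the container, so this captures the container's filesystem but not its external state.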

Restoring a Docker container

After we have created a backup, we can restore the Docker container, as demonstrated below:

$ docker load -i wordpress-backup.tar

The output will be:

$ docker load -i wordpress-backup.tar
Loaded image: wordpress-backup:latest

We can check whether the image was restored successfully by executing the following command:

$ docker images

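Because `docker load` places the image directly in the local image store, we do not need to pull it. A pull is only required if the backup image was pushed to a registry instead of being saved as a tar file, in which case the image name must include the registry or repository path:

$ docker pull <REGISTRY>/wordpress-backup:latest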

After restoring the Docker image, we can use the below command to execute a restored instance of the Docker container:

$ docker run -ti wordpress-backup:latest
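The restore steps can be wrapped in the same way. The sketch below assumes the tar file was produced by `docker save`. The function name `restore_container` is hypothetical, and it starts the container detached with `-d` instead of the interactive `-ti` used above, which is a choice made here for scripting and not part of the original steps.

```shell
# Sketch: load an image from a tar file and start a container from it.
# The function name and the use of -d (detached) are assumptions.
restore_container() {
    local archive="${1:-}"
    local image="${2:-}"

    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi

    # Load the image from the tar file into the local image store.
    docker load -i "$archive" || return 1

    # Start a new container from the restored image.
    docker run -d "$image"
}
```

For example, `restore_container wordpress-backup.tar wordpress-backup:latest` loads the image and starts a new container from it. The new container starts with a fresh writable layer on top of the restored image.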

Reasons why backup solutions fail

Storage-based snapshots are not enough for data mobility and backup.

They are periodic, require scheduling, and do not deliver the granularity that DevOps teams require today. In a fast-paced environment where containers regularly start and terminate on demand, a periodic backup snapshot is not enough.

In addition, performing container backup at the storage layer exposes the organization to vendor lock-in. As the business grows, storage-bound backups will fail to support the agility it needs.

Containers are also poorly suited to traditional data backup for the following reasons:

  • Containers are highly scalable, with numerous instances each performing a small part of the same task. This means that no single container acts as the master in an application. Many containers may access the same persistent data at the same time, unlike virtual machines (VMs), where typically only one VM accesses a given set of data.
  • Containers are ephemeral and may not be running when a backup needs to be taken. This differs from virtual machines, which are mostly kept running.

Architectural differences that come with containers demonstrate why backup solutions may fail.

A different approach to performing continuous backups of stateful application data is required.

Possible alternative

A better solution should not rely on the container for backup and replication. It also should not depend on a single storage solution.

Instead, it can be installed on the Kubernetes cluster as a DaemonSet. Such a solution can offer data protection with a recovery point objective (RPO) as low as 5 to 10 seconds.

These DaemonSets integrate into the persistent storage to gain access to persistent data independent of any container.

Zerto for Kubernetes is one example of this approach. Integrating with all cluster nodes and the cluster API allows it to work more efficiently. It moves persistent data without container duplication or performance issues.

Zerto can also be integrated with clusters, making persistent data replication easier. It is storage agnostic and supports CSI-compatible block storage. This makes it ideal for data migration and mobility solutions.

Organizations should choose a solution that stores stateful data and captures the Kubernetes state for each application.

Such a solution also protects components like ConfigMaps and Services. These components can be used to rebuild the application when performing data recovery on the same or another cluster.
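To make "Kubernetes state" concrete, the object definitions for an application can be exported with `kubectl`. This is a rough sketch, not a description of how Zerto or any other product works. It assumes `kubectl` is configured for the cluster, and the function name `export_app_state` and the default output file name are hypothetical.

```shell
# Sketch: export the Kubernetes objects that describe an application.
# The function name and default output file name are assumptions.
export_app_state() {
    local namespace="${1:-}"
    local outfile="${2:-${namespace}-state.yaml}"

    if [ -z "$namespace" ]; then
        echo "usage: export_app_state <namespace> [output-file]" >&2
        return 2
    fi

    # Dump ConfigMaps, Services, and PersistentVolumeClaims as YAML manifests.
    kubectl get configmaps,services,persistentvolumeclaims \
        --namespace "$namespace" -o yaml > "$outfile"
}
```

This captures only the object definitions. The data inside the persistent volumes is not included and still needs to be protected separately.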

Conclusion

Many users and developers do not back up their containers. Most argue that containers are stateless and cannot store data; thus, they do not require backup and recovery operations.

The container infrastructure and Kubernetes offer improved availability. Containers can be started and stopped as needed.

However, if a failure occurs, the entire cluster and its container nodes, along with the associated data, can be destroyed or lost. This means that Kubernetes, Docker, and the applications running on them may need to be backed up.

Happy learning!


Peer Review Contributions by: Briana Nzivu


About the author

Verah Ombui

Verah Ombui is an undergraduate Computer Science student at Jomo Kenyatta University Of Agriculture and Technology. Her interests are web development, cloud computing, and data science. Verah is a technology enthusiast.

This article was contributed by a student member of Section's Engineering Education Program. Please report any errors or inaccuracies to enged@section.io.