The need to protect Kubernetes in cloud infrastructure.

Of all the container technologies now seeing mass adoption, none is more significant than Kubernetes, the de facto standard cluster and workload management system for public cloud and on-premises environments.

The long-running Flexera 2022 State of the Cloud survey tracking cloud adoption and usage found that almost three-quarters of enterprises are currently using or planning to use Kubernetes, with comparable adoption rates across on-premises and managed cloud Kubernetes services.

Kubernetes services from the public cloud providers continue to gain traction with customers, and their usage has now surpassed that of the leading on-premises tools. Among the cloud provider-specific tools that enterprises are using or planning to use this year, Amazon Web Services leads, followed closely by Azure Kubernetes Service (AKS), with Google Kubernetes Engine (GKE) gaining as well. Kubernetes (the open source distribution) and Docker remain near the top of the list, but their usage continues to diminish in favor of the cloud provider services, especially among larger enterprises, although hybrid cloud and on-premises environments are still often used.

Kubernetes cloud services leave data and security exposures

Kubernetes cloud services are popular because they mitigate, but do not eliminate, the difficulties of operating a Kubernetes environment. Early Kubernetes adopters often mistake its inherent high-availability features and programmatic configuration interfaces for a suitable substitute for traditional backup and disaster recovery (DR) capabilities. This faulty reasoning conflates the ability to automatically restart and replace cluster nodes and to automate cluster configuration and deployment with the ability to reliably restore containerized applications and their data. While these features are invaluable for the scale-out stateless web applications for which Kubernetes was designed, they do not cover the needs of stateful enterprise applications.

The strengths of Kubernetes — self-healing nodes, automated workload deployment and rollback, auto-scaling, and load balancing — reflect its initial design parameters for stateless web services. In contrast, its weaknesses — lack of inherent data backup and DR capabilities and a multi-layer operational model for security and configuration management — require supplementary tools to make Kubernetes a robust enterprise platform.

Why data protection for Kubernetes is needed

Data protection has not always been a concern for containers, because early workloads were usually stateless web applications or lift-and-shift applications whose storage sat outside the container environment on systems already running backup software. However, Kubernetes applications that use persistent storage are becoming the norm as enterprises deploy production workloads, not just application development and testing environments.
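One rough way to see which workloads fall into the persistent-storage category is to scan their manifests for persistent volume claims. The sketch below is illustrative only: it assumes the manifests have already been parsed into Python dictionaries, and the workload names (`web`, `orders-db`, `reports`) and the helper functions are hypothetical. It relies only on the standard `volumes[].persistentVolumeClaim.claimName` and `volumeClaimTemplates` manifest fields.

```python
from typing import Any

# Hypothetical, already-parsed manifests used purely for illustration.
MANIFESTS: list[dict[str, Any]] = [
    {   # Stateless: only scratch space that disappears with the pod.
        "kind": "Deployment",
        "metadata": {"name": "web"},
        "spec": {"template": {"spec": {"volumes": [
            {"name": "cache", "emptyDir": {}},
        ]}}},
    },
    {   # Stateful: each replica gets its own claim from a template.
        "kind": "StatefulSet",
        "metadata": {"name": "orders-db"},
        "spec": {
            "template": {"spec": {"volumes": []}},
            "volumeClaimTemplates": [{"metadata": {"name": "data"}}],
        },
    },
    {   # Stateful: a Deployment that mounts an existing claim.
        "kind": "Deployment",
        "metadata": {"name": "reports"},
        "spec": {"template": {"spec": {"volumes": [
            {"name": "out",
             "persistentVolumeClaim": {"claimName": "reports-pvc"}},
        ]}}},
    },
]


def claims_used(manifest: dict[str, Any]) -> list[str]:
    """Return the persistent volume claim names a workload manifest references."""
    spec = manifest.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", {})
    names = [
        vol["persistentVolumeClaim"]["claimName"]
        for vol in pod_spec.get("volumes", [])
        if "persistentVolumeClaim" in vol
    ]
    # StatefulSets declare per-replica claims through volumeClaimTemplates.
    names += [
        tpl["metadata"]["name"]
        for tpl in spec.get("volumeClaimTemplates", [])
    ]
    return names


def stateful_workloads(manifests: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map 'Kind/name' to its claims for every workload that uses persistent storage."""
    result: dict[str, list[str]] = {}
    for manifest in manifests:
        claims = claims_used(manifest)
        if claims:
            key = f'{manifest["kind"]}/{manifest["metadata"]["name"]}'
            result[key] = claims
    return result


if __name__ == "__main__":
    for workload, claims in stateful_workloads(MANIFESTS).items():
        print(workload, claims)
```

Workloads that appear in the result hold data that node restarts and redeployments will not bring back, so they are the first candidates for backup coverage.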

There are several reasons why data protection — which includes backups and storage snapshots — should be integral to the production Kubernetes application environment. These reasons or use cases include:

• Human or programmatic error that can accidentally overwrite application or configuration files.

• Security breaches and ransomware that maliciously deletes or encrypts data.

• Disasters causing large-scale outages to a facility that make it impossible to reconstitute a Kubernetes application at another location without offsite copies of the image, configuration, and application files.

• Application and environment migrations that require the same access to archived application and configuration data as a disaster recovery operation.

• Regulatory compliance, which often requires the periodic and immutable capture of application data. Backups should support retention locks so that they cannot be altered or deleted before the retention period ends.
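The retention-lock idea in the last item can be sketched in a few lines. This is a toy in-memory model, not any product's API: the `Backup`, `BackupCatalog`, and `RetentionLockError` names are hypothetical, and a real service would enforce the lock in the storage layer rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class RetentionLockError(Exception):
    """Raised when a backup is deleted before its retention period ends."""


@dataclass(frozen=True)  # frozen: a recorded backup cannot be modified
class Backup:
    name: str
    created_at: datetime
    retention: timedelta

    @property
    def locked_until(self) -> datetime:
        return self.created_at + self.retention


class BackupCatalog:
    """Hypothetical catalog that refuses overwrites and early deletion."""

    def __init__(self) -> None:
        self._backups: dict[str, Backup] = {}

    def add(self, backup: Backup) -> None:
        # Immutability: an existing backup is never replaced.
        if backup.name in self._backups:
            raise ValueError(f"backup {backup.name!r} already exists")
        self._backups[backup.name] = backup

    def delete(self, name: str, now: datetime) -> None:
        backup = self._backups[name]
        if now < backup.locked_until:
            raise RetentionLockError(
                f"{name!r} is locked until {backup.locked_until.isoformat()}"
            )
        del self._backups[name]

    def names(self) -> list[str]:
        return sorted(self._backups)


if __name__ == "__main__":
    created = datetime(2023, 1, 1, tzinfo=timezone.utc)
    catalog = BackupCatalog()
    catalog.add(Backup("orders-db-2023-01-01", created, timedelta(days=90)))
    try:
        catalog.delete("orders-db-2023-01-01", now=created + timedelta(days=30))
    except RetentionLockError as err:
        print("refused:", err)
```

The point of the design is that deletion is checked against the lock expiry rather than against who is asking, so neither an operator mistake nor a compromised account can remove the copy early.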

Why a cloud-based data protection service

Having a cloud-based data protection and disaster recovery service is critical because it aligns with the growing number of cloud-based managed Kubernetes services like Amazon Elastic Kubernetes Service (EKS), AKS, and GKE. As pointed out above, between 60 and 70 percent of enterprises use or plan to use one or more of the cloud container services, for the same reason that SaaS and other managed cloud services are increasingly popular.

Since Kubernetes does not include native data protection features, organizations migrating virtualized workloads or creating new, microservices-based stateful applications must incorporate data protection and security into their Kubernetes architecture. An effective data protection service should have several properties:

• Be infrastructure- and service-agnostic and able to work with both on-premises software and cloud-managed services.

• Support the latest Kubernetes distributions and the Kubernetes container storage interface (CSI).

• Expose APIs that enable task automation for continuous integration and continuous delivery (CI/CD) and integrate with existing infrastructure management systems.

• Enable data migration across different Kubernetes cloud and on-premises environments.

• Be proactive in detecting and alerting on suspicious activity and potential data compromise.

Why a purpose-built cloud data protection service?

The data protection tools provided by the cloud services do not capture all of an application’s state or information from dependent resources like databases, and they do not work across on-premises environments or competitors’ clouds. Open source backup tools like Velero are not designed for multi-cloud operations and require a significant amount of manual configuration to accommodate multi-cloud clusters and data restorations. Although tools like Velero are an adequate solution for one cluster, once a Kubernetes environment spreads to multiple clusters, backup becomes almost impossible to manage. Add in multiple cloud platforms and the complexity becomes untenable.

Existing Kubernetes services and management software treat data protection as a separate problem despite it being a necessary part of a cloud-native enterprise architecture. Further, enterprise Kubernetes applications may have data and infrastructure-as-code dependencies that are external to the Kubernetes environment. And because of the growing use of hybrid and multi-cloud environments, a purpose-built data protection product is needed that is agnostic to both cloud and Kubernetes management platform, supports multi-cloud and multi-region data storage, supports CI/CD methodologies, and enables data migration across environments.
