CHAPTER 1


Cloud-Native Applications: Building on Containers and Microservices

Accessing, using, and interacting with cloud-based applications, services, and resources puts organizations in a complex and interesting situation.

Behind the scenes, cloud platforms and services rely on a veritable ecosystem of technologies to support ready access to virtualized applications, services, networks, platforms, and even entire infrastructures. All of this is orchestrated via software and configuration data running in (and across) one or more providers' data centers.

Used correctly, numerous specific technologies give organizations flexibility and interoperability when they bring cloud-based services, storage, and networking into play. Several key elements anchor and inform this deliberate implementation, deployment, and management strategy.

Containers

These lightweight runtime constructs function as discrete and separate process and resource handlers within which one or more applications or services can run. By design, containers include only the resources necessary for those applications and services. Thus, more containers than traditional virtual machines can run on any given server or cluster, because containers, unlike VMs, don't include a full operating system or instantiate services, protocols, libraries, and functions they won't use.
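
To make this concrete, here is a minimal (hypothetical) Kubernetes pod specification that runs a single container and declares only the modest CPU and memory it needs; the name, image, and figures are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: demo-app              # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.25         # any single-purpose image works here
    resources:
      requests:               # only what the workload actually needs
        cpu: 100m
        memory: 64Mi
      limits:                 # a hard ceiling, keeping the container lightweight
        cpu: 250m
        memory: 128Mi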

Kubernetes

Kubernetes is the leading platform for container orchestration. While there are other container orchestration products, Kubernetes should be seen as the de facto standard. It’s open source, portable, and extensible, and manages containerized workloads and services with a large and growing ecosystem of tools.
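
As an illustration, a short deployment manifest is all it takes to ask Kubernetes to keep several copies of a containerized workload running; the names and image below are hypothetical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                 # Kubernetes keeps three copies running at all times
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25     # illustrative image
        ports:
        - containerPort: 80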

Microservices

This software development method focuses on building single-function modules, each with its own well-defined interfaces and operations. These modules are then assembled and combined to build applications or services. Small and simple by design, microservices require less time and work to implement, test, maintain, and adapt. And because any microservice can be updated, tested, and deployed independently of the others, ongoing development is simpler and faster. Modern microservices are containerized, so they can run on any OS or cloud platform that supports that container type. This is a profound benefit, and explains how microservices workloads and their data can migrate among data centers, private, and public clouds with relative ease and dispatch.
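
As a sketch of the idea, each single-function module might sit behind its own Kubernetes service, which gives it the well-defined interface described above; the "orders" microservice and its ports are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: orders                # hypothetical single-function microservice
spec:
  selector:
    app: orders               # routes traffic to the orders pods
  ports:
  - port: 80                  # the stable interface other services call
    targetPort: 8080          # the port the containerized service listens on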

DevOps

This term represents the merging of development and operations under a single overarching development methodology. DevOps seeks to shorten the development lifecycle while also delivering features, fixes, updates, and enhancements frequently, to better meet business or organizational objectives. DevOps practitioners often refer to "CI/CD," which stands for "continuous integration/continuous delivery" (or "continuous deployment," in some cases). Continuous integration is the process of making small updates to software and committing the changes to a centralized repository, sometimes daily or even more often, to improve the product bit by bit over time. Continuous delivery is the next step in that sequence: it automates application delivery into the various infrastructure pipelines for eventual release into the wild, wherever that wild may be. See Figure 1.
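
For illustration, a CI/CD pipeline often looks something like the following sketch (shown in GitHub Actions syntax, one of many CI systems; the make targets are hypothetical placeholders for a project's real build, test, and deploy steps):

name: ci-cd
on:
  push:
    branches: [main]          # every small commit triggers integration
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - run: make test          # continuous integration: build and test each change
  deliver:
    needs: build-and-test     # continuous delivery: runs only after tests pass
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - run: make package       # hypothetical packaging step
    - run: make deploy        # hand the artifact to the delivery pipeline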

Figure 1: A typical CI/CD process pipeline

Cloud-Native Architectures and Technologies

Cloud-native approaches to development use containers to define and build microservices-based architectures. Because such architectures consist of re-usable modules and components that can be assembled (and later adjusted or recomposed) to deliver applications and services to end users, they’re not only immediately useful, but also flexible and adaptable in the face of change.

Organizations follow DevOps principles to guide them in designing, building, maintaining, and delivering cloud-native, containerized, and microservices-based applications and services. This approach enables organizations to meet current business objectives through streamlined, lean product development and delivery processes.

It also helps them adapt quickly as changes occur, whether to accommodate market shifts, organizational change, or new tools and technologies that improve productivity and profitability.

The Cloud Native Computing Foundation defines cloud-native in this way: "Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach."

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal muss and fuss.

Stateless vs. Stateful Applications

In general, the distinction between stateless and stateful refers to the persistence of data or memory between transactions or instantiations. For containerized applications, stateless applications do not store data, whereas stateful applications include storage access so that they can recover prior state and data, if any, when they start up, and save existing state and data when they pause or stop.

Maintaining state allows applications to work from information, knowledge, and data acquired or generated during prior activity. Stateless applications work only with transitory data; any state that must persist is typically stored in a separate backend service such as a database.

For stateless applications, storage is ephemeral: its contents disappear if the container stops running or is restarted. When they first adopt containers, organizations tend to start with stateless applications because they are easily implemented and adapted in cloud-native architectures. Because they do not typically embody or incorporate more traditional monolithic code, stateless containerized apps built on microservices are also easier to move around and scale.
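
A minimal sketch of a stateless deployment, assuming a hypothetical image, shows the ephemeral storage in question: the emptyDir volume below lives and dies with the pod.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-api             # hypothetical stateless service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: stateless-api
  template:
    metadata:
      labels:
        app: stateless-api
    spec:
      containers:
      - name: api
        image: example/api:1.0    # illustrative image
        volumeMounts:
        - name: scratch
          mountPath: /tmp/cache
      volumes:
      - name: scratch
        emptyDir: {}              # ephemeral: contents vanish when the pod goes away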

Stateful applications typically involve transactions, where the server processes requests based on data clients provide but also uses information stored from previous requests. Thus, the server must be able to store and retain state information from the past, as well as respond to current requests on demand (see Figure 2).

Orchestration for stateful applications requires identifying the best location to run the container (or container collection) involved in its execution, to meet the application's needs for storage and networking and to maintain a consistent, workable I/O path. In some cases, orchestration for stateful applications might also ensure high availability by moving containers or remounting storage volumes without making (or needing) any changes to application code.
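
In Kubernetes terms, one common way to run such workloads is a StatefulSet, whose volume claim template gives each replica its own persistent volume that the orchestrator can remount wherever the pod is rescheduled; the database image and sizes here are illustrative:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: orders-db             # hypothetical stateful service
spec:
  serviceName: orders-db
  replicas: 1
  selector:
    matchLabels:
      app: orders-db
  template:
    metadata:
      labels:
        app: orders-db
    spec:
      containers:
      - name: db
        image: postgres:16    # illustrative; a real deployment needs configuration too
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:       # each replica gets its own persistent volume,
  - metadata:                 # remounted wherever the pod lands
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi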

Figure 2: A comparison of a stateful vs. a stateless application

Making Stateful Applications Work

Stateful applications generally work with an underlying storage layer through a set of application programming interfaces (APIs). In fact, the storage attributes required in a stateful containerized app pose important design and implementation decisions for organizations that build and use them.

In a cloud-based environment, storage needs the same attributes that apply to the rest of that environment. As a container moves around a cluster or through the cloud, it must maintain its connection to its storage volume(s). Thus, a software layer between the application (and its container) and the underlying storage can automatically manage such connections and change locations as needed (within or across clusters, availability zones, and even multiple clouds).

Building cloud-native applications generally involves dense conglomerations of microservices and their data. Making this work depends on a flexible and elastic software layer to mediate between those microservices and the underlying native storage (either on-premises or in the cloud).

Behind the scenes, cloud-native containerized environments must provide mechanisms to create persistent container storage volumes. This capability involves integrating a persistent storage layer, built on a dynamic storage platform, with container orchestration.

Such a platform should also comply with data security, protection, and resilience requirements for application deployment. The result is what might be called a software-defined storage platform, which microservices and their parent containers can access abstractly while the orchestrator manages the details and connections in the background. This also lets developers, IT staff, and even users (with self-service portal access) provision storage on their own without involving a storage administrator.
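
From the consumer's side, dynamic provisioning can be as simple as a persistent volume claim like the hypothetical one below; the named storage class stands in for whatever the platform team has defined:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-replicated   # hypothetical class defined by the platform team
  resources:
    requests:
      storage: 20Gi                   # the platform provisions a matching volume on demand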

Kubernetes Storage Initiatives: CSI and COSI

The persistent volume (PV) is the construct through which Kubernetes exposes permanent storage to applications and services (and their users). PV resources are available cluster-wide and are often backed by attached external storage. Originally, Kubernetes linked with external storage through its control plane interfaces, which meant storage vendors had to provide volume plug-ins that lived in the Kubernetes codebase (called in-tree volume plug-ins).
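
For reference, a persistent volume is itself a cluster-wide resource; this illustrative example uses the in-tree NFS volume plug-in, with a hypothetical server and export path:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-0001
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:                                # the in-tree NFS volume plug-in
    server: nfs.example.internal      # hypothetical NFS server
    path: /exports/shared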

Such plug-ins can pose issues for storage vendors and Kubernetes developers alike. On the vendor side, their code must be compiled, packaged, and shipped within a Kubernetes distribution. Not only does this expose their code, it also ties it to the Kubernetes release cycle, which in turn can pose testing issues for would-be users. From the users' perspective, it also limits storage options to the plug-ins included in the Kubernetes code base.

Software-defined storage distinguishes between the storage hardware where the bits reside and the storage controller software, which manages access to storage addresses; reads and writes bits (or blocks, as is typical on solid-state devices); and handles integrity checks (along with bad-block lists, overprovisioning, and so forth). Software-defined storage lets the storage system define and expose various types of storage to applications, such as object, block, and filesystem storage. It also manages the details behind the scenes to provide a consistent logical view of storage for application use, while tracking where the data resides, in what format, on what kind of storage units, and so on.

To address these issues, the Kubernetes community introduced its Container Storage Interface (CSI) in 2017. CSI is a standard through which arbitrary block and file storage systems may be accessed within containerized workloads running on Kubernetes (or any other orchestrator that uses CSI).

CSI makes the Kubernetes storage layer open and extensible. Third-party storage providers or vendors can use it to create and share volume plug-ins to expose their storage to Kubernetes. They no longer need to include those plug-ins with the Kubernetes code base, either.
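
With CSI, a vendor's driver is wired in through an ordinary storage class; in this sketch the provisioner name and parameters are hypothetical stand-ins for a real driver's values:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vendor-block
provisioner: csi.vendor.example.com      # hypothetical out-of-tree CSI driver
parameters:
  type: ssd                              # driver-specific, vendor-defined parameters
volumeBindingMode: WaitForFirstConsumer  # provision where the pod is scheduled
allowVolumeExpansion: true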

CSI does a great job with block and file storage, but as the COSI GitHub status page asserts, “primitives for file/block storage do not extend well to object storage.” (See the reference for a list of reasons why.) Thus, COSI—the Container Object Storage Interface—defines a set of abstractions to provision and manage object storage, defining a common object storage layer across multiple vendors.

The design is modeled on CSI, and has garnered support from makers of numerous open source and commercial storage systems. COSI defines a set of resources to work with object “buckets” (which are to objects as volumes are to blocks and files), to provision and manage object buckets across the data and application lifecycles. Using COSI, Kubernetes can manage object stores in a standard, native way. Storage vendors can expose their object stores via COSI, independent of the Kubernetes codebase. It’s a win-win situation.
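
As a sketch against the v1alpha1 COSI API (an alpha interface whose fields may still change), a cluster might define a bucket class and let applications request buckets through claims; the driver name here is hypothetical:

apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClass
metadata:
  name: standard-object
driverName: cosi.vendor.example.com   # hypothetical COSI driver
deletionPolicy: Delete
---
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClaim
metadata:
  name: analytics-bucket
spec:
  bucketClassName: standard-object
  protocols: ["S3"]                   # the object protocol the application will use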

Storage for Containers vs. Storage in Containers

Storage for containers exposes storage to a container or group of containers through an external mount point over a network. Sometimes known as container-ready storage, it can work with systems based on software-defined storage (SDS), network-attached storage (NAS), or storage-area networks (SANs). Container-ready storage is typically accessed via a vendor-defined or standard API.
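
In its simplest form, storage for containers is just a network mount declared in the pod itself, as in this illustrative example of an NFS share (hypothetical server and path) mounted without any PV abstraction:

apiVersion: v1
kind: Pod
metadata:
  name: reporting
spec:
  containers:
  - name: report
    image: example/report:1.0        # illustrative image
    volumeMounts:
    - name: shared-reports
      mountPath: /reports
  volumes:
  - name: shared-reports
    nfs:                             # an external mount point reached over the network
      server: nas.example.internal   # hypothetical NAS endpoint
      path: /exports/reports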

It’s important to understand that container-ready storage may not be an ideal solution for containers and their constituent apps and services, label notwithstanding. That’s because relatively few such storage platforms have APIs that Kubernetes can use for dynamic provisioning and storage delivery.

Storage in containers, deployed alongside containerized applications or services, benefits developers and IT admins alike. This approach containerizes storage services so they can be managed under Kubernetes orchestration and control, leaving admins with less housekeeping to do (automation handles it for them, both quickly and accurately).

Because admins can run the storage platform, applications, and services on a uniform infrastructure, there's less of a learning curve involved: the same tools, commands, and automation apply across the board, rather than requiring multiple sets of each. Often there's also less expense involved, because it's cheaper, easier, and less time-consuming to procure more of one kind of infrastructure than smaller amounts of two or more kinds.

For developers, benefits come from self-service: Rather than working through storage admins to provision their applications and services with storage, they can provision friendly, elastic containerized storage services themselves. Additionally, the storage platform's APIs are usually well defined, well understood, and easy to work with, test, and deploy against.
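
Putting the pieces together, developer self-service might look like the following hypothetical claim-and-mount pair, reusing the CSI-backed storage class sketched earlier:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev-scratch               # created by the developer, no storage admin involved
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: vendor-block  # the hypothetical CSI-backed class sketched earlier
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: dev-app
spec:
  containers:
  - name: app
    image: example/app:1.0        # illustrative image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: dev-scratch      # binds the pod to the self-provisioned storage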

Storage in containers involves a growing set of storage classes that span many use cases, including boot volumes; log files (circular and linear); transactional databases; application data accessed through traditional file and emerging object APIs; and backup datasets, images, snapshots, and archival holdings.

Since object data alone covers an enormous range of data types and serves analytics applications that include Elastic, Cloudera, Spark, Splunk, Vertica, Weka, and more, containerized apps need storage access more than ever. CSI covers file and block storage access and management, while COSI handles object storage. But the demand for storage access and services in containers is never-ending and nearly unlimited in size, scope, and variety.
