Common Container Performance Issues and How to Fix Them
Containers enable a powerful DevOps approach to development. They allow developers to assemble software into easily deployable containers that perform consistently across development and production environments.
Small and lightweight, containers use fewer resources than virtual machines. They start, stop, and migrate across servers quickly, and help break monolithic applications into smaller components in a microservices architecture.
However, not everything about containers makes life easier. Containers can introduce new challenges, such as subtle performance issues. In this article, I review some container performance challenges that I’ve faced, and explain how I worked to resolve or avoid them.
Viewing Containers as “Black Boxes”
Containers take black-box development and testing to the extreme, and tend to be overlooked when it comes to code reviews, internal component monitoring, and even higher-level design reviews. Each of these steps is a chance to ensure performance needs (e.g., SLAs) are met, to identify potential performance issues and bottlenecks as early as possible, and to improve performance before containers are deployed.
The solution is mostly cultural and process-related. Even though a container conveniently encapsulates some area of production functionality, be sure to subject it to design and code reviews along the way.
Overlooking Stress Testing
Containers support easy, quick, and dynamic spin-up and server migration, elastically scaling to meet user and system needs on demand, often in cloud environments that further support this. This can lead developers and even QA managers to believe stress testing isn't needed, since the container management tool will simply spin up additional instances. A corollary is a lack of testing in varying environments, whether across providers or across multiple offerings from a single provider or data center.
The solution, again, is mostly cultural and process-related: Use tools to simulate large numbers of users in both random and scripted usage scenarios. Don't look only at averages, as they hide outliers. Tools such as BlazeMeter (see Figure 1) do an excellent job of charting averages and creating histograms that show load over time. These tools become even more valuable when integrated with Application Performance Management (APM) tools, which pair load test results with detailed inspection and analysis.
Additionally, performance tests should be integrated with continuous builds. It’s also useful to run random combinations of performance tests together, not just alone. The interference and resource contention generated can help simulate random simultaneous events that occur in the real world.
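The kind of concurrent load test described above can be sketched with the Python standard library alone. This is a minimal illustration, not a replacement for a tool like BlazeMeter or JMeter; `simulated_request` is a stand-in for a real call to the service under test, and the latency distribution is invented:

```python
import concurrent.futures
import random
import statistics
import time

def simulated_request():
    """Stand-in for a real HTTP call to the container under test."""
    latency = random.uniform(0.01, 0.05)   # replace with a real request
    time.sleep(latency)
    return latency

def run_load_test(users=50, requests_per_user=10):
    """Fire requests from many concurrent 'users' and collect latencies."""
    latencies = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(simulated_request)
                   for _ in range(users * requests_per_user)]
        for f in concurrent.futures.as_completed(futures):
            latencies.append(f.result())
    latencies.sort()
    # Report percentiles and maxima, not just the mean, so outliers show up.
    return {
        "mean": statistics.mean(latencies),
        "p95": latencies[int(len(latencies) * 0.95) - 1],
        "max": latencies[-1],
    }

if __name__ == "__main__":
    print(run_load_test())
```

Reporting the p95 and maximum alongside the mean is the point: a healthy-looking average can coexist with a long tail of slow requests that your users will notice.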
Container Size Issues
Container size can adversely affect migration and spin-up of container instances. Accurate testing can help identify which containers need to be broken down further to avoid these penalties. Additionally, tools and techniques exist to make container instance migration more efficient, such as memory offloading and paging techniques.
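One common way to keep image size down is to build on a slim base image and install only pinned dependencies. A minimal sketch (the `service.py` and `requirements.txt` names are hypothetical placeholders for your own application):

```dockerfile
# Slim base image keeps the runtime layer small
FROM python:3.12-slim
WORKDIR /app
# Install only pinned, required dependencies; skip pip's download cache
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY service.py .
CMD ["python", "service.py"]
```

Smaller images mean less data to transfer on spin-up and migration, which is exactly where oversized containers pay the penalty described above.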
Lack of Standardized Container Development
Different container and microservice frameworks vary in performance, and the container host environment, physical RAM capacity, network infrastructure, platform OS, and cloud architecture all affect how a given framework performs. Where possible, use consistent container frameworks and tools. Log aggregation, for example, is important when components in separate containers each log messages that belong to a single application transaction. Standardizing on one logging framework with container support makes this more reliable and seamless.
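As a sketch of that kind of standardization, the snippet below uses Python's standard `logging` module to emit one JSON object per log line, so a central aggregator can parse records from every container uniformly. The `transaction_id` field is a hypothetical correlation id tying together log lines that belong to one application transaction:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so a central aggregator
    can parse records from every container the same way."""
    def format(self, record):
        return json.dumps({
            "service": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            # Correlation id linking log lines from containers that
            # participate in the same application transaction.
            "transaction_id": getattr(record, "transaction_id", None),
        })

def get_logger(service_name):
    """Build a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(service_name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

if __name__ == "__main__":
    log = get_logger("orders")
    log.info("order received", extra={"transaction_id": "txn-42"})
```

Writing structured logs to stdout also fits the usual container convention: the runtime captures stdout, and the aggregation pipeline takes it from there.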
Dealing with Noisy Neighbors
This issue primarily affects Container-as-a-Service deployments in the public cloud, where underlying host servers and even network infrastructure are shared in a multi-tenant environment. There may be little you can do to protect yourself, but for particularly sensitive container-based services or applications, the solution may be to avoid a specific cloud provider, or to negotiate an SLA that limits or avoids the issue.
Monitoring and testing during development with tools that simulate host interference and loading will alert you to the impact of this potential issue. Similarly, deploying container monitoring tools to capture noisy neighbor interference will ensure your provider is living up to its promises.
Host Resource Constraints
Going deeper into the monitoring of container host deployments, you need to be aware of activity and load on the underlying host servers, regardless of deployment. I’ve witnessed the negative impact when the underlying container host becomes resource-constrained. While a container-based approach can be lighter-weight than virtualization alone, running multitudes of containers is usually more taxing on your infrastructure than running a monolithic application.
For instance, it’s usually convenient to break out individual processes and services as separate containers. Deploying these on a limited set of physical servers, along with associated load balancing agents, monitoring services, and failover services, results in greater host resource usage and demand.
Solutions include using single-vendor solutions for the clustering and monitoring of containers, and for cross-container services such as logging. Also required is a monitoring implementation that takes host and system-level statistics into account, such as host CPU usage, I/O metrics, available memory, network bandwidth, and even OS internals. Don't wait until your application is in production and then rely on monitoring; it's too late at that point. Instead, use tools such as JMeter and Selenium (or others specific to your platform) to accurately simulate real-world load.
Testing Container Interactions
Going deeper, many performance issues I’ve encountered involve not taking a holistic view of performance, and instead looking at performance within single containers. Testing strategies should begin with container-based resource and load testing (and production monitoring), and extend to a user’s view of the system. For instance, issues often arise when containers and systems interact, and only end-to-end testing will uncover this.
As an example, an application I worked on used many containers, but two of them interacted in a way that impacted performance—something that was difficult to uncover without proper testing. In this case, both containers dealt with filesystem-based resources, and although they each worked with different files, and in different ways, the effects one had on the OS kernel impacted the other.
This demonstrates that isolation testing of containers is not enough. Being able to run root-cause analysis on application-wide performance issues is critical, as is testing interactions so you know what's happening at every layer of your deployment. Additionally, when applying root-cause analysis, the way Docker images are created can obscure the source of an issue. To avoid this, don't create images from running containers, and don't build against unpinned "latest" dependencies. Use specific container layer versions and generate images from a known container source.
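The pinning advice above can be sketched in a Dockerfile fragment (the tags shown are illustrative examples, not recommendations for a specific version):

```dockerfile
# Avoid: a floating tag whose contents change underneath you,
# making it hard to tie a performance regression to an image
# FROM python:latest

# Prefer: a pinned tag (or, stricter still, a pin by image digest)
FROM python:3.12.2-slim

# Pin application dependencies to exact versions as well
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```

With every layer pinned, the image you analyzed during root-cause work is byte-for-byte the image running in production.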
Container-Related Network Demands
The sheer scale of containers in large deployments puts stress on the network. Container-based applications tend to be more distributed, relying more on network performance in a virtualized or SDN environment. Ignoring network performance (and focusing only on application and database performance, for example) can quickly become an issue.
Developers often build and unit-test containers on their local environment with bridged or NAT-based networking. This works well, but can cover up performance issues that arise when used in production data centers or cloud-based deployments. The solution is to include continuous integration testing that accurately emulates your production environment (a basic tenet of DevOps, by the way) to uncover network behavior and performance issues. It can also uncover complexities in network activity that will make monitoring more difficult.
Use a tool that lets you generate real network traffic so you can test the effects of different network configurations before deploying (see Figure 2). For instance, you can test different network binding techniques, load balancing algorithms, clustering configurations, and cross-container communication.
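A minimal sketch of measuring cross-container round-trip latency: a toy echo server stands in for a real containerized service, and the client reports mean and worst-case round-trip times. In practice you would point the client at a real endpoint in your staging network rather than a local stub:

```python
import socket
import socketserver
import statistics
import threading
import time

class EchoHandler(socketserver.BaseRequestHandler):
    """Toy echo service standing in for a real container endpoint."""
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data:
                break
            self.request.sendall(data)

def measure_rtt(host, port, samples=20, payload=b"ping"):
    """Measure TCP round-trip times to a service that echoes its input."""
    rtts = []
    with socket.create_connection((host, port), timeout=5) as sock:
        for _ in range(samples):
            start = time.perf_counter()
            sock.sendall(payload)
            sock.recv(len(payload))
            rtts.append(time.perf_counter() - start)
    return {"mean": statistics.mean(rtts), "max": max(rtts)}

if __name__ == "__main__":
    # Port 0 asks the OS for any free port; server_address reports it.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(measure_rtt(*server.server_address))
    server.shutdown()
```

Comparing the mean against the maximum across network configurations (bridged vs. host networking, different load balancer settings) is a quick way to see which setup produces latency outliers.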
Related to the above, container-based applications often derive from existing monolithic or enterprise applications. As a result, internal APIs may not be ideal for cross-container API calls that span the network. The result may be inefficient remote access, excessive round trips, or data transmitted in non-optimized formats. Beyond data and network inefficiencies, internal container processing may not have been built for remote access, resulting in high resource usage.
For instance, asynchronous API calls, when extended between containers, may spawn large numbers of threads for REST or other communication. Commonly used N+1 call patterns can also cause network performance issues with containers. This problem extends to databases, REST endpoints, web services, caching implementations, and so on.
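The round-trip problem can be illustrated with a small sketch. The order/item API here is entirely hypothetical, and each function call stands in for one network round trip: the chatty N+1 design makes one call per item, while a batched endpoint caps round trips at two regardless of item count.

```python
# Each fetch_* call below stands in for one network round trip to a
# remote container; CALLS counts them so the patterns can be compared.
CALLS = {"count": 0}

def fetch_order(order_id):
    CALLS["count"] += 1
    return {"id": order_id, "item_ids": [order_id * 10 + i for i in range(3)]}

def fetch_item(item_id):
    CALLS["count"] += 1
    return {"id": item_id}

def fetch_items_batch(item_ids):
    CALLS["count"] += 1          # one round trip for the whole batch
    return [{"id": i} for i in item_ids]

def load_order_chatty(order_id):
    """N+1 pattern: one call for the order, then one call per item."""
    order = fetch_order(order_id)
    return [fetch_item(i) for i in order["item_ids"]]

def load_order_batched(order_id):
    """Two round trips total, regardless of how many items the order has."""
    order = fetch_order(order_id)
    return fetch_items_batch(order["item_ids"])
```

An API that was tolerable as an in-process call inside a monolith becomes a per-item network round trip once it crosses a container boundary, which is why batch endpoints matter far more in containerized deployments.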
Conclusion: A Container Performance Strategy
A good container performance strategy begins with both your application specifics and user SLA guarantees. It should include processes, such as code and container definition reviews, container-specific tests, and end-to-end user load/stress tests, as well as OS, network, and hardware metrics. Future-proof performance strategies also include hardware support, continuous integration and testing efforts, release automation, and container runtime modeling to help predict performance issues before your users experience them.
Regardless of your application, development process, or performance testing strategy, it helps to partner with a vendor and use the right tools (such as those mentioned here) to support your container performance requirements.
To learn more about how to implement a successful container performance strategy, register for CA's upcoming webcast on March 7, Improving Performance in Constantly Changing Applications.
Blog by Eric Bruno
Eric Bruno is a writer and editor for multiple online publications with more than 20 years of experience in the information technology community. He is a highly requested moderator and speaker for a variety of conferences and other events on topics spanning the technology spectrum from the desktop to the data center. He has written articles, blogs, white papers, and books on software architecture and development topics for more than a decade. He is also an enterprise architect, developer, and industry analyst with expertise in full lifecycle, large-scale software architecture, design, and development for companies all over the globe. His accomplishments span highly distributed system development, multi-tiered web development, real-time development, and transactional software development. See his editorial work online at www.ericbruno.com.