Docker, an open platform for developers of distributed applications, is one of those technologies that everyone seems to be talking about these days. There is lots of enthusiasm for Docker and for container technologies in general, but there is something else I have noticed. When I ask people if they are using Docker in production, the enthusiasm starts to wane, and the answer is often “not yet.” I wanted to share some of our own experiences with Docker. The best way to start might be to explain what our existing environment looks like and what is motivating us to move to Docker.
I’ve always been a strong believer in automated deployment and configuration management, especially for supporting a cloud service at scale. I previously worked at BladeLogic, an early pioneer in the devops space, and am very familiar with existing configuration management tools like Chef and Puppet. Within Fuze we have made a large investment in Chef to automate the provisioning of and updates to our virtualized environment. A centralized devops team uses Chef recipes to build dev, QA and production environments. The same recipes are used to move application updates from dev to production. This is all working reasonably well for us, but some challenges have emerged over time.
The Challenges of Configuration Management
The main problem is that building Chef recipes can be cumbersome and time-consuming. Devops needs a lot of knowledge to be effective, including Ruby, infrastructure, applications, datastores and all the interdependencies between the different platform components. On the other side, dev can’t do what they want without involving devops and often feel constrained. To give a practical example, it is slow to stand up environments. Turning up a full-stack dev environment that includes a base OS installation and a Chef-bootstrapped personality install might take as long as an hour, a crazy amount of time for an automated process. In general, what we have found is that the need to create, update and modify Chef recipes is the bottleneck in our application deployment pipeline.
The promise of Docker is that it will solve many of these problems. Docker does several things very well. It has lots of developer-friendly tools to facilitate app packaging and container creation. It is much simpler to use than Chef. Using Docker in the way it was intended redraws the boundary between dev and ops. Our idea is that we would continue to use Chef, but in a much more limited role to manage the underlying infrastructure. The majority of updates will move to Docker. This move will shift control of application updates back to the development org, and the existing devops team will manage the underlying infrastructure and provide Docker as a service to these devs. The key benefit we are after is an improvement in our app deployment pipeline velocity, allowing us to get updates and changes to market faster than before.
Our Learning Curve
In implementing Docker to manage our application update lifecycle, we quickly found that there are significant prerequisites that need to be thought through and put in place. Some of the notable ones include a discovery and configuration service, and a secrets service. We are using Consul for service discovery and configuration. The idea behind our use of Consul is to externalize all the environment-specific settings, like database connection strings, which change as you move from dev to QA to production. These settings are removed from the Docker containers, and their values are discovered at runtime from Consul. A similar strategy is employed for secrets, where we are using Vault to store passwords, encryption keys, etc. There are a lot of interesting security features in Vault that help manage the secrets in our environment. Together, these systems allow the same Docker container to be moved unchanged between different environments, allowing for a rapid update cycle.
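To make the idea of externalized settings concrete, here is a minimal sketch of how a containerized service might look up a setting from Consul's key/value HTTP endpoint when it starts. It is an illustration under stated assumptions, not our actual implementation: the `CONSUL_URL` variable, the `config/<environment>/<name>` key layout and the function names are all hypothetical.

```python
import base64
import json
import os
import urllib.request

# Hypothetical variable name; each environment points the container at its own Consul.
CONSUL_URL = os.environ.get("CONSUL_URL", "http://127.0.0.1:8500")


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()


def decode_kv_response(body):
    """Decode the JSON body returned by Consul's KV endpoint (GET /v1/kv/<key>).

    Consul returns a list of entries whose "Value" field is base64-encoded.
    """
    entries = json.loads(body)
    return base64.b64decode(entries[0]["Value"]).decode("utf-8")


def lookup_setting(environment, name, fetch=http_get):
    """Resolve one environment-specific setting at container start-up.

    The key layout config/<environment>/<name> is an assumption made for
    this sketch. `fetch` can be swapped out so the lookup is testable
    without a running Consul agent.
    """
    url = f"{CONSUL_URL}/v1/kv/config/{environment}/{name}"
    return decode_kv_response(fetch(url))
```

A service would call something like `lookup_setting("qa", "db_url")` at start-up, with the environment name supplied from outside the image, so the image itself contains no connection strings. Secrets would follow the same pattern against Vault, with an authenticated request in place of the plain GET.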
There are clear and undisputed benefits to using Docker in dev and QA environments. Developers love the tooling and the speed with which they can package up their work. One note of caution though. It is increasingly easy to find prebuilt Docker images for all sorts of things via repositories on the internet. Developers can pull a collection of these images to quickly assemble a service. But where did these images come from? What code is running within them? Were they obtained from a trusted source?
We discovered that we needed to make sure there was still a strong governance process around vetting code that was being added to our platform, even if it comes in via Docker containers. The fact that it is in a container doesn’t change the fact that we want to make sure there isn’t any malicious code included, or that we are ok with the software licenses that come attached.
But beyond this concern, the biggest obstacle we have hit is stability. Docker is a young technology with a lot of change still happening in core parts of the platform. We have experienced many hung containers, and problems with networking in particular. Based on my discussions with other CTOs, our experiences do not seem to be unique. With Docker in dev and QA but not in production, we are not able to realize much of the application update velocity improvement that originally motivated us. The only production environments that are safe to dockerize are those with a truly stateless clustered set of nodes, where the failure of one of those nodes is a non-event and it is ok for this to happen on a daily basis.
Our conclusion on Docker is that it is not yet ready for production use. That being said, we are still very bullish on the technology and just think it needs some more time to stabilize and for all the production kinks to be worked out. We are continuing to prepare for the dockerization of our environment, which I do think is inevitable. But we aren’t ready to pull the trigger just yet.