Continuing our focus on the shift to the Product IT model, this article examines the technological challenges that enterprises face as they strive to adopt it.
A typical enterprise ecosystem is a complex mesh of archaic, arcane and emerging technologies, making a big-bang move to a new operating model unfeasible. It is important to break such a move down into smaller, more manageable and traceable action items.
Based on our extensive experience of working with enterprise clients, we have broadly categorized the technological changes needed into different groups (see Figure 1).
Figure 1: Categories of technological changes needed to move to a new operating model
Microservices and containers have been a major force in the modernization of legacy applications, easing the move to the cloud ecosystem. The adoption of containers has been driven by the need to create lightweight, de-coupled microservices that interact with each other through standard APIs within the constraints of service contracts. In addition, container orchestration platforms such as Kubernetes provide the ability to auto-scale and auto-heal services, leading toward the nirvana state of NoOps.
However, adopting container-based solutions is not without its fair share of challenges (see Figure 2). These challenges are not mutually exclusive and can be dependent on or closely related to one another.
Figure 2: Challenges in adopting container-based solutions
1. Skilled Resources
One of the major challenges that organizations face is attracting and retaining the right talent. With the technology being only six years old and experiencing an unprecedented rate of adoption, demand outstrips supply several-fold. While developers need to upskill themselves to build containerized apps, the challenge also lies in managing operations. The skill sets needed to operate container-based platforms are very different from the ones traditional operations professionals possess. Strong scripting and coding skills are needed to manage operations effectively, along with knowledge of application architecture, data flow and application logging. The complexity involved is high: the approach packages an application and its dependencies, along with a minimal base operating-system image, into a lightweight container. This container then runs on an orchestration engine designed to ensure that the containers adhere to the rules pre-defined for them.
Here’s an example of the skills gap. Mindtree planned an all-day recruitment drive to hire engineers with hands-on Kubernetes/Docker skills and a good understanding of the surrounding ecosystem. Although we received many well-written resumes, we were unable to identify a single technologist with an in-depth understanding of the technology.
2. Fast-evolving technology eco-system
The Docker/container ecosystem is evolving at an extremely rapid clip. As an open-source technology with immense potential, it has attracted a deluge of third-party tools and services. Tools that help developers build Docker images, deploy, configure and manage containerized workflows, and automate the various stages of a containerized app's lifecycle are emerging every day. While this might sound like a good thing from an emerging platform's perspective, shortlisting the right set of technologies and tools for different types of applications, and devising upgrade strategies, becomes highly complex. It also throws up the challenge of acquiring and retaining skilled resources with knowledge of multiple emerging technologies, as discussed earlier.
One of our client engagements involved containerizing a value-added services application for a large telecom provider. The requirement was to whitelist certain predefined IPs from which billing requests had to originate for transactions to be considered legitimate. The client had already decided to go with a cloud-hosted managed container platform, but most cloud platforms at that point in time were unable to provide that functionality. We therefore worked with the cloud provider and designed a solution built around a new tool to enable the setup.
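To illustrate the kind of allowlisting logic involved, here is a minimal sketch using Python's `ipaddress` module. The network ranges are hypothetical placeholders (IETF documentation ranges), and in a real deployment such a check would typically sit at the load balancer or ingress layer rather than in application code.

```python
import ipaddress

# Hypothetical allowlist of networks from which billing requests may originate.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # placeholder documentation range
    ipaddress.ip_network("198.51.100.0/24"),  # placeholder documentation range
]

def is_allowed(source_ip: str) -> bool:
    """Return True if the source IP falls inside any allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)

print(is_allowed("203.0.113.42"))  # True: inside 203.0.113.0/24
print(is_allowed("192.0.2.10"))    # False: not in any allowlisted range
```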
3. Organizational inertia
Implementing containers efficiently is a big shift from the traditional ways of working that organizations are used to; it touches every aspect of an application's lifecycle across development and operations. It requires a rapid pace of innovation while simultaneously empowering developers and substantially reskilling ops teams. DevOps principles must be followed, along with CI/CD automation, for a containerized environment to succeed.
The existing ways of working, especially the change management process, access restrictions for developers and the operations teams' lack of product knowledge, are big impediments to adoption. For example, a change management process that includes a weekly CAB meeting and multiple approval steps from various stakeholders hampers the velocity that containerized ecosystems need. Similarly, restricting developer access to just writing and committing code reduces the effectiveness and velocity of such implementations. Developers need to take responsibility for ensuring that any application logic they create runs well in production, and they need the ability to fix issues quickly when required.
While developing a containerized application for a large hospitality player, we were faced with the challenge of giving developers access to the containerized platform and empowering operations engineers to debug application issues. We resolved this by enabling developers' access to the lower environments and creating automated CI/CD pipelines through which developers could themselves deploy code to non-prod environments.
4. Choice of technologies
With the rapid evolution of the ecosystem and the plethora of options available, it becomes extremely difficult to choose the right set of technologies. Unlike in the past, there are no outright leaders: a startup might have better and more relevant container-oriented tools than a technology giant.
The options are plentiful: cloud-hosted and on-premise managed Kubernetes solutions, self-managed Kubernetes solutions, and tools for persistent storage, orchestration, service discovery, networking, secrets management, tracing and monitoring; a laundry list of technologies, each with its own unique selling points. To top it all, new and better options are being built constantly.
As a result, organizations often struggle to select the right tools for their use cases and workloads while ensuring that the tools will also meet their future needs. Moreover, organizations like to use their existing licensed tools wherever possible, but with limited knowledge of how those tools interoperate with the container ecosystem, the decision becomes a difficult one.
One of our clients, an online video streaming company that had moved to microservices using Spring Boot, wanted to deploy these to a cloud-hosted Kubernetes platform. We were tasked with creating a service discovery engine. There were significant differences of opinion among our team members about the technology to choose: a combination of Consul, etcd, Registrator and confd? ZooKeeper? Or the inbuilt service discovery of Docker Swarm, Kubernetes or ECS? Choosing the right tool became an uphill task even though we were a group of technology experts.
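To make the comparison concrete, here is a toy in-memory sketch of what any service discovery engine must do at its core: register, deregister and resolve service instances. It is purely illustrative; real tools such as Consul, etcd or ZooKeeper add health checks, TTLs, watches and consensus-backed replication on top of this basic contract.

```python
import random

class ServiceRegistry:
    """Toy in-memory service registry: register/deregister/resolve only.
    Illustrative sketch; production systems add health checking, leases
    and replicated, consensus-backed storage."""

    def __init__(self):
        self._services = {}  # service name -> list of (host, port) instances

    def register(self, name, host, port):
        self._services.setdefault(name, []).append((host, port))

    def deregister(self, name, host, port):
        self._services.get(name, []).remove((host, port))

    def resolve(self, name):
        instances = self._services.get(name)
        if not instances:
            raise LookupError(f"no registered instance of {name!r}")
        # Naive client-side load balancing: pick a random instance.
        return random.choice(instances)

registry = ServiceRegistry()
registry.register("billing", "10.0.0.5", 8080)  # hypothetical addresses
registry.register("billing", "10.0.0.6", 8080)
host, port = registry.resolve("billing")
```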
5. Implementation strategy
Another roadblock that organizations face in container adoption is the lack of a proper implementation strategy. Container adoption has seen maximum traction at the developer and operations engineer level, but successful production implementation requires a well-thought-out strategy. In most organizations, adoption is ad hoc and without a cohesive plan, with different teams choosing their own strategies. Typically, short- and long-term goals are not defined, and effective checks and balances are missing in the transition phase. The metrics to be tracked are not defined or benchmarked either.
Other issues include choosing applications without evaluating the advantages and disadvantages of moving them to containers. Anticipated workloads and the need for auto-scaling, auto-healing, data persistence, service discovery, circuit breaking and canary deployments are often not taken into consideration. As a result, the outcomes of the move fall short of expectations. Choosing the wrong orchestration engine, or setting it up improperly, can also turn into a major pain point. For instance, both Kubernetes and Docker Swarm work in a leader-follower configuration and support multi-master setups, so it is important to understand the consensus algorithm that elects the leader in order to prevent issues.
For example, the Raft consensus algorithm requires a majority, or quorum, of floor(N/2) + 1 members to agree on values proposed to the cluster.
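A quick sketch of the quorum arithmetic also shows why odd-sized clusters are generally preferred: going from three members to four raises the quorum without increasing the number of failures the cluster can tolerate.

```python
def quorum(n: int) -> int:
    """Majority needed for a Raft cluster of n members: floor(n/2) + 1."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Members that can fail while the rest can still reach quorum."""
    return n - quorum(n)

for n in (3, 4, 5):
    print(f"cluster of {n}: quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
# cluster of 3: quorum=2, tolerated failures=1
# cluster of 4: quorum=3, tolerated failures=1
# cluster of 5: quorum=3, tolerated failures=2
```

Note that a four-member cluster tolerates no more failures than a three-member one, while adding another node that must stay in sync.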
6. Container monitoring
Container monitoring is a different animal altogether, and very few organizations have mastered monitoring these ephemeral, super-fast and lightweight containers. Orchestrated container environments are so complex that regular monitoring solutions prove inadequate.
Let’s consider a use case to understand the level of complexity involved. Say we are running a container orchestrator in a multi-master setup, where the master nodes must stay in sync and continuously communicate with each other. These master nodes also run multiple control activities: maintaining the state of the containers, performing health checks and dynamically creating new containers in case of failure. Additionally, the orchestrator creates an overlay network, a distributed network spanning the Docker daemon hosts. This network sits on top of the host-specific networks, allowing containers connected to it (including Swarm service containers) to communicate securely.
All of these factors must be taken into account when enabling monitoring.
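One practical consequence of this ephemerality is that dashboards keyed on container IDs go stale as the orchestrator kills and replaces containers. A common approach is to aggregate metrics by a stable service label instead; the sketch below uses hypothetical sample data to show the idea.

```python
from collections import defaultdict

# Hypothetical metric samples. Container IDs are ephemeral, so any view
# keyed on them breaks as soon as the orchestrator replaces a container;
# the service label stays stable across restarts and rescheduling.
samples = [
    {"container_id": "a1f3", "service": "billing", "cpu_pct": 41.0},
    {"container_id": "9c2e", "service": "billing", "cpu_pct": 55.0},
    {"container_id": "77bd", "service": "catalog", "cpu_pct": 12.0},
]

def cpu_by_service(samples):
    """Average CPU per service label rather than per container ID."""
    totals, counts = defaultdict(float), defaultdict(int)
    for s in samples:
        totals[s["service"]] += s["cpu_pct"]
        counts[s["service"]] += 1
    return {svc: totals[svc] / counts[svc] for svc in totals}

print(cpu_by_service(samples))  # {'billing': 48.0, 'catalog': 12.0}
```

This is the same label-based aggregation that container-native monitoring stacks perform at scale, alongside collecting orchestrator control-plane and overlay-network metrics.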
7. Container security
Container security is another aspect that is often neglected while containerizing applications and developing an orchestration strategy and scripts. A security breach of a container is as big a concern as an OS-level breach, more so in the case of privileged containers that have root access. Developers and operations engineers do not give enough thought to examining the potential vulnerabilities of the platforms. More often than not, interaction between services is not adequately tracked, making it difficult to identify erroneous interactions, security violations and potential latency. Most Docker images are built on top of lightweight open-source base operating systems, and because Docker uses a copy-on-write approach, these base images are shared among containers. A single vulnerability in a shared base image can therefore expose a large attack surface.
While implementing a container orchestration solution for a large financial client, we discovered that the etcd primary data store was holding sensitive data as unencrypted plain text on disk. Such gaps can lead to major issues if they make it to the production environment.
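A related and frequently misunderstood point: base64 encoding, which platforms such as Kubernetes use when storing Secret values, is not encryption. The snippet below (using a made-up password) shows that the plaintext is trivially recoverable by anyone who can read the backing store, which is why encryption at rest and tight access control on the data store matter.

```python
import base64

# A made-up secret, encoded the way a Kubernetes Secret value would be.
encoded = base64.b64encode(b"s3cr3t-db-password").decode()
print(encoded)  # czNjcjN0LWRiLXBhc3N3b3Jk

# Anyone who can read the store can reverse the encoding in one call.
print(base64.b64decode(encoded).decode())  # s3cr3t-db-password
```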
While the long list of challenges can seem daunting at first sight, most of the pitfalls can be avoided by thoroughly preparing and planning for the transition with a sound strategy.