Link to the original (Posted: 2021/09/09)
Over the past few years, service mesh technologies have come a long way. The service mesh plays an important role in cloud native adoption by a variety of organizations. By providing four major types of capabilities, connectivity, reliability, observability, and security, the service mesh has become a core component of modern IT initiatives and infrastructure. Because a service mesh implements these capabilities at the infrastructure level, for both development and operations teams, application teams do not need to reinvent the wheel for cross-cutting requirements.
Since the first edition of this article was published in February 2020, service mesh technologies have seen significant growth, with new architecture trends, technology innovations, and projects emerging in the evolving service mesh landscape.
In the last year, service mesh products have evolved well beyond Kubernetes-only solutions to also support applications that are not hosted on a Kubernetes platform. Not all organizations have migrated all of their business and IT applications to a Kubernetes-based cloud platform, so from its beginnings, service mesh technology has needed to work in a variety of IT infrastructure environments.
As the adoption of microservice architectures has grown, application systems have become increasingly decoupled and distributed across the cloud providers, infrastructure types (Kubernetes, VMs, bare metal servers), and regions in which a service mesh must operate.
Let's start with the history of how the service mesh came to be.
The term "service mesh" appeared around 2016 in the fields of microservices, cloud computing, and DevOps. The Buoyant team used the term in 2016 to describe their product, Linkerd. As with many concepts in computing, there is a long history of related patterns and technologies.
The emergence of the service mesh was largely due to a perfect storm within the IT landscape. Developers began building distributed systems using a multi-language (polyglot) approach and needed dynamic service discovery. Operations teams began using ephemeral infrastructure and wanted to gracefully handle the inevitable communication failures and enforce network policies. Platform teams began adopting container orchestration systems like Kubernetes and wanted to dynamically route traffic in and around the system using modern API-driven network proxies, such as Envoy.
This article aims to answer the following questions for software architects and technical leaders: What is a service mesh? Do I need a service mesh? How do I evaluate the different service mesh offerings?
You can quickly navigate this guide using the [Table of Contents] menu at the bottom of the page.
The service mesh pattern focuses on managing all service-to-service communication within a distributed software system.
There are two contexts for this pattern. First, engineers have adopted the microservice architecture pattern, building applications by composing multiple (ideally single-purpose, independently deployable) services. Second, organizations have embraced cloud native platform technologies such as containers (e.g., Docker), orchestrators (e.g., Kubernetes), and gateways.
The problems that the service mesh pattern attempts to solve include the following:
The service mesh pattern is primarily focused on handling what is traditionally called "east-west", remote procedure call (RPC)-based traffic: request/response style communication that originates within a data center and travels from service to service. This contrasts with an API gateway or edge proxy, which is designed to handle "north-south" traffic: communication that originates outside the data center and enters its endpoints and services.
A service mesh implementation usually provides one or more of the following capabilities:
Service mesh capabilities can be classified into the four areas shown in the list below:
Let's take a look at the capabilities that service mesh technologies provide in each of these areas.
Connectivity:
Reliability:
Security:
Observability:
A service mesh consists of two high-level components: a data plane and a control plane. Matt Klein, the creator of Envoy Proxy, has written an excellent deep dive on the topic of "service mesh data plane versus control plane".
Roughly speaking, the data plane "does the work" and is responsible for "conditionally translating, forwarding, and observing every network packet that flows to and from a network endpoint". In modern systems, the data plane is typically implemented as a proxy (such as Envoy, HAProxy, or MOSN) that runs out of process alongside each service as a "sidecar". Linkerd uses a micro-proxy approach optimized for the service mesh sidecar use case.
The control plane "supervises": it takes all of the individual instances of the data plane (a set of isolated, stateless sidecar proxies) and turns them into a distributed system. The control plane doesn't touch any packets or requests in the system; instead, it allows a human operator to provide policy and configuration for all of the running data planes in the mesh. The control plane also collects and centralizes the data plane's telemetry, making it readily available to operators.
Combining the control plane and the data plane provides the best of both worlds, in the sense that policies can be defined and managed centrally while being enforced locally, at each pod in a Kubernetes cluster. Policies can relate to security, routing, circuit breaking, or monitoring.
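As a minimal sketch of the sidecar pattern described above (all image names, labels, and ports here are hypothetical, not taken from any specific product), the data plane proxy is simply an extra container running in the same Kubernetes pod as the application, with the application's traffic redirected through it:

```yaml
# Hypothetical pod spec illustrating the sidecar data plane:
# a proxy container runs alongside the application container.
apiVersion: v1
kind: Pod
metadata:
  name: orders
  labels:
    app: orders
spec:
  containers:
    - name: app                       # the business service
      image: example.com/orders:1.0   # hypothetical image
      ports:
        - containerPort: 8080
    - name: mesh-proxy                # the data plane sidecar
      image: example.com/proxy:1.0    # hypothetical proxy image
      ports:
        - containerPort: 15001        # pod traffic is redirected here
```

In practice, meshes such as Istio and Linkerd inject the proxy container automatically (for example via a mutating admission webhook) rather than requiring operators to declare it in every pod spec.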
The figure below is an excerpt from the Istio architecture documentation. Although the labeled technologies are unique to Istio, the components are common to all service mesh implementations.
How the Istio architecture's control plane and proxy data plane interact (Credit: Istio documentation)
A service mesh enables or supports a variety of use cases.
A service mesh provides dynamic service discovery and traffic management, including traffic shadowing (duplication) for testing, and traffic splitting for canary releases and A/B-style experiments.
The proxies used in a service mesh are typically "application layer" aware (operating at Layer 7 of the OSI networking stack). This means that traffic routing decisions and the labeling of metrics can take advantage of HTTP headers or other application-layer protocol metadata.
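As an illustrative sketch of such Layer 7 routing, using Istio's VirtualService API (the host, subset names, and header are hypothetical examples, not a definitive configuration), requests carrying an opt-in header can be sent to a canary version while the remaining traffic is split by weight:

```yaml
# Hypothetical canary routing rule (Istio VirtualService).
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders-routes
spec:
  hosts:
    - orders.example.svc.cluster.local   # hypothetical service host
  http:
    - match:
        - headers:
            x-canary:                    # requests opting in via header
              exact: "true"
      route:
        - destination:
            host: orders.example.svc.cluster.local
            subset: v2                   # canary version
    - route:                             # everyone else: 90/10 split
        - destination:
            host: orders.example.svc.cluster.local
            subset: v1
          weight: 90
        - destination:
            host: orders.example.svc.cluster.local
            subset: v2
          weight: 10
```

Because the decision is made in the proxy, the application itself needs no routing logic; the split can be adjusted or rolled back by changing mesh configuration only.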
A service mesh supports the implementation and enforcement of cross-cutting reliability requirements, such as request retries, timeouts, rate limiting, and circuit breaking. A service mesh is often used to compensate for (or encapsulate) handling of the eight fallacies of distributed computing. It's important to note that a service mesh can offer only wire-level reliability support (such as retrying an HTTP request), and that responsibility for the related business impact, such as avoiding the sending of multiple non-idempotent HTTP POST requests, ultimately rests with the application.
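A minimal sketch of such wire-level reliability policy, again expressed with Istio's VirtualService API (the host name and the specific timeout/retry values are hypothetical choices for illustration):

```yaml
# Hypothetical retry/timeout policy applied by the mesh, not the app.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: inventory-reliability
spec:
  hosts:
    - inventory.example.svc.cluster.local  # hypothetical service host
  http:
    - timeout: 2s              # overall per-request deadline
      retries:
        attempts: 3            # wire-level retries on failure
        perTryTimeout: 500ms
        retryOn: 5xx,connect-failure
      route:
        - destination:
            host: inventory.example.svc.cluster.local
```

Note that if the application also retries in its own code, the attempts multiply (3 application retries times 3 mesh retries can produce up to 9 requests on the wire), which is exactly the duplicated-communication-logic anti-pattern discussed in the anti-patterns section.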
Because a service mesh sits on the critical path for every request processed by the system, it can also provide additional "observability", such as distributed tracing of a request, the frequency of HTTP error codes, and global and service-to-service latency. Although "single pane of glass" is a much-overused phrase in the enterprise space, a service mesh is often proposed as a way to capture all of the data necessary to implement such a view of traffic flows across the entire system.
A service mesh supports the implementation and enforcement of cross-cutting security requirements, such as providing service identity (via x509 certificates), ensuring that all communication is encrypted (via TLS), enabling application-level service/network segmentation (for example, "service A" can communicate with "service B", but not with "service C"), and ensuring that a valid user-level identity token or "passport" is present.
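As a sketch of that kind of segmentation using Istio's AuthorizationPolicy API (the namespace, labels, and service account names are hypothetical), "only service A may call service B" can be expressed declaratively:

```yaml
# Hypothetical policy: only service-a's mesh identity may call service-b.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: service-b-allow-a-only
  namespace: shop                  # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: service-b               # policy applies to service-b pods
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/shop/sa/service-a  # identity from mTLS cert
```

Because these principals are derived from the mTLS certificates the mesh issues, a request from "service C" presents a different identity and, once an ALLOW policy is in place for the workload, is denied by default.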
The appearance of anti-patterns in the use of a technology is often a sign that the technology is maturing. The service mesh is no exception.
This anti-pattern occurs when developers do not coordinate with the platform or operations team and duplicate, in code, communication-handling logic that is already implemented by the service mesh. For example, an application implementing a retry policy in its code in addition to the wire-level retry policy provided by the service mesh configuration. This anti-pattern can lead to problems such as duplicated transactions.
There is no such thing as a "silver bullet" in IT, but vendors are sometimes tempted to anoint new technologies with this label. A service mesh does not solve all communication problems with microservices, container orchestrators such as Kubernetes, or cloud networking. A service mesh aims only to make service-to-service (east-west) communication easier, and there is a clear operational cost to deploying and running one.
During the era of service-oriented architecture (SOA) that preceded microservices, enterprise service buses (ESBs) implemented the communication layer between software components. Some people worry that many of the mistakes of the ESB era will be repeated with the service mesh.
The centralized management of communication offered via ESBs clearly had value. However, because the development of the technology was driven by vendors, multiple problems emerged at high cost: lack of interoperability between ESBs, custom extensions to industry standards (such as vendor-specific configuration added to WS-*-compliant schemas), and so on. ESB vendors also did nothing to discourage the coupling of business logic into the communication bus.
There is a temptation to believe that a big bang deployment is the easiest approach, but according to Accelerate (the science behind Lean and DevOps) and the State of DevOps Report, this is not the case. A full rollout of a service mesh means that the technology sits on the critical path handling every end-user request, which makes a big bang deployment very risky.
If an organization adopts a microservice architecture and its development teams begin creating new microservices or making use of existing services in their applications, inter-service communication becomes a critical part of the architecture. Without a good governance model, this can lead to services communicating with one another in an ad hoc, point-to-point fashion. It also becomes difficult to identify which service is at fault when a problem occurs across the system in production.
An architecture like this, with no service communication strategy and no governance model, is called a "Death Star architecture".
For more details on these architecture anti-patterns, check out the articles on cloud native architecture adoption: Part 1, Part 2, and Part 3.
Localized, over-optimized service mesh implementations can make the scope of a service mesh deployment too narrow. Developers may prefer a service mesh dedicated to their own business domain, but this approach has more disadvantages than benefits. It is not desirable to implement a service mesh with too fine-grained a scope, such as one per business or functional domain in the organization (finance, HR, accounting, etc.). This defeats the purpose of having a common service orchestration solution such as a service mesh, with enterprise-level capabilities like service discovery and cross-domain service routing.
The following is a non-exhaustive list of current service mesh implementations:
In addition, other products, such as DataDog, have begun offering integrations with service mesh technologies such as Linkerd, Istio, Consul Connect, and AWS App Mesh.
The service mesh space is changing very fast, so any comparison is likely to become outdated quickly. Several comparisons do exist, however. Take care to understand the date of each comparison and the potential bias of its source (in some cases, a vendor):
InfoQ recommends that service mesh adopters always perform their own due diligence and experimentation with each product.
Engineers and architects who want to experiment with multiple service meshes can use the following tutorials, playgrounds, and tools:
InfoQ has been tracking the topic we now call service mesh since the 2013 release of Airbnb's SmartStack, which provided an out-of-process service discovery mechanism (using HAProxy) for the emerging "microservices" style of architecture. Many of the organizations previously labeled as "unicorns" were working on similar technologies before this date. From the early 2000s, Google was developing its Stubby RPC framework, which evolved into gRPC, as well as the Google Frontend (GFE) and Global Software Load Balancer (GSLB), whose traits can be seen in Istio. In the early 2010s, Twitter began work on the Scala-based Finagle, from which the Linkerd service mesh emerged.
In late 2014, Netflix released its JVM-based utility suite, including Prana, a "sidecar" process that allowed application services written in any language to communicate with standalone instances of these libraries over HTTP. In 2016, the NGINX team began talking about a "Fabric Model" that was very similar to a service mesh, but required the use of their commercial NGINX Plus product for implementation. Linkerd v0.2 was also announced in February 2016, though the team did not begin calling it a service mesh until later.
Other highlights in service mesh history include the releases of Istio in May 2017, Linkerd 2.0 in July 2018, Consul Connect and Gloo Mesh in November 2018, Service Mesh Interface (SMI) in May 2019, and Maesh (now called Traefik Mesh) and Kuma in September 2019.
Even the service meshes that emerged outside of unicorns, such as HashiCorp's Consul, were inspired by the technologies mentioned above, often aiming to implement the CoreOS-coined concept of "GIFEE": Google infrastructure for everyone else.
To dig deeper into the history of how the modern service mesh concept evolved, Phil Calçado has written a comprehensive article, "Pattern: Service Mesh".
Although service mesh technology has undergone major change year over year for the past several years, standards for service meshes have not kept pace with the innovation.
The main standard for working with service mesh solutions is the Service Mesh Interface (SMI). The Service Mesh Interface is a specification for service meshes that run on Kubernetes. It does not implement a service mesh itself, but instead defines a common standard that can be implemented by a variety of service mesh providers.
The goal of the SMI API is to provide a common, portable set of service mesh APIs that Kubernetes users can consume in a provider-agnostic manner. This allows people to define applications that use service mesh technology without being tightly bound to any specific implementation.
SMI is basically a collection of Kubernetes Custom Resource Definitions (CRDs) and extension API servers. These APIs can be installed onto any Kubernetes cluster and manipulated using standard tools. To activate the APIs, an SMI provider must be running in the Kubernetes cluster.
The SMI specification enables both standardization for end users and innovation by service mesh technology providers. SMI enables flexibility and interoperability, and covers the most common service mesh capabilities. The current specification components focus on the connectivity aspects of service mesh capabilities. The API specifications include the following:
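As a sketch of what this provider-agnostic API looks like (the service names are hypothetical), an SMI TrafficSplit resource expresses a weighted canary split that any conforming mesh implementation can carry out:

```yaml
# Hypothetical SMI TrafficSplit: a 90/10 canary between two backends.
apiVersion: split.smi-spec.io/v1alpha4
kind: TrafficSplit
metadata:
  name: orders-canary
spec:
  service: orders            # the root (apex) service clients call
  backends:
    - service: orders-v1     # stable version
      weight: 90
    - service: orders-v2     # canary version
      weight: 10
```

The same resource can be applied unchanged whether the underlying provider is, for example, Linkerd or another SMI-conformant mesh, which is precisely the portability the specification aims for.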
The current SMI ecosystem includes a wide range of service meshes, including Istio, Linkerd, Consul Connect, and Gloo Mesh.
The SMI specification is licensed under Apache License version 2.0.
To learn more about the SMI specification and its API details, check out the following links:
Service Mesh Performance (SMP) is a standard for capturing the details of infrastructure capacity, service mesh configuration, and workload metadata. The SMP specification is used to capture the following details:
William Morgan of the Linkerd team has written about benchmarking the performance of Linkerd and Istio. There is also a 2019 article on Istio best practices for benchmarking service mesh performance.
As with any other performance benchmark, be careful not to put too much weight on external publications, especially those from product vendors. Design and execute your own performance tests in your own server environment to verify which product meets your application's business and non-functional requirements.
Kasun Indrasiri has explored "The Potential for Using a Service Mesh for Event-Driven Messaging", in which he described two emerging architectural patterns for implementing messaging support within a service mesh: the protocol proxy sidecar and the HTTP bridge sidecar. This is an active area of development in the service mesh community, and the work toward supporting Apache Kafka within Envoy is attracting quite a bit of attention.
Christian Posta has previously written about the standardization of service mesh usage in "Towards a Unified, Standard API for Consolidating Service Meshes". This article also discusses the Service Mesh Interface (SMI), announced by Microsoft and partners at KubeCon EU in 2019. SMI defines a set of common, portable APIs aimed at providing developers with interoperability across different service mesh technologies, including Istio, Linkerd, and Consul Connect.
The topic of integrating a service mesh with the platform fabric can be divided into two further subtopics.
First, there is work being done to reduce the networking overhead introduced by a service mesh data plane. This includes the Data Plane Development Kit (DPDK), a userspace application that "bypasses the heavy layers of the Linux kernel networking stack and talks directly to the network hardware". There is also the Linux-based eBPF solution from the Cilium team, which uses the extended Berkeley Packet Filter (eBPF) functionality of the Linux kernel for "very efficient networking, policy enforcement, and load balancing functionality". Another team is remapping the service mesh concept onto L2/L3 payloads with Network Service Mesh, as an attempt to "reimagine network function virtualization (NFV) in a cloud native way".
Second, there are multiple initiatives to integrate service meshes more tightly with public cloud platforms, as seen in the introduction of AWS App Mesh, GCP Traffic Director, and Azure Service Fabric Mesh.
The Buoyant team is leading the development of effective human-centric control planes for service mesh technology. They recently released Buoyant Cloud, a SaaS-based "team control plane" for platform teams operating Kubernetes. This is covered in more detail in a later section.
There have been several innovations in the service mesh space since last year. Let's look at some of them.
In recent years, cloud adoption by a variety of organizations has shifted from single-cloud solutions (private or public) to a new infrastructure landscape built on multi-cloud (private, public, and hybrid) offerings from multiple vendors (such as AWS, Google, and Microsoft Azure). Support for diverse workloads (transactional, batch, and streaming) is also essential to realizing a unified cloud architecture.
These business and non-functional requirements make it necessary to deploy service mesh solutions onto heterogeneous infrastructure (bare metal, VMs, Kubernetes). Service mesh architectures must adapt accordingly to support these diverse workloads and infrastructures.
Technologies like Kuma support multi-mesh control planes, enabling business applications to function in multi-cluster and multi-cloud service mesh environments. These solutions abstract away the synchronization of service mesh policies across multiple zones, along with the service connectivity (and service discovery) between those zones.
Another emerging trend in multi-cluster service mesh technology is connecting applications and services at the edge computing layer (IoT devices) into the mesh.
A media streaming mesh, or media service mesh, being developed at Cisco Systems, is used to orchestrate real-time applications such as multiplayer games, multi-party video conferencing, and CCTV streaming on Kubernetes cloud platforms using service mesh technology. These applications are increasingly shifting from monolithic applications to microservice architectures. A service mesh can support such applications by providing capabilities like load balancing, encryption, and observability.
Chaos Mesh, a CNCF-hosted project, is an open-source, cloud native chaos engineering platform for applications hosted on Kubernetes. Although not a direct service mesh implementation, Chaos Mesh enables chaos engineering experiments by orchestrating fault-injection behavior into applications. Fault injection is an important feature of service mesh technologies.
Chaos Mesh hides the underlying implementation details, so that application developers can focus on the actual chaos experiments. Chaos Mesh can be used together with a service mesh. Check out this use case describing how one team ran a chaos experiment on their project using Linkerd and Chaos Mesh.
Some service mesh vendors, such as Buoyant, offer managed service meshes, or "service mesh as a service". Earlier this year, Buoyant announced the public beta release of a SaaS application called Buoyant Cloud, which gives customer organizations a managed service mesh with on-demand support for the Linkerd service mesh.
The features provided by the Buoyant Cloud solution include:
Network Service Mesh (NSM), another Cloud Native Computing Foundation sandbox project, provides a hybrid, multi-cloud IP service mesh. NSM enables capabilities such as network service connectivity, security, and observability, which are core service mesh features. NSM works with existing Container Network Interface (CNI) implementations.
Service mesh extensions are another area seeing much innovation. Developments in service mesh extensions include the following:
Another important aspect of service mesh adoption is the operational side of the service mesh lifecycle. Operational capabilities such as configuring multi-cluster support, connecting Kubernetes-based workloads to servers hosted on VM infrastructure, and developer portals that manage the installation of multi-cluster service mesh capabilities and their APIs all play an important role in the overall deployment and support of a service mesh solution.
A service mesh manages all service-to-service (east-west) traffic within a distributed (potentially microservice-based) software system. It provides both business-focused functional operations, such as routing, and non-functional support, such as enforcing security policies, quality of service, and rate limiting. It is typically (though not exclusively) implemented using sidecar proxies through which all services communicate.
See above for the definition of service mesh.
An API gateway, by contrast, manages all ingress (north-south) traffic into a cluster and provides additional support for cross-cutting communication requirements. It acts as the single entry point into the system, enabling multiple APIs or services to act cohesively and provide a uniform experience to users.
Not always. A service mesh adds operational complexity to the technology stack, so it is typically deployed only if the organization is having trouble scaling service-to-service communication, or has a specific use case that a mesh solves.
No. A service mesh provides one way to implement service discovery. Other solutions include language-specific libraries (such as Ribbon, Eureka, or Finagle).
Yes. A service mesh adds at least two extra network hops when one service communicates with another (the first via the proxy handling the sender's outbound connection, the second via the proxy handling the destination's inbound connection). However, these additional hops typically occur over the localhost or loopback network interface and add only a small amount of latency (on the order of milliseconds). Experimenting to determine whether this matters for the target use case should be part of any service mesh analysis and evaluation.
Quite possibly. There is an argument for maintaining separation of concerns within a cloud native platform's components (for example, Kubernetes is responsible for container orchestration, and a service mesh is responsible for service-to-service communication). However, work is underway to push service-mesh-like functionality into modern Platform-as-a-Service (PaaS) offerings.
The best approach is to analyze the various service mesh products (see above) and follow the implementation guidelines for your chosen mesh. In general, it is best to work with all stakeholders and to deploy any new technology into production incrementally.
Yes, but the more appropriate question is: should you? Is building a service mesh a core competency of your organization? Could you be providing value to your customers in more effective ways? Are you committed to maintaining your mesh, patching it for security issues, and constantly updating it to take advantage of new technologies? With open source and commercial service mesh products now readily available, it is most likely more effective to use an existing solution.
Typically, the platform or operations team owns the service mesh, along with Kubernetes and the continuous delivery pipeline infrastructure. However, developers configure service mesh properties, so both teams need to work together closely. Many organizations are following the lead of cloud pioneers such as Netflix, Spotify, and Google, and creating internal platform teams that provide tools and services to full-cycle, product-focused development teams.
No. Envoy is a cloud native proxy originally designed and built by the Lyft team. Envoy is often used as the data plane of a service mesh. However, to be considered a service mesh, Envoy must be used in conjunction with a control plane; only then does this collection of technologies become a service mesh. The control plane can be as simple as a centralized configuration file repository and metrics collector, or as comprehensive and complex as Istio.
No. Istio is one type of service mesh. Because of Istio's popularity when the service mesh category emerged, some sources conflated Istio with the service mesh concept as a whole. This problem of conflation is not unique to service mesh; the same challenge occurred with Docker and container technology.
There is no single answer to this question. Engineers must understand their current requirements, along with the skills, resources, and time available to their implementation team. The service mesh comparison links above are a good place to start your research, but we strongly recommend trying at least two meshes in order to understand which products, technologies, and workflows best suit your organization.
Yes. Many service meshes allow the installation and management of a control plane and the associated data plane proxies on a variety of infrastructure. HashiCorp's Consul is the best-known example of this, and Istio has also been used experimentally with Cloud Foundry.
API gateway: Manages all ingress (north-south) traffic into a cluster and provides additional support for cross-cutting communication requirements. It acts as the single entry point into the system, enabling multiple APIs or services to act cohesively and provide a uniform experience to users.
Consul: A Go-based service mesh from HashiCorp.
Container: A standard unit of software that packages code together with all of its dependencies, so that the application runs quickly and reliably from one computing environment to another.
Control plane: Takes all of the individual instances of the data plane (proxies) and turns them into a distributed system that can be visualized and controlled by an operator.
Circuit breaker: Handles faults or timeouts when connecting to a remote service. Helps improve the stability and resiliency of an application.
Data plane: A proxy that conditionally translates, forwards, and observes every network packet that flows to and from a service's network endpoint.
Docker: A Docker container image is a lightweight, standalone software package that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings.
East-west traffic: Network traffic within a data center, network, or Kubernetes cluster. Traditional network diagrams drew service-to-service traffic (inside the data center) flowing from left to right (east to west).
Envoy Proxy: An open-source edge and service proxy designed for cloud native applications. Envoy is often used as the data plane in service mesh implementations.
Ingress traffic: Network traffic originating from outside the data center, network, or Kubernetes cluster.
Istio: A C++ (data plane) and Go (control plane) based service mesh, originally created by Google and IBM in partnership with the Envoy team at Lyft.
Kubernetes: A CNCF-hosted container orchestration and scheduling framework that originated at Google.
Kuma: A Go-based service mesh from Kong.
Linkerd: A Rust (data plane) and Go (control plane) based service mesh that grew out of an early JVM-based communication framework at Twitter.
Maesh: A Go-based service mesh from the maintainers of the Traefik API gateway.
MOSN: A Go-based proxy from the Ant Financial team that implements the (Envoy) xDS APIs.
North-south traffic: Network traffic entering (ingressing into) a data center, network, or Kubernetes cluster. Traditional network diagrams drew ingress traffic flowing into the data center from the top of the page (north to south).
Proxy: A software system that acts as an intermediary between endpoint components.
Segmentation: Dividing a network or cluster into multiple sub-networks.
Service mesh: Manages all service-to-service (east-west) traffic within a distributed (potentially microservice-based) software system. Provides both functional operations, such as routing, and non-functional support, such as enforcing security policies, quality of service, and rate limiting.
Service Mesh Interface (SMI): A standard interface for service meshes deployed on Kubernetes.
Service mesh policy: A specification of how a collection of services/endpoints are allowed to communicate with one another and with other network endpoints.
Sidecar: A deployment pattern in which an additional process, service, or container is deployed alongside an existing service (think of a motorcycle sidecar).
Single pane of glass: A UI or management console that presents data from multiple sources in a unified display.
Traffic shaping: Modifying the flow of traffic across a network, for example via rate limiting or load shedding.
Traffic shifting: Migrating traffic from one location to another.
Traffic splitting: Allows users to incrementally direct percentages of traffic between various services. Used by clients such as ingress controllers or service mesh sidecars to split outgoing traffic across different destinations.
Srini Penchikala is a senior IT architect based in Austin, Texas. With over 25 years of experience in software architecture, design, and development, he currently focuses on cloud native architectures, microservices and service mesh, cloud data pipelines, and continuous delivery. Penchikala is the author of Big-Data Processing with Apache Spark and the co-author of Manning's Spring Roo in Action. He is a frequent conference speaker and has published several articles on big data topics on various technical websites.