In this article, we will dive deep into the internals of gRPC architecture. We will discuss how gRPC and Protocol Buffers work together internally to provide a framework for Remote Procedure Calls (RPC).
Before diving into the internals of gRPC, let’s discuss the OSI model, the TCP 3-way handshake, and the HTTP/2 protocol, which serve as the building blocks for the subsequent discussions.
OSI Model
OSI stands for Open Systems Interconnection. The OSI Model is a conceptual model that describes how two hosts in a network communicate with each other using standard protocols.
The model includes 7 layers with each layer catering to its own core functionality.
- Application Layer (L7): This layer receives user input in the form of data and converts it into a binary format. HTTP is the most popular protocol in the Application Layer.
- Presentation Layer (L6): This layer is responsible for protocol conversions, data formatting, translation, and compression of data.
- Session Layer (L5): This layer is responsible for establishing, maintaining, and terminating sessions between the communicating applications.
- Transport Layer (L4): This layer receives data from the L5 layer and breaks it into smaller chunks called Segments. Each Segment is enriched with a source and destination port. This layer provides End-to-End delivery: it delivers Segments from a source process running on a particular port to a destination process running on a particular port. TCP and UDP are the prominent protocols in this layer.
- Network Layer (L3): This layer receives Segments from the L4 layer and creates Packets. Each Packet is enriched with a source and destination IP address. This layer provides Host-to-Host delivery: it delivers Packets from the source host to the destination host.
- Data Link Layer (L2): This layer receives Packets from the L3 layer and creates Frames. Each Frame is enriched with a source and destination MAC address. This layer provides Hop-to-Hop delivery: it delivers Frames from one device to another within the same network.
- Physical Layer (L1): This layer receives Frames in the form of bits and converts them into analog or digital signals that propagate through channels such as coaxial cable, fiber optic cable, and DSL lines.
On the sender side, the data moves from the L7 layer to the L1 layer, and the resultant signals are propagated through one of the channels to the destination. On the receiver side, the exact reverse happens.
Transmission Control Protocol (TCP)
TCP is a transport layer protocol that provides an abstraction of a reliable network over an unreliable channel. A TCP connection starts with a 3-way handshake before any data exchange between the client and the server can happen.
- The client picks a random sequence number x and sends a SYN packet with additional metadata.
- The server receives the SYN packet, picks its own random sequence number y, and sends a SYN ACK packet back to the client. This packet carries the sequence number y, the acknowledgment number x + 1, and additional metadata.
- The client receives the SYN ACK packet and completes the handshake by sending an ACK packet to the server with the sequence number x + 1 and the acknowledgment number y + 1.
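The sequence and acknowledgment numbers exchanged in the three steps above can be sketched as follows. This is only an illustration of the arithmetic, not a network implementation; the function name and the tuple layout are assumptions made for this example.

```python
import random

def three_way_handshake(x=None, y=None):
    """Return the (flags, seq, ack) triples exchanged during a TCP handshake."""
    # Each side picks a random 32-bit initial sequence number.
    x = random.getrandbits(32) if x is None else x
    y = random.getrandbits(32) if y is None else y
    mod = 2 ** 32  # sequence numbers wrap around at 2^32
    syn = ("SYN", x, None)                       # client -> server
    syn_ack = ("SYN-ACK", y, (x + 1) % mod)      # server -> client
    ack = ("ACK", (x + 1) % mod, (y + 1) % mod)  # client -> server
    return [syn, syn_ack, ack]

for segment in three_way_handshake(x=100, y=300):
    print(segment)
```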
Since the 3-way handshake adds a full network round trip before any data can be exchanged, the client and server establish the connection once and reuse it for subsequent data transfer requests.
This is why connection pooling is recommended for remote network calls.
HTTP/2
HTTP/1.1 introduced persistent connections, where a single TCP connection can be reused across multiple HTTP requests.
HTTP/1.1 also introduced request pipelining, where a client can send multiple requests on the same TCP connection without waiting for the earlier responses.
However, it did not allow multiple responses to be multiplexed on the same connection, so each response had to be returned sequentially and a slow response delayed every response behind it. This phenomenon is called HoL (Head-of-Line) blocking.
HTTP/2 is a major advancement over HTTP/1.1 and provides features such as the Binary Framing Layer, Multiplexing, Header Compression, and Server Push.
Binary framing layer
HTTP/2 retained the semantics of HTTP/1.1 on HTTP methods, status codes, and headers but introduced a new Binary Framing Layer that dictates how the message is encapsulated and transmitted between the client and the server.
The following are the major components of this layer:
- Stream: A Stream is a virtual channel within a TCP connection that allows the bi-directional flow of bytes between the client and the server. Each stream has a unique identifier, which the receiver uses to associate incoming frames with the right stream.
- Frame: A Frame is the smallest communication unit in HTTP/2. It includes a length, a type, flags, a stream identifier, and an optional payload. The different types of frames are listed below.
- Message: A message maps to a logical Request or Response. Each message is broken down into multiple frames.
Following are the different types of supported frames:
- DATA: Frame that carries the actual data.
- HEADERS: Frame that carries headers containing metadata.
- PRIORITY: Frame that carries priority information of the resource.
- SETTINGS: Frame that configures how two endpoints should communicate.
- RST_STREAM: Frame that signals abnormal termination of the stream.
- PING: Frame that is used to measure Round Trip Time (RTT) and acts as a health check.
- GOAWAY: Frame that informs the peer to stop creating streams for the connection.
- WINDOW_UPDATE: Frame that implements flow control per stream and connection.
- PUSH_PROMISE: Frame that notifies the peer in advance of the stream the sender intends to initiate.
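To make the frame structure concrete, here is a minimal sketch that builds and parses the fixed 9-byte HTTP/2 frame header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit, and a 31-bit stream identifier). The helper names `build_frame` and `parse_frame_header` are made up for this illustration, and the sketch ignores padding and frame-specific payload layouts.

```python
import struct

# Frame type codes as defined by the HTTP/2 specification.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE",
}

def build_frame(frame_type, flags, stream_id, payload=b""):
    """Prefix the payload with a 9-byte frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,  # 24-bit length split into 8 + 16 bits
        frame_type, flags,
        stream_id & 0x7FFFFFFF,         # the top bit is reserved
    )
    return header + payload

def parse_frame_header(data):
    """Decode the first 9 bytes of a frame into a dictionary."""
    high, low, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    return {
        "length": (high << 16) | low,
        "type": FRAME_TYPES.get(frame_type, "UNKNOWN"),
        "flags": flags,
        "stream_id": stream_id & 0x7FFFFFFF,
    }

frame = build_frame(0x0, 0x1, 1, b"hello")  # DATA frame, END_STREAM flag, stream 1
print(parse_frame_header(frame))
```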
Multiplexing
HTTP/2 enables full request and response multiplexing over a single TCP connection through its Binary Framing Layer.
Within a single TCP connection, multiple logical pipelines called Streams are created, one for each HTTP request and response pair. Streams facilitate the bi-directional flow of messages between the client and the server.
Once the Stream is created, every HTTP request or response (a Message) is broken down into multiple Frames and transported between the client and the server within that Stream. Because every Frame carries its stream identifier, Frames from different Streams can be interleaved on the wire, which enables multiple HTTP requests and responses to be multiplexed within a single TCP connection.
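The idea can be sketched with plain Python data structures: two messages are split into frames tagged with a stream identifier, interleaved on one "connection", and reassembled on the other side. The function names and the round-robin interleaving are assumptions for illustration; a real HTTP/2 implementation schedules frames according to flow control and priorities.

```python
from collections import defaultdict
from itertools import zip_longest

def split_into_frames(stream_id, message, frame_size):
    """Break one message into (stream_id, chunk) frames."""
    return [(stream_id, message[i:i + frame_size])
            for i in range(0, len(message), frame_size)]

def interleave(*frame_lists):
    """Send frames of several streams round-robin over one connection."""
    wire = []
    for group in zip_longest(*frame_lists):
        wire.extend(frame for frame in group if frame is not None)
    return wire

def reassemble(wire):
    """Group frames by stream identifier to rebuild each message."""
    messages = defaultdict(bytes)
    for stream_id, chunk in wire:
        messages[stream_id] += chunk
    return dict(messages)

# Client-initiated streams use odd identifiers in HTTP/2.
wire = interleave(
    split_into_frames(1, b"GET /index", 4),
    split_into_frames(3, b"GET /style.css", 4),
)
print(reassemble(wire))
```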
Header Compression
HTTP/1.1 transfers headers between the client and the server in plain-text format. Additionally, it repeats many commonly used headers in every request and response, which wastes bandwidth.
HTTP/2 addresses this problem with a header compression algorithm called HPACK.
HPACK maintains a dictionary (an indexed table of headers) at both the client and the server to avoid retransmitting redundant headers and to compress the headers sent over the wire.
HPACK replaces entire key-value pairs with lightweight encoded indexes into this table and transmits the index instead of the complete key-value pair, thus saving network bandwidth. For a detailed explanation of HPACK, refer to RFC 7541, which specifies it.
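The indexing idea can be illustrated with a deliberately simplified table. This is not the real HPACK wire format: it omits the static table, Huffman coding, size limits, and eviction, and the class name `ToyHeaderTable` is invented for this sketch. It only shows how both peers can keep identical tables so that a repeated header is sent as a small index.

```python
class ToyHeaderTable:
    """A toy stand-in for the table each HPACK peer maintains."""

    def __init__(self):
        self.entries = []

    def encode(self, headers):
        encoded = []
        for pair in headers:
            if pair in self.entries:
                # Already known to both peers: send only the index.
                encoded.append(("index", self.entries.index(pair)))
            else:
                # First occurrence: send the literal and remember it.
                self.entries.append(pair)
                encoded.append(("literal", pair))
        return encoded

    def decode(self, encoded):
        headers = []
        for kind, value in encoded:
            if kind == "index":
                headers.append(self.entries[value])
            else:
                self.entries.append(value)
                headers.append(value)
        return headers

sender, receiver = ToyHeaderTable(), ToyHeaderTable()
headers = [(":method", "POST"), ("content-type", "application/grpc")]
print(sender.encode(headers))  # first request: literals
print(sender.encode(headers))  # second request: indexes only
```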
Server Push
HTTP/2 allows the server to push content to the client before the client requests it.
The server sends a PUSH_PROMISE frame, which contains only the HTTP headers and signals the server’s intent to push a resource to the client. The client can decide to decline the pushed streams. Once the PUSH_PROMISE frames have been sent, the server pushes the resources to the client in DATA frames. Thus, HTTP/2 enables a bi-directional flow of data between the client and the server, allowing the server to push data to the client.
gRPC Introduction
RPC (Remote Procedure Call) is a communication model in which a process can invoke a procedure in another process, typically on a different host, without dealing with the underlying network details.
The invocation of remote methods on the server resembles a local function call.
gRPC is an open-source, high-performance RPC framework for inter-process communication, developed by Google and built on top of HTTP/2. It uses Protocol Buffers, a platform-neutral and extensible mechanism for serializing and deserializing structured data, as the message interchange format.
In the following section, we will discuss the important terminologies and components of gRPC.
Interface Definition Language
Building a gRPC application starts with defining a Service Interface.
The Service Interface Definition dictates the remote methods exposed by the server along with the Request and Response contracts for each method.
The language in which the Service Interface Definition is written is called the Interface Definition Language (IDL). gRPC uses Protocol Buffers as the IDL.
syntax = "proto3";

// A service is a collection of remote methods.
service HealthCheckService {
  rpc Ping (PingRequest) returns (PingResponse) {}
}

message PingRequest {
  string name = 1;
}

message PingResponse {
  string message = 1;
}
Above is an example of an IDL that defines a Service HealthCheckService, which is essentially a collection of remote methods.
The service defines a single method Ping, which takes PingRequest as the Request and returns PingResponse as the Response.
PingRequest and PingResponse are called Messages. They are the data structures exchanged between the client and the server, and each contains a series of name-value pairs called fields.
Each field is associated with a unique number that identifies it in the binary format. In the above example, PingRequest includes a field name of type string with the unique number 1. PingResponse includes a field message of type string with the unique number 1.
Code Generator
Once the IDL is defined, the next step is to use a language-specific Protocol Buffer compiler to generate appropriate data structures for Messages in a specific programming language.
It provides a mechanism for serializing, deserializing, populating, and retrieving those data structures in that specific programming language.
For example, if we use the Protocol Buffer compiler for Java, PingRequest and PingResponse are converted into Java classes with suitable getters and setters, along with the mechanisms for serializing and deserializing them in Java.
Since the Service Interface Definition is an extension of Protocol Buffers, a special gRPC plugin generates the client-side and server-side code.
The server-side code is called the server skeleton which provides abstract functions for the remote methods defined in IDL which can be overridden with our custom business implementation.
The client-side code is called the client stub which provides abstractions for the same remote methods defined in IDL simplifying the invocation of remote methods implemented in the server.
Communication Patterns
gRPC provides four communication patterns:
- Unary RPC: The client sends a single Request to the remote server and waits for the Response from the server.
- Server Side Streaming RPC: The client sends a single Message to the server and the server returns a sequence of Messages.
- Client Side Streaming RPC: The client sends a sequence of Messages and the server returns a single Response.
- Bi-directional Streaming RPC: Both the client and the server can send a sequence of Messages in either direction, but the client has to initiate the Request.
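The shape of the four patterns can be pictured with plain Python functions and generators, where an iterator stands in for a stream of Messages. This is only an analogy: none of these functions use the gRPC library, and their names are made up for this sketch.

```python
def unary(request):
    """One request in, one response out."""
    return f"pong: {request}"

def server_streaming(request):
    """One request in, a sequence of responses out."""
    for i in range(3):
        yield f"{request} #{i}"

def client_streaming(requests):
    """A sequence of requests in, one response out."""
    count = sum(1 for _ in requests)
    return f"received {count} messages"

def bidi_streaming(requests):
    """A sequence of requests in, a sequence of responses out."""
    for request in requests:
        yield f"echo: {request}"

print(unary("ping"))
print(list(server_streaming("tick")))
print(client_streaming(iter(["a", "b", "c"])))
print(list(bidi_streaming(iter(["x", "y"]))))
```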
Interceptors
gRPC lets developers intercept the execution of remote methods and inject common logic such as logging and authentication using Interceptors.
Interceptors can be applied on the client side or the server side, and they can run logic before the invocation, after it, or both.
Metadata is auxiliary information about a particular gRPC call, structured as a list of key-value pairs. It is heavily used inside interceptors.
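Conceptually, an interceptor wraps a method handler so that shared logic runs around the call. The sketch below expresses that idea with a plain Python closure; it does not use the interceptor classes of any gRPC library, and `logging_interceptor`, `ping`, and the `x-request-id` key are hypothetical names chosen for illustration.

```python
def logging_interceptor(log):
    """Return a wrapper that records activity before and after a handler runs."""
    def wrap(handler):
        def intercepted(request, metadata):
            # Logic that runs before the invocation, with access to metadata.
            log.append(f"before: metadata={dict(metadata)}")
            response = handler(request, metadata)
            # Logic that runs after the invocation.
            log.append(f"after: response={response!r}")
            return response
        return intercepted
    return wrap

def ping(request, metadata):
    """A stand-in for the business logic of a remote method."""
    return f"pong {request}"

log = []
handler = logging_interceptor(log)(ping)
print(handler("alice", [("x-request-id", "42")]))
print(log)
```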
gRPC Internals
The previous section discussed the basics of gRPC and commonly used terminologies. This section covers the various stages involved in remote communication between a client and a server.
Client Stub Invocation
The gRPC plugin bundled with the language-specific Protocol Buffer compiler takes an IDL and generates language-specific client abstractions for the remote methods defined in the IDL. These abstractions are called client stubs and simplify remote method invocations on the server.
The client application invokes the remote method through the client stub resembling a local function call.
Message Encoding
Once the client hands over the control to the client stub, the client stub is responsible for converting the Message into a byte array. This conversion is called encoding/marshaling.
Each remote method in the service definition involves two data structures: a Request Message sent from the client to the server and a Response Message sent from the server to the client.
Each Message contains a series of name-value pairs called fields with each field having a unique number to identify them in the binary format.
Composition of Message
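A Message is encoded as a series of tag and value pairs, one pair for each populated field. The encoding itself carries no end-of-message marker (0 is not a valid field number); the parser learns where the message ends from the length supplied by the enclosing frame.
Each tag is composed of two fields: Field Index and Wire Type.
- The Field Index is the unique identifier assigned to each field in the message in the IDL.
- The Wire Type is an integer that tells the parser how the value is laid out on the wire and therefore how many bytes to read. It is derived from the field data type defined in the IDL, and several field data types share the same Wire Type.
The below diagram explains the correlation between the field data type and the wire type. In summary:
- Wire type 0 (varint): int32, int64, uint32, uint64, sint32, sint64, bool, enum
- Wire type 1 (64-bit): fixed64, sfixed64, double
- Wire type 2 (length-delimited): string, bytes, embedded messages, packed repeated fields
- Wire types 3 and 4 (start group and end group): deprecated
- Wire type 5 (32-bit): fixed32, sfixed32, float
Once the Field Index and Wire Type are known, the tag value is derived using the following formula:
TagValue = (FieldIndex << 3) | WireType
The formula indicates that the Wire Type occupies the last 3 bits of the tag and the Field Index occupies all the remaining higher-order bits.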
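As a small worked example of the tag formula, the sketch below computes the tag for a `string` field with field index 1 (as in `PingRequest`) and splits a tag back into its two parts. The helper names are assumptions for this illustration.

```python
WIRE_TYPE_VARINT = 0
WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, sub-messages

def make_tag(field_index, wire_type):
    """TagValue = (FieldIndex << 3) | WireType."""
    return (field_index << 3) | wire_type

def split_tag(tag):
    """Recover (field_index, wire_type) from a tag value."""
    return tag >> 3, tag & 0b111

# string name = 1;  ->  (1 << 3) | 2 = 10 = 0x0A
tag = make_tag(1, WIRE_TYPE_LEN)
print(hex(tag), split_tag(tag))
```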
Encoding algorithms
Once the tag value is determined for every field in the Message, the next step is to encode the value for each field in the Message.
Protocol Buffers support different encoding mechanisms for different field data types. The field types are categorized into different groups and each group uses a different encoding mechanism.
Varints
Varints, also called variable-length integers, are an encoding scheme that uses a variable number of bytes to encode integers. The number of bytes allocated for each value is not fixed; it depends on the magnitude of the integer value itself.
In the varint encoding scheme, the Most Significant Bit (MSB) of each byte acts as a continuation flag: if it is set, more bytes follow, and the parser keeps reading. The remaining 7 bits of each byte carry the value, with the least significant group of 7 bits sent first. Negative int32 and int64 values are encoded from their 64-bit 2’s complement representation.
The data types that use varints are int32, int64, uint32, uint64, bool, and enum.
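A minimal sketch of varint encoding and decoding follows. It assumes the input fits in 64 bits, and the function names are chosen for this example rather than taken from any library.

```python
def encode_varint(value):
    """Encode an integer as a varint (7 payload bits per byte, low group first)."""
    if value < 0:
        value += 1 << 64  # negative numbers use their 64-bit two's complement
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # MSB set: more bytes follow
        else:
            out.append(byte)         # MSB clear: this is the last byte
            return bytes(out)

def decode_varint(data, pos=0):
    """Decode a varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

print(encode_varint(1).hex())      # one byte
print(encode_varint(300).hex())    # two bytes
print(len(encode_varint(-1)))      # negative values always need ten bytes
```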
ZigZag encoding
The signed integer types sint32 and sint64, which represent both positive and negative integers, use the ZigZag encoding mechanism.
In the ZigZag encoding scheme, signed integers are converted to unsigned integers by zigzagging back and forth between negative and positive values.
Negative original values are mapped to odd positive values and non-negative original values are mapped to even values, as shown in the above diagram. For example, 0 maps to 0, -1 maps to 1, 1 maps to 2, and -2 maps to 3.
After this mapping, the result is a non-negative number irrespective of the sign of the original value, and this number is then encoded with the varint encoding scheme.
sint32 and sint64 are recommended for fields that often hold negative numbers, because the plain varint encoding of a negative number always takes the maximum number of bytes, far more than a positive number of similar magnitude.
ZigZag encoding converts small negative values into small positive values, thereby reducing the number of bytes needed.
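The mapping can be written in a couple of lines. The sketch below shows the 64-bit variant used for `sint64`, assuming the input lies within the signed 64-bit range; the function names are illustrative.

```python
def zigzag_encode(n):
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF

def zigzag_decode(z):
    """Invert zigzag_encode."""
    return (z >> 1) ^ -(z & 1)

for n in (0, -1, 1, -2, 2):
    print(n, "->", zigzag_encode(n))
```

Because -1 becomes 1, it fits in a single varint byte as a `sint64`, whereas the same value stored as an `int64` would take ten bytes.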
Non-varint encoding
The non-varint encoding mechanism allocates a fixed number of bytes for the value, regardless of its magnitude.
Two groups of data types use non-varint encoding:
- 64-bit data types such as fixed64, sfixed64, and double.
- 32-bit data types such as fixed32, sfixed32, and float.
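Fixed-width values are written in little-endian byte order, which Python's `struct` module can reproduce directly. The mapping table and the helper name `encode_fixed` below are assumptions made for this sketch.

```python
import struct

# Protocol Buffers field type -> struct format ("<" means little-endian).
FIXED_FORMATS = {
    "fixed32": "<I", "sfixed32": "<i", "float": "<f",   # 4 bytes each
    "fixed64": "<Q", "sfixed64": "<q", "double": "<d",  # 8 bytes each
}

def encode_fixed(field_type, value):
    """Encode a value using the fixed-width layout of the given field type."""
    return struct.pack(FIXED_FORMATS[field_type], value)

print(encode_fixed("fixed32", 1).hex())
print(encode_fixed("sfixed32", -1).hex())
print(encode_fixed("double", 1.0).hex())
```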
UTF-8 encoding
String data types are encoded using UTF-8 character encoding. The encoded value of the string includes two components:
- The length of the UTF-8 encoded string in bytes, itself encoded as a varint.
- The UTF-8 encoded string value.
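Putting the tag, the length, and the UTF-8 bytes together, a string field such as `name = 1` in `PingRequest` could be encoded as sketched below. The helper names are illustrative, and the varint helper only handles non-negative values.

```python
def encode_varint(value):
    """Varint-encode a non-negative integer."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def encode_string_field(field_index, text):
    """Encode tag + length + UTF-8 bytes for a string field (wire type 2)."""
    payload = text.encode("utf-8")
    tag = (field_index << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload

print(encode_string_field(1, "gRPC").hex(" "))  # 0a 04 67 52 50 43
```

Note that the length counts bytes, not characters: a single non-ASCII character may occupy several bytes.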
Sub-message
Sub-messages are messages nested inside other messages. Like strings, they use wire type 2 and consist of two components: the length of the encoded sub-message in bytes, followed by the encoding of its constituent fields.
Repeated fields
Repeated fields of scalar numeric types can also use wire type 2 in their packed form: the tag is followed by the total length in bytes of the list and then the encoded elements one after another, following a pattern similar to the sub-message. Repeated strings and sub-messages are instead written as one tag and value pair per element.
Similarly, Maps are encoded like a repeated field in which every entry is a small message with two fields, one for the key and one for the value.
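The sketch below encodes a packed repeated field and a sub-message, both of which share the same length-delimited layout. The field numbers and values are arbitrary examples, the helper names are assumptions, and the varint helper only handles non-negative values.

```python
def encode_varint(value):
    """Varint-encode a non-negative integer."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def encode_len_field(field_index, payload):
    """Tag (wire type 2) + payload length in bytes + payload."""
    return encode_varint((field_index << 3) | 2) + encode_varint(len(payload)) + payload

def encode_packed_ints(field_index, values):
    """Packed repeated field: all elements are concatenated into one payload."""
    return encode_len_field(field_index, b"".join(encode_varint(v) for v in values))

# Packed repeated int32 with field index 6.
print(encode_packed_ints(6, [3, 270, 86942]).hex())

# Sub-message in field 3 whose only field (index 1, varint) holds 150.
inner = encode_varint((1 << 3) | 0) + encode_varint(150)
print(encode_len_field(3, inner).hex())
```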
Packaging Message
Once the field tags are derived and the values for each field are encoded, the tag and value of each field are concatenated to form the byte stream for the entire Message. The byte stream carries no terminator of its own.
Once the encoded message is ready, the client stub computes the size of the Message and adds it before the Message. This technique is called Length-Prefix framing, and it is what tells the receiver where the Message ends.
- The first byte indicates whether the message is compressed or not. A value of 1 indicates that the message is compressed, and the compression algorithm is declared in the Message-Encoding header (sent on the wire as grpc-encoding).
- The next 4 bytes hold the size of the message as a big-endian unsigned integer.
- The last part is the encoded message itself.
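A minimal sketch of this framing: one byte for the compressed flag, four bytes for the big-endian length, then the message bytes. The function name is an assumption; the layout is the one described in the list above.

```python
import struct

def frame_message(message, compressed=False):
    """Prefix an encoded message with the 5-byte gRPC length-prefix header."""
    # ">BI" = big-endian: 1-byte compressed flag + 4-byte unsigned length.
    return struct.pack(">BI", int(compressed), len(message)) + message

# PingRequest{name: "gRPC"} encodes to 0a 04 67 52 50 43 (6 bytes).
encoded_request = b"\x0a\x04gRPC"
print(frame_message(encoded_request).hex(" "))
```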
HTTP/2 Request Builder
Once the encoded Message is ready, the next step is to construct an HTTP/2 Request.
The client stub creates an HTTP POST request with the encoded Message. All requests in gRPC are HTTP POST requests with the Content-Type application/grpc.
A Request message consists of three major components: Request Headers, Length-prefixed Messages, and End of Stream.
- The client stub initiates the remote call to the server by sending the Request Headers.
- Once the headers are sent, the length-prefixed messages are sent.
- Once all the messages are sent, the End of Stream flag is sent to notify the server of the end of the request.
The above diagram depicts the header composition of a gRPC Request:
- By default, all gRPC methods are sent as HTTP POST requests to the server.
- The scheme can be http or https depending on whether TLS is enabled.
- The path header is built from the ServiceName and the MethodName, each preceded by /. In this case, it is /HealthCheckService/Ping.
- Authority indicates the hostname of the remote server.
- grpc-timeout defines the request timeout sent by the client, which the server uses for setting the execution timeout on the server side. If this timeout is not set, it is assumed to be an infinite timeout.
- Content-Type for gRPC requests is application/grpc, and the grpc-encoding scheme in this example is gzip.
Header names beginning with : are reserved HTTP/2 pseudo-headers, and HTTP/2 mandates that they appear before all other headers.
There are two broad categories of headers in a gRPC request:
- Call definition headers: They describe the call itself and include the HTTP/2 pseudo-headers that begin with :.
- Custom metadata: An arbitrary set of key-value pairs defined by the application (L7) layer. They should appear after the call definition headers.
This is how the request headers are constructed for a gRPC request.
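The header list for the Ping call could be assembled as in the sketch below. The function name, the authority `localhost:50051`, the timeout value, and the `x-request-id` metadata key are assumptions for illustration; the `te: trailers` header is not discussed above but is part of the call definition in the gRPC over HTTP/2 protocol.

```python
def build_request_headers(service, method, authority, tls=True,
                          timeout=None, encoding=None, metadata=()):
    """Return request headers in order: pseudo-headers, gRPC headers, metadata."""
    headers = [
        (":method", "POST"),
        (":scheme", "https" if tls else "http"),
        (":path", f"/{service}/{method}"),
        (":authority", authority),
        ("te", "trailers"),
        ("content-type", "application/grpc"),
    ]
    if timeout is not None:
        headers.append(("grpc-timeout", timeout))  # e.g. "1S" means one second
    if encoding is not None:
        headers.append(("grpc-encoding", encoding))
    headers.extend(metadata)  # custom metadata comes after call definition headers
    return headers

for name, value in build_request_headers(
        "HealthCheckService", "Ping", "localhost:50051",
        timeout="1S", encoding="gzip", metadata=[("x-request-id", "42")]):
    print(f"{name}: {value}")
```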
Data Transport
Once the request headers are constructed, the client initiates the remote communication by sending the headers to the server. The request headers are sent in a HEADERS frame of HTTP/2.
Once the request headers are sent, the client stub sends the length-prefixed encoded message in DATA frames. If the length-prefixed message does not fit into one DATA frame, it spans multiple DATA frames; the length prefix appears once per message, not once per frame.
When all the DATA frames have been sent over the wire, the client sets the END_STREAM flag on the last DATA frame to indicate the end of the request.
The TCP connection itself must be established with a 3-way handshake before any frames can be exchanged. The frames then move down from the L4 to the L1 layer of the OSI model before being transmitted over the wire.
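The splitting of one length-prefixed message across DATA frames can be sketched as follows. Frames are modelled as dictionaries rather than real HTTP/2 bytes, and the function name is made up; 16384 bytes is the default maximum frame payload size in HTTP/2.

```python
def to_data_frames(framed_message, max_frame_size=16384):
    """Split a length-prefixed message into DATA frames; flag the last one."""
    chunks = [framed_message[i:i + max_frame_size]
              for i in range(0, len(framed_message), max_frame_size)] or [b""]
    return [
        {"type": "DATA", "end_stream": index == len(chunks) - 1, "payload": chunk}
        for index, chunk in enumerate(chunks)
    ]

# A tiny frame size is used here only to force the message to span frames.
framed = b"\x00\x00\x00\x00\x06\x0a\x04gRPC"
for frame in to_data_frames(framed, max_frame_size=4):
    print(frame)
```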
Server Stub Invocation
The gRPC plugin bundled with the language-specific Protocol Buffer compiler takes an IDL and generates a language-specific server skeleton for the remote methods defined in the IDL.
The server skeleton includes abstract methods that are overridden with the custom business logic.
Once the request message is received on the server, the server parses the path header to determine the Service and Method that should be invoked.
In this case, the Service is HealthCheckService and the remote Method is Ping. The server hands over the request to the appropriate Service and remote Method.
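The routing step can be sketched as a lookup keyed by the two parts of the path. The registry dictionary and the handler below are hypothetical stand-ins for the generated server skeleton, not code that a gRPC library produces.

```python
def parse_path(path):
    """Split '/ServiceName/MethodName' into (service, method)."""
    parts = path.split("/")
    if len(parts) != 3 or parts[0] != "" or not parts[1] or not parts[2]:
        raise ValueError(f"malformed gRPC path: {path!r}")
    return parts[1], parts[2]

def ping(request):
    """Hypothetical business logic for the Ping method."""
    return f"pong {request}"

# Hypothetical registry mapping (service, method) to a handler.
HANDLERS = {("HealthCheckService", "Ping"): ping}

def dispatch(path, request):
    handler = HANDLERS.get(parse_path(path))
    if handler is None:
        raise LookupError(f"no handler registered for {path}")
    return handler(request)

print(dispatch("/HealthCheckService/Ping", "alice"))
```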
Message Decoding
Once the Service and remote Method are identified by the server, the server converts the request from its byte array format into a Message for execution. This process is called decoding/unmarshalling.
In our example, the byte array has to be converted into a PingRequest for executing the Ping method.
The initial step in decoding is examining the compressed flag in the message. If the flag is set to 1, the Message-Encoding header (grpc-encoding) is used to determine the algorithm for decompressing the message. Otherwise, it is treated as an uncompressed message.
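The first decoding step can be sketched as the inverse of length-prefix framing. The function name and the restriction to gzip are assumptions made for this example; a real server supports whichever codecs it has registered.

```python
import gzip
import struct

def read_framed_message(data, grpc_encoding="identity"):
    """Strip the 5-byte prefix and decompress the message if the flag is set."""
    if len(data) < 5:
        raise ValueError("incomplete length prefix")
    compressed_flag, length = struct.unpack(">BI", data[:5])
    message = data[5:5 + length]
    if len(message) != length:
        raise ValueError("message shorter than its declared length")
    if compressed_flag == 1:
        # The algorithm comes from the grpc-encoding header, not from the frame.
        if grpc_encoding != "gzip":
            raise ValueError(f"unsupported encoding: {grpc_encoding}")
        message = gzip.decompress(message)
    return message

print(read_framed_message(b"\x00\x00\x00\x00\x06\x0a\x04gRPC"))
```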
Next, the length of the message is used to parse the actual content of the message. The actual message is a concatenated sequence of tags and values for each field in the message defined in the IDL.
The server skeleton includes a parser that reads the sequence of tags and values for each field and constructs the actual Message.
Following is the sequence of actions that are performed to construct the actual Message:
Tag Parsing
The first step in parsing the message is to parse the tag to determine the wire type and unique field identifier.
- Since each tag is a varint, the parser checks whether the MSB of the first byte is set to 0 or 1.
- If the bit is set to 0, the tag fits in a single byte, and the parser reads its last 3 bits to determine the wire type.
- If the bit is set to 1, the parser reads the subsequent bytes until it finds a byte whose MSB is set to 0, which is the last byte of the tag. It then decodes the varint and reads the last 3 bits of the decoded value to determine the wire type.
- Once the wire type is determined, the remaining bits of the tag give the unique field identifier, using the formula described above.
Value parsing
Once the wire type is determined, the parser knows how the value of the field is laid out. The wire type specifies the decoding algorithm to use for unmarshalling the value and how many bytes to read past the tag for the actual value.
The unique field identifier is used to map the value in the Request to the field defined in the IDL.
The parser repeats this process of parsing a tag and its corresponding value until it reaches the end of the message.
Once it reaches the end of the message, the parsed values are converted into the appropriate data structure in that programming language. In our example, a PingRequest will be constructed from the byte array.
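The tag and value parsing loop can be sketched as follows. The parser returns raw `(field_index, wire_type, value)` triples and leaves the mapping onto typed fields (which needs the IDL) out of scope; the function names are assumptions, and deprecated group wire types are rejected.

```python
def read_varint(data, pos):
    """Decode a varint at pos; return (value, next_position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def parse_message(data):
    """Walk the tag/value pairs of an encoded message until the bytes run out."""
    fields, pos = [], 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_index, wire_type = tag >> 3, tag & 0b111
        if wire_type == 0:    # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_index, wire_type, value))
    return fields

# PingRequest{name: "gRPC"}: field 1, wire type 2, UTF-8 bytes.
print(parse_message(b"\x0a\x04gRPC"))
```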
Remote Method Invocation
Once the request message is decoded to an appropriate data structure in the programming language, the remote method performs business logic and generates a response message.
The response message is encoded using the same technique described above for the request message converting the response message into a byte array.
The response message includes three components: Response Headers, Length-Prefixed Messages, and Trailers.
Response Headers are sent as a HEADERS frame, which includes the HTTP status code, content-type, and grpc-encoding.
Once the server sends the Response Headers, the length-prefixed messages are sent as HTTP/2 DATA frames. If a length-prefixed message does not fit into one DATA frame, it spans multiple DATA frames.
Finally, trailers are sent as a HEADERS frame to indicate the end of the response. The end of the response stream is indicated by setting the END_STREAM flag on the HEADERS frame that carries the trailers. The trailers include grpc-status and grpc-message. The list of grpc-status codes can be found in the gRPC documentation.
This completes a simple request-response cycle in gRPC communication.