As we know, in gRPC programming we usually write a gRPC client that wraps the protobuf-generated code, call the grpc.Dial() function to create a grpc.ClientConn instance, and pass that instance to the gRPC client so it can communicate with the upstream service.

I was wondering whether I could reuse one grpc.ClientConn instance across multiple gRPC client instances, and thus across multiple concurrent calls to the same upstream service. gRPC didn’t disappoint me: I found this ticket, which confirms that it supports working in this manner: https://github.com/grpc/grpc-go/issues/682.

However, the ticket doesn’t explain the deeper details, and I couldn’t find a satisfying answer by googling. So I decided to explore the implementation details myself.

The grpc.ClientConn is indeed a “super instance” that encapsulates many details. The most important thing it encapsulates is the load balancer, which is the key to how a single grpc.ClientConn instance supports concurrent calls.

So let’s look into the balancer.

Balancer

grpc.Dial() accepts a number of DialOptions, two of which are WithBalancer() and WithBalancerName(). WithBalancer() belongs to the v1 balancer interface and was used to pass in a self-implemented balancer. It was deprecated later; as I understand it, the thinking is that the gRPC package itself should provide standard implementations of the different balancing algorithms. So we usually don’t need to implement balancers ourselves anymore and can just use WithBalancerName() to choose the appropriate one.

I read in the gRPC source code that `WithBalancerName()` will also be deprecated soon and replaced by `WithDefaultServiceConfig()`, but I won't cover `WithDefaultServiceConfig()` here because I haven't figured out how it works yet.

The gRPC package now provides several balancer implementations; you can find them in the balancer/ subdirectory: https://godoc.org/google.golang.org/grpc#pkg-subdirectories.

Each balancer in that subdirectory contains an init() function that registers it in a global map in the balancer sub-package at initialization time, so that WithBalancerName() can find it by name later in the program.
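The registration pattern described above can be sketched with the standard library alone. This is a simplified model, not gRPC's actual code: the `Builder` interface, the `registry` map, and the `Register` and `Get` helpers are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Builder is a hypothetical, simplified stand-in for a balancer builder.
type Builder interface {
	Name() string
}

// registry plays the role of the global map in the balancer sub-package.
var registry = map[string]Builder{}

// Register stores a builder under its name. It is meant to be called
// from init(), before main() starts, so no locking is used here.
func Register(b Builder) { registry[b.Name()] = b }

// Get looks a builder up by name, the way a dial option would do later.
func Get(name string) (Builder, bool) {
	b, ok := registry[name]
	return b, ok
}

// roundRobinBuilder is a placeholder balancer used only for this sketch.
type roundRobinBuilder struct{}

func (roundRobinBuilder) Name() string { return "round_robin" }

// In a real layout each balancer lives in its own package, and importing
// that package is what triggers its init() function.
func init() { Register(roundRobinBuilder{}) }

func main() {
	if b, ok := Get("round_robin"); ok {
		fmt.Println("found balancer:", b.Name())
	}
	_, ok := Get("no_such_balancer")
	fmt.Println("unknown balancer found:", ok)
}
```

The point of the pattern is that the lookup side only needs a name string: a balancer becomes selectable simply because its package was imported somewhere in the program.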

Now there come two questions:

1. What if I don't pass the `WithBalancerName()` option when calling `grpc.Dial()`? Which balancer will be used?
This may not look worth discussing, since we already have an answer in mind: there must be a default one if we don't pass one explicitly. But I list the question here because this is the case at Grab.
2. Now that we have the balancer, where do the upstream service addresses come from?

First question: if we don’t pass WithBalancerName(), a default balancer called PickFirstBalancer is used. Maybe because it has existed in the package since early on, its source file is not in the balancer/ subdirectory but in the grpc package’s root directory. This balancer behaves just as its name indicates: it always gives the caller the first usable address, i.e. the first one it manages to connect to.

It may seem strange that we need such a balancer, since it doesn’t do any balancing at all. In practice it is useful when the upstream service has only one address, or when the upstream service has a DNS policy under which callers should only call the first resolved IP at any given time.

At Grab, we use this balancer because we have an upper-level load balancer (integrated with our self-implemented connection pool), and we really do pass only one target address when calling grpc.Dial().

Our self-implemented connection pool, in turn, calls `grpc.Dial()` multiple times and so creates multiple `grpc.ClientConn` instances for a single target address.

Now comes the second question. In the gRPC package the balancer doesn’t accept an address list directly; instead it depends on a resolver to provide the target addresses. The resolver is also responsible for notifying the load balancer when the target addresses change. I don’t want to dig deeply into the balancer-related code, because that part of gRPC is really hard to follow. But you should still be able to grasp the core ideas after you finish reading this article.

Resolver

A resolver translates the target name (a DNS name or any other form of target name) into addresses the balancer can understand. Just like the balancer, the resolver has its own subdirectory in the gRPC package, called resolver/, which contains several official resolvers so you don’t need to implement one yourself.

grpc.Dial() is smart enough to choose the appropriate resolver for you based on the target string you pass, e.g. if the target string is in the format “dns:///” then the dns resolver in the resolver/ sub-package is used.

There’s also a default resolver, used when you don’t specify one and the target string doesn’t match any predefined format when calling grpc.Dial(). It’s called the “Passthrough” resolver; “Passthrough” means it just passes the address unchanged to the load balancer. This is also the case at Grab, where we simply pass the upstream host’s IP address to grpc.Dial().
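The scheme-based selection with a passthrough fallback can be sketched roughly as follows. This is a simplified model under stated assumptions, not gRPC's actual target parser: the `resolvers` map and the `chooseResolver` function are hypothetical, and the target is assumed to have the shape "scheme://authority/endpoint".

```go
package main

import (
	"fmt"
	"strings"
)

// resolvers is a hypothetical registry of known schemes.
var resolvers = map[string]bool{"dns": true, "passthrough": true}

// defaultScheme is used when the target names no known scheme.
const defaultScheme = "passthrough"

// chooseResolver returns the scheme to use and the endpoint handed to it.
// Targets without a known "scheme://" prefix go to the default resolver
// unchanged.
func chooseResolver(target string) (scheme, endpoint string) {
	if i := strings.Index(target, "://"); i > 0 {
		s := target[:i]
		if resolvers[s] {
			rest := target[i+len("://"):]
			// Skip the optional authority in "scheme://authority/endpoint".
			if j := strings.Index(rest, "/"); j >= 0 {
				rest = rest[j+1:]
			}
			return s, rest
		}
	}
	return defaultScheme, target
}

func main() {
	for _, t := range []string{"dns:///example.com:443", "10.0.0.5:8080"} {
		scheme, endpoint := chooseResolver(t)
		fmt.Printf("%-24s -> resolver=%s endpoint=%s\n", t, scheme, endpoint)
	}
}
```

A bare IP address and port, like the one we pass at Grab, contains no scheme at all, so in this model it falls straight through to the default branch.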

I don’t mean to show the detailed code of resolver/ here, since it is as hard to follow as balancer/. I’ll just show you the core procedure.

Once the resolver is ready (in the “passthrough” case it is ready as soon as it is created, since it has no resolving work to do), it calls the grpc.ClientConn super instance’s updateResolverState method with the addresses, which in turn passes these addresses to the load balancer’s updateResolverState method. Note that if you don’t specify a balancer explicitly in grpc.Dial() and the load balancer is still nil, the default load balancer “PickFirstBalancer” is created first at this step, and then the addresses are passed to the balancer’s updateResolverState.

The balancer’s updateResolverState then sends the addresses to a channel called resolverUpdateCh. A goroutine called the “balancer watcher”, which was created at the same time as the balancer, reads from this channel and updates the balancer’s address set. This is how dynamic address updates and dynamic load balancing are implemented.
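The channel-plus-watcher arrangement can be modeled in a few lines of standard-library Go. This is only a sketch of the pattern: the `balancerWrapper` type, its fields other than the `resolverUpdateCh` name, and the `close` and `currentAddrs` helpers are hypothetical, and the real gRPC code differs in its details.

```go
package main

import (
	"fmt"
	"sync"
)

// balancerWrapper is a hypothetical, simplified model of the object that
// sits between the resolver and the balancer.
type balancerWrapper struct {
	resolverUpdateCh chan []string // address updates from the resolver
	done             chan struct{} // closed when the watcher exits

	mu    sync.Mutex
	addrs []string // the balancer's current address set
}

// newBalancerWrapper creates the wrapper and starts its watcher goroutine
// at the same time, mirroring the description above.
func newBalancerWrapper() *balancerWrapper {
	w := &balancerWrapper{
		resolverUpdateCh: make(chan []string),
		done:             make(chan struct{}),
	}
	go w.watcher()
	return w
}

// watcher applies every update it reads until the channel is closed.
func (w *balancerWrapper) watcher() {
	defer close(w.done)
	for addrs := range w.resolverUpdateCh {
		w.mu.Lock()
		w.addrs = addrs
		w.mu.Unlock()
	}
}

// updateResolverState is what the resolver side calls with new addresses.
func (w *balancerWrapper) updateResolverState(addrs []string) {
	w.resolverUpdateCh <- addrs
}

// close stops the watcher and waits until all updates have been applied.
func (w *balancerWrapper) close() {
	close(w.resolverUpdateCh)
	<-w.done
}

// currentAddrs returns a copy of the current address set.
func (w *balancerWrapper) currentAddrs() []string {
	w.mu.Lock()
	defer w.mu.Unlock()
	return append([]string(nil), w.addrs...)
}

func main() {
	w := newBalancerWrapper()
	w.updateResolverState([]string{"10.0.0.1:8080"})
	w.updateResolverState([]string{"10.0.0.1:8080", "10.0.0.2:8080"})
	w.close()
	fmt.Println(w.currentAddrs())
}
```

Because updates arrive over a channel, the resolver never touches the balancer's state directly; only the watcher goroutine does, which keeps the address set consistent while calls are in flight.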

The “balancer watcher” goroutine then calls the balancer’s HandleResolvedAddrs, which is where the connection to the upstream is actually created. Here the grpc.ClientConn.newAddrConn() method is called with the addresses. The returned value is an addrConn instance, but the balancer refers to it as a “SubConn”. The “SubConn” belongs to the balancer, but you can also regard it as belonging to the grpc.ClientConn super instance.

The next step is to call the addrConn.connect() method, which is what really creates the low-level connection to the target. The result is a gRPC transport client, which is essentially an HTTP/2 client.

Reference

1. https://github.com/grpc/grpc/blob/master/doc/load-balancing.md