
Module Rpc_proxy


module Rpc_proxy: sig .. end
RPC proxies


The Rpc_proxy module provides an improved reliability layer on top of Rpc_client. This layer especially features:
  • automatic connection management: TCP connections are started and terminated as needed
  • multiple connections can be held in parallel to a remote server to increase concurrency on the server
  • failover to other servers when the original servers time out
  • support for an initial ping at connection establishment time to test the availability of the connection
  • retransmission of idempotent RPC calls
Proxies can only handle stream connections (TCP and Unix Domain). Also, the remote endpoints must already be specified by socket addresses (no portmapper or other indirect lookup methods).

The proxy functionality is implemented in two layers, the managed clients and the managed sets. The former layer can handle only one TCP connection (with reconnect), whereas the latter is able to manage a set of connections to the same service. Both layers can benefit from a reliability cache that knows which services had errors in the past.

See below for a tutorial.

There is also a blog article explaining RPC proxies: The next server, please!

module ReliabilityCache: sig .. end
module ManagedClient: sig .. end
module ManagedSet: sig .. end

The Rpc_proxy tutorial

Managed clients

A normal RPC client has a very limited lifecycle: It is created, then a connection is made to an RPC service, messages are exchanged, and finally the connection is terminated. After that the client becomes unusable. In short, it is a "use once" client.

In contrast to this, managed clients can be recycled. This is especially useful for dealing with socket errors, and connection terminations triggered by the RPC server.

How to use managed clients: For a normal RPC client the generator ocamlrpcgen creates all required glue code to easily start RPC calls. For example, if a file proto.x is taken as input for ocamlrpcgen, a piece of code doing a call could look like:

 
      let client =
        Proto_clnt.PROG.VERS.create_client connector protocol
      let result =
        Proto_clnt.PROG.VERS.procedure client argument
   

(Here, PROG, VERS, procedure are just placeholders for the name of the program, the version identifier, and the procedure name.)

For RPC proxies, however, this is slightly more complicated. ocamlrpcgen does not produce a managed client that is ready for use. Instead, only a functor is provided that can take the Rpc_proxy.ManagedClient module as input:

      module M = Proto_clnt.Make'PROG(Rpc_proxy.ManagedClient)

      let esys =
        Unixqueue.create_unix_event_system()
      let mclient_config =
        Rpc_proxy.ManagedClient.create_mclient_config
          ~programs:[ Proto_clnt.PROG.VERS._program ]
          ()
      let mclient =
        Rpc_proxy.ManagedClient.create_mclient mclient_config connector esys
      let result =
        M.VERS.procedure mclient argument
   

(The functor approach has been chosen because it gives the user more flexibility: the functor can also be applied to other implementations of improved clients than Rpc_proxy.ManagedClient.)

Note that esys is always explicit, even when the user only performs synchronous calls. In that case the user should create a new esys, pass it to mclient, and otherwise ignore it.

Now, how does the recycling feature work? The managed client can be in one of three states:

  • `Down: The client is not connected. This is the initial state, and the state after errors and terminated connections (no matter whether triggered by the client or by the server)
  • `Connecting: The client is busy (re)connecting (only used in some cases)
  • `Up sockaddr: The client is connected and has the socket address sockaddr
The state can be queried with Rpc_proxy.ManagedClient.mclient_state. When it is `Down, the next RPC call automatically reconnects to the service. Once the connection is established, the call is performed, i.e. the messages representing the call are exchanged. The state then remains `Up after the call.
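
For illustration, a small sketch that inspects the current state before issuing a call (the helper name describe_state is made up for this example; the state constructors are those listed above):

      (* Sketch: map the state of a managed client to a short description.
         The name describe_state is not part of the library. *)
      let describe_state mclient =
        match Rpc_proxy.ManagedClient.mclient_state mclient with
          | `Down -> "down (the next call will reconnect)"
          | `Connecting -> "connecting"
          | `Up _ -> "up"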

When the call stops because of an error, the error is reported to the user in the normal way, and the client is shut down, i.e. after an error the state is `Down. If the user decides to try the call again, the client automatically reconnects following the outlined rules. Note that managed clients never automatically retry calls by themselves.

When the TCP connection is regularly shut down (either by the server or by the client calling Rpc_proxy.ManagedClient.shut_down), the client state is changed to `Down at the next opportunity. In particular, a server-driven shutdown may not be detected before the next RPC call is tried on the connection. This may or may not lead to an error, depending on the exact timing. In any case, the connection is eventually established again.

Of course, managed clients must be shut down after use, because there is no other (automatic) way of recognizing that they are no longer used. Call Rpc_proxy.ManagedClient.shut_down for this.
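
A minimal sketch of a wrapper that guarantees the shutdown (the helper name with_mclient is made up for this example):

      (* Sketch: run f with the client and shut the client down afterwards,
         also when f raises an exception. with_mclient is not part of the
         library, just an illustration. *)
      let with_mclient mclient f =
        try
          let r = f mclient in
          Rpc_proxy.ManagedClient.shut_down mclient;
          r
        with e ->
          Rpc_proxy.ManagedClient.shut_down mclient;
          raise e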

Managed clients also have a few more features that can be enabled in mclient_config, especially the following (a configuration sketch follows the list):

  • Initial ping: This means that the TCP connection is tested before being used for user operations. The test is done by pinging the service once (via the RPC null procedure). This is recommended because some connectivity problems can only be detected when the TCP connection is actually used.
  • Idle timeout: The TCP connection is closed after it is idle for some period of time. "Idle" means here that nothing is being transmitted, and that no response from the server is expected. The connection is closed at the first opportunity. The user should be aware that this can only happen when the event loop for esys is running. Especially for synchronous calls this is typically not the case, so one would have to call Unixqueue.run esys now and then to create opportunities for detecting the idle timeout.
  • Reliability cache: The cache object counts errors, and can disable certain service endpoints if they only produce errors. This mostly makes sense when there are alternative endpoints, i.e. in the context of a managed set (see below).
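
For illustration, a sketch of a configuration enabling these features. The labeled arguments ~initial_ping and ~idle_timeout are assumptions made for this sketch and should be checked against the actual signature of Rpc_proxy.ManagedClient.create_mclient_config; ~rcache is shown in the reliability cache section below.

      (* Sketch only: ~initial_ping and ~idle_timeout are assumed argument
         names, check the create_mclient_config signature. *)
      let mclient_config =
        Rpc_proxy.ManagedClient.create_mclient_config
          ~programs:[ Proto_clnt.PROG.VERS._program ]
          ~initial_ping:true     (* ping the service before the first user call *)
          ~idle_timeout:30.0     (* close the connection after 30s of idle time *)
          ()
      (* For synchronous callers the idle timeout is only detected while the
         event loop runs, e.g. by calling Unixqueue.run esys now and then. *)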


Managed Sets

Managed sets are another layer on top of the managed clients. These sets are able to manage several connections where each is implemented as managed client. The connections can go to the same server endpoint in order to parallelize RPCs at the client side, or to several server endpoints that provide the same service. The latter can be used for client-driven load balancing, and for client-driven failover management of HA setups (HA = high availability).

For creating a managed set, the code looks like

      module M = Proto_clnt.Make'PROG(Rpc_proxy.ManagedClient)

      let esys =
        Unixqueue.create_unix_event_system()
      let mclient_config =
        Rpc_proxy.ManagedClient.create_mclient_config
          ~programs:[ Proto_clnt.PROG.VERS._program ]
          () in
      let mset_config =
        Rpc_proxy.ManagedSet.create_mset_config
          ~mclient_config
          () in
      let services =
        [| connector, n_connections; ... |] in
      let mset =
        Rpc_proxy.ManagedSet.create_mset 
          mset_config 
          services
          esys in
      let mclient, idx =
        Rpc_proxy.ManagedSet.mset_pick mset in
      let result =
        M.VERS.procedure mclient argument
    

The managed clients are internally created by the set - one should only pass in mclient_config so the set knows what kind of client is preferred. For the simple application of maintaining several connections to the same server, one would create the mset with a one-element service array:

       let services =
          [| connector, n_connections |]
    

where connector describes the server port, and n_connections is the maximum number of connections to create and maintain. The Rpc_proxy.ManagedSet.mset_pick function internally creates up to n_connections managed clients, and returns one of them. By default, it is not guaranteed that the returned client is idle (i.e. that no previous call is pending): if all connections are already busy, mset_pick starts returning busy connections (the least busy one first).

There are a number of options for modifying the default behavior (a configuration sketch follows the list):

  • One can enforce that only idle clients are returned by mset_pick. To do this, pass the argument ~mset_pending_calls_max:1 to Rpc_proxy.ManagedSet.create_mset_config. It can then happen that no client is idle, and mset_pick will raise Rpc_proxy.ManagedSet.Cluster_service_unavailable.
  • If the services array has more than one element, they are considered as equivalent service endpoints. mset_pick will pick one of the endpoints. There are two policies controlling the selection: With ~policy:`Balance_load the aim is to send roughly the same number of calls to all endpoints. With ~policy:`Failover the services are assigned precedences by their position in the array (i.e. the first service is used as long as possible, then the second service, etc.). The policy argument is again to be passed to Rpc_proxy.ManagedSet.create_mset_config.
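
For illustration, a sketch combining these options for a failover setup with a primary and a backup endpoint; mclient_config and esys are taken from the earlier code, and connector1/connector2 are placeholders for the two server ports:

      (* Sketch: fail over between two equivalent endpoints, and only hand
         out idle clients. connector1/connector2 are placeholders. *)
      let mset_config =
        Rpc_proxy.ManagedSet.create_mset_config
          ~mclient_config
          ~policy:`Failover             (* use the first service as long as possible *)
          ~mset_pending_calls_max:1     (* mset_pick only returns idle clients *)
          () in
      let services =
        [| connector1, 4;       (* primary endpoint, up to 4 connections *)
           connector2, 4 |] in  (* backup endpoint, up to 4 connections *)
      let mset =
        Rpc_proxy.ManagedSet.create_mset mset_config services esys
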
Of course, managed sets must be shut down after use, because there is no other (automatic) way of recognizing that they are no longer used. Call Rpc_proxy.ManagedSet.shut_down for this.

Caching reliability data

The cache allows one to disable certain hosts or ports when the error counter reaches a limit. The service is disabled for a limited time span. This is especially useful when there is an alternate port that can take over for the failing one, i.e. when the services array of a managed set has two or more elements.

There is a single global cache object, but one can also create specific cache objects. Generally, cache objects can be shared by many managed clients and managed sets. Sharing is useful because more error data is then available to the users of the services. If you do not want to use the global cache object, you can create your own, and configure it in mclient_config.

The global cache object is automatically used when nothing else is specified. By default, however, the global cache object is configured so that it does not have any effect. So we have to change this in order to enable the cache:

     let rcache_config =
       Rpc_proxy.ReliabilityCache.create_rcache_config
        ~policy:`Independent
        ~threshold:3
        () in
     Rpc_proxy.ReliabilityCache.set_global_rcache_config rcache_config
   

This means that 3 errors in sequence disable a service port. `Independent means that each port is handled independently in this respect.

The first time, the port is disabled for only one second. This duration is increased with each additional error until it reaches 64 seconds. These durations can be changed, of course.

As the impact of changing the global cache object is sometimes unpredictable, one can also create a private cache object (Rpc_proxy.ReliabilityCache.create_rcache). Another way is to derive a semi-private object from the global one. This means that the error counters are global, but how they are interpreted can be configured individually for each use. This would look like:

    let rcache_config =
      Rpc_proxy.ReliabilityCache.create_rcache_config
        ~policy:`Independent
        ~threshold:3
        () in
    let rcache =
      Rpc_proxy.ReliabilityCache.derive_rcache
        (Rpc_proxy.ReliabilityCache.global_rcache())
        rcache_config in
    ...
    let mclient_config =
      Rpc_proxy.ManagedClient.create_mclient_config
        ...
        ~rcache
        ...
        ()
    

Idempotent calls

The managed set layer provides limited support for automatically repeating failed idempotent RPC calls.

Instead of calling the RPC with

      let mclient, idx =
        Rpc_proxy.ManagedSet.mset_pick mset in
      let result =
        M.VERS.procedure mclient argument
    

one uses

      let result =
        Rpc_proxy.ManagedSet.idempotent_sync_call
          mset
          M.VERS.procedure'async
          argument
    

The effect is that Rpc_proxy.ManagedSet.idempotent_sync_call automatically repeats the call when an error occurs. The call is assumed to be idempotent, so it can be repeated without changing its meaning.

The call may be repeated several times. This is configured in the managed set mset (parameter mset_idempotent_max).
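
For illustration, a sketch of setting this limit when the set is created. It is assumed here that the parameter is passed as the labeled argument ~mset_idempotent_max of Rpc_proxy.ManagedSet.create_mset_config (analogous to ~mset_pending_calls_max above); check the actual signature.

      (* Sketch: allow up to 3 attempts per idempotent call. The labeled
         argument ~mset_idempotent_max is an assumption, see above. *)
      let mset_config =
        Rpc_proxy.ManagedSet.create_mset_config
          ~mclient_config
          ~mset_idempotent_max:3
          ()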

Note that one has to pass the asynchronous version (suffix 'async) of the RPC wrapper even when doing a synchronous call.

Also see the documentation for Rpc_proxy.ManagedSet.idempotent_async_call.
