Contents
Ocamlnet already includes Netplex adapters for Nethttpd (the HTTP daemon), and RPC servers. It is likely that more adapters for other network protocols will follow.
Netplex can bundle several network services into a single system of components. For example, you could have an RPC service that can be managed over a web interface provided by Nethttpd. Actually, Netplex focuses on such systems of interconnected components. RPC plays a special role in such systems because this is the network protocol the components use to talk to each other. It is also internally used by Netplex for its administrative tasks.
In the Netplex world the following words are preferred to refer to the parts of a Netplex system:
In order to create a web server, this main program and the following configuration file are sufficient. (You find an extended example in the "examples/nethttpd" directory of the Ocamlnet tarball.)
let main() =
(* Create a parser for the standard Netplex command-line arguments: *)
let (opt_list, cmdline_cfg) = Netplex_main.args() in
(* Parse the command-line arguments: *)
Arg.parse
opt_list
(fun s -> raise (Arg.Bad ("Don't know what to do with: " ^ s)))
"usage: netplex [options]";
(* Select multi-processing: *)
let parallelizer = Netplex_mp.mp() in
(* Start the Netplex system: *)
Netplex_main.startup
parallelizer
Netplex_log.logger_factories
Netplex_workload.workload_manager_factories
[ Nethttpd_plex.nethttpd_factory() ]
cmdline_cfg
;;
Sys.set_signal Sys.sigpipe Sys.Signal_ignore;
main();;
The configuration file:
netplex {
controller {
max_level = "debug"; (* Log level *)
logging {
type = "stderr"; (* Log to stderr *)
}
};
service {
name = "My HTTP file service";
protocol {
(* This section creates the socket *)
name = "http";
address {
type = "internet";
bind = "0.0.0.0:80"; (* Port 80 on all interfaces *)
};
};
processor {
(* This section specifies how to process data of the socket *)
type = "nethttpd";
host {
(* Think of Apache's "virtual hosts" *)
pref_name = "localhost";
pref_port = 80;
names = "*:0"; (* Which requests are matched here: all *)
uri {
path = "/";
service {
type = "file";
docroot = "/usr";
media_types_file = "/etc/mime.types";
enable_listings = true;
}
};
};
};
workload_manager {
type = "dynamic";
max_jobs_per_thread = 1; (* Everything else is senseless *)
min_free_jobs_capacity = 1;
max_free_jobs_capacity = 1;
max_threads = 20;
};
}
}
As you can see, the main program is extremely simple. Netplex includes support for command-line parsing, and the rest deals with the question which Netplex modules are made accessible for the configuration file. Note that detailed information about the config file is available in the Netplex Administration Guide.
Here, we have:
Netplex_mp.mp
which implements
multi-processing. (Btw, multi-processing is the preferred
parallelizing technique in Netplex.) Replace it with
Netplex_mt.mt
to get multi-threading.Netplex_log.logger_factories
are the list of all predefined
logging mechanisms. The configuration file can select one of these
mechanisms.Netplex_workload.workload_manager_factories
are the list of
all predefined worload management mechanisms. The configuration
file can select one of these mechanisms per service.Nethttpd_plex.nethttpd_factory
as the only
service processor.Here, we have:
controller
section sets the log level and the logging method.
The latter is done by naming one of the logger factories as
logging type
. If the factory needs more parameters to create
the logger, these can be set inside the logging
section.service
there is a name
(can be freely chosen), one
or several protocol
s, a processor
, and a workload_manager
.
The protocol
section declare which protocols are available and
to which sockets they are bound. Here, the "http" protocol (name can again
be freely chosen) is reachable over TCP port 80 on all network
interfaces. By having multiple address
sections, one can bind
the same protocol to multiple sockets.processor
section specifies the type
and optionally a
lot of parameters (which may be structured into several sections).
By setting type
to "nethttpd" we select the
Nethttpd_plex.nethttpd_factory
to create the processor
(because "nethttpd" is the default name for this factory).
This factory now interprets the other parameters of the processor
section. Here, a static HTTP server is defined that uses /usr
as document root.workload_manager
section says how to deal with
parallely arriving requests. The type
selects the dynamic
workload manager which is configured by the other parameters.
Roughly said, one container (i.e. process) is created in advance
for the next network connection ("pre-fork"), and the upper limit
of containers is 20.
If you start this program without any arguments, it will immediately
fail because it wants to open /etc/netplex.conf
- this is the
default name for the configuration file. Use -conf
to pass the
real name of the above file.
Netplex creates a directory for its internal processing, and this is
by default /tmp/netplex-<id>
(where id is a hex sequence derived from
the file location of the configuration). You can change this directory by
setting the socket_directory
parameter in the controller
section.
In this directory, you can find:
netplex.controller
which refers to the controller
component.netplex-admin
. You can use it to send control messages to Netplex
systems. For example,
netplex-admin -list
outputs the list of services. A more detailed list can be obtained with
netplex-admin -containers
The command
netplex-admin -shutdown
shuts the system (gracefully) down. It is also possible to broadcast messages to all components:
netplex-admin name arg1 arg2 ...
It is up to the components to interpret these messages.
Netplex uses a generalized pre-fork process model. Let me explain this model a bit, as it is important to know it in order to understand Netplex fully.
The most trivial form of a multi-process Unix server is the post-fork model. Although it is not used by Netplex, it is the model explained in many books, and it is what many people think a Unix server has to look like. Actually, the post-fork model has lots of limitations, and is not suited for high-performance servers.
In the post-fork model, the master process accepts new network connections in an endless loop, and whenever a new connection is established, a sub process (container process) is spawned that processes the network traffic. There is a serious logical limitation, and a performance limitation:
This is achieved by letting the containers themselves accept the new
connections instead of the master process. In the Unix process model
it is possible that server sockets are shared by several processes,
and every container is allowed to accept the next connection. However,
the containers should cooperate, and avoid that several containers call
Unix.accept
at the same time (as this leads to performance problems
when a container must be able to watch several ports for new
connections - a problem we do not discuss here). There are many ways to
organize
such cooperation, and for simplicity, Netplex implements this by
exchanging RPC messages with the master process, the controller.
Effectively, the controller has the task of scheduling which of the
containers accepts the next arriving network connection.
What actually happens is the following. We assume here that we have a number of idle container processes that could accept the next connection.
when_done
function must be called upon connection termination
is that a control message must be sent to the controller.
The server sockets are always created by the controller at program startup. This is a strict requirement because only this ensures that the created container processes share the same sockets.
The sockets are descibed in the protocol
section of the
configuration file. For an Internet socket this section looks like
protocol {
name = "<name>";
address {
type = "internet";
bind = "<address>:<port>";
};
};
The <name>
is only important when there are several protocols in order
to distinguish between them. The <address>
can be:
192.168.76.23
), and IPv6 addresses must be enclosed in brackets
(e.g. [fe80::250:56ff:fec0:1]
).0.0.0.0
to bind the socket to all IPv4
network interfaces, or the special IPv6 address [::0]
to bind it
to all IPv6 network interfaces.<port>
must be the port number or 0 to use an anonymous port.
For a local (Unix domain) socket, the protocol
section looks like
protocol {
name = "<name>";
address {
type = "local";
path = "<path>";
};
};
where the <path>
is the filename of the socket.
One can have several address
sections to create several sockets for the
same protocol.
For detailed documentation, see The protocol subsection.
A Netplex system consists of exactly the services that are enumerated in the config file. This means it is not sufficient to build in support for a service into the program, one must also activate it in the config file. This gives the end user of the program a lot of flexibility when running the system: By simply changing the config file one can enable or disable services. It is also possible to run the same program binary several times with different config files.
The services are implemented by processors, which are user-defined
objects that handle the network connection after it is accepted by
the component. The processor
section of the service selects the
processor by name, and optionally passes configuration parameters to
it:
processor {
type = "<name>";
... further parameters allowed ...
}
The mentioned name of the processor type is used to find the so-called
factory for the processor (an object with a create_processor
method).
All factories must be available at Netplex startup so the library
knows which factories exist when the config file is interpreted
(the factories are an argument of Netplex_main.startup
).
Processor objects are somewhat strange in so far as they exist both in the controller and in the container processes. In particular, these objects are created by the controller, and they are duplicated once for all container processes when these are actually created.
The processor objects (of type Netplex_types.processor
) consist of
a number of methods. We have already seen one of them, process
,
which is called in the container process when a new connection is
accepted. The other methods are called at other points of interest
(see Netplex_types.processor_hooks
for more details):
Methods called on the controller instance of the processor
post_add_hook
is immediately called after the addtion of the
service to the controller.post_rm_hook
is immediately called after the removal of the
service from the controller.pre_start_hook
is called just before the next container process
is spawned.post_finish_hook
is called after termination of the container.
post_start_hook
is called just after the container process
has been created, but now for the copy of the processor object
that lives in the container process. This is a very useful hook
method, because one can initialize the container process
(e.g. prepare database accesses etc.).pre_finish_hook
is called just before the container process
will (regularly) terminated.receive_message
is called when a message from another
container arrives.receive_admin_message
is called when a message from the
administrator arrives.shutdown
is called when the shutdown notification arrives.
The shutdown will lead to the termination of the process
when all network connections managed by Unixqueue are finished.
This method must terminate such connections if they have been
created in addition to those Netplex manages. The shutdown
notification is generated whenever a container needs to be
stopped, for example when it has been idle for too long and is
probably not needed right now (workload-induced shutdown), or
when the whole system is stopped (administrative shutdown).system_shutdown
is another shutdown-related notification.
It is only emitted if the whole Netplex system is going to be
stopped. In this case, all containers first receive the
system_shutdown
notifications, so they can prepare the real
shutdown that will happen soon. At the time the system_shutdown
is emitted, the whole system is still up and running, and so every
action is still possible. Only after all containers have finished
their system_shutdown
callbacks, the real shutdown begins, i.e.
shutdown
notifications are sent out.global_exception_handler
is called for exceptions falling through
to the container, and is the last chance to catch them.If multi-threading is used instead of multi-processing, there is only one instance of the processor that is used in the controller and all containers.
Using predefined processor factories like Nethttpd_plex.nethttpd_factory
is very easy. Fortunately, it is not very complicated to define a
custom adapter that makes an arbitrary network service available as
Netplex processor.
In principle, you must define a class for the type
Netplex_types.processor
and the corresponding factory implementing
the type Netplex_types.processor_factory
. To do the first,
simply inherit from Netplex_kit.processor_base
and override the
methods that should do something instead of nothing. For example,
to define a service that outputs the line "Hello world" on the
TCP connection, define:
class hello_world_processor : processor =
let empty_hooks = new Netplex_kit.empty_processor_hooks() in
object(self)
inherit Netplex_kit.processor_base empty_hooks
method process ~when_done container fd proto_name =
Unix.clear_nonblock fd;
let ch = Unix.out_channel_of_descr fd in
output_string ch "Hello world\n";
close_out ch;
when_done()
method supported_ptypes = [ `Multi_processing; `Multi_threading ]
end
The method process
is called whenever a new connection is made.
The container
is the object representing the container where the
execution happens (process
is always called from the container).
In fd
the file descriptor is passed that is the (already accepted)
connection. In proto_name
the protocol name is passed - here it is
unused, but it is possible to process the connection in a way that
depends on the name of the protocol.
Note that the file descriptors created by Netplex are in non-blocking
mode. It is, however, possible to switch to blocking mode when this is
more appropriate (Unix.clear_nonblock
).
The argument when_done
is very important. It must be called by
process
! For a synchronous processor like this one it is simply called
before process
returns to the caller.
For an asynchronous processor (i.e. a processor that handles several
connections in parallel in the same process/thread), when_done
must
be called when the connection is fully processed. This may be at any
time in the future.
The class hello_world_processor
can now be turned into a factory:
class hello_world_factory : processor_factory =
object(self)
method name = "hello_world"
method create_processor ctrl_cfg cfg_file cfg_addr =
new hello_world_processor
end
As you see, one can simply choose a name
. This is the type of
the processor
section in the configuration file, i.e. you need
...
service {
name = "hello world sample";
...
processor {
type = "hello_world"
};
...
}
...
to activate this factory for a certain service definition. Of course,
the instantiated hello_world_factory
must also be passed to
Netplex_main.startup
in order to be available at runtime.
The create_processor
method simply creates an object of your class. The
argument ctrl_cfg
is the configuration of the controller (e.g.
you find there the name of the socket directory). In cfg_file
the object is passed that accesses the configuration file as
tree of parameters. In cfg_addr
the address of the processor
section is made available, so you can look for additional
configuration parameters.
You may wonder why it is necessary to first create empty_hooks
.
The hook methods are often overridden by the user of processor
classes. In order to simplify this, it is common to allow the
user to pass a hook object to the processor object:
class hello_world_processor hooks : processor =
object(self)
inherit Netplex_kit.processor_base hooks
method process ~when_done container fd proto_name = ...
method supported_ptypes = ...
end
Now, the user can simply define hooks as in
class my_hooks =
object(self)
inherit Netplex_kit.empty_processor_hooks()
method post_start_hook container = ...
end
and pass such a hook object into the factory.
Workload managers decide when to start new containers and when to stop
useless ones. The simplest manager is created by the
Netplex_workload.constant_workload_manager_factory
. The user
simply defines how many containers are to be started. In the
config file this is written as
workload_manager {
type = "constant";
threads = <n>;
}
where <n>
is the number of containers > 0. Often this manager is used
to achieve n=1, i.e. to have exactly one container. An example would
be a stateful RPC server where it is important that all network connections
are handled by the same process. (N.B. n=1 for RPC servers does not
enforce that the connections are serialized because Ocamlnet RPC servers
can handle multiple connections in parallel, but of course it is enforced
that the remote procedures are invoked in a strictly sequential way.)
If n>1, it is tried to achieve that all containers get approximately the same load.
If processes die unexpectedly, the constant workload manager starts new components until the configured number of processes is again reached.
The dynamic workload manager (created by
Netplex_workload.dynamic_workload_manager_factory
) is able to start
and stop containers dynamically. There are a few parameters that control
the manager. A "thread" is here another word for a started container.
A "job" is an established network connection. Using this terms, the
task of the workload manager is to decide how many threads are needed
to do a varying number of jobs. The parameters now set how many jobs
every thread may execute, and how quickly new threads are created or
destroyed to adapt the available thread capacity to the current job
load.
If the service processor can only accept one network connection after the other (like Nethttpd_plex), the only reasonable setting is that there is at most one job per thread. If one configures a higher number in this case, unaccepted network connections will queue up resulting in poor performance.
If the service processor can handle several connections in parallel it is possible to allow more than one job per thread. There is no general rule how many jobs per thread are reasonable, one has to experiment to find it out. In this mode of having more than one job per thread, Netplex even allows two service qualities, "normal" and "overload". If possible, Netplex tries to achieve that all containers deliver normal quality, but if the load goes beyond that, it is allowed that containers accept more connections than that. This is called an overload situation. Often it is better to allow overload than to refuse new connections.
The dynamic workload manager is enabled by the section
workload_manager {
type = "dynamic";
... parameters, see below ...
}
The required parameters are:
max_threads
: How many containers can be created at maximum for
this service.max_jobs_per_thread
: How many jobs every container can execute
at maximum. The upper limit for the number of jobs is thus
max_threads * max_jobs_per_thread
.min_free_job_capacity
: This parameter controls how quickly new
containers are started when the load goes up. It is tried to
ensure that there are as many containers so this number of jobs
can be additionally performed. This parameter must
be at least 1.max_free_job_capacity
: This parameter controls how quickly
containers are stopped when the load goes down. It is tried to
ensure that unused containers are stopped so the capacity for
additional jobs is not higher than this parameter.
This parameter must be greater or equal than
min_free_job_capacity
.
recommended_jobs_per_thread
: The number of jobs a container
can do with normal service quality. A higher number is considered
as overload.Another parameter is:
inactivity_timeout
: If a container idles longer than this number
of seconds and is not needed to ensure min_free_job_capacity
it is
shut down. Defaults to 15 seconds.
There are two kinds of messages one can send to Netplex containers:
normal messages come from another Netplex container, and admin messages
are sent using the netplex-admin
command.
Messages have a name and a (possibly empty) list of string parameters. They can be sent to an individual receiver container, or to a number of containers, even to all. The sender does not get an acknowledgment when the messages are delivered.
Messages can e.g. be used
In order to receive a normal message, one must define the
receive_message
method in the processor object, and to receive an
admin message, one must define the receive_admin_message
method.
A normal message is sent by the container method send_message
.
The receiver is identified by the service name, i.e. all containers
with the passed name get the message. The name may even contain
the wildcard *
to select the containers by a name pattern.
An admin message is sent using the netplex-admin
command.
There are a few predefined messages understood by all containers. See The netplex-admin command for a list.
In general, messages starting with "netplex." are reserved for Netplex itself.
Log messages can be written in the containers. The messages are first
sent to the controller where they are written to stderr, to files, or
to any object of the type Netplex_types.logger
. That the messages
are first sent to the controller has a lot of advantages: The messages
are implicitly serialized, no locking is needed, and it is easy to
support log file rotation.
In order to write a log message, one needs the container object.
The module Netplex_cenv
always knows the container object of the
caller, to get it:
let cont = Netplex_cenv.self_cont()
If you call self_conf
outside a container, the exception
Netplex_cenv.Not_in_container_thread
is raised. This is e.g. the
case if you call it from the pre_start
or post_finish
callbacks.
Logging is now done by
let cont = Netplex_cenv.self_cont() in
cont # log level message
where level
is one of `Debug
, `Info
, `Notice
, `Warning
, `Err
,
`Crit
, `Alert
, `Emerg
, and message
is a string. The levels are
the same as for syslog.
You can also call Netplex_cenv.log
and Netplex_cenv.logf
,
which simply use self_cont
to get the container and call its log
method to write the message.
The config file controls what to do with the log messages. The easiest way is to send all messages to stderr:
controller {
max_level = "debug"; (* Log level *)
logging {
type = "stderr"; (* Log to stderr *)
}
};
Further types of logging are documented in Configuration: The controller section.
There are various built-in debug logging streams:
Netplex_container.Debug.enable
: Logs the perspective of the
container. Logged events are e.g. when connections are accepted,
and when user-defined hook functions are invoked. These messages
are quite interesting for debugging user programs.Netplex_controller.Debug.enable
: Logs the perspective of the
controller. The events are e.g. state changes, when containers
are started, and scheduling decisions. This is less interesting
to users, but might nevertheless worth activating it.Netplex_workload.Debug.enable
: Outputs messages from the
workload manager.Netlog.Debug
is used - which has
the advantage that messages can also be generated when no Netplex logger
is available.
As already noted above, it is possible to use Netplex with two concurrency models, namely multi-threading and multi-processing. In the first case, for every container a new thread is started, and in the second case, a new process.
You should know that multi-threading in OCaml has some limitations. In particular, it is not possible to exploit more than one CPU core of the system (although the threads are real kernel threads). Because of this, multi-processing is also supported by Netplex. It does not have this deficiency, because every process executes an independent OCaml runtime.
Nevertheless, it can be quite useful to just start helper threads, even if the main concurrency model is multi-processing. The question is how to do, and where the traps are.
First, one warning ahead: Once you have started threads, it is no longer safe to spawn new processes. This is a general difficulty of the POSIX system API, and cannot be worked around in OCaml programs (to some extent it is possible in C). The immediate consequence is that you must not start threads in the controller process (assumed you have multi-processing). The controller process continuously forks new container processes, and having threads could be deadly for it. For the API exposed by Netplex this means that doing such is not supported at all - all controller-specific APIs assume that they are only called from the same thread.
Ok, so you just don't this. However, is it allowed to have additional
threads in containers? This is in deed not problematic. Just be sure that
you start these threads after the container process has been created, e.g.
from post_start_hook
. In order to make life simpler, a number of APIs
have now been made thread-safe (since Ocamlnet-3.5):
Netplex_types.container
Netplex_cenv
Netplex_semaphore
, Netplex_mutex
,
and Netplex_sharedvar
Note that all extra threads are implicitly killed when the
container process finishes. If you don't like this and want to
terminate them in a more controlled manner, you can do this from the
pre_finish_hook
. If your main concurrency model is multi-threading,
though, no such killing will occur - your helper threads will simply
continue running when the container is finished. (Unfortunately, this
difference between the two models is unavoidable.)
A short description how to build systems of RPC services is given
in Netplex RPC systems.