class type mapred_job = object..end
method custom_params : string list
method check_config : mapred_env -> mapred_job_config -> unit
method pre_job_start : mapred_env -> mapred_job_config -> unit
method post_job_finish : mapred_env -> mapred_job_config -> unit
method input_record_io : mapred_env ->
mapred_job_config -> Mapred_io.record_rw_factory
method output_record_io : mapred_env ->
mapred_job_config -> Mapred_io.record_rw_factory
method internal_record_io : mapred_env ->
mapred_job_config -> Mapred_io.record_rw_factory
method map : mapred_env ->
mapred_job_config ->
task_info ->
Mapred_io.record_reader -> Mapred_io.record_writer -> unit
method sorter : mapred_env ->
mapred_job_config -> float -> sorter
See Mapred_sorters. The float is the factor for the sort buffer,
and it should be between 0.0 and 1.0.
method extract_key : mapred_env -> mapred_job_config -> string -> int * int
Returns (index,len). Here, index is the byte offset in the record
where the key starts, and len is the length of the key in bytes.
This method is always called by
first evaluating let f = job#extract_key me jc, and then
calling f line for each input line. Because of this, it is
possible to factor initializations out as in
method extract_key me jc =
...; (* init stuff *)
(fun line -> ... (* real extraction *) )
Before Plasma-0.6, extract_key returned the key directly as
string.
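The two-stage calling convention described above can be sketched as a standalone function. This is only an illustration, not Plasma code: the types mapred_env and mapred_job_config are replaced by unit stand-ins, and the record layout (key, then a TAB, then the value) is an assumption made for the example.

```ocaml
(* Stand-ins for the Plasma types; assumptions for illustration only. *)
type mapred_env = unit
type mapred_job_config = unit

(* Hypothetical record layout: key, TAB, value. *)
let extract_key (_me : mapred_env) (_jc : mapred_job_config)
  : string -> int * int =
  (* init stuff: evaluated once, when the first two arguments are applied *)
  let sep = '\t' in
  (fun line ->
     (* real extraction: the key runs from byte 0 up to the first
        separator, or spans the whole record if there is no separator *)
     let len =
       match String.index_opt line sep with
       | Some i -> i
       | None -> String.length line
     in
     (0, len))

(* Mirrors the calling convention: evaluate f once, then call it per record. *)
let f = extract_key () ()
let record = "apple\t42"
let (idx, len) = f record
let key = String.sub record idx len
```

Returning a position instead of a string means the key bytes are only copied (here with String.sub) when a caller actually needs them.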
method partition_of_key : mapred_env ->
mapred_job_config -> string -> int -> int -> int
partition_of_key me jc s p l:
Determines the partition of a key (which is supposed to occupy the
range p to p+l-1 of s). Can be something simple like
(Hashtbl.hash key) mod partitions, or something more
elaborate. This method is always called by
first evaluating let f = job#partition_of_key me jc, and then
calling f s p l for each input line. Because of this, it is
possible to factor initializations out as in
method partition_of_key me jc =
...; (* init stuff *)
(fun s p l -> ... (* real partitioning *) )
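A minimal sketch of the simple hashing variant mentioned above, written as a standalone function rather than a method. The types are stand-ins, and the partitions field on the config record is an assumption of this example; how the real mapred_job_config exposes the partition count is not shown in this section.

```ocaml
(* Stand-in types; the real mapred_env and mapred_job_config are Plasma
   class types. The [partitions] field is an assumption for this sketch. *)
type mapred_env = unit
type mapred_job_config = { partitions : int }

let partition_of_key (_me : mapred_env) (jc : mapred_job_config)
  : string -> int -> int -> int =
  (* init stuff: read the partition count once *)
  let n = jc.partitions in
  (fun s p l ->
     (* real partitioning: hash the key bytes s.[p] .. s.[p+l-1] *)
     let key = String.sub s p l in
     (Hashtbl.hash key) mod n)

(* Evaluate f once, then call it for each record. *)
let f = partition_of_key () { partitions = 4 }
let part_a = f "apple\t1" 0 5
let part_b = f "xxapple" 2 5
```

Because only the bytes in the given range are hashed, the same key yields the same partition regardless of where it sits in the record, so part_a and part_b are equal here.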
method reduce : mapred_env ->
mapred_job_config ->
task_info ->
Mapred_io.record_reader -> Mapred_io.record_writer -> unit
method combine : mapred_env ->
mapred_job_config ->
task_info ->
(Mapred_io.record_reader -> Mapred_io.record_writer -> unit) option
Note that Plasma allows the combiner to receive data from several partitions!
If no combiner is needed, just define this method as
method combine _ _ _ = None
In this case, the internal shuffles just copy the input to the
output.
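The effect of the option result can be sketched with a small model. Everything here is hypothetical: record_reader and record_writer are simplified to plain functions (they are not the Mapred_io class types), and run_shuffle only imitates the behavior described above; it is not Plasma's shuffle implementation.

```ocaml
(* Simplified stand-ins for Mapred_io.record_reader / record_writer. *)
type record_reader = unit -> string option   (* None at end of input *)
type record_writer = string -> unit
type combiner = (record_reader -> record_writer -> unit) option

(* With no combiner, records are copied unchanged from input to output. *)
let copy (rd : record_reader) (wr : record_writer) =
  let rec loop () =
    match rd () with
    | Some r -> wr r; loop ()
    | None -> ()
  in
  loop ()

(* Hypothetical model of a shuffle step dispatching on the option. *)
let run_shuffle (c : combiner) rd wr =
  match c with
  | None -> copy rd wr
  | Some f -> f rd wr

(* "No combiner needed", as in the text above. *)
let no_combine _me _jc _ti : combiner = None

(* Example combiner: drops records equal to their direct predecessor. *)
let dedup_combine _me _jc _ti : combiner =
  Some (fun rd wr ->
    let last = ref None in
    let rec loop () =
      match rd () with
      | None -> ()
      | Some r ->
          if !last <> Some r then wr r;
          last := Some r;
          loop ()
    in
    loop ())

(* List-backed reader and writer, to try the sketch out. *)
let reader_of_list l =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl -> rest := tl; Some x

let run_on_list c input =
  let out = ref [] in
  run_shuffle c (reader_of_list input) (fun r -> out := r :: !out);
  List.rev !out
```

For example, run_on_list (no_combine () () ()) ["a"; "a"; "b"] returns the input unchanged, while the same call with dedup_combine yields ["a"; "b"].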