module DSeq:sig..end
Here it is required that the underlying stores are files.
It is not possible to distribute operations on notebooks.
type config
val create_config : ?name:string ->
?task_files:string list ->
?bigblock_size:int ->
?map_tasks:int ->
?merge_limit:int ->
?split_limit:int ->
?partitions:int ->
?enhanced_mapping:int ->
?phases:Mapred_def.phases ->
?report:bool ->
?report_to:Netchannels.out_obj_channel ->
?keep_temp_files:bool -> Mapred_def.mapred_env -> config
The default values of the optional arguments come from the mapred_env,
and are effectively taken from the .conf files. It is possible, though,
to override the values.
val get_rc : config -> Mapred_io.record_config
Another way of getting the record config is Mapred_def.get_rc.
type 'a result
val get_result : 'a result -> 'a
val stats : 'a result -> Mapred_stats.stats
val job_id : config -> string
class type mapred_info = object..end
val mapl : (mapred_info -> 'a -> 'b list) Mapred_rfun.rfun ->
'a Mapred_toolkit.Place.t ->
'b Mapred_toolkit.Place.t ->
config ->
('b, [ `W ]) Mapred_toolkit.Seq.seq list result
mapl m pl_in pl_out conf: Runs a map-only job. This means that
the records from pl_in are piped through the function m, and
the result is written into new files in pl_out.
The created files are also returned in the output sequences.
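To make the data flow of a map-only job concrete, here is a minimal in-memory OCaml model. It is a sketch under loud assumptions: mapl_model and chunks are hypothetical helpers, lists stand in for places and files, and the mapred_info argument is dropped. Splitting the input into one chunk per map task only illustrates why the result is a list of output sequences. It does not describe how Plasma actually assigns input to tasks.

```ocaml
(* Hypothetical helper: split [l] into at most [n] contiguous chunks,
   standing in for the input of individual map tasks. *)
let chunks n l =
  if n <= 0 then invalid_arg "chunks";
  let size = max 1 ((List.length l + n - 1) / n) in
  let rec go acc cur k = function
    | [] -> List.rev (if cur = [] then acc else List.rev cur :: acc)
    | x :: rest ->
        if k = size then go (List.rev cur :: acc) [ x ] 1 rest
        else go acc (x :: cur) (k + 1) rest
  in
  go [] [] 0 l

(* Hypothetical model of mapl: every record is piped through [m];
   each chunk yields one output list, like one created output file. *)
let mapl_model ~tasks m input = List.map (List.concat_map m) (chunks tasks input)
```

For example, mapl_model ~tasks:2 (fun x -> [ x; x * 10 ]) [ 1; 2; 3 ] yields one list per chunk, and concatenating those lists gives the same records as mapping over the whole input at once.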
val mapl_sort_fold : mapl:(mapred_info -> 'a -> 'b list) Mapred_rfun.rfun ->
hash:(mapred_info -> 'b -> int) Mapred_rfun.rfun ->
cmp:(mapred_info -> 'b -> 'b -> int) Mapred_rfun.rfun ->
initfold:(mapred_info -> int -> 'c) Mapred_rfun.rfun ->
fold:(mapred_info -> 'c -> 'b -> 'c * 'd list)
Mapred_rfun.rfun ->
?finfold:(mapred_info -> 'c -> 'd list) Mapred_rfun.rfun ->
partition_of:(mapred_info -> 'b -> int) Mapred_rfun.rfun ->
?initcombine:(mapred_info -> 'e) Mapred_rfun.rfun ->
?combine:(mapred_info -> 'e -> 'b -> 'e * 'b list)
Mapred_rfun.rfun ->
?fincombine:(mapred_info -> 'e -> 'b list)
Mapred_rfun.rfun ->
'a Mapred_toolkit.Place.t ->
'd Mapred_toolkit.Place.t ->
config ->
'b Mapred_toolkit.Place.codec ->
('d, [ `W ]) Mapred_toolkit.Seq.seq list result
mapl_sort_fold <args> pl_in pl_out conf int_codec: Runs a full
map/reduce job. The records from pl_in are mapped, sorted and reduced,
and finally written into new files in pl_out. The job is defined by a
number of named arguments:
- mapl maps the elements of the input
- hash returns the hash integer required for sorting (see below)
- cmp compares two mapped elements
- initfold initializes a reducer (the int argument is the partition number)
- fold accu x processes the record x, and returns (accu', out),
  where out is a list of records to output
- finfold is called at the end of a reducer, and returns the last records to output
- partition_of returns the partition number of a mapped record
- initcombine initializes a combiner
- combine accu x processes the record x in the combiner, and
  returns (accu', out), where out is a list of records to output.
  It is required that initcombine is also set if combine is used.
- fincombine is called at the end of a combiner, and returns the last records to output

Sorting first compares the integers returned by hash, and only if such integers are equal,
the two elements are compared in detail by calling cmp.
See Mapred_sorters for useful definitions of hash and
cmp.
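The two-step ordering can be written as a plain comparison function. two_step_compare and first_byte_hash are hypothetical names chosen for illustration; nothing from Mapred_sorters is used. Because first_byte_hash is monotone with respect to String.compare (if a sorts before b, the hash of a is not larger), the combined order agrees with sorting by String.compare alone, while cmp only runs when the hashes tie.

```ocaml
(* Hypothetical sketch: order records by an integer hash first and fall
   back to the detailed comparison only when the hashes are equal. *)
let two_step_compare ~hash ~cmp a b =
  let c = Int.compare (hash a) (hash b) in
  if c <> 0 then c else cmp a b

(* Illustrative hash: 0 for "", otherwise 1 + the code of the first byte.
   It never contradicts String.compare, so the combined order matches it. *)
let first_byte_hash s = if s = "" then 0 else 1 + Char.code s.[0]

let sort_words words =
  List.sort (two_step_compare ~hash:first_byte_hash ~cmp:String.compare) words
```

The design point is that comparing two integers is cheap. The potentially expensive cmp is then only needed for elements whose hashes collide.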
The int_codec is used for representing intermediate files
(output of the map phase, and input/output of the shuffle phases).
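The roles of the named arguments can be seen in a single-process model of the whole job. This sketch rests on several assumptions. sort_fold_model and word_count are hypothetical, and lists replace places and files. The mapred_info arguments, the int_codec and the optional combiner stage are left out. The partition count is passed directly instead of coming from the config. The model returns one output list per partition, which simplifies how the real job lays out its output files.

```ocaml
(* Hypothetical single-process model of the mapl_sort_fold data flow.
   partition_of must return a value in 0 .. partitions - 1. *)
let sort_fold_model ~mapl ~hash ~cmp ~initfold ~fold ?(finfold = fun _ -> [])
    ~partition_of ~partitions input =
  (* Map phase: apply mapl to every input record. *)
  let mapped = List.concat_map mapl input in
  (* Shuffle phase: route every mapped record to its partition. *)
  let buckets = Array.make partitions [] in
  List.iter
    (fun x ->
      let p = partition_of x in
      buckets.(p) <- x :: buckets.(p))
    mapped;
  (* Sort order: hash first, cmp only when the hashes are equal. *)
  let order a b =
    let c = Int.compare (hash a) (hash b) in
    if c <> 0 then c else cmp a b
  in
  (* Reduce phase: one fold per partition, started with initfold p;
     fold's outputs are kept in order, followed by finfold's. *)
  let reduce p records =
    let sorted = List.stable_sort order records in
    let acc, out_rev =
      List.fold_left
        (fun (acc, out_rev) x ->
          let acc', out = fold acc x in
          (acc', List.rev_append out out_rev))
        (initfold p, []) sorted
    in
    List.rev_append out_rev (finfold acc)
  in
  Array.to_list (Array.mapi reduce buckets)

(* Usage: a word count; each partition yields (word, count) pairs. *)
let word_count lines =
  sort_fold_model
    ~mapl:(fun line -> List.filter (fun w -> w <> "") (String.split_on_char ' ' line))
    ~hash:(fun w -> if w = "" then 0 else 1 + Char.code w.[0])
    ~cmp:String.compare
    ~initfold:(fun _partition -> None)
    ~fold:(fun acc w ->
      match acc with
      | Some (cur, n) when cur = w -> (Some (cur, n + 1), [])
      | Some (cur, n) -> (Some (w, 1), [ (cur, n) ])
      | None -> (Some (w, 1), []))
    ~finfold:(function None -> [] | Some (cur, n) -> [ (cur, n) ])
    ~partition_of:(fun w -> String.length w mod 2)
    ~partitions:2 lines
```

The fold only ever looks at one accumulator and one record. Sorting is what makes this work: equal words arrive next to each other, so a count can be emitted as soon as the word changes, and finfold emits the last pending count.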