module Mapred_tasks: sig
.. end
Representation of tasks
type
file_fragment = string * int64 * int64
Filename, start block pos, length in blocks. If the length is 0,
this means "till end of file"
type
file = file_fragment list
type
file_tag = [ `Tag of string ]
Tags may be attached to files. If not used, `Tag ""
is the value
type
locality = [ `Any | `Dn_identity of string ]
For some outputs, it is possible to request where the file is
placed.
`Dn_identity s
: The file blocks are allocated on the datanode
with identity s
, if possible
`Any
: no such request is done, any place is equally good
type
map_task = {
|
map_input : file ; |
|
map_output_prefix : string ; |
|
map_output_suffix : string ; |
|
map_id : int ; |
|
map_best_hosts : Unix.inet_addr list ; |
}
type
sort_task = {
|
sort_input : file ; |
|
sort_input_del : bool ; |
|
sort_output : string ; |
|
sort_output_locality : locality ; |
|
sort_id : int ; |
}
type
shuffle_task = {
|
shuffle_input : (file * int * int) list ; |
|
shuffle_input_del : bool ; |
|
shuffle_output : (string * int * int * locality) list ; |
|
shuffle_partitions : int * int ; |
|
shuffle_coverage : int * int ; |
|
shuffle_reduce : bool ; |
|
shuffle_round : int ; |
|
shuffle_avg_part_width : float ; |
}
Shuffling means to rearrange the output of the map/sort tasks
into a set of partitions. The map tasks are numbered 0 to n-1
(via
map_id
) and for each task there is a file that serves here as input.
The partitions are numbered 0 to m-1
, and finally there is
for each partition exactly one output file.
type
emap_task = {
}
Enhanced map tasks do not only map, but also prepartition the result
and sort it
type
task = [ `Cleanup
| `Emap of emap_task
| `Map of map_task
| `Shuffle of shuffle_task
| `Sort of sort_task ]
type
task_ok = {
}
type
task_result = [ `Corrupt_input of file list
| `Error of string
| `Ok of task_ok
| `Retry_later ]
-
`Ok ok
: Task is successful.
ok.ok_files
is the list of created
files (or file fragments), and
ok.ok_counters
returns
statistics
`Retry_later
: Task cannot be started because of lack of resources.
No files have been created (or these are already deleted). The task
should be again tried when it is suspected that resources have been
freed.
`Corrupt_input files
: Inputs files
of the task were not readable.
The files should be created again and the task retried. No files
have been created (or these are already deleted).
`Error msg
: Fatal error msg
. This is an internal error or
an error from user code
module Ord: sig
.. end
val encode_task : task -> string
val decode_task : string -> task
val encode_task_result : task_result * string list -> string
val decode_task_result : string -> task_result * string list
val lock_name_of_task_id : task -> string
The name used for forming the name of the lock files
val string_of_task_id : task -> string
val print_task_id : Netchannels.out_obj_channel -> task -> int -> unit
val print_task : Netchannels.out_obj_channel -> task -> int -> unit
val print_file : ?tag:string ->
Netchannels.out_obj_channel -> file -> int -> unit