module Mapred_tasks: sig .. end
Representation of tasks
type file_fragment = string * int64 * int64
Filename, start block pos, length in blocks. If the length is 0,
this means "till end of file"
type file = file_fragment list
type file_tag = [ `Tag of string ]
Tags may be attached to files. If not used, `Tag "" is the value
type locality = [ `Any | `Dn_identity of string ]
For some outputs, it is possible to request where the file is
placed.
`Dn_identity s: The file blocks are allocated on the datanode
with identity s, if possible
`Any: no such request is done, any place is equally good
type map_task = {
|
map_input : file; |
|
map_output_prefix : string; |
|
map_output_suffix : string; |
|
map_id : int; |
|
map_best_hosts : Unix.inet_addr list; |
}
type sort_task = {
|
sort_input : file; |
|
sort_input_del : bool; |
|
sort_output : string; |
|
sort_output_locality : locality; |
|
sort_id : int; |
}
type shuffle_task = {
|
shuffle_input : (file * int * int) list; |
|
shuffle_input_del : bool; |
|
shuffle_output : (string * int * int * locality) list; |
|
shuffle_partitions : int * int; |
|
shuffle_coverage : int * int; |
|
shuffle_reduce : bool; |
|
shuffle_round : int; |
|
shuffle_avg_part_width : float; |
}
Shuffling means to rearrange the output of the map/sort tasks
into a set of partitions. The map tasks are numbered 0 to n-1 (via
map_id) and for each task there is a file that serves here as input.
The partitions are numbered 0 to m-1, and finally there is
for each partition exactly one output file.
type emap_task = {
}
Enhanced map tasks do not only map, but also prepartition the result
and sort it
type task = [ `Cleanup
| `Emap of emap_task
| `Map of map_task
| `Shuffle of shuffle_task
| `Sort of sort_task ]
type task_ok = {
}
type task_result = [ `Corrupt_input of file list
| `Error of string
| `Ok of task_ok
| `Retry_later ]
-
`Ok ok: Task is successful.
ok.ok_files is the list of created
files (or file fragments), and
ok.ok_counters returns
statistics
`Retry_later: Task cannot be started because of lack of resources.
No files have been created (or these are already deleted). The task
should be again tried when it is suspected that resources have been
freed.
`Corrupt_input files: Inputs files of the task were not readable.
The files should be created again and the task retried. No files
have been created (or these are already deleted).
`Error msg: Fatal error msg. This is an internal error or
an error from user code
module Ord: sig .. end
val encode_task : task -> string
val decode_task : string -> task
val encode_task_result : task_result * string list -> string
val decode_task_result : string -> task_result * string list
val lock_name_of_task_id : task -> string
The name used for forming the name of the lock files
val string_of_task_id : task -> string
val print_task_id : Netchannels.out_obj_channel -> task -> int -> unit
val print_task : Netchannels.out_obj_channel -> task -> int -> unit
val print_file : ?tag:string ->
Netchannels.out_obj_channel -> file -> int -> unit