module Mapred_def: sig .. end
Defining the map/reduce job
class type mapred_env = object .. end
type designation = [ `Deep_dir | `File | `Flat_dir ]
How to determine the files of input_dir:
`File: Interpret input_dir as a single file (not as directory)
`Flat_dir: Take the files in input_dir, except those that start
with a dot or an underscore
`Deep_dir: Take the files in input_dir and in all nested directories.
Again, files are ignored when they start with a dot or an underscore.
Directories starting with a dot or an underscore are completely ignored.
More types of designation, such as regular-expression-based ones, are
planned for the future.
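The selection rules above can be sketched as a small, self-contained function. This is only an illustration under stated assumptions: select_files and is_ignored are hypothetical names that are not part of Mapred_def, and the sketch walks the local filesystem through the standard Sys module, whereas the real job resolves input_dir in its own storage layer.

```ocaml
(* Illustrative sketch only: select_files and is_ignored are hypothetical
   helpers, and the local filesystem stands in for the job's real storage. *)
type designation = [ `Deep_dir | `File | `Flat_dir ]

(* A name is skipped when it starts with a dot or an underscore. *)
let is_ignored name =
  String.length name > 0 && (name.[0] = '.' || name.[0] = '_')

let rec select_files (d : designation) (input_dir : string) : string list =
  match d with
  | `File ->
      (* input_dir names a single file, not a directory *)
      [ input_dir ]
  | `Flat_dir | `Deep_dir ->
      Sys.readdir input_dir
      |> Array.to_list
      |> List.sort compare
      |> List.filter (fun name -> not (is_ignored name))
      |> List.concat_map (fun name ->
             let path = Filename.concat input_dir name in
             if Sys.is_directory path then
               (* only `Deep_dir descends into inner directories *)
               if d = `Deep_dir then select_files `Deep_dir path else []
             else [ path ])
```

Because ignored names are filtered before the directory test, a directory starting with a dot or an underscore is never entered, which matches the "completely ignored" rule for `Deep_dir.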
type phases = [ `Map | `Map_sort | `Map_sort_reduce ]
Which phases are enabled:
`Map: Only the map phase is executed. The output directory will
contain files mapped_#_# where "#" is a number.
`Map_sort: The mapped files are sorted. This generates files
sorted_# where "#" is a number.
`Map_sort_reduce: The sorted files are shuffled and reduced.
This generates files partition_# where "#" is a number.
This is the default.
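As a compact summary of the three settings, the following sketch maps each value to the steps that run and to the prefix of the files left in the output directory. The helpers steps and output_prefix are hypothetical and exist only for this illustration; the step names are informal labels taken from the descriptions above.

```ocaml
(* Illustrative sketch only: steps and output_prefix are hypothetical
   helpers, not functions exported by Mapred_def. *)
type phases = [ `Map | `Map_sort | `Map_sort_reduce ]

(* Steps executed for each setting, in order. *)
let steps : phases -> string list = function
  | `Map -> [ "map" ]
  | `Map_sort -> [ "map"; "sort" ]
  | `Map_sort_reduce -> [ "map"; "sort"; "shuffle"; "reduce" ]

(* Prefix of the files found in the output directory afterwards. *)
let output_prefix : phases -> string = function
  | `Map -> "mapped_"            (* mapped_#_# *)
  | `Map_sort -> "sorted_"       (* sorted_# *)
  | `Map_sort_reduce -> "partition_"  (* partition_# *)
```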
class type mapred_job_config = object .. end
class type sorter = object .. end
Sorters are filled with records, and when the sorter is full, the
records are sorted by key, and written to a destination.
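The fill, sort, and write cycle can be modelled with a toy in-memory buffer. This is a sketch under assumptions, not the sorter class type itself: toy_sorter, create, put, and flush are hypothetical names, keys are plain strings compared with the standard compare function, and "full" is taken to mean a fixed record count.

```ocaml
(* Toy model only: the real sorter class type has its own interface.
   All names here are hypothetical. *)
type 'a toy_sorter = {
  capacity : int;                        (* records held before a flush *)
  mutable records : (string * 'a) list;  (* newest record first *)
  mutable count : int;
  write : (string * 'a) list -> unit;    (* the destination *)
}

let create capacity write = { capacity; records = []; count = 0; write }

(* Sort the buffered records by key and hand them to the destination. *)
let flush s =
  if s.count > 0 then begin
    let in_order = List.rev s.records in
    s.write (List.stable_sort (fun (k1, _) (k2, _) -> compare k1 k2) in_order);
    s.records <- [];
    s.count <- 0
  end

(* Add one record; a full sorter is flushed automatically. *)
let put s key value =
  s.records <- (key, value) :: s.records;
  s.count <- s.count + 1;
  if s.count >= s.capacity then flush s
```

A caller would create the sorter with a write function that appends a sorted run to its destination, call put for every record, and call flush once at the end so that a partially filled buffer is not lost.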
class type task_info = object .. end
class type mapred_job = object .. end
val get_rc : mapred_env -> int -> Mapred_io.record_config
get_rc me bigblock_size: Returns a record config for the given
environment me and the suggested bigblock size bigblock_size.