module Mapred_def: sig .. end
Defining the map/reduce job
class type mapred_env = object .. end
type designation = [ `Deep_dir | `File | `Flat_dir ]
How to determine the files of input_dir:
`File: Interpret input_dir as a single file (not as a directory).
`Flat_dir: Take the files in input_dir, except those whose names start
with a dot or an underscore.
`Deep_dir: Take the files in input_dir and in all nested directories.
Again, files are ignored when their names start with a dot or an underscore.
Directories whose names start with a dot or an underscore are skipped entirely.
In the future, there will be more types of designation, such as
regular expression-based ones.
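The selection rules above can be illustrated with a small, self-contained sketch. It does not use the library: the node type, files_of, and the example tree are hypothetical names, and an in-memory tree stands in for PlasmaFS. It also assumes that `Flat_dir skips subdirectories, which the description implies but does not state.

```ocaml
type designation = [ `Deep_dir | `File | `Flat_dir ]

(* Toy in-memory directory tree standing in for PlasmaFS (assumption). *)
type node =
  | Leaf of string                (* a plain file *)
  | Folder of string * node list  (* a directory and its entries *)

(* A name is hidden when it starts with a dot or an underscore. *)
let is_hidden name =
  name <> "" && (name.[0] = '.' || name.[0] = '_')

let node_name = function Leaf n | Folder (n, _) -> n

(* Hypothetical helper: the paths selected when [path] names node [n]. *)
let rec files_of (d : designation) (path : string) (n : node) : string list =
  match d, n with
  | `File, _ -> [ path ]
  | (`Flat_dir | `Deep_dir), Leaf _ -> []
  | (`Flat_dir | `Deep_dir), Folder (_, children) ->
      children
      |> List.filter (fun c -> not (is_hidden (node_name c)))
      |> List.concat_map (fun c ->
             let p = path ^ "/" ^ node_name c in
             match c, d with
             | Leaf _, _ -> [ p ]
             | Folder _, `Deep_dir -> files_of `Deep_dir p c
             | Folder _, (`File | `Flat_dir) -> [])

(* Example tree: hidden files and the hidden directory are never selected. *)
let tree =
  Folder ("in",
    [ Leaf "a"; Leaf ".hidden"; Leaf "_tmp";
      Folder ("sub", [ Leaf "b"; Leaf "_c" ]);
      Folder ("_skip", [ Leaf "d" ]) ])
```

With this tree, `Flat_dir selects only in/a, while `Deep_dir additionally descends into sub and selects in/sub/b.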
type phases = [ `Map | `Map_sort | `Map_sort_reduce ]
Which phases are enabled:
`Map: Only the map phase is executed. The output directory will
contain files mapped_#_# where "#" is a number.
`Map_sort: The mapped files are sorted. This generates files
sorted_# where "#" is a number.
`Map_sort_reduce: The sorted files are shuffled and reduced.
This generates files partition_# where "#" is a number.
This is the default.
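As a rough illustration of the naming scheme described above, the following sketch maps each phase to the prefix of the files it leaves in the output directory. The functions output_prefix and produced_by are hypothetical helpers, not part of Mapred_def.

```ocaml
type phases = [ `Map | `Map_sort | `Map_sort_reduce ]

(* Hypothetical helper: file-name prefix of the final output of a phase
   setting, following the patterns mapped_#_#, sorted_# and partition_#. *)
let output_prefix : phases -> string = function
  | `Map -> "mapped_"
  | `Map_sort -> "sorted_"
  | `Map_sort_reduce -> "partition_"

(* Hypothetical helper: does [name] look like an output file of [p]? *)
let produced_by (p : phases) (name : string) : bool =
  let pre = output_prefix p in
  let n = String.length pre in
  String.length name >= n && String.sub name 0 n = pre
```

A caller could use such a check to pick the result files out of the output directory, for example produced_by `Map_sort_reduce "partition_3".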
class type mapred_job_config = object .. end
class type sorter = object .. end
Sorters are filled with records, and when the sorter is full, the
records are sorted by key, and written to a destination.
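The fill, sort, and write cycle can be sketched with a toy class. This is not the sorter class type of the library, whose methods are not shown here; toy_sorter, put, and flush are hypothetical names, records are modeled as (key, value) string pairs, the capacity is assumed to be positive, and the destination is a plain callback.

```ocaml
(* Records are modeled as (key, value) pairs (assumption). *)
type record = string * string

(* Toy sorter: buffers records until [capacity] is reached, then sorts
   the buffer by key and hands the sorted run to [dest]. *)
class toy_sorter (capacity : int) (dest : record list -> unit) =
  object (self)
    val mutable buf : record list = []
    val mutable count = 0

    (* Add one record; a full buffer triggers a flush. *)
    method put (r : record) =
      buf <- r :: buf;
      count <- count + 1;
      if count >= capacity then self#flush

    (* Sort the buffered records by key and write them out.
       The stable sort keeps insertion order among equal keys. *)
    method flush =
      if count > 0 then begin
        let by_key (k1, _) (k2, _) = compare k1 k2 in
        dest (List.stable_sort by_key (List.rev buf));
        buf <- [];
        count <- 0
      end
  end
```

Each call to dest receives one sorted run; a final flush is needed to write out a partially filled buffer.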
class type task_info = object .. end
class type mapred_job = object .. end
val get_rc : mapred_env -> int -> Mapred_io.record_config
get_rc me bigblock_size: Returns a record config for the given
environment me and the suggested bigblock size.
val get_job_local_dir : mapred_env -> mapred_job_config -> string
Returns the directory in the local filesystem where the files
configured in task_files can be found. Task implementations
can also use this directory for other purposes, e.g. temporary
files. The directory exists for the lifetime of the job.
Note that this directory is only created when needed.
Same as Mapred_taskfiles.taskfile_manager.local_directory.
val get_job_log_dir : mapred_env -> mapred_job_config -> string
Returns the directory in the local filesystem where
log files can be placed. These files are automatically moved
to the log_dir in PlasmaFS when the job is finished.
Note that this directory is only created when needed.
Same as Mapred_taskfiles.taskfile_manager.log_directory.