class type mapred_job_config = object
.. end
method name : string
A name for identifying this job, e.g. in log files
method job_id : string
An automatically generated name that can be considered unique. It
is not possible to override this name.
method input_dir : string
This plasma directory contains the input files
method output_dir : string
This plasma directory will receive the output files. It should exist,
and it should be empty
method work_dir : string
This plasma directory is used for temporary files. It should
exist, and it should be empty
method log_dir : string
This plasma directory is used for log files. It should exist,
and it should be empty
method task_files : string list
These files are copied at job start to the "local" directory
for this job on all task nodes. These should be regular files
only.
method bigblock_size : int
Map/reduce processes files in units of bigblocks. The size of a
bigblock can be chosen as a multiple of the size of the filesystem
blocks. The value bigblock_size is given in bytes. The maximum size
of records (line length) is also bigblock_size. Reasonable values are
in the multi-megabyte range, e.g. 16M. This value must not exceed
mr_buffer_size or mr_buffer_size_tight.
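To make the sizing rule concrete, here is a small OCaml sketch. The
helper choose_bigblock_size is hypothetical, and the filesystem block
size and buffer limit below are made-up example values; the real
limits are given by mr_buffer_size and mr_buffer_size_tight.

(* Hypothetical helper: pick a bigblock size as a multiple of the
   filesystem block size and check it against the buffer limit.
   All concrete numbers are example values only. *)
let choose_bigblock_size ~fs_block_size ~buffer_limit =
  let candidate = 16 * fs_block_size in        (* e.g. 16 blocks per bigblock *)
  if candidate > buffer_limit then
    invalid_arg "bigblock_size must not exceed the buffer limit"
  else
    candidate

let () =
  (* assuming 1 MiB filesystem blocks and a 64 MiB buffer limit *)
  let bb =
    choose_bigblock_size
      ~fs_block_size:(1024 * 1024)
      ~buffer_limit:(64 * 1024 * 1024)
  in
  Printf.printf "bigblock_size = %d bytes (16M)\n" bb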
method map_tasks : int
An upper limit on the number of map tasks. The scheduler tries to
reach this number, but it can happen that not enough map tasks can
be generated.
method merge_limit : int
The maximum number of files that a shuffle task merges.
method split_limit : int
How many files a shuffle task creates. This is not a strict limit;
the scheduler actually plans with slightly larger split limits near
the end of the job.
method partitions : int
The number of partitions, which is also the number of reduce tasks.
method custom : string -> string
Returns the value of a custom parameter, or raises Not_found if no
such parameter is defined.
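For illustration only, a literal object satisfying this class type
could look as follows (assuming the class type is in scope). All
directory names, numeric values and the custom parameter are made-up
examples, and job_id is only written out for completeness; in real
jobs it is generated automatically and cannot be overridden.

let demo_config : mapred_job_config =
  object
    method name = "wordcount"                 (* shows up in log files *)
    method job_id = "wordcount-0001"          (* normally auto-generated *)
    method input_dir = "/input"
    method output_dir = "/output"             (* should exist and be empty *)
    method work_dir = "/work"                 (* should exist and be empty *)
    method log_dir = "/log"                   (* should exist and be empty *)
    method task_files = [ "wordcount_task" ]  (* regular files only *)
    method bigblock_size = 16 * 1024 * 1024   (* 16M, in bytes *)
    method map_tasks = 32                     (* upper limit, see above *)
    method merge_limit = 16
    method split_limit = 4
    method partitions = 8                     (* = number of reduce tasks *)
    method custom = function
      | "stopwords" -> "stopwords.txt"        (* hypothetical custom parameter *)
      | _ -> raise Not_found
  end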