Module Mapred_def

module Mapred_def: sig .. end

Defining the map/reduce job

class type mapred_env = object .. end

type designation = [ `Deep_dir | `File | `Flat_dir ]

How to determine the files of input_dir:

`File: Interpret input_dir as a single file (not as a directory).
`Flat_dir: Take the files in input_dir, except those whose names start with a dot or an underscore.
`Deep_dir: Take the files in input_dir and in all inner directories. Again, files are ignored when their names start with a dot or an underscore. Directories whose names start with a dot or an underscore are ignored completely, including their contents.

In the future, there will be more types of designation, such as regular expression-based ones.
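
The selection rules for the three designations can be illustrated with a small sketch. It is not Plasma code: the in-memory tree type `node`, the helpers `hidden` and `select`, and the path `/in` are hypothetical names introduced only for this example, and the real implementation works on the actual filesystem.

```ocaml
type designation = [ `Deep_dir | `File | `Flat_dir ]

(* Hypothetical in-memory stand-in for a directory listing. *)
type node =
  | Regular of string                (* a plain file *)
  | Directory of string * node list  (* a directory and its entries *)

(* Names starting with a dot or an underscore are skipped. *)
let hidden name =
  name <> "" && (name.[0] = '.' || name.[0] = '_')

(* Paths selected for [input_dir], whose entries are [entries]. *)
let rec select (d : designation) input_dir entries =
  match d with
  | `File -> [ input_dir ]  (* input_dir itself is the single input file *)
  | `Flat_dir ->
      List.filter_map
        (function
          | Regular n when not (hidden n) -> Some (input_dir ^ "/" ^ n)
          | _ -> None)
        entries
  | `Deep_dir ->
      List.concat_map
        (function
          | Regular n when not (hidden n) -> [ input_dir ^ "/" ^ n ]
          | Directory (n, sub) when not (hidden n) ->
              select `Deep_dir (input_dir ^ "/" ^ n) sub
          | _ -> [])  (* hidden files and hidden directories are dropped *)
        entries

let example_tree =
  [ Regular "a"; Regular ".hid"; Regular "_tmp";
    Directory ("sub", [ Regular "b"; Regular "_c" ]);
    Directory ("_skip", [ Regular "d" ]) ]
```

With this tree, `select `Flat_dir "/in" example_tree` keeps only `/in/a`, while `select `Deep_dir "/in" example_tree` also descends into `sub` and keeps `/in/sub/b`; nothing below `_skip` is visited.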

type phases = [ `Map | `Map_sort | `Map_sort_reduce ]

Which phases are enabled:

`Map: Only the map phase is executed. The output directory will contain files mapped_#_# where "#" is a number.
`Map_sort: The mapped files are sorted. This generates files sorted_# where "#" is a number.
`Map_sort_reduce: The sorted files are shuffled and reduced. This generates files partition_# where "#" is a number. This is the default.
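
As a compact summary of the list above, the following sketch maps each phase setting to the prefix of the files it generates. The function name `output_prefix` is an assumption made for this example and is not part of `Mapred_def`; the numbering that follows each prefix is left out because it is not specified here.

```ocaml
type phases = [ `Map | `Map_sort | `Map_sort_reduce ]

(* Prefix of the files generated by the last enabled phase.
   Hypothetical helper, derived only from the descriptions above. *)
let output_prefix : phases -> string = function
  | `Map -> "mapped_"                (* mapped_#_# *)
  | `Map_sort -> "sorted_"           (* sorted_# *)
  | `Map_sort_reduce -> "partition_" (* partition_#, the default setting *)
```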

class type mapred_job_config = object .. end

class type sorter = object .. end

A sorter is filled with records. When the sorter is full, its records are sorted by key and written to a destination.
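
The fill, sort, and write cycle can be sketched with a toy in-memory model. This is not the `sorter` class type itself: the record type `toy_sorter`, the functions `create`, `put` and `flush`, and the choice to measure fullness as a record count are all assumptions made for illustration.

```ocaml
(* Toy model: buffers (key, value) records until [capacity] is reached,
   then sorts them by key and hands the sorted run to [write]. *)
type toy_sorter = {
  capacity : int;                           (* assumed: counted in records *)
  mutable buf : (string * string) list;     (* newest record first *)
  mutable count : int;
  write : (string * string) list -> unit;   (* the "destination" *)
}

let create ~capacity ~write = { capacity; buf = []; count = 0; write }

(* Sort the buffered records by key, write them out, and empty the buffer. *)
let flush s =
  if s.count > 0 then begin
    let by_key (k1, _) (k2, _) = String.compare k1 k2 in
    s.write (List.stable_sort by_key (List.rev s.buf));
    s.buf <- [];
    s.count <- 0
  end

(* Add one record; a full sorter is flushed immediately. *)
let put s record =
  s.buf <- record :: s.buf;
  s.count <- s.count + 1;
  if s.count >= s.capacity then flush s
```

In this model a caller would invoke `flush` once more after the last `put`, so that a partially filled buffer is also written. Whether the real sorter needs such a final step is not stated here.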

class type task_info = object .. end

class type mapred_job = object .. end

val get_rc : mapred_env -> int -> Mapred_io.record_config

get_rc me bigblock_size: Returns a record config for the given environment me and the suggested bigblock size.

This web site is published by Informatikbüro Gerd Stolpmann
