
Module Mapred_def


module Mapred_def: sig .. end
Defining the map/reduce job

class type mapred_env = object .. end
type designation = [ `Deep_dir | `File | `Flat_dir ] 
How to determine the files of input_dir:

  • `File: Interpret input_dir as a single file (not as directory)
  • `Flat_dir: Take the files in input_dir, except those that start with a dot or an underscore
  • `Deep_dir: Take the files in input_dir and all inner directories. Again, files are ignored when they start with a dot or an underscore. Directories starting with a dot or an underscore are skipped entirely.
In the future, there will be more types of designation, such as regular expression-based ones.
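The selection rules above can be illustrated with a small, self-contained sketch. It is not Plasma code: the `entry` tree type and the `is_hidden` and `select` functions are hypothetical names introduced only for this example, and the real implementation reads the directory from the filesystem rather than from an in-memory tree.

```ocaml
(* Local copy of the designation type from the signature above. *)
type designation = [ `Deep_dir | `File | `Flat_dir ]

(* A name is skipped when it starts with a dot or an underscore. *)
let is_hidden name =
  String.length name > 0 && (name.[0] = '.' || name.[0] = '_')

(* Hypothetical in-memory directory tree, used only for illustration. *)
type entry =
  | File of string
  | Dir of string * entry list

(* Collect the paths selected by [d] for the directory [dir],
   whose direct children are given by [entries]. *)
let rec select (d : designation) dir entries =
  match d with
  | `File ->
      (* input_dir itself is treated as the single input file *)
      [ dir ]
  | `Flat_dir ->
      (* only the visible files directly inside input_dir *)
      List.filter_map
        (function
          | File n when not (is_hidden n) -> Some (dir ^ "/" ^ n)
          | _ -> None)
        entries
  | `Deep_dir ->
      (* visible files at every level; hidden directories are not entered *)
      List.concat_map
        (function
          | File n when not (is_hidden n) -> [ dir ^ "/" ^ n ]
          | Dir (n, sub) when not (is_hidden n) ->
              select `Deep_dir (dir ^ "/" ^ n) sub
          | _ -> [])
        entries
```

With this sketch, a directory holding `a.txt`, `.hidden`, `_logs`, a subdirectory `sub` and a subdirectory `_skip` yields only `a.txt` for `Flat_dir, and `a.txt` plus the visible files of `sub` for `Deep_dir.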
type phases = [ `Map | `Map_sort | `Map_sort_reduce ] 
Which phases are enabled:

  • `Map: Only the map phase is executed. The output directory will contain files mapped_#_# where "#" is a number.
  • `Map_sort: The mapped files are sorted. This generates files sorted_# where "#" is a number.
  • `Map_sort_reduce: The sorted files are shuffled and reduced. This generates files partition_# where "#" is a number. This is the default.
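As a quick reference, the mapping from the enabled phases to the output file names listed above can be written down as a function. This is only a sketch: `output_prefix` and `is_output_of` are hypothetical helpers, not part of Mapred_def, and they rely solely on the file name prefixes stated in the list.

```ocaml
(* Local copy of the phases type from the signature above. *)
type phases = [ `Map | `Map_sort | `Map_sort_reduce ]

(* Prefix of the files left in the output directory by the last
   enabled phase, as described in the list above. *)
let output_prefix : phases -> string = function
  | `Map -> "mapped_"
  | `Map_sort -> "sorted_"
  | `Map_sort_reduce -> "partition_"

(* Check whether [name] looks like an output file of [p],
   judging by its prefix only. *)
let is_output_of (p : phases) name =
  let prefix = output_prefix p in
  let n = String.length prefix in
  String.length name > n && String.sub name 0 n = prefix
```

Such a check could be used to pick up the result files of a job after it has finished, for instance when the output of a `Map-only run is fed into a later job.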

class type mapred_job_config = object .. end
class type sorter = object .. end
Sorters are filled with records, and when the sorter is full, the records are sorted by key, and written to a destination.
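The fill, sort and write cycle described here can be sketched with a plain record type. This is not the `sorter` class type itself, whose methods are elided on this page: the `buffer` type, its fields, and the `create`, `add` and `flush` functions are assumptions made for illustration, with records modeled as (key, value) string pairs and capacity counted in records.

```ocaml
(* Hypothetical buffer, NOT the sorter class type of Mapred_def. *)
type buffer = {
  capacity : int;                             (* max records held *)
  mutable records : (string * string) list;   (* newest record first *)
  mutable count : int;
  emit : (string * string) list -> unit;      (* destination of a sorted run *)
}

let create capacity emit = { capacity; records = []; count = 0; emit }

(* Sort the buffered records by key and hand them to the destination.
   The stable sort keeps insertion order among records with equal keys. *)
let flush b =
  if b.count > 0 then begin
    let by_key (k1, _) (k2, _) = String.compare k1 k2 in
    b.emit (List.stable_sort by_key (List.rev b.records));
    b.records <- [];
    b.count <- 0
  end

(* Add one record; a full buffer is sorted and written out at once. *)
let add b key value =
  b.records <- (key, value) :: b.records;
  b.count <- b.count + 1;
  if b.count >= b.capacity then flush b
```

A caller would create the buffer with a destination function, call `add` for each record, and call `flush` once at the end so that a partially filled buffer is also written.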
class type task_info = object .. end
class type mapred_job = object .. end
val get_rc : mapred_env -> int -> Mapred_io.record_config
get_rc me bigblock_size: Returns a record config for the given environment and the suggested size of the bigblock.
This web site is published by Informatikbüro Gerd Stolpmann