Plasma GitLab Archive

Module Mapred_def

module Mapred_def: sig .. end
Defining the map/reduce job

class type mapred_env = object .. end
type designation = [ `Deep_dir | `File | `Flat_dir ] 
How to determine the files of input_dir:

  • `File: Interpret input_dir as a single file (not as a directory)
  • `Flat_dir: Take the files in input_dir, except those whose names start with a dot or an underscore
  • `Deep_dir: Take the files in input_dir and in all nested directories. Again, files are ignored when their names start with a dot or an underscore. Directories whose names start with a dot or an underscore are skipped entirely, including everything below them.
In the future, there will be more types of designation, such as regular expression-based ones.
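
The selection rules above can be sketched as follows. This is a minimal illustration only, not Plasma's implementation: the `node` type, the `hidden` and `select` functions, and `example_tree` are hypothetical names, and an in-memory tree stands in for a real PlasmaFS directory.

```ocaml
(* Local copy of the designation type from the signature above. *)
type designation = [ `Deep_dir | `File | `Flat_dir ]

(* Hypothetical in-memory directory tree (assumption, not a Plasma type). *)
type node =
  | Leaf of string              (* a regular file *)
  | Node of string * node list  (* a directory with its entries *)

(* Names starting with a dot or an underscore are skipped. *)
let hidden name =
  String.length name > 0 && (name.[0] = '.' || name.[0] = '_')

(* [select d input_dir entries] lists the input files chosen by [d],
   where [entries] are the contents of [input_dir]. *)
let select (d : designation) (input_dir : string) (entries : node list)
    : string list =
  let rec walk deep prefix nodes =
    List.concat_map
      (fun n ->
        match n with
        | Leaf name when not (hidden name) -> [ prefix ^ "/" ^ name ]
        | Node (name, sub) when deep && not (hidden name) ->
            walk deep (prefix ^ "/" ^ name) sub
        | Leaf _ | Node _ -> [])
      nodes
  in
  match d with
  | `File -> [ input_dir ]                   (* the path itself is the file *)
  | `Flat_dir -> walk false input_dir entries
  | `Deep_dir -> walk true input_dir entries

(* Example tree used to exercise the three designations. *)
let example_tree =
  [ Leaf "a.txt"; Leaf "_tmp"; Leaf ".hidden";
    Node ("sub", [ Leaf "b.txt"; Leaf "_x" ]);
    Node ("_logs", [ Leaf "c.txt" ]) ]
```

With this tree, `Flat_dir selects only a.txt, while `Deep_dir additionally picks up sub/b.txt; nothing under _logs is visited.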
type phases = [ `Map | `Map_sort | `Map_sort_reduce ] 
Which phases are enabled:

  • `Map: Only the map phase is executed. The output directory will contain files mapped_#_# where "#" is a number.
  • `Map_sort: The mapped files are sorted. This generates files sorted_# where "#" is a number.
  • `Map_sort_reduce: The sorted files are shuffled and reduced. This generates files partition_# where "#" is a number. This is the default.
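
As a small illustration of the naming scheme above, the following sketch maps each phase setting to the prefix of the files it leaves in the output directory. The functions `output_prefix` and `is_output_of` are hypothetical helpers, not part of Mapred_def, and the check only looks at the prefix, not at the numeric parts.

```ocaml
(* Local copy of the phases type from the signature above. *)
type phases = [ `Map | `Map_sort | `Map_sort_reduce ]

(* Prefix of the final output files for each phase setting. *)
let output_prefix : phases -> string = function
  | `Map -> "mapped_"
  | `Map_sort -> "sorted_"
  | `Map_sort_reduce -> "partition_"

(* Hypothetical helper: does [name] start with the prefix used by [p]? *)
let is_output_of (p : phases) (name : string) : bool =
  let prefix = output_prefix p in
  let n = String.length prefix in
  String.length name > n && String.sub name 0 n = prefix
```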

class type mapred_job_config = object .. end
class type sorter = object .. end
A sorter is filled with records. When the sorter is full, the records are sorted by key and written to a destination.
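
The fill, sort, and write cycle can be pictured with a toy class. This is only a sketch under assumptions: `toy_sorter`, its `put` and `flush` methods, the record representation as (key, value) string pairs, and the callback destination are all hypothetical and do not reflect the methods of the actual sorter class type.

```ocaml
(* Toy buffer-sort-flush sorter; NOT the Plasma sorter class type.
   [capacity] is the number of records held before flushing, and
   [dest] receives each sorted run. *)
class toy_sorter (capacity : int) (dest : (string * string) list -> unit) =
  object (self)
    val mutable buf : (string * string) list = []
    val mutable count = 0

    (* Add one (key, value) record; flush once the buffer is full. *)
    method put (key : string) (value : string) : unit =
      buf <- (key, value) :: buf;
      count <- count + 1;
      if count >= capacity then self#flush

    (* Sort the buffered records by key and hand them to the destination. *)
    method flush : unit =
      if count > 0 then begin
        let by_key (k1, _) (k2, _) = compare k1 k2 in
        dest (List.stable_sort by_key (List.rev buf));
        buf <- [];
        count <- 0
      end
  end

(* Usage: collect the sorted runs in a list, newest first. *)
let runs : (string * string) list list ref = ref []
let sorter = new toy_sorter 2 (fun run -> runs := run :: !runs)

let () =
  sorter#put "b" "1";
  sorter#put "a" "2";   (* buffer is full here: first run is written *)
  sorter#put "c" "3";
  sorter#flush          (* write the remaining partial run *)
```

A real sorter would write each run to a file rather than to a callback; the callback keeps the sketch self-contained.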
class type task_info = object .. end
class type mapred_job = object .. end
val get_rc : mapred_env -> int -> Mapred_io.record_config
get_rc me bigblock_size: Returns a record config for the given environment me and the suggested bigblock size bigblock_size.
val get_job_local_dir : mapred_env -> mapred_job_config -> string
Returns the directory in the local filesystem where the files configured in task_files can be found. Task implementations may also use this directory for other purposes, e.g. for temporary files. The directory exists for the lifetime of the job. Note that this directory is only created when needed. Same as Mapred_taskfiles.taskfile_manager.local_directory.
val get_job_log_dir : mapred_env -> mapred_job_config -> string
Returns the directory in the local filesystem where log files can be placed. These files are automatically moved to the log_dir in PlasmaFS when the job is finished. Note that this directory is only created when needed. Same as Mapred_taskfiles.taskfile_manager.log_directory.
This web site is published by Informatikbüro Gerd Stolpmann