Module Mapred_job_config


module Mapred_job_config: sig .. end
Extracts the job configuration and provides marshalling for it

type m_job_config 
val extract_job_config : Netplex_types.config_file ->
(string * string) list ->
string list -> Mapred_def.mapred_job_config * m_job_config
let (jc, mjc) = extract_job_config cf args custom_params:

Extracts the job configuration from cf. The association list args may contain overrides; if a name occurs more than once, the leftmost value is taken.

Returns the configuration as object jc, and in a marshallable representation mjc.
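The "leftmost value is taken" rule matches how the standard library looks up association lists. The following is only a sketch of that rule, not the module's actual implementation: lookup_with_override and its from_file parameter are hypothetical names introduced for illustration.

```ocaml
(* Hypothetical helper: prefer an override from [args]; otherwise use the
   value that was read from the config file (if any).
   List.assoc_opt returns the first (leftmost) binding for a key. *)
let lookup_with_override
    ~(args : (string * string) list)
    ~(from_file : string option)
    (name : string) : string option =
  match List.assoc_opt name args with
  | Some v -> Some v
  | None -> from_file

let () =
  (* "partitions" occurs twice in the overrides; the leftmost one wins. *)
  let args = [ ("partitions", "40"); ("partitions", "20") ] in
  assert (lookup_with_override ~args ~from_file:(Some "10") "partitions"
          = Some "40");
  (* No override for "input_dir": the config file value is kept. *)
  assert (lookup_with_override ~args ~from_file:(Some "/input") "input_dir"
          = Some "/input")
```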

val mapred_job_config : m_job_config -> Mapred_def.mapred_job_config
Returns the configuration as an object
val marshal : m_job_config -> string
val unmarshal : string -> m_job_config
Marshal the configuration to a string, and unmarshal it from a string
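The type m_job_config is abstract, and this page does not say how marshal and unmarshal are implemented. As a sketch of the round-trip property one would expect from such a pair, the example below uses the standard Marshal module on a stand-in record; demo_config, marshal_demo and unmarshal_demo are hypothetical and not part of Plasma.

```ocaml
(* Stand-in for a marshallable configuration: plain data only, no
   objects or closures. *)
type demo_config = {
  name : string;
  partitions : int;
  merge_limit : int;
}

(* Serialize the record to a string with the standard library. *)
let marshal_demo (c : demo_config) : string =
  Marshal.to_string c []

(* Rebuild the record from the string. The type annotation is required
   because Marshal.from_string is not type-safe on its own. *)
let unmarshal_demo (s : string) : demo_config =
  Marshal.from_string s 0

let () =
  let c = { name = "my_job"; partitions = 20; merge_limit = 4 } in
  (* The round trip yields a structurally equal value. *)
  assert (unmarshal_demo (marshal_demo c) = c)
```

Keeping the marshallable form free of objects is one plausible reason for having a separate m_job_config next to the object returned by mapred_job_config.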

The config file must have the following structure (it may also contain unrelated entries):

       netplex {
         ...
         mapredjob {
           <name> = <value>;
           ...
         }
       }
    

The possible names are the method names of Mapred_def.mapred_job_config. Each value must have the type of the corresponding method.

Example:

       netplex {
         mapredjob {
            name = "my_job";
            input_dir = "/input";
            output_dir = "/output";
            work_dir = "/work";
            log_dir = "/log";
            bigblock_size = 65536;
            map_tasks = 100;
            merge_limit = 4;
            split_limit = 4;
            partitions = 20;
         }
       }
    

Some settings have default values:

  • name is set to an automatically generated name
  • bigblock_size is 16M
  • map_tasks is 0 (meaning a good value is computed at runtime)
  • merge_limit and split_limit are 4

This web site is published by Informatikbüro Gerd Stolpmann