Plasma GitLab Archive
Projects Blog Knowledge

Module Mapred_job_config


module Mapred_job_config: sig .. end
Extract job configuration, and marshalling

type m_job_config 
val extract_job_config : Netplex_types.config_file ->
(string * string) list ->
string list -> Mapred_def.mapred_job_config * m_job_config
let (jc, mjc) = extract_job_config cf args custom_params:

Extracts the job configuration from cf. The association list args may contain overrides (leftmost value is taken).

Returns the configuration as object jc, and in a marshallable representation mjc.

val mapred_job_config : m_job_config -> Mapred_def.mapred_job_config
Returns the config as object
val marshal : m_job_config -> string
val unmarshal : string -> m_job_config
Marshal and unmarshal
val update_job_config : ?name:string ->
?input_dir:string ->
?input_dir_designation:Mapred_def.designation ->
?output_dir:string ->
?work_dir:string ->
?log_dir:string ->
?task_files:string list ->
?bigblock_size:int ->
?map_tasks:int ->
?merge_limit:int ->
?split_limit:int ->
?partitions:int ->
?enhanced_mapping:int ->
?phases:Mapred_def.phases ->
?custom:(string * string) list ->
m_job_config -> m_job_config
Updates the values that are actually passed as arguments
val test_job_config : unit -> m_job_config
Returns a test object

The config file must look like (it can also contain unrelated entries):

       netplex {
         ...
         mapredjob {
           <name> = <value>;
           ...
         }
       }
    

The possible names are the method names of Mapred_def.mapred_job_config. The values should have the right type.

Example:

       netplex {
         mapredjob {
            name = "my_job";
            input_dir = "/input";
            input_dir_designation = "deep_dir";
            output_dir = "/output";
            work_dir = "/work";
            log_dir = "/log";
            bigblock_size = 65536;
            map_tasks = 100;
            merge_limit = 4;
            split_limit = 4;
            partitions = 20;
            phases = "map_sort_reduce";
         }
       }
    

All settings have default values:

  • name is set to an automatically generated name
  • bigblock_size is 16M
  • map_tasks is 0 (meaning a good value is computed at runtime)
  • merge_limit and split_limit are 4
  • partitions is 1
  • phases is `Map_sort_reduce
  • The directories are initialized to names that are guaranteed not to exist

This web site is published by Informatikbüro Gerd Stolpmann
Powered by Caml