Plasma GitLab Archive

Class type Mapred_def.mapred_job


class type mapred_job = object .. end

method custom_params : string list
The list of allowed custom parameters.
method check_config : mapred_env -> mapred_job_config -> unit
Checks the configuration. If it is not OK, this method can raise an exception to stop everything.
method pre_job_start : mapred_env -> mapred_job_config -> unit
This is run by the job process before the first task is started.
method post_job_finish : mapred_env -> mapred_job_config -> unit
This is run by the job process after the last task is finished.
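custom_params and check_config can work together. The first names the parameters a job accepts, and the second can reject values the job cannot use. The following is a minimal sketch, not Plasma code. The config record, the Bad_config exception and the parameter names "separator" and "min_count" are hypothetical stand-ins, and the mapred_env argument is omitted. The sketch does not assume that the framework already rejects unknown parameter names, so it checks names as well as values.

```ocaml
(* Hypothetical stand-in for mapred_job_config: custom parameters as an
   association list of (name, value) pairs. *)
type config = { custom : (string * string) list }

(* Hypothetical exception used to abort the job. *)
exception Bad_config of string

(* In the spirit of custom_params: the parameters this job accepts. *)
let custom_params = [ "separator"; "min_count" ]

(* In the spirit of check_config: raising an exception stops everything. *)
let check_config (jc : config) : unit =
  (* Reject parameter names the job does not know about. *)
  List.iter
    (fun (name, _) ->
      if not (List.mem name custom_params) then
        raise (Bad_config ("unknown custom parameter: " ^ name)))
    jc.custom;
  (* If min_count is given, it must be a non-negative integer. *)
  match List.assoc_opt "min_count" jc.custom with
  | None -> ()
  | Some v -> (
      match int_of_string_opt v with
      | Some n when n >= 0 -> ()
      | _ -> raise (Bad_config ("min_count must be a non-negative integer: " ^ v)))
```

Because validation runs before any work is done, a typo in a parameter name fails fast instead of being silently ignored.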
method map : mapred_env ->
mapred_job_config ->
task_info ->
Mapred_io.record_reader -> Mapred_io.record_writer -> unit
The mapper reads records, maps them, and writes them into a second file.
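Here is a minimal mapper sketch in the classic word-count style. It is not tied to the real Mapred_io API. record_reader and record_writer below are simplified stand-in class types, and their method names (input_record raising End_of_file at the end of input, and output_record) are assumptions. The mapred_env, mapred_job_config and task_info arguments are omitted. The "word\t1" output format is also only an illustrative choice.

```ocaml
(* Simplified stand-ins for Mapred_io.record_reader and
   Mapred_io.record_writer; the method names are assumptions. *)
class type record_reader = object
  (* Returns the next record; raises End_of_file when none are left. *)
  method input_record : unit -> string
end

class type record_writer = object
  method output_record : string -> unit
end

(* Word-count style mapper: split each record on spaces and emit one
   "word\t1" record per non-empty word. *)
let map (rr : record_reader) (rw : record_writer) : unit =
  try
    while true do
      let line = rr#input_record () in
      String.split_on_char ' ' line
      |> List.filter (fun w -> w <> "")
      |> List.iter (fun w -> rw#output_record (w ^ "\t1"))
    done
  with End_of_file -> ()
```

The mapper keeps no state between records, so it can process input of any size in constant memory.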
method extract_key : mapred_env -> mapred_job_config -> string -> string
Extracts the key from a record. This method is always called by first evaluating let f = job#extract_key me jc and then calling f line for each input line. Because of this, initializations can be factored out, as in:

    method extract_key me jc =
      ...; (* init stuff *)
      (fun line -> ... (* real extraction *))

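The following sketches the staged style described above. The mapred_env argument is dropped, and a hypothetical config record stands in for mapred_job_config. The "separator" custom parameter and the tab default are assumptions for illustration. The lookup happens once, when extract_key jc is evaluated, and only the returned closure runs for each line.

```ocaml
(* Hypothetical stand-in for mapred_job_config: only the custom
   parameter lookup used below. *)
type config = { custom : (string * string) list }

(* Staged extract_key: the key is the prefix of the record up to the
   first separator, or the whole record if there is no separator. *)
let extract_key (jc : config) : string -> string =
  (* init stuff: runs once per evaluation of extract_key jc *)
  let sep =
    match List.assoc_opt "separator" jc.custom with
    | Some s when String.length s = 1 -> s.[0]
    | _ -> '\t'
  in
  (* real extraction: runs once per input line *)
  fun line ->
    match String.index_opt line sep with
    | Some i -> String.sub line 0 i
    | None -> line
```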
method partition_of_key : mapred_env -> mapred_job_config -> string -> int
Determines the partition of a key. This can be something simple like fun k -> (Hashtbl.hash k) mod partitions, or something more elaborate. This method is always called by first evaluating let f = job#partition_of_key me jc and then calling f key for each key. Because of this, initializations can be factored out, as in:

    method partition_of_key me jc =
      ...; (* init stuff *)
      (fun key -> ... (* real partitioning *))

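The hash-based partitioning mentioned above can be written in the same staged style. This is a sketch with a hypothetical config record that only carries the number of partitions. Where Plasma actually obtains the partition count is not reproduced here.

```ocaml
(* Hypothetical stand-in for mapred_job_config: only the number of
   partitions is modelled. *)
type config = { partitions : int }

(* Staged partition_of_key: the partition count is read and checked once;
   the returned closure maps each key to a partition in 0 .. n-1. *)
let partition_of_key (jc : config) : string -> int =
  (* init stuff: runs once per evaluation of partition_of_key jc *)
  let n = jc.partitions in
  if n <= 0 then invalid_arg "partition_of_key: partitions must be positive";
  (* real partitioning: Hashtbl.hash is non-negative, so the result of
     mod n lies in 0 .. n-1 *)
  fun key -> Hashtbl.hash key mod n
```

The closure depends only on the key, so every record with a given key lands in the same partition. That property is what allows a single reducer to see all records for that key.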
method reduce : mapred_env ->
mapred_job_config ->
task_info ->
Mapred_io.record_reader -> Mapred_io.record_writer -> unit
The reducer reads all the records of one partition and writes them to an output file.
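Here is a minimal reducer sketch in a word-count style. It reads every "key\tcount" record of its partition, sums the counts per key, and writes one total per key in key order. record_reader and record_writer are simplified stand-in class types with assumed method names, not the real Mapred_io types. The mapred_env, mapred_job_config and task_info arguments are omitted. A hash table is used so that the sketch does not depend on whether the input arrives sorted by key.

```ocaml
(* Simplified stand-ins for Mapred_io.record_reader and
   Mapred_io.record_writer; the method names are assumptions. *)
class type record_reader = object
  (* Returns the next record; raises End_of_file when none are left. *)
  method input_record : unit -> string
end

class type record_writer = object
  method output_record : string -> unit
end

(* Sum "key\tcount" records per key, then write "key\ttotal" records
   sorted by key. *)
let reduce (rr : record_reader) (rw : record_writer) : unit =
  let totals = Hashtbl.create 64 in
  (try
     while true do
       let r = rr#input_record () in
       match String.index_opt r '\t' with
       | None -> () (* records without a tab are skipped in this sketch *)
       | Some i ->
           let key = String.sub r 0 i in
           let n = int_of_string (String.sub r (i + 1) (String.length r - i - 1)) in
           let old = Option.value (Hashtbl.find_opt totals key) ~default:0 in
           Hashtbl.replace totals key (old + n)
     done
   with End_of_file -> ());
  Hashtbl.fold (fun k v acc -> (k, v) :: acc) totals []
  |> List.sort compare
  |> List.iter (fun (k, v) -> rw#output_record (k ^ "\t" ^ string_of_int v))
```

The hash table keeps one entry per distinct key in the partition. If the input were known to be sorted by key, a streaming version could emit each total as soon as the key changes and use constant memory.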
This web site is published by Informatikbüro Gerd Stolpmann