Plasma GitLab Archive
Projects Blog Knowledge

Module Mapred_sched


module Mapred_sched: sig .. end
Scheduler

type plan_config 
val configure_plan : ?keep_temp_files:bool ->
Mapred_def.mapred_job_config ->
Mapred_config.mapred_config -> plan_config
configure_plan jc conf

Parameters:

  • keep_temp_files: if true, temporary files created during the map/reduce execution are not immediately deleted

type plan 
A plan contains:
  • tasks
  • dependencies between tasks
  • whether a task is done or not done

val create_plan : Plasma_client.plasma_cluster -> plan_config -> plan
Creates a plan. The plan is empty.
val bigblock_size : plan -> int
The effective size of bigblocks. This is the value passed to configure_plan (via jc) rounded up to the next multiple of blocks.
val add_inputs : plan -> unit
Add the input files, and create map tasks. This involves the analysis of the block layout of the input files (expensive operation).
val add_map_output : plan ->
int ->
(Mapred_tasks.file_tag * Mapred_tasks.file) list -> Unix.inet_addr -> unit
Add these files as output of this map/emap task identified by this ID

The IP addr points to the machine that executed the map or emap task (which is also the likely storage for the files)

val plan_complete : plan -> bool
Respond whether everything is planned
val complete_inputs : plan -> unit
Declare that no more inputs will be added. This triggers the rest of the graph construction.
val executable_tasks : plan -> Mapred_tasks.task list
Returns all tasks that are executable. This list can be quite long! The order of this list suggests an optimal order of execution (from a "static" point of view)
val hosts : plan -> (string * Unix.inet_addr) list
returns all configured task server hosts
val mark_as_finished : plan -> Mapred_tasks.task -> unit
Marks this task as finished
val mark_as_started : plan ->
Mapred_tasks.task -> Unix.inet_addr -> int -> bool -> unit
Marks this task as started, together with the node where it is running, and the runtime task ID and the misallocation flag
val remove_marks : plan -> Mapred_tasks.task -> unit
Revokes mark_as_started or mark_as_finished
val task_depends_on_list : plan -> Mapred_tasks.task -> Mapred_tasks.task list
Returns the list of tasks a given task depends on
val plan_finished : plan -> bool
Whether everything is done
val n_running : plan -> int
val n_finished : plan -> int
val n_total : plan -> int
Stats of tasks
val avg_running : plan -> float
Average number of running tasks
val cluster : plan -> Plasma_client.plasma_cluster
Just return the cluster passed in
val print_plan : plan -> unit
Debug printing to stdout
val generate_svg : plan -> string
Return the graph as SVG (once plan complete)
val task_stats : plan -> Mapred_tasks.task -> int * int
Returns:
  • number of input files
  • number of non-local input files
Raise Not_found if the task has never been started
This web site is published by Informatikbüro Gerd Stolpmann
Powered by Caml