Plasma GitLab Archive

Module Mapred_sched

module Mapred_sched: sig .. end

type plan_config 
val configure_plan : ?keep_temp_files:bool -> string list -> plan_config
configure_plan hostlist: Creates a plan configuration for the task server hosts named in hostlist.
type plan 
A plan contains:
  • tasks
  • dependencies between tasks
  • whether a task is done or not done
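The three ingredients listed above can be illustrated with a small stand-alone model. This is only a sketch: the real plan type is abstract, and the names toy_task, toy_plan, add_task, mark_finished and runnable are hypothetical and not part of Mapred_sched.

```ocaml
(* Toy model only: the real Mapred_sched.plan type is abstract, and all
   names in this block are hypothetical. *)
module IntSet = Set.Make (Int)

type toy_task = {
  id : int;          (* hypothetical task identifier *)
  deps : IntSet.t;   (* ids of the tasks this task depends on *)
}

type toy_plan = {
  tasks : (int, toy_task) Hashtbl.t;   (* all known tasks, by id *)
  mutable finished : IntSet.t;         (* ids of the tasks that are done *)
}

let create () = { tasks = Hashtbl.create 16; finished = IntSet.empty }

(* Registers a task together with the ids it depends on. *)
let add_task plan id deps =
  Hashtbl.replace plan.tasks id { id; deps = IntSet.of_list deps }

let mark_finished plan id =
  plan.finished <- IntSet.add id plan.finished

(* A task is runnable when it is not finished and all of its dependencies
   are finished. The result is sorted by id. *)
let runnable plan =
  Hashtbl.fold
    (fun id t acc ->
      if (not (IntSet.mem id plan.finished))
         && IntSet.subset t.deps plan.finished
      then id :: acc
      else acc)
    plan.tasks []
  |> List.sort compare
```

In this model, a task becomes runnable as soon as the last of its dependencies is marked as finished.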

val create_plan : Plasma_client.plasma_cluster -> plan_config -> plan
Creates a plan. The plan is empty.
val add_inputs : plan -> Mapred_def.mapred_job -> unit
Adds the input files and creates the map tasks. This requires analyzing the block layout of the input files, which is an expensive operation.
val add_map_output : plan -> Mapred_def.mapred_job -> Mapred_tasks.map_task -> Mapred_tasks.file list -> unit
Adds these files as the output of this map task.
val plan_complete : plan -> bool
Returns whether everything is planned.
val complete_inputs : plan -> Mapred_def.mapred_job -> unit
Declare that no more inputs will be added. This triggers the rest of the graph construction.
val executable_tasks : plan -> Mapred_tasks.task list
Returns all tasks that are executable. This list can be quite long!
val hosts : plan -> (string * Unix.inet_addr) list
Returns all configured task server hosts.
val mark_as_finished : plan -> Mapred_tasks.task -> unit
Marks this task as finished.
val mark_as_started : plan -> Mapred_tasks.task -> unit
Marks this task as started.
val task_depends_on_list : plan -> Mapred_tasks.task -> Mapred_tasks.task list
Returns the list of tasks a given task depends on
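A function of this shape returns only the direct dependencies. As a sketch, the transitive dependencies can be collected with a small traversal. The helper transitive_deps is hypothetical; deps_of stands in for a function such as task_depends_on_list plan, and the code assumes that tasks can be compared with structural equality.

```ocaml
(* [deps_of] stands in for a function like [task_depends_on_list plan].
   Assumption: values of type 'a can be compared with structural
   equality (this is what List.mem uses). *)
let transitive_deps (deps_of : 'a -> 'a list) (start : 'a) : 'a list =
  let rec visit seen = function
    | [] -> seen
    | t :: rest ->
        if List.mem t seen then visit seen rest
        else
          (* Record t, then look at its own dependencies before the rest. *)
          visit (t :: seen) (deps_of t @ rest)
  in
  (* [seen] is built in reverse order of discovery. *)
  List.rev (visit [] (deps_of start))
```

Because already visited tasks are skipped, the traversal terminates even if a task is reachable along several paths.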
val plan_finished : plan -> bool
Returns whether all tasks of the plan are finished.
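The functions executable_tasks, mark_as_started, mark_as_finished and plan_finished suggest a simple scheduling loop. The following is a minimal sketch under stated assumptions, not the scheduler Plasma actually uses: it runs one task at a time, whereas a real job would distribute tasks over the task servers. The module type SCHED copies a subset of the signatures above with abstract types so that the code compiles without the Plasma libraries. Driver, run_all and run_task are hypothetical names.

```ocaml
(* Subset of the documented interface, with abstract types so that the
   sketch compiles without the Plasma libraries. *)
module type SCHED = sig
  type plan
  type task
  val executable_tasks : plan -> task list
  val mark_as_started : plan -> task -> unit
  val mark_as_finished : plan -> task -> unit
  val plan_finished : plan -> bool
end

module Driver (S : SCHED) = struct
  (* [run_task] is a hypothetical callback that executes one task
     synchronously. The function returns the number of tasks it ran. *)
  let run_all ~(run_task : S.task -> unit) (plan : S.plan) : int =
    let count = ref 0 in
    let rec loop () =
      if not (S.plan_finished plan) then
        match S.executable_tasks plan with
        | [] ->
            (* Nothing can run although work remains: give up. *)
            failwith "no executable task, but the plan is not finished"
        | task :: _ ->
            S.mark_as_started plan task;
            run_task task;
            S.mark_as_finished plan task;
            incr count;
            loop ()
    in
    loop ();
    !count
end
```

To try this with the real module, one would presumably need a small wrapper that includes Mapred_sched and adds type task = Mapred_tasks.task; that wrapper is not shown here.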
val n_running : plan -> int
val n_finished : plan -> int
val n_total : plan -> int
Task statistics: the number of running tasks, of finished tasks, and of all tasks in the plan.
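These three counters are enough for a simple progress display. The helper below is a hypothetical sketch, not part of Mapred_sched. It takes plain integers so that it does not depend on the Plasma libraries, and it reports an empty plan as 100% finished, which is an arbitrary choice.

```ocaml
(* Formats counters such as those returned by n_running, n_finished and
   n_total. Hypothetical helper; an empty plan counts as 100% finished. *)
let progress_line ~running ~finished ~total =
  let pct =
    if total = 0 then 100.0
    else 100.0 *. float_of_int finished /. float_of_int total
  in
  Printf.sprintf "%d/%d finished (%.1f%%), %d running"
    finished total pct running
```

With a plan at hand, the arguments would come from n_running plan, n_finished plan and n_total plan.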
val cluster : plan -> Plasma_client.plasma_cluster
Returns the cluster that was passed to create_plan.
val print_plan : plan -> unit
Prints the plan to stdout for debugging.
This web site is published by Informatikbüro Gerd Stolpmann