
Module Mapred_sched


module Mapred_sched: sig .. end
Scheduler

type plan_config 
val configure_plan : ?keep_temp_files:bool ->
Mapred_def.mapred_job_config ->
Mapred_config.mapred_config -> plan_config
configure_plan jc conf: creates a plan configuration from the job configuration jc and the map/reduce configuration conf.

Parameters:

  • keep_temp_files: if true, temporary files created during the map/reduce execution are not immediately deleted

type plan 
A plan contains:
  • tasks
  • dependencies between tasks
  • whether a task is done or not done
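The three ingredients listed above can be illustrated with a small toy model. This is only a sketch: the type plan is abstract, so the record below is not its real representation, and the names toy_plan, make, status_of, executable and finished are made up for this illustration. Tasks are plain strings here instead of Mapred_tasks.task values.

```ocaml
(* Toy model of a plan; NOT the real Mapred_sched.plan representation. *)
type status = Pending | Started | Finished

type toy_plan = {
  deps : (string * string list) list;   (* task -> tasks it depends on *)
  status : (string, status) Hashtbl.t;  (* task -> current mark *)
}

(* Build a toy plan in which every task is initially pending. *)
let make deps =
  let status = Hashtbl.create 16 in
  List.iter (fun (t, _) -> Hashtbl.replace status t Pending) deps;
  { deps; status }

let status_of p t =
  try Hashtbl.find p.status t with Not_found -> Pending

(* A task is executable if it is still pending and all of its
   dependencies are finished. *)
let executable p =
  p.deps
  |> List.filter (fun (t, ds) ->
         status_of p t = Pending
         && List.for_all (fun d -> status_of p d = Finished) ds)
  |> List.map fst

(* The toy plan is finished when every task is finished. *)
let finished p =
  List.for_all (fun (t, _) -> status_of p t = Finished) p.deps
```

In this model, a task that depends on two other tasks becomes executable only after both of them have been marked as finished.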

val create_plan : Plasma_client.plasma_cluster -> plan_config -> plan
Creates a plan. The plan is empty.
val bigblock_size : plan -> int
The effective size of bigblocks. This is the value passed to configure_plan (via jc), rounded up to the next multiple of the block size.
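The rounding can be written as a one-line integer computation. The following is a sketch of the arithmetic only, assuming that "multiple of blocks" refers to the filesystem block size; the function round_up_to_multiple and its parameter names are illustrative and not part of Mapred_sched.

```ocaml
(* Round [size] up to the next multiple of [blocksize].
   Both names are illustrative. [blocksize] must be positive and
   [size] is assumed to be non-negative. *)
let round_up_to_multiple ~blocksize size =
  if blocksize <= 0 then invalid_arg "round_up_to_multiple";
  ((size + blocksize - 1) / blocksize) * blocksize
```

For example, with a block size of 1024, a configured value of 2500 would round up to 3072, while 2048 would stay unchanged.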
val add_inputs : plan -> unit
Adds the input files and creates the map tasks. This involves analyzing the block layout of the input files, which is an expensive operation.
val add_map_output : plan -> Mapred_tasks.map_task -> Mapred_tasks.file list -> unit
Adds these files as the output of this map task.
val plan_complete : plan -> bool
Returns whether everything is planned.
val complete_inputs : plan -> unit
Declare that no more inputs will be added. This triggers the rest of the graph construction.
val executable_tasks : plan -> Mapred_tasks.task list
Returns all tasks that are executable. This list can be quite long! The order of the list suggests an optimal order of execution (from a "static" point of view).
val hosts : plan -> (string * Unix.inet_addr) list
Returns all configured task server hosts.
val mark_as_finished : plan -> Mapred_tasks.task -> unit
Marks this task as finished
val mark_as_started : plan -> Mapred_tasks.task -> unit
Marks this task as started
val remove_marks : plan -> Mapred_tasks.task -> unit
Revokes a previous mark_as_started or mark_as_finished for this task.
val task_depends_on_list : plan -> Mapred_tasks.task -> Mapred_tasks.task list
Returns the list of tasks a given task depends on
val plan_finished : plan -> bool
Returns whether everything is done.
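The functions executable_tasks, mark_as_started, mark_as_finished and plan_finished suggest a simple driver loop. The sketch below is a sequential illustration, not how the real job runner works (which presumably dispatches many tasks to task servers in parallel). It is written as a functor over a hypothetical signature SCHED that repeats only the four documented functions with an abstract task type, so it compiles without Plasma. It also assumes that executable_tasks no longer returns a task once that task has been marked as started, which the documentation does not state explicitly. The callback run_task is hypothetical.

```ocaml
(* The subset of the interface used by the loop. The signature mirrors
   the documented functions, with the task type made abstract. *)
module type SCHED = sig
  type plan
  type task
  val executable_tasks : plan -> task list
  val mark_as_started : plan -> task -> unit
  val mark_as_finished : plan -> task -> unit
  val plan_finished : plan -> bool
end

module Driver (S : SCHED) = struct
  (* Run tasks one at a time until the plan is finished.
     [run_task] is a hypothetical callback that executes a task
     synchronously. Raises [Failure] if no task is executable
     although the plan is not finished. *)
  let rec run ~run_task plan =
    if not (S.plan_finished plan) then
      match S.executable_tasks plan with
      | [] -> failwith "no executable task, but plan not finished"
      | task :: _ ->
          (* Take the first task, following the suggested order. *)
          S.mark_as_started plan task;
          run_task task;
          S.mark_as_finished plan task;
          run ~run_task plan
end
```

To apply the functor to the real module, the argument would additionally have to define type task = Mapred_tasks.task. A production loop would also need error handling, for instance calling remove_marks when a task fails so that the task can be scheduled again.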
val n_running : plan -> int
val n_finished : plan -> int
val n_total : plan -> int
Task statistics: the number of running, finished, and total tasks, respectively.
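These counters lend themselves to a simple progress display. The helper below is a hypothetical sketch and not part of Mapred_sched; it takes the three numbers as labeled arguments, so a caller would pass the results of n_running, n_finished and n_total for a given plan.

```ocaml
(* Format a progress line from the three counters. The arguments
   correspond to n_running, n_finished and n_total of a plan.
   An empty plan (total = 0) is reported as 100% by convention. *)
let progress_line ~running ~finished ~total =
  let percent = if total = 0 then 100 else finished * 100 / total in
  Printf.sprintf "%d/%d tasks finished (%d%%), %d running"
    finished total percent running
```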
val cluster : plan -> Plasma_client.plasma_cluster
Returns the cluster that was passed to create_plan.
val print_plan : plan -> unit
Prints the plan to stdout for debugging.
This web site is published by Informatikbüro Gerd Stolpmann