module Mapred_sched: sig .. end
type plan_config
val configure_plan : ?keep_temp_files:bool ->
planning_capacity:float ->
internal_suffix:string ->
output_suffix:string ->
Mapred_def.mapred_job_config ->
Mapred_config.mapred_config -> plan_config
configure_plan jc conf
Parameters:
keep_temp_files: if true, temporary files created during the map/reduce execution are not immediately deleted.
planning_capacity: how many cores are available in total. This parameter can be retrieved at runtime via Mapred_job_exec.planning_capacity.
internal_suffix: the filename suffix added to the names of intermediate files.
output_suffix: the filename suffix added to the names of final files.
type plan
val create_plan : ?dn_identities:string list ->
Mapred_fs.filesystem -> plan_config -> plan
dn_identities: an optional list of datanode identities. These are used in some circumstances as preferences for data blocks (currently only for the output of emap jobs).
val bigblock_size : plan -> int
The bigblock size as given to configure_plan (via jc), rounded up to the next multiple of blocks.
val add_inputs : plan -> unit
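The rounding mentioned for bigblock_size can be illustrated with a small sketch. This is not the library's code: round_up_to_blocks and the sizes used here are hypothetical, and the sketch assumes both values are non-negative byte counts.

```ocaml
(* Hypothetical helper, not part of Mapred_sched: round [size] up to the
   next multiple of [blocksize]. Assumes size >= 0 and blocksize > 0. *)
let round_up_to_blocks ~blocksize size =
  if blocksize <= 0 then invalid_arg "round_up_to_blocks: blocksize";
  ((size + blocksize - 1) / blocksize) * blocksize

let () =
  (* Illustrative values only: a requested size of 10_000_000 bytes with
     blocks of 1_048_576 bytes is rounded up to 10 whole blocks. *)
  Printf.printf "%d\n" (round_up_to_blocks ~blocksize:1_048_576 10_000_000)
```

A size that is already a multiple of the block size is returned unchanged.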
val add_map_output : plan ->
int ->
(Mapred_tasks.file_tag * Mapred_tasks.file) list -> Unix.inet_addr -> unit
The IP address identifies the machine that executed the map or emap task
(which is also the likely storage location for the files).
val plan_complete : plan -> bool
val complete_inputs : plan -> unit
val executable_tasks : plan -> Mapred_tasks.task list
val hosts : plan -> (string * Unix.inet_addr) list
val mark_as_finished : plan -> Mapred_tasks.task -> unit
val mark_as_started : plan ->
Mapred_tasks.task -> Unix.inet_addr -> int -> bool -> unit
val remove_marks : plan -> Mapred_tasks.task -> unit
Removes the marks set by mark_as_started or mark_as_finished.
val task_depends_on_list : plan -> Mapred_tasks.task -> Mapred_tasks.task list
val plan_finished : plan -> bool
val n_running : plan -> int
val n_finished : plan -> int
val n_total : plan -> int
val avg_running : plan -> float
val avg_runnable : plan -> float
val avg_runqueue : plan -> float
val round_points : plan -> Mapred_tasks.task -> float
val greediness_points : plan -> Mapred_tasks.task -> float
If the (imagined) completion of the task t would enable other tasks to run, t gets as many points as tasks would become newly runnable. The idea here is to prefer tasks that make the most other tasks runnable ("greediness").
If t does not yet make a task u runnable, but just fulfills one more precondition among others, t does not get a whole point for u but just the fraction r/n, where r is the number of fulfilled preconditions of u and n is the number of all preconditions of u.
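The scoring rule above can be sketched on a toy dependency table. This is only an illustration under stated assumptions, not the implementation in Mapred_sched: tasks are plain strings, the table preconds and the function toy_greediness_points are hypothetical, and r is assumed to count t itself as fulfilled (the text leaves this open).

```ocaml
(* Toy model: each entry maps a task u to the tasks that must finish
   before u can run. All names here are hypothetical. *)
let preconds = [
  "u1", ["t"];            (* waits only for t *)
  "u2", ["t"; "a"; "b"];  (* waits for t, a and b *)
  "u3", ["a"];            (* does not depend on t *)
]

(* Score task [t], given the list [finished] of already completed tasks.
   For every u that depends on t, add r/n, where n is the number of
   preconditions of u and r is the number fulfilled once t is imagined
   complete. When r = n this adds a whole point, since u becomes runnable. *)
let toy_greediness_points ~finished t =
  List.fold_left
    (fun acc (_u, pre) ->
       if not (List.mem t pre) then acc
       else begin
         let n = List.length pre in
         let fulfilled p = p = t || List.mem p finished in
         let r = List.length (List.filter fulfilled pre) in
         acc +. float_of_int r /. float_of_int n
       end)
    0.0 preconds

let () =
  (* With "a" finished, t earns a whole point for u1 and 2/3 for u2. *)
  Printf.printf "%.3f\n" (toy_greediness_points ~finished:["a"] "t")
```

Under these assumptions a task that unblocks several successors outright scores higher than one that only contributes partial progress to them.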
val print_plan : Netchannels.out_obj_channel -> plan -> unit
val generate_svg : plan -> string
val task_stats : plan -> Mapred_tasks.task -> int * int
Raises Not_found if the task has never been started.