
Class type Mapred_def.sorter

class type sorter = object .. end
A sorter is filled with records. Once the sorter is full, the records are sorted by key and written to a destination.

The amount of data that can be put into a sorter is limited by max_sort_size. It is the task of the caller to stick to this limit.

Informational methods

method name : string
The name of the sorter, for debugging
method eff_sort_limit : int
The sort limit after applying the factor, in bytes
method sort_lines_limit : int -> int
Maximum number of lines for a given sort limit in bytes


method set_key_extraction : (string -> int * int) -> unit
Sets the function for extracting the key. This is normally set to what extract_key returns, see below.
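
As an illustration of the expected shape, here is a minimal sketch of a key extraction function. It assumes that the returned pair is (key_index, key_len), matching the arguments of hash and cmp, and that the key is the part of the record before the first TAB character. The name extract_tab_key and the TAB convention are hypothetical and not part of Mapred_def.

```ocaml
(* Hypothetical key extractor (not part of Mapred_def): the key is
   everything before the first TAB character, or the whole record if
   there is no TAB. Returns the assumed pair (key_index, key_len). *)
let extract_tab_key (record : string) : int * int =
  match String.index_opt record '\t' with
  | Some pos -> (0, pos)
  | None -> (0, String.length record)
```

Such a function could then be passed as the argument of set_key_extraction on a sorter object.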

Controlling methods

method put_records : string Queue.t -> unit
Hands over the records in the queue to the sorter. The queue is empty when the method returns.
method sort : Mapred_io.record_writer -> unit
Sorts the records, and writes them to the passed writer. After sort, the sorter is cleared, and can be used for new data.
method sort_time : float
Returns the duration of the last sort in seconds, not including the time for writing
method close : unit -> unit
Deallocates resources
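
The call sequence described above (fill, sort, reuse) can be sketched with a toy in-memory stand-in. This is not the Plasma implementation: the class toy_sorter is hypothetical, and the writer is modelled as a plain string -> unit callback because Mapred_io.record_writer is not described on this page.

```ocaml
(* Toy stand-in, NOT the Plasma sorter: it only mimics the documented
   behaviour of put_records and sort. The writer is a plain callback. *)
class toy_sorter =
  object
    val mutable records : string list = []
    val mutable extract : string -> int * int =
      fun r -> (0, String.length r)

    method set_key_extraction f = extract <- f

    (* Takes over all records; the queue is empty on return. *)
    method put_records (q : string Queue.t) =
      Queue.iter (fun r -> records <- r :: records) q;
      Queue.clear q

    (* Sorts by key, writes every record, then clears the sorter. *)
    method sort (write : string -> unit) =
      let key r = let (i, l) = extract r in String.sub r i l in
      let sorted =
        List.stable_sort (fun a b -> compare (key a) (key b)) records in
      List.iter write sorted;
      records <- []
  end

(* Usage: fill the sorter from a queue, then collect the sorted output. *)
let sorted_output =
  let s = new toy_sorter in
  let q = Queue.create () in
  List.iter (fun r -> Queue.add r q) ["pear\t1"; "apple\t2"; "fig\t3"];
  s#put_records q;
  assert (Queue.is_empty q);
  let out = ref [] in
  s#sort (fun r -> out := r :: !out);
  List.rev !out
```

A real sorter would additionally have to respect the size limit and release its resources with close.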

Exposition of the ordering

method hash : string -> int -> int -> int
hash record key_index key_len: Takes key = String.sub record key_index key_len as key, and returns the hash value. (If the sorter does not work with hashes, the method must return 0 instead.)

Hash values must be between 0 and 2^30-1 (i.e. 30 bits max).
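
A minimal sketch of a function with the signature of hash that respects the 30-bit range. The polynomial scheme and the multiplier 31 are illustrative assumptions, not the hash used by any actual Plasma sorter; the name hash_key is hypothetical.

```ocaml
(* Illustrative hash over the key bytes
   record.[key_index] .. record.[key_index + key_len - 1].
   The multiplier 31 is an arbitrary choice for this sketch. *)
let hash_key (record : string) (key_index : int) (key_len : int) : int =
  let h = ref 0 in
  for k = key_index to key_index + key_len - 1 do
    (* Mask after every step so the value stays within 0 .. 2^30-1. *)
    h := (!h * 31 + Char.code record.[k]) land 0x3FFF_FFFF
  done;
  !h
```

Because only the key bytes are read, the same key yields the same hash value regardless of where it sits inside the record.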

method cmp : string -> int -> int -> string -> int -> int -> int
cmp s1 i1 l1 s2 i2 l2: Compare two keys as the sorter does. The first key is passed in by the string s1 and the start index i1 and the length l1. The other key is made available via s2, i2, and l2.

cmp is only called if cmphash returns 0.

It is essential that cmp does not store the passed strings s1 and s2 anywhere.
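
A minimal sketch of a comparison with the signature of cmp. It assumes a bytewise lexicographic ordering, which is only one possible choice; an actual sorter defines its own ordering. The name cmp_keys is hypothetical. The function reads the keys in place, so it neither allocates substrings nor keeps references to s1 and s2.

```ocaml
(* Illustrative bytewise lexicographic comparison of two key ranges.
   Returns a negative number, 0, or a positive number. *)
let cmp_keys (s1 : string) (i1 : int) (l1 : int)
             (s2 : string) (i2 : int) (l2 : int) : int =
  let n = min l1 l2 in
  let rec loop k =
    if k = n then
      (* Common prefix is equal: the shorter key sorts first. *)
      compare l1 l2
    else
      let c = Char.compare s1.[i1 + k] s2.[i2 + k] in
      if c <> 0 then c else loop (k + 1)
  in
  loop 0
```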

This web site is published by Informatikbüro Gerd Stolpmann