module Seq:sig
..end
A sequence is always bound to a store. The store is the entity which is modified by sequence operations. The sequence as such behaves as if it were immutable (like a list).
All the following operations are only executed locally (no
distribution).
type ('a, 'b)
seq
(el,mode) seq
contains elements of type el
,
and can be accessed according to mode
:mode=[`R]
: the sequence is read-onlymode=[`W]
: the sequence is write-onlymode=[`R|`W]
: the sequence is read-writeexception End_of_sequence
val read : 'a Mapred_toolkit.Store.store -> ('a, [ `R ]) seq
This is not allowed for write-only stores!
val extend : 'a Mapred_toolkit.Store.store -> ('a, [ `W ]) seq
val reread : ('a, 'b) seq -> ('a, [ `R ]) seq
Store.read
val store : ('a, 'b) seq -> 'a Mapred_toolkit.Store.store
val notebook : unit -> ('a, [ `R | `W ]) seq
val is_empty : ('a, [> `R ]) seq -> bool
val hd : ('a, [> `R ]) seq -> 'a
val tl : ('a, [> `R ] as 'b) seq -> ('a, 'b) seq
End_of_sequence
if applied to an empty sequenceval add : 'a ->
('a, [> `W ] as 'b) seq -> ('a, 'b) seq
let s2 = add x s1
: Adds the element to the underlying store,
and returns a new sequence where the new element is visible at
the end. This means the last element of s2
is now x
. The
view provided by s1
does not change.
It is a requirement that the last element of s1
is the last
element of the underlying store. Otherwise the add fails.
val addl : 'a list ->
('a, [> `W ] as 'b) seq -> ('a, 'b) seq
val append : ('a, [> `W ] as 'b) seq ->
('a, [> `R ]) seq -> ('a, 'b) seq
let s3 = append s1 s2
: Adds all elements visible via s2
to the
end of the store of s1
, and returns a new seq s3
where the
new elements are visible. The views provided by s1
and s2
do
not change.val flush : ('a, 'b) seq -> unit
val length : ('a, 'b) seq -> int
val length64 : ('a, 'b) seq -> int64
val hdl : int -> ('a, [> `R ]) seq -> 'a list
val iter : ('a -> unit) -> ('a, [> `R ]) seq -> unit
iter f s
: iterates over the elements of s
and calls f x
for
every element x
val fold : ('a -> 'b -> 'a) -> 'a -> ('b, [> `R ]) seq -> 'a
fold f a0 s
: if the sequence contains the elements
[x0; x1; ...; x(n-1)]
this function computes
f (f (f a0 x0) x1 ... ) x(n-1)
val map : ('a -> 'b) ->
('a, [> `R ]) seq ->
('b, [> `W ] as 'c) seq -> ('b, 'c) seq
let s3 = map f s1 s2
: Maps the elements of s1
and adds them
to s2
val mapl : ('a -> 'b list) ->
('a, [> `R ]) seq ->
('b, [> `W ] as 'c) seq -> ('b, 'c) seq
let s3 = map f s1 s2
: Maps the elements of s1
to lists and adds them
to s2
val sort : hash:('a -> int) ->
cmp:('a -> 'a -> int) ->
('a, [> `R ]) seq ->
('a, [> `W ] as 'b) seq -> ('a, 'b) seq
sort ~hash ~cmp s1 s2
: Sorts the elements of s1
and appends
the sorted result to s2
, and returns this sequence.
It is first sorted by the value hash x
of each element of s1
.
If this results in the same value for two elements x
and y
,
these elements are directly compared with cmp x y
.
The function hash
may return a non-negative 30 bit wide integer.
cmp
can return 0 meaning equality, a positive number meaning
that that the first argument is bigger, or a negative number
meaning that the second argument is bigger.
It is required that cmp x y = 0
implies hash x = hash y
.
Currently, only in-memory sorting is implemented.
val mapl_sort_fold : mapl:('a -> 'b list) ->
hash:('b -> int) ->
cmp:('b -> 'b -> int) ->
initfold:(int -> 'c) ->
fold:('c -> 'b -> 'c * 'd list) ->
?finfold:('c -> 'd list) ->
partition_of:('b -> int) ->
partitions:int ->
'a Mapred_toolkit.Place.t ->
'd Mapred_toolkit.Place.t -> ('d, [ `W ]) seq list
mapl_sort_fold ... p_in p_out
:
This is a locally running version of Mapred_toolkit.DSeq.mapl_sort_fold
,
useful for testing map/reduce programs on small amounts of
data. The following description is formal; for an easier
to understand version look at the tutorial Plasmamr_toolkit
.
The function returns a list of sequences
[s0; s1; ...; s(P-1)]
where P=partitions
. Each s(k)
is the result of applying
a fold pass to p(k)
which is defined below. The list
[p0; p1; ...; p(P-1)]
are the sequences containing the sorted and mapped input,
split into P partitions. In particular, the partition
p(k)
contains the sorted elements x
where
partition_of x = k
. The elements x
are taken from
the set M = { mapl(y) | y in p_in}
.
The sorting criterion is defined by hash
and cmp
. In order
to decide the order of the elements x
of p(k)
, the elements
are first compared by hash x
. If for two elements x1
and
x2
the value hash x1
is smaller than hash x2
, x1
is
sorted before x2
. If hash x1
is bigger than hash x2
,
x1
is sorted after x2
. If the hash values are identical,
the function cmp x1 x2
is called, and if a negative value
is returned, x1
is sorted before x2
, and if a positive
value is returned, x1
is sorted after x2
. The result 0
means that x1
and x2
are considered equal.
hash
must return non-negative 30 bit integers.
It is required that cmp x y = 0
implies hash x = hash y
.
The fold pass is defined as follows: The sequence s(k)
is computed from p(k)
by the following scheme. Assumed
that
s(k) = [x0; x1; ...; x(n-1)]
we define
p(k) = out0 @ out1 @ ... @ out(n)
where the out(i)
part lists are defined by:
let a0 = initfold k
let (a1, out0) = fold a0 x0
let (a2, out1) = fold a1 x1
...
let (a(n), out(n-1)) = fold a(n-1) x(n-1)
let out(n) = finfold a(n)
If finfold
is not passed as argument, it is assumed to
be finfold = fun _ -> []
.
As a side effect, for each output sequence s(k)
a new
file is created in pl_out
.