module Seq:`sig`

..`end`

A sequence is a view on a store. It is possible to iterate over the elements of the sequence in normal order. One can also add new elements to the end of a sequence, and get a new sequence where the new elements are found at the end.

A sequence is always bound to a store. The store is the entity which is modified by sequence operations. The sequence as such behaves as if it were immutable (like a list).

All the following operations are only executed locally (no
distribution).

`type ``('a, 'b)`

seq

A sequence

`(el,mode) seq`

contains elements of type `el`

,
and can be accessed according to `mode`

:`mode=[`R]`

: the sequence is read-only`mode=[`W]`

: the sequence is write-only`mode=[`R|`W]`

: the sequence is read-write

`exception End_of_sequence`

`val read : ``'a Mapred_toolkit.Store.store -> ('a, [ `R ]) seq`

Returns the sequence view such that the first element of the sequence
is the first element of the store, i.e. reads the store from the
beginning.

This is not allowed for write-only stores!

`val extend : ``'a Mapred_toolkit.Store.store -> ('a, [ `W ]) seq`

Returns the sequence view on the end of the store. The returned seq
is always empty. By adding new elements to the sequence, new elements
are added to the store, though.

`val reread : ``('a, 'b) seq -> ('a, [ `R ]) seq`

reads the elements from the store obtained by calling

`Store.read`

`val store : ``('a, 'b) seq -> 'a Mapred_toolkit.Store.store`

The underlying store of the sequence

`val notebook : ``unit -> ('a, [ `R | `W ]) seq`

creates an in-memory sequence (not linked to a file)

`val is_empty : ``('a, [> `R ]) seq -> bool`

tests whether the sequence is empty

`val hd : ``('a, [> `R ]) seq -> 'a`

returns the head element of the sequence

`val tl : ``('a, [> `R ] as 'b) seq -> ('a, 'b) seq`

returns the tail of the sequence

hd and tl raise

`End_of_sequence`

if applied to an empty sequence`val add : ``'a ->`

('a, [> `W ] as 'b) seq -> ('a, 'b) seq

`let s2 = add x s1`

: Adds the element to the underlying store,
and returns a new sequence where the new element is visible at
the end. This means the last element of `s2`

is now `x`

. The
view provided by `s1`

does not change.
It is a requirement that the last element of `s1`

is the last
element of the underlying store. Otherwise the add fails.

`val addl : ``'a list ->`

('a, [> `W ] as 'b) seq -> ('a, 'b) seq

adds several elements

`val append : ``('a, [> `W ] as 'b) seq ->`

('a, [> `R ]) seq -> ('a, 'b) seq

`let s3 = append s1 s2`

: Adds all elements visible via `s2`

to the
end of the store of `s1`

, and returns a new seq `s3`

where the
new elements are visible. The views provided by `s1`

and `s2`

do
not change.`val flush : ``('a, 'b) seq -> unit`

Flushes the underlying store

`val length : ``('a, 'b) seq -> int`

`val length64 : ``('a, 'b) seq -> int64`

`val hdl : ``int -> ('a, [> `R ]) seq -> 'a list`

returns the first up to n elements as list

`val iter : ``('a -> unit) -> ('a, [> `R ]) seq -> unit`

`iter f s`

: iterates over the elements of `s`

and calls `f x`

for
every element `x`

`val fold : ``('a -> 'b -> 'a) -> 'a -> ('b, [> `R ]) seq -> 'a`

`fold f a0 s`

: if the sequence contains the elements
`[x0; x1; ...; x(n-1)]`

this function computes
```
f (f (f a0 x0) x1 ... ) x(n-1)
```

`val map : ``('a -> 'b) ->`

('a, [> `R ]) seq ->

('b, [> `W ] as 'c) seq -> ('b, 'c) seq

`let s3 = map f s1 s2`

: Maps the elements of `s1`

and adds them
to `s2`

`val mapl : ``('a -> 'b list) ->`

('a, [> `R ]) seq ->

('b, [> `W ] as 'c) seq -> ('b, 'c) seq

`let s3 = map f s1 s2`

: Maps the elements of `s1`

to lists and adds them
to `s2`

`val sort : ``hash:('a -> int) ->`

cmp:('a -> 'a -> int) ->

('a, [> `R ]) seq ->

('a, [> `W ] as 'b) seq -> ('a, 'b) seq

`sort ~hash ~cmp s1 s2`

: Sorts the elements of `s1`

and appends
the sorted result to `s2`

, and returns this sequence.
It is first sorted by the value `hash x`

of each element of `s1`

.
If this results in the same value for two elements `x`

and `y`

,
these elements are directly compared with `cmp x y`

.

The function `hash`

may return a non-negative 30 bit wide integer.
`cmp`

can return 0 meaning equality, a positive number meaning
that that the first argument is bigger, or a negative number
meaning that the second argument is bigger.

It is required that `cmp x y = 0`

implies `hash x = hash y`

.

Currently, only in-memory sorting is implemented.

`val mapl_sort_fold : ``mapl:('a -> 'b list) ->`

hash:('b -> int) ->

cmp:('b -> 'b -> int) ->

initfold:(int -> 'c) ->

fold:('c -> 'b -> 'c * 'd list) ->

?finfold:('c -> 'd list) ->

partition_of:('b -> int) ->

partitions:int ->

'a Mapred_toolkit.Place.t ->

'd Mapred_toolkit.Place.t -> ('d, [ `W ]) seq list

`mapl_sort_fold ... p_in p_out`

:
This is a locally running version of `Mapred_toolkit.DSeq.mapl_sort_fold`

,
useful for testing map/reduce programs on small amounts of
data. The following description is formal; for an easier
to understand version look at the tutorial `Plasmamr_toolkit`

.

The function returns a list of sequences

` [s0; s1; ...; s(P-1)] `

where `P=partitions`

. Each `s(k)`

is the result of applying
a fold pass to `p(k)`

which is defined below. The list

` [p0; p1; ...; p(P-1)] `

are the sequences containing the sorted and mapped input,
split into P partitions. In particular, the partition
`p(k)`

contains the sorted elements `x`

where
`partition_of x = k`

. The elements `x`

are taken from
the set `M = { mapl(y) | y in p_in} `

.

The sorting criterion is defined by `hash`

and `cmp`

. In order
to decide the order of the elements `x`

of `p(k)`

, the elements
are first compared by `hash x`

. If for two elements `x1`

and
`x2`

the value `hash x1`

is smaller than `hash x2`

, `x1`

is
sorted before `x2`

. If `hash x1`

is bigger than `hash x2`

,
`x1`

is sorted after `x2`

. If the hash values are identical,
the function `cmp x1 x2`

is called, and if a negative value
is returned, `x1`

is sorted before `x2`

, and if a positive
value is returned, `x1`

is sorted after `x2`

. The result 0
means that `x1`

and `x2`

are considered equal.

`hash`

must return non-negative 30 bit integers.
It is required that `cmp x y = 0`

implies `hash x = hash y`

.

The fold pass is defined as follows: The sequence `s(k)`

is computed from `p(k)`

by the following scheme. Assumed
that

` s(k) = [x0; x1; ...; x(n-1)] `

we define

` p(k) = out0 @ out1 @ ... @ out(n) `

where the `out(i)`

part lists are defined by:

```
let a0 = initfold k
let (a1, out0) = fold a0 x0
let (a2, out1) = fold a1 x1
...
let (a(n), out(n-1)) = fold a(n-1) x(n-1)
let out(n) = finfold a(n)
```

If `finfold`

is not passed as argument, it is assumed to
be `finfold = fun _ -> []`

.

As a side effect, for each output sequence `s(k)`

a new
file is created in `pl_out`

.