Module Mapred_fields

module Mapred_fields: sig .. end
Access to record fields


For line-structured files



Here, separator characters are used to mark where one field ends and the next begins. If, for instance, the record is "foo\tbar", the first field would be "foo" and the second "bar" (using TAB as separator).
val text_field_range : ?pos:int -> ?len:int -> ?sep:char -> ?field:int -> ?num:int -> string -> int * int
let (p,l) = text_field_range s: Extracts a field from s and returns in p the byte position where the field begins and in l its byte length. The contents of the field can then be obtained with String.sub s p l.

By default, the first field is selected.

If several fields are extracted, p is the position where the first field starts, and p+l-1 is the position where the last field ends.

  • pos and len: These arguments restrict the part of s from which fields are extracted
  • sep: The separator character, by default TAB
  • field: Which field to extract. This number counts from 1. Defaults to 1.
  • num: How many fields to extract. Defaults to 1.
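
The range computation described above can be sketched in plain OCaml. This is only an illustration, not the library's implementation: field_range is a hypothetical name, it ignores pos and len, and raising Not_found when s has too few fields is an assumption (the behaviour of text_field_range in that case is not stated here).

```ocaml
(* Hypothetical re-implementation for illustration; not the code of
   Mapred_fields.text_field_range. pos and len are not supported. *)
let field_range ?(sep = '\t') ?(field = 1) ?(num = 1) s =
  if field < 1 || num < 1 then invalid_arg "field_range";
  let n = String.length s in
  (* Position just past the end of the field that starts at [start]. *)
  let field_end start =
    match String.index_from_opt s start sep with
    | Some i -> i
    | None -> n
  in
  (* Skip [k] fields from [start]; return where the next field starts. *)
  let rec skip start k =
    if k = 0 then start
    else
      let e = field_end start in
      (* Assumption: running out of fields is reported with Not_found. *)
      if e >= n then raise Not_found else skip (e + 1) (k - 1)
  in
  (* End of the last of [k] consecutive fields beginning at [start]. *)
  let rec last start k =
    let e = field_end start in
    if k = 1 then e
    else if e >= n then raise Not_found
    else last (e + 1) (k - 1)
  in
  let p = skip 0 (field - 1) in
  let e = last p num in
  (p, e - p)

let () =
  let s = "foo\tbar\tbaz" in
  let (p, l) = field_range ~field:2 s in
  print_endline (String.sub s p l)  (* prints "bar" *)
```

With ~field:2 ~num:2 the same input yields the range (4, 7), which covers "bar\tbaz" including the separator between the two fields.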

val text_field : ?pos:int -> ?len:int -> ?sep:char -> ?field:int -> ?num:int -> string -> string
Like text_field_range, but returns the contents and not the range.
val text_fields_range : ?pos:int -> ?len:int -> ?sep:char -> ?field:int -> ?num:int -> string -> (int * int) list
Like text_field_range, but returns the ranges (p,l) for each field separately.
val text_fields : ?pos:int -> ?len:int -> ?sep:char -> ?field:int -> ?num:int -> string -> string list
Like text_field_range, but returns the contents of each field.

For files with variable-size records



This is a binary encoding of fields. Each field is preceded by a length prefix. The prefix is either a single byte holding a length from 0 to 254, or the byte 255 followed by eight bytes that encode the length in big-endian order.
val var_field_range : ?pos:int -> ?len:int -> ?field:int -> string -> int * int
let (p,l) = var_field_range s: Extracts a field from s and returns in p the byte position where the field begins and in l its byte length. The contents of the field can then be obtained with String.sub s p l.

By default, the first field is selected.

  • pos and len: These arguments restrict the part of s from which fields are extracted
  • field: Which field to extract. This number counts from 1. Defaults to 1.
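
A minimal sketch of how such a range could be found by walking the length prefixes. The names read_length and var_range are hypothetical, pos and len are not handled, and malformed input simply triggers the standard out-of-bounds exception. This is not the library's code.

```ocaml
(* Read the length prefix at [pos]; return (payload position, payload length).
   Hypothetical helper, written from the format description only. *)
let read_length s pos =
  let b = Char.code s.[pos] in
  if b < 255 then (pos + 1, b)
  else begin
    (* 255 marks the long form: eight big-endian length bytes follow.
       OCaml ints are 63 bits wide, so extremely large lengths would overflow. *)
    let len = ref 0 in
    for i = 1 to 8 do
      len := (!len lsl 8) lor Char.code s.[pos + i]
    done;
    (pos + 9, !len)
  end

(* Range (p, l) of the field with number [field], counting from 1. *)
let var_range ?(field = 1) s =
  let rec go pos k =
    let (p, l) = read_length s pos in
    if k = 1 then (p, l) else go (p + l) (k - 1)
  in
  go 0 field

let () =
  (* Two fields, "foo" and "hi", each with a one-byte length prefix. *)
  let s = "\003foo\002hi" in
  let (p, l) = var_range ~field:2 s in
  print_endline (String.sub s p l)  (* prints "hi" *)
```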

val var_field : ?pos:int -> ?len:int -> ?field:int -> string -> string
Like var_field_range, but returns the contents and not the range
val var_fields_range : ?pos:int -> ?len:int -> ?field:int -> ?num:int -> string -> (int * int) list
Like var_field_range, but returns the ranges (p,l) for each field separately
val var_fields : ?pos:int -> ?len:int -> ?field:int -> ?num:int -> string -> string list
Like var_field_range, but returns the contents of each field
val var_concat : string list -> string
Encodes the given fields in this format and returns them as a single string.
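
The encoding direction can be sketched the same way. concat_fields and encode_length are hypothetical names, and choosing the one-byte prefix whenever the length fits into 0-254 is an assumption derived from the format description, not a statement about how var_concat is implemented.

```ocaml
(* Length prefix for a field of [n] bytes.
   Assumption: the short form is used whenever the length fits. *)
let encode_length n =
  if n < 255 then String.make 1 (Char.chr n)
  else begin
    (* Long form: byte 255, then the length as eight big-endian bytes. *)
    let b = Bytes.make 9 '\255' in
    for i = 0 to 7 do
      Bytes.set b (8 - i) (Char.chr ((n lsr (8 * i)) land 0xff))
    done;
    Bytes.to_string b
  end

(* Concatenate all fields, each preceded by its length prefix. *)
let concat_fields fields =
  let buf = Buffer.create 64 in
  List.iter
    (fun f ->
       Buffer.add_string buf (encode_length (String.length f));
       Buffer.add_string buf f)
    fields;
  Buffer.contents buf

let () =
  let r = concat_fields ["foo"; ""; "hi"] in
  assert (r = "\003foo\000\002hi")
```

A field of 255 bytes or more takes nine prefix bytes, so a 300-byte field occupies 309 bytes in the result.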

Auxiliary functions


val escape : string -> string
val unescape : string -> string
escape encodes bytes that cannot be put directly into text fields as escape sequences; unescape reverses this encoding.
val encode_alt64 : string -> string
val decode_alt64 : string -> string
Encode/decode a binary string using the "ALT-64" method. This is essentially BASE-64 with a different alphabet.

The alphabet is chosen such that

 alt64(x) > alt64(y) <=> x > y 
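
To illustrate why the choice of alphabet matters for this property, here is a sketch of an order-preserving 64-character encoding. The alphabet below is hypothetical (64 characters in ascending ASCII order) and need not be the one ALT-64 uses; leaving out padding characters is likewise an assumption. encode_sorted64 is not a function of this module.

```ocaml
(* Hypothetical alphabet: 64 characters in ascending ASCII order.
   The actual ALT-64 alphabet may differ. *)
let alphabet =
  "+/0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

(* BASE-64 style encoding without padding: every group of up to three
   input bytes becomes two to four characters of [alphabet]. *)
let encode_sorted64 s =
  let n = String.length s in
  let buf = Buffer.create ((n * 4 + 2) / 3) in
  (* Bytes past the end of the input count as zero bits. *)
  let byte i = if i < n then Char.code s.[i] else 0 in
  let i = ref 0 in
  while !i < n do
    let v = (byte !i lsl 16) lor (byte (!i + 1) lsl 8) lor byte (!i + 2) in
    (* 1 input byte -> 2 chars, 2 bytes -> 3 chars, 3 bytes -> 4 chars *)
    let chars = min 3 (n - !i) + 1 in
    for k = 0 to chars - 1 do
      Buffer.add_char buf alphabet.[(v lsr (18 - 6 * k)) land 63]
    done;
    i := !i + 3
  done;
  Buffer.contents buf

let () =
  (* The byte-wise order of the inputs is the order of the encodings. *)
  assert (compare "abc" "abd" < 0);
  assert (compare (encode_sorted64 "abc") (encode_sorted64 "abd") < 0)
```

Because the characters of the alphabet sort in the same order as the 6-bit values they stand for, comparing two encoded strings byte by byte gives the same result as comparing the original strings. Standard BASE-64 does not have this property, since its alphabet places upper-case letters before digits.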

This web site is published by Informatikbüro Gerd Stolpmann