module Plasma_inodecache: sig .. end
The inodecache stores the metadata found in inodes, namely the inodeinfo struct, and subsets of the block list.
There is only one inodecache for all transactions: caching makes most sense if one transaction can directly profit from the knowledge gained by previous or parallel transactions. As PlasmaFS partly isolates transactions from each other, one transaction may see a different version of the metadata than another transaction. The question is how to deal with this.
First, only metadata that is known to be the current committed version (or at least to have been so in the past) can be put into the cache. Non-committed data must not go into the cache, as this would break fundamental assumptions about transaction isolation.
Second, the version stored in the cache can only be replaced by a newer version. When a transaction keeps an old view and cannot yet see this newer version, the cache is useless for this transaction, and the server must be contacted directly to get information about the historical view. The only disadvantage for the transaction is that it is slightly slower.
Effectively, a transaction can only profit from the cache if the transaction needs the newest version of the metadata.
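The two rules above (only committed data enters the cache, and an entry is only ever replaced by a newer version) can be illustrated with a minimal OCaml sketch. It assumes that each entry is tagged with the committed sequence number it was read at. The names create, put and find are hypothetical and not part of Plasma_inodecache; the enabled flag imitates a manager that, while disabled, behaves as if nothing were cached.

```ocaml
(* Hypothetical sketch, not the Plasma_inodecache API: each entry carries
   the committed sequence number it was read at. *)
type 'a entry = { seqno : int64; data : 'a }

type 'a cache = {
  mutable enabled : bool;              (* a disabled cache behaves as if empty *)
  tbl : (int64, 'a entry) Hashtbl.t;   (* inode -> latest known committed version *)
}

let create () = { enabled = false; tbl = Hashtbl.create 64 }

(* Callers must only offer committed data. An existing entry is
   replaced only if the offered version is strictly newer. *)
let put c inode seqno data =
  if c.enabled then
    match Hashtbl.find_opt c.tbl inode with
    | Some e when Int64.compare seqno e.seqno <= 0 -> ()   (* not newer: keep *)
    | _ -> Hashtbl.replace c.tbl inode { seqno; data }

(* Returns the newest cached version; None means "ask the server". *)
let find c inode =
  if not c.enabled then None
  else Option.map (fun e -> (e.seqno, e.data)) (Hashtbl.find_opt c.tbl inode)
```

A transaction whose view is older than the cached version would not use find at all and would query the server instead.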
For clean semantics, data retrieved from the inodecache must only be used in transactions that only read data and never write. This affects the functions get_ii_via_cache_e and get_blocks_via_cache_e.
Otherwise there is the danger that the cache returns the newest data, which might not be the view the transaction has.
The functions that put data into the cache can be called from any kind of transaction. They implement criteria that reliably recognize cacheable data and drop private views of transactions. This affects the functions get_ii_e and get_blocks_e.
How to use cached data
By nature, the queried data is already outdated when it is returned to the caller. This is true for this cache, but it is also true for many server calls if the transaction is not kept open.
This problem does not make it impossible to use the data, but it depends very much on the field. Generally, the inodeinfo struct is refreshed at most once per second. The blocklists are only refreshed if there is a new version of the inodeinfo struct (so they are also limited to one refresh per second). If the application can live with this delay, there is no problem at all.
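As a rough illustration of how an age limit bounds staleness, the following sketch trusts entries younger than one second and revalidates older ("aged") ones. The validate function stands in for the server-side check; the record layout, the names and the exact one-second limit are assumptions. The current time is passed in explicitly so that the sketch needs no clock library.

```ocaml
(* Hypothetical sketch: entries younger than [max_age] seconds are trusted,
   older ("aged") entries must pass [validate] before they are returned. *)
let max_age = 1.0   (* assumption: mirrors the once-per-second refresh *)

type 'a timed = { mutable fetched_at : float; seqno : int64; value : 'a }

let lookup ~now ~validate (tbl : (int64, 'a timed) Hashtbl.t) inode =
  match Hashtbl.find_opt tbl inode with
  | None -> None
  | Some e when now -. e.fetched_at < max_age -> Some e.value
  | Some e ->
      if validate inode e.seqno then begin
        e.fetched_at <- now;          (* still current: trust it for another period *)
        Some e.value
      end else begin
        Hashtbl.remove tbl inode;     (* outdated: the caller must ask the server *)
        None
      end
```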
Many applications need exact metadata only the first time an inode is (re-)opened. In this case, a reload of the metadata can be forced with get_ii_e.
There is a special problem with the blocklists. If an old version of the blocklist is used, there is the danger that the wrong blocks are accessed. Of course, this is far worse than an mtime that is one second off. There are two ways to cope with the problem:
check_up_to_date_e.
type ic_manager
val create_manager : Unixqueue.event_system -> ic_manager
create_manager esys: Creates a new manager. Initially the cache is disabled (behaving as if not caching anything).
val enable : ic_manager -> Rpc_proxy.ManagedClient.t -> unit
mc is used to access the server-side inodecache functionality (is_up_to_date).
val disable : ic_manager -> unit
class type lazy_transaction = object .. end
get_tid_e is called.
val get_ii_e : ic_manager ->
Rpc_proxy.ManagedClient.mclient ->
int64 -> int64 -> Plasma_rpcapi_aux.rinodeinfo Uq_engines.engine
get_ii_e icm mc tid inode: Gets the inodeinfo for inode using transaction tid and the managed client mc. The data is requested directly from the namenode server (uncached data). The result is returned, but also put into the cache if possible.
Caching is possible if:
val get_ii_from_cache_e : ic_manager ->
int64 -> Plasma_rpcapi_aux.inodeinfo option Uq_engines.engine
get_ii_from_cache_e icm inode: Looks up the inodeinfo for inode. If nothing is found in the cache, None is returned. If an entry is found, it is returned as Some ii. Aged entries are additionally validated, and if the validation fails, None is returned.
This function never requests data from the server. All data comes from the cache, and this means it was retrieved using a historic transaction. Although the data is validated in some cases, it is generally not safe to assume it is the most recent version.
Validation means:
val get_ii_via_cache_e : ic_manager ->
Rpc_proxy.ManagedClient.mclient ->
lazy_transaction ->
int64 -> Plasma_rpcapi_aux.rinodeinfo Uq_engines.engine
get_ii_via_cache_e icm mc lt inode: First tries to get the inodeinfo for inode from the cache (as in get_ii_from_cache_e), and if this fails, requests the inodeinfo from the server (as in get_ii_e). In the latter case, lt may be used to obtain the transaction ID tid. It can also happen, though, that the transaction of a concurrently running get_ii_via_cache_e is actually used for requesting the value from the server.
Note that this function has somewhat strange semantics. If the result comes from the cache, the latest known committed version is returned, even if the transaction tid has already modified this version. However, if the result comes from the namenode, the version as viewed by the transaction is returned. Because of this asymmetry, it is only safe to use this function if it is known that the transaction has not modified the inode before.
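The lookup order and the sharing of server requests described above can be sketched in callback style, without Uq_engines. In this hypothetical sketch, request stands in for the namenode call and get_tid for obtaining the transaction ID from lt; it is only invoked on a cache miss that starts a new request. Concurrent misses for the same inode join the running request. For simplicity the sketch caches every server result, whereas the real criteria for caching are not modeled.

```ocaml
(* Hypothetical sketch, not the real implementation. *)
type 'a via_cache = {
  cache : (int64, 'a) Hashtbl.t;
  pending : (int64, ('a -> unit) list ref) Hashtbl.t;   (* in-flight requests *)
  request : tid:int64 -> int64 -> ('a -> unit) -> unit; (* stand-in server call *)
}

let get_via_cache vc ~get_tid inode (k : 'a -> unit) =
  match Hashtbl.find_opt vc.cache inode with
  | Some v -> k v                                   (* hit: no transaction needed *)
  | None ->
      match Hashtbl.find_opt vc.pending inode with
      | Some waiters -> waiters := k :: !waiters    (* join the running request *)
      | None ->
          let waiters = ref [ k ] in
          Hashtbl.replace vc.pending inode waiters;
          (* only now is a transaction ID actually needed *)
          vc.request ~tid:(get_tid ()) inode (fun v ->
              Hashtbl.remove vc.pending inode;
              Hashtbl.replace vc.cache inode v;     (* simplified: always cached *)
              List.iter (fun k -> k v) (List.rev !waiters))
```

The second caller never obtains a transaction of its own; its result arrives through the first caller's request, which mirrors the remark that another call's transaction may be used.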
val invalidate_ii : ic_manager -> int64 -> unit
invalidate_ii icm inode: Removes information about this inode from the cache.
val check_up_to_date_e : ic_manager ->
Rpc_proxy.ManagedClient.mclient ->
lazy_transaction ->
int64 -> int64 -> bool Uq_engines.engine
check_up_to_date_e icm mc lt inode seqno: Checks whether seqno is the most recent committed sequence number of inode. If true, this condition held at the time this function started execution, but need not hold anymore at the time the function returns. If false, seqno was already old at call time, or an error occurred.
The lt object must return a transaction that was never used for any kind of data modification.
This function also works if the cache is disabled, but is slower in this case.
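A reader holding a cached blocklist could use such a check to decide whether the cached list may still be used or must be refetched. The sketch below is hypothetical: latest_committed stands in for the server-side is_up_to_date query, fetch for a fresh server request, and cached_bl is an invented record type. As stated above, a positive answer only means the seqno was current when the check started.

```ocaml
(* Hypothetical types and helpers, not the Plasma_inodecache API. *)
type cached_bl = { seqno : int64; blocks : int64 list }

(* true only if [seqno] equals the latest committed one; an unknown inode
   counts as "not up to date", like an error in the real function *)
let check_up_to_date ~latest_committed inode seqno =
  match latest_committed inode with
  | Some s -> Int64.equal s seqno
  | None -> false

(* Use the cached list only if its seqno was current; otherwise refetch. *)
let blocks_for ~latest_committed ~fetch inode (cached : cached_bl option) =
  match cached with
  | Some c when check_up_to_date ~latest_committed inode c.seqno -> c.blocks
  | _ -> (fetch inode).blocks
```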
type bl_cache
val create_bl_cache : Rpc_proxy.ManagedClient.mclient ->
lazy_transaction -> int64 -> bl_cache
create_bl_cache mc lt inode: Creates a cache for this transaction. This transaction should only be used for reads (otherwise the cache will have strange semantics).
val get_blocks_e : bl_cache ->
int64 -> int64 -> int64 -> Plasma_rpcapi_aux.rblocklist Uq_engines.engine
get_blocks_e blc index number req_seqno: Gets the blocklist for inode for the index range index to index+number-1.
The transaction tid is used for the server request.
The blocks are always retrieved from the server, never from the cache. The result, however, is put into the cache.
It is reasonable to check the presence of the inodeinfo prior to calling this function (via get_ii_e or get_ii_via_cache_e).
In addition to filling the cache, this function also removes old entries from the cache.
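The range convention and the removal of old entries can be sketched as a small per-inode block cache. This is a simplified model, not the real bl_cache: each cached index remembers the inode sequence number it was read at, block information is reduced to a single int64, and storing a result for a newer sequence number evicts everything from older ones. All names are hypothetical.

```ocaml
(* Hypothetical model of a per-inode block cache. *)
type block_cache = {
  tbl : (int64, int64 * int64) Hashtbl.t;   (* index -> (seqno, block) *)
}

let create () = { tbl = Hashtbl.create 64 }

(* The index range covered by a request: index .. index+number-1 *)
let range index number =
  List.init (Int64.to_int number) (fun i -> Int64.add index (Int64.of_int i))

(* Store [blocks] starting at [index], read at inode version [seqno].
   Entries belonging to older versions are evicted first. *)
let put c ~seqno index (blocks : int64 list) =
  Hashtbl.filter_map_inplace
    (fun _ ((s, _) as v) -> if Int64.compare s seqno < 0 then None else Some v)
    c.tbl;
  List.iteri
    (fun i b -> Hashtbl.replace c.tbl (Int64.add index (Int64.of_int i)) (seqno, b))
    blocks

(* Some block if cached, None otherwise; never contacts a server *)
let find c index = Option.map snd (Hashtbl.find_opt c.tbl index)
```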
val get_blocks_from_cache_e : bl_cache ->
int64 -> Plasma_rpcapi_aux.blocklist option Uq_engines.engine
get_blocks_from_cache_e blc index: Tries to get the blocks for index from the cache for inode. If nothing is found in the cache, None is returned. If an entry is found, it is returned as Some bl. Aged entries are additionally validated, and if the validation fails, None is returned.
This function never requests data from the server. All data comes from the cache, and this means it was retrieved using a possibly historic transaction. Although the data is validated in some cases, it is generally not safe to assume it is the most recent version.
See also get_ii_from_cache_e for how the data is validated.
val get_blocks_via_cache_e : bl_cache ->
int64 -> Plasma_rpcapi_aux.rblocklist Uq_engines.engine
get_blocks_via_cache_e blc index: First tries to get the blocklist for index and inode from the cache (as in get_blocks_from_cache_e), and if this fails, requests the blocklist from the server (as in get_blocks_e).
val snapshot_blocks_e : ?append:bool ->
bl_cache ->
Plasma_rpcapi_aux.inodeinfo -> Plasma_rpcapi_aux.rvoid Uq_engines.engine
`ok on success. If `econflict, the snapshot must be repeated because the file was modified in the meantime.
After enabling the snapshot feature, the functions get_blocks_from_cache_e and get_blocks_via_cache_e always respond from their cached data.
If append, only the last block of the file is included in the snapshot (see Plasma_client.snapshot for explanations).
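Since `econflict means the snapshot has to be repeated, a caller will typically wrap it in a retry loop. In this hypothetical sketch, attempt stands in for running snapshot_blocks_e to completion; the `error variant and the retry limit are assumptions added for illustration.

```ocaml
(* Hypothetical retry loop: repeat while the file changes underneath.
   The limit keeps a constantly modified file from looping forever. *)
let rec snapshot_with_retry ?(max_tries = 5) attempt =
  match attempt () with
  | `ok -> Ok ()
  | `econflict when max_tries > 1 ->
      snapshot_with_retry ~max_tries:(max_tries - 1) attempt
  | `econflict -> Error "file kept changing during snapshot"
  | `error msg -> Error msg
```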
val override_blocks : bl_cache ->
int64 -> int64 -> Plasma_rpcapi_aux.blockinfo list -> unit
override_blocks blc block n blocklist: Puts these blocks into the cache, overriding whatever is there. Useful for snapshots.
val forget_blocks : bl_cache -> int64 -> int64 -> unit
forget_blocks blc block n: Forgets these blocks.
val notify_got_eio : bl_cache -> unit
val expand_blocklist : Plasma_rpcapi_aux.blockinfo list -> Plasma_rpcapi_aux.blockinfo list
(The other functions already return expanded lists.)
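Assuming that a blockinfo entry may describe a run of consecutive blocks through a length field, expanding a blocklist would split every run into single-block entries. The following sketch uses a simplified, hypothetical record type rather than the actual Plasma_rpcapi_aux.blockinfo definition.

```ocaml
(* Hypothetical run-length entry: [length] consecutive file blocks starting
   at [index] map to consecutive storage blocks starting at [block]. *)
type blockinfo = { index : int64; block : int64; length : int64 }

(* Split every run into entries of length 1, preserving order. *)
let expand_blocklist (l : blockinfo list) : blockinfo list =
  List.concat_map
    (fun bi ->
      List.init (Int64.to_int bi.length) (fun i ->
          let d = Int64.of_int i in
          { index = Int64.add bi.index d; block = Int64.add bi.block d; length = 1L }))
    l
```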