Reader & Distributed Scan

Reader — Snapshot-Following Read Service

Reader is not a per-write real-time view. It reads from materialized global snapshots, so visibility advances only when:

  • writers create and materialize a newer snapshot, and
  • the reader refreshes (auto or manual) to that snapshot.

This makes Reader ideal for low-latency serving on a stable view, but with freshness bounded by snapshot cadence.

Opening a Reader

use cobble::{ReadOptions, Reader, ReaderConfig, VolumeDescriptor};

let read_config = ReaderConfig {
    volumes: VolumeDescriptor::single_volume("file:///tmp/my-db"),
    total_buckets: 1024,
    ..ReaderConfig::default()
};

let mut reader = Reader::open_current(read_config)?;

Point Lookup

let value = reader.get(0, b"user:1")?;
let metrics = reader.get_with_options(
    0,
    b"user:1",
    &ReadOptions::for_column_in_family("metrics", 0),
)?;

Column Families

Reader, ReadOnlyDb, Db, and SingleDb all keep routing bucket-only. Plain get / scan calls use the default family; named families are chosen through options:

use cobble::{Config, ReadOnlyDb, ReadOptions, ScanOptions};

let read_only = ReadOnlyDb::open_with_db_id(config, snapshot_id, db_id)?;
let value = read_only.get_with_options(
    0,
    b"key",
    &ReadOptions::for_column_in_family("metrics", 0),
)?;
let scanner = read_only.scan_with_options(
    0,
    b"a".as_ref()..b"z".as_ref(),
    &ScanOptions::for_column(0).with_column_family("metrics"),
)?;

Refreshing and Visibility

Call refresh() to pick up a newer materialized snapshot:

reader.refresh()?;

In open_current mode, reads may auto-check pointer changes, throttled by reader.reload_tolerance_seconds (default: 10s). If writers snapshot every minute, reader-visible data can lag by up to about one snapshot interval plus refresh tolerance.

Reader Config

Parameter Default Description
reader.pin_partition_in_memory_count 1 Number of partition snapshots pinned in memory
reader.block_cache_size 512 MB Block cache size for reader
reader.reload_tolerance_seconds 10 Minimum interval between snapshot reload checks

ReadOnlyDb — Snapshot-Based Read on one shard

ReadOnlyDb opens a specific snapshot for read-only access on one shard (corresponding to a single Db):

use cobble::{Config, ReadOnlyDb, ReadOptions, ScanOptions};

let read_only = ReadOnlyDb::open_with_db_id(config, snapshot_id, db_id)?;
let value = read_only.get_with_options(
    0,
    b"key",
    &ReadOptions::for_column_in_family("metrics", 0),
)?;
let scanner = read_only.scan_with_options(
    0,
    b"a".as_ref()..b"z".as_ref(),
    &ScanOptions::for_column(0).with_column_family("metrics"),
)?;

Distributed Scan

Cobble supports distributed scan operations where work is split across multiple workers. This follows a plan → split → scan execution model.

1. Create a Scan Plan

A ScanPlan is generated from a global snapshot manifest. The plan is still bucket-only; column families are selected later when you create the scanner through ScanOptions:

use cobble::ScanPlan;

let plan = ScanPlan::new(global_manifest);

2. Generate Splits

Each split represents a unit of work (currently one shard = one split):

let splits = plan.splits();
// splits.len() == number of shards in the snapshot

3. Dispatch and Execute

Splits are serializable and can be sent to distributed workers:

use cobble::ScanOptions;

let scan_options = ScanOptions::for_column(0).with_column_family("metrics");

for split in splits {
    // Each worker opens a scanner from its split
    let scanner = split.create_scanner(config.clone(), &scan_options)?;

    for row in scanner {
        let (bucket, key, columns) = row?;
        // process...
    }
}

If you want a non-default family, this is the step where you must specify it. create_scanner(...) clones the full ScanOptions into the ScanSplitScanner, so the chosen column_family keeps taking effect inside each worker-side scanner.

Raw distributed scan rows include the owning bucket together with key and columns. When a worker needs to resume or repartition one split around a concrete row boundary, call split.split_after(bucket, key) and use the returned before / after splits.

Scan Options

Parameter Default Description
read_ahead_bytes 0 Read-ahead buffer size (0 = disabled)
column_indices None Column projection — only read specified columns
column_family None Column family name; omitted means the default family

Column Projection

To read only specific columns, use ScanOptions. Projection indices are interpreted inside the selected column family:

let mut opts = ScanOptions::default();
opts.column_indices = Some(vec![0, 2]); // only columns 0 and 2

Bounded Scan

Bounds are set on the plan and copied to each split:

use cobble::ScanPlan;

let plan = ScanPlan::new(global_manifest)
    .with_start(b"start_key".to_vec())   // inclusive
    .with_end(b"end_key".to_vec());      // exclusive

Copyright © Cobble contributors. Distributed under the Apache-2.0 License.

This site uses Just the Docs, a documentation theme for Jekyll.