How reads work

Understand how reads flow through Crunch.

Reads through Crunch are simple and familiar. Your applications GET (i.e. read) data from S3 (or GCS) using the Granica S3-compatible API in exactly the same way they would using the vanilla S3/GCS SDK, and Crunch returns your data regardless of whether it has been crunched. Enabling your custom applications to write (and read) through Granica is a simple process, typically requiring only a single-line code change. Follow this Get Started guide to learn how to integrate the Granica API, and thus Crunch, into your apps.
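As a rough illustration of what such a single-line change could look like, the sketch below assumes the change is the endpoint URL passed to a standard S3 client (here boto3's `endpoint_url` parameter). The endpoint value, the `GRANICA_ENDPOINT_URL` environment variable, and the helper names are hypothetical and not taken from the Granica documentation; the Get Started guide is the authoritative source for the real setting.

```python
import os

# Hypothetical endpoint: the real value comes from your Granica deployment.
GRANICA_ENDPOINT = os.environ.get(
    "GRANICA_ENDPOINT_URL", "http://granica.example.internal"
)


def s3_client_kwargs(endpoint_url=None):
    """Build keyword arguments for boto3.client(...).

    With endpoint_url=None the result describes a vanilla S3 client.
    Assumption: pointing endpoint_url at Granica is the one-line change.
    """
    kwargs = {"service_name": "s3"}
    if endpoint_url:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def read_object(client, bucket, key):
    """GET an object; the call is the same for either client configuration."""
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


# Usage (requires boto3 and a reachable endpoint):
#   import boto3
#   client = boto3.client(**s3_client_kwargs(GRANICA_ENDPOINT))
#   data = read_object(client, "my-crunched-bucket", "path/to/object")
```

The application's `get_object` call is untouched; only the client construction differs, which is what makes the read path look identical to vanilla S3 from the caller's side.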

Crunch lexicon

  • Crunched buckets are those which have been put under active management, processing and monitoring by Crunch. A bucket is in the state "crunched" immediately after you run the granica crunch command.
  • Crunched objects are those objects in a crunched bucket which have been processed by Crunch.
  • Vanilla buckets are those which have not been crunched.
  • Vanilla objects are those which have not been crunched. All objects in a vanilla bucket are by definition vanilla objects. Objects in a crunched bucket that are yet to be processed (i.e. crunched!) are also vanilla objects.
  • Ingested objects are those which have been crunched and reduced and/or intelligently batched.
note

All reads from crunched buckets must use the Granica API as Crunch needs to hydrate and/or de-batch ingested data before returning it to your applications.

Read workflow

1. When an application or user issues read (GET) requests intended for a Crunched data bucket, those reads are directed to a load-balanced Read replica component of Crunch running in the same local availability zone (AZ). The Read replica maintains a cache for frequently accessed objects, which eliminates associated GET costs and retrieval latency for cache hits.

2a. When serving a read request for objects that have been ingested, the Read replica GETs the ingested data from the Reduced data bucket and reconstructs the original objects from it. This process includes lossless de-batching and/or hydration of reduced objects.

2b. When serving a read request for objects that have not been ingested, the Read replica seamlessly performs a passthrough of the GET request and retrieves the vanilla objects from the Crunched data bucket.

note

Objects may not be ingested due to Crunch Policy, or simply because they have not yet been crunched.

3. The original data is sent back as a normal S3 API response.
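The steps above can be sketched as a toy in-memory model. This is not Crunch's implementation: the `ReadReplica` class and its fields are hypothetical, `zlib` merely stands in for Crunch's reduction and hydration, and the cache here keeps every object rather than only frequently accessed ones.

```python
import zlib


class ReadReplica:
    """Toy model of the read path; zlib stands in for Crunch's reduction."""

    def __init__(self, crunched_bucket, reduced_bucket):
        self.crunched_bucket = crunched_bucket  # key -> vanilla object bytes
        self.reduced_bucket = reduced_bucket    # key -> reduced (ingested) bytes
        self.cache = {}                         # key -> original bytes
        self.backend_gets = 0                   # GETs issued to the object store

    def get(self, key):
        # Step 1: serve cache hits without touching the object store.
        if key in self.cache:
            return self.cache[key]
        self.backend_gets += 1
        if key in self.reduced_bucket:
            # Step 2a: fetch reduced data and losslessly rebuild the original.
            data = zlib.decompress(self.reduced_bucket[key])
        else:
            # Step 2b: pass the GET through to the crunched data bucket.
            data = self.crunched_bucket[key]
        self.cache[key] = data
        # Step 3: hand the original bytes back to the caller.
        return data


original = b"sensor-reading," * 1000
replica = ReadReplica(
    crunched_bucket={"new.csv": b"not yet crunched"},
    reduced_bucket={"old.csv": zlib.compress(original)},
)
ingested = replica.get("old.csv")   # reconstructed from reduced data
vanilla = replica.get("new.csv")    # passthrough
repeat = replica.get("old.csv")     # cache hit, no extra backend GET
```

In this model the caller cannot tell which branch served the request: both return the original bytes, and the repeated read adds no GET against the underlying store.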

tip

To avoid cross-AZ data transfer charges for reads, be sure to deploy Granica into the same AZ as your applications.

Fast repeated range reads

Crunch can make repeated range reads up to 72% faster than vanilla S3/GCS. This can significantly improve effective throughput for workloads that manipulate objects whose header holds metadata and offsets and whose remainder holds the actual payload data. In these scenarios, accessing a specific portion of the payload involves an initial range read on the header to find the offsets, followed by another range read to retrieve the payload data itself. A common pattern is many such reads on the same object from multiple clients, which requires repeated range reads of the header and slows aggregate throughput.

Repeated range reads

Crunch transparently caches the header to eliminate the latency of underlying GETs to S3/GCS. Data reduction further increases effective throughput by reducing the physical size of the payload, and therefore the time needed to retrieve it from S3/GCS, even after factoring in the time to hydrate the payload data.
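To make the header-then-payload pattern concrete, here is a minimal sketch that counts range GETs with and without a cached header. The object layout (a record count followed by offset/length pairs), the `RangeStore` stand-in for S3/GCS, and the `HeaderCachingReader` class are all assumptions made for illustration; they do not describe Crunch's internals or reproduce the 72% figure.

```python
import struct


def build_object(records):
    """Pack records as: count, (offset, length) pairs, then payload bytes."""
    header_size = 4 + 8 * len(records)
    header = struct.pack(">I", len(records))
    offset = header_size
    for record in records:
        header += struct.pack(">II", offset, len(record))
        offset += len(record)
    return header_size, header + b"".join(records)


class RangeStore:
    """Stand-in for S3/GCS that counts range GETs."""

    def __init__(self, blob):
        self.blob = blob
        self.range_gets = 0

    def get_range(self, start, length):
        self.range_gets += 1
        return self.blob[start:start + length]


class HeaderCachingReader:
    """Reads one record via a header range read plus a payload range read."""

    def __init__(self, store, header_size, cache_header=True):
        self.store = store
        self.header_size = header_size
        self.cache_header = cache_header
        self._header = None

    def _read_header(self):
        if self.cache_header and self._header is not None:
            return self._header  # cache hit: no GET to the store
        header = self.store.get_range(0, self.header_size)
        if self.cache_header:
            self._header = header
        return header

    def read_record(self, index):
        header = self._read_header()
        offset, length = struct.unpack_from(">II", header, 4 + 8 * index)
        return self.store.get_range(offset, length)


records = [b"alpha", b"bravo!", b"charlie"]
header_size, blob = build_object(records)

uncached_store = RangeStore(blob)
uncached = HeaderCachingReader(uncached_store, header_size, cache_header=False)
uncached_result = [uncached.read_record(i) for i in range(3)]

cached_store = RangeStore(blob)
cached = HeaderCachingReader(cached_store, header_size, cache_header=True)
cached_result = [cached.read_record(i) for i in range(3)]
```

With three record reads, the uncached reader issues six range GETs (header plus payload each time), while the cached reader issues four (one header read, then three payload reads). The gap widens as more clients read from the same object.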

See also