Online/Offline consistency guarantees - Join and Fetcher? #891

@anovv

Description

Hi, I was going through the Chronon docs trying to understand how online/offline consistency is guaranteed, but I could not find the exact details.

I'm particularly interested in 3 scenarios:

  1. Joining data in online vs offline modes
  2. Fetching data from the KVStore vs fetching data from the batch store (Hive?)
  3. Stateless request-time user-defined transforms (on-demand features) - both online and offline

Details:

  1. I'm interested in the implementation details of Chronon's Join operator/function. In the streaming case a join is fundamentally different from a batch join (the most common approach is a windowed join, afaik), so running the two on the same data may produce different results.
    My question is: how does Chronon guarantee consistency between online and offline data, specifically for joins? Does it use a Kappa architecture (e.g. running the same streaming pipeline over offline data)? If so, what kind of streaming join is used? I'd like to understand this in depth for both the Spark Structured Streaming and Flink engines.

  2. For fetching/loading online/offline data: my understanding is that in offline mode Chronon writes the resulting data to Hive, while online data goes to the KVStore. Is there any guarantee that loading data as of a specific timestamp from the offline store (Hive) gives exactly the same result as fetching from the KVStore at that same timestamp? If so, how exactly does it work?

  3. Does Chronon allow any last-mile, request-time, user-defined stateless transformations? (In Tecton these are called on-demand features, e.g. getting the user's request time at millisecond granularity.) If so, how are they computed online and offline, and how is data consistency guaranteed between the two?
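
To make the concern in point 1 concrete, here is a minimal sketch in plain Python. It is not Chronon code and says nothing about how Chronon actually joins; the data, the function names, and the 2-second window are all made up for illustration. It compares a batch-style point-in-time join (latest update at or before the request) with a streaming-style windowed join (only updates within a bounded interval before the request) on the same data:

```python
from bisect import bisect_right

# Hypothetical data for a single key: (timestamp_ms, value), sorted by time.
updates = [(1_000, "a"), (5_000, "b")]
# Hypothetical request timestamps for the same key.
requests = [4_000, 5_500, 20_000]


def point_in_time_join(requests, updates):
    """Batch-style join: latest update at or before each request time."""
    times = [ts for ts, _ in updates]
    out = []
    for req_ts in requests:
        i = bisect_right(times, req_ts)  # number of updates with ts <= req_ts
        out.append((req_ts, updates[i - 1][1] if i else None))
    return out


def windowed_join(requests, updates, window_ms):
    """Streaming-style interval join: only updates in [req_ts - window_ms, req_ts]."""
    out = []
    for req_ts in requests:
        candidates = [v for ts, v in updates if req_ts - window_ms <= ts <= req_ts]
        out.append((req_ts, candidates[-1] if candidates else None))
    return out


batch_result = point_in_time_join(requests, updates)
stream_result = windowed_join(requests, updates, window_ms=2_000)

# Rows where the two join strategies disagree on the same input.
mismatches = [(b, s) for b, s in zip(batch_result, stream_result) if b != s]
print(mismatches)
```

In this toy setup the two joins disagree whenever the latest update is older than the window, which is the kind of discrepancy I'd like to understand how Chronon avoids.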
