Feature request: volume anomalies pre-aggregate data before computing statistics #1278
Closed
garfieldthesam started this conversation in Product and features
Replies: 1 comment
Closing and turning this into an issue instead
I need to run volume anomaly tests on very large tables. However, I cannot do so performantly because the compiled query does not pre-aggregate the row-count data. For example, for a test run on my Databricks cluster, the first CTE of the compiled query is effectively a `select *` over the full monitored table.
For large tables (especially very wide ones), this is prohibitively expensive for two reasons:

1. `select *` fetches all table columns, which isn't necessary in principle for building a row-count time series; this is especially costly for columnar databases.
2. The row counts are not pre-aggregated in the database, so the anomaly statistics are computed over raw row-level data instead of a small daily series.

As a result, we've had to build a cumbersome workaround: derived data-quality metrics tables that summarize the large table's metrics each day, with elementary column tests run on those.

I'd like to request a rearchitecture of how the volume anomalies code works to improve performance for large tables.
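A minimal sketch of the requested behavior, using sqlite3 and a hypothetical `events` table with a `created_at` column (illustrative only, not Elementary's actual compiled SQL): pre-aggregate daily row counts inside the database, then compute the anomaly statistics over the small aggregated series.

```python
import sqlite3
import statistics

# Hypothetical source table; stands in for a very large production table.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (id integer, created_at text)")
rows = [(i, f"2024-01-{(i % 10) + 1:02d}") for i in range(1000)]
conn.executemany("insert into events values (?, ?)", rows)

# Pre-aggregate in the database: one row per day instead of one row per event.
# This is the shape of query the feature request asks the compiled test to
# emit, instead of `select * from events`.
daily = conn.execute(
    "select date(created_at) as bucket, count(*) as row_count "
    "from events group by 1 order by 1"
).fetchall()

# Anomaly statistics now run over ~days of data, not millions of raw rows.
counts = [c for _, c in daily]
mean = statistics.mean(counts)
stdev = statistics.pstdev(counts)
anomalies = [(d, c) for d, c in daily if stdev and abs(c - mean) / stdev > 3]
print(len(daily), anomalies)
```

Only the aggregated series (one row per day) ever leaves the warehouse, so neither the table's width nor its row count dominates the test's cost.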