Deletion vectors are an optimization feature that can be enabled on Delta Lake and Iceberg tables (in Iceberg v2 tables, the equivalent mechanism is positional delete files). They allow DELETE and UPDATE operations to mark existing rows as removed or changed without rewriting the Parquet files that contain them. Hudi may soon support a similar representation.
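To make the mechanism concrete, here is a minimal sketch of the idea: a deletion vector is a set of row positions attached to one data file, and readers skip those positions instead of the file being rewritten. All names (`DataFile`, `DeletionVector`, `read_live_rows`) are hypothetical and are not XTable, Delta Lake, or Iceberg APIs. Real formats store positions compactly (Delta Lake serializes RoaringBitmaps, and Iceberg v2 positional delete files store `file_path`/`pos` rows). An UPDATE would additionally write the new row version to a new file.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DataFile:
    """Hypothetical stand-in for an immutable Parquet data file."""
    path: str
    rows: List[str]


@dataclass
class DeletionVector:
    """0-based positions of the rows in one data file that are logically deleted."""
    data_file_path: str
    deleted_positions: Set[int] = field(default_factory=set)

    def mark_deleted(self, position: int) -> None:
        # A DELETE only records the row position; the data file is never rewritten.
        self.deleted_positions.add(position)


def read_live_rows(data_file: DataFile, dv: Optional[DeletionVector]) -> List[str]:
    """Return the rows a reader sees after applying the file's deletion vector, if any."""
    if dv is None:
        return list(data_file.rows)
    if dv.data_file_path != data_file.path:
        raise ValueError("deletion vector does not belong to this data file")
    return [row for pos, row in enumerate(data_file.rows) if pos not in dv.deleted_positions]


f = DataFile("part-0000.parquet", ["a", "b", "c", "d"])
dv = DeletionVector(f.path)
dv.mark_deleted(1)
dv.mark_deleted(3)
print(read_live_rows(f, dv))  # ['a', 'c']
print(f.rows)                 # ['a', 'b', 'c', 'd'] (the file itself is unchanged)
```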
Currently, XTable does not detect or translate deletion files between formats. As a result, it cannot preserve deletion vectors when converting a table from one format to another: the target table references the original data files without their deletions, so rows that were deleted or updated in the source reappear in the target and queries return incorrect results. This feature request is to add support for deletion vector translation in XTable.
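A core step in translation is regrouping deletions into the shape the target format expects. The sketch below, assuming Iceberg-style positional delete records given as `(file_path, pos)` pairs, groups them into one sorted position list per data file (the per-file shape a Delta Lake deletion vector encodes). It also shows why dropping deletions is incorrect: the visible row count changes. The function names and paths are hypothetical, not XTable APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_positional_deletes(records: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group (file_path, pos) positional-delete records into one sorted,
    de-duplicated list of row positions per data file."""
    by_file: Dict[str, Set[int]] = defaultdict(set)
    for file_path, pos in records:
        if pos < 0:
            raise ValueError(f"invalid row position {pos} for {file_path}")
        by_file[file_path].add(pos)
    return {path: sorted(positions) for path, positions in by_file.items()}


def live_row_count(file_row_counts: Dict[str, int], deletes: Dict[str, List[int]]) -> int:
    """Rows visible to a reader: each file's row count minus its deleted positions."""
    return sum(count - len(deletes.get(path, [])) for path, count in file_row_counts.items())


records = [
    ("data/f1.parquet", 4),
    ("data/f1.parquet", 0),
    ("data/f2.parquet", 7),
    ("data/f1.parquet", 4),  # duplicate record for the same row
]
grouped = group_positional_deletes(records)
row_counts = {"data/f1.parquet": 10, "data/f2.parquet": 8}
print(grouped)                              # {'data/f1.parquet': [0, 4], 'data/f2.parquet': [7]}
print(live_row_count(row_counts, grouped))  # 15
print(live_row_count(row_counts, {}))       # 18: what a target reports if deletes are dropped
```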
The proposed steps to implement the first phase of this feature are:
- Update Delta Lake to version 2.4+ (and Spark to 3.4+) #340
- Detect the presence and format of deletion files (positional deletes) in the Iceberg source table #341
- Detect the presence and format of deletion files in the Delta Lake source table #342
- Add a deletion vector data file type #343
- Add a data structure that links deletion vector files to the table's data files #344
- Read the deletion vectors (positional deletes) in the Iceberg source table and translate them to XTable's internal representation #346
- Read the deletion vectors in the Delta Lake source table and translate them to XTable's internal representation #345
- Write the deletion vectors to the Delta Lake target table #347
- Write the deletion vectors to the Iceberg target table #348
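Detecting deletion vectors in a Delta Lake source (#342) could start from the transaction log. The sketch below assumes the input is the newline-delimited JSON of a Delta commit file; it checks the `protocol` action's `readerFeatures` for `deletionVectors` and collects `add` actions that carry a `deletionVector` descriptor (fields such as `storageType`, `pathOrInlineDv`, `offset`, `sizeInBytes`, `cardinality`). It is a simplification, not XTable's implementation: a real reader must reconcile the full log and checkpoints, and `remove` actions can also carry descriptors. The sample values are made up.

```python
import json
from typing import Dict, Iterable, List


def _actions(commit_lines: Iterable[str]) -> List[dict]:
    """Parse non-empty lines of a Delta commit file into action dicts."""
    return [json.loads(line) for line in commit_lines if line.strip()]


def table_supports_deletion_vectors(commit_lines: Iterable[str]) -> bool:
    """True if a protocol action lists the deletionVectors reader feature."""
    for action in _actions(commit_lines):
        protocol = action.get("protocol")
        if protocol and "deletionVectors" in protocol.get("readerFeatures", []):
            return True
    return False


def files_with_deletion_vectors(commit_lines: Iterable[str]) -> Dict[str, dict]:
    """Map data file path -> deletionVector descriptor for add actions that carry one."""
    found: Dict[str, dict] = {}
    for action in _actions(commit_lines):
        add = action.get("add")
        if add and add.get("deletionVector"):
            found[add["path"]] = add["deletionVector"]
    return found


lines = [
    '{"protocol":{"minReaderVersion":3,"minWriterVersion":7,'
    '"readerFeatures":["deletionVectors"],"writerFeatures":["deletionVectors"]}}',
    '{"add":{"path":"part-0001.parquet","size":1024,"dataChange":true,'
    '"deletionVector":{"storageType":"u","pathOrInlineDv":"made-up-id",'
    '"offset":4,"sizeInBytes":40,"cardinality":2}}}',
    '{"add":{"path":"part-0002.parquet","size":2048,"dataChange":true}}',
]
print(table_supports_deletion_vectors(lines))  # True
print(list(files_with_deletion_vectors(lines)))  # ['part-0001.parquet']
```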
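For the data file type and linking structure (#343, #344), one possible format-neutral shape is sketched below. `DeletionFormat`, `DeletionFile`, and `link_deletions` are hypothetical names, not XTable's classes. The sketch encodes one real difference between the formats: an Iceberg positional delete file can reference several data files, while a Delta Lake deletion vector belongs to exactly one data file. Building an index from data file to deletion files also lets a converter reject deletions that point at files missing from the snapshot.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Iterable, List


class DeletionFormat(Enum):
    ICEBERG_POSITIONAL_DELETES = "iceberg_positional_deletes"
    DELTA_DELETION_VECTOR = "delta_deletion_vector"


@dataclass(frozen=True)
class DeletionFile:
    """Hypothetical format-neutral record of one deletion file."""
    path: str
    fmt: DeletionFormat
    # Data files whose rows this deletion file marks as deleted.
    referenced_data_files: FrozenSet[str]


def link_deletions(data_files: Iterable[str],
                   deletion_files: Iterable[DeletionFile]) -> Dict[str, List[DeletionFile]]:
    """Map each data file path to the deletion files that apply to it."""
    known = set(data_files)
    index: Dict[str, List[DeletionFile]] = {path: [] for path in known}
    for dfile in deletion_files:
        # A Delta deletion vector is always scoped to a single data file.
        if dfile.fmt is DeletionFormat.DELTA_DELETION_VECTOR and len(dfile.referenced_data_files) != 1:
            raise ValueError(f"{dfile.path}: a Delta deletion vector must reference exactly one data file")
        for target in dfile.referenced_data_files:
            if target not in known:
                raise ValueError(f"{dfile.path} references unknown data file {target}")
            index[target].append(dfile)
    return index


pos = DeletionFile("deletes/d1.parquet", DeletionFormat.ICEBERG_POSITIONAL_DELETES,
                   frozenset({"f1.parquet", "f2.parquet"}))
dv = DeletionFile("dv-0001.bin", DeletionFormat.DELTA_DELETION_VECTOR, frozenset({"f3.parquet"}))
idx = link_deletions(["f1.parquet", "f2.parquet", "f3.parquet", "f4.parquet"], [pos, dv])
print(len(idx["f1.parquet"]), len(idx["f4.parquet"]))  # 1 0
```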