UPDATE COLUMNS FROM: bump _row_last_updated_at_version for matched rows #418

@jerryjch

Summary

UPDATE COLUMNS FROM commits a new dataset version but does not advance
_row_last_updated_at_version on rows whose column values were rewritten.

ALTER TABLE ... UPDATE COLUMNS ... FROM (UpdateColumnsBackfill in lance-spark)
commits through Lance CommitBuilder and Transaction as a normal Update with
UpdateMode.RewriteColumns. The table version increases, but the per-row
change-data column _row_last_updated_at_version can keep its pre-commit value
(for example, still equal to _row_created_at_version), even though the data in
the updated columns changed.

Expected behavior

Per the Lance row lineage and change-data feed (CDF) docs, _row_last_updated_at_version
is the dataset version at which the row was last modified. If a write creates a
new dataset version and changes visible row data for matched rows, those rows
should get _row_last_updated_at_version set to that new version.
_row_created_at_version should stay at the version where the row first appeared.
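
The expected semantics can be sketched as a small in-memory model. This is only an illustration of the rule described above, not Lance or lance-spark code: RowVersions, apply_update, and the version numbers 1 and 2 are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RowVersions:
    """Hypothetical stand-in for the two per-row version columns."""
    created_at: int        # models _row_created_at_version
    last_updated_at: int   # models _row_last_updated_at_version


def apply_update(rows, matched_ids, new_version):
    """Return the expected per-row versions after a commit at new_version.

    Matched rows get last_updated_at = new_version; created_at never moves.
    Unmatched rows are returned unchanged.
    """
    return {
        row_id: replace(v, last_updated_at=new_version) if row_id in matched_ids else v
        for row_id, v in rows.items()
    }


# Three rows inserted at (illustrative) version 1, then id 2 updated at version 2.
before = {1: RowVersions(1, 1), 2: RowVersions(1, 1), 3: RowVersions(1, 1)}
after = apply_update(before, matched_ids={2}, new_version=2)
print(after)
```

In this model only id 2 ends up with a newer last_updated_at, and every row keeps its original created_at.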

Actual behavior

After UPDATE COLUMNS FROM, rows that had columns rewritten can still show the
same _row_last_updated_at_version as before, while the dataset version has moved
forward on commit.

Reproduction

  1. Create a Lance table with stable row IDs enabled (enable_stable_row_ids).
  2. Insert several rows (e.g. id 1, 2, 3) so CDF columns exist; note dataset version V0 and
    _row_created_at_version / _row_last_updated_at_version for each row.
  3. Run ALTER TABLE ... UPDATE COLUMNS ... FROM with a source view that updates only one row
    (e.g. id = 2); leave id 1 and 3 unchanged in the source.
  4. Read _row_last_updated_at_version for id = 2: it can still equal its pre-update value (for
    example, _row_created_at_version) even though the dataset version advanced past V0.
  5. Check id 1 and 3: their _row_last_updated_at_version should stay unchanged, since their
    data was not updated.
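
The checks in steps 4 and 5 could be automated along these lines. This is a minimal sketch that assumes the two version columns have already been read into plain dictionaries before and after the update; find_violations and the sample version numbers are hypothetical and not part of lance-spark.

```python
def find_violations(before, after, matched_ids, new_version):
    """Compare per-row version metadata captured before and after the update.

    before/after map row id -> (created_at_version, last_updated_at_version).
    Returns a list of human-readable problems; an empty list means the
    metadata matches the expected behavior.
    """
    problems = []
    for row_id, (created_before, updated_before) in before.items():
        created_after, updated_after = after[row_id]
        if created_after != created_before:
            problems.append(
                f"id {row_id}: created_at changed from {created_before} to {created_after}"
            )
        if row_id in matched_ids:
            # Step 4: rewritten rows must carry the new dataset version.
            if updated_after != new_version:
                problems.append(
                    f"id {row_id}: last_updated_at is {updated_after}, expected {new_version}"
                )
        elif updated_after != updated_before:
            # Step 5: untouched rows must not be bumped.
            problems.append(
                f"id {row_id}: last_updated_at bumped to {updated_after} without a data change"
            )
    return problems


# Illustrative snapshots: all rows created at version 1, update committed at version 2.
before = {1: (1, 1), 2: (1, 1), 3: (1, 1)}
observed = dict(before)                            # behavior reported in this issue
expected = {1: (1, 1), 2: (1, 2), 3: (1, 1)}       # behavior the docs imply

observed_problems = find_violations(before, observed, matched_ids={2}, new_version=2)
expected_problems = find_violations(before, expected, matched_ids={2}, new_version=2)
print(observed_problems)
print(expected_problems)
```

With the snapshots above, the observed case flags only id 2 and the expected case reports no problems.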

Note

This ticket is on top of the following:
