You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Summary:
This revision adds the procedure `yb_index_check()` that checks whether the given index is consistent with its base relation or not. This operation doesn't support GIN indexes yet.
#### Context
An index row has the following components:
- index attributes: key and non-key columns (if any)
- ybbasectid: system attribute storing the ybctid of base relation row
- ybuniqueidxkeysuffix: system attribute, present only for unique indexes (it is non-null only when the index is in null-are-distinct mode and the key columns contain at least one NULL)
The structure of an index row is as follows (in `<DocKey>` -> `<DocValue>` format):
- Non-unique index: (key cols, ybbasectid) -> (non-key cols)
- Unique index: (key cols, ybuniqueidxkeysuffix) -> (non-key cols, ybbasectid)
####Consistency Definition
An index row is consistent if all of its attributes are consistent. An index attribute is consistent if its value and the corresponding base relation value are
- for key attributes: binary equal (semantic equality without binary equality runs the risk of allowing multiple index rows for a given base table row if the key column can have multiple binary representations).
- for non-key attributes: binary or semantically equal.
Note: if both the values are NULL, they are consistent.
####Consistency Check
Index consistency check is done in two steps:
# Check for spurious index rows
# Check for missing index rows
**Part 1: Check for spurious index rows**
Here, we check if the index contains a row that it should not. To do this:
For every index row, fetch the row in the base table (filtered by partial index predicate) such that baserow.ybctid == indexrow.ybbasectid. If such a row doesn’t exist, use a baserow with all NULL values. The result will be the same as a LEFT join on indexrow.ybbasectid = baserow.ybctid with the index table on the left (if the index was a regular relation).
Fetch the following columns as the join targetlist:
- from index row: ybbasectid, index attributes, ybuniqueidxkeysuffix (only for unique indexes)
- from base table row: ybctid, columns/expressions corresponding to index attributes
On the joined result, make the following checks:
# ybbasectid should be non-null
# ybbasectid should be equal to ybctid
# index attributes and the corresponding base relation column/expression should be consistent as per the above definition
# for unique indexes, ybuniqueidxkeysuffix should be non-null iff the index uses null-are-distinct mode and key columns contain at least one null. When non-null, it should be equal to ybbasectid
If the above checks pass for every row in the index, it implies that the index does not contain any spurious rows. This can be proved by contradiction as follows:
Let’s assume that the above checks passed for every row in the index, yet it contains a spurious row, namely indexrow1. This index row must satisfy the following:
- indexrow1.ybbasectid != null (from check #1)
- base table has a row, namely baserow, such that baserow.ybctid == indexrow1.ybbasectid (otherwise ybctid would be null and check #2 would have failed)
- index attributes of indexrow1 are consistent with baserow (from check #3)
- If the index is unique, indexrow1.ybuniqueidxkeysuffix is either null or equal to ybbasectid, depending on the index mode and key cols (from check #4)
The above shows that indexrow1 has a valid counterpart in the baserow. Given this, the only possible reason why indexrow1 should not have been present in the index is that another index row, namely indexrow2, must exist such that the pair (indexrow2, baserow) also satisfies the above checks.
We can say that indexrow1 and indexrow2
- have the same ybbasectid (baserow.ybctid == indexrow2.ybbasectid == indexrow1.ybbasectid).
- have binary equal values for key columns. This is because key cols of both index rows are binary equal to the corresponding baserow values (from check #3 and definition of consistency).
- have identical ybuniqueidxkeysuffix (it depends on index type, mode, and key cols - all of these are already established to be the same for the two index rows).
The DocKey of the index row is created by a subset of (key cols, ybbasectid, ybuniqueidxkeysuffix). Each component is identical for the two index rows, implying identical DocKeys. This is not possible because DocDB does not allow duplicate DocKeys. Hence, such an indexrow1 does not exist.
**Part 2: Check for missing index rows**
This part checks if no entries are missing from the index. Given that it is already established that the index does not contain any spurious rows, it suffices to check if the index row count is what it should be. That is, for every qualifying row in the base table (filtered by partial index predicate), the index should contain one row.
####Implementation
- To fetch the index row and the corresponding base table tow efficiently, batch nested loop join is used (details below)
- Both parts of the check use a single read time. This works out of the box because the entire check is executed as a single YSQL statement.
**Batch Nested Loop Join usage**
Batchable join clauses must be of the form `inner_indexed_var = expression on (outer_vars)` and the expression must not involve functions.
To satisfy the above requirement,
- join condition: baserow.ybctid == indexrow.ybbasectid.
- outer subplan: index relation scan
- inner subplan: base relation scan. BNL expects an index on the var referenced in the join clause (ybctid, in this case). So, a dummy primary key index object on the ybctid column is temporarily created (not persisted in the PG catalog). Like any other PK index in YB, this index points to the base relation and doesn't have a separate docdb table. Because such an index object doesn't actually exist, the planner was bypassed and the join plan was hardcoded.
Jira: DB-15118
Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgRegressYbIndexCheck
Reviewers: amartsinchyk, tnayak
Reviewed By: amartsinchyk
Subscribers: smishra, yql
Differential Revision: https://phorge.dev.yugabyte.com/D41376
0 commit comments