
Commit 10de037

[#25820] YSQL: Index Consistency Checker
Summary: This revision adds the procedure `yb_index_check()`, which checks whether the given index is consistent with its base relation. GIN indexes are not supported yet.

#### Context

An index row has the following components:
- index attributes: key and non-key columns (if any)
- ybbasectid: system attribute storing the ybctid of the base relation row
- ybuniqueidxkeysuffix: system attribute, present only for unique indexes (it is non-null only when the index is in null-are-distinct mode and the key columns contain at least one NULL)

The structure of an index row is as follows (in `<DocKey>` -> `<DocValue>` format):
- Non-unique index: (key cols, ybbasectid) -> (non-key cols)
- Unique index: (key cols, ybuniqueidxkeysuffix) -> (non-key cols, ybbasectid)

#### Consistency Definition

An index row is consistent if all of its attributes are consistent. An index attribute is consistent if its value and the corresponding base relation value are:
- for key attributes: binary equal (semantic equality without binary equality runs the risk of allowing multiple index rows for a given base table row if the key column can have multiple binary representations).
- for non-key attributes: binary or semantically equal.

Note: if both values are NULL, they are consistent.

#### Consistency Check

The index consistency check is done in two steps:
1. Check for spurious index rows
2. Check for missing index rows

**Part 1: Check for spurious index rows**

Here, we check whether the index contains a row that it should not. To do this, for every index row, fetch the row in the base table (filtered by the partial index predicate) such that baserow.ybctid == indexrow.ybbasectid. If such a row doesn't exist, use a baserow with all NULL values. The result is the same as a LEFT join on indexrow.ybbasectid = baserow.ybctid with the index table on the left (if the index were a regular relation).
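The following C sketch illustrates the consistency definition by checking one index attribute against its base relation value. It is not the code added by this commit. `SketchDatum`, `attr_consistent()` and `float8_semantic_eq()` are hypothetical names. A datum is modeled as a raw byte image plus a NULL flag. Signed zero is used only as a familiar example of one value with two binary representations that compare semantically equal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical datum model: the value's binary image plus a NULL flag. */
typedef struct
{
    bool        isnull;
    const unsigned char *bytes;
    size_t      len;
} SketchDatum;

/* Hypothetical hook for a type's semantic equality (its "=" operator). */
typedef bool (*SemanticEqFn) (const SketchDatum *a, const SketchDatum *b);

/* Binary equality: same length and identical bytes. */
static bool
binary_equal(const SketchDatum *a, const SketchDatum *b)
{
    return a->len == b->len &&
        (a->len == 0 || memcmp(a->bytes, b->bytes, a->len) == 0);
}

/*
 * Example semantic equality for float8 values. Simplified: unlike
 * PostgreSQL's float8 "=", it treats NaN as unequal to NaN.
 */
static bool
float8_semantic_eq(const SketchDatum *a, const SketchDatum *b)
{
    double      x;
    double      y;

    if (a->len != sizeof(double) || b->len != sizeof(double))
        return false;
    memcpy(&x, a->bytes, sizeof x);
    memcpy(&y, b->bytes, sizeof y);
    return x == y;
}

/*
 * Consistency of one index attribute with its base relation value:
 *   - both NULL: consistent; exactly one NULL: inconsistent
 *   - key attribute: must be binary equal
 *   - non-key attribute: binary equal or semantically equal
 */
static bool
attr_consistent(const SketchDatum *idx, const SketchDatum *base,
                bool is_key, SemanticEqFn semantic_eq)
{
    if (idx->isnull || base->isnull)
        return idx->isnull && base->isnull;
    if (binary_equal(idx, base))
        return true;
    return !is_key && semantic_eq != NULL && semantic_eq(idx, base);
}
```

Compare `+0.0` with `-0.0`. The sketch reports the pair as consistent for a non-key attribute but inconsistent for a key attribute. This mirrors the reason given above for requiring binary equality on key columns: two binary forms of one key value would produce two different index keys for the same base row.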
Fetch the following columns as the join targetlist:
- from the index row: ybbasectid, index attributes, ybuniqueidxkeysuffix (only for unique indexes)
- from the base table row: ybctid, columns/expressions corresponding to index attributes

On the joined result, make the following checks:
1. ybbasectid should be non-null
2. ybbasectid should be equal to ybctid
3. index attributes and the corresponding base relation columns/expressions should be consistent as per the above definition
4. for unique indexes, ybuniqueidxkeysuffix should be non-null iff the index uses null-are-distinct mode and the key columns contain at least one NULL. When non-null, it should be equal to ybbasectid.

If the above checks pass for every row in the index, the index does not contain any spurious rows. This can be proved by contradiction as follows.

Assume that the above checks passed for every row in the index, yet it contains a spurious row, indexrow1. This index row must satisfy the following:
- indexrow1.ybbasectid != null (from check #1)
- the base table has a row, baserow, such that baserow.ybctid == indexrow1.ybbasectid (otherwise ybctid would be null and check #2 would have failed)
- the index attributes of indexrow1 are consistent with baserow (from check #3)
- if the index is unique, indexrow1.ybuniqueidxkeysuffix is either null or equal to ybbasectid, depending on the index mode and key cols (from check #4)

The above shows that indexrow1 has a valid counterpart in baserow. Given this, the only possible reason why indexrow1 should not be present in the index is that another index row, indexrow2, exists such that the pair (indexrow2, baserow) also satisfies the above checks. We can then say that indexrow1 and indexrow2:
- have the same ybbasectid (baserow.ybctid == indexrow2.ybbasectid == indexrow1.ybbasectid).
- have binary equal values for key columns.
  This is because the key cols of both index rows are binary equal to the corresponding baserow values (from check #3 and the definition of consistency).
- have identical ybuniqueidxkeysuffix (it depends on the index type, mode, and key cols, all of which are already established to be the same for the two index rows).

The DocKey of an index row is built from a subset of (key cols, ybbasectid, ybuniqueidxkeysuffix). Each component is identical for the two index rows, so their DocKeys are identical. This is not possible because DocDB does not allow duplicate DocKeys. Hence, such an indexrow1 does not exist.

**Part 2: Check for missing index rows**

This part checks that no entries are missing from the index. Since it is already established that the index does not contain any spurious rows, it is enough to check that the index row count is what it should be. That is, for every qualifying row in the base table (filtered by the partial index predicate), the index should contain one row.

#### Implementation

- A batch nested loop join is used to fetch each index row and the corresponding base table row efficiently (details below).
- Both parts of the check use a single read time. This works out of the box because the entire check is executed as a single YSQL statement.

**Batch Nested Loop Join usage**

Batchable join clauses must be of the form `inner_indexed_var = expression on (outer_vars)`, and the expression must not involve functions. To satisfy this requirement:
- join condition: baserow.ybctid == indexrow.ybbasectid
- outer subplan: index relation scan
- inner subplan: base relation scan

BNL expects an index on the var referenced in the join clause (ybctid, in this case). So a dummy primary key index object on the ybctid column is created temporarily (it is not persisted in the PG catalog). Like any other PK index in YB, this index points to the base relation and doesn't have a separate DocDB table.
Because such an index object doesn't actually exist, the planner was bypassed and the join plan was hardcoded.

Jira: DB-15118

Test Plan: ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressYbIndexCheck

Reviewers: amartsinchyk, tnayak

Reviewed By: amartsinchyk

Subscribers: smishra, yql

Differential Revision: https://phorge.dev.yugabyte.com/D41376
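The C sketch below models the two-part check in memory to make the argument concrete. It is a hypothetical simplification, not the YSQL implementation:
- rows are plain structs, and a ctid of 0 stands for NULL;
- index attributes are copied directly from base columns, with no expressions;
- integer equality stands in for binary equality.

`find_base_row()` plays the role of the LEFT join. `row_is_consistent()` applies checks #1 to #4. `index_is_consistent()` adds the Part 2 row count comparison.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NATTS 2          /* hypothetical number of index attributes */

/* Hypothetical in-memory index row; a ctid of 0 stands for NULL. */
typedef struct
{
    unsigned long ybbasectid;
    unsigned long ybuniqueidxkeysuffix;     /* unique indexes only */
    long        attrs[SKETCH_NATTS];
    bool        attr_null[SKETCH_NATTS];
} IndexRow;

/* Hypothetical base table row; cols[i] is the value behind index attr i. */
typedef struct
{
    unsigned long ybctid;
    long        cols[SKETCH_NATTS];
    bool        col_null[SKETCH_NATTS];
    bool        matches_predicate;  /* partial index predicate (true if none) */
} BaseRow;

typedef struct
{
    bool        unique;
    bool        nulls_distinct;     /* null-are-distinct mode */
    int         nkeyatts;           /* the first nkeyatts attrs are key columns */
} IndexInfoSketch;

/* LEFT JOIN lookup: returning NULL plays the role of the all-NULL base row. */
static const BaseRow *
find_base_row(const BaseRow *base, size_t nbase, unsigned long ctid)
{
    for (size_t i = 0; i < nbase; i++)
        if (base[i].matches_predicate && base[i].ybctid == ctid)
            return &base[i];
    return NULL;
}

/* Part 1: checks #1 to #4 on one joined (index row, base row) pair. */
static bool
row_is_consistent(const IndexRow *ir, const BaseRow *br,
                  const IndexInfoSketch *info)
{
    bool        key_has_null = false;

    if (ir->ybbasectid == 0)                            /* check #1 */
        return false;
    if (br == NULL || br->ybctid != ir->ybbasectid)     /* check #2 */
        return false;
    for (int i = 0; i < SKETCH_NATTS; i++)              /* check #3 */
    {
        if (ir->attr_null[i] != br->col_null[i])
            return false;
        if (!ir->attr_null[i] && ir->attrs[i] != br->cols[i])
            return false;
        if (i < info->nkeyatts && ir->attr_null[i])
            key_has_null = true;
    }
    if (info->unique)                                   /* check #4 */
    {
        bool        expect_suffix = info->nulls_distinct && key_has_null;

        if (expect_suffix != (ir->ybuniqueidxkeysuffix != 0))
            return false;
        if (expect_suffix && ir->ybuniqueidxkeysuffix != ir->ybbasectid)
            return false;
    }
    return true;
}

/* Part 1 over every index row, then Part 2's row count comparison. */
static bool
index_is_consistent(const IndexRow *idx, size_t nidx,
                    const BaseRow *base, size_t nbase,
                    const IndexInfoSketch *info)
{
    size_t      expected = 0;

    for (size_t i = 0; i < nidx; i++)
        if (!row_is_consistent(&idx[i],
                               find_base_row(base, nbase, idx[i].ybbasectid),
                               info))
            return false;
    for (size_t i = 0; i < nbase; i++)
        if (base[i].matches_predicate)
            expected++;
    return nidx == expected;
}
```

Some cases the sketch distinguishes:
- Dropping one index row leaves every remaining pair passing Part 1, but Part 2's count comparison fails.
- A stale attribute value fails Part 1.
- A ybuniqueidxkeysuffix on a row whose key columns contain no NULL also fails Part 1.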
1 parent 4f39c31 commit 10de037


29 files changed, +1465 -26 lines changed

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+// Copyright (c) YugabyteDB, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+// in compliance with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software distributed under the License
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+// or implied. See the License for the specific language governing permissions and limitations
+// under the License.
+//
+package org.yb.pgsql;
+
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.yb.YBTestRunner;
+/**
+ * Runs the pg_regress test suite on YB code.
+ */
+@RunWith(value=YBTestRunner.class)
+public class TestPgRegressYbIndexCheck extends BasePgRegressTest {
+
+  @Override
+  public int getTestMethodTimeoutSec() {
+    return 1800;
+  }
+
+  @Test
+  public void testPgRegressFeature() throws Exception {
+    runPgRegressTest("yb_index_check_schedule");
+  }
+}

src/postgres/src/backend/access/index/indexam.c

Lines changed: 65 additions & 0 deletions
@@ -69,6 +69,8 @@
 /* Yugabyte includes */
 #include "pg_yb_utils.h"
 #include "access/yb_scan.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_opfamily.h"

 /* ----------------------------------------------------------------
  * macros used in index_ routines
@@ -124,6 +126,69 @@ static IndexScanDesc index_beginscan_internal(Relation indexRelation,
         int nkeys, int norderbys, Snapshot snapshot,
         ParallelIndexScanDesc pscan, bool temp_snap);

+/*
+ * Wrapper on the top of index_open(), used during yb_index_check(). Given a
+ * base relation id, it creates a dummy primary key index object such
+ * that:
+ * - indexrelid and indrelid both point to the base relation
+ * - index key: ybctid column
+ */
+Relation
+yb_dummy_baserel_index_open(Oid relationId, LOCKMODE lockmode)
+{
+    Relation    relation;
+
+    relation = relation_open(relationId, lockmode);
+
+    if (relation->rd_rel->relkind == RELKIND_RELATION)
+    {
+        Assert(!relation->rd_index);
+        Assert(!relation->rd_indam);
+        Assert(!relation->rd_opfamily);
+        int         natts = 1;
+        Form_pg_index pg_index =
+            palloc0(sizeof(FormData_pg_index) + natts * sizeof(int16));
+        pg_index->indexrelid = RelationGetRelid(relation);
+        pg_index->indrelid = RelationGetRelid(relation);
+        pg_index->indnatts = natts;
+        pg_index->indnkeyatts = natts;
+        pg_index->indisunique = true;
+        pg_index->indisprimary = true;
+        pg_index->indimmediate = true;
+        pg_index->indisvalid = true;
+        pg_index->indisready = true;
+        pg_index->indislive = true;
+        pg_index->indkey.ndim = 1;
+        pg_index->indkey.dataoffset = 0;    /* never any nulls */
+        pg_index->indkey.elemtype = INT2OID;
+        pg_index->indkey.dim1 = natts;
+        pg_index->indkey.lbound1 = 0;
+        pg_index->indkey.values[0] = YBTupleIdAttributeNumber;
+
+        relation->rd_index = pg_index;
+        relation->rd_indam = GetIndexAmRoutineByAmId(LSM_AM_OID, false);
+        relation->rd_opfamily = palloc0(sizeof(Oid) * pg_index->indnkeyatts);
+        relation->rd_opfamily[0] = BYTEA_LSM_FAM_OID;
+    }
+    return relation;
+}
+
+/*
+ * Free the dummy index object created for yb_index_check().
+ */
+void
+yb_free_dummy_baserel_index(Relation relation)
+{
+    Assert(relation->rd_index);
+    Assert(relation->rd_indam);
+    Assert(relation->rd_opfamily);
+    pfree(relation->rd_index);
+    pfree(relation->rd_indam);
+    pfree(relation->rd_opfamily);
+    relation->rd_index = NULL;
+    relation->rd_indam = NULL;
+    relation->rd_opfamily = NULL;
+}

 /* ----------------------------------------------------------------
  * index_ interface functions

src/postgres/src/backend/access/yb_access/yb_scan.c

Lines changed: 23 additions & 18 deletions
@@ -34,6 +34,7 @@
 #include "access/sysattr.h"
 #include "access/xact.h"
 #include "access/yb_pg_inherits_scan.h"
+#include "catalog/heap.h"
 #include "commands/dbcommands.h"
 #include "commands/tablegroup.h"
 #include "catalog/index.h"
@@ -227,20 +228,8 @@ ybcLoadTableInfo(Relation relation, YbScanPlan scan_plan)
 static Oid
 ybc_get_atttypid(TupleDesc bind_desc, AttrNumber attnum)
 {
-    Oid         atttypid;
-
-    if (attnum > 0)
-    {
-        /* Get the type from the description */
-        atttypid = TupleDescAttr(bind_desc, attnum - 1)->atttypid;
-    }
-    else
-    {
-        /* This must be an OID column. */
-        atttypid = OIDOID;
-    }
-
-    return atttypid;
+    return attnum > 0 ? TupleDescAttr(bind_desc, attnum - 1)->atttypid :
+        SystemAttributeDefinition(attnum)->atttypid;
 }

 /*
@@ -756,6 +745,9 @@ ybcFetchNextIndexTuple(YbScanDesc ybScan, ScanDirection dir)
                 INDEXTUPLE_YBCTID(tuple) = PointerGetDatum(syscols.ybbasectid);
                 ybcUpdateFKCache(ybScan, INDEXTUPLE_YBCTID(tuple));
             }
+            if (syscols.ybuniqueidxkeysuffix != NULL)
+                tuple->t_ybuniqueidxkeysuffix =
+                    PointerGetDatum(syscols.ybuniqueidxkeysuffix);
         }
         break;
     }
@@ -1253,6 +1245,7 @@ ybcSetupScanKeys(YbScanDesc ybScan, YbScanPlan scan_plan)
     /*
      * Find the scan keys that are the primary key.
      */
+    bool        sk_cols_has_ybctid = false;
     for (int i = 0; i < ybScan->nkeys; i++)
     {
         const AttrNumber attnum = scan_plan->bind_key_attnums[i];
@@ -1262,6 +1255,11 @@ ybcSetupScanKeys(YbScanDesc ybScan, YbScanPlan scan_plan)

         int         idx = YBAttnumToBmsIndex(scan_plan->target_relation, attnum);

+        if (attnum == YBTupleIdAttributeNumber)
+        {
+            sk_cols_has_ybctid = true;
+            scan_plan->sk_cols = bms_add_member(scan_plan->sk_cols, idx);
+        }
         /*
          * TODO: Can we have bound keys on non-pkey columns here?
          * If not we do not need the is_primary_key below.
@@ -1276,12 +1274,14 @@ ybcSetupScanKeys(YbScanDesc ybScan, YbScanPlan scan_plan)
     }

     /*
-     * If hash key is not fully set, we must do a full-table scan so clear all
-     * the scan keys if the hash code was explicitly specified as a
-     * scan key then we also shouldn't be clearing the scan keys
+     * If hash key is not fully set and ybctid is not set either, we must do a
+     * full-table scan so clear all the scan keys if the hash code was
+     * explicitly specified as a scan key then we also shouldn't be clearing the
+     * scan keys.
      */
     if (ybScan->hash_code_keys == NIL &&
-        !bms_is_subset(scan_plan->hash_key, scan_plan->sk_cols))
+        !bms_is_subset(scan_plan->hash_key, scan_plan->sk_cols) &&
+        !sk_cols_has_ybctid)
     {
         bms_free(scan_plan->sk_cols);
         scan_plan->sk_cols = NULL;
@@ -1921,6 +1921,11 @@ YbBindSearchArray(YbScanDesc ybScan, YbScanPlan scan_plan,
                                     length_of_key - 1, attnums,
                                     num_elems, elem_values);
         }
+        else if (scan_plan->bind_key_attnums[i] == YBTupleIdAttributeNumber)
+        {
+            Assert(num_elems == num_valid);
+            YBCPgBindYbctids(ybScan->handle, num_elems, elem_values);
+        }
         else
             ybcBindColumnCondIn(ybScan, scan_plan->bind_desc,
                                 scan_plan->bind_key_attnums[i], num_elems,
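The `YbBindSearchArray()` change above binds an array of `ybctid` values in one call. This is presumably how the inner base-relation scan of the batch nested loop join receives a whole batch of outer `ybbasectid` keys at once. The C sketch below illustrates only the batching idea, under these assumptions:
- `OuterRow`, `InnerRow`, `fetch_by_ybctids()` and `BATCH_SIZE` are hypothetical.
- The inner lookup is a linear scan standing in for a DocDB read keyed by `ybctid`.
- Inner `ybctid` values are unique, as they are for base table rows.

```c
#include <stddef.h>

#define BATCH_SIZE 3            /* hypothetical batch size */

typedef struct
{
    unsigned long ybbasectid;   /* outer side: index row */
} OuterRow;

typedef struct
{
    unsigned long ybctid;       /* inner side: base table row */
    int         payload;
} InnerRow;

static int  fetch_calls = 0;    /* number of inner fetches issued */

/*
 * Stand-in for one inner scan with "ybctid IN (keys...)": returns every inner
 * row whose ybctid is in keys. Assumes ybctid is unique in the inner table,
 * so at most nkeys rows are returned.
 */
static size_t
fetch_by_ybctids(const InnerRow *table, size_t ntable,
                 const unsigned long *keys, size_t nkeys,
                 const InnerRow **out)
{
    size_t      nout = 0;

    fetch_calls++;
    for (size_t i = 0; i < ntable; i++)
        for (size_t k = 0; k < nkeys; k++)
            if (table[i].ybctid == keys[k])
            {
                out[nout++] = &table[i];
                break;
            }
    return nout;
}

/*
 * Batched nested loop LEFT JOIN on outer.ybbasectid = inner.ybctid.
 * matched[i] receives outer[i]'s inner row, or NULL if there is none.
 */
static void
bnl_left_join(const OuterRow *outer, size_t nouter,
              const InnerRow *inner, size_t ninner,
              const InnerRow **matched)
{
    for (size_t start = 0; start < nouter; start += BATCH_SIZE)
    {
        size_t      n = nouter - start < BATCH_SIZE ? nouter - start : BATCH_SIZE;
        unsigned long keys[BATCH_SIZE];
        const InnerRow *found[BATCH_SIZE];
        size_t      nfound;

        /* Collect one batch of outer join keys. */
        for (size_t i = 0; i < n; i++)
            keys[i] = outer[start + i].ybbasectid;

        /* One inner round trip for the whole batch. */
        nfound = fetch_by_ybctids(inner, ninner, keys, n, found);

        /* Pair each outer row with the fetched row sharing its ctid. */
        for (size_t i = 0; i < n; i++)
        {
            matched[start + i] = NULL;
            for (size_t f = 0; f < nfound; f++)
                if (found[f]->ybctid == keys[i])
                    matched[start + i] = found[f];
        }
    }
}
```

With five outer rows and a batch size of 3, the sketch issues two inner fetches instead of five. An outer row whose `ybbasectid` has no match gets a NULL partner, as in the LEFT join of Part 1.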

src/postgres/src/backend/catalog/heap.c

Lines changed: 42 additions & 0 deletions
@@ -250,6 +250,46 @@ static const FormData_pg_attribute a7 = {

 static const FormData_pg_attribute *SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6, &a7};

+static const FormData_pg_attribute yb_a1 = {
+    .attname = {"ybuniqueidxkeysuffix"},
+    .atttypid = BYTEAOID,
+    .attlen = -1,
+    .attnum = YBUniqueIdxKeySuffixAttributeNumber,
+    .attcacheoff = -1,
+    .atttypmod = -1,
+    .attbyval = false,
+    .attalign = TYPALIGN_INT,
+    .attstorage = TYPSTORAGE_EXTENDED,
+    .attnotnull = false,
+    .attislocal = true,
+};
+
+static const FormData_pg_attribute yb_a2 = {
+    .attname = {"ybidxbasectid"},
+    .atttypid = BYTEAOID,
+    .attlen = -1,
+    .attnum = YBIdxBaseTupleIdAttributeNumber,
+    .attcacheoff = -1,
+    .atttypmod = -1,
+    .attbyval = false,
+    .attalign = TYPALIGN_INT,
+    .attstorage = TYPSTORAGE_EXTENDED,
+    .attnotnull = true,
+    .attislocal = true,
+};
+
+
+static const FormData_pg_attribute *YbSysAtt[] = {&yb_a1, &yb_a2};
+
+const FormData_pg_attribute *
+YbSystemAttributeDefinition(AttrNumber attno)
+{
+    int         index = attno - YBSystemFirstLowInvalidAttributeNumber - 1;
+    if (index < 0 || index >= lengthof(YbSysAtt))
+        elog(ERROR, "invalid YB system attribute number %d", attno);
+    return YbSysAtt[index];
+}
+
 /*
  * This function returns a Form_pg_attribute pointer for a system attribute.
  * Note that we elog if the presented attno is invalid, which would only
@@ -258,6 +298,8 @@ static const FormData_pg_attribute *SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6, &a
 const FormData_pg_attribute *
 SystemAttributeDefinition(AttrNumber attno)
 {
+    if (attno <= YBFirstLowInvalidAttributeNumber)
+        return YbSystemAttributeDefinition(attno);
     if (attno >= 0 || attno < -(int) lengthof(SysAtt))
         elog(ERROR, "invalid system attribute number %d", attno);
     return SysAtt[-attno - 1];

src/postgres/src/backend/executor/execExprInterp.c

Lines changed: 5 additions & 1 deletion
@@ -4377,7 +4377,11 @@ ExecEvalSysVar(ExprState *state, ExprEvalStep *op, ExprContext *econtext,
                          op->resnull);
     *op->resvalue = d;
     /* this ought to be unreachable, but it's cheap enough to check */
-    if (unlikely(*op->resnull))
+    /*
+     * YB note: resnull can be true for ybuniqueidxkeysuffix (used in index
+     * consistency checker).
+     */
+    if (*op->resnull && op->d.var.attnum != YBUniqueIdxKeySuffixAttributeNumber)
         elog(ERROR, "failed to fetch attribute from slot");
 }


src/postgres/src/backend/executor/execUtils.c

Lines changed: 2 additions & 0 deletions
@@ -202,6 +202,8 @@ CreateExecutorState(void)
     estate->yb_exec_params.yb_fetch_row_limit = yb_fetch_row_limit;
     estate->yb_exec_params.yb_fetch_size_limit = yb_fetch_size_limit;

+    estate->yb_exec_params.yb_index_check = false;
+
     return estate;
 }


src/postgres/src/backend/executor/nodeIndexonlyscan.c

Lines changed: 4 additions & 0 deletions
@@ -356,6 +356,10 @@ StoreIndexTuple(TupleTableSlot *slot, IndexTuple itup, TupleDesc itupdesc)
     ExecClearTuple(slot);
     index_deform_tuple(itup, itupdesc, slot->tts_values, slot->tts_isnull);
     ExecStoreVirtualTuple(slot);
+
+    TABLETUPLE_YBCTID(slot) = INDEXTUPLE_YBCTID(itup); /* ybidxbasectid */
+    slot->ts_ybuniqueidxkeysuffix = itup->t_ybuniqueidxkeysuffix; /* ybuniqueidxkeysuffix */
+
 }

 /*

src/postgres/src/backend/executor/nodeIndexscan.c

Lines changed: 10 additions & 1 deletion
@@ -928,7 +928,12 @@ ExecEndIndexScan(IndexScanState *node)
     if (indexScanDesc)
         index_endscan(indexScanDesc);
     if (indexRelationDesc)
+    {
+        if (node->ss.ps.state->yb_exec_params.yb_index_check &&
+            indexRelationDesc->rd_rel->relkind == RELKIND_RELATION)
+            yb_free_dummy_baserel_index(indexRelationDesc);
         index_close(indexRelationDesc, NoLock);
+    }
 }

 /* ----------------------------------------------------------------
@@ -1089,8 +1094,12 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)

     /* Open the index relation. */
     lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
-    indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);

+    if (!estate->yb_exec_params.yb_index_check)
+        indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+    else
+        indexstate->iss_RelationDesc =
+            yb_dummy_baserel_index_open(node->indexid, lockmode);
     /*
      * Initialize index-specific scan state
      */

src/postgres/src/backend/utils/misc/Makefile

Lines changed: 2 additions & 1 deletion
@@ -33,7 +33,8 @@ OBJS = \
 	sampling.o \
 	superuser.o \
 	timeout.o \
-	tzparser.o
+	tzparser.o \
+	yb_index_check.o

 # This location might depend on the installation directories. Therefore
 # we can't substitute it into pg_config.h.
