Commit 7886fda

[#26283] YSQL: Support yb_index_check() execution using multiple snapshots
Summary:
===Problem===
The index consistency checker (`yb_index_check()`) currently uses a BNL LEFT join between the index and the base relation, followed by a `count(*)` on the base relation, to detect index consistency issues. These operations and checks are described in the summary of D41376 / 10de037.

`yb_index_check()` is a slow operation. In performance experiments, `yb_index_check()` on an index with a `pg_table_size()` of 3 GB took 600 seconds on a single-region, multi-AZ, 3-node cluster. Any operation that runs longer than `timestamp_history_retention_interval_sec` (default: 15 minutes) is susceptible to the `Snapshot too old` error. With the default setting, and extrapolating linearly from the measurement above (3 GB in 600 seconds, so 900 seconds corresponds to about 4.5 GB), `yb_index_check()` is susceptible to this error on indexes with `pg_table_size()` >= 4.5 GB.

===Solution===
To resolve this shortcoming, this revision adds a 'multi-snapshot' execution mode for `yb_index_check()`. In this mode, the left subplan of the JOIN is divided into small batches, and a new (latest) snapshot is picked for processing each batch. The batch size is chosen so that processing a batch is guaranteed to complete within `timestamp_history_retention_interval_sec` (details below). Hence, in this execution mode, `yb_index_check()` is guaranteed not to run into the `Snapshot too old` error.

===Implementation details===

===== Batching =====

**Terminology**
Two kinds of batches are involved:
- BNL batch: the batch inside the BNL join.
- Checker batch: the batch of rows of the left subplan of the JOIN that is processed using the same snapshot during `yb_index_check()`.

**Checker batch size**
To ensure that every row is processed at least once, the checker batch size must be a multiple of the BNL batch size. This is because the BNL output is ordered by left relation ybctid across BNL batches, but not within a batch.
For instance, consider the following scenario with BNL batch size = 3:

| BNL batch | output | max ybctid encountered |
| --- | --- | --- |
| 1 | ybctid1, ... | ybctid1 |
| 1 | ybctid3, ... | ybctid3 |
| 1 | ybctid2, ... | ybctid3 |
| 2 | ybctid6, ... | ybctid6 |
| 2 | ybctid4, ... | ybctid6 |
| 2 | ybctid5, ... | ybctid6 |

Note that the ybctids are not ordered within a BNL batch, but they are ordered across batches. Now suppose the checker batch size is 4. The checker's first batch finishes at the 4th row, with max ybctid encountered == ybctid6. The next checker batch then starts after ybctid6, so the rows corresponding to ybctid4 and ybctid5 are never processed.

Within a checker batch, we keep processing rows in multiples of yb_bnl_batch_size as long as the elapsed time is < 70% of timestamp_history_retention_interval_sec. The 70% threshold is a heuristic. The idea is to keep it close to 100% so that as many rows as possible are processed within a single batch, which avoids the overhead of creating too many batches. At the same time, keeping it too close to 100% risks the Snapshot too old error: if the elapsed time is marginally below the threshold, the next set of rows is processed in the same batch, and that can push the elapsed time beyond timestamp_history_retention_interval_sec.

**Batch processing**
While processing a checker batch, we keep track of the maximum processed ybctid (of the left relation) and pass it as a lower bound when initializing the next checker batch.

**Controlling execution mode**
yb_index_check() now takes an optional bool argument `single_snapshot_mode` to control the execution mode. If it is true, the execution mode is 'single snapshot' (all rows are processed using a single snapshot); if it is false, the execution mode is 'multi-snapshot'. The default value of this argument is false, so yb_index_check() executes in multi-snapshot mode by default.
===== Operations =====
In multi-snapshot mode, the count(*) on the base relation is replaced by another LEFT join between the base rel and the index rel. Both operations serve the same purpose: to detect rows missing from the index. The details of the JOIN are as follows:
- Left relation: base relation
- Right relation: index relation
- Join condition: baserel.computed_index_row_ybctid = indexrel.t_ybindexrowybctid
- Join type: Batched Nested Loop join
- Check condition: indexrel.t_ybindexrowybctid is not null (a null value would indicate missing index rows)

Jira: DB-15629

Test Plan:
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbIndexCheck'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbIndexCheckSingleSnapshot'
./yb_build.sh --gtest_filter PgYbIndexCheckTest.YbIndexCheckRepeatableRead
./yb_build.sh --gtest_filter PgYbIndexCheckTest.YbIndexCheckSnapshotTooOld

Reviewers: amartsinchyk, kramanathan
Reviewed By: amartsinchyk
Subscribers: smishra, jason, svc_phabricator, yql
Differential Revision: https://phorge.dev.yugabyte.com/D42311
1 parent 6f86a89 commit 7886fda

30 files changed: +1841 -560 lines changed

java/yb-pgsql/src/test/java/org/yb/pgsql/TestPgRegressYbIndexCheck.java

Lines changed: 9 additions & 1 deletion
@@ -12,6 +12,7 @@
 //
 package org.yb.pgsql;
 
+import java.util.Map;
 import org.junit.Test;
 import org.junit.runner.RunWith;
 import org.yb.YBTestRunner;
@@ -27,7 +28,14 @@ public int getTestMethodTimeoutSec() {
   }
 
   @Test
-  public void testPgRegressFeature() throws Exception {
+  public void testPgRegressYbIndexCheck() throws Exception {
     runPgRegressTest("yb_index_check_schedule");
   }
+
+  @Override
+  protected Map<String, String> getTServerFlags() {
+    Map<String, String> flagMap = super.getTServerFlags();
+    appendToYsqlPgConf(flagMap, "yb_test_index_check_num_batches_per_snapshot=1");
+    return flagMap;
+  }
 }

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+// Copyright (c) YugabyteDB, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+// in compliance with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software distributed under the License
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+// or implied. See the License for the specific language governing permissions and limitations
+// under the License.
+//
+package org.yb.pgsql;
+
+import java.util.Map;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.yb.YBTestRunner;
+/**
+ * Runs the pg_regress test suite on YB code.
+ */
+@RunWith(value=YBTestRunner.class)
+public class TestPgRegressYbIndexCheckSingleSnapshot extends BasePgRegressTest {
+
+  @Override
+  public int getTestMethodTimeoutSec() {
+    return 1800;
+  }
+
+  @Test
+  public void testPgRegressYbIndexCheckSingleSnapshot() throws Exception {
+    runPgRegressTest("yb_index_check_schedule");
+  }
+
+  @Override
+  protected Map<String, String> getTServerFlags() {
+    Map<String, String> flagMap = super.getTServerFlags();
+    appendToYsqlPgConf(flagMap, "yb_test_index_check_num_batches_per_snapshot=0");
+    return flagMap;
+  }
+
+}

src/postgres/src/backend/access/yb_access/yb_scan.c

Lines changed: 19 additions & 4 deletions
@@ -256,6 +256,17 @@ YbBindColumnCondBetween(YbScanDesc ybScan,
                         bool start_valid, bool start_inclusive, Datum value,
                         bool end_valid, bool end_inclusive, Datum value_end)
 {
+    /* Special handling of quals on ybctid column. */
+    if (attnum == YBTupleIdAttributeNumber)
+    {
+        HandleYBStatus(YBCPgDmlBindBounds(ybScan->handle,
+                                          start_valid ? value : 0,
+                                          start_inclusive,
+                                          end_valid ? value_end : 0,
+                                          end_inclusive));
+        return;
+    }
+
     Oid atttypid = ybc_get_atttypid(bind_desc, attnum);
     Oid attcollation = YBEncodingCollation(ybScan->handle, attnum,
                                            ybc_get_attcollation(bind_desc,
@@ -730,21 +741,25 @@ ybcFetchNextIndexTuple(YbScanDesc ybScan, ScanDirection dir)
                     tuple = index_form_tuple(RelationGetDescr(index), ivalues, inulls);
                     if (syscols.ybctid != NULL)
                     {
-                        INDEXTUPLE_YBCTID(tuple) = PointerGetDatum(syscols.ybctid);
-                        ybcUpdateFKCache(ybScan, INDEXTUPLE_YBCTID(tuple));
+                        INDEXTUPLE_BASECTID(tuple) = PointerGetDatum(syscols.ybctid);
+                        ybcUpdateFKCache(ybScan, INDEXTUPLE_BASECTID(tuple));
                     }
                 }
                 else
                 {
                     tuple = index_form_tuple(tupdesc, values, nulls);
                     if (syscols.ybbasectid != NULL)
                     {
-                        INDEXTUPLE_YBCTID(tuple) = PointerGetDatum(syscols.ybbasectid);
-                        ybcUpdateFKCache(ybScan, INDEXTUPLE_YBCTID(tuple));
+                        INDEXTUPLE_BASECTID(tuple) = PointerGetDatum(syscols.ybbasectid);
+                        ybcUpdateFKCache(ybScan, INDEXTUPLE_BASECTID(tuple));
                     }
+
+                    /* Fields used by yb_index_check() */
                     if (syscols.ybuniqueidxkeysuffix != NULL)
                         tuple->t_ybuniqueidxkeysuffix =
                             PointerGetDatum(syscols.ybuniqueidxkeysuffix);
+                    if (syscols.ybctid != NULL)
+                        tuple->t_ybindexrowybctid = PointerGetDatum(syscols.ybctid);
                 }
                 break;
             }

src/postgres/src/backend/catalog/yb_system_functions.sql

Lines changed: 14 additions & 0 deletions
@@ -115,6 +115,20 @@ LANGUAGE INTERNAL
 VOLATILE STRICT PARALLEL SAFE
 AS 'yb_cancel_query_diagnostics';
 
+CREATE OR REPLACE FUNCTION
+  yb_index_check(indexrelid oid, single_snapshot_mode bool DEFAULT false)
+RETURNS void
+LANGUAGE INTERNAL
+VOLATILE PARALLEL SAFE
+AS 'yb_index_check';
+
+CREATE OR REPLACE FUNCTION
+  yb_compute_row_ybctid(relid oid, key_atts record, ybidxbasectid bytea DEFAULT NULL)
+RETURNS bytea
+LANGUAGE INTERNAL
+IMMUTABLE PARALLEL SAFE
+AS 'yb_compute_row_ybctid';
+
 --
 -- Grant and revoke statements on YB objects.
 --

src/postgres/src/backend/executor/nodeIndexonlyscan.c

Lines changed: 4 additions & 2 deletions
@@ -389,8 +389,10 @@ StoreIndexTuple(IndexOnlyScanState *node, TupleTableSlot *slot,
 
     ExecStoreVirtualTuple(slot);
 
-    TABLETUPLE_YBCTID(slot) = INDEXTUPLE_YBCTID(itup); /* ybidxbasectid */
-    slot->ts_ybuniqueidxkeysuffix = itup->t_ybuniqueidxkeysuffix; /* ybuniqueidxkeysuffix */
+    /* Fields used by yb_index_check() */
+    slot->tts_ybidxbasectid = INDEXTUPLE_BASECTID(itup); /* ybidxbasectid */
+    slot->tts_ybuniqueidxkeysuffix = itup->t_ybuniqueidxkeysuffix; /* ybuniqueidxkeysuffix */
+    slot->tts_ybctid = itup->t_ybindexrowybctid; /* index row's ybctid */
 }
 
 /*

src/postgres/src/backend/executor/nodeIndexscan.c

Lines changed: 32 additions & 10 deletions
@@ -1409,14 +1409,25 @@ ExecIndexBuildScanKeys(PlanState *planstate, Relation index,
             else
             {
                 varattno = ((Var *) leftop)->varattno;
-                if (varattno < 1 || varattno > indnkeyatts)
-                    elog(ERROR, "bogus index qualification");
 
                 /*
-                 * We have to look up the operator's strategy number. This
-                 * provides a cross-check that the operator does match the index.
+                 * Special handling for ybctid column. This is currently used
+                 * only by yb_index_check() which executes indexqual of the
+                 * form 'ybctid > lower_bound'.
                  */
-                opfamily = index->rd_opfamily[varattno - 1];
+                if (varattno == YBTupleIdAttributeNumber)
+                    opfamily = BYTEA_LSM_FAM_OID;
+                else
+                {
+                    if (varattno < 1 || varattno > indnkeyatts)
+                        elog(ERROR, "bogus index qualification");
+
+                    /*
+                     * We have to look up the operator's strategy number. This
+                     * provides a cross-check that the operator does match the index.
+                     */
+                    opfamily = index->rd_opfamily[varattno - 1];
+                }
             }
 
             get_op_opfamily_properties(opno, opfamily, isorderby,
@@ -1674,14 +1685,25 @@ ExecIndexBuildScanKeys(PlanState *planstate, Relation index,
                 elog(ERROR, "indexqual doesn't have key on left side");
 
             varattno = ((Var *) leftop)->varattno;
-            if (varattno < 1 || varattno > indnkeyatts)
-                elog(ERROR, "bogus index qualification");
 
             /*
-             * We have to look up the operator's strategy number. This
-             * provides a cross-check that the operator does match the index.
+             * Special handling for ybctid column. This is currently used only by
+             * yb_index_check() which executes indexqual of the form:
+             * 'ybctid IN (array-expression)'
              */
-            opfamily = index->rd_opfamily[varattno - 1];
+            if (varattno == YBTupleIdAttributeNumber)
+                opfamily = BYTEA_LSM_FAM_OID;
+            else
+            {
+                if (varattno < 1 || varattno > indnkeyatts)
+                    elog(ERROR, "bogus index qualification");
+
+                /*
+                 * We have to look up the operator's strategy number. This
+                 * provides a cross-check that the operator does match the index.
+                 */
+                opfamily = index->rd_opfamily[varattno - 1];
+            }
 
             get_op_opfamily_properties(opno, opfamily, isorderby,
                                        &op_strategy,

src/postgres/src/backend/executor/ybModifyTable.c

Lines changed: 13 additions & 6 deletions
@@ -30,6 +30,7 @@
 #include "access/xact.h"
 #include "access/yb_scan.h"
 #include "catalog/catalog.h"
+#include "catalog/heap.h"
 #include "catalog/indexing.h"
 #include "catalog/pg_attribute.h"
 #include "catalog/pg_auth_members_d.h"
@@ -174,24 +175,30 @@ YBCComputeYBTupleIdFromSlot(Relation rel, TupleTableSlot *slot)
          * Don't need to fill in for the DocDB RowId column, however we still
          * need to add the column to the statement to construct the ybctid.
          */
-        if (attnum != YBRowIdAttributeNumber)
+        if (attnum > 0)
         {
-            Oid type_id = ((attnum > 0) ?
-                           TupleDescAttr(slot->tts_tupleDescriptor,
-                                         attnum - 1)->atttypid :
-                           InvalidOid);
+            Oid type_id = TupleDescAttr(slot->tts_tupleDescriptor,
+                                        attnum - 1)->atttypid;
 
             next_attr->type_entity = YbDataTypeFromOidMod(attnum, type_id);
             next_attr->collation_id = ybc_get_attcollation(RelationGetDescr(rel), attnum);
             next_attr->datum = slot_getattr(slot, attnum, &next_attr->is_null);
         }
-        else
+        else if (attnum == YBRowIdAttributeNumber)
         {
             next_attr->datum = 0;
             next_attr->is_null = false;
             next_attr->type_entity = NULL;
             next_attr->collation_id = InvalidOid;
         }
+        else
+        {
+            Oid type_id = SystemAttributeDefinition(attnum)->atttypid;
+
+            next_attr->type_entity = YbDataTypeFromOidMod(attnum, type_id);
+            next_attr->collation_id = ybc_get_attcollation(RelationGetDescr(rel), attnum);
+            next_attr->datum = slot_getsysattr(slot, attnum, &next_attr->is_null);
+        }
         YbcPgColumnInfo column_info = {0};
 
         HandleYBTableDescStatus(YBCPgGetColumnInfo(ybc_table_desc,

src/postgres/src/backend/utils/misc/guc.c

Lines changed: 26 additions & 0 deletions
@@ -3536,6 +3536,18 @@ static struct config_bool ConfigureNamesBool[] =
         NULL, NULL, NULL
     },
 
+    {
+        {"yb_test_slowdown_index_check", PGC_SUSET, DEVELOPER_OPTIONS,
+            gettext_noop("Slows down yb_index_check() by sleeping for 1s after processing "
+                         "every row. Used in tests to simulate long running yb_index_check()."),
+            NULL,
+            GUC_NOT_IN_SAMPLE
+        },
+        &yb_test_slowdown_index_check,
+        false,
+        NULL, NULL, NULL
+    },
+
     {
         {"yb_allow_dockey_bounds", PGC_SUSET, CUSTOM_OPTIONS,
             gettext_noop("If true, allow lower_bound/upper_bound fields of PgsqlReadRequestPB "
@@ -5488,6 +5500,20 @@ static struct config_int ConfigureNamesInt[] =
         NULL, NULL, NULL
     },
 
+    {
+        {"yb_test_index_check_num_batches_per_snapshot", PGC_USERSET, DEVELOPER_OPTIONS,
+            gettext_noop("Used to test yb_index_check()"),
+            gettext_noop("If set to > 0, number of index rows processed per snapshot "
+                         "is equal to yb_test_index_check_num_batches_per_snapshot*yb_bnl_batch_size "
+                         "If set to 0, yb_index_check() will execute in single snapshot mode."),
+            GUC_NOT_IN_SAMPLE
+        },
+        &yb_test_index_check_num_batches_per_snapshot,
+        -1,
+        -1,
+        INT_MAX,
+        NULL, NULL, NULL
+    },
     {
         {"yb_fk_references_cache_limit", PGC_USERSET, CLIENT_CONN_STATEMENT,
             gettext_noop("Sets the maximum size for the FK reference cache filled by the INSERT, SELECT ... FOR KEY SHARE or similar statmements"),
