Skip to content

Conversation

devanbenz
Copy link

Adds property based testing framework for our compaction testings.
Current we support checking for "gaps" within adjacent compaction groups.

For example:

			expectedResult: func() TestLevelResults {
				return TestLevelResults{
					// Our rogue level 2 file should be picked up in the full compaction
					level4Groups: []tsm1.PlannedCompactionGroup{
						{
							tsm1.CompactionGroup{
								"000016844-000000002.tsm",
								"000016948-000000004.tsm",
								"000016948-000000005.tsm",
								"000017076-000000004.tsm",
								"000017094-000000004.tsm",
							},
							tsdb.DefaultMaxPointsPerBlock,
						},
					},
					// Other files should get picked up by optimize compaction
					level5Groups: []tsm1.PlannedCompactionGroup{
						{
							tsm1.CompactionGroup{
								"000016684-000000007.tsm",
								"000016684-000000008.tsm",
								"000016684-000000009.tsm",
								"000016684-000000010.tsm",
								"000016812-000000004.tsm",
								"000016812-000000005.tsm",
								"000017095-000000005.tsm",
							},
							tsdb.DefaultMaxPointsPerBlock,
						},
					},
				}
			},
		},

Would be invalid and cause our validator to fail due to the file "000017095-000000005.tsm" in level5Groups.

@devanbenz devanbenz marked this pull request as ready for review September 8, 2025 21:29
@devanbenz devanbenz self-assigned this Sep 8, 2025
@@ -4517,6 +4498,10 @@ func TestEnginePlanCompactions(t *testing.T) {
e.Scheduler.SetDepth(1, mockGroupLen)
e.Scheduler.SetDepth(2, mockGroupLen)

// Validate our test case based on compact_property_test.go
validationErr := ValidateTestCase(test, test.expectedResult())
Copy link
Contributor

@philjb philjb Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this is validating the expectedResults() twice: once as they are extracted from test and then again as the second argument.

It should be validationErr := ValidateTestCase(test, level1Groups, level2Groups, Level3Groups, Level4Groups, Level5Groups)) where the groups come from e.PlanCompactions()... line 4526/4511

@davidby-influx
Copy link
Contributor

This seems unnecessarily complex for what it does. Would something as simple and stupid as https://goplay.tools/snippet/sKHiARnmzi9 work instead?

betweenGroup, inGroup := fileToGroup[betweenFile.filename]

if inGroup && betweenGroup != groupIdx {
return fmt.Errorf("gap detected: files %s and %s are in group %d, but file %s (between them) is in group %d",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected results: No Gaps Rule violation: gap detected: files 04-04.tsm and 09-04.tsm are in group 1, but file 05-03.tsm (between them) is in group 0

This error message is tough as the group indices don't tell me anything about which level group it came from. I think it should know which level group its talking about.

also: "no gaps rule" is odd description of the property. Maybe its me: the rule is that tsm files within a group must be adjacent in the master list of files. "Adjacency rule" seems more accurate. a "gap" sounds like a missing tsm file instead of one in the wrong place.

Copy link
Contributor

@philjb philjb Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the validation should tell me if the problem is in the expected results (aka I made a bad test case) or in the output of the compaction planner (the compaction planner is wrong). I know if the expected results are correct and valid and the compaction planner produces a bad plan, the test case will still fail, but more information is always helpful.

nvm -- i see that it does this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, it'd be nice if all instances of the issues were reported instead of stopping at the first adjacency rule violation.

level4Groups: Level4Groups,
level5Groups: Level5Groups,
})
require.NoError(t, validationErr, "test validation failed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let the rest of the test run; assert don't require.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good 👍

// validateNoGaps checks that there are no gaps within the same compaction level
// A gap occurs when files A and C are in the same level group, but file B (between A and C)
// is in a different level group, creating a "hole" in the sequence
func validateNoGaps(allFiles []string, groups []tsm1.CompactionGroup) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the algorithm in here is strange; why did you write it this way? I believe it works but it seems more complicated than it needs to be.

It also sorts the files that are in the input CompactionGroups... which hides errors of files being out of order in the group (yes this should get caught elsewhere too).

I don't understand why the algorithm isn't:

  1. parse the master file list filenames to get gens and seqs
  2. sort by generation and then seq within generation into a "sorted master slice"
  3. invert slice to a map[filename]->index in sorted master list
  4. for each group, for each file in the group check if file[i] has an index in the master map that is -1 from file[i+1] in the master map.

This will fail if the group's files aren't sorted or if the group's files aren't adjacent.

e.g.
sorted master files list -> index in sorted slice

0-5 -> 0
1-5 -> 1
2-4 -> 2
3-4 -> 3
4-4 -> 4
5-2 -> 5
6-1 -> 6
7-1 -> 7

Any group has to be a strict subset.

Group0: <-- PASS
1-5 -> 1
2-4 -> 2

Group1:
3-4 -> 3
4-4 -> 4
6-1 -> 6 <--- failure. index 6 is not 4+1. Adjacency failure.

Group2:
6-1 -> 6
5-2 -> 5 <--- failure. index 5 is not 6+1; group file ordering failure.

For the Group1, it'd be easy to print the files in between because you have the index 4 and index 6, so you print the file at index 5 e.g.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice. I just noticed that @davidby-influx proposed exactly the same algorithm 6 hours ago in his comment.

@devanbenz
Copy link
Author

devanbenz commented Sep 9, 2025

While rewriting the algorithm for this I stumbled upon this test case (which I believe is wrong)

		{
			// This test will mock a 'backfill' condition where we have a single
			// shard with many generations. The initial generation should be fully
			// compacted, but we have some new generations that are not. We need to ensure
			// the optimize planner will pick these up and compact everything together.
			name: "Backfill mock condition",
			files: []tsm1.ExtFileStat{
				{
					FileStat: tsm1.FileStat{
						Path: "01-05.tsm1",
						Size: 2048 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultAggressiveMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "01-06.tsm1",
						Size: 2048 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultAggressiveMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "01-07.tsm1",
						Size: 2048 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultAggressiveMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "02-04.tsm1",
						Size: 700 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultAggressiveMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "02-05.tsm1",
						Size: 500 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultAggressiveMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "02-06.tsm1",
						Size: 400 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "03-02.tsm1",
						Size: 700 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultAggressiveMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "03-03.tsm1",
						Size: 500 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultAggressiveMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "03-04.tsm1",
						Size: 400 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "04-01.tsm1",
						Size: 700 * 1024 * 1024,
					},
					FirstBlockCount: tsdb.DefaultMaxPointsPerBlock,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "04-02.tsm1",
						Size: 500 * 1024 * 1024,
					},
					FirstBlockCount: 100,
				},
				{
					FileStat: tsm1.FileStat{
						Path: "03-03.tsm1",
						Size: 400 * 1024 * 1024,
					},
					FirstBlockCount: 10,
				},
			},
			testShardTime: -1,
			expectedResult: func() TestLevelResults {
				return TestLevelResults{
					level5Groups: []tsm1.PlannedCompactionGroup{
						{
							tsm1.CompactionGroup{
								"01-05.tsm1",
								"01-06.tsm1",
								"01-07.tsm1",
								"02-04.tsm1",
								"02-05.tsm1",
								"02-06.tsm1",
								"03-02.tsm1",
								"03-03.tsm1",
								"03-04.tsm1",
								"03-03.tsm1",
								"04-01.tsm1",
								"04-02.tsm1",
							}, tsdb.DefaultAggressiveMaxPointsPerBlock},
					},
				}
			},
		},

Notice the generation 3 TSM file at the end of the test input. This would be a violation. The property test validator should account for this as well I assume?

I'm removing complexity, getting rid of the sort for input that would cause this to pass, and cleaning this up.

@philjb
Copy link
Contributor

philjb commented Sep 9, 2025

Notice the generation 3 TSM file at the end of the test input. This would be a violation. The property test validator should account for this as well I assume?

The input tsm file list has a duplicate named tsm file, namely 03-03.tsm1, in the test you reference. So yes, that's a mistake in the test setup.

For this pr; please focus on just the adjacency aspect; let's get that right and merged so we can return to PF; we can add other checks (unique list in input files, sorted files in groups, etc) in separate prs -- they are checks are properties too, which is nice, but they are less important.

func validateFileAdjacency(allFiles []string, groups []tsm1.CompactionGroup) error {
var inputFiles []fileInfo
for i, file := range allFiles {
gen, seq, err := tsm1.DefaultParseFileName(file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're not going to sort the input list, why do we need to parse the filename? We only need gen and seq for sorting the input file list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we sort the input list? For tests, we know the files will be in order, but for random input from production code, does Glob have order guarantees?


var fileMap = make(map[string]fileInfo, len(inputFiles))
for _, file := range inputFiles {
fileMap[file.fileName] = file
Copy link
Contributor

@philjb philjb Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without the sorting this can be just

for i, file := range allFiles {
    fileMap[file] = i
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, I can probably condense this down and just build the map directly now.

Copy link
Contributor

@philjb philjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple minor changes but please make them.


// validateFileAdjacency checks that there are no gaps between compaction groups
// An adjacency violation occurs when files A and C are in different groups, but file B (between A and C)
// is also in a different group, creating overlapping or non-contiguous ranges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing to me. It says if A and C are in different compaction groups and B is in a different group too that's a problem. I don't think that's a problem.

All the files within a planned compaction group must be adjacent with respect to the master list of files (aka allFiles).

Would you update this comment please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this sound?

// validateFileAdjacency checks that there are no adjacency violations between TSM files  
// The following example will highlight an adjacency violation:
// Given the following list of files [01-01.tsm, 02-02.tsm, 03-03.tsm] let's say we have the following compaction plans created
// Group 1: [01-01.tsm, 03-03.tsm] & Group 2: [02-02.tsm]
// This violates file adjacency as the first compaction group sees [01-01.tsm, X, 03-03.tsm] and the second group: [X, 02-02.tsm, X] 
// these are non-contiguous blocks, when the first group performs compaction we will have two files that are out of order compacted together. 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

You could also mention that we have this rule because the ordering of the files indicates which overwrite of a data point wins (newest write wins).

var errs []error

// Extract all filenames from test case
var allFiles []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you're going to touch this for the comment change, please preallocate this since you know how big it will be. ci pays for allocations too.

@philjb
Copy link
Contributor

philjb commented Sep 10, 2025

And there are some failing tests according to ci that need to be addressed.

Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, perhaps actionable, but worth understanding the issues

Validator: validateFileAdjacency,
}

type fileInfo struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the wrapping struct?

func validateFileAdjacency(allFiles []string, groups []tsm1.CompactionGroup) error {
var inputFiles []fileInfo
for i, file := range allFiles {
gen, seq, err := tsm1.DefaultParseFileName(file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we sort the input list? For tests, we know the files will be in order, but for random input from production code, does Glob have order guarantees?

// An adjacency violation occurs when files A and C are in different groups, but file B (between A and C)
// is also in a different group, creating overlapping or non-contiguous ranges
func validateFileAdjacency(allFiles []string, groups []tsm1.CompactionGroup) error {
var fileMap = make(map[string]fileInfo, len(allFiles))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we sort files? Tests will surely have the correct order, but if we use this in Production code, does Glob have ordering promises?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glob actually does not have ordering promises in production. When I was implementing a psuedo version of this in the PR where I implemented adjacency checks (https://github.com/influxdata/influxdb/pull/26766/files#diff-e0d6c334f09b26e18b81206ad3b242f2fd1ef09a7b23ea326a13eea3ac916a9aR2478-R2483) I did have to sort, if we were to ever pull this out and utilize it for other than tests I think that it would be beneficial to keep the sorting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests will surely have the correct order

apparently not!

as well as a better description of the issue.

Re-add sorting
@@ -3519,15 +3500,15 @@ func TestEnginePlanCompactions(t *testing.T) {
},
{
FileStat: tsm1.FileStat{
Path: "03-03.tsm1",
Copy link
Contributor

@philjb philjb Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a duplicate tsm file in this test. there are two 03-03.tsm1 files in the input list.

The question is what do we think the intent of the test is. Do we

  • delete the duplicate
  • rename it as 05-03
  • rename it as 04-03

I think the intent was that this should be the third file in the 04 generation as the 04 generation is a level 1 generation. By making this file 05-03... there's now a level 3 generation after the level 1 generation. I don't think that's the intent of this test (although it is an interesting test case).

please rename to 04-03

return testLevelResults{
level1Groups: []tsm1.PlannedCompactionGroup{
{
tsm1.CompactionGroup{"05-01.tsm", "06-01.tsm", "07-01.tsm", "08-01.tsm", "09-01.tsm", "10-01.tsm", "11-01.tsm", "12-01.tsm"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, what is the intent of the original test. Mock another planned level inside scheduler aggress blocks middle is tough for me to understand, but you've significantly changed the expectations of this test so i think you've changed its intent too much. There's no level1 plan at all anymore!

I think you need to keep the tsm files the same names but properly sort them in files: []tsm1.ExtFileStat{ test setup. That should keep the test expectations unchanged. Please make it so

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Sort the input test files, but change nothing else.

Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @philjb suggests

return testLevelResults{
level1Groups: []tsm1.PlannedCompactionGroup{
{
tsm1.CompactionGroup{"05-01.tsm", "06-01.tsm", "07-01.tsm", "08-01.tsm", "09-01.tsm", "10-01.tsm", "11-01.tsm", "12-01.tsm"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Sort the input test files, but change nothing else.

Copy link
Contributor

@philjb philjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@devanbenz devanbenz merged commit 3f1f4a0 into master-1.x Sep 10, 2025
9 checks passed
@devanbenz devanbenz deleted the db/compaction-property-based-testing branch September 10, 2025 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants