-
Notifications
You must be signed in to change notification settings - Fork 54
feat: add support for nested loop join #164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the most outstanding PRs I have ever received in this repository. I have given some suggestions and please add some test cases under the file slt/join.slt
.
There seems to be some problem with ci, I will try to fix it |
please update the repo and try ci again |
I noticed that Tips: This physical option is used for Cost calculation, but currently only supports index selection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/execution/volcano/mod.rs
Outdated
@@ -75,7 +77,12 @@ pub fn build_read<T: Transaction>(plan: LogicalPlan, transaction: &T) -> BoxedEx | |||
let right_input = childrens.pop().unwrap(); | |||
let left_input = childrens.pop().unwrap(); | |||
|
|||
HashJoin::from((op, left_input, right_input)).execute(transaction) | |||
match &op.on { | |||
JoinCondition::On { on, .. } if !on.is_empty() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very sorry. I checked it carefully again on my computer and found that there are three slt files that need to open the comment case.
tests\slt\sql_2016\F041_08.slt
tests\slt\crdb\sqlite.slt
L33tests\slt\subquery.slt
L37
At the same time, I discovered a new bug: still using HashJoin
when or
exists in the Join condition leads to incorrect results.
case on sqlite.slt
L33
Tips: I think you can add a tag to the JoinOperator
to decide whether to use NestLoopJoin
or HashJoin
in the Bind stage.
@@ -791,7 +791,12 @@ impl<'a, T: Transaction> Binder<'a, T> { | |||
)?; | |||
} | |||
BinaryOperator::Or => { | |||
todo!("`NestLoopJoin` is not supported yet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result is correct when Or
does not exist
let _ = fnck_sql
.run("CREATE TABLE tab64784(pk INTEGER primary key, col0 INTEGER, col1 FLOAT, col2 VARCHAR, col3 INTEGER, col4 FLOAT, col5 VARCHAR)")
.await?;
let _ = fnck_sql
.run("INSERT INTO tab64784 VALUES(0,212,202.62,'nshdy',212,208.79,'wsxfc'),(1,213,203.64,'xwfuo',213,209.26,'lyswz'),(2,214,204.82,'jnued',216,210.48,'qczzf'),(3,215,205.40,'jtijf',217,211.96,'dpugl'),(4,216,206.3,'dpdzk',219,212.43,'xfirg'),(5,218,207.43,'qpwyw',220,213.50,'fmgky'),(6,219,208.3,'uooxb',221,215.30,'xpmdy'),(7,220,209.54,'ndtbb',225,218.8,'ivqyw'),(8,221,210.65,'zjpts',226,219.82,'sezsm'),(9,222,211.57,'slaxq',227,220.91,'bdqyb');")
.await?;
let (schema, tuples) = fnck_sql.run("explain SELECT pk, col0 FROM tab64784 WHERE col0 IN (SELECT col3 FROM tab64784 WHERE col3 IS NULL OR (col1 < 22.54) OR col4 > 85.74) OR ((col4 IS NULL))").await?;
println!("{}", create_table(&schema, &tuples));
let (schema, tuples) = fnck_sql.run("explain SELECT pk, col0 FROM tab64784 WHERE col0 IN (SELECT col3 FROM tab64784 WHERE col3 IS NULL OR (col1 < 22.54) OR col4 > 85.74)").await?;
println!("{}", create_table(&schema, &tuples));
+-------------------------------------------------------------------------------------------------------------------------+
| PLAN |
+=========================================================================================================================+
| Projection [tab64784.pk, tab64784.col0] [Project] |
| LeftSemi Join Where ((tab64784.col0 = (tab64784.col3) as (_temp_table_1_.col3)) || tab64784.col4 is null) [HashJoin] |
| Scan tab64784 -> [pk, col0, col3, col4] [SeqScan] |
| Projection [tab64784.col3] [Project] |
| Filter ((tab64784.col3 is null || (tab64784.col1 < 22.54)) || (tab64784.col4 > 85.74)), Is Having: false [Filter] |
| Scan tab64784 -> [col1, col3, col4] [SeqScan] |
+-------------------------------------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------------------------------------+
| PLAN |
+=========================================================================================================================+
| Projection [tab64784.pk, tab64784.col0] [Project] |
| LeftSemi Join On tab64784.col0 = (tab64784.col3) as (_temp_table_1_.col3) [HashJoin] |
| Scan tab64784 -> [pk, col0, col3] [SeqScan] |
| Projection [tab64784.col3] [Project] |
| Filter ((tab64784.col3 is null || (tab64784.col1 < 22.54)) || (tab64784.col4 > 85.74)), Is Having: false [Filter] |
| Scan tab64784 -> [col1, col3, col4] [SeqScan] |
+-------------------------------------------------------------------------------------------------------------------------+
I just wonder that do all the queries(including subqueries) use the same binder info ? I found something wrong with the filter info Binary {
op: Or,
left_expr: Binary {
op: Or,
left_expr: Binary {
op: Eq,
left_expr: Reference {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
name: "col0",
table_name: Some("tab64784"),
},
},
),
pos: 1,
},
right_expr: Reference { -- should be in right table
expr: Alias {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
name: "col3",
table_name: Some("tab64784"),
},
},
),
alias: Expr(
ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
name: "col3",
table_name: Some("_temp_table_1_"),
},
},
),
),
},
pos: 2, -- here is left table, but should be right table
},
},
right_expr: Binary {
op: And,
left_expr: IsNull {
negated: false,
expr: Reference {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
name: "col4",
table_name: Some("tab64784"),
},
},
),
pos: 3,
},
},
right_expr: Binary {
op: Lt,
left_expr: Reference {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
name: "col3",
table_name: Some("tab64784"),
},
},
),
pos: 2, -- here is left table
},
right_expr: Constant(
Int32(8),
),
},
},
},
right_expr: Binary { ... }
} |
you are right! this is only possible in
|
I re-project the subquery so that the column output by the subquery can retain the You can move the code to your PR and try to see if it solves the problem (I tested it myself and it was fine) pr: #168 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a profoundly influential PR, which fixes many problems related to Subquery and Join.
Tips: I added the test file and fixed join explain display.
Projection [a.id, a.v1, a.v2, b.id, b.v3, b.v4, b.v5] [Project]
Cross Join Nothing [NestLoopJoin]
Scan a -> [id, v1, v2] [SeqScan]
Scan b -> [id, v3, v4, v5] [SeqScan]
What problem does this PR solve?
Support nested loop join, but full join is not supported.
There are still problems in
explain
output.Issue link: #156
What is changed and how it works?
cross join:
left join:
right join
subquery:
Code changes
Check List
Tests
Side effects
Note for reviewer