Join docs #1437

[//]: # (title: join)

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.multiple.JoinSamples-->

Joins two [`DataFrame`](DataFrame.md) objects by join columns.

```kotlin
join(otherDf, type = JoinType.Inner) [ { joinColumns } ]

joinColumns: JoinDsl.(LeftDataFrame) -> Columns

interface JoinDsl: LeftDataFrame {

    val right: RightDataFrame

    infix fun DataColumn.match(rightColumn: DataColumn)
}
```

`joinColumns` is a [column selector](ColumnSelectors.md) that defines the column mapping used for the join.
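
To make the `JoinDsl` shape above more concrete, here is a plain-Kotlin sketch (no DataFrame dependency) of how an `infix` `match` function declared inside a receiver scope lets a selector such as `firstName match right.name` describe a column mapping. Every name in it (`LeftColumn`, `RightColumn`, `ColumnMatch`, `RightColumns`, `JoinDslSketch`, `resolveJoinColumns`) is a hypothetical stand-in, not a type from the library.

```kotlin
// Hypothetical column references; the real library uses its own column types.
class LeftColumn(val name: String)
class RightColumn(val name: String)

// The pair of columns whose values must be equal for two rows to be merged.
data class ColumnMatch(val left: String, val right: String)

// Stand-in for the right DataFrame's columns, reached through `right` in the DSL.
class RightColumns {
    val name = RightColumn("name")
}

// Stand-in for JoinDsl: left columns are visible directly (as with `JoinDsl: LeftDataFrame`),
// while right columns are reached through `right`.
class JoinDslSketch {
    val firstName = LeftColumn("firstName")
    val right = RightColumns()

    // `infix` is what allows the `leftColumn match rightColumn` call syntax.
    infix fun LeftColumn.match(other: RightColumn): ColumnMatch = ColumnMatch(this.name, other.name)
}

// Evaluates a join-column selector with JoinDslSketch as its receiver.
fun resolveJoinColumns(selector: JoinDslSketch.() -> ColumnMatch): ColumnMatch =
    JoinDslSketch().selector()
```

With this sketch, `resolveJoinColumns { firstName match right.name }` returns a value equal to `ColumnMatch("firstName", "name")`. The receiver lambda is the design point: it makes both the left columns and `right` available inside the braces without qualification.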

Related operations: [](multipleDataFrames.md)

## Examples

<!---FUN notebook_test_join_3-->

```kotlin
dfAges
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_3.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_5-->

```kotlin
dfCities
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_5.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_6-->

```kotlin
// INNER JOIN on differently named keys:
// Merge a row when dfAges.firstName == dfCities.name.
// With the given data all 3 names match → all rows merge.
dfAges.join(dfCities) { firstName match right.name }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_6.html" width="100%" height="500px"></inline-frame>
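
The `dfAges.firstName` to `dfCities.name` mapping in the example above can be modeled outside the library as a hash join over rows stored as maps. This stdlib-only sketch is an illustration under assumptions, not how `join` is implemented: `Row` and `innerJoinOn` are hypothetical names, the merged row keeps only the left key column, and a `null` key matches a `null` key here.

```kotlin
// A row is modeled as column name -> value (a simplification for this sketch).
typealias Row = Map<String, Any?>

// Inner join where the key column is named differently on each side,
// e.g. leftKey = "firstName", rightKey = "name".
fun innerJoinOn(left: List<Row>, right: List<Row>, leftKey: String, rightKey: String): List<Row> {
    // Index the right rows by key value so each lookup is a single map access.
    // Note: in this sketch a null key matches a null key.
    val rightByKey: Map<Any?, List<Row>> = right.groupBy { it[rightKey] }
    // Emit one merged row for every (left row, matching right row) pair.
    return left.flatMap { l ->
        rightByKey[l[leftKey]].orEmpty().map { r ->
            // Keep the left key column only; the remaining right columns are appended
            // (and would overwrite a left column that has the same name).
            l + (r - rightKey)
        }
    }
}
```

Building the index on one side and probing it with the other keeps the cost roughly linear in the number of rows plus the size of the output, instead of comparing every left row with every right row.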

If the mapped columns have the same name, it is enough to select the join columns from the left [`DataFrame`](DataFrame.md):

<!---FUN notebook_test_join_8-->

```kotlin
dfLeft
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_8.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_10-->

```kotlin
dfRight
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_10.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_11-->

```kotlin
// INNER JOIN on "name" only:
// Merge when left.name == right.name.
// Duplicate keys produce multiple merged rows (one per pairing).
dfLeft.join(dfRight) { name }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_11.html" width="100%" height="500px"></inline-frame>
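
Because duplicate keys produce one merged row per pairing, the size of an inner join on a single key is the sum, over all key values, of (left rows with that key) × (right rows with that key). The hypothetical helper `innerJoinRowCount` below estimates that count from the key columns alone, without building the joined rows; it is a stdlib sketch, not a library function.

```kotlin
// Predicts the row count of an inner join on a single key without building the result.
// Each left row contributes one merged row per right row that has the same key,
// so the total equals the sum over keys of (left count) * (right count).
fun <K> innerJoinRowCount(leftKeys: List<K>, rightKeys: List<K>): Int {
    // How many right rows carry each key value.
    val rightCounts: Map<K, Int> = rightKeys.groupingBy { it }.eachCount()
    return leftKeys.map { rightCounts[it] ?: 0 }.sum()
}
```

For instance, two left rows and two right rows that share the key `"Charlie"` yield 2 × 2 = 4 merged rows.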

If `joinColumns` is not specified, columns with the same name from both [`DataFrame`](DataFrame.md)
objects will be used as join columns:

<!---FUN notebook_test_join_12-->

```kotlin
// INNER JOIN on all same-named columns ("name" and "city"):
// Merge when BOTH name AND city are equal; otherwise the row is dropped.
dfLeft.join(dfRight)
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_12.html" width="100%" height="500px"></inline-frame>
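
The default described above amounts to joining on the intersection of both column sets, where two rows merge only if all shared columns are equal. A minimal stdlib sketch, assuming rows are `Map<String, Any?>` and that column names are passed in explicitly; `sharedColumns` and `naturalInnerJoin` are hypothetical names.

```kotlin
// A row is modeled as column name -> value (a simplification for this sketch).
typealias Row = Map<String, Any?>

// With no explicit join columns, join on every column name that both sides share.
fun sharedColumns(leftColumns: List<String>, rightColumns: List<String>): List<String> =
    leftColumns.filter { it in rightColumns }

// Inner join on all shared columns: two rows merge only if EVERY shared column is equal.
fun naturalInnerJoin(
    left: List<Row>, leftColumns: List<String>,
    right: List<Row>, rightColumns: List<String>,
): List<Row> {
    val keys = sharedColumns(leftColumns, rightColumns)
    // A composite key is the list of values in the shared columns; lists compare by content.
    val rightByKey: Map<List<Any?>, List<Row>> = right.groupBy { r -> keys.map { r[it] } }
    return left.flatMap { l ->
        // Shared columns hold equal values on both sides, so `l + r` loses no information.
        rightByKey[keys.map { l[it] }].orEmpty().map { r -> l + r }
    }
}
```

Using a list of values as the composite key relies on `List` equality being structural, which is what makes "all shared columns equal" a single map lookup.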

## Join types

Supported join types:
* `Inner` (default): only matched rows from both the left and right [`DataFrame`](DataFrame.md) objects
* `Filter`: only matched rows from the left [`DataFrame`](DataFrame.md); no columns from the right one are added
* `Left`: all rows from the left [`DataFrame`](DataFrame.md); where there is no match, columns from the right [`DataFrame`](DataFrame.md) are filled with `null`
* `Right`: all rows from the right [`DataFrame`](DataFrame.md); where there is no match, columns from the left [`DataFrame`](DataFrame.md) are filled with `null`
* `Full`: all rows from both the left and right [`DataFrame`](DataFrame.md) objects; any mismatches are filled with `null`
* `Exclude`: only rows from the left [`DataFrame`](DataFrame.md) that have no match in the right one
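
The join types listed above differ only in which unmatched rows they keep and whether right-side columns are attached. The stdlib sketch below models all six over rows stored as maps, joining on a single key column. `JoinKind` merely mirrors the names in the list and is not the library's `JoinType`; keeping each left row at most once for `Filter`/`Exclude`, and the row order of the result, are assumptions of this sketch.

```kotlin
// A row is modeled as column name -> value (a simplification for this sketch).
typealias Row = Map<String, Any?>

// Mirrors the join type names listed above; not the library's JoinType.
enum class JoinKind { Inner, Filter, Left, Right, Full, Exclude }

// Joins on one key column present on both sides. The column lists tell the sketch
// which columns to fill with null when a row has no partner.
fun joinSketch(
    left: List<Row>, leftColumns: List<String>,
    right: List<Row>, rightColumns: List<String>,
    key: String, kind: JoinKind,
): List<Row> {
    val rightByKey = right.groupBy { it[key] }
    val leftKeys = left.map { it[key] }.toSet()
    // Null placeholders for the missing side (the key itself comes from the row that exists).
    val nullLeft: Row = (leftColumns - key).associateWith { null }
    val nullRight: Row = (rightColumns - key).associateWith { null }

    val fromLeft = left.flatMap { l ->
        val matches = rightByKey[l[key]].orEmpty()
        when (kind) {
            // Semi-join: the left row alone, kept once if it has any match.
            JoinKind.Filter -> if (matches.isNotEmpty()) listOf(l) else emptyList<Row>()
            // Anti-join: the left row alone, kept only if it has no match.
            JoinKind.Exclude -> if (matches.isEmpty()) listOf(l) else emptyList<Row>()
            // Unmatched left rows survive with right columns set to null.
            JoinKind.Left, JoinKind.Full ->
                if (matches.isEmpty()) listOf(l + nullRight) else matches.map { l + it }
            // Matched pairs only; unmatched right rows are added below for Right.
            JoinKind.Inner, JoinKind.Right -> matches.map { l + it }
        }
    }
    // Right rows that no left row matched, with left columns set to null.
    val unmatchedRight =
        if (kind == JoinKind.Right || kind == JoinKind.Full) {
            right.filter { it[key] !in leftKeys }.map { nullLeft + it }
        } else {
            emptyList<Row>()
        }
    return fromLeft + unmatchedRight
}
```

Structuring the sketch as one pass over the left rows plus one optional pass over unmatched right rows shows that `Right` and `Full` are the only types that need to look at right rows which never matched.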

For every join type there is a shortcut operation:

```kotlin
df.innerJoin(otherDf) [ { joinColumns } ]
df.filterJoin(otherDf) [ { joinColumns } ]
df.leftJoin(otherDf) [ { joinColumns } ]
df.rightJoin(otherDf) [ { joinColumns } ]
df.fullJoin(otherDf) [ { joinColumns } ]
df.excludeJoin(otherDf) [ { joinColumns } ]
```

### Examples {id="examples_1"}

<!---FUN notebook_test_join_13-->

```kotlin
dfLeft
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_13.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_14-->

```kotlin
dfRight
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_14.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_15-->

```kotlin
// INNER JOIN:
// Keep only rows where (name, city) match on both sides.
// In this dataset, both Charlie rows (Moscow and Milan) have a match → 2 merged rows.
dfLeft.innerJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_15.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_16-->

```kotlin
// FILTER JOIN:
// Keep ONLY left rows that have ANY match on (name, city).
// No right-side columns are added.
dfLeft.filterJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_16.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_17-->

```kotlin
// LEFT JOIN:
// Keep ALL left rows. If (name, city) matches, attach right columns;
// if not, right columns are null (e.g., Alice–London has no right match).
dfLeft.leftJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_17.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_18-->

```kotlin
// RIGHT JOIN:
// Keep ALL right rows. If no left match, left columns become null
// (e.g., Alice with city=null exists only on the right).
dfLeft.rightJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_18.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_19-->

```kotlin
// FULL JOIN:
// Keep ALL rows from both sides. Where there's no match on (name, city),
// the other side is filled with nulls.
dfLeft.fullJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_19.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_20-->

```kotlin
// EXCLUDE JOIN:
// Keep ONLY left rows that have NO match on (name, city).
// Useful to find "unpaired" left rows.
dfLeft.excludeJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_20.html" width="100%" height="500px"></inline-frame>