(internal MVP) Prototype new iteration of ImportDataSchema annotation #1416

koperagen · 2025-08-28T15:55:11Z

This PR implements the basic idea of a new workflow for imported schemas:

The KSP processor finds "import schema" declarations (now classes, not Kotlin files), matches each declaration to a reader, reads the data, and writes the schemas to a directory. It can be triggered as a regular Gradle task whenever needed.
The compiler plugin handles all codegen.

The goal for now is an MVP for internal testing.

How this approach is different:

The schema declaration is now a class annotation, not a file annotation, which gives a more convenient syntax for declarations.
Schemas are stored under version control, which is more transparent than a hidden directory of generated code.

What's new:
Approach to custom formats and generic schema preprocessing
We explore the idea of using a service loader in the KSP processor. Anyone can provide a SchemaReader implementation in another module of their project and generate schemas for arbitrary data.
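A minimal sketch of the service-loader idea, not the PR's actual code. The SchemaReader interface below is a simplified stand-in (the real read returns DataFrame<*>), and PropertiesSchemaReader, discoverReaders, findReader, the "properties" qualifier, and the value of DEFAULT_QUALIFIER are hypothetical names chosen for illustration.

```kotlin
import java.io.File
import java.util.Properties
import java.util.ServiceLoader

// Simplified stand-in for the PR's SchemaReader (assumption: the real one returns DataFrame<*>).
interface SchemaReader {
    // Same default as in the diff: only the default qualifier is accepted.
    fun accepts(path: String, qualifier: String): Boolean = qualifier == DEFAULT_QUALIFIER

    // Simplified result: column name -> type name.
    fun read(path: String): Map<String, String>

    companion object {
        const val DEFAULT_QUALIFIER = "default" // assumed value
    }
}

// Hypothetical reader living in a separate module such as ":reader".
class PropertiesSchemaReader : SchemaReader {
    override fun accepts(path: String, qualifier: String): Boolean =
        qualifier == "properties" && path.endsWith(".properties")

    override fun read(path: String): Map<String, String> {
        val props = Properties()
        File(path).inputStream().use { props.load(it) }
        // Every key of a .properties file is treated as a String column.
        return props.stringPropertyNames().associateWith { "String" }
    }
}

// Finds all implementations registered through the standard java.util.ServiceLoader mechanism.
fun discoverReaders(): List<SchemaReader> =
    ServiceLoader.load(SchemaReader::class.java).toList()

// Picks the first reader that accepts the declaration, or null if none does.
fun findReader(readers: List<SchemaReader>, path: String, qualifier: String): SchemaReader? =
    readers.firstOrNull { it.accepts(path, qualifier) }
```

With plain java.util.ServiceLoader, an implementation is registered by listing its fully qualified class name in a META-INF/services file named after the interface; whether the processor needs anything beyond that is not stated in this PR.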

The KSP plugin now has two new parameters:

dataframe.experimentalImportSchema: disables the old annotation processing for DataSchema and ImportDataSchema so that it doesn't conflict with the compiler plugin
dataframe.importedSchemasOutput: the output directory where the JSON files are generated; it serves as the input of the compiler plugin

ksp {
  arg("dataframe.experimentalImportSchema", "true") 
  arg("dataframe.importedSchemasOutput", path)
}

Example setup:

plugins {
    kotlin("jvm") version "2.3.255-SNAPSHOT"
    kotlin("plugin.dataframe") version "2.3.255-SNAPSHOT"
    id("com.google.devtools.ksp") version "2.2.0-2.0.2"
}

repositories {
    mavenLocal()
    maven("https://packages.jetbrains.team/maven/p/kt/dev/")
    mavenCentral()
}

dependencies {
    val version = "1.0.0-dev"
    implementation("org.jetbrains.kotlinx:dataframe-core:$version")
    implementation("org.jetbrains.kotlinx:dataframe-json:$version")
    implementation("org.jetbrains.kotlinx:dataframe-csv:$version")
    ksp("org.jetbrains.kotlinx.dataframe:symbol-processor-all:$version")

    // Module with custom readers
    ksp(project(":reader"))
    implementation(project(":reader"))
    testImplementation(kotlin("test"))
}

tasks.test {
    useJUnitPlatform()
}

val schemasDir = layout.projectDirectory.dir("src/schemas")!!
ksp {
    arg("dataframe.importedSchemasOutput", schemasDir.toString())
    arg("dataframe.experimentalImportSchema", "true")
    arg("dataframe.resolutionDir", layout.projectDirectory.asFile.absolutePath)
}

kotlin {
    jvmToolchain(11)
    compilerOptions.freeCompilerArgs.addAll(
        "-P", "plugin:org.jetbrains.kotlin.dataframe:schemasPath=${schemasDir.asFile}"
    )
}

Jolanrensen · 2025-08-29T11:00:20Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt

@@ -43,6 +44,9 @@ public annotation class ImportDataSchema(
    val enableExperimentalOpenApi: Boolean = false,
 )

+@Target(AnnotationTarget.CLASS)
+public annotation class DataSchemaSource(val source: String, val qualifier: String = SchemaReader.DEFAULT_QUALIFIER)


I can guess what source does, but qualifier is unclear to me. Some comments would be nice, even though it's just a proof of concept.

I vaguely remember from your demo that this allowed making distinctions of some kind, but I can't recall exactly without a small example.
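One possible reading of qualifier, sketched under assumptions rather than taken from the PR: the annotation below is a local stand-in with the same two parameters as in the diff, the value of DEFAULT_QUALIFIER is assumed, and Users, FlatUsers, the "flattened" qualifier, defaultAccepts, and qualifierOf are hypothetical. It shows two declarations that share a source but differ in qualifier, so that only one of them would match a reader using the default accepts from the diff.

```kotlin
const val DEFAULT_QUALIFIER = "default" // assumed value

// Local stand-in mirroring the signature shown in the diff.
@Target(AnnotationTarget.CLASS)
annotation class DataSchemaSource(val source: String, val qualifier: String = DEFAULT_QUALIFIER)

// Same source file, default qualifier.
@DataSchemaSource("data/users.json")
class Users

// Same source file, custom qualifier meant for a different (hypothetical) reader.
@DataSchemaSource("data/users.json", qualifier = "flattened")
class FlatUsers

// Mirrors the default accepts() from the diff: only the default qualifier matches.
fun defaultAccepts(qualifier: String): Boolean = qualifier == DEFAULT_QUALIFIER

// Reads the qualifier back from the annotation, or null if the class is not annotated.
fun qualifierOf(clazz: Class<*>): String? =
    clazz.getAnnotation(DataSchemaSource::class.java)?.qualifier
```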


Jolanrensen · 2025-08-29T11:02:16Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/guess.kt

@@ -59,6 +59,36 @@ public interface SupportedDataFrameFormat : SupportedFormat {
    public fun readDataFrame(file: File, header: List<String> = emptyList()): DataFrame<*>
 }

+/**
+ * User-facing API implemented by a companion object of an imported schema [org.jetbrains.kotlinx.dataframe.annotations.DataSchemaSource]


*the companion object

Jolanrensen · 2025-08-29T11:04:14Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/guess.kt

+
+/**
+ * Handler of classes annotated with [org.jetbrains.kotlinx.dataframe.annotations.DataSchemaSource].
+ * Implementations must have a single zero-argument constructor


they could also be object singletons maybe, since they have no state

Good idea! It first needs to be adjusted in the compiler plugin.
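A rough sketch of what supporting both forms could look like on the JVM, assuming the handler is instantiated reflectively; the PR does not show how the compiler plugin actually does this, and instantiateHandler, StatelessHandler, and ConstructedHandler are hypothetical names.

```kotlin
import java.lang.reflect.Modifier

// Returns the singleton if the class is a Kotlin object, else calls the zero-argument constructor.
fun <T : Any> instantiateHandler(clazz: Class<T>): T {
    // On the JVM, a Kotlin `object` compiles to a class with a static INSTANCE field of its own type.
    val instanceField = clazz.declaredFields.firstOrNull {
        it.name == "INSTANCE" && Modifier.isStatic(it.modifiers) && it.type == clazz
    }
    if (instanceField != null) return clazz.cast(instanceField.get(null))
    return clazz.getDeclaredConstructor().newInstance()
}

// Hypothetical stateless handler declared as a singleton.
object StatelessHandler

// Hypothetical handler that relies on the zero-argument constructor.
class ConstructedHandler
```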

Jolanrensen · 2025-08-29T11:10:46Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/guess.kt

+
+    public fun accepts(path: String, qualifier: String): Boolean = qualifier == DEFAULT_QUALIFIER
+
+    public fun read(path: String): DataFrame<*>


we do need a way to pass extra arguments in the future. There are many ways to do this but we can figure that out later :)

Jolanrensen · 2025-08-29T11:12:51Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/guess.kt

+
+    public fun read(path: String): DataFrame<*>
+
+    public fun default(path: String): DataFrame<*> = read(path)


I'd still rename this to readDefault or readSource, something more imperative.

Jolanrensen · 2025-08-29T11:18:00Z

plugins/symbol-processor/src/main/kotlin/org/jetbrains/dataframe/ksp/toJsonElement.kt

+
+/**
+ * Serializes data schema into a human-readable JSON format.
+ * Input of compiler plugin for "imported data schema" feature


please add a tiny sample :) similar to what Nikita did in serialization_format.md. It helps to see what this builds.

Jolanrensen · 2025-08-29T11:19:19Z

plugins/symbol-processor/src/main/kotlin/org/jetbrains/dataframe/ksp/toJsonElement.kt

+ * Input of compiler plugin for "imported data schema" feature
+ */
+fun DataFrameSchema.toJsonString(
+    json: Json = Json { prettyPrint = true },


could be extracted to a private const

Jolanrensen · 2025-08-29T11:22:15Z

...ol-processor/src/main/kotlin/org/jetbrains/dataframe/ksp/DataFrameSymbolProcessorProvider.kt

+        val configuration = DataFrameConfiguration(
+            resolutionDir = environment.options["dataframe.resolutionDir"],
+            importedSchemasOutput = environment.options[DATAFRAME_IMPORTED_SCHEMAS_OUTPUT],
+            experimentalImportSchema = environment.options["dataframe.experimentalImportSchema"].equals(


probably more readable like:

environment.options["dataframe.experimentalImportSchema"]
    .equals("true", ignoreCase = true)

This also satisfies KtLint :)
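For reference, a small self-contained sketch of why this form is safe on a missing option: the standard library's equals(other, ignoreCase) is defined on String?, so a null lookup simply yields false. The isEnabled helper is a hypothetical name, not part of the PR.

```kotlin
// Returns true only when the option is present and equals "true" in any letter case.
fun isEnabled(options: Map<String, String>, key: String): Boolean =
    options[key].equals("true", ignoreCase = true)
```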

AndreiKingsley · 2025-09-09T13:43:40Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt

@@ -43,6 +44,9 @@ public annotation class ImportDataSchema(
    val enableExperimentalOpenApi: Boolean = false,
 )

+@Target(AnnotationTarget.CLASS)
+public annotation class DataSchemaSource(val source: String, val qualifier: String = SchemaReader.DEFAULT_QUALIFIER)


Maybe some KDocs here as well?

koperagen requested review from zaleslaw, Jolanrensen and AndreiKingsley August 28, 2025 15:55

koperagen self-assigned this Aug 28, 2025

koperagen force-pushed the schema-readers branch from 6ab4ef8 to dee09cf Compare August 28, 2025 16:07

Prototype new iteration of ImportDataSchema annotation

9eacca5

koperagen force-pushed the schema-readers branch from dee09cf to 9eacca5 Compare August 29, 2025 08:38

Jolanrensen approved these changes Aug 29, 2025


AndreiKingsley approved these changes Sep 9, 2025

