@Copilot Copilot AI commented Jul 11, 2025

This PR extends the DataFrame Spring integration from CSV-only to comprehensive multi-format support, following Spring Data patterns for unified data source management.

Multi-Format Data Source Support

The implementation now supports all major DataFrame formats through dedicated annotations:

CSV Data Sources

@CsvDataSource(file = "sales.csv", delimiter = ',', header = true)
lateinit var salesData: DataFrame<*>

@CsvDataSource(file = "products.tsv", delimiter = '\t')
lateinit var productData: DataFrame<*>

JSON Data Sources

@JsonDataSource(file = "users.json")
lateinit var userData: DataFrame<*>

@JsonDataSource(
    file = "complex.json",
    typeClashTactic = JSON.TypeClashTactic.ANY_COLUMNS,
    keyValuePaths = ["user.preferences", "config.settings"]
)
lateinit var complexData: DataFrame<*>

Arrow/Parquet Data Sources

@ArrowDataSource(file = "analytics.parquet")
lateinit var analyticsData: DataFrame<*>

@ArrowDataSource(file = "timeseries.arrow", format = ArrowFormat.IPC)
lateinit var timeseriesData: DataFrame<*>

JDBC Data Sources

@JdbcDataSource(
    connectionBean = "dataSource",
    tableName = "customers"
)
lateinit var customerData: DataFrame<*>

@JdbcDataSource(
    url = "jdbc:h2:mem:testdb",
    username = "sa", 
    password = "",
    query = "SELECT * FROM orders WHERE status = 'COMPLETED'"
)
lateinit var orders: DataFrame<*>

Spring Data-Inspired Architecture

The design follows established Spring Data patterns:

  • Declarative Annotations: Similar to @Query in Spring Data JPA
  • Strategy Pattern: Format-specific processors handle each data source type (see the sketch after this list)
  • Bean Integration: Leverages existing Spring infrastructure for connections
  • Property Placeholders: Support for externalized configuration via ${...}
  • Type Safety: Compile-time validation of format-specific parameters
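
A minimal sketch of that strategy pattern, assuming hypothetical names (DataSourceProcessor, CsvDataSourceProcessor) rather than the exact types introduced by this PR:

// Illustrative sketch only: DataSourceProcessor and CsvDataSourceProcessor are
// assumed names for the strategy interface and one of its implementations.
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readCSV
import org.springframework.context.ApplicationContext

interface DataSourceProcessor {
    // True if this processor handles the given data source annotation
    fun supports(annotation: Annotation): Boolean

    // Reads a DataFrame according to the annotation's parameters
    fun read(annotation: Annotation, context: ApplicationContext): DataFrame<*>
}

class CsvDataSourceProcessor : DataSourceProcessor {
    override fun supports(annotation: Annotation) = annotation is CsvDataSource

    override fun read(annotation: Annotation, context: ApplicationContext): DataFrame<*> {
        val csv = annotation as CsvDataSource
        return DataFrame.readCSV(csv.file, delimiter = csv.delimiter)
    }
}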

Advanced Parameter Management

Each annotation provides type-safe parameters specific to its format (a combined example follows the list):

  • CSV: Custom delimiters, header configuration
  • JSON: Type clash tactics, key-value path processing, number unification
  • Arrow: Format detection, nullability options
  • JDBC: Connection beans, custom queries, result limits
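
For example, format-specific parameters can be combined with Spring's ${...} placeholders. The property keys below are illustrative and assumed to be defined in application.properties or application.yml:

// Illustrative: property keys such as reports.sales.path are assumptions,
// resolved via Spring's ${...} placeholder mechanism described above.
@CsvDataSource(file = "\${reports.sales.path}", delimiter = ';')
lateinit var salesReport: DataFrame<*>

@JdbcDataSource(
    url = "\${metrics.datasource.url}",
    username = "\${metrics.datasource.username}",
    password = "\${metrics.datasource.password}",
    query = "SELECT * FROM metrics WHERE day = CURRENT_DATE"
)
lateinit var dailyMetrics: DataFrame<*>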

Real-World Usage

@Component
class AnalyticsService {
    
    @CsvDataSource(file = "exports/customers.csv")
    lateinit var customers: DataFrame<*>
    
    @JsonDataSource(file = "logs/events.json")
    lateinit var events: DataFrame<*>
    
    @ArrowDataSource(file = "ml/features.parquet")
    lateinit var features: DataFrame<*>
    
    @JdbcDataSource(
        connectionBean = "metricsDataSource",
        query = "SELECT * FROM metrics WHERE timestamp >= NOW() - INTERVAL '1 hour'"
    )
    lateinit var realtimeMetrics: DataFrame<*>
    
    fun generateReport() {
        // All data sources automatically loaded and ready to use
    }
}
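
To enable the field injection above, the post-processor must be registered as a Spring bean. A minimal sketch, assuming the DataFramePostProcessor mentioned in the commits is a regular BeanPostProcessor (exact class and package names may differ):

// Illustrative configuration; class names are taken from the commit messages
// but the exact registration mechanism may differ in the actual module.
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class DataFrameConfiguration {

    // Registers the post-processor that scans beans for @CsvDataSource,
    // @JsonDataSource, @ArrowDataSource and @JdbcDataSource fields
    @Bean
    fun dataFramePostProcessor(): DataFramePostProcessor = DataFramePostProcessor()
}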

Backward Compatibility

The original @DataSource annotation remains supported but is deprecated in favor of the more explicit @CsvDataSource.
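
Migration is a one-line change per field; a sketch, assuming both annotations expose the same file parameter:

// Before (deprecated legacy annotation; parameter name assumed)
@DataSource(file = "sales.csv")
lateinit var salesData: DataFrame<*>

// After (explicit, format-specific annotation)
@CsvDataSource(file = "sales.csv")
lateinit var salesData: DataFrame<*>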

Benefits

  1. Unified Data Access: Consistent annotation-based approach across all formats
  2. Reduced Boilerplate: No manual DataFrame construction required
  3. Spring Integration: Leverages existing Spring beans and configuration
  4. Developer Experience: Follows familiar Spring Data patterns
  5. Type Safety: Format-specific parameters with compile-time validation

The implementation transforms the DataFrame Spring integration from a CSV-specific utility into a comprehensive, annotation-driven data access layer modeled on Spring Data, while maintaining the power and flexibility of the DataFrame API.

Fixes #1321.



@Copilot Copilot AI changed the title from "[WIP] Explore the possibility to define schema and construct objects in Spring (DI framework) style" to "Implement Spring DI-style DataFrame initialization with @DataSource annotation" on Jul 11, 2025
Copilot finished work on behalf of zaleslaw July 11, 2025 13:03
@Copilot Copilot AI requested a review from zaleslaw July 11, 2025 13:03
@zaleslaw (Collaborator) commented:

@copilot extend this module to support different file formats (CSV, JSON, Parquet, Arrow) and try to find an optimal solution for the large number of parameters each file type has; explore the Spring Data approach to working with different data sources. We also support JDBC, could that be covered as well?


Copilot AI commented Aug 22, 2025

@copilot extend this module to support different file formats (CSV, JSON, Parquet, Arrow) and try to find an optimal solution for the large number of parameters each file type has; explore the Spring Data...

I've implemented comprehensive multi-format support for the DataFrame Spring integration, extending it far beyond the original CSV-only implementation. The new design follows Spring Data patterns and includes:

🎯 Multi-Format Support:

  • @CsvDataSource - CSV/TSV files with custom delimiters
  • @JsonDataSource - JSON files with type clash handling
  • @ArrowDataSource - Arrow/Parquet/Feather files with format auto-detection
  • @JdbcDataSource - Database tables/queries with connection pooling

🏗️ Spring Data-Inspired Architecture:

  • Declarative annotations similar to Spring Data repositories
  • Strategy pattern for handling different data source types
  • Bean reference support for database connections
  • Property placeholder resolution for externalized config
  • Comprehensive error handling with meaningful messages

⚙️ Advanced Parameter Management:

  • Type-safe parameters specific to each format
  • Flexible configuration supporting complex scenarios
  • Sensible defaults minimizing boilerplate
  • Support for Spring's ${...} property placeholders

📊 JDBC Integration:
Uses existing dataframe-jdbc module capabilities:

@JdbcDataSource(
    connectionBean = "dataSource", 
    query = "SELECT * FROM orders WHERE status = 'COMPLETED'",
    limit = 1000
)
lateinit var orders: DataFrame<*>
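
Internally, the annotation could be mapped onto the readSqlQuery/readSqlTable methods referenced in the commit messages. A rough sketch, with bean lookup and parameter handling simplified and the helper name assumed:

// Illustrative only: shows one way @JdbcDataSource parameters could be passed
// to the dataframe-jdbc read methods; not the exact implementation in this PR.
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import org.springframework.context.ApplicationContext
import javax.sql.DataSource

fun readJdbc(ann: JdbcDataSource, context: ApplicationContext): DataFrame<*> {
    // Resolve the connection from the referenced Spring bean
    val dataSource = context.getBean(ann.connectionBean, DataSource::class.java)
    dataSource.connection.use { connection ->
        return if (ann.query.isNotEmpty()) {
            DataFrame.readSqlQuery(connection, ann.query, limit = ann.limit)
        } else {
            DataFrame.readSqlTable(connection, ann.tableName, limit = ann.limit)
        }
    }
}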

The implementation provides exactly what you requested - a unified approach handling multiple formats with optimal parameter management, closely following Spring Data patterns while maintaining the power and flexibility of the DataFrame API.

Commit: 66b692e

@Copilot Copilot AI changed the title from "Implement Spring DI-style DataFrame initialization with @DataSource annotation" to "Implement comprehensive multi-format DataFrame Spring integration with Spring Data patterns" on Aug 22, 2025
Copilot finished work on behalf of zaleslaw August 22, 2025 10:28
…ame reading methods

This commit includes a refactor to use more specific annotations (`CsvDataSource` and others), upgrades DataFrame Spring to Java 17, adjusts build scripts to handle Java 17 modules, and replaces deprecated `readJdbc` with new `readSqlQuery` and `readSqlTable` methods in DataFrame processing.
…ocessing logic

This commit introduces a new detailed Spring-style integration example (`SpringIntegrationExample2.kt`), showcasing advanced usage patterns and GitHub issue resolution (#1321). Updates also include improvements in DataFrame field injection logic to handle enhanced annotation processing, robust property checks, and better fallback mechanisms for ApplicationContext. Additionally, minor tweaks enable broader compatibility and extensibility within the Spring ecosystem.
…and added new Spring integration demos

This commit deprecates the legacy `@DataSource` annotation in favor of the more specific `@CsvDataSource`. It removes outdated example files and introduces new detailed Spring integration examples demonstrating annotation-based DataFrame initialization, including `CsvDataSource_with_Application_Context` and `CsvDataSource_with_Configuration`. Adjustments also include sample data reorganization and updates to tests for compatibility.
…ration

Introduce a comprehensive Spring Boot example (`springboot-dataframe-web`) showcasing annotated CSV-based data source initialization, web controllers, Thymeleaf templates, and sample data files. The example includes customer and sales reports with sorting and filtering functionalities, leveraging DataFrame operations and Spring Boot features.
… configuration, and sample data

Added Spring Boot Actuator dependency to `springboot-dataframe-web`, introduced `DataFrameConfiguration` for better DataFrame post-processing, and updated CSV data sources for customers and sales. Adjusted annotations, enhanced lifecycle handling in `DataFramePostProcessor`, and added visual documentation and sample data files. Updated build scripts for Java 17 compatibility.
# Conflicts:
#	settings.gradle.kts