A Model Context Protocol (MCP) server that enables Large Language Models to perform sophisticated Business Intelligence analytics on data stored in Amazon S3 using the super analytics engine.
The S3 Super MCP Server bridges the gap between natural language queries and complex data analytics. It allows AI assistants like Claude Desktop to understand business questions in plain English and automatically convert them into optimized SuperSQL queries that can analyze massive datasets stored in S3.
- Natural Language Processing: Convert business questions into SuperSQL queries automatically
- Multi-Format Support: Query JSON, CSV, Parquet, and other formats seamlessly
- Business Intelligence Templates: Pre-built analytics for revenue, customer analysis, churn prediction
- High Performance: Leverages super's optimized query engine for fast results
- Serverless Ready: Deploy to AWS Lambda for automatic scaling
- Production Grade: Comprehensive error handling, monitoring, and security
- Executive Dashboards: "Show me revenue trends by product category this quarter"
- Operational Analytics: "Find all API errors in the last 24 hours by service"
- Customer Intelligence: "Segment our users by value and identify top spenders"
- Log Analysis: "Analyze response times and identify performance bottlenecks"
- Python 3.11 or higher
- AWS credentials configured
- super binary installed
- Access to S3 buckets containing your data
macOS:
brew install brimdata/tap/super
Linux:
cd /tmp
wget https://github.com/brimdata/super/releases/latest/download/super-linux-amd64.tar.gz
tar -xzf super-linux-amd64.tar.gz
sudo mv super-linux-amd64/super /usr/local/bin/
sudo chmod +x /usr/local/bin/super
Windows: Download the binary from the super releases page and add it to your PATH.
# Clone the repository
git clone https://github.com/your-org/s3-super-mcp-server.git
cd s3-super-mcp-server
# Install dependencies
pip install -r requirements.txt
# Verify installation
python -c "import src.mcp_server; print('✅ Installation successful')"
Configure AWS credentials using one of these methods:
# Option 1: AWS CLI
aws configure
# Option 2: Environment Variables
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1
# Option 3: IAM Roles (recommended for production)
# Use IAM roles with appropriate S3 permissions
Your AWS credentials need these S3 permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket",
"s3:ListObjects",
"s3:ListObjectsV2"
],
"Resource": [
"arn:aws:s3:::your-data-bucket",
"arn:aws:s3:::your-data-bucket/*"
]
}
]
}
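To sanity-check that the configured credentials actually cover these permissions, a short boto3 script like the sketch below can help; the bucket name and prefix are placeholders for your own data locations.

```python
# Minimal sketch: verify that the configured credentials can list and read
# objects in the data bucket. Bucket name and prefix are placeholders.
import boto3

s3 = boto3.client("s3")

bucket = "your-data-bucket"   # replace with your bucket
prefix = "transactions/"      # replace with your data prefix

# Exercises s3:ListBucket
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=5)
keys = [obj["Key"] for obj in resp.get("Contents", [])]
print("Sample keys:", keys)

# Exercises s3:GetObject on the first object found
if keys:
    head = s3.head_object(Bucket=bucket, Key=keys[0])
    print("First object size:", head["ContentLength"], "bytes")
```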
Add to your Claude Desktop configuration (claude_desktop_config.json):
{
"mcpServers": {
"s3-super-mcp": {
"command": "python",
"args": ["/path/to/s3-super-mcp-server/src/mcp_server.py"],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1"
}
}
}
}
The server implements the standard MCP protocol and works with any compliant client. Run it locally:
python src/mcp_server.py
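For a quick smoke test without a full MCP client, you can drive the server over stdin/stdout directly. After the standard MCP initialize handshake, listing the available tools is a plain JSON-RPC request, for example:

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```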
Once configured with your MCP client, you can ask natural language questions:
Revenue Analysis:
"Show me the top 10 products by revenue in the last 30 days"
Customer Segmentation:
"Segment our customers by total spend and show average order values"
Log Analysis:
"Find all errors in API logs from yesterday grouped by service"
Performance Monitoring:
"What's the average response time by endpoint over the last week?"
The MCP server provides these tools for AI assistants (an example tool call follows the list):
Primary interface for natural language queries
- Converts business questions to SuperSQL
- Executes queries against S3 data
- Returns formatted results
Data discovery and schema analysis
- Analyzes data structure and types
- Provides field statistics
- Suggests query patterns
Pre-built business intelligence calculations
- Revenue analysis
- Customer lifetime value
- Churn analysis
- Conversion funnels
- Cohort analysis
Multi-source data analysis
- Join data across S3 locations
- Combine different data formats
- Unified analysis across datasets
Data validation and profiling
- Completeness assessment
- Consistency validation
- Quality scoring
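As an illustration, an MCP client invokes one of these tools with a standard tools/call request. The tool name and argument fields below are hypothetical and may not match the server's actual tool schema:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "smart_query",
    "arguments": {
      "question": "Show me the top 10 products by revenue in the last 30 days",
      "s3_path": "s3://your-bucket/transactions/*.json"
    }
  }
}
```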
The server works best with structured data in S3:
E-commerce Transactions (s3://your-bucket/transactions/*.json):
{
"transaction_id": "txn_12345",
"user_id": "user_789",
"product": "laptop",
"amount": 1299.99,
"timestamp": "2024-01-15T10:30:00Z",
"channel": "web"
}
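For a question like "show me the top 10 products by revenue in the last 30 days", the server would generate SuperSQL roughly along these lines; the exact syntax, date filter, and the way the S3 path is referenced here are illustrative:

```sql
-- Illustrative only; the generated query and path handling may differ
SELECT product, sum(amount) AS revenue, count(*) AS orders
FROM 's3://your-bucket/transactions/*.json'
WHERE timestamp >= '2024-01-01T00:00:00Z'
GROUP BY product
ORDER BY revenue DESC
LIMIT 10
```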
Application Logs (s3://your-bucket/logs/*.json):
{
"timestamp": "2024-01-15T10:30:00Z",
"level": "ERROR",
"service": "api-gateway",
"message": "Connection timeout",
"response_time_ms": 5000
}
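A log question such as "find all errors grouped by service" would map to something similar (again a sketch, not the exact query the server emits):

```sql
-- Illustrative only
SELECT service, count(*) AS error_count, avg(response_time_ms) AS avg_response_ms
FROM 's3://your-bucket/logs/*.json'
WHERE level = 'ERROR'
GROUP BY service
ORDER BY error_count DESC
```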
# Run the MCP server locally
python src/mcp_server.py
# Run tests
pytest tests/ -v
# Check code quality
black src/ tests/
ruff check src/ tests/
Deploy to AWS Lambda for production use:
# Build and deploy with SAM
sam build
sam deploy --guided
# Or use the deployment script
./scripts/deploy.sh production
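For reference, the SAM deployment is described by a template.yaml; the fragment below is only representative (the handler path, memory size, and layer arrangement are assumptions, not the repository's actual template):

```yaml
# Representative SAM resource; the project's real template.yaml will differ
Resources:
  McpServerFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src.mcp_server.lambda_handler   # hypothetical handler name
      Runtime: python3.11
      Timeout: 300
      MemorySize: 1024
      Policies:
        - S3ReadPolicy:
            BucketName: your-data-bucket       # replace with your bucket
      Environment:
        Variables:
          LOG_LEVEL: INFO
          SUPER_BINARY_PATH: /opt/bin/super    # e.g. shipped in a Lambda layer
```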
The Lambda deployment provides:
- Automatic scaling based on demand
- Built-in monitoring and logging
- Cost-effective pay-per-query pricing
- High availability across multiple Availability Zones
Configure the server with these environment variables:
| Variable | Description | Default |
|---|---|---|
| AWS_REGION | AWS region for S3 access | us-east-1 |
| LOG_LEVEL | Logging verbosity | INFO |
| SUPER_BINARY_PATH | Path to super binary | /usr/local/bin/super |
| QUERY_TIMEOUT | Query timeout in seconds | 300 |
| MAX_RESULTS | Maximum results per query | 10000 |
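For example, to run the server locally with verbose logging and a shorter timeout (values here are illustrative):

```bash
export AWS_REGION=us-east-1
export LOG_LEVEL=DEBUG
export QUERY_TIMEOUT=120
python src/mcp_server.py
```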
- Use specific S3 paths to reduce data scanning
- Apply filters early in queries to minimize processing
- Consider converting JSON to Parquet for better performance
- Partition S3 data by date/region for faster queries (an example layout follows this list)
- Use consistent field naming across datasets
- Store frequently queried data in optimized formats
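For example, a date-partitioned layout like the one below (paths are illustrative) lets queries read only the prefixes they need instead of scanning the whole bucket:

```
s3://your-bucket/transactions/year=2024/month=01/day=15/part-0001.json
s3://your-bucket/transactions/year=2024/month=01/day=16/part-0001.json
```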
- Monitor S3 data transfer costs
- Use S3 Intelligent Tiering for cost optimization
- Set appropriate query timeouts to prevent runaway costs
Super binary not found:
# Verify installation
which super
super --version
# If not found, reinstall following installation instructions
AWS permissions error:
# Test S3 access
aws s3 ls s3://your-bucket/
# Verify credentials
aws sts get-caller-identity
Query timeout:
- Reduce dataset size with more specific S3 paths
- Use sampling for exploratory queries
- Increase timeout for complex analytics
Memory issues:
- Use streaming queries with LIMIT clauses
- Process data in smaller chunks
- Consider upgrading Lambda memory allocation
Enable verbose logging for troubleshooting:
export LOG_LEVEL=DEBUG
python src/mcp_server.py
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests with coverage
pytest tests/ --cov=src --cov-report=html
# Format code
black src/ tests/
isort src/ tests/
# Lint code
ruff check src/ tests/
mypy src/
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: GitHub Wiki
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- super - The analytics engine powering this server
- Model Context Protocol - The standard protocol for AI tool integration
- Claude Desktop - AI assistant with MCP support
Built with ❤️ for the data community. Enable your AI assistant to unlock insights from your S3 data lakes.