Skip to content

Dirac-Software/walstats

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WAL Statistics Analyzer

A Go utility that parses PostgreSQL WAL dump output (from pg_waldump) and stores it in a DuckDB database for analysis.

Features

  • Parses pg_waldump text output with high-performance state machine parser
  • Parallel processing with configurable worker pool
  • Stores WAL records in DuckDB format for fast analytical queries
  • High-performance bulk loading using DuckDB's Appender API
  • Batch processing for efficient database writes
  • Support for both stdin and file input
  • Configurable batch size and parallelism for performance tuning
  • Progress reporting every 5 seconds during processing

Installation

go mod download
go build -o walstats

Usage

From stdin (pipe from pg_waldump)

pg_waldump /path/to/wal/000000010000000000000001 | ./walstats -output wal.duckdb

From saved file

# First save pg_waldump output
pg_waldump /path/to/wal/000000010000000000000001 > waldump.txt

# Then process it
./walstats -input waldump.txt -output wal.duckdb

Command-line Options

  • -input: Input file containing pg_waldump output (use "-" for stdin, default: "-")
  • -output: Output DuckDB database file (default: "waldump.duckdb")
  • -batch-size: Number of records to batch before writing (default: 1000)
  • -workers: Number of parallel workers (0 = number of CPU cores, default: 0)
  • -help: Show help message

Database Schema

The utility creates a wal_records table with the following columns:

  • lsn: Log Sequence Number (VARCHAR, PRIMARY KEY)
  • prev_lsn: Previous LSN (VARCHAR)
  • xid: Transaction ID (BIGINT)
  • resource_mgr: Resource manager name (VARCHAR)
  • record_type: Type of WAL record (VARCHAR)
  • length: Record length (INTEGER)
  • total_length: Total record length (INTEGER)
  • description: Full description from pg_waldump (TEXT)
  • timestamp: Transaction timestamp if available (TIMESTAMP)
  • database: Database OID (INTEGER) - parsed from rel field (2nd component)
  • tablespace: Tablespace OID (INTEGER) - parsed from rel field (1st component)
  • relfilenode: Relation file node (INTEGER)
  • block_number: Block number within relation (INTEGER)
  • fork_name: Fork name (main, fsm, vm, etc.) (VARCHAR)
  • inserted_at: When the record was inserted into DuckDB (TIMESTAMP)

Querying the Data

Once the data is in DuckDB, you can analyze it using SQL:

-- Connect to DuckDB
duckdb wal.duckdb

-- Count records by resource manager
SELECT resource_mgr, COUNT(*) as count 
FROM wal_records 
GROUP BY resource_mgr 
ORDER BY count DESC;

-- Find largest WAL records
SELECT lsn, resource_mgr, total_length, description 
FROM wal_records 
ORDER BY total_length DESC 
LIMIT 10;

-- Analyze transaction patterns
SELECT xid, COUNT(*) as record_count, SUM(total_length) as total_size
FROM wal_records
WHERE xid > 0
GROUP BY xid
ORDER BY total_size DESC;

-- Analyze activity by relation
SELECT database, tablespace, relfilenode, 
       COUNT(*) as record_count,
       SUM(total_length) as total_size
FROM wal_records
WHERE relfilenode IS NOT NULL
GROUP BY database, tablespace, relfilenode
ORDER BY total_size DESC;

-- Find hot blocks (frequently modified)
SELECT database, tablespace, relfilenode, block_number,
       COUNT(*) as modification_count
FROM wal_records
WHERE block_number IS NOT NULL
GROUP BY database, tablespace, relfilenode, block_number
ORDER BY modification_count DESC
LIMIT 20;

Requirements

  • Go 1.21 or higher
  • PostgreSQL 10 or higher (for pg_waldump)
  • DuckDB Go driver (automatically installed via go.mod)

Performance

  • State machine parser: 3-7x faster than regex-based parsing
  • Parallel processing: Scales linearly with CPU cores for parsing
  • DuckDB Appender API: 5-10x faster than standard SQL inserts
  • Benchmarks: Can process millions of WAL records per minute on modern hardware

Performance Tuning

  • Workers: Set to number of CPU cores for CPU-bound parsing (default: auto-detect)
  • Batch size: Larger batches (5000-10000) can improve throughput for large files
  • Memory usage: Scales with workers × batch-size × average record size

Version Compatibility

The utility is compatible with pg_waldump output from PostgreSQL 10 and later versions. The core format has remained consistent:

  • PostgreSQL 10-16: Fully compatible
  • PostgreSQL 9.6 and earlier: Uses pg_xlogdump (may need minor adjustments)

Format Elements Parsed

The parser handles the standard pg_waldump format:

rmgr: <manager> len (rec/tot): <rec>/<tot>, tx: <xid>, lsn: <lsn>, prev <prev_lsn>, desc: <description>

Key fields that remain consistent across versions:

  • Resource manager (rmgr)
  • Record lengths (len)
  • Transaction ID (tx)
  • Log Sequence Numbers (lsn, prev)
  • Description with optional relation info (rel tablespace/database/relfilenode)

Version-Specific Differences

While the core format is stable, some differences exist:

  • Relation format: Always tablespace/database/relfilenode in PG 10+
  • Resource managers: New types added in newer versions (handled gracefully)
  • Description details: May vary but core structure remains same

Notes

  • Progress is reported every 5 seconds during processing
  • DuckDB writes are serialized but parsing happens in parallel
  • Unknown resource managers or description formats are stored as-is

About

A utility to conver pg_waldump output to a DuckDB database for analysis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages