An extensible benchmark for evaluating large language models on planning
Updated Jun 25, 2025 - PDDL
A Benchmark Suite for Cloud Services.
The Turing Change Point Detection Benchmark: An Extensive Benchmark Evaluation of Change Point Detection Algorithms on real-world data
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
RiVEC Benchmark Suite
Machine Learning Benchmark Scripts
A selection of ANSI C benchmarks and programs useful as benchmarks
Generate performance reports from your Django database performance tests.
Featherlight benchmark framework, drop-in replacement for criterion and gauge.
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite
A list of benchmark suites used in research related to compilers, program performance, scientific computations, etc.
Benchmark and evaluate generative research synthesis
This repository contains the code base for the Open Stream Processing Benchmark.
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms
A framework for benchmarking clustering algorithms
A JuMP-based library of Non-Linear and Mixed-Integer Non-Linear Programs
GARDENIA: Graph Analytics Repository for Designing Efficient Next-generation Accelerators