
Conversation

@droot (Member) commented Aug 14, 2025

No description provided.

@droot requested review from janetkuo and noahlwest August 14, 2025 21:58
@droot merged commit beb33b5 into GoogleCloudPlatform:main Aug 14, 2025
6 checks passed
| Model | Passed | Failed | Pass Rate |
| --- | --- | --- | --- |
| gemini-2.5-flash-preview-04-17 | 10 | 0 | 100% |
| gemini-2.5-pro-preview-03-25 | 10 | 0 | 100% |
| gemma-3-27b-it | 8 | 2 | 80% |
| AWS Bedrock Claude 3.7 Sonnet | 10 | 0 | 100% |
Member

To improve the clarity of the benchmark, could we clarify whether AWS Bedrock is just the access layer? It might be better to list the core model, 'Claude 3.7 Sonnet', so we're comparing the models directly.

Member Author

We will add a column for the LLM provider in the final benchmark report for completeness and reproducibility. I have seen differences in model behavior across providers (in the Bedrock case there are also some prompt-specific changes I'm not entirely sure of), and once we include secondary metrics such as cost and latency, it will become even more important. This will also be critical for open models, where the inference stack (llama.cpp, vLLM) affects even the accuracy of the same model.

/cc @noahlwest
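
For illustration, a minimal Go sketch of what a per-row report entry with a provider column could look like; the type, field names, and provider strings here are hypothetical and not the actual benchmark report schema:

```go
// Hypothetical sketch only: type, field names, and provider values are
// illustrative, not the actual benchmark report schema.
package main

import "fmt"

type benchmarkRow struct {
	Model    string
	Provider string // inference provider/stack, e.g. "gemini", "bedrock", "vllm"
	Passed   int
	Failed   int
}

// passRate returns the percentage of passed runs out of all runs.
func (r benchmarkRow) passRate() float64 {
	total := r.Passed + r.Failed
	if total == 0 {
		return 0
	}
	return 100 * float64(r.Passed) / float64(total)
}

func main() {
	rows := []benchmarkRow{
		{Model: "gemini-2.5-pro-preview-03-25", Provider: "gemini", Passed: 10, Failed: 0},
		{Model: "claude-3.7-sonnet", Provider: "bedrock", Passed: 10, Failed: 0},
	}
	fmt.Println("| Model | Provider | Passed | Failed | Pass Rate |")
	fmt.Println("| --- | --- | --- | --- | --- |")
	for _, r := range rows {
		fmt.Printf("| %s | %s | %d | %d | %.0f%% |\n",
			r.Model, r.Provider, r.Passed, r.Failed, r.passRate())
	}
}
```

Keeping the provider as its own column would also let the same model appear on multiple rows, one per provider or inference stack, which is what the cross-provider behavior differences mentioned above would require.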
