[CI] Speed up model unit tests in CI #24253

afeldman-nm · 2025-09-04T14:39:07Z

Purpose

Addresses #23670 (not sure as of yet if there will be additional PRs to fix this issue)

This PR speeds up the regression tests which run for each PR (as opposed to for example the nightly tests, which are not impacted by this PR)

No impact on buildkite/fastcheck/pr unit tests

Impact on buildkite/ci/pr unit tests: lower the typical execution time of the following regression test categories in CI-

Basic Models
- The existing Basic Models test category comprises the following sub-categories: model initialization, transformers, registry, utils, vision
- Separate out the slowest model initialization unit tests since they consume most of the Basic Models test time; create an "extra" model initialization unit tests category which only runs when model source is modified, and shard these extra initialization tests 4x to exploit parallelism at the level of CI scheduling. In a "typical" scenario, these "extra" model initialization tests will not run
- The remaining (ostensibly faster) model initialization tests will stay within the "Basic Models" test category, which is also now sharded 4x in order to bring test runtimes as close to 10min as possible
- Note that when vLLM model source code is modified, both the "Basic Models" and "Extra Model Initialization" tests will run; running both of these test categories is equivalent to running all of the "Basic Models" tests currently implemented on main
- Basic Models test run time on main: 1h2m
- Basic Models test run time in this PR (typical scenario): ~10m (4 shards with about 5-10m runtime each)
- Extra Model Initialization test run time in this PR (scenario where model source is modified): ~15m (4 shards with about 15m runtime each)
Language Models [standard] - these tests are grouped together by the core_model pytest mark
- The existing Language Models test category comprises a number of sub-categories of tests including (but not limited to) "common", "embedding", and "classification", which appear to have some particularly slow test-cases
- In order to group these slow test-cases together, create a slow_test pytest mark
- Mark the most expensive subset of core_model tests as slow_test
- Create "extra" standard language models test category for slow standard language models tests (core_model and slow_test pytest marks), which only runs in the special case where model source is modified. Shard these extra tests 4x to lower test time
- In the typical case, only the remaining "Language Models [standard]" will run. Sharding does not appear to be needed to reduce test time
- Language Models [standard] test run time on main: 42m
- Language Models [standard] test run time in this PR (typical scenario): 10m
- Language Models [standard][extra] test run time in this PR (scenario where model source is modified): ~20m (4 shards with 5-20m runtime each)
Language models [hybrid]
- The hybrid language models tests are too precisely-specified; it is not possible to mark some as lower-priority & only run in special cases
- Thus hybrid language model tests are sharded 4x to exploit parallelism at the CI level
- Language Models [hybrid] test run time on main: 35m
- Language Models [hybrid] test run time in this PR: 20m (4 shards with 10-20m runtime each)

Trigger conditions for unit tests

vllm/ modification -> Basic Models Tests, Language Models [standard] Tests, Language Models [hybrid] Tests

Any vLLM model source code modification -> Extra Model Initialization Tests, Language Modes [standard][extra] (in addition to all of the tests which run when vllm/ is modified)

Test Plan

Used an instrumented PR to measure runtimes and see that tests pass
Once full buildkite/ci/pr tests are enabled on this PR with /ready, we will see that the new unit tests are triggered as expected

Test Result

For test runtimes, see summary about and details here https://buildkite.com/vllm/fastcheck/builds/41168#01993638-97b7-47b5-bdaf-2c0f70e0c16c

TODO: validate that tests are triggered as expected

Note: there appear to be two regression tests failing, which I am pretty sure are not caused by this PR:

models/test_initialization.py::test_can_initialize_other[LlamaForCausalLMEagle3]
entrypoints/openai/test_completion_with_prompt_embeds.py::test_completions_with_logprobs_and_prompt_embeds[-HuggingFaceH4/zephyr-7b-beta-1]

Signed-off-by: Andrew Feldman <[email protected]>

mergify · 2025-09-08T13:57:46Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @afeldman-nm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Andrew Feldman <[email protected]>

DarkLight1337

Is the intention to still run these tests on each PR (unlike the Extended tests which are nightly only) but in parallel with the faster model tests?

afeldman-nm · 2025-09-08T20:21:29Z

Is the intention to still run these tests on each PR (unlike the Extended tests which are nightly only) but in parallel with the faster model tests?

Yes, that is correct @DarkLight1337

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm · 2025-09-09T14:50:19Z

Is the intention to still run these tests on each PR (unlike the Extended tests which are nightly only) but in parallel with the faster model tests?

More specifically @DarkLight1337 the goal is to speed up the regressions that run for each PR in the typical case, by (1) recategorizing certain slow model tests so that they only run in under specific atypical conditions (i.e. modifications to model code), and (2) sharding model tests to exploit parallelism at the granularity of CI test scheduling

Also realize I forgot to mention, the PR is addressing this issue

#23670

Signed-off-by: Andrew Feldman <[email protected]>

tests/models/test_initialization.py

mgoin · 2025-09-11T13:53:01Z

I worry about aggressively sharding fast tests due to the fixed cost we pay to start an instance. Last time I checked it was 3-5min init. Has that time gone down?

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm · 2025-09-11T16:18:32Z

I worry about aggressively sharding fast tests due to the fixed cost we pay to start an instance. Last time I checked it was 3-5min init. Has that time gone down?

Per our DMs, I will limit the number of parallelized tests and lower the parallelism to 2x. At least until init time is reduced.

Signed-off-by: Andrew Feldman <[email protected]>

njhill

Thanks @afeldman-nm

afeldman-nm · 2025-09-12T16:27:03Z

Note that the blackwell-test appears to be failing on main: https://buildkite.com/vllm/ci/builds/30438/steps/canvas?jid=01993af3-6a12-4fcf-ab0c-a10841a79959

PR to fix the issue is WIP, see #24750

Signed-off-by: Andrew Feldman <[email protected]>

Signed-off-by: Andrew Feldman <[email protected]> Signed-off-by: bbartels <[email protected]>

Signed-off-by: Andrew Feldman <[email protected]> Signed-off-by: bruceszchen <[email protected]>

afeldman-nm added 2 commits September 4, 2025 10:36

test-pipeline

f65ff24

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into speed_model_ci

d5fa076

mergify bot added the ci/build label Sep 4, 2025

afeldman-nm added 2 commits September 4, 2025 12:20

basic tests

35a331b

Signed-off-by: Andrew Feldman <[email protected]>

common and embedding markers

d630671

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm mentioned this pull request Sep 4, 2025

[CI]: Speed up Models Tests #23670

Open

afeldman-nm changed the title ~~[WIP] Speed up model unit tests in CI~~ [WIP] [CI] Speed up model unit tests in CI Sep 4, 2025

afeldman-nm changed the title ~~[WIP] [CI] Speed up model unit tests in CI~~ [CI] Speed up model unit tests in CI Sep 4, 2025

afeldman-nm added 5 commits September 4, 2025 12:54

parallelize basic tests

7d31d30

Signed-off-by: Andrew Feldman <[email protected]>

adjust

6c8c6e9

Signed-off-by: Andrew Feldman <[email protected]>

wip

3709f82

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into speed_model_ci

0a12b97

revise

183f97d

Signed-off-by: Andrew Feldman <[email protected]>

burn changes

bba582f

Signed-off-by: Andrew Feldman <[email protected]>

mergify bot added the needs-rebase label Sep 8, 2025

Merge branch 'main' into speed_model_ci

b2a3d02

mergify bot removed the needs-rebase label Sep 8, 2025

afeldman-nm added 3 commits September 8, 2025 10:34

Break out different initialization tests

36a4875

Signed-off-by: Andrew Feldman <[email protected]>

shard hybrid models

e32e371

Signed-off-by: Andrew Feldman <[email protected]>

Standard language models test

1d883af

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm marked this pull request as ready for review September 8, 2025 16:13

afeldman-nm requested review from DarkLight1337 and ywang96 as code owners September 8, 2025 16:13

DarkLight1337 reviewed Sep 8, 2025

View reviewed changes

small fix

774b9e4

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm added 2 commits September 9, 2025 12:17

Merge branch 'main' into speed_model_ci

4941627

wip

1fd0c68

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm added 4 commits September 10, 2025 18:06

more sharding

ee2b0dc

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into speed_model_ci

4a26031

isolate test; lower parallelism

c288ab3

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into speed_model_ci

6ee7661

DarkLight1337 reviewed Sep 11, 2025

View reviewed changes

tests/models/test_initialization.py Outdated Show resolved Hide resolved

DarkLight1337 reviewed Sep 11, 2025

View reviewed changes

tests/models/test_initialization.py Show resolved Hide resolved

afeldman-nm added 5 commits September 11, 2025 11:37

Merge branch 'main' into speed_model_ci

ff660fc

lower shard factor and reorganize

e66b7e6

Signed-off-by: Andrew Feldman <[email protected]>

explanatory comments

28239d4

Signed-off-by: Andrew Feldman <[email protected]>

wip

009458f

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into speed_model_ci

63015ab

test names

aef7215

Signed-off-by: Andrew Feldman <[email protected]>

njhill approved these changes Sep 11, 2025

View reviewed changes

njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 11, 2025

afeldman-nm added 3 commits September 11, 2025 22:02

Merge branch 'main' into speed_model_ci

09e84a4

Merge branch 'main' into speed_model_ci

bb15907

Merge branch 'main' into speed_model_ci

a806634

njhill enabled auto-merge (squash) September 12, 2025 15:10

simon-mo merged commit c8c4259 into vllm-project:main Sep 12, 2025
79 of 81 checks passed

afeldman-nm deleted the speed_model_ci branch September 12, 2025 17:44

afeldman-nm mentioned this pull request Sep 12, 2025

[CI] Move applicable tests to CPU #24080

Open

5 tasks

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025

[CI] Speed up model unit tests in CI (vllm-project#24253)

8287efe

Signed-off-by: Andrew Feldman <[email protected]>

BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Sep 14, 2025

[CI] Speed up model unit tests in CI (vllm-project#24253)

eeaceb0

Signed-off-by: Andrew Feldman <[email protected]>

dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025

[CI] Speed up model unit tests in CI (vllm-project#24253)

bb04cab

Signed-off-by: Andrew Feldman <[email protected]>

bbartels pushed a commit to bbartels/vllm that referenced this pull request Sep 15, 2025

[CI] Speed up model unit tests in CI (vllm-project#24253)

22aa409

Signed-off-by: Andrew Feldman <[email protected]> Signed-off-by: bbartels <[email protected]>

cboss6 pushed a commit to cboss6/vllm that referenced this pull request Sep 16, 2025

[CI] Speed up model unit tests in CI (vllm-project#24253)

212ab74

Signed-off-by: Andrew Feldman <[email protected]> Signed-off-by: bruceszchen <[email protected]>

cboss6 pushed a commit to cboss6/vllm that referenced this pull request Sep 16, 2025

[CI] Speed up model unit tests in CI (vllm-project#24253)

624de6d

Signed-off-by: Andrew Feldman <[email protected]> Signed-off-by: bruceszchen <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[CI] Speed up model unit tests in CI #24253

[CI] Speed up model unit tests in CI #24253

Uh oh!

afeldman-nm commented Sep 4, 2025 •

edited by github-actions bot

Loading

Uh oh!

mergify bot commented Sep 8, 2025

Uh oh!

DarkLight1337 left a comment

Uh oh!

afeldman-nm commented Sep 8, 2025 •

edited

Loading

Uh oh!

afeldman-nm commented Sep 9, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

mgoin commented Sep 11, 2025

Uh oh!

afeldman-nm commented Sep 11, 2025 •

edited

Loading

Uh oh!

njhill left a comment

Uh oh!

afeldman-nm commented Sep 12, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[CI] Speed up model unit tests in CI #24253

[CI] Speed up model unit tests in CI #24253

Uh oh!

Conversation

afeldman-nm commented Sep 4, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Trigger conditions for unit tests

Test Plan

Test Result

Uh oh!

mergify bot commented Sep 8, 2025

Uh oh!

DarkLight1337 left a comment

Choose a reason for hiding this comment

Uh oh!

afeldman-nm commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

afeldman-nm commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mgoin commented Sep 11, 2025

Uh oh!

afeldman-nm commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

njhill left a comment

Choose a reason for hiding this comment

Uh oh!

afeldman-nm commented Sep 12, 2025

Uh oh!

Uh oh!

Uh oh!

afeldman-nm commented Sep 4, 2025 •

edited by github-actions bot

Loading

afeldman-nm commented Sep 8, 2025 •

edited

Loading

afeldman-nm commented Sep 9, 2025 •

edited

Loading

afeldman-nm commented Sep 11, 2025 •

edited

Loading