Skip to content

Conversation

afeldman-nm
Copy link
Contributor

@afeldman-nm afeldman-nm commented Sep 4, 2025

Purpose

Addresses #23670 (not sure as of yet if there will be additional PRs to fix this issue)

This PR speeds up the regression tests which run for each PR (as opposed to for example the nightly tests, which are not impacted by this PR)

No impact on buildkite/fastcheck/pr unit tests

Impact on buildkite/ci/pr unit tests: lower the typical execution time of the following regression test categories in CI-

  • Basic Models

    • The existing Basic Models test category comprises the following sub-categories: model initialization, transformers, registry, utils, vision
    • Separate out the slowest model initialization unit tests since they consume most of the Basic Models test time; create an "extra" model initialization unit tests category which only runs when model source is modified, and shard these extra initialization tests 4x to exploit parallelism at the level of CI scheduling. In a "typical" scenario, these "extra" model initialization tests will not run
    • The remaining (ostensibly faster) model initialization tests will stay within the "Basic Models" test category, which is also now sharded 4x in order to bring test runtimes as close to 10min as possible
    • Note that when vLLM model source code is modified, both the "Basic Models" and "Extra Model Initialization" tests will run; running both of these test categories is equivalent to running all of the "Basic Models" tests currently implemented on main
    • Basic Models test run time on main: 1h2m
    • Basic Models test run time in this PR (typical scenario): ~10m (4 shards with about 5-10m runtime each)
    • Extra Model Initialization test run time in this PR (scenario where model source is modified): ~15m (4 shards with about 15m runtime each)
  • Language Models [standard] - these tests are grouped together by the core_model pytest mark

    • The existing Language Models test category comprises a number of sub-categories of tests including (but not limited to) "common", "embedding", and "classification", which appear to have some particularly slow test-cases
    • In order to group these slow test-cases together, create a slow_test pytest mark
    • Mark the most expensive subset of core_model tests as slow_test
    • Create "extra" standard language models test category for slow standard language models tests (core_model and slow_test pytest marks), which only runs in the special case where model source is modified. Shard these extra tests 4x to lower test time
    • In the typical case, only the remaining "Language Models [standard]" will run. Sharding does not appear to be needed to reduce test time
    • Language Models [standard] test run time on main: 42m
    • Language Models [standard] test run time in this PR (typical scenario): 10m
    • Language Models [standard][extra] test run time in this PR (scenario where model source is modified): ~20m (4 shards with 5-20m runtime each)
  • Language models [hybrid]

    • The hybrid language models tests are too precisely-specified; it is not possible to mark some as lower-priority & only run in special cases
    • Thus hybrid language model tests are sharded 4x to exploit parallelism at the CI level
    • Language Models [hybrid] test run time on main: 35m
    • Language Models [hybrid] test run time in this PR: 20m (4 shards with 10-20m runtime each)

Trigger conditions for unit tests

vllm/ modification -> Basic Models Tests, Language Models [standard] Tests, Language Models [hybrid] Tests

Any vLLM model source code modification -> Extra Model Initialization Tests, Language Modes [standard][extra] (in addition to all of the tests which run when vllm/ is modified)

Test Plan

  • Used an instrumented PR to measure runtimes and see that tests pass
  • Once full buildkite/ci/pr tests are enabled on this PR with /ready, we will see that the new unit tests are triggered as expected

Test Result

For test runtimes, see summary about and details here https://buildkite.com/vllm/fastcheck/builds/41168#01993638-97b7-47b5-bdaf-2c0f70e0c16c

TODO: validate that tests are triggered as expected

Note: there appear to be two regression tests failing, which I am pretty sure are not caused by this PR:

  • models/test_initialization.py::test_can_initialize_other[LlamaForCausalLMEagle3]
  • entrypoints/openai/test_completion_with_prompt_embeds.py::test_completions_with_logprobs_and_prompt_embeds[-HuggingFaceH4/zephyr-7b-beta-1]

@mergify mergify bot added the ci/build label Sep 4, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
@afeldman-nm afeldman-nm changed the title [WIP] Speed up model unit tests in CI [WIP] [CI] Speed up model unit tests in CI Sep 4, 2025
@afeldman-nm afeldman-nm changed the title [WIP] [CI] Speed up model unit tests in CI [CI] Speed up model unit tests in CI Sep 4, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Copy link

mergify bot commented Sep 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @afeldman-nm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Andrew Feldman <[email protected]>
@mergify mergify bot added the needs-rebase label Sep 8, 2025
@mergify mergify bot removed the needs-rebase label Sep 8, 2025
@afeldman-nm afeldman-nm marked this pull request as ready for review September 8, 2025 16:13
Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention to still run these tests on each PR (unlike the Extended tests which are nightly only) but in parallel with the faster model tests?

@afeldman-nm
Copy link
Contributor Author

afeldman-nm commented Sep 8, 2025

Is the intention to still run these tests on each PR (unlike the Extended tests which are nightly only) but in parallel with the faster model tests?

Yes, that is correct @DarkLight1337

Signed-off-by: Andrew Feldman <[email protected]>
@afeldman-nm
Copy link
Contributor Author

afeldman-nm commented Sep 9, 2025

Is the intention to still run these tests on each PR (unlike the Extended tests which are nightly only) but in parallel with the faster model tests?

More specifically @DarkLight1337 the goal is to speed up the regressions that run for each PR in the typical case, by (1) recategorizing certain slow model tests so that they only run in under specific atypical conditions (i.e. modifications to model code), and (2) sharding model tests to exploit parallelism at the granularity of CI test scheduling

Also realize I forgot to mention, the PR is addressing this issue

#23670

@mgoin
Copy link
Member

mgoin commented Sep 11, 2025

I worry about aggressively sharding fast tests due to the fixed cost we pay to start an instance. Last time I checked it was 3-5min init. Has that time gone down?

@afeldman-nm
Copy link
Contributor Author

afeldman-nm commented Sep 11, 2025

I worry about aggressively sharding fast tests due to the fixed cost we pay to start an instance. Last time I checked it was 3-5min init. Has that time gone down?

Per our DMs, I will limit the number of parallelized tests and lower the parallelism to 2x. At least until init time is reduced.

Signed-off-by: Andrew Feldman <[email protected]>
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @afeldman-nm

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 11, 2025
@njhill njhill enabled auto-merge (squash) September 12, 2025 15:10
@afeldman-nm
Copy link
Contributor Author

Note that the blackwell-test appears to be failing on main: https://buildkite.com/vllm/ci/builds/30438/steps/canvas?jid=01993af3-6a12-4fcf-ab0c-a10841a79959

PR to fix the issue is WIP, see #24750

@simon-mo simon-mo merged commit c8c4259 into vllm-project:main Sep 12, 2025
79 of 81 checks passed
@afeldman-nm afeldman-nm deleted the speed_model_ci branch September 12, 2025 17:44
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Sep 14, 2025
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Sep 15, 2025
cboss6 pushed a commit to cboss6/vllm that referenced this pull request Sep 16, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: bruceszchen <[email protected]>
cboss6 pushed a commit to cboss6/vllm that referenced this pull request Sep 16, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: bruceszchen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants