
@tmuttaki commented Sep 12, 2025

SUMMARY:

When running vLLM with microsoft/Phi-3-medium-4k-instruct, we get the following error:

Value error, User-specified max_model_len (8192) is greater than the derived max_model_len (max_position_embeddings=4096 or model_max_length=None in model's config.json). 
This may lead to incorrect model outputs or CUDA errors. 
To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 [type=value_error, input_value=ArgsKwargs((), {'model': ...gits_processors': None}), input_type=ArgsKwargs]

This happens because no model-specific performance config exists, so max_model_len is taken from the shared fallback model-configs/common/performance/server.yml:

WARNING  neuralmagic.utils:utils.py:104 No performance server config found for microsoft/Phi-3-medium-4k-instruct, using fallback=PosixPath('model-configs/common/performance/server.yml')
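
For context, the shared fallback evidently sets a context length larger than this model supports. A minimal sketch of the relevant part of such a fallback, assuming a flat YAML layout with a max_model_len key (the 8192 value matches the error above; the exact keys in the real server.yml are an assumption):

```yaml
# model-configs/common/performance/server.yml (hypothetical sketch)
# A generic default sized for longer-context models; Phi-3-medium-4k
# only supports max_position_embeddings=4096, so this value fails
# vLLM's max_model_len validation for that model.
max_model_len: 8192
```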

Example run: https://github.com/neuralmagic/nm-cicd/actions/runs/17684609008

TEST PLAN:

Adding a model-specific performance config solves the issue; a sketch of the shape such a config takes follows the link below.
Example run: https://github.com/neuralmagic/nm-cicd/actions/runs/17684808798
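
For reference, a minimal sketch of what the added per-model override might look like. The path and key names are assumptions modeled on the fallback above; the 4096 cap comes from the model's max_position_embeddings in config.json:

```yaml
# model-configs/microsoft/Phi-3-medium-4k-instruct/performance/server.yml
# Hypothetical path and keys; caps the serving context at the model's
# trained limit so vLLM's max_model_len check passes.
max_model_len: 4096
```

Setting VLLM_ALLOW_LONG_MAX_MODEL_LEN=1, as the error message suggests, would also unblock the run, but it only silences the check; requests beyond 4096 tokens could still hit the incorrect outputs or CUDA errors the warning describes, so capping max_model_len is the safer fix.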

@derekk-nm

@tmuttaki, I'm curious why we're testing with this model. It's not listed in the Model Validation tracker.

@tarukumar

@derekk-nm This model is part of Accept Sync, and since we were observing issues with Accept Sync on this model, we wanted to include it in our validation. I see no harm in adding the config to the repo. Let us know if you feel otherwise.

@derekk-nm

OK, makes sense. No harm in adding it to the repo. Be wary of the additional time it adds to Accept Sync, though.

@tarukumar merged commit 847e270 into main on Sep 15, 2025