docs: add vllm semantic router blog #76

Xunzhuo · 2025-09-01T02:52:18Z

This PR adds the vLLM Semantic Router blog post.

Signed-off-by: bitliu <[email protected]>

_posts/2025-09-01-semantic-router.md


rootfs · 2025-09-01T12:45:50Z

_posts/2025-09-01-semantic-router.md

+
+Take **GPT-5** as an example. Its real breakthrough isn't in the number of parameters, but in the **"automatic routing + thinking quota"**:
+
+* **Light queries → Light models**: For example, "Why is the sky blue?" does not require expensive inference models.


nit: "light" -> "simple", "trivial", or "casual"

rootfs · 2025-09-01T12:50:22Z

_posts/2025-09-01-semantic-router.md

+
+* **Complex/High-value queries → Strong inference models**: Legal analysis, financial simulations, etc., are routed to models with Chain-of-Thought capabilities.
+
+The logic behind this mechanism is called **"Per-token Unit Economics"**.


"Per-token Unit Economics"

Maybe we can borrow a page from here and call it "AI Token Economics"

youkaichao · 2025-09-01T15:56:10Z

please continue at #77

Xunzhuo force-pushed the add-vsr-blog branch 2 times, most recently from 6792af7 to a6c1fe5 · September 1, 2025 03:24

Xunzhuo marked this pull request as ready for review September 1, 2025 03:41

Xunzhuo force-pushed the add-vsr-blog branch 2 times, most recently from 52f372e to 5a632bd · September 1, 2025 06:33

docs: add vllm semantic router blog

5fd1503

Signed-off-by: bitliu <[email protected]>

Xunzhuo force-pushed the add-vsr-blog branch from 5a632bd to 5fd1503 · September 1, 2025 08:01

hmellor reviewed Sep 1, 2025


_posts/2025-09-01-semantic-router.md (outdated)

resolve reviews

77d7553

Signed-off-by: bitliu <[email protected]>

rootfs reviewed Sep 1, 2025


youkaichao mentioned this pull request Sep 1, 2025

Add vLLM Semantic Router Blog #77

Merged

youkaichao closed this Sep 1, 2025
