
Conversation

@KRRT7 (Contributor) commented Sep 17, 2025

PR Type

Enhancement, Tests, Bug fix


Description

  • Derive test context from pytest hooks

  • Add async throughput calculation support

  • Extend baseline model with throughput

  • Update tests for new interfaces


Diagram Walkthrough

flowchart LR
  pytestHook["pytest hooks set test context env"] -- "env vars" --> extractor["extract_test_context_from_frame uses env"]
  runParse["run_and_parse_tests returns results thrice"] -- "behavior/perf results" --> calcTP["calculate_async_throughput aggregates stdout"]
  calcTP -- "function throughput" --> baseline["OriginalCodeBaseline.async_throughput"]
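
For orientation, here is a rough sketch (not the PR's actual implementation) of how a wrapped async function could emit the start/end stdout tags counted downstream, using the test context that the pytest hooks export as environment variables. The wrapper name and argument plumbing are illustrative only; the tag format follows the docstring quoted in the reviewer guide below.

import os

async def codeflash_wrap_async(wrapped, loop_index, iteration_id, *args, **kwargs):
    # Hypothetical sketch: read the test context the pytest hooks exported.
    test_module = os.environ["CODEFLASH_TEST_MODULE"]
    test_function = os.environ["CODEFLASH_TEST_FUNCTION"]
    tag = f"{test_module}:{test_function}:{wrapped.__name__}:{loop_index}:{iteration_id}"
    print(f"!$######{tag}######$!")   # start tag
    result = await wrapped(*args, **kwargs)
    print(f"!######{tag}######!")     # end tag; only emitted if the call completes
    return result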

File Walkthrough

Relevant files

Enhancement (6 files)
  codeflash_wrap_decorator.py (+12/-174): Replace stack introspection with env-based test context
  models.py (+1/-0): Add async throughput to baseline model
  function_optimizer.py (+44/-10): Plumb results for throughput; compute and store async throughput
  parse_test_output.py (+63/-0): Add helpers to compute throughput from stdout
  pytest_plugin.py (+22/-0): Set/clear test context env vars per test
  test_runner.py (+1/-1): Log executed pytest command for visibility

Tests (8 files)
  test_async_run_and_parse_tests.py (+10/-10): Adapt to new API and class-name expectations
  test_async_wrapper_sqlite_validation.py (+5/-2): Provide env context; fix sync context extraction test
  test_codeflash_capture.py (+12/-12): Update for triple-return API and expectations
  test_extract_test_context_from_frame.py (+0/-217): Remove obsolete stack-based context tests
  test_instrument_all_and_run.py (+5/-5): Adjust tests to new run_and_parse_tests signature
  test_instrument_tests.py (+22/-22): Update behavioral/perf/line profile calls to new API
  test_instrumentation_run_results_aiservice.py (+5/-5): Align to new return tuple from test runner
  test_pickle_patcher.py (+4/-4): Adapt pickle patch tests to new API

Configuration changes (1 file)
  codeflash.code-workspace (+5/-1): Update launch args to async concurrency example


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Regex Robustness

The new throughput parsers rely on strict colon-delimited patterns without escaping, which may fail if module/function names contain colons or unexpected characters, or if stdout interleaves tags. Consider anchoring, stricter grouping, and normalization to avoid false positives/negatives.

start_pattern = re.compile(r"!\$######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######\$!")
end_pattern = re.compile(r"!######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######!")


def calculate_function_throughput_from_stdout(stdout: str, function_name: str) -> int:
    """A completed execution is defined as having both a start tag and matching end tag:
    Start: !$######test_module:test_function:function_name:loop_index:iteration_id######$!
    End:   !######test_module:test_function:function_name:loop_index:iteration_id######!
    """
    start_matches = start_pattern.findall(stdout)
    end_matches = end_pattern.findall(stdout)
    end_matches_set = set(end_matches)

    # Count completed executions for the specific function only
    function_throughput = 0

    for start_match in start_matches:
        # Check if this execution is for the function we're interested in and has a matching end tag
        # function_name is at index 2 in the match tuple
        if start_match in end_matches_set and len(start_match) > 2 and start_match[2] == function_name:
            function_throughput += 1

    return function_throughput
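
One possible hardening along those lines, shown only as a sketch: escape the target function name into the pattern and anchor each tag to its own line. This assumes tags are printed one per line; end-tag timing suffixes, discussed in the suggestions below, would still need separate handling.

import re

def count_completed_executions(stdout: str, function_name: str) -> int:
    # Hypothetical variant: only tags for the escaped function name can match,
    # and MULTILINE anchors reduce false positives from interleaved output.
    fn = re.escape(function_name)
    start = re.compile(rf"^!\$######([^:]*):([^:]*):{fn}:([^:]*):([^:]+)######\$!$", re.MULTILINE)
    end = re.compile(rf"^!######([^:]*):([^:]*):{fn}:([^:]*):([^:]+)######!$", re.MULTILINE)
    end_set = set(end.findall(stdout))
    return sum(1 for m in start.findall(stdout) if m in end_set)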
Excessive Logging

Logging the full pytest command at info level on every run could be noisy and leak paths; consider using debug level or redacting sensitive parts.

blocklist_args = [f"-p no:{plugin}" for plugin in BEHAVIORAL_BLOCKLISTED_PLUGINS if plugin != "cov"]
logger.info(f"{' '.join(coverage_cmd + common_pytest_args + blocklist_args + result_args + test_files)}")
results = execute_test_subprocess(
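
A minimal adjustment, assuming the same logger, would be to route the command to debug level so it only appears when verbose output is requested:

pytest_cmd = coverage_cmd + common_pytest_args + blocklist_args + result_args + test_files
logger.debug(f"Executing pytest command: {' '.join(pytest_cmd)}")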
Env Var Dependence

extract_test_context_from_frame now hard-requires environment variables and raises if missing; ensure all runners (including unittest and non-pytest paths) set these vars or add a safer fallback to prevent crashes.

def extract_test_context_from_frame() -> tuple[str, str | None, str]:
    # test_module = os.environ.get("CODEFLASH_TEST_MODULE")
    test_module = os.environ["CODEFLASH_TEST_MODULE"]
    test_class = os.environ.get("CODEFLASH_TEST_CLASS", None)
    # test_function = os.environ.get("CODEFLASH_TEST_FUNCTION")
    test_function = os.environ["CODEFLASH_TEST_FUNCTION"]

    if test_module and test_function:
        return (test_module, test_class if test_class else None, test_function)

    raise RuntimeError(
        "Test context environment variables not set - ensure tests are run through codeflash test runner"
    )
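
To illustrate the concern, a non-pytest runner would have to export the same variables itself, since no pytest hook fires there. A hypothetical plain-unittest sketch:

import os
import unittest

class ExampleTest(unittest.TestCase):
    def setUp(self):
        # Hypothetical: mirror what the pytest hooks do for pytest runs.
        os.environ["CODEFLASH_TEST_MODULE"] = type(self).__module__
        os.environ["CODEFLASH_TEST_CLASS"] = type(self).__name__
        os.environ["CODEFLASH_TEST_FUNCTION"] = self._testMethodName

    def tearDown(self):
        for var in ("CODEFLASH_TEST_MODULE", "CODEFLASH_TEST_CLASS", "CODEFLASH_TEST_FUNCTION"):
            os.environ.pop(var, None)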


PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Normalize end tags before matching

The start/end tag formats elsewhere include optional extra timing appended to the
end tag, making tuple equality fail and undercounting throughput. Normalize end
matches (strip trailing timing after the last colon) before comparing to start
tuples. This aligns with the normalization used above and ensures completed
executions are counted correctly.

codeflash/verification/parse_test_output.py [81-103]

 start_pattern = re.compile(r"!\$######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######\$!")
 end_pattern = re.compile(r"!######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######!")
 
 
 def calculate_function_throughput_from_stdout(stdout: str, function_name: str) -> int:
     """A completed execution is defined as having both a start tag and matching end tag:
     Start: !$######test_module:test_function:function_name:loop_index:iteration_id######$!
-    End:   !######test_module:test_function:function_name:loop_index:iteration_id######!
+    End:   !######test_module:test_function:function_name:loop_index:iteration_id[:timing]######!
     """
     start_matches = start_pattern.findall(stdout)
-    end_matches = end_pattern.findall(stdout)
-    end_matches_set = set(end_matches)
+    raw_end_matches = end_pattern.findall(stdout)
 
-    # Count completed executions for the specific function only
+    # Normalize end matches by stripping optional trailing ':timing' from the last group
+    normalized_end_matches = []
+    for m in raw_end_matches:
+        if m and len(m) == 5:
+            last = m[4]
+            parts = last.split(":")
+            if len(parts) > 1:
+                # keep everything except the last timing segment
+                last = ":".join(parts[:-1])
+            normalized_end_matches.append((m[0], m[1], m[2], m[3], last))
+        else:
+            normalized_end_matches.append(m)
+
+    end_matches_set = set(normalized_end_matches)
+
     function_throughput = 0
-
     for start_match in start_matches:
-        # Check if this execution is for the function we're interested in and has a matching end tag
-        # function_name is at index 2 in the match tuple
-        if start_match in end_matches_set and len(start_match) > 2 and start_match[2] == function_name:
+        if len(start_match) > 2 and start_match[2] == function_name and start_match in end_matches_set:
             function_throughput += 1
 
     return function_throughput
Suggestion importance[1-10]: 8


Why: This addresses a subtle correctness bug: end tags may include timing info causing tuple mismatch; normalization aligns with earlier logic and prevents undercounting. High impact for accurate throughput metrics and consistent with PR context.

Impact: Medium
Prevent KeyError on env access

Using direct indexing on os.environ will raise KeyError in environments outside
pytest or if hooks fail, causing hard crashes. Safely read the variables with .get()
and raise the existing RuntimeError only when required fields are missing. This
keeps previous behavior while avoiding unexpected KeyError.

codeflash/code_utils/codeflash_wrap_decorator.py [33-45]

 def extract_test_context_from_frame() -> tuple[str, str | None, str]:
-    # test_module = os.environ.get("CODEFLASH_TEST_MODULE")
-    test_module = os.environ["CODEFLASH_TEST_MODULE"]
+    test_module = os.environ.get("CODEFLASH_TEST_MODULE")
     test_class = os.environ.get("CODEFLASH_TEST_CLASS", None)
-    # test_function = os.environ.get("CODEFLASH_TEST_FUNCTION")
-    test_function = os.environ["CODEFLASH_TEST_FUNCTION"]
+    test_function = os.environ.get("CODEFLASH_TEST_FUNCTION")
 
     if test_module and test_function:
         return (test_module, test_class if test_class else None, test_function)
 
     raise RuntimeError(
         "Test context environment variables not set - ensure tests are run through codeflash test runner"
     )
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that direct os.environ indexing can raise KeyError and proposes a safe .get() fallback without changing semantics. It's contextually accurate and reduces unexpected crashes, though not critical.

Impact: Medium
Category: General
Avoid blank class env leakage

Writing an empty string to CODEFLASH_TEST_CLASS forces downstream consumers to distinguish
"no class" from "unset". Use pop(..., None) before setup to avoid stale values, and
only set CODEFLASH_TEST_CLASS when a class exists. This prevents confusing blank
class names and ensures clean state per test.

codeflash/verification/pytest_plugin.py [454-474]

 @pytest.hookimpl(tryfirst=True)
 def pytest_runtest_setup(self, item: pytest.Item) -> None:
+    # Clear any previous test context to avoid leakage between tests
+    for var in ("CODEFLASH_TEST_MODULE", "CODEFLASH_TEST_CLASS", "CODEFLASH_TEST_FUNCTION"):
+        os.environ.pop(var, None)
+
     test_module_name = item.module.__name__ if item.module else "unknown_module"
 
-    test_class_name = None
-    if item.cls:
-        test_class_name = item.cls.__name__
+    test_class_name = item.cls.__name__ if item.cls else None
 
     test_function_name = item.name
     if "[" in test_function_name:
         test_function_name = test_function_name.split("[", 1)[0]
 
     os.environ["CODEFLASH_TEST_MODULE"] = test_module_name
-    os.environ["CODEFLASH_TEST_CLASS"] = test_class_name or ""
+    if test_class_name:
+        os.environ["CODEFLASH_TEST_CLASS"] = test_class_name
     os.environ["CODEFLASH_TEST_FUNCTION"] = test_function_name
 
 @pytest.hookimpl(trylast=True)
 def pytest_runtest_teardown(self, item: pytest.Item) -> None:
     """Clean up test context environment variables after each test."""
     for var in ["CODEFLASH_TEST_MODULE", "CODEFLASH_TEST_CLASS", "CODEFLASH_TEST_FUNCTION"]:
         os.environ.pop(var, None)
Suggestion importance[1-10]: 6


Why: Clearing env vars before setup and omitting empty class values improves robustness and state isolation. It's a reasonable enhancement with moderate impact; existing code already cleans up in teardown so the issue is not severe.

Impact: Low
