⚡️ Speed up function are_codes_duplicate by 34% in PR #733 (deduplicate-better)
#736
+16
−4
⚡️ This pull request contains optimizations for PR #733
If you approve this dependent PR, these changes will be merged into the original PR branch deduplicate-better.

📄 34% (0.34x) speedup for are_codes_duplicate in codeflash/code_utils/deduplicate_code.py

⏱️ Runtime: 404 milliseconds → 300 milliseconds (best of 28 runs)

📝 Explanation and details
The optimization achieves a 34% speedup by avoiding expensive AST operations when performing duplicate code detection.

Key Optimization: The code uses stack frame inspection to detect when normalize_code is called from are_codes_duplicate. In this context, it skips the costly ast.fix_missing_locations and ast.unparse operations, instead returning ast.dump() output directly.

Why this works:
- ast.unparse() and ast.fix_missing_locations() are expensive operations: the former reconstructs readable Python code from the AST, and the latter walks the tree to fill in missing position attributes
- ast.dump() provides a fast string representation that preserves the normalized AST structure for comparison
- […] ast.fix_missing_locations and ast.unparse)

Performance gains by test type:
The optimization is behavior-preserving: when normalize_code is called for other purposes (not duplicate detection), it maintains the original string output by using the full ast.unparse() path. Only the internal duplicate detection path uses the faster ast.dump() approach.

✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
test_code_deduplication.py::test_deduplicate1
🌀 Generated Regression Tests and Runtime
🔎 Concolic Coverage Tests and Runtime
codeflash_concolic_c0ny_7kg/tmp169mvt23/test_concolic_coverage.py::test_are_codes_duplicate
To edit these changes, git checkout codeflash/optimize-pr733-2025-09-13T23.57.57 and push.
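
For readers who want a concrete picture of the technique described under "Explanation and details", here is a minimal sketch. It is not the code from this PR: the docstring-stripping step, the helper name _strip_docstrings, and the exact frame check are assumptions made for illustration, and the real normalize_code in codeflash/code_utils/deduplicate_code.py may normalize differently. Only the overall shape (inspect the caller, return ast.dump() for duplicate detection, otherwise take the full ast.unparse() path) follows the description above.

```python
import ast
import inspect


def _strip_docstrings(tree: ast.AST) -> ast.AST:
    """Hypothetical normalization step: remove docstrings before comparing."""
    scopes = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(tree):
        if not isinstance(node, scopes) or not node.body:
            continue
        first = node.body[0]
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            # Keep the body non-empty so the tree can still be unparsed.
            node.body = node.body[1:] or [ast.Pass()]
    return tree


def normalize_code(code: str) -> str:
    tree = _strip_docstrings(ast.parse(code))

    # Assumed form of the stack frame inspection: look at the direct caller's name.
    frame = inspect.currentframe()
    caller = frame.f_back if frame is not None else None
    if caller is not None and caller.f_code.co_name == "are_codes_duplicate":
        # Fast path: a structural string is enough for an equality check.
        return ast.dump(tree, annotate_fields=False, include_attributes=False)

    # Full path: produce readable source for every other caller.
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def are_codes_duplicate(code1: str, code2: str) -> bool:
    try:
        return normalize_code(code1) == normalize_code(code2)
    except SyntaxError:
        # Unparseable input is treated as "not a duplicate" in this sketch.
        return False
```

In this sketch, two snippets that differ only in a docstring, spacing, or comments compare as duplicates, while a direct call to normalize_code still returns readable source. Matching on the caller's function name is fragile (a renamed or wrapped caller silently falls back to the slow path), so an explicit keyword argument would be a more conventional way to select the fast path.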