Exception handler variable names and comprehension targets are normalized but parameter names, globals, and imports are preserved; consider whether attribute names (e.g., obj.attr) and keyword argument names should remain untouched to avoid structurally different code being treated as duplicates.
def visit_ExceptHandler(self, node):
    """Normalize exception variable names"""
    if node.name:
        node.name = self.get_normalized_name(node.name)
    return self.generic_visit(node)

def visit_comprehension(self, node):
    """Normalize comprehension target variables"""
    # Create new scope for comprehension
    old_mapping = dict(self.var_mapping)
    old_counter = self.var_counter
    # Process the comprehension
    node = self.generic_visit(node)
    # Restore scope
    self.var_mapping = old_mapping
    self.var_counter = old_counter
    return node
Docstring stripping assumes first body element is a Constant; this may miss cases like formatted strings or future annotations and could affect round-tripping; verify behavior across Python versions and ensure it doesn’t remove non-docstring top-level constants.
def remove_docstrings_from_ast(node):
    """Remove docstrings from AST nodes."""
    # Process all nodes in the tree, but avoid recursion
    for current_node in ast.walk(node):
        if isinstance(current_node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Module)):
            if (
                current_node.body
                and isinstance(current_node.body[0], ast.Expr)
                and isinstance(current_node.body[0].value, ast.Constant)
                and isinstance(current_node.body[0].value.value, str)
            ):
                current_node.body = current_node.body[1:]
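The concerns in the comment above can be probed with a small round-trip check. The following is a minimal sketch, not the PR's implementation: `strip_docstrings` and `round_trip` are hypothetical names, and the `ast.Pass()` fallback is an assumption added here so that a body consisting only of a docstring still unparses to valid code. It requires Python 3.9+ for `ast.unparse`.

```python
import ast

DOCSTRING_OWNERS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Module)


def strip_docstrings(tree: ast.AST) -> ast.AST:
    """Hypothetical variant of the check shown above, with an empty-body guard."""
    for node in ast.walk(tree):
        if not isinstance(node, DOCSTRING_OWNERS) or not node.body:
            continue
        first = node.body[0]
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            rest = node.body[1:]
            # Assumption: a def/class body must not become empty, otherwise
            # ast.unparse emits a header with no body, which does not re-parse.
            if not rest and not isinstance(node, ast.Module):
                rest = [ast.Pass()]
            node.body = rest
    return tree


def round_trip(source: str) -> str:
    """Parse, strip docstrings, and unparse back to source text."""
    tree = strip_docstrings(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


only_docstring = round_trip('def f():\n    """doc"""\n')
leading_fstring = round_trip('def f():\n    f"{1}"\n    return 1\n')
leading_number = round_trip("def f():\n    42\n    return 1\n")
```

In this sketch a leading f-string (an `ast.JoinedStr`) and a leading numeric constant are both left in place, since neither passes the `str` constant check; only a plain string literal in first position is treated as a docstring.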
Using normalized code as a cache key changes prior behavior; confirm normalization preserves semantically relevant differences (e.g., differing literals or control flow) to avoid over-deduplication that might skip viable candidates.
normalized_code = normalize_code(candidate.source_code.flat.strip())
if normalized_code in ast_code_to_id:
    logger.info(
        "Current candidate has been encountered before in testing, Skipping optimization candidate."
    )
normalize_code can raise ValueError on bad syntax; if uncaught here it aborts the loop. Wrap normalization in the existing try/except block or a local try to skip invalid candidates without crashing.
-normalized_code = normalize_code(candidate.source_code.flat.strip())
+try:
+    normalized_code = normalize_code(candidate.source_code.flat.strip())
+except ValueError as e:
+    logger.error(e)
+    continue
 if normalized_code in ast_code_to_id:
     logger.info(
         "Current candidate has been encountered before in testing, Skipping optimization candidate."
     )
Suggestion importance[1-10]: 7
Why: normalize_code may raise ValueError, and the current normalization call is outside the existing try/except; wrapping it prevents the loop from aborting on bad syntax. This is accurate and improves robustness, though it's a localized error-handling enhancement.
Medium
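The suggestion above can be exercised in isolation. This is a minimal sketch under stated assumptions: `normalize_code` here is a hypothetical stand-in that only parses and unparses the source (the PR's real function also renames variables), and `unique_candidates` is an illustrative name for the surrounding loop, not a function from the codebase.

```python
import ast
import logging

logger = logging.getLogger(__name__)


def normalize_code(source: str) -> str:
    """Hypothetical stand-in: canonicalize formatting only, raise ValueError on bad syntax."""
    try:
        tree = ast.parse(source)
    except SyntaxError as e:
        raise ValueError(f"Invalid Python syntax: {e}") from e
    return ast.unparse(tree)


def unique_candidates(sources):
    """Keep the first candidate for each normalized form; skip unparsable ones."""
    seen = {}
    kept = []
    for index, source in enumerate(sources):
        try:
            normalized = normalize_code(source.strip())
        except ValueError as e:
            # Skip this candidate instead of aborting the whole loop.
            logger.error(e)
            continue
        if normalized in seen:
            logger.info("Candidate %d duplicates candidate %d, skipping.", index, seen[normalized])
            continue
        seen[normalized] = index
        kept.append(source)
    return kept


kept = unique_candidates(["x = 1", "x  =  1", "def broken(:", "x = 2"])
```

Here the second candidate is dropped as a duplicate, the third is skipped because it does not parse, and the fourth survives because its literal differs.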
Possible issue
Keep comprehension scope consistent
Resetting mappings for comprehensions breaks name consistency between targets and subsequent loads in the same scope, producing mismatched normalization. Avoid snapshotting/restoring the mapping; let comprehensions use the current scope so targets and uses normalize consistently.
 def visit_comprehension(self, node):
-    """Normalize comprehension target variables"""
-    # Create new scope for comprehension
-    old_mapping = dict(self.var_mapping)
-    old_counter = self.var_counter
+    """Use current scope for comprehension targets"""
+    return self.generic_visit(node)
-    # Process the comprehension
-    node = self.generic_visit(node)
-
-    # Restore scope
-    self.var_mapping = old_mapping
-    self.var_counter = old_counter
-    return node
-
Suggestion importance[1-10]: 6
Why: Snapshotting/restoring the mapping inside visit_comprehension can indeed cause inconsistencies between targets/loads across surrounding scope; removing it aligns normalization with actual scope rules. This improves correctness for comprehensions but is not a critical bug.
Low
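The effect described in this suggestion can be reproduced with a stripped-down normalizer. The sketch below is an assumption-heavy simplification: `OrderNormalizer` is a hypothetical class that renames every `Name` node (the PR's normalizer preserves parameters, globals, and imports), and the `restore_after_comprehension` flag exists only to compare the two behaviors side by side.

```python
import ast


class OrderNormalizer(ast.NodeTransformer):
    """Hypothetical, simplified normalizer: renames every Name in order of first appearance."""

    def __init__(self, restore_after_comprehension: bool):
        self.var_mapping = {}
        self.var_counter = 0
        self.restore_after_comprehension = restore_after_comprehension

    def get_normalized_name(self, name: str) -> str:
        if name not in self.var_mapping:
            self.var_mapping[name] = f"var_{self.var_counter}"
            self.var_counter += 1
        return self.var_mapping[name]

    def visit_Name(self, node):
        node.id = self.get_normalized_name(node.id)
        return node

    def visit_comprehension(self, node):
        if not self.restore_after_comprehension:
            # Suggested behavior: share the enclosing mapping.
            return self.generic_visit(node)
        # Behavior under review: snapshot, visit, then restore.
        old_mapping = dict(self.var_mapping)
        old_counter = self.var_counter
        node = self.generic_visit(node)
        self.var_mapping = old_mapping
        self.var_counter = old_counter
        return node


def normalize(source: str, restore_after_comprehension: bool) -> str:
    tree = OrderNormalizer(restore_after_comprehension).visit(ast.parse(source))
    return ast.unparse(tree)


SOURCE = "total = [n for n in values]\nlast = values\n"
scoped = normalize(SOURCE, restore_after_comprehension=True)
shared = normalize(SOURCE, restore_after_comprehension=False)
```

In this simplified model, restoring the mapping means `values` is first seen inside the comprehension, forgotten on restore, and then assigned a different name on the next line, while the name it had inside the comprehension is reused for `last`. With the shared mapping, `values` keeps one name throughout.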
Preserve exception variable names
Normalizing exception variable names can conflate distinct handlers and change behavior when exception variables shadow globals/nonlocals/parameters. Preserve exception variable names like parameters and globals to avoid incorrect deduplication and potential semantic drift.
 def visit_ExceptHandler(self, node):
-    """Normalize exception variable names"""
-    if node.name:
-        node.name = self.get_normalized_name(node.name)
+    """Preserve exception variable names (do not normalize)"""
     return self.generic_visit(node)
Suggestion importance[1-10]: 5
Why: The existing code normalizes ExceptHandler.name, which could over-normalize and slightly affect deduplication semantics; preserving names is a reasonable minor improvement. However, since deduplication already preserves globals/params and this change is not critical for correctness, impact is moderate.
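One detail worth checking before adopting either approach: in Python's `ast`, the name bound by `except ... as name` is stored as a plain string on `ExceptHandler.name`, while uses of that name in the handler body are `ast.Name` nodes. The sketch below only inspects the tree; `handler_names` is a hypothetical helper, and the implication drawn from it is an assumption about how the normalizer treats `Name` nodes.

```python
import ast


def handler_names(source: str):
    """Return the handler's bound name and the Name ids found inside the handler."""
    handler = ast.parse(source).body[0].handlers[0]
    loads = [n.id for n in ast.walk(handler) if isinstance(n, ast.Name)]
    return handler.name, loads


binding, loads = handler_names(
    "try:\n    risky()\nexcept ValueError as err:\n    log(err)\n"
)
```

If the binding is left untouched while `Name` nodes are still renamed, the two would presumably drift apart, so preserving exception variable names likely also requires excluding them from `Name` normalization.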
User description
Implements variable name anonymization, renaming variables according to the order in which they are seen.
PR Type
Enhancement, Tests
Description
Add AST-based code deduplication utilities
Normalize local variable names robustly
Integrate normalization into optimizer cache
Add comprehensive deduplication tests
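As an illustration of the normalization idea described above, here is a minimal sketch. It assumes a hypothetical `LocalNameNormalizer` that keeps builtins and function parameters as written and renames every other name in order of first appearance; it does not handle `*args`, `**kwargs`, globals, or imports, and is not the code in `deduplicate_code.py`.

```python
import ast
import builtins


class LocalNameNormalizer(ast.NodeTransformer):
    """Hypothetical sketch: rename non-parameter, non-builtin names as var_0, var_1, ..."""

    def __init__(self):
        self.mapping = {}
        self.preserved = set(dir(builtins))

    def visit_FunctionDef(self, node):
        # Parameters stay as written so call signatures remain comparable.
        for arg in node.args.posonlyargs + node.args.args + node.args.kwonlyargs:
            self.preserved.add(arg.arg)
        return self.generic_visit(node)

    def visit_Name(self, node):
        if node.id not in self.preserved:
            node.id = self.mapping.setdefault(node.id, f"var_{len(self.mapping)}")
        return node


def normalize(source: str) -> str:
    return ast.unparse(LocalNameNormalizer().visit(ast.parse(source)))


a = normalize("def area(w, h):\n    result = w * h\n    return result\n")
b = normalize("def area(w, h):\n    product = w * h\n    return product\n")
c = normalize("def area(w, h):\n    result = w * h * 2\n    return result\n")
```

The first two functions differ only in a local variable name and normalize to the same text, so they would share a cache key; the third differs in a literal and stays distinct.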
Diagram Walkthrough
File Walkthrough
deduplicate_code.py: Introduce AST-based code normalization utilities (codeflash/code_utils/deduplicate_code.py)
function_optimizer.py: Use normalized code for candidate deduplication (codeflash/optimization/function_optimizer.py)
test_code_deduplication.py: Add tests validating code normalization and dedup (tests/test_code_deduplication.py)