Related Code Files:
- code-intelligence-toolkit/data_flow_tracker.py - Original implementation of the data flow analysis tool
- code-intelligence-toolkit/data_flow_tracker_v2.py - Enhanced version with impact analysis, calculation paths, and type tracking
- code-intelligence-toolkit/doc_generator.py - Automated documentation generator leveraging data flow analysis
- code-intelligence-toolkit/run_any_python_tool.sh - Wrapper script for execution
- test_data_flow.py - Simple test examples
- test_complex_data_flow.py - Complex test scenarios
The Data Flow Tracker is a comprehensive static analysis tool that tracks how data flows through your Python and Java code. It builds a complete dependency graph showing how variables affect each other through assignments, function calls, and complex expressions.
Data flow analysis tracks how values propagate through a program:
- Forward Analysis: Given variable X, find all variables that depend on X
- Backward Analysis: Given variable Y, find all variables that Y depends on
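As a tiny, hypothetical illustration: forward analysis from `x` would report `y` and `z`, while backward analysis from `z` would report `y` and `x`.

```python
x = 10      # the variable under analysis
y = x * 2   # forward: changing x changes y
z = y + 1   # forward: x reaches z through y; backward: z depends on y and x
```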
The tool tracks data flow across function boundaries:
def process(x):              # x is parameter
    y = x * 2                # y depends on x
    return y                 # return value depends on y

result = process(input_val)  # result depends on input_val through process()

The tool is already integrated into the code-intelligence-toolkit:
# Direct usage
python3 code-intelligence-toolkit/data_flow_tracker.py --help
# Through wrapper (recommended)
./run_any_python_tool.sh data_flow_tracker.py --help

# Track what variable 'x' affects
./run_any_python_tool.sh data_flow_tracker.py --var x --file calc.py
# Example output:
# Variable 'x' affects:
# - y = 2 * x (line 10)
# - z = y + 5 (line 11)
# - result = z * factor (line 15)

# Track what affects variable 'result'
./run_any_python_tool.sh data_flow_tracker.py --var result --direction backward --file calc.py
# Example output:
# Variable 'result' depends on:
# - z (line 15: result = z * factor)
# - factor (line 15: result = z * factor)
# - y (line 11: z = y + 5)
# - x (line 10: y = 2 * x)

# Track both forward and backward dependencies
./run_any_python_tool.sh data_flow_tracker.py --var total --direction both --file calc.py

Track data flow across function calls:
# Enable inter-procedural tracking
./run_any_python_tool.sh data_flow_tracker.py --var user_input --file app.py --inter-procedural
# Tracks flows like:
# user_input → process_data(input_value) → scaled → transform(scaled) → result

Analyze entire directories or multiple files:
# Analyze all Python files in a directory
./run_any_python_tool.sh data_flow_tracker.py --var config --scope src/ --recursive
# Analyze specific files
./run_any_python_tool.sh data_flow_tracker.py --var price --file model.py utils.py calc.py

Get structured output for programmatic processing:
./run_any_python_tool.sh data_flow_tracker.py --var x --file calc.py --format json
# Output:
{
  "forward": {
    "variable": "x",
    "affects": [
      {
        "name": "y",
        "location": "calc.py:10",
        "code": "y = 2 * x",
        "expression": "(2 * x)"
      }
    ],
    "flow_paths": ["x → y", "x → y → z"],
    "total_affected": 3
  }
}

Generate visual dependency graphs:
# Generate DOT file
./run_any_python_tool.sh data_flow_tracker.py --var x --file calc.py --format graph > flow.dot
# Convert to image
dot -Tpng flow.dot -o flow.png
dot -Tsvg flow.dot -o flow.svg

Analyze all variables in a file:
./run_any_python_tool.sh data_flow_tracker.py --show-all --file module.py
# Shows dependency information for every variable found

Control how deep to trace dependencies:
# Only trace 2 levels deep
./run_any_python_tool.sh data_flow_tracker.py --var x --max-depth 2 --file calc.py

Analyze specific file types:
# Only analyze Python files
./run_any_python_tool.sh data_flow_tracker.py --var data --scope src/ -g "*.py"
# Exclude test files
./run_any_python_tool.sh data_flow_tracker.py --var config --scope . --exclude "*test*"

- Variable assignments: `x = 5`, `y = x + 2`
- Multiple assignments: `a = b = c = 10`
- Augmented assignments: `x += 1`, `y *= 2`

- Binary operations: `result = (a + b) * (c - d)`
- Ternary operators: `val = x if x > 0 else -x`
- Comparisons: `is_valid = x > 0 and y < 100`
- Boolean logic: `flag = condition1 or condition2`

- Lists: `data = [x, y, z]`, `first = data[0]`
- Tuples: `point = (x, y)`, `a, b = point`
- Dictionaries: `config = {'a': x, 'b': y}`, `val = config['a']`
- Sets: `unique = {x, y, z}`

- Tuple unpacking: `first, second, *rest = values`
- List comprehensions: `squares = [x*x for x in data]`
- Dict comprehensions: `mapped = {k: v*2 for k, v in items.items()}`
- Generator expressions: `gen = (x*2 for x in range(10))`

- Instance variables: `self.value = x`
- Method calls: `result = self.process(data)`
- Property access: `val = obj.property`
- Method chaining: `result = obj.method1().method2().value`

- Function calls: `result = process(x, y)`
- Return values: `return x * 2`
- Global variables: `global config; config = x`
- Lambda functions: `fn = lambda x: x * 2`
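A single throwaway file exercising several of these constructs (names are illustrative) is a handy way to try the tool:

```python
# sample_constructs.py - hypothetical test input for the tracker
x = 5
y = x + 2                                # simple assignment
x += 1                                   # augmented assignment
data = [x, y, x * y]                     # list built from tracked variables
first, *rest = data                      # starred unpacking
squares = [n * n for n in data]          # list comprehension
config = {'scale': x, 'offset': y}       # dict literal
val = config['scale'] if x > 0 else -x   # ternary + subscript access
fn = lambda n: n * 2                     # lambda
result = fn(val)                         # function call
```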
- Variable declarations: `int x = 5;`
- Assignments: `y = x + 2;`
- Field access: `this.value = x;`

- Binary operations: `result = (a + b) * (c - d);`
- Ternary operators: `val = x > 0 ? x : -x;`
- Method calls: `result = process(x, y);`
- Object creation: `obj = new MyClass(x);`

- Arrays: `int[] data = {x, y, z};`
- Array access: `first = data[0];`
- Method chaining: `result = obj.method1().method2();`
# config_manager.py
def calculate_timeout(base_timeout, retry_count, backoff_factor):
    adjusted_timeout = base_timeout * (1 + backoff_factor)
    max_wait = adjusted_timeout * retry_count
    final_timeout = min(max_wait, 300)  # Cap at 5 minutes
    return final_timeout
# Track what affects final_timeout
$ ./run_any_python_tool.sh data_flow_tracker.py --var final_timeout --direction backward --file config_manager.py
# Output shows:
# final_timeout depends on:
# - max_wait (from adjusted_timeout and retry_count)
# - adjusted_timeout (from base_timeout and backoff_factor)

# data_processor.py
input_size = 1000
compression_ratio = 0.75
buffer_multiplier = 2
compressed_size = input_size * compression_ratio
buffer_size = compressed_size * buffer_multiplier
final_allocation = buffer_size + (input_size * 0.1) # 10% overhead
# Track forward flow from compression_ratio
$ ./run_any_python_tool.sh data_flow_tracker.py --var compression_ratio --file data_processor.py
# Shows how changing compression_ratio affects:
# - compressed_size
# - buffer_size
# - final_allocation

# analyzer.py
class DataAnalyzer:
    def __init__(self):
        self.scale_factor = 1.5
        self.threshold = 0.02

    def analyze_data(self, raw_value, weight, confidence):
        weighted_value = raw_value * weight
        score = (weighted_value * confidence) / self.scale_factor
        if score > self.threshold:
            result = score * 100
            return self.normalize_result(result)
        return 0

    def normalize_result(self, value):
        return value * 0.95  # 5% adjustment
# Analyze the entire module
$ ./run_any_python_tool.sh data_flow_tracker.py --show-all --file analyzer.py --inter-procedural

The tool shows complete paths of data flow:
x → y → z → result
This means: x affects y, which affects z, which affects result
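A minimal file that produces exactly this chain:

```python
x = 3
y = x + 1       # x → y
z = y * 2       # y → z
result = z - 5  # z → result
```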
Each dependency includes:
- File and line number: `calc.py:15`
- Actual code: `result = x * factor`
- Parsed expression: `(x * factor)`

Summary statistics:
- Total affected variables: N (how many variables are affected, forward direction)
- Total dependencies: N (how many variables contribute, backward direction)
- Start with key variables: Focus on critical values like configuration parameters, user inputs, or calculation results
- Use inter-procedural for complex code: Enable `--inter-procedural` when analyzing code with many function calls
- Combine with refactoring: Use before refactoring to understand impact:
  # See what depends on old_method before renaming
  ./run_any_python_tool.sh data_flow_tracker.py --var old_method --file module.py
- Generate graphs for documentation: Visual graphs help explain complex calculations
- Use JSON for automation: Parse JSON output for automated dependency checking
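A sketch of such an automated check, assuming the JSON schema shown earlier (a `forward` object with `affects` entries and `total_affected`); the captured string below stands in for the tool's real output:

```python
import json

# Stand-in for output of:
#   ./run_any_python_tool.sh data_flow_tracker.py --var x --file calc.py --format json
raw = '''{
  "forward": {
    "variable": "x",
    "affects": [
      {"name": "y", "location": "calc.py:10",
       "code": "y = 2 * x", "expression": "(2 * x)"}
    ],
    "flow_paths": ["x → y", "x → y → z"],
    "total_affected": 3
  }
}'''

report = json.loads(raw)
affected = {entry["name"] for entry in report["forward"]["affects"]}

# A dependency-policy check: fail if 'x' silently feeds a sensitive variable
assert "audit_log" not in affected
```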
- Static Analysis: Only tracks explicit data flow, not runtime behavior
- No Alias Analysis: Doesn't track pointer/reference aliases
- Limited Dynamic Features: Can't track `eval()`, `exec()`, or reflection
- No Cross-Language: Can't track Python calling Java or vice versa
Variable not found:
- Check variable name spelling
- Ensure the variable is actually assigned in the code
- Try `--show-all` to see all available variables

No results or missing files:
- Check file paths are correct
- Use `--scope` for directories
- Ensure file extensions match (`*.py` for Python)

Slow on large codebases:
- Use `--max-depth` to limit traversal depth
- Filter specific variables instead of `--show-all`
- Use `--format json` and process programmatically
Combine with other code-intelligence-toolkit tools:
# Find where a method is defined
./run_any_python_tool.sh navigate_ast.py MyClass.py --to process_data
# Track data flow from that method
./run_any_python_tool.sh data_flow_tracker.py --var result --file MyClass.py
# Then refactor safely
./run_any_python_tool.sh replace_text_ast.py --file MyClass.py result new_result

Data Flow Tracker V2 adds three powerful capabilities for deeper code intelligence:
Shows where data "escapes" its local scope and causes observable effects:
# See all the places where changing db_config would have effects
./run_any_python_tool.sh data_flow_tracker_v2.py --var db_config --show-impact --file app.py

Output shows:
- Returns: Functions that return values dependent on the variable
- Side Effects: File writes, network calls, console output
- State Changes: Modifications to global variables or class members
- Risk Assessment: Overall risk level of making changes
Example output:
============================================================
Impact Analysis
============================================================
🔄 RETURNS:
- get_connection at db.py:45
Returns value dependent on db_config
⚠️ SIDE EFFECTS:
🟡 file_write at logger.py:89
External call to write
📝 STATE CHANGES:
- global_write at config.py:23
External call to cache_config
────────────────────────────────────────────────────────────
SUMMARY:
Total exit points: 4
Functions affected: 3
High risk count: 1
⚡ MEDIUM RISK: External side effects detected - ensure testing covers these
Extracts the minimal "critical path" showing exactly how a value is calculated:
# Understand how final_price is calculated
./run_any_python_tool.sh data_flow_tracker_v2.py --var final_price --show-calculation-path --file pricing.py

Shows only the essential steps, filtering out noise:
============================================================
Calculation Path
============================================================
1. base_price = get_product_price()
Location: pricing.py:10
↓
2. tax_rate = lookup_tax_rate(location)
Inputs: location
Location: pricing.py:15
↓
3. discount = apply_coupon(coupon_code)
Inputs: coupon_code
Location: pricing.py:20
↓
4. final_price = (base_price * (1 + tax_rate)) - discount
Inputs: base_price, tax_rate, discount
Location: pricing.py:25
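A `pricing.py` along these lines (the function bodies are placeholders) would produce that path:

```python
# pricing.py (hypothetical) - matches the calculation path above
def get_product_price():
    return 100.0

def lookup_tax_rate(location):
    return {"US": 0.08}.get(location, 0.0)

def apply_coupon(code):
    return 10.0 if code == "SAVE10" else 0.0

location = "US"
coupon_code = "SAVE10"

base_price = get_product_price()                         # step 1
tax_rate = lookup_tax_rate(location)                     # step 2
discount = apply_coupon(coupon_code)                     # step 3
final_price = (base_price * (1 + tax_rate)) - discount   # step 4
```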
Monitors how variable types and states evolve through the code:
# Track type changes and potential issues
./run_any_python_tool.sh data_flow_tracker_v2.py --var user_data --track-state --file process.py

Reveals type changes and warnings:
============================================================
Type & State Evolution for 'user_data'
============================================================
📈 TYPE EVOLUTION:
process.py:10: dict ✓
process.py:15: dict (nullable) ✓
Possible values: [None]
process.py:20: UserModel ✓
🔄 STATE CHANGES:
process.py:10: assignment
process.py:15: assignment (in conditional)
process.py:20: assignment
⚠️ WARNINGS:
- Variable may be None - add null checks
- Type changes detected: dict → UserModel
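A `process.py` exhibiting that evolution might look like this (`UserModel` is illustrative):

```python
# process.py (hypothetical) - user_data evolves: dict -> dict|None -> UserModel
class UserModel:
    def __init__(self, name):
        self.name = name

def load(raw):
    user_data = {"name": raw}      # ~line 10: dict
    if not raw:
        user_data = None           # ~line 15: dict becomes nullable
    # ~line 20: type changes to UserModel (or stays None)
    user_data = UserModel(user_data["name"]) if user_data else None
    return user_data
```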
Before Refactoring:
# Check impact before renaming a configuration variable
./run_any_python_tool.sh data_flow_tracker_v2.py --var old_config_name --show-impact --file settings.py

Debugging Complex Calculations:
# Understand why a value is wrong
./run_any_python_tool.sh data_flow_tracker_v2.py --var wrong_result --show-calculation-path --file calc.py

Type Safety Validation:
# Verify type consistency before deployment
./run_any_python_tool.sh data_flow_tracker_v2.py --var api_response --track-state --file handler.py

You can use V2 alongside V1 features:
# First, see what affects the variable (V1)
./run_any_python_tool.sh data_flow_tracker.py --var total --direction backward --file calc.py
# Then, understand the calculation path (V2)
./run_any_python_tool.sh data_flow_tracker_v2.py --var total --show-calculation-path --file calc.py
# Finally, check impact of changes (V2)
./run_any_python_tool.sh data_flow_tracker_v2.py --var total --show-impact --file calc.py

The Intelligence Layer transforms complex technical analysis into intuitive, actionable insights through natural-language explanations and interactive visualizations.
Convert technical analysis into plain English explanations that anyone can understand:
./run_any_python_tool.sh data_flow_tracker_v2.py --var database_config --show-impact --explain --file app.py

Example output:
📊 **Impact Analysis for 'database_config'**:
🚨 **High Risk Change**: Modifying 'database_config' affects 8 different places across 4 functions.
It affects 3 return values, causes 2 external side effects (like file writes or console output),
and modifies 3 global or class variables.
💡 **Recommendation**: Break this change into smaller steps and test each affected function thoroughly.
./run_any_python_tool.sh data_flow_tracker_v2.py --var final_price --show-calculation-path --explain --file pricing.py

Example output:
🔍 **How 'final_price' is Calculated**:
This value is calculated through 6 steps, showing the complete algorithm flow.
**The Critical Path**:
1. **Variable Created**: 'base_price' is first declared (depends on: product_id)
2. **Calculation Step**: 'tax_rate' is computed from location (depends on: location)
3. **Calculation Step**: 'discount' is computed from coupon_code (depends on: coupon_code)
4. **Calculation Step**: 'final_price' is computed from base_price, tax_rate, discount
💡 **Understanding**: To debug issues with 'final_price', trace through these 6 steps.
Each step shows exactly where the value comes from and what influences it.
./run_any_python_tool.sh data_flow_tracker_v2.py --var user_data --track-state --explain --file process.py

Example output:
🔄 **State Evolution Analysis for 'user_data'**:
**Type Changes Detected**: 'user_data' changes types: dict → UserModel.
This could indicate potential bugs or intentional polymorphic behavior.
**State Modifications**: 'user_data' is modified 4 times, including 2 changes inside loops.
⚠️ **Potential Issues Detected**:
• Variable may be None - add null checks
• Type changes detected: dict → UserModel
💡 **Analysis Summary**: Consider type annotations or validation to handle type changes safely.
Track the 4 state modifications to understand variable behavior.
Generate professional, self-contained HTML reports with interactive network visualizations:
./run_any_python_tool.sh data_flow_tracker_v2.py --var config --show-impact --output-html --file app.py

Output: data_flow_impact_config_app_py.html
./run_any_python_tool.sh data_flow_tracker_v2.py --var total --show-calculation-path --explain --output-html --file calc.py

This generates both:
- Interactive HTML visualization file
- Console explanation of the analysis
🎨 Professional Styling:
- Risk-based color coding (Red for high risk, Yellow for medium, Green for low)
- Modern, responsive design that works on all devices
- Professional typography and layout
🔍 Interactive Exploration:
- Click nodes to see detailed information about variables and operations
- Drag and zoom to explore complex dependency networks
- Toggle physics to freeze or animate the network layout
- Center view to reset the visualization focus
📊 Rich Visualizations:
- Impact Analysis: Shows source variable connected to all affected areas
- Calculation Path: Step-by-step flow with input dependencies clearly marked
- State Tracking: Timeline of type evolution and state changes
- Standard Analysis: Forward/backward dependency networks
💾 Export Capabilities:
- PNG Export: Save visualizations as high-quality images
- Self-Contained: No external dependencies - works offline
- Shareable: Email or share HTML files with team members
Impact Analysis Visualization:
./run_any_python_tool.sh data_flow_tracker_v2.py --var sensitive_data --show-impact --output-html --file security.py

- Central node: The variable being analyzed
- Connected nodes: Return values (green), side effects (red), state changes (orange)
- Risk-based header colors and indicators
Calculation Path Visualization:
./run_any_python_tool.sh data_flow_tracker_v2.py --var algorithm_result --show-calculation-path --output-html --file compute.py

- Linear flow showing calculation steps
- Input variables feeding into each step
- Clear progression from inputs to final result
State Tracking Visualization:
./run_any_python_tool.sh data_flow_tracker_v2.py --var dynamic_var --track-state --output-html --file evolving.py

- Timeline of type evolution
- State change annotations with context (loop/conditional)
- Warning indicators for potential issues
# Before approving a PR, understand the full impact
./run_any_python_tool.sh data_flow_tracker_v2.py --var modified_variable --show-impact --explain --output-html --file changed_file.py

Benefit: Reviewers get both intuitive explanations and visual exploration tools
# When a bug is reported, trace the calculation path with explanations
./run_any_python_tool.sh data_flow_tracker_v2.py --var incorrect_result --show-calculation-path --explain --file buggy_module.py

Benefit: Clear English explanation of how the value is computed + visual trace
# Before refactoring, get risk assessment and visual impact map
./run_any_python_tool.sh data_flow_tracker_v2.py --var legacy_function --show-impact --explain --output-html --file old_code.py

Benefit: Risk level assessment + actionable testing recommendations + shareable impact visualization
# Generate visual documentation of complex algorithms
./run_any_python_tool.sh data_flow_tracker_v2.py --var complex_calculation --show-calculation-path --output-html --file algorithm.py

Benefit: Self-documenting code with interactive exploration for new team members
# Help new developers understand codebase dependencies
./run_any_python_tool.sh data_flow_tracker_v2.py --var core_component --show-impact --explain --output-html --file main.py

Benefit: Intuitive explanations make complex codebases approachable
# If final_result is wrong, trace back to find the issue
./run_any_python_tool.sh data_flow_tracker.py --var final_result --direction backward --file calc.py

# Track where user input flows
./run_any_python_tool.sh data_flow_tracker.py --var user_input --file app.py --inter-procedural

# Find all variables affected by expensive calculation
./run_any_python_tool.sh data_flow_tracker.py --var expensive_calc --file module.py

# Verify no unintended dependencies
./run_any_python_tool.sh data_flow_tracker.py --var sensitive_data --file security.py

The intelligence layer now powers automated documentation generation through doc_generator.py, which leverages data flow analysis to create intelligent documentation.
# Generate API documentation for functions
./run_any_python_tool.sh doc_generator.py --function calculatePrice --file pricing.py --style api-docs
# Create user-friendly guides for classes
./run_any_python_tool.sh doc_generator.py --class UserManager --file auth.py --style user-guide --depth deep
# Generate technical analysis documentation
./run_any_python_tool.sh doc_generator.py --module --file database.py --style technical --output html
# Quick reference cards
./run_any_python_tool.sh doc_generator.py --function process_data --file data.py --style quick-ref --format docstring
# Tutorial-style documentation
./run_any_python_tool.sh doc_generator.py --class APIClient --file client.py --style tutorial --depth medium

- API Documentation (`--style api-docs`): Technical reference with parameters, return values, and usage examples
- User Guides (`--style user-guide`): Friendly explanations accessible to all skill levels
- Technical Analysis (`--style technical`): Deep analysis with data flow, complexity metrics, and architectural insights
- Quick Reference (`--style quick-ref`): Concise format for immediate lookup
- Tutorials (`--style tutorial`): Educational approach with step-by-step guidance
- Markdown (`--format markdown`): For documentation systems and README files
- HTML (`--format html`): For web documentation and reports
- Docstring (`--format docstring`): For inline Python documentation
- reStructuredText (`--format rst`): For Sphinx and other documentation generators
The documentation generator leverages the same data flow analysis used by the intelligence layer:
- Dependency Analysis: Shows what functions depend on and affect
- Complexity Assessment: Provides complexity warnings and refactoring suggestions
- Auto-Generated Examples: Creates contextually appropriate code samples
- Risk Assessment: Identifies high-complexity areas that need careful documentation
# 1. Analyze the data flow first
./run_any_python_tool.sh data_flow_tracker_v2.py --var config --show-impact --explain --file app.py
# 2. Generate comprehensive documentation
./run_any_python_tool.sh doc_generator.py --function setup_config --file app.py --style technical --depth deep
# 3. Create user-friendly guide
./run_any_python_tool.sh doc_generator.py --function setup_config --file app.py --style user-guide --format html

This creates a complete documentation suite: technical analysis for developers, visual impact analysis for code review, and user-friendly guides for broader audiences.