Related Code Files:
- `code-intelligence-toolkit/data_flow_tracker.py` - The data flow analysis tool
- Example code files in your project
This document provides real-world examples of using the Data Flow Tracker to analyze complex algorithms, debug calculations, and ensure data integrity in various types of applications.
You need to understand how input values flow through a weighted average calculation.
```java
// WeightedCalculator.java
public class WeightedCalculator {
    private double cumulativeWeight = 0;
    private double cumulativeWeightedValue = 0;

    public double updateAverage(double value, double weight) {
        cumulativeWeightedValue += value * weight;
        cumulativeWeight += weight;
        double average = cumulativeWeightedValue / cumulativeWeight;
        return average;
    }
}
```

```bash
# Track how value affects the final average
./run_any_python_tool.sh data_flow_tracker.py --var value --file WeightedCalculator.java
# Output:
# value affects:
# → cumulativeWeightedValue (line 7: cumulativeWeightedValue += value * weight)
# → average (line 9: average = cumulativeWeightedValue / cumulativeWeight)
```

Key insights:
- `value` directly impacts `cumulativeWeightedValue`
- Changes to the `value` calculation will affect `average` accuracy
- `weight` is equally important in the calculation
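To see the same flow concretely, here is a minimal Python port of the calculator (an illustrative sketch, not tool output): each `value` flows into the weighted accumulator and from there into `average`, exactly as the forward analysis reports.

```python
# Illustrative Python port of WeightedCalculator (not tool output).
class WeightedCalculator:
    def __init__(self):
        self.cumulative_weight = 0.0
        self.cumulative_weighted_value = 0.0

    def update_average(self, value, weight):
        self.cumulative_weighted_value += value * weight  # value flows in here
        self.cumulative_weight += weight
        # average depends on both accumulators
        return self.cumulative_weighted_value / self.cumulative_weight

calc = WeightedCalculator()
print(calc.update_average(10.0, 1.0))  # 10.0
print(calc.update_average(20.0, 3.0))  # (10*1 + 20*3) / 4 = 17.5
```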
Analyze how resource allocation depends on various parameters.
```python
# resource_manager.py
class ResourceManager:
    def __init__(self):
        self.total_resources = 100000
        self.allocation_ratio = 0.02  # 2%
        self.scale_factor = 10

    def calculate_allocation(self, priority_score, demand_factor):
        base_demand = abs(priority_score - demand_factor)
        allocated_amount = self.total_resources * self.allocation_ratio
        # Base allocation
        base_allocation = allocated_amount / base_demand

        # Apply scaling
        scaled_allocation = base_allocation * self.scale_factor
        # Apply maximum limit
        max_allocation = self.total_resources * 0.3
        final_allocation = min(scaled_allocation, max_allocation)

        return final_allocation
```

```bash
# Track what affects final allocation (backward analysis)
./run_any_python_tool.sh data_flow_tracker.py --var final_allocation --direction backward --file resource_manager.py
# Track impact of changing allocation_ratio (forward analysis)
./run_any_python_tool.sh data_flow_tracker.py --var allocation_ratio --file resource_manager.py --inter-procedural
# Generate visual dependency graph
./run_any_python_tool.sh data_flow_tracker.py --var base_allocation --format graph --file resource_manager.py > flow.dot
dot -Tpng flow.dot -o dependencies.png
```

The backward analysis reports:

```
final_allocation depends on:
← scaled_allocation (line 18: final_allocation = min(scaled_allocation, max_allocation))
← max_allocation (line 18)
← base_allocation (line 15: scaled_allocation = base_allocation * self.scale_factor)
← self.scale_factor (line 15)
← allocated_amount (line 12: base_allocation = allocated_amount / base_demand)
← base_demand (line 12)
← self.total_resources (line 10: allocated_amount = self.total_resources * self.allocation_ratio)
← self.allocation_ratio (line 10)
← priority_score (line 9: base_demand = abs(priority_score - demand_factor))
← demand_factor (line 9)
```
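As a quick numeric check of that chain, the sketch below (assuming the `ResourceManager` class from the example above is in scope) varies `allocation_ratio` and shows `final_allocation` tracking it, until the `max_allocation` cap takes over.

```python
# Quick sensitivity check (illustrative): vary allocation_ratio and watch
# final_allocation follow, until the max_allocation cap takes over.
# Assumes the ResourceManager class defined above.
mgr = ResourceManager()
for ratio in (0.01, 0.02, 0.04, 0.08):
    mgr.allocation_ratio = ratio
    # priority_score must differ from demand_factor, or base_demand is zero
    result = mgr.calculate_allocation(priority_score=5.0, demand_factor=3.0)
    print(f"ratio={ratio}: final_allocation={result}")
# ratio=0.01: final_allocation=5000.0
# ratio=0.02: final_allocation=10000.0
# ratio=0.04: final_allocation=20000.0
# ratio=0.08: final_allocation=30000.0  (capped by max_allocation)
```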
Debug complex data processing calculations to find bottlenecks.
```java
// DataProcessor.java
import java.util.Map;

public class DataProcessor {
    private double processingThreshold = 0.65;

    public double processData(Map<String, Integer> inputData,
                              Map<String, Integer> referenceData,
                              int batchSize) {
        int inputSum = 0;
        int referenceSum = 0;

        // Sum input values
        int count = 0;
        for (Integer value : inputData.values()) {
            inputSum += value;
            if (++count >= batchSize) break;
        }

        // Sum reference values
        count = 0;
        for (Integer value : referenceData.values()) {
            referenceSum += value;
            if (++count >= batchSize) break;
        }

        // Calculate ratio
        double totalValue = inputSum + referenceSum;
        double processingRatio = inputSum / totalValue;

        // Generate result
        if (processingRatio > processingThreshold) {
            return 1.0;  // High priority
        } else if (processingRatio < (1 - processingThreshold)) {
            return -1.0; // Low priority
        }
        return 0.0; // Normal
    }
}
```

```bash
# 1. Track what affects the processing ratio
./run_any_python_tool.sh data_flow_tracker.py --var processingRatio --direction backward --file DataProcessor.java
# 2. See full dependency chain
./run_any_python_tool.sh data_flow_tracker.py --show-all --file DataProcessor.java
# 3. Track specific parameter impact
./run_any_python_tool.sh data_flow_tracker.py --var batchSize --file DataProcessor.java
```

The analysis reveals that the `batchSize` parameter affects both `inputSum` and `referenceSum`. Note, however, that each counting loop terminates as soon as its map runs out of entries, so when a map holds fewer entries than `batchSize`, the cap never takes effect; the sketch below demonstrates this.
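A minimal Python sketch of that pitfall (illustrative; `batched_sum` is a hypothetical helper standing in for the Java loops):

```python
# Illustrative sketch of the batching pitfall flagged above: the loop stops
# early when the data runs out, so batch_size may never take effect.
def batched_sum(values, batch_size):
    total = 0
    for count, value in enumerate(values, start=1):
        total += value
        if count >= batch_size:
            break
    return total

print(batched_sum([1, 2, 3, 4, 5], batch_size=3))  # 6 -> capped at 3 entries
print(batched_sum([1, 2], batch_size=3))           # 3 -> only 2 entries exist
```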
Understand dependencies in a system that combines multiple components.
```python
# multi_component_system.py
class MultiComponentSystem:
    def __init__(self):
        self.filter_window = 20
        self.smoothing_factor = 14
        self.amplification = 1.5

    def process_signal(self, raw_data, timestamps):
        # Calculate components
        filtered = self.apply_filter(raw_data, self.filter_window)
        smoothed = self.apply_smoothing(filtered, self.smoothing_factor)
        baseline = sum(raw_data[-20:]) / 20

        # Check conditions
        above_baseline = raw_data[-1] > baseline

        # Signal quality
        quality_good = 30 < smoothed < 70

        # Amplitude check
        current_amplitude = raw_data[-1]
        amplitude_high = current_amplitude > baseline * self.amplification

        # Combined result
        if above_baseline and quality_good and amplitude_high:
            signal_strength = (smoothed - 50) / 50 * self.calculate_confidence()
            return ('PROCESS', signal_strength)
        return ('SKIP', 0.0)
```

```bash
# Analyze entire module with inter-procedural tracking
./run_any_python_tool.sh data_flow_tracker.py --show-all --file multi_component_system.py --inter-procedural
# Track specific component dependencies
./run_any_python_tool.sh data_flow_tracker.py --var filter_window --file multi_component_system.py
# See what affects the final signal
./run_any_python_tool.sh data_flow_tracker.py --var signal_strength --direction backward --file multi_component_system.py
```

Key insights:
- `signal_strength` depends on the smoothing calculation and the confidence method
- Changing `filter_window` affects filtering but not `signal_strength` directly
- `amplification` is critical for signal generation
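The confidence dependency is easy to miss without `--inter-procedural`, because it crosses a method boundary. A minimal sketch of that shape (illustrative; the constant confidence value is a stand-in):

```python
# Illustrative sketch: signal_strength depends on calculate_confidence(),
# a separate method, so single-function analysis would miss this edge.
class Demo:
    def calculate_confidence(self):
        return 0.8  # stand-in for the real confidence model

    def signal(self, smoothed):
        # signal_strength <- smoothed AND <- calculate_confidence()
        return (smoothed - 50) / 50 * self.calculate_confidence()

print(Demo().signal(smoothed=60))  # ≈ 0.16
```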
Identify calculation bottlenecks by tracing expensive operations.
```java
// PerformanceCritical.java
public class PerformanceCritical {
    private double previousValue = 0;
    private double previousSmoothed = 0;  // smoothing state
    private double smoothingFactor = 0.1;
    private double threshold = 0.001;
    private double maxComplexity = 100;   // complexity budget
    private double[] dataHistory;         // populated elsewhere

    public boolean shouldProcess(double currentValue, long timestamp) {
        // Expensive calculation 1
        double valueChange = (currentValue - previousValue) / previousValue;

        // Expensive calculation 2
        double smoothedChange = smoothingFactor * valueChange +
                (1 - smoothingFactor) * previousSmoothed;

        // Expensive calculation 3
        double complexity = calculateComplexity(dataHistory);

        // Decision logic
        boolean valueSignal = Math.abs(smoothedChange) > threshold;
        boolean complexityOk = complexity < maxComplexity;
        boolean timeValid = isWithinWindow(timestamp);

        return valueSignal && complexityOk && timeValid;
    }
}
```

```bash
# Find what depends on expensive complexity calculation
./run_any_python_tool.sh data_flow_tracker.py --var complexity --file PerformanceCritical.java
# Check if any calculations are unused
./run_any_python_tool.sh data_flow_tracker.py --show-all --file PerformanceCritical.java | grep "Affects 0 variables"
```
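As a contrived sketch of what that grep would flag (illustrative; the function and variable names are hypothetical), `complexity` below is computed but never read:

```python
# Contrived dead-calculation example: `complexity` affects 0 variables,
# so its (expensive) computation can be removed or cached away.
def should_process(current, previous, threshold=0.001):
    change = (current - previous) / previous
    complexity = change ** 2 + abs(change)  # dead: never used downstream
    return abs(change) > threshold

print(should_process(101.0, 100.0))  # True
```

Before optimizing parameters, understand their impact radius.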
```python
# parameter_sensitive_algorithm.py
class Algorithm:
    def __init__(self):
        # Key parameters
        self.window_size = 50
        self.activation_threshold = 0.02
        self.deactivation_threshold = 0.01
        self.scale_factor = 1.0

    def calculate_activation(self, data):
        # Historical metrics
        history = self.extract_window(data, self.window_size)
        mean_value = sum(history) / len(history)
        std_dev = self.calculate_std(history)

        # Score calculation
        current_value = (data[-1] - data[-2]) / data[-2]
        score = (current_value - mean_value) / std_dev

        # Activation logic
        if score > self.activation_threshold:
            output = self.scale_factor * (score - self.activation_threshold)
            return min(output, 1.0)  # Cap at 100%
        return 0.0
```

```bash
# See everything affected by window_size
./run_any_python_tool.sh data_flow_tracker.py --var window_size --file parameter_sensitive_algorithm.py
# Output shows:
# window_size affects:
# → history (via extract_window)
# → mean_value
# → std_dev
# → score
# → output
# This reveals window_size has wide impact!
# Compare with activation_threshold impact
./run_any_python_tool.sh data_flow_tracker.py --var activation_threshold --file parameter_sensitive_algorithm.py
# Output shows:
# activation_threshold affects:
# → output (only when score > activation_threshold)
# This shows activation_threshold has limited, conditional impact
```

Always analyze data flow before deploying changes:

```bash
# Check what a modified calculation affects
./run_any_python_tool.sh data_flow_tracker.py --var modified_calc --file module.py
```

Ensure critical parameters flow correctly to outputs:

```bash
# Verify parameter flows to final output
./run_any_python_tool.sh data_flow_tracker.py --var critical_param --file module.py --inter-procedural
```

When output is wrong, trace backwards:

```bash
# Start from wrong output and trace back
./run_any_python_tool.sh data_flow_tracker.py --var wrong_output --direction backward --file module.py
```

Identify unused calculations:

```bash
# Find calculations that don't affect anything
./run_any_python_tool.sh data_flow_tracker.py --show-all --file module.py | grep "Affects 0"
```

Create visual docs for complex algorithms:

```bash
# Generate complete dependency graph
./run_any_python_tool.sh data_flow_tracker.py --show-all --format graph --file complex_algorithm.py > algorithm.dot
dot -Tsvg algorithm.dot -o algorithm_flow.svg
```
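If you prefer to render from Python instead of the `dot` CLI, a small sketch (assuming the third-party `graphviz` package and a Graphviz binary on PATH):

```python
# Illustrative alternative to the dot CLI: render the generated DOT file
# via the third-party `graphviz` package (pip install graphviz).
from pathlib import Path
import graphviz

dot_text = Path("algorithm.dot").read_text()
graphviz.Source(dot_text).render("algorithm_flow", format="svg", cleanup=True)
```

Variables like `cumulativeSum` have wide impact: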
- Track with forward analysis
- Changes affect all downstream calculations
Variables like `threshold` have conditional impact:
- May not show in basic analysis
- Use `--show-all` to see conditional dependencies
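To make the conditional pattern concrete, here is a minimal sketch (mirroring the `Algorithm` example above): `threshold` reaches the result only on the branch where `score` exceeds it.

```python
# Minimal sketch of a conditional dependency: `threshold` flows into the
# result only on the activated branch.
def activation(score, threshold=0.02, scale=1.0):
    if score > threshold:  # conditional edge
        return min(scale * (score - threshold), 1.0)
    return 0.0

print(activation(0.05))  # ≈ 0.03 (threshold affected the output)
print(activation(0.01))  # 0.0 (result is a constant on this branch)
```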
Window sizes affect many calculations:
- Critical to analyze before changing
- Often have cascading effects
Multipliers and scale factors amplify everything:
- Always verify their flow to final output
- Check for proper bounds/limits
A typical end-to-end analysis workflow:

```bash
# 1. View module structure
./run_any_python_tool.sh show_structure_ast.py Module.java
# 2. Find key calculations
./run_any_python_tool.sh find_text.py "calculate.*" --type regex --file Module.java
# 3. Analyze data flow
./run_any_python_tool.sh data_flow_tracker.py --var output --file Module.java
# 4. Check for issues
./run_any_python_tool.sh suggest_refactoring.py Module.java
# 5. Safe refactoring
./run_any_python_tool.sh replace_text_ast.py --file Module.java oldVar newVar
```

- Large Modules: Use `--max-depth` to limit analysis depth
- Multiple Files: Analyze core calculation files first
- Java Code: Ensure `javalang` is installed
- Complex Expressions: Check JSON output for full expression details
- Performance: For large codebases, analyze specific variables rather than `--show-all`