<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>lean-optimizer — Todd Espy</title>
<link rel="stylesheet" href="styles.css" />
</head>
<body>
<div class="layout">
<main class="content">
<div class="banner">
<h1>lean-optimizer</h1>
<p>Parallel Optimization Engine for Algorithmic Trading</p>
</div>
<p>
<a class="contact-link" href="index.html">← Back to portfolio</a>
|
<a class="contact-link" href="https://github.com/t-espy/lean-optimizer-public" target="_blank">View on GitHub</a>
</p>
<h2>Overview</h2>
<p>
lean-optimizer is a production-grade parallel optimization engine for
<a class="contact-link" href="https://www.quantconnect.com/lean" target="_blank">QuantConnect LEAN</a>
algorithmic trading strategies. It compiles a C# strategy once, spins up a pool of warm Docker
containers with persistent .NET harness processes, and searches the parameter space via a
configurable multi-stage pipeline.
</p>
<h2>Measured Performance</h2>
<p>
Head-to-head comparison against LEAN CLI's built-in <code>lean optimize --strategy "grid search"</code>
on a 7-parameter space (679,140 grid points), 15 parallel workers, same strategy, same data:
</p>
<table>
<tr>
<th></th>
<th>LEAN CLI Grid Search</th>
<th>lean-optimizer (GA)</th>
</tr>
<tr>
<td><strong>Search strategy</strong></td>
<td>Exhaustive grid (all 679,140 points)</td>
<td>Incremental genetic algorithm</td>
</tr>
<tr>
<td><strong>Evaluations</strong></td>
<td>679,140</td>
<td>611</td>
</tr>
<tr>
<td><strong>Throughput (15 workers)</strong></td>
<td>2.15 evals/sec</td>
<td>3.82 evals/sec</td>
</tr>
<tr>
<td><strong>Wall time</strong></td>
<td>~88 hours (extrapolated from 1,772 evals in 825s)</td>
<td>2 min 40 sec</td>
</tr>
<tr>
<td><strong>Search efficiency</strong></td>
<td>1x (baseline)</td>
<td>1,112x fewer evaluations</td>
</tr>
</table>
<p>
The two advantages are independent and compound: the persistent harness delivers about 1.8x the
throughput at the same worker count (3.82 vs. 2.15 evals/sec), and the GA's directed search
converges in 1,112x fewer evaluations (611 vs. 679,140). Combined, a search that would take
LEAN CLI's grid search an estimated 88 hours completes in under 3 minutes.
</p>
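<p>
The figures in the table can be re-derived with a few lines of arithmetic. The sketch below uses
only the numbers quoted above; the variable names are illustrative and not part of lean-optimizer.
</p>

```python
# Figures copied from the comparison table above.
GRID_POINTS = 679_140       # size of the exhaustive grid
GA_EVALS = 611              # evaluations used by the GA run
CLI_SAMPLE_EVALS = 1_772    # LEAN CLI evaluations observed ...
CLI_SAMPLE_SECONDS = 825    # ... in this many seconds
GA_THROUGHPUT = 3.82        # evals/sec measured for lean-optimizer

cli_throughput = CLI_SAMPLE_EVALS / CLI_SAMPLE_SECONDS    # evals/sec
cli_hours = GRID_POINTS / cli_throughput / 3600           # extrapolated wall time
ga_seconds = GA_EVALS / GA_THROUGHPUT                     # GA wall time
throughput_ratio = GA_THROUGHPUT / cli_throughput         # harness advantage
eval_ratio = GRID_POINTS / GA_EVALS                       # search advantage

print(f"CLI throughput:    {cli_throughput:.2f} evals/sec")
print(f"CLI wall time:     {cli_hours:.1f} hours (extrapolated)")
print(f"GA wall time:      {ga_seconds:.0f} seconds")
print(f"Throughput ratio:  {throughput_ratio:.2f}x")
print(f"Evaluation ratio:  {eval_ratio:.0f}x")
```

<p>
The results agree with the table after rounding: 2.15 evals/sec, roughly 88 hours, about 160
seconds (2 min 40 sec), about 1.8x throughput, and 1,112x fewer evaluations.
</p>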
<h2>Architecture</h2>
<p>The pipeline searches the parameter space via a configurable sequence of stages:</p>
<ol>
<li><strong>Latin Hypercube Sampling (LHS)</strong> — space-filling initial exploration</li>
<li><strong>Bayesian Optimization (Optuna TPE)</strong> — directed search with batch ask/tell</li>
<li><strong>Incremental Genetic Algorithm</strong> — population-based refinement with worst
replacement and early stopping</li>
<li><strong>Local Grid Search</strong> — Chebyshev-distance neighborhood refinement around top
candidates</li>
</ol>
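<p>
As an illustration of the last stage, a Chebyshev-distance neighborhood is the set of grid points
whose largest per-parameter offset from a candidate is at most some radius. The sketch below is a
minimal version that assumes each parameter is addressed by an integer grid index; the function
name <code>chebyshev_neighbors</code> and its arguments are hypothetical, not the project's API.
</p>

```python
from itertools import product


def chebyshev_neighbors(center, radius, sizes):
    """Yield grid points within Chebyshev distance `radius` of `center`.

    `center` holds one grid index per parameter and `sizes` the number of
    grid steps on each axis. Both conventions are assumptions made for
    this sketch.
    """
    offsets = range(-radius, radius + 1)
    for delta in product(offsets, repeat=len(center)):
        if not any(delta):
            continue  # skip the center point itself
        point = tuple(c + d for c, d in zip(center, delta))
        # Keep only points that stay inside the grid on every axis.
        if all(p in range(n) for p, n in zip(point, sizes)):
            yield point


# A candidate in the middle of a 3x3 grid has 8 neighbors at radius 1;
# a candidate in the corner has only 3.
middle = list(chebyshev_neighbors((1, 1), radius=1, sizes=(3, 3)))
corner = list(chebyshev_neighbors((0, 0), radius=1, sizes=(3, 3)))
print(len(middle), len(corner))
```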
<p>
Stages can be enabled or disabled independently. The GA can run standalone or seeded from
prior stages. All stages share a common <code>BatchRunner</code> interface and deduplication cache.
</p>
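<p>
One way to picture the shared runner and deduplication cache is a wrapper around a batch-evaluation
callable. This is a sketch under assumptions: the <code>BatchRunner</code> signature shown here
(a list of parameter dictionaries in, one fitness value per entry out) and the helper
<code>with_dedup_cache</code> are hypothetical, chosen only to show the idea.
</p>

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
# Assumed signature: a batch of parameter sets in, one fitness value each out.
BatchRunner = Callable[[List[Params]], List[float]]


def with_dedup_cache(run_batch: BatchRunner) -> BatchRunner:
    """Wrap a runner so each distinct parameter set is evaluated only once."""
    cache: Dict[Tuple, float] = {}

    def cached(batch: List[Params]) -> List[float]:
        keys = [tuple(sorted(p.items())) for p in batch]
        # Collect each unseen parameter set once, preserving order.
        pending: Dict[Tuple, Params] = {}
        for key, params in zip(keys, batch):
            if key not in cache and key not in pending:
                pending[key] = params
        if pending:
            scores = run_batch(list(pending.values()))
            cache.update(zip(pending.keys(), scores))
        return [cache[key] for key in keys]

    return cached


# Demo with a stand-in runner that records what it was asked to evaluate.
evaluated: List[Params] = []

def fake_runner(batch: List[Params]) -> List[float]:
    evaluated.extend(batch)
    return [p["fast"] * 2.0 for p in batch]

runner = with_dedup_cache(fake_runner)
first_scores = runner([{"fast": 1}, {"fast": 2}, {"fast": 1}])
second_scores = runner([{"fast": 2}, {"fast": 3}])
print(first_scores, second_scores, len(evaluated))
```

<p>
Because every stage would call the same wrapped runner, a point already scored by one stage costs
nothing when a later stage proposes it again.
</p>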
<h2>Persistent Harness</h2>
<p>
The core innovation is the persistent .NET harness. Each Docker container runs a long-lived
<code>dotnet /Harness/LeanHarness.dll</code> process that reads JSON requests from stdin, runs
backtests with full LEAN state reset between runs, and writes JSON responses to stdout. This
eliminates per-eval .NET startup, assembly loading, and LEAN engine initialization overhead.
</p>
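<p>
From the Python side, talking to such a process might look like the sketch below. It assumes one
JSON object per line in each direction, and the field names <code>id</code>, <code>params</code>,
and <code>score</code> are invented for the example; the real wire format may differ. A small
Python stand-in plays the role of the harness so the example runs without Docker or .NET.
</p>

```python
import json
import subprocess
import sys


class HarnessClient:
    """Talk to one long-lived harness process over stdin/stdout.

    Assumes newline-delimited JSON; this is a guess at the framing,
    not the documented protocol of LeanHarness.dll.
    """

    def __init__(self, command):
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered text pipes
        )

    def evaluate(self, request):
        # One request line out, one response line back.
        self.proc.stdin.write(json.dumps(request) + "\n")
        self.proc.stdin.flush()
        line = self.proc.stdout.readline()
        if not line:
            raise RuntimeError("harness exited before replying")
        return json.loads(line)

    def close(self):
        self.proc.stdin.close()      # EOF tells the harness to exit
        self.proc.wait(timeout=10)
        self.proc.stdout.close()


# Stand-in harness: answers every request with a made-up score.
FAKE_HARNESS = (
    "import json, sys\n"
    "for line in sys.stdin:\n"
    "    req = json.loads(line)\n"
    "    score = sum(req['params'].values())\n"
    "    print(json.dumps({'id': req['id'], 'score': score}), flush=True)\n"
)

client = HarnessClient([sys.executable, "-c", FAKE_HARNESS])
first = client.evaluate({"id": 1, "params": {"fast": 10, "slow": 30}})
second = client.evaluate({"id": 2, "params": {"fast": 12, "slow": 26}})
client.close()
print(first, second)
```

<p>
Both requests are served by the same process, which is the point of the design: startup cost is
paid once per worker rather than once per evaluation.
</p>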
<p>
The harness uses LEAN's regression test reset sequence:
<code>Config.Reset()</code>, <code>Composer.Instance.Reset()</code>,
<code>SymbolCache.Clear()</code> — the same sequence LEAN's own test suite uses to run
multiple backtests in one process. <code>Console.Out</code> is redirected to stderr so LEAN's
trace/debug output doesn't corrupt the JSON protocol.
</p>
<h2>Key Design Decisions</h2>
<ul>
<li><strong>Compile-once</strong> — Strategy source files are content-hashed. If the hash matches
a cached build, compilation is skipped. Never compile per-evaluation.</li>
<li><strong>Warm workers</strong> — <code>WorkerPool</code> starts N long-lived Docker containers
at init. Workers are acquired/released via a blocking queue — never create/destroy containers
per backtest.</li>
<li><strong>Incremental GA</strong> — Uses worst-replacement rather than generational
replacement: a child enters the population only if it beats the current worst member, so the
best fitness is monotonically non-decreasing.</li>
<li><strong>Checkpoint/resume</strong> — Atomic JSON checkpoints after each stage. Resume with
<code>--resume</code> to skip completed stages and reuse their results.</li>
<li><strong>Injected stages</strong> — All stages implement <code>OptimizationStage</code> and
receive a <code>BatchRunner</code> callable. Stages never touch Docker, workers, or
compilation directly.</li>
</ul>
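<p>
The worst-replacement rule from the list above can be sketched as a small steady-state GA. The
selection, crossover, and mutation operators below are generic textbook choices picked for
brevity, and the function name and parameters are hypothetical; only the replacement rule and the
early stop mirror what is described here.
</p>

```python
import random


def incremental_ga(fitness, bounds, pop_size=20, max_evals=300, patience=60, seed=0):
    """Steady-state GA with worst-replacement and early stopping.

    `bounds` is a list of (low, high) integer ranges, one per parameter.
    Returns ((best_score, best_individual), best_score_history).
    """
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        ind = tuple(rng.randint(lo, hi) for lo, hi in bounds)
        population.append((fitness(ind), ind))
    evals, stale = pop_size, 0
    history = [max(population)[0]]

    while max_evals > evals and patience > stale:
        # Tournament selection: best of three random members, twice.
        mom = max(rng.sample(population, 3))[1]
        dad = max(rng.sample(population, 3))[1]
        # Uniform crossover, then reset one gene about 30% of the time.
        child = [rng.choice(pair) for pair in zip(mom, dad)]
        if rng.random() > 0.7:
            i = rng.randrange(len(child))
            child[i] = rng.randint(*bounds[i])
        child = tuple(child)
        score = fitness(child)
        evals += 1

        # Worst-replacement: the child enters only if it beats the worst member.
        worst = min(range(pop_size), key=lambda i: population[i][0])
        if score > population[worst][0]:
            population[worst] = (score, child)

        best = max(population)[0]
        stale = 0 if best > history[-1] else stale + 1  # early-stop counter
        history.append(best)

    return max(population), history


# Toy objective with its optimum at (7, 7, 7).
def toy_fitness(ind):
    return -sum((x - 7) ** 2 for x in ind)

(best_score, best_ind), history = incremental_ga(toy_fitness, [(0, 20)] * 3)
print(best_score, best_ind, len(history))
```

<p>
Since a member is only ever displaced by a strictly better child, and the member displaced is
always the worst one, the recorded best score can never go down between iterations.
</p>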
<h2>Background</h2>
<p>
The genetic algorithm at the core of lean-optimizer is the same class of algorithm I developed at
Togai InfraLogic in the early 1990s, where my research in fuzzy logic and genetic algorithms was
published at IEEE and NASA conferences. Three decades later, the same optimization principles run
in production against live market data.
</p>
<h2>Tech Stack</h2>
<ul>
<li>Python 3.12, Optuna, NumPy, SciPy</li>
<li>C# / .NET 9.0 (LEAN engine, persistent harness)</li>
<li>Docker (warm container pool, read-only data mounts)</li>
<li>NVIDIA DGX Spark (20-core, 128 GB)</li>
<li>153 passing tests (pytest)</li>
</ul>
</main>
<aside class="sidebar" id="contact">
<h2>Contact</h2>
<p>Feel free to reach out:</p>
<ul style="list-style: none; padding-left: 0;">
<li><strong>Email:</strong> <span id="email"></span></li>
<li><strong>LinkedIn:</strong> <a href="https://www.linkedin.com/in/toddespy/"
target="_blank">linkedin.com/in/toddespy</a></li>
</ul>
<h2 style="margin-top: 2rem;">Certifications</h2>
<div class="cert-badges">
<a href="https://www.credly.com/badges/ef070385-9cd4-4596-8289-5796bf4c61e4/public_url" target="_blank">
<img src="images/aws-certified-ai-practitioner-early-adopter600x600.png"
alt="AWS Certified AI Practitioner" width="180" />
</a>
<a href="https://www.credly.com/badges/0aa4f899-e591-4644-96ed-97cb862f94ab/public_url" target="_blank">
<img src="images/aws-certified-machine-learning-engineer-associate-e.png"
alt="AWS Certified ML Engineer" width="180" />
</a>
<a href="https://www.credly.com/badges/6e18930b-2286-423c-a406-49bb93acd98d/public_url" target="_blank">
<img src="images/aws-certified-cloud-practitioner.png" alt="AWS Certified Cloud Practitioner"
width="160" />
</a>
</div>
</aside>
</div>
<script>
(function () {
var u = "todd.espy", d = "gmail.com", e = u + "@" + d;
var s = document.getElementById("email");
if (s) s.innerHTML = '<a href="mailto:' + e + '">' + e + '</a>';
})();
</script>
</body>
</html>