Commit d620cc2

Merge pull request #13 from Boyeep/chore/add-community-files (Chore/add community files)
2 parents eab367b + d1fdce5

3 files changed: 390 additions & 0 deletions

README.md (1 addition & 0 deletions)

@@ -155,6 +155,7 @@ An SBOM workflow also publishes SPDX artifacts for the repository source plus th
 5. Split training and experimentation into a separate workspace later.

 The short public roadmap lives in [soon.md](./soon.md).
+A sign-language adaptation roadmap for this template lives in [roadmap.md](./roadmap.md).

 ## Repository Standards

roadmap.md (371 additions & 0 deletions)

@@ -0,0 +1,371 @@
# Sign-Language Roadmap For This Template

This roadmap answers a specific question:

What is the best way to turn this `Next.js + FastAPI` computer-vision template into a sign-language project without fighting the repo shape?

## Short Answer

For this template, the optimal path is:

1. prototype in `Colab` or a local notebook
2. train a small model on landmarks, not raw images
3. export the model to `ONNX`
4. run inference in the FastAPI backend
5. reuse the existing webcam and upload flows in the frontend
6. keep the API contract stable while the model improves

That is the best fit for this repo when the goal is a usable MVP, especially for:

- a sign alphabet demo
- a small vocabulary of static signs
- a single-user webcam experience

It is not automatically the best path for:

- full sign-language translation
- multi-person scenes
- long video understanding
- mobile-first deployment

## Scope Assumption

This roadmap assumes the first release is:

- one signer
- webcam-first
- real-time or near-real-time
- a limited sign set
- product demo quality before research-grade accuracy

If the target is full language understanding from day one, this roadmap should still be used as the starting path, but you should expect an additional sequence-model and dataset phase later.

## Core Principles

- keep the repo detection-first and inference-first
- do training outside the runtime path
- keep the backend responsible for model loading and output shaping
- keep the frontend focused on capture, review, and feedback
- preserve the API contract as long as possible
- add complexity only when the current phase is clearly limiting you

## Why This Is The Optimal Path Here

This repo already gives you:

- webcam capture
- image upload
- a backend inference service
- a typed API contract
- a review-oriented frontend

The fastest way to make that useful for sign language is not to rebuild the whole stack. It is to swap the starter backend pipeline for a sign-focused pipeline and keep the rest of the product flow intact.

## Recommended Stack

- `MediaPipe Hand Landmarker` for the MVP
- `PyTorch` for training
- `ONNX` as the exported model format
- `ONNX Runtime` for backend serving
- `FastAPI` as the inference boundary
- the existing `Next.js` webcam and upload UI for the product layer

Why:

- landmarks are easier to learn from than full frames for a small sign set
- webcam latency is better with local inference than with a hosted API
- `ONNX Runtime` is a strong deployment path from training into production
- this fits the current repo without turning it into a research notebook dump

## What Not To Do First

- do not start with `YOLO` as the main recognizer for a single-person webcam demo
- do not start by changing the frontend to run the whole model client-side
- do not jump to full sentence-level sign translation before a static-sign baseline works
- do not mix training notebooks and runtime inference code into the same backend module
- do not add hosted model dependencies unless you are comfortable with the latency and cost

## Phase 0: Define The Product Slice

Goal:

- pick a first version of the problem that this template can actually ship

Recommended choice:

- the `ASL alphabet` or a `small sign set` of 10 to 30 classes

Deliverables:

- sign list
- class naming convention
- target frame size
- camera assumptions
- a simple success metric such as top-1 accuracy plus prediction latency

Exit criteria:

- the team agrees on whether this is `static signs` or `dynamic signs`
- the project has a clear demo target

## Phase 1: Prototype In Colab Or A Notebook

Goal:

- prove that the signs can be separated with a lightweight pipeline

Use:

- `Colab` if you want quick setup and easy sharing
- a local notebook if you want tighter control and local files

Tasks:

- collect or import a small labeled dataset
- run `MediaPipe Hand Landmarker`
- extract hand landmarks
- build a baseline classifier in `PyTorch`
- measure accuracy, confusion, and latency
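For a small static sign set, the baseline classifier can be very small. A minimal `PyTorch` sketch, assuming landmarks are already extracted and flattened to 63 values per frame (21 landmarks times x, y, z); the layer sizes and class count are placeholder choices, not part of the template:

```python
import torch
from torch import nn

NUM_LANDMARKS = 21             # MediaPipe hand landmarks per frame
FEATURES = NUM_LANDMARKS * 3   # x, y, z per landmark
NUM_CLASSES = 24               # placeholder: e.g. static ASL letters

class SignMLP(nn.Module):
    """Tiny baseline classifier over flattened hand landmarks."""

    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEATURES, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = SignMLP()
batch = torch.randn(8, FEATURES)  # stand-in for a batch of landmark vectors
logits = model(batch)
print(logits.shape)  # torch.Size([8, 24])
```

A model this size trains in seconds on CPU, which keeps the notebook iteration loop fast and makes the confusion measurements cheap to rerun.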
Deliverables:

- one notebook that can reproduce baseline results
- a sample confusion matrix
- saved training artifacts

Exit criteria:

- the model is clearly better than guessing
- you know which labels are confused
- you can export the trained model or reproduce the training run

## Phase 2: Separate Training From Runtime

Goal:

- stop treating the notebook as the product

Recommended repo shape:

- `notebooks/` for experiments
- `training/` later, if training becomes a real workspace
- the backend stays focused on inference only

Tasks:

- document dataset assumptions
- save model version metadata
- define reproducible preprocessing steps
- export the best baseline to `ONNX`
Deliverables:

- an `ONNX` model artifact
- preprocessing notes
- a label map

Exit criteria:

- the model can be loaded outside the notebook
- preprocessing is stable and documented

## Phase 3: Add A Sign Pipeline To The Backend

Goal:

- make the trained model available through the template's inference service

Best fit in this repo:

- add a new pipeline in `backend/app/vision/service.py`
- keep model-specific loading behind the vision service boundary
- reuse `backend/app/api/routes/inference.py`

Recommended first pipeline:

- `sign-static`

Tasks:

- load the `ONNX` model in the backend
- run landmark extraction
- run classification
- return typed results
- add tests for the pipeline behavior
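A sketch of how the `sign-static` pipeline could sit behind the vision service boundary. The class name, result fields, and the `landmarks`/`logits` tensor names are assumptions (they must match whatever the export used); only `onnxruntime.InferenceSession` and its `run` call are real API:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class SignStaticPipeline:
    """Loads the exported ONNX classifier once and maps logits to labels."""

    def __init__(self, model_path: str, labels: list[str]):
        import onnxruntime as ort  # lazy import keeps service startup cheap
        self.session = ort.InferenceSession(
            model_path, providers=["CPUExecutionProvider"]
        )
        self.labels = labels

    def predict(self, landmark_vector: np.ndarray) -> dict:
        # landmark_vector: shape (63,) — normalized, flattened landmarks
        logits = self.session.run(
            ["logits"], {"landmarks": landmark_vector[None].astype(np.float32)}
        )[0][0]
        probs = softmax(logits)
        top = int(probs.argmax())
        return {"label": self.labels[top], "confidence": float(probs[top])}
```

Keeping the session on the pipeline instance means the model loads once per process, not per request, which matters for webcam-rate traffic.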
Contract guidance:

- preserve the existing response shape where possible
- use detections for hand boxes if available
- use metrics for latency or handedness
- if classification needs first-class output, add a clean typed field in `docs/openapi.yaml` instead of model-specific ad hoc fields
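As an illustration only, a first-class classification schema in `docs/openapi.yaml` might look like the fragment below; the schema and field names are hypothetical and should follow the contract's existing naming conventions:

```yaml
# Hypothetical schema addition for docs/openapi.yaml
SignClassification:
  type: object
  required: [label, confidence]
  properties:
    label:
      type: string
      description: Predicted sign class from the label map
    confidence:
      type: number
      format: float
      minimum: 0
      maximum: 1
    alternatives:
      type: array
      description: Next-best predictions for low-confidence UX states
      items:
        type: object
        properties:
          label: {type: string}
          confidence: {type: number}
```

A typed field like this is what lets the generated frontend types in `frontend/src/generated/openapi.ts` stay in sync without hidden assumptions.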
Deliverables:

- a working backend sign pipeline
- tests for known fixtures
- an updated API contract if needed

Exit criteria:

- the frontend can call the pipeline through the existing endpoint
- the output is typed and documented

## Phase 4: Reuse The Existing Frontend

Goal:

- get value from the template instead of rewriting the UI

Use:

- `frontend/src/components/webcam-console.tsx`
- `frontend/src/components/inference-console.tsx`

Tasks:

- add the new pipeline to the pipeline list
- show the predicted sign prominently
- show confidence and relevant metrics
- optionally render hand boxes or landmarks
- keep the review surface simple

Recommended UX for the first version:

- live prediction
- confidence score
- top alternative prediction
- a capture-frame button
- a clear visual state when confidence is low

Exit criteria:

- a user can open the webcam page and get understandable predictions
- the result panel feels product-shaped, not notebook-shaped

## Phase 5: Add Evaluation And Regression Checks

Goal:

- make the sign pipeline safe to change

Tasks:

- add fixture images or short frame sets
- add snapshot-backed API responses when practical
- measure latency in the backend
- track per-class accuracy outside the runtime path
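Per-class accuracy is simple enough to keep as a small, dependency-free helper next to the evaluation scripts. A sketch, where the helper name is an assumption:

```python
from collections import defaultdict

def per_class_accuracy(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Per-class accuracy (recall): fraction of each true class predicted correctly."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

report = per_class_accuracy(
    ["A", "A", "B", "B", "B"],
    ["A", "B", "B", "B", "A"],
)
print(report)  # {'A': 0.5, 'B': 0.6666666666666666}
```

Running this on a fixed fixture set before and after a model swap is the cheapest way to catch the per-class regressions that a single aggregate accuracy number hides.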
Deliverables:

- backend tests
- a sample evaluation report
- performance notes

Exit criteria:

- you can change the model without guessing whether the app regressed

## Phase 6: Move From Static Signs To Dynamic Signs

Goal:

- support signs that depend on motion over time

When to do this:

- only after the static-sign path is stable

Recommended stack:

- `MediaPipe Holistic` or `hands + pose`
- a sequence model such as an `LSTM`, `GRU`, or small `Transformer`

Tasks:

- collect short sign sequences
- train a temporal model
- decide whether the backend needs a frame window or short-clip input
- extend the API carefully if the current single-frame shape is no longer enough
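When motion matters, the same landmark features can feed a small recurrent model over a frame window. A minimal `GRU` sketch in `PyTorch`, with placeholder sizes (30-frame windows, 63 features, 24 classes) that are assumptions, not part of the template:

```python
import torch
from torch import nn

class SignSequenceGRU(nn.Module):
    """Classifies a short window of landmark frames with a GRU.
    Input shape: (batch, frames, features)."""

    def __init__(self, features: int = 63, hidden: int = 128, num_classes: int = 24):
        super().__init__()
        self.gru = nn.GRU(features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, last_hidden = self.gru(x)       # last_hidden: (1, batch, hidden)
        return self.head(last_hidden[-1])  # logits: (batch, num_classes)

model = SignSequenceGRU()
window = torch.randn(4, 30, 63)  # 4 clips of 30 landmark frames each
logits = model(window)
print(logits.shape)  # torch.Size([4, 24])
```

Because the per-frame features are unchanged, the static and sequence pipelines can share the same landmark-extraction and normalization code.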
Deliverables:

- a `sign-sequence` pipeline
- temporal confidence output
- an updated contract if frame windows are introduced

Exit criteria:

- the dynamic model beats the static baseline on motion-dependent signs

## Phase 7: Production Hardening

Goal:

- make the project reliable enough for real demos or deployment

Tasks:

- add model versioning
- improve error handling for camera and input failures
- benchmark CPU and memory usage
- consider GPU or TensorRT only if latency actually requires it
- add observability for inference timing and failure rates
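Inference timing can start as a tiny stdlib helper before reaching for a metrics stack. A sketch, where the context-manager name and the in-memory list are illustrative stand-ins for real observability plumbing:

```python
import statistics
import time
from contextlib import contextmanager

latencies_ms: list[float] = []  # in production this would feed metrics, not a list

@contextmanager
def timed_inference():
    """Record wall-clock inference latency in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies_ms.append((time.perf_counter() - start) * 1000)

# Stand-in for a real model call
with timed_inference():
    time.sleep(0.01)

p50 = statistics.median(latencies_ms)
print(f"p50 latency: {p50:.1f} ms")
```

Tracking a median and a tail percentile, rather than an average, is what tells you whether CPU inference is genuinely good enough before investing in GPU or TensorRT.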
Deliverables:

- versioned model loading
- release notes for model changes
- a deployment checklist

Exit criteria:

- the app is repeatable, testable, and stable across environments

## Suggested Milestone Order

1. static-sign scope
2. notebook baseline
3. `ONNX` export
4. backend `sign-static` pipeline
5. webcam UI integration
6. tests and evaluation
7. dynamic-sign extension
8. production hardening

## Decision Rules

- if one webcam user is the target, prefer landmarks before object detection
- if you need full-body or facial context, move from hands-only to holistic features
- if the notebook cannot reproduce results, do not integrate the model yet
- if the frontend needs model-specific fields, add them through OpenAPI, not hidden assumptions
- if latency is good enough on CPU, do not optimize infrastructure early

## Where To Put Things

- experiments: `notebooks/`
- future repeatable training workspace: `training/`
- inference integration: `backend/app/vision/`
- contract updates: `docs/openapi.yaml`
- generated frontend types: `frontend/src/generated/openapi.ts`
- user-facing capture and review UI: `frontend/src/components/`

## Recommended First Release

The best first release for a sign-language adaptation of this template is:

- static signs only
- webcam-first
- one signer
- local inference
- a typed backend contract
- a visible confidence score
- a clear fallback when confidence is low

That is realistic, demonstrable, and aligned with the template's strengths.

## Related Docs

- `docs/sign-language-template.md`
- `docs/tooling.md`
- `soon.md`
