A web-based experiment exploring spatial computing interactions—without the headset.
This project simulates the magical interaction model of VisionOS (Eye Tracking + Hand Gestures + Voice) directly in the browser using standard webcams. It combines real-time computer vision with intelligent voice command processing to create a futuristic text editing experience.
Try it here: https://ky.yth.tw/ (Special thanks to Yi-Tang Huang for hosting and support)
As spatial computing (AR/VR) becomes more prevalent, our interaction paradigms are shifting from "Point & Click" to "Look & Speak." This prototype demonstrates that these rich, multimodal interactions can be built today with standard web technologies, making them accessible to anyone with a laptop.
- Lift your hand to Select: Lift your right hand to move the cursor, then hover over the words you want to change. (Note: "Look to Select" will be introduced in v3.0, where your eyes act as the cursor.)
- Pinch and hold to voice-replace: A simple hand gesture confirms your intent, separating "selection" from "action" to prevent accidental clicks (the Midas Touch problem).
- Speak to Edit: Voice is not just for dictation—it's for command. Hold a pinch and speak to contextually replace words.
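The pinch gesture above can be detected by thresholding the distance between the thumb-tip and index-tip landmarks that MediaPipe's hand model returns (indices 4 and 8 in its 21-point layout). A minimal sketch — the threshold value is an assumption to tune per camera, not the app's actual setting:

```javascript
// Landmarks are normalized [0..1] coordinates from MediaPipe's hand
// model: index 4 = thumb tip, index 8 = index-finger tip.
const PINCH_THRESHOLD = 0.05; // assumed value; tune per camera/resolution

function distance(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

function isPinching(landmarks) {
  return distance(landmarks[4], landmarks[8]) < PINCH_THRESHOLD;
}
```

In practice you would also debounce this boolean over a few frames so a single noisy detection doesn't register as a click.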
The system fuses three distinct input streams in real-time:
- Eye Tracking (Gaze): Uses `WebGazer.js` to track where you are looking on the screen.
- Hand Tracking (Gesture): Uses `MediaPipe` to detect pinch gestures for clicking and holding.
- Voice Command (Intent): Uses the Web Speech API for low-latency transcription.
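The fusion of these streams can be sketched as a small state machine: hovering selects, a held pinch locks in the target, and the speech transcript is applied only while the pinch is held. The function names below are illustrative, not the app's actual API:

```javascript
// Hypothetical fusion state: selection (hover) is kept separate from
// action (pinch + speech), which is what prevents the Midas Touch problem.
function createFusionState() {
  return { hoveredWord: null, pinchHeld: false, lockedTarget: null };
}

function onHover(state, word) {
  if (!state.pinchHeld) state.hoveredWord = word; // selection phase only
}

function onPinchStart(state) {
  state.pinchHeld = true;
  state.lockedTarget = state.hoveredWord; // lock target at pinch time
}

function onSpeechResult(state, transcript, applyEdit) {
  // Speech only acts while the pinch is held and a target is locked.
  if (state.pinchHeld && state.lockedTarget) {
    applyEdit(state.lockedTarget, transcript);
  }
}

function onPinchEnd(state) {
  state.pinchHeld = false;
  state.lockedTarget = null;
}
```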
Unlike standard dictation, this editor understands context. It analyzes the sentence structure to perform smart replacements.
- Context-Aware: If you select "Monday" and say "tomorrow", the system automatically removes the preposition "on" if it's no longer needed.
- Grammar Correction: It handles articles, prepositions, and temporal modifiers automatically so you can speak naturally.
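The "Monday" → "tomorrow" example above comes down to a contextual rewrite rule: when the replacement is a bare temporal adverb, a preceding preposition like "on" is dropped. A minimal sketch — the word lists and function name are illustrative, not the app's actual rules:

```javascript
// Bare temporal adverbs that never take "on" (illustrative list).
const BARE_TEMPORALS = new Set(['today', 'tomorrow', 'yesterday', 'tonight']);

function smartReplace(sentence, target, replacement) {
  const words = sentence.split(' ');
  const i = words.indexOf(target);
  if (i === -1) return sentence;
  if (
    BARE_TEMPORALS.has(replacement.toLowerCase()) &&
    i > 0 &&
    words[i - 1].toLowerCase() === 'on'
  ) {
    words.splice(i - 1, 2, replacement); // drop "on", insert replacement
  } else {
    words[i] = replacement; // plain substitution
  }
  return words.join(' ');
}
```

A real implementation would also handle punctuation and articles, but the core idea is inspecting the neighbors of the selected word rather than replacing it blindly.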
To simulate the feeling of a headset, the application renders 3D environments using Gaussian Splatting.
- Head-Coupled Parallax: The 3D scene adjusts based on your head position (tracked via webcam), creating a "window into a virtual world" effect on your flat 2D screen.
- Foveated Rendering: Simulates human vision by keeping the area you're looking at sharp while blurring the periphery, increasing immersion and focus.
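At its core, head-coupled parallax is a linear remap from the face's normalized position in the webcam frame to a virtual-camera offset. A sketch of that mapping, assuming face coordinates in [0, 1] with (0.5, 0.5) meaning a centered head; `PARALLAX_SCALE` is an assumed tuning constant, and the real app drives a Three.js camera from MediaPipe face tracking:

```javascript
const PARALLAX_SCALE = 0.5; // assumed: world units of camera travel per half-frame

function headToCameraOffset(faceX, faceY) {
  // Mirror X so the scene shifts opposite to head motion,
  // producing the "window into a virtual world" effect.
  return {
    x: -(faceX - 0.5) * 2 * PARALLAX_SCALE,
    y: (faceY - 0.5) * 2 * PARALLAX_SCALE,
  };
}
```

Each frame, the offset would be applied to the camera position (typically smoothed with a low-pass filter to hide tracking jitter).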
No VR headset required.
- Camera: A standard webcam is required for hand and face tracking.
- Optimization: The Mixed Reality mode and tracking parameters are calibrated for MacBook webcams, though other high-quality webcams will also work.
- Environment: Good lighting is essential for accurate computer vision tracking.
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/spatial-text-editor.git
  cd spatial-text-editor
  ```

- Install dependencies

  ```bash
  npm install
  ```

- Start the development server

  ```bash
  npm run dev
  ```

- Open in Browser: Navigate to `http://localhost:5173` (or the port shown in your terminal).
- Framework: React + Vite
- Computer Vision:
- MediaPipe Tasks Vision (Hand & Face Tracking)
- WebGazer.js (Eye Tracking)
- 3D Rendering:
- Three.js
- Gaussian Splats 3D (Photorealistic 3D Scenes)
- Animation: Motion (formerly Framer Motion)
- Styling: Tailwind CSS
For a detailed user guide on gestures and settings, please see InteractionGuide.md.