A modular reinforcement learning system for drone navigation — where emotions shape flight paths.
Envolate teaches a small drone lively, character-driven flight behavior using reinforcement learning.
Not just "fly from A to B" — but:
- A curious drone slowly circles around interesting objects
- A startled drone flinches away and keeps its distance
- A relaxed drone glides smoothly through space
- An excited drone flies in fast, sharp curves
Emotions are not a gimmick — they are the interface between perception and movement.
```
[Camera + Sensors]
        │
        ▼
┌───────────────────┐
│    HIGH LEVEL     │ ← Situational awareness
│                   │   Detects people, objects, obstacles
│   Output:         │   Generates emotional state + goal
│   • Emotion       │
│   • Target pos    │
└────────┬──────────┘
         │
         ▼
┌───────────────────┐
│    MID LEVEL      │ ← Trajectory Planner
│                   │   Translates emotion → concrete flight path
│  "curious"  →     │   Circle around object, low velocity
│  "startled" →     │   Evasion route, drop altitude
│  "excited"  →     │   Fast curves, high velocity
└────────┬──────────┘
         │ next_pos (xyz) + next_vel (xyz)
         ▼
┌───────────────────┐
│    LOW LEVEL      │ ← Motor Controller (RL Agent)
│                   │   Knows NO emotions
│   Sees only:      │   Blindly follows waypoints
│   • Target pos    │   Translates to motor RPMs
│   • Desired vel   │
│                   │
│   Output: 4x RPM  │
└───────────────────┘
```
Core principle: Emotions shape the route in the Mid Level. The Low Level is purely mechanical — it flies whatever it receives. New emotional behaviors require no Low Level retraining.
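One minimal way to realize this separation is a lookup from emotion to trajectory parameters that the Mid Level planner consumes, while the Low Level only ever sees waypoints and velocities. Everything below (class name, fields, values) is a hypothetical sketch, not the repo's actual interface:

```python
from dataclasses import dataclass

@dataclass
class TrajectoryParams:
    """Hypothetical parameter bundle the Mid Level hands to its planner."""
    max_speed: float        # m/s cap on commanded velocity
    turn_radius: float      # m; tighter radius -> sharper curves
    altitude_offset: float  # m; relative altitude change (e.g. startled drop)

# Illustrative emotion -> parameter table (values are guesses)
EMOTION_PARAMS = {
    "curious":  TrajectoryParams(max_speed=0.3, turn_radius=1.0, altitude_offset=0.0),
    "startled": TrajectoryParams(max_speed=2.5, turn_radius=0.5, altitude_offset=-0.5),
    "relaxed":  TrajectoryParams(max_speed=0.8, turn_radius=2.0, altitude_offset=0.0),
    "excited":  TrajectoryParams(max_speed=3.0, turn_radius=0.4, altitude_offset=0.0),
}

params = EMOTION_PARAMS["curious"]
```

Adding a new emotion then means adding one table entry and, at most, one new trajectory primitive; the Low Level policy stays frozen.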
Trained in Genesis, a high-performance physics simulator with a CUDA backend.
- Drone: Crazyflie cf2x (palm-sized)
- Parallelization: 64–256 environments simultaneously
- Hardware: GTX 1050 Ti (4GB VRAM)
The Low Level agent learns:
- Stable hovering
- Precise waypoint navigation
- Velocity tracking (foundation for flight style control)
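The actual reward lives in the training code; as a hedged sketch, hovering, waypoint reaching, and velocity tracking could be combined into one shaped reward (weights and term choices below are guesses, not the repo's values):

```python
import numpy as np

def low_level_reward(rel_pos, vel_err, rpm_delta,
                     w_pos=1.0, w_vel=0.5, w_smooth=0.01):
    """Illustrative shaped reward for the Low Level agent:
    - pull toward the waypoint (position error norm)
    - track the desired velocity (velocity error norm)
    - penalize jerky motor commands (change in RPM outputs)
    """
    return float(-w_pos * np.linalg.norm(rel_pos)
                 - w_vel * np.linalg.norm(vel_err)
                 - w_smooth * np.linalg.norm(rpm_delta))

# Perfect tracking with steady motors scores 0; everything else is negative.
r = low_level_reward(np.zeros(3), np.zeros(3), np.zeros(4))
```

The smoothness term matters for the later "jumpy" stage: without it, agents tend to learn bang-bang motor commands that track well in sim but transfer poorly.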
Observation Space (20D):
| Variable | Dim |
|---|---|
| Relative position to waypoint | 3 |
| Desired velocity | 3 |
| Orientation (quaternion) | 4 |
| Linear velocity | 3 |
| Angular velocity | 3 |
| Last motor values | 4 |
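The six groups in the table sum to 20 dimensions. A minimal sketch of stacking them into one observation vector (the ordering here is an assumption; only the dimensions come from the table):

```python
import numpy as np

def build_observation(rel_pos, desired_vel, quat, lin_vel, ang_vel, last_rpm):
    """Concatenate the six observation groups into the 20D vector."""
    obs = np.concatenate(
        [rel_pos, desired_vel, quat, lin_vel, ang_vel, last_rpm]
    ).astype(np.float32)
    assert obs.shape == (20,), "observation groups must sum to 20 dims"
    return obs

# Example: drone at rest exactly on the waypoint, level orientation
obs = build_observation(
    rel_pos=np.zeros(3),
    desired_vel=np.zeros(3),
    quat=np.array([1.0, 0.0, 0.0, 0.0]),  # identity quaternion (w, x, y, z)
    lin_vel=np.zeros(3),
    ang_vel=np.zeros(3),
    last_rpm=np.zeros(4),
)
```

Keeping the waypoint relative (rather than absolute world position) makes the policy translation-invariant, which is what lets the Mid Level move waypoints around freely.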
Curriculum Training (6 stages):
| Stage | Style | Velocity |
|---|---|---|
| 1a | Hover | 0 m/s |
| 1b | Creeping | < 0.3 m/s |
| 1c | Gliding | 0.5–1.0 m/s |
| 1d | Precise / Fast | 2.0–3.0 m/s |
| 1e | Jumpy | Sudden changes |
| 1f | Mixed | Everything combined |
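The velocity bands above can drive episode generation directly: each stage samples commanded speeds from its range. A minimal sketch (stages 1e/1f omitted, since "sudden changes" and "mixed" need resampling logic the table doesn't specify):

```python
import random

# Velocity ranges copied from the curriculum table
STAGE_VEL = {
    "1a": (0.0, 0.0),  # hover
    "1b": (0.0, 0.3),  # creeping
    "1c": (0.5, 1.0),  # gliding
    "1d": (2.0, 3.0),  # precise / fast
}

def sample_target_speed(stage, rng=random):
    """Draw a commanded speed (m/s) for one episode of the given stage."""
    lo, hi = STAGE_VEL[stage]
    return rng.uniform(lo, hi)
```

A "1e" stage could then be implemented by resampling the target speed mid-episode, and "1f" by sampling the stage itself per episode.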
Trajectory planner: Emotion + target position → waypoints + velocities
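As one concrete example of such a primitive, the "curious" behavior (slowly circling a point of interest) reduces to waypoints on a circle with tangential velocities. This is a hypothetical sketch of one planner primitive, not existing repo code:

```python
import math

def circle_waypoints(center, radius=1.0, n=12, speed=0.3, altitude=1.0):
    """Hypothetical 'curious' primitive: n waypoints on a circle around
    a point of interest, each paired with a low tangential velocity."""
    cx, cy = center
    for k in range(n):
        theta = 2 * math.pi * k / n
        pos = (cx + radius * math.cos(theta),
               cy + radius * math.sin(theta),
               altitude)
        # Velocity along the circle's tangent, scaled to the commanded speed
        vel = (-speed * math.sin(theta), speed * math.cos(theta), 0.0)
        yield pos, vel

waypoints = list(circle_waypoints(center=(0.0, 0.0)))
```

Each `(pos, vel)` pair maps directly onto the `next_pos` + `next_vel` interface the Low Level already consumes.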
Camera-based perception, object detection, emotion inference from scene
```bash
# Install Genesis (CUDA required)
pip install genesis-world

# Dependencies
pip install stable-baselines3 gymnasium torch tensorboard
```

```bash
# Train a single stage
cd src
python train_low_level.py --stage 1a --num_envs 64

# Run full curriculum
python train_low_level.py --all_stages

# Monitor training
tensorboard --logdir logs/src/
```
```
low_level_env.py     # Genesis RL environment (20D obs, 6D commands)
train_low_level.py   # PPO training via stable-baselines3, curriculum
```
Most drone AI projects optimize for efficiency: fastest path, minimum energy.
Envolate optimizes for character: a drone that feels alive.
This is not an academic project; it is an experiment built around one question: can a robot have a personality?
This is a solo side project and I'm learning as I go. If you have ideas, suggestions, or improvements — whether about the RL setup, the emotion architecture, Genesis-specific tricks, or anything else — please open an issue or PR. All feedback is genuinely appreciated.
Some open questions I'd love input on:
- Best way to represent emotional states as trajectory parameters in the Mid Level?
- How to structure emotion → trajectory mapping (parameterized? separate networks per emotion?)
- Efficient reward shaping strategies for curriculum RL in Genesis
- Sim-to-real transfer strategies for the Crazyflie
Built with Genesis World · Reinforcement Learning · Python