
Leo Camus — @Dev-next-gen

AI Infrastructure · Multi-GPU ROCm · Independent Research · Offensive Security

Full Stack Development

Paris, France · Self-taught · No degree · Full stack from silicon to inference.


I build systems that run at the edge of what's technically possible — locally, at scale, without compromise. No cloud dependency, no abstraction layers hiding the truth. Every component understood, every parameter owned.

From founding a SaaS startup at 26, to operating a 300+ GPU farm on-site in Ukraine, to deploying 80B LLMs on self-hosted ROCm infrastructure and publishing independent AI research — every step was built from scratch, under real constraints.


Projects

flux-amd-rocm — FLUX.1-dev at parity with NVIDIA on AMD RDNA3

4-GPU Megatron-style tensor parallelism · 51 s/image @ 1024² · 11 GB/GPU. Int8 quantization + async group offloading on a single RX 7800 XT · 80 s/image · 12.5 GB VRAM.
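The single-GPU numbers above come from combining torchao int8 weight-only quantization with diffusers group offloading. A minimal sketch of that combination, assuming upstream diffusers (0.33+) with torchao installed; the repo's backported patches and exact settings may differ:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
from diffusers.hooks import apply_group_offloading

model_id = "black-forest-labs/FLUX.1-dev"

# int8 weight-only quantization of the transformer, applied at load time via torchao.
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=TorchAoConfig("int8wo"),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.bfloat16
)

# Group offloading: only the active block group lives on the GPU; the rest is
# streamed from host RAM. use_stream=True issues the copies on a side stream so
# they overlap with compute (the "async" part).
for module in (pipe.transformer, pipe.text_encoder_2):
    apply_group_offloading(
        module,
        onload_device=torch.device("cuda"),   # ROCm GPUs are exposed as "cuda" in PyTorch
        offload_device=torch.device("cpu"),
        offload_type="block_level",
        num_blocks_per_group=2,
        use_stream=True,
    )
pipe.text_encoder.to("cuda")  # CLIP encoder and VAE are small enough to stay resident
pipe.vae.to("cuda")

image = pipe("a lighthouse at dusk, photorealistic", height=1024, width=1024).images[0]
image.save("flux_int8_offload.png")
```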

diffusers-rocm-parallel — Multi-GPU inference stack for AMD

Tensor parallel FLUX on 5× RX 7800 XT (gfx1101) · ring attention LSE shape fix · Ulysses context parallel.
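For readers unfamiliar with the pattern, Megatron-style tensor parallelism splits each weight matrix across ranks so a transformer MLP costs a single collective. A stripped-down illustration in plain torch.distributed (not the repo's actual FLUX sharding code, which also covers attention, the ring/Ulysses variants, and the LSE shape fix):

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class ColumnParallelLinear(nn.Module):
    """Weight split along the output dim: each rank computes its own slice of the activations."""
    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0
        self.local = nn.Linear(in_features, out_features // world_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.local(x)                             # [..., out_features / world_size]

class RowParallelLinear(nn.Module):
    """Weight split along the input dim: partial outputs are summed with one all-reduce."""
    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert in_features % world_size == 0
        self.local = nn.Linear(in_features // world_size, out_features, bias=False)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        partial = self.local(x_shard)
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)   # one collective per MLP block
        return partial

# Column-parallel up-projection feeds a row-parallel down-projection, so the only
# cross-GPU traffic in the whole MLP is the final all-reduce.
def parallel_mlp(hidden: int, world_size: int) -> nn.Sequential:
    return nn.Sequential(
        ColumnParallelLinear(hidden, 4 * hidden, world_size),
        nn.GELU(),
        RowParallelLinear(4 * hidden, hidden, world_size),
    )
```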

openclaw — Autonomous bug bounty pipeline

Multi-agent orchestration · Qwen3 80B + 14B · fully local · recon → scan → CVSS → report.
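A rough sketch of the recon → scan → report flow, shelling out to the same ProjectDiscovery tools listed in the stack below and talking to a locally served model over an OpenAI-compatible endpoint. Tool flags, the endpoint URL, and the model name are illustrative; openclaw's real multi-agent orchestration is considerably more involved:

```python
import json
import subprocess
from openai import OpenAI  # llama.cpp server and vLLM both expose an OpenAI-compatible API

def recon(domain: str) -> list[str]:
    # Passive subdomain enumeration, then probe which hosts answer over HTTP.
    subs = subprocess.run(["subfinder", "-d", domain, "-silent"],
                          capture_output=True, text=True, check=True).stdout
    live = subprocess.run(["httpx", "-silent"], input=subs,
                          capture_output=True, text=True, check=True).stdout
    return live.splitlines()

def scan(targets: list[str]) -> list[dict]:
    # Template-based vulnerability scan; JSONL output keeps findings machine-readable.
    out = subprocess.run(["nuclei", "-silent", "-jsonl"], input="\n".join(targets),
                         capture_output=True, text=True).stdout
    return [json.loads(line) for line in out.splitlines() if line.strip()]

def report(findings: list[dict], client: OpenAI) -> str:
    # Hand raw findings to the local model for CVSS scoring and report drafting.
    prompt = ("Assign a CVSS 3.1 vector and severity to each finding, then draft "
              "a responsible-disclosure report:\n" + json.dumps(findings, indent=2))
    resp = client.chat.completions.create(
        model="qwen3-80b",  # illustrative name for the locally served model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    local_llm = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    print(report(scan(recon("example.com")), local_llm))
```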

CAMUS Theory — Independent AI Research

Graft-based temporal cognition in frozen LLMs. TemporalAdapter (<0.6% of params) grafted at mid-depth via a forward pre-hook. R² ≈ 0.9 for log-time decoding from a 1B-parameter model. ~5-dimensional subspace invariant across model sizes. Validated on TinyLlama-1.1B and Qwen2.5-14B in under 30 minutes for $0.83.
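The graft itself is a standard PyTorch forward pre-hook on a mid-depth decoder block. A minimal sketch under assumed defaults (adapter width, checkpoint name, and hook placement are illustrative; the paper and repo document the real configuration):

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

class TemporalAdapter(nn.Module):
    """Small bottleneck MLP added to the residual stream (well under 1% of base params)."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.tanh(self.down(h)))

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
for p in model.parameters():
    p.requires_grad_(False)            # the base model stays frozen

layers = model.model.layers            # Llama-style decoder stack
adapter = TemporalAdapter(model.config.hidden_size)

def graft(module, args, kwargs):
    # Forward pre-hook: rewrite the hidden states entering the mid-depth block.
    if args:
        return (adapter(args[0]), *args[1:]), kwargs
    kwargs["hidden_states"] = adapter(kwargs["hidden_states"])
    return args, kwargs

layers[len(layers) // 2].register_forward_pre_hook(graft, with_kwargs=True)
# Training then touches only the adapter plus a small linear probe that decodes
# log-elapsed-time from the downstream hidden states.
```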


Background

  • 2020–2022 — On-site GPU infrastructure engineer, 300+ GPU production facility, Kyiv, Ukraine. End-to-end hardware deployment, network architecture, 24/7 uptime under real production constraints.
  • 2019 — Founded and shipped a full SaaS repair management platform solo (350+ pages, logistics, billing, payments). Shut down by the Covid-19 pandemic.
  • 2022–now — Freelance AI infra, security research, independent publications.

Stack

Compute       5× AMD RX 7800 XT (gfx1101) · 80 GB VRAM · ROCm 7.1
              Custom builds: rocWMMA · FA_ALL_QUANTS · HIP_GRAPHS
Inference     PyTorch · diffusers · torchao · llama.cpp · vLLM · 38 t/s @ 80B ctx 262K
ML            Tensor parallelism · group offloading · int8/int4 · Triton kernels
Security      nuclei · subfinder · katana · httpx · Burp Suite Pro · responsible disclosure
Systems       Python · Rust · Node.js · Next.js · FastAPI · PostgreSQL · Supabase · Docker

Products

SaaS platforms, mobile apps, full-stack web. Recent deliveries:

  • Email marketing platform — self-hosted, SPF/DKIM/DMARC, warm-up automation, 10/10 deliverability on first test
  • Yoga studio app — React/Vite, Supabase auth, booking system, deployed in production
  • Hyperlocal marketplace — mobile app, real-time geolocation, neighbor-to-neighbor listings
  • OSINT platform (osint-platform) — open-source Palantir alternative, 6-tier data ingestion, entity graph, real-time analysis

Stack: Python · Node.js · Rust · Next.js · React · FastAPI · PostgreSQL · Supabase · Docker · Stripe · REST APIs

Infrastructure

CPU     2× Intel Xeon E5-2698 v4 — 80 threads
RAM     512 GB ECC
GPU     5× AMD RX 7800 XT (gfx1101) — 80 GB VRAM total
NVMe    Multi-drive storage array
OS      Ubuntu · ROCm 7.1
Net     10 GbE local · self-hosted services

Open to research collabs, freelance infra missions, or projects that shouldn't exist yet.

Pinned

  1. gpu-cluster-lab

     AMD/NVIDIA GPU cluster infrastructure — ~300 GPU deployment, ROCm, kernel tuning, multi-node benchmarking

  2. local-llm-stack

     Production-grade local LLM deployment stack — llama.cpp, Ollama, GGUF/GGML, ROCm AMD, 14B to 80B models

  3. osint-platform

     Open-source intelligence platform — Palantir Gotham alternative. 6-level source integration, ontology graph, real-time threat analysis.

  4. camus-theory

     The CAMUS Theory: Emergent Temporal Cognition in Language Models — DOI: 10.5281/zenodo.19509846

  5. diffusers-rocm-parallel

     Multi-GPU tensor/context parallel diffusion on AMD ROCm — with the patch that makes it actually work.

  6. flux-amd-rocm

     FLUX.1-dev on AMD Radeon consumer GPUs — fast, low-VRAM, and shippable. Backport patches + benchmarks for torchao + diffusers group_offload on ROCm.