diff --git a/.claude/rules/architecture/cpu.md b/.claude/rules/architecture/cpu.md deleted file mode 100644 index 79e837f..0000000 --- a/.claude/rules/architecture/cpu.md +++ /dev/null @@ -1,83 +0,0 @@ ---- -paths: hdl/cpu/** ---- - -# CPU Architecture - -**Last updated**: 2026-01-05 -**Sources**: [cpu.v](hdl/cpu/cpu.v), [cpu_core_params.vh](hdl/cpu/cpu_core_params.vh) - -RV32I soft core, 3-stage pipeline. No M/F/D extensions, no multiplication. - -## Pipeline Stages - -**Stage 1: Fetch/Decode/Execute** -- Fetch instruction (AXI), decode, execute ALU/comparator -- Outputs: `w_Alu_Result`, `w_Compare_Result`, `w_Instruction_Valid` - -**Stage 2: Memory/Wait** -- Issue AXI read/write for loads/stores -- Pipeline registers: `r_S2_*` (Valid, Alu_Result, Load_Data, Rd, Write_Enable) -- Stalls while memory operations complete - -**Stage 3: Writeback** -- Write ALU result, load data, immediate, or PC+4 to register file -- Writeback mux selects source based on `r_S3_Wb_Src` - -## Timing - -**Cycles per instruction**: Variable -- S1: 1 cycle (ALU/decode) -- S2: 0 cycles (no memory) or 2-4 cycles (load/store AXI transaction) -- S3: 1 cycle (writeback) - -Tests use `PIPELINE_CYCLES` from [tests/cpu/constants.py](tests/cpu/constants.py) as conservative wait. - -## Stall Logic - -```verilog -w_Stall_S1 = w_Debug_Stall - || !i_Init_Calib_Complete - || (r_S2_Valid && (w_S2_Is_Load || w_S2_Is_Store) - && !(w_Mem_Read_Done || w_Mem_Write_Done)); -``` - -CPU stalls when: -- `w_Debug_Stall`: Debug peripheral halted CPU -- `!i_Init_Calib_Complete`: DDR3 MIG not ready -- Memory op in progress: S2 has valid load/store waiting for AXI completion - -## Hazards - -**Status**: No hazard detection or forwarding implemented. - -**Workaround**: Tests insert NOPs or wait `PIPELINE_CYCLES` between dependent instructions. 
-
-## PC (Program Counter)
-
-**Normal**: `PC += 4` after instruction completes
-**Branch taken**: `PC = PC + immediate`
-**Jump**: `PC = target address`
-**Reset**: `PC = 0`
-
-Mux control: `w_Pc_Alu_Mux_Select` chooses between `PC+4` and `w_Alu_Result`
-
-## Register File
-
-32 registers × 32 bits (XLEN=32)
-- Read ports: Rs1, Rs2 (from instruction[19:15], [24:20])
-- Write port: Rd (r_S3_Rd), enabled by `w_Wb_Enable`
-- Sources: ALU, comparator, immediate, PC+4, load data
-- Register 0 always reads 0 (RISC-V spec)
-
-See [register_file.v](hdl/cpu/register_file/register_file.v)
-
-## Memory Interface
-
-Two separate AXI4-Lite masters:
-1. **Instruction memory**: Fetch-only (read)
-2. **Data memory**: Loads/stores
-
-No error handling - assumes all transactions succeed.
-
-See [memory.md](memory.md) for AXI protocol details.
diff --git a/.claude/rules/architecture/memory.md b/.claude/rules/architecture/memory.md
deleted file mode 100644
index 0acc8e2..0000000
--- a/.claude/rules/architecture/memory.md
+++ /dev/null
@@ -1,65 +0,0 @@
----
-paths:
-  - hdl/cpu/memory/**
-  - hdl/cpu/instruction_memory/**
----
-
-# Memory Architecture
-
-**Last updated**: 2026-01-05
-**Sources**: [memory_axi.v](hdl/cpu/memory/memory_axi.v), [memory.vh](hdl/cpu/memory/memory.vh)
-
-AXI4-Lite memory interface for CPU instruction/data access.
-
-## Memory Map
-
-| Region | Start | End | Size | Backing | Notes |
-|--------|-------|-----|------|---------|-------|
-| ROM | `0x0000` | `0x0FFF` | 4 KB | BRAM | Bootstrap, read-only |
-| RAM | `0x1000` | (varies) | 256 MB | DDR3 (MIG) | Main memory, stack, heap |
-| Peripherals | TBD | TBD | TBD | Memory-mapped | Debug UART (future) |
-
-**ROM boundary**: `ROM_BOUNDARY_ADDR = 0x1000` - see [memory.vh](hdl/cpu/memory/memory.vh) and [tests/cpu/constants.py](tests/cpu/constants.py)
-
-## AXI State Machine
-
-**States**: `IDLE` → `READ_SUBMITTING` → `READ_AWAITING` → `READ_SUCCESS`
-**Write**: `IDLE` → `WRITE_SUBMITTING` → `WRITE_AWAITING` → `WRITE_SUCCESS`
-
-**Latency**: 2-4 cycles (BRAM fast, DDR3 slower)
-
-## Load/Store Types
-
-**Supported**:
-- `LW/SW`: 32-bit word
-- `LH/LHU/SH`: 16-bit halfword (signed/unsigned)
-- `LB/LBU/SB`: 8-bit byte (signed/unsigned)
-
-**Byte alignment**: AXI write strobes (`wstrb`) enable byte-level writes without read-modify-write. Load data extraction uses `i_Addr[1:0]` offset with sign-extension for LB/LH.
-
-See [memory_axi.v](hdl/cpu/memory/memory_axi.v) for alignment logic.
-
-## Access Patterns
-
-**Instruction fetch**:
-- Address < 0x1000: Fast BRAM access
-- Address >= 0x1000: AXI transaction to DDR3
-- Interface: `s_instruction_memory_axil_*` (read-only)
-
-**Data load/store**:
-- Typically RAM (ROM is read-only)
-- Interface: `s_data_memory_axil_*` (read/write)
-
-## Constants
-
-Constants defined in `.vh` files:
-- [memory.vh](hdl/cpu/memory/memory.vh): `LS_TYPE_*`, state machine states, ROM boundary
-- [cpu_core_params.vh](hdl/cpu/cpu_core_params.vh): Register widths, control signal widths
-
-Python mirror: [tests/cpu/constants.py](tests/cpu/constants.py) - must stay in sync with `.vh` files.
-
-## Current Status
-
-- DDR3 operational @ 81.25 MHz (MIG initialized 2026-01-04)
-- No memory protection (CPU can write to ROM, slave may ignore)
-- No alignment checks (misaligned loads/stores may behave unexpectedly)
diff --git a/.claude/rules/architecture/mig-vivado.md b/.claude/rules/architecture/mig-vivado.md
deleted file mode 100644
index efc8b2f..0000000
--- a/.claude/rules/architecture/mig-vivado.md
+++ /dev/null
@@ -1,138 +0,0 @@
----
-paths:
-  - hdl/reset_timer.v
-  - config/arty-s7-50.xdc
----
-
-# MIG DDR3 Configuration (Arty S7-50)
-
-**Last updated**: 2026-01-05
-**Status**: ✅ MIG CALIBRATION SUCCESSFUL - DDR3 functional @ 81.25 MHz
-
-## Critical Success Factors
-
-**MUST HAVE** for DDR3 calibration:
-1. **200 MHz reference clock** to MIG `clk_ref_i` (MANDATORY for IDELAYCTRL - won't calibrate without it)
-2. **Bank 34 only** for all DDR3 signals (SSTL135 @ 1.35V)
-3. **200µs reset hold time** for MIG `sys_rst` (20,000 cycles @ 100 MHz)
-4. CPU reset from `ui_clk_sync_rst`, NOT `peripheral_reset` (stays HIGH)
-
-**Vivado project**: NOT in repository (binary files, too large). Recreate from notes below if needed.
- -## Working MIG Configuration - -**Memory part**: MT41K128M16XX-15E -- 16-bit DDR3L, 128 Mb, -15E speed grade, **1.35V operation** -- I/O standard: **SSTL135** (NOT SSTL15) -- Bank: **Bank 34 only** (all byte groups: DQ[0-15], Address/Ctrl) -- Internal Vref: ENABLED (0.675V for Bank 34) - -**MIG Parameters**: -- AXI interface: 128-bit data width (SmartConnect converts from CPU's 32-bit) -- Address width: 28-bit -- Input clock period: 10000 ps (100 MHz) → `sys_clk_i` -- Memory clock: 3077 ps (324.99 MHz, MIG-generated internally) -- Reference clock: **200 MHz (5000 ps)** → `clk_ref_i` ⚠️ CRITICAL -- PHY ratio: 4:1 -- **UI clock: 81.25 MHz** (324.99 MHz ÷ 4) - CPU runs at this speed - -## Clock Architecture - -**Input**: 12 MHz from board oscillator (pin F14, LVCMOS33) - -**Clock Wizard** (MMCM): -- VCO: 12 MHz × 50 = 600 MHz -- Output 1: **100 MHz** (÷6) → MIG `sys_clk_i` + reset_timer -- Output 2: **200 MHz** (÷3) → MIG `clk_ref_i` ⚠️ CRITICAL - -**MIG-generated**: -- Memory interface: 324.99 MHz (internal) -- UI clock: **81.25 MHz** (CPU domain) - -## Reset Architecture - -**Custom reset timer** ([reset_timer.v](hdl/reset_timer.v)): -- Counts **20,000 cycles @ 100 MHz = 200µs** -- Holds MIG `sys_rst` LOW during startup (ACTIVE-LOW reset) -- Releases when count completes -- Parameters: `COUNTER_WIDTH=15`, `HOLD_CYCLES=20000` - -**CPU reset**: -- Connected to MIG's `ui_clk_sync_rst` (ACTIVE-HIGH, synchronized to ui_clk) -- ❌ **NOT using** `proc_sys_reset_0/peripheral_reset` (stays perpetually HIGH - known issue) - -## Bank Selection - CRITICAL - -**Why Bank 34 only**: -- Bank 34: Powered at **1.35V** for DDR3L (SSTL135) -- Bank 15: Has RGB LEDs requiring **3.3V** (LVCMOS33) - voltage conflict with DDR3 -- Bank 14: UART signals (**3.3V** LVCMOS33) -- **Separate banks = independent VCCO rails** = no voltage conflict - -**All DDR3 signals must be on Bank 34**: -- DQ[0-7] (Byte Group T0) -- DQ[8-15] (Byte Group T1) -- Address/Control-0 (Byte Group T2) -- 
Address/Control-1 (Byte Group T3)
-
-## Key Lessons
-
-1. **200 MHz ref_clk is MANDATORY**: DDR3 WILL NOT calibrate without it (IDELAYCTRL requirement)
-2. **Bank voltage isolation**: Check board schematic for VCCO rail voltages before assigning pins
-3. **SSTL135 for DDR3L**: Use SSTL135 (1.35V), NOT SSTL15 (1.5V) - wrong I/O standard prevents calibration
-4. **Reset timing matters**: MIG requires minimum 200µs reset hold time
-5. **ui_clk_sync_rst for CPU**: Use MIG's `ui_clk_sync_rst`, not Processor System Reset IP (broken output)
-
-## Vivado Block Diagram Components
-
-**If recreating from scratch**:
-
-1. **Clock Wizard**:
-   - Input: 12 MHz
-   - Outputs: 100 MHz (sys_clk), 200 MHz (ref_clk)
-
-2. **Reset Timer** (custom Verilog):
-   - Input: 100 MHz clock, Clock Wizard `locked`
-   - Output: ACTIVE-LOW reset to MIG `sys_rst`
-   - Hold: 20,000 cycles
-
-3. **MIG 7-Series**:
-   - Part: MT41K128M16XX-15E
-   - Clocks: 100 MHz sys_clk_i, 200 MHz clk_ref_i
-   - AXI: 128-bit interface
-   - Bank: 34 (SSTL135)
-   - Internal Vref: ENABLED
-
-4. **AXI SmartConnect**:
-   - Masters: CPU instruction + data (32-bit each)
-   - Slave: MIG (128-bit)
-   - Handles width conversion
-
-5. 
**Processor System Reset**:
-   - Generates AXI reset signals for MIG/SmartConnect
-   - **Do NOT use for CPU reset** (use ui_clk_sync_rst instead)
-
-## Troubleshooting
-
-**Calibration fails**:
-- Check 200 MHz ref_clk connected to MIG `clk_ref_i`
-- Verify Bank 34 for all DDR3 pins
-- Verify SSTL135 I/O standard (not SSTL15)
-- Check reset hold time (minimum 200µs)
-
-**Wrong data/corruption**:
-- Verify AXI connections (SmartConnect to MIG)
-- Check ui_clk domain crossing
-- Verify CPU reset from ui_clk_sync_rst
-
-**Build errors**:
-- Vivado project not in repo - must recreate block diagram
-- Constraint file: [arty-s7-50.xdc](config/arty-s7-50.xdc) has pin assignments
-
-## Reference
-
-**Board**: Arty S7-50 (xc7s50-csga324, speed grade -1)
-**Memory**: 256 MB DDR3L @ 1.35V (MT41K128M16XX-15E)
-**Oscillator**: 12 MHz (pin F14)
-
-See Arty S7 reference manual for schematic and VCCO rail assignments.
diff --git a/.claude/rules/debug/debug.md b/.claude/rules/debug/debug.md
deleted file mode 100644
index 4fe6bbd..0000000
--- a/.claude/rules/debug/debug.md
+++ /dev/null
@@ -1,118 +0,0 @@
----
-paths:
-  - hdl/debug_peripheral/**
-  - tools/debugger/**
----
-
-# Debug Protocol
-
-**Last updated**: 2026-01-05
-**Sources**: [debug_peripheral.v](hdl/debug_peripheral/debug_peripheral.v), [debug_peripheral.vh](hdl/debug_peripheral/debug_peripheral.vh)
-
-UART debug peripheral for CPU control via serial commands (115200 baud, 8N1).
-
-## Overview
-
-**Module**: [debug_peripheral.v](hdl/debug_peripheral/debug_peripheral.v)
-
-**Ports**:
-- `i_Uart_Tx_In` - UART RX from host (host → FPGA)
-- `o_Uart_Rx_Out` - UART TX to host (FPGA → host)
-- `o_Halt_Cpu` - Stops CPU when high
-- `o_Reset_Cpu` - Holds CPU in reset when high
-- `i_PC[31:0]` - Program counter (for READ_PC command)
-
-## Command Set
-
-Single-byte opcodes:
-
-| Opcode | Command | Action | Response |
-|--------|---------|--------|----------|
-| `0x00` | NOP | No operation | None |
-| `0x01` | RESET | Assert CPU reset | None |
-| `0x02` | UNRESET | Deassert CPU reset | None |
-| `0x03` | HALT | Halt CPU | None |
-| `0x04` | UNHALT | Resume CPU | None |
-| `0x05` | PING | Test connectivity | `0xAA` |
-| `0x06` | READ_PC | Read program counter | 4 bytes (little-endian) |
-| `0x07` | WRITE_PC | Write PC (stub) | None |
-| `0x08` | READ_REGISTER | Read register (stub) | TBD |
-| `0x09` | WRITE_REGISTER | Write register (stub) | TBD |
-
-**Implemented**: NOP, RESET, UNRESET, HALT, UNHALT, PING, READ_PC
-**Stubs**: WRITE_PC, READ_REGISTER, WRITE_REGISTER (opcodes defined, logic incomplete)
-
-## State Machine
-
-**States**: `IDLE` → `DECODE_AND_EXECUTE` → `IDLE`
-
-**Flow**:
-1. IDLE: Wait for UART byte
-2. DECODE_AND_EXECUTE: Execute opcode, queue response (if any), return to IDLE
-
-**Output buffer**: 256-byte FIFO for responses (PING → `0xAA`, READ_PC → 4 bytes, etc.)
- -## UART Timing - -**Baud rate**: 115200 bps -**CPU clock**: 81.25 MHz (MIG ui_clk) -**Clocks per bit**: 81,250,000 / 115,200 ≈ **706 clocks** - -**Modules**: [uart_receiver.v](hdl/debug_peripheral/uart_receiver.v), [uart_transmitter.v](hdl/debug_peripheral/uart_transmitter.v) - -**Interface**: -- RX: `o_Rx_DV` pulses for 1 cycle when byte received, `o_Rx_Byte` contains data -- TX: Assert `i_Tx_DV` for 1 cycle with `i_Tx_Byte`, wait for `o_Tx_Done` pulse - -## Go Debugger Tool - -**Location**: [tools/debugger/](tools/debugger/) -**Run**: `go run tools/debugger/main.go` - -**Status**: -- ✓ Halt, Unhalt, Reset, Unreset, Ping implemented in Go tool -- ✗ Read PC, Write PC, Read/Write Register not yet in tool -- ✓ All basic commands work on FPGA (PING returns `0xAA`) - -**Opcode constants**: See [opcodes.go](tools/debugger/opcodes.go) - must match [debug_peripheral.vh](hdl/debug_peripheral/debug_peripheral.vh) - -## Testing - -**Integration tests**: See [tests/cpu/integration_tests/](tests/cpu/integration_tests/) -- [test_debug_ping.py](tests/cpu/integration_tests/test_debug_ping.py) - PING command verification -- [test_debug_read_pc.py](tests/cpu/integration_tests/test_debug_read_pc.py) - READ_PC command verification - -**Test pattern**: -```python -from cpu.utils import uart_send_byte, uart_wait_for_byte -from cpu.constants import DEBUG_OP_PING - -# Send PING -await uart_send_byte(dut.i_Clock, dut.i_Uart_Tx_In, dut.cpu.debug.uart_rx.o_Rx_DV, DEBUG_OP_PING) - -# Wait for response -response = await uart_wait_for_byte(dut.i_Clock, dut.o_Uart_Rx_Out, dut.cpu.debug.uart_tx.o_Tx_Done) - -assert response == 0xAA # PING_RESPONSE_BYTE -``` - -## Pin Assignments - -**UART**: See [arty-s7-50.xdc](config/arty-s7-50.xdc) -- TX (FPGA → host): Pin D10, Bank 14, LVCMOS33 -- RX (host → FPGA): Pin A9, Bank 14, LVCMOS33 - -**Bank 14**: 3.3V I/O (separate from Bank 34's 1.35V DDR3) - -## Future Extensions - -**Register access**: WRITE_PC, READ_REGISTER, WRITE_REGISTER need: -- Ports to 
CPU register file (`o_Reg_Write_Enable`, `o_Reg_Write_Addr`, `o_Reg_Write_Data`)
-- Multi-byte command support (opcode + address + data)
-- Currently commented out in [debug_peripheral.v](hdl/debug_peripheral/debug_peripheral.v)
-
-**Memory access**: Read/write arbitrary addresses
-**Breakpoints**: Trigger halt on PC match
-**Single-step**: Execute one instruction then halt
-
-See commented-out ports in [debug_peripheral.v](hdl/debug_peripheral/debug_peripheral.v) lines 144-176 for register access stubs.
diff --git a/.claude/rules/process.md b/.claude/rules/process.md
deleted file mode 100644
index 0ec7f18..0000000
--- a/.claude/rules/process.md
+++ /dev/null
@@ -1,126 +0,0 @@
-# Documentation Process
-
-**Last updated**: 2026-01-05
-
-This document defines how to write and maintain documentation for this project.
-
-## Core Principle
-
-Update documentation **continuously as you learn**, without explicit instruction. When user corrects your understanding, STOP and update docs immediately before continuing.
-
-Proactive notification: Alert user when identifying opportunities for skills/agents/MCP servers, missing documentation, or structural improvements.
-
-## Key Guidelines
-
-**1. Specificity over vagueness**
-- ✓ "Run `make test` from `/home/emma/gpu/tests/`"
-- ✗ "You might want to run tests"
-
-**2. Keep it short**
-- Target: ~200 words per section, ~2000 words per file
-- Don't document granular details (individual test files, function implementations)
-- Code should be self-documenting
-
-**3. Front-load the why**
-- ✓ "Use Verilator for fast simulation + cocotb integration. See `tests/Makefile`."
-- ✗ "Verilator is used. It has features."
-
-**4. Avoid over-constraint**
-- ✓ "Prefer editing test files when debugging"
-- ✗ "NEVER modify HDL without tests"
-- Exception: Unsafe actions warrant clear prohibitions
-
-**5. 
Don't over-optimize for LLMs** -- Trust contextual understanding -- Skip pedantic rules that add verbosity without clarity -- Expand ambiguous acronyms only (not UART, DDR3, BRAM) - -## When to Update - -Update immediately when: -- Discovering patterns, gotchas, or state changes -- Fixing errors or ambiguities -- Adding new modules/tests/tools -- Learning why something works (or doesn't) -- **User corrects you** - STOP, update docs BEFORE continuing -- Realizing a guideline is wrong/pedantic - fix it - -Before commits: Verify doc timestamps match source file mtimes. - -## When to Reorganize - -Only if: -- Information is in wrong place -- Two docs overlap (consolidate) -- File exceeds ~2000 words (split with links) -- New logical groupings emerge - -Do NOT rewrite repeatedly for style - preserve learned context. - -## Evaluation Checklist - -After updates, verify: -1. **Specificity**: Can someone follow without questions? -2. **Clarity**: Is path to answer obvious? -3. **Brevity**: Could this be shorter without losing meaning? -4. **Structure**: Right place in hierarchy? -5. **Completeness**: Success and failure paths covered? - -If "no" to any, revise before finishing. - -## Language & Tone - -- **Imperative**: "Run tests" not "you can run tests" -- **Concrete**: "ALU doesn't handle SRA; add test" not "there might be issues" -- **Honest**: "Blocked on MIG initialization. Here's why." 
-
-## Safe Editing
-
-✓ **Safe**:
-- Update docs when learning
-- Add sections for new modules
-- Fix typos, clarify sentences
-- Link to external resources
-- Add test/command examples
-
-✗ **Unsafe**:
-- Delete information (move/consolidate instead)
-- Break links between docs
-- Add outdated/speculative info
-
-## What to Document
-
-- **Patterns**: Cross-cutting behaviors, common approaches
-- **Setup**: Environment, tools, commands
-- **Architecture**: Module purposes, how they fit together
-- **Constraints**: Critical requirements (DDR3 bank selection, timing)
-- **Gotchas**: Non-obvious issues, known bugs
-
-Don't document:
-- Individual test files (only testing patterns)
-- Function-level implementations (read code)
-- Lists of every file (use git ls-files)
-- Obvious information Claude can infer
-
-## Path-Scoped Rules
-
-This project uses path-scoped rules in `.claude/rules/`:
-- Files auto-load when working with matching paths
-- Reduces token usage (only load relevant context)
-- YAML frontmatter specifies paths:
-
-```yaml
----
-paths: hdl/cpu/**
----
-```
-
-Keep rules focused and under word targets.
-
-## Critical: This Document Applies to Itself
-
-When revising this file:
-1. Does new guidance conflict with existing rules?
-2. Is example clear and actionable?
-3. Could future Claude follow unambiguously?
-4. Rewrite if unclear before committing.
diff --git a/.claude/rules/testing/tests.md b/.claude/rules/testing/tests.md
deleted file mode 100644
index 87b403a..0000000
--- a/.claude/rules/testing/tests.md
+++ /dev/null
@@ -1,143 +0,0 @@
----
-paths: tests/**
----
-
-# Test Environment
-
-**Last updated**: 2026-01-05
-**Sources**: [Makefile](tests/Makefile), [utils.py](tests/cpu/utils.py), [constants.py](tests/cpu/constants.py)
-
-cocotb (Python) + Verilator (C++) test framework for CPU verification.
- -## Running Tests - -```bash -cd tests -source ./test_env/bin/activate # CRITICAL: Activate venv first -make TEST_TYPE=unit # Unit tests only -make TEST_TYPE=integration # Integration tests only -make TEST_TYPE=all # Both (cleans between runs) -make TEST_TYPE=integration TEST_FILE=test_add_instruction # Single test -``` - -**Must activate venv** - tests fail with import errors otherwise. - -## Test Types - -**Unit tests** ([tests/cpu/unit_tests/](tests/cpu/unit_tests/)): -- Test individual modules (ALU, register file, control unit, memory) -- Harness: `cpu_unit_tests_harness.v` -- Examples: `test_arithmetic_logic_unit.py`, `test_comparator_unit.py` - -**Integration tests** ([tests/cpu/integration_tests/](tests/cpu/integration_tests/)): -- Test full CPU instruction execution (fetch → decode → execute → writeback) -- Harness: `cpu_integration_tests_harness.v` -- Examples: `test_add_instruction.py`, `test_beq_instruction.py`, `test_lw_instruction.py` - -## Common Test Pattern - -```python -import cocotb -from cocotb.clock import Clock -from cocotb.triggers import ClockCycles -from cpu.utils import gen_r_type_instruction, write_instructions -from cpu.constants import * - -@cocotb.test() -async def test_add_instruction(dut): - """Test ADD R-type instruction""" - - # Start clock - clock = Clock(dut.i_Clock, 1, "ns") - cocotb.start_soon(clock.start()) - - # Generate ADD instruction: rd=3, rs1=1, rs2=2 - add_instr = gen_r_type_instruction( - rd=3, funct3=FUNC3_ALU_ADD_SUB, rs1=1, rs2=2, funct7=0 - ) - - # Write to ROM - write_instructions(dut.cpu.rom_memory, 0x0, [add_instr]) - - # Set register values - dut.cpu.register_file.registers[1].value = 5 - dut.cpu.register_file.registers[2].value = 3 - - # Reset - await reset_cpu(dut) - - # Wait for instruction completion - await ClockCycles(dut.i_Clock, PIPELINE_CYCLES) - - # Verify result - assert dut.cpu.register_file.registers[3].value == 8 -``` - -## Utilities (tests/cpu/utils.py) - -**Instruction generators** - create 
RISC-V instruction encodings: -- `gen_r_type_instruction(rd, funct3, rs1, rs2, funct7)` - R-type (ADD, SUB, AND, OR, XOR, SLT, shifts) -- `gen_i_type_instruction(opcode, rd, funct3, rs1, imm)` - I-type (ADDI, loads, JALR) -- `gen_s_type_instruction(funct3, rs1, rs2, imm)` - S-type (stores) -- `gen_b_type_instruction(funct3, rs1, rs2, offset)` - B-type (branches) -- `gen_u_type_instruction(opcode, rd, imm)` - U-type (LUI, AUIPC) -- `gen_j_type_instruction(rd, imm)` - J-type (JAL) - -**Memory helpers**: -- `write_word_to_mem(mem_array, addr, value)` - 32-bit little-endian write -- `write_half_to_mem(mem_array, addr, value)` - 16-bit little-endian -- `write_byte_to_mem(mem_array, addr, value)` - 8-bit -- `write_instructions(mem_array, base_addr, instructions)` - Write instruction list -- `write_instructions_rom(mem_array, base_addr, instructions)` - ROM variant (word-indexed) - -**UART helpers**: -- `uart_send_byte(clock, i_rx_serial, o_rx_dv, data_byte)` - Send byte over UART RX -- `uart_send_bytes(clock, i_rx_serial, o_rx_dv, byte_array)` - Send multiple bytes -- `uart_wait_for_byte(clock, i_tx_serial, o_tx_done)` - Receive byte from UART TX - -**Reset/setup**: -- `reset_cpu(dut)` - Reset CPU and wait for DDR3 calibration -- `setup_cpu_test(dut)` - Clock + reset - -## Constants (tests/cpu/constants.py) - -**Don't duplicate constant values in docs** - reference the file instead. - -**Contains**: -- Opcodes: `OP_R_TYPE`, `OP_I_TYPE`, `OP_LOAD`, `OP_STORE`, `OP_B_TYPE`, `OP_J_TYPE`, etc. -- Function codes: `FUNC3_ALU_ADD_SUB`, `FUNC3_BRANCH_BEQ`, etc. -- ALU selectors: `ALU_SEL_ADD`, `ALU_SEL_SUB`, `ALU_SEL_AND`, etc. -- Debug opcodes: `DEBUG_OP_HALT`, `DEBUG_OP_PING`, `DEBUG_OP_READ_PC`, etc. 
-- Timing: `CLOCK_FREQUENCY`, `UART_BAUD_RATE`, `UART_CLOCKS_PER_BIT`, `PIPELINE_CYCLES` -- Memory: `ROM_BOUNDARY_ADDR = 0x1000` - -## UART Timing - -**Baud rate**: 115200 -**CPU clock**: 81.25 MHz (MIG ui_clk) -**Clocks per bit**: 81,250,000 / 115,200 ≈ **706 clocks** - -Use `uart_send_byte()` / `uart_wait_for_byte()` from [utils.py](tests/cpu/utils.py) - timing handled internally. - -## Makefile - -**Auto-discovery**: -- Finds all `.v` and `.vh` files: `find $(SRC_DIR) -name "*.v" -o -name "*.vh"` -- Adds all subdirectories as Verilator include paths - -**Key variables**: -- `SIM=verilator` - Simulator -- `TOPLEVEL` - Top-level module (set by TEST_TYPE) -- `MODULE` - Python test modules to run -- `VERILOG_SOURCES` - All Verilog files - -## Debugging Tests - -**Waveforms**: Verilator generates `.vcd` files - view with GTKWave -**Logging**: cocotb has built-in logging (`dut._log.info()`) -**ILA cores**: For FPGA debugging (not sim), see [arty-s7-50.xdc](config/arty-s7-50.xdc) - -**Common issues**: -- Import errors: Activate venv (`source test_env/bin/activate`) -- Timing failures: Increase wait cycles (`PIPELINE_CYCLES` is conservative) -- UART failures: Check clock frequency matches constant (`81.25 MHz`) diff --git a/.github/workflows/test.yml b/.github/workflows/tests.yml similarity index 100% rename from .github/workflows/test.yml rename to .github/workflows/tests.yml diff --git a/CLAUDE.md b/CLAUDE.md index 2383b80..8d52a37 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,61 +1,50 @@ # GPU FPGA Project -**Last updated**: 2026-01-05 +Minimal computer on Arty S7-50: RV32I CPU + VGA + UART debug. -Minimal computer on Arty S7-50 FPGA: RISC-V RV32I soft core + VGA video + UART debug. 
- -## Current Status - -- **CPU**: RV32I (no M/F/D extensions), unit + integration tests passing -- **Memory**: DDR3 operational @ 81.25 MHz (MIG initialized 2026-01-04) -- **Video**: VGA module exists, framebuffer not yet DDR3-backed -- **Debug**: UART debug peripheral working (`tools/debugger`) -- **Blocker**: None (DDR3 now functional) - -## Quick Start +## Commands ```bash -# Run tests +# Run tests (must activate venv first) cd tests && source test_env/bin/activate && make -# Debug via UART -go run tools/debugger/main.go - -# Build (future - not yet set up) -# cd tools/compiler && make -``` +# Test types: unit, integration, vga, all +make TEST_TYPE=unit +make TEST_TYPE=integration +make TEST_TYPE=vga +make TEST_TYPE=all -## Key Directories +# Single test file +make TEST_TYPE=integration TEST_FILE=test_add_instruction -- `hdl/` - Verilog sources (cpu/, debug_peripheral/, vga_out.v, framebuffer.v, gpu.v) -- `tests/` - Verilator + cocotb tests (unit_tests/, integration_tests/) -- `tools/` - debugger/ (Go UART CLI), compiler/ (placeholder) -- `config/` - arty-s7-50.xdc (pin constraints, clocks, ILA debug) -- `docs/` - Human-facing setup guides - -## Documentation System +# Debug tool +cd tools && go run ./debugger +``` -This project uses **path-scoped rules** in `.claude/rules/` that auto-load when you work with matching files: +## Memory Map -- **Always loaded**: `.claude/rules/process.md` (documentation workflow) -- **When editing CPU**: `.claude/rules/architecture/cpu.md` -- **When editing memory**: `.claude/rules/architecture/memory.md` -- **When editing tests**: `.claude/rules/testing/tests.md` -- **When editing debug**: `.claude/rules/debug/debug.md` -- **When editing constraints**: `.claude/rules/architecture/mig-vivado.md` +``` +0x80000000 - 0x80000FFF: Boot ROM (4KB BRAM) +0x80001000 - 0x87F1DFFF: RAM (~127MB DDR3) +0x87F1E000 - 0x87F8EFFF: Framebuffer 0 (640x480x12bpp) +0x87F8F000 - 0x87FFFFFF: Framebuffer 1 (640x480x12bpp) +``` -You don't need to 
manually read docs - the relevant rules load automatically based on which files you're working with. +- PC starts at 0x80000000 on reset +- ROM_BOUNDARY_ADDR = 0x80000FFF +- Framebuffers are 4K-aligned for DMA -## Critical Constraints +## Constraints -- **DDR3**: Requires 200 MHz ref_clk, Bank 34 only (voltage isolation) -- **UART**: 115200 baud @ 81.25 MHz ≈ 706 clocks/bit -- **Memory map**: CPU base 0x80000000, ROM 0x80000000-0x80000FFF, RAM 0x80001000+, FB at end -- **Pipeline**: 3-stage, no hazard detection (insert NOPs manually) +- CPU runs at 81.25 MHz (MIG ui_clk) +- UART: 115200 baud, 706 clocks/bit +- 3-stage pipeline, no hazard detection - insert NOPs between dependent instructions +- DDR3 requires 200 MHz ref_clk, Bank 34 only (1.35V) -## Next Steps +## Gotchas -1. Boot CPU from DDR3 (load program, execute) -2. Connect framebuffer to DDR3 -3. Add game controller peripheral -4. Network peripheral (TBD) +- Tests fail with import errors if venv not activated +- CPU reset must use MIG's `ui_clk_sync_rst`, NOT `peripheral_reset` +- No alignment checks - misaligned loads/stores behave unexpectedly +- Constants in `.vh` files must stay in sync with `tests/cpu/constants.py` +- Vivado project not in repo - recreate from `config/arty-s7-50.xdc` diff --git a/README.md b/README.md index f08daa4..e2c0354 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # RISC-V FPGA Computer -[![Tests](https://github.com/DustTheory/computer/actions/workflows/test-coverage.yml/badge.svg)](https://github.com/DustTheory/computer/actions/workflows/test-coverage.yml) +[![Tests](https://github.com/DustTheory/computer/actions/workflows/tests.yml/badge.svg)](https://github.com/DustTheory/computer/actions/workflows/tests.yml) Building a computer system from scratch on an FPGA, for fun. Features a custom RISC-V RV32I soft-core CPU, VGA video output, and game peripherals. 
@@ -10,19 +10,16 @@ A minimal computer system targeting the Arty S7-50 FPGA: - Custom RISC-V RV32I CPU core (no multiply/divide, no floating point) - VGA video output (640x480) - DDR3 memory interface -- Game-focused peripherals (TBD) This is a learning project to understand computer architecture from the ground up. ## Current Status -**Working on**: Booting CPU from DDR3 - -- CPU core: RV32I implemented and passing tests -- Memory: DDR3 operational @ 81.25 MHz -- Testing: 57 unit tests + 50+ integration tests passing -- Video: VGA module done, framebuffer designed (not yet DDR3-backed) -- Debug: UART debug peripheral working (`tools/debugger/`) +- **CPU**: RV32I core with 3-stage pipeline, no caches, no M extension yet +- **Memory**: DDR3 via AXI4-Lite (full AXI4 not implemented yet) +- **Video**: 640x480 VGA with double-buffered framebuffer, VDMA for display +- **Debug**: UART interface enables halting, stepping, register/memory inspection +- **Input**: Not implemented ## Development Approach @@ -44,23 +41,27 @@ While auxiliary tools like the debugger are coded with AI assistance, the CPU it - `hdl/` - Verilog source files (CPU, video, peripherals) - `tests/` - Verilator + cocotb tests - `tools/` - Debug utilities and toolchain -- `docs/` - Architecture docs and guides - `config/` - FPGA constraints -## Running Tests +## Setup -Test dependencies: Verilator, Python 3, cocotb +**Dependencies**: Verilator, Python 3, Go + +```bash +# Create Python venv and install test dependencies +cd tests +python3 -m venv test_env +source test_env/bin/activate +pip install cocotb pytest +``` + +## Running Tests ```bash cd tests source test_env/bin/activate make TEST_TYPE=unit # Run unit tests make TEST_TYPE=integration # Run integration tests +make TEST_TYPE=vga # Run VGA tests make TEST_TYPE=all # Run all tests ``` - -## Documentation - -- [docs/getting-started.md](docs/getting-started.md) - Setup and getting started -- [docs/architecture.md](docs/architecture.md) - CPU details, 
memory map, and system design -- [CLAUDE.md](CLAUDE.md) - Project context for AI assistants diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index fae91cc..0000000 --- a/docs/architecture.md +++ /dev/null @@ -1,83 +0,0 @@ -# Architecture Overview - -This document provides a high-level overview of the system architecture. - -## CPU Core (RV32I) - -The CPU is a custom RISC-V implementation: - -- 32-bit RISC-V base integer instruction set (RV32I) -- No multiplication/division (no M extension) -- No floating point (no F/D extensions) -- Harvard architecture with separate instruction/data paths -- Pipeline stages: Fetch → Decode → Execute → Memory → Writeback - -For detailed CPU internals, see [../ai/cpu-architecture.md](../ai/cpu-architecture.md). - -## Memory Map - -``` -0x80000000 - 0x80000FFF: Boot ROM (4KB BRAM, internal to CPU) -0x80001000 - 0x87F1DFFF: General RAM (~127MB DDR3 via MIG) -0x87F1E000 - 0x87F8EFFF: Framebuffer 0 (462,848 bytes, 640x480x12bpp) -0x87F8F000 - 0x87FFFFFF: Framebuffer 1 (462,848 bytes, 640x480x12bpp) -``` - -Key addresses: -- CPU_BASE_ADDR: 0x80000000 (PC starts here on reset) -- ROM_BOUNDARY_ADDR: 0x80000FFF (last ROM address) -- RAM_START_ADDR: 0x80001000 (first DDR3 address) -- Framebuffers are 4K-aligned for DMA compatibility - -See [../ai/memory-map.md](../ai/memory-map.md) for detailed memory layout. - -## Video System - -- VGA output: 640x480 @ 60Hz -- Dual framebuffer for tear-free rendering -- Pixel format: 12-bit RGB (4 bits per channel) - -The video system uses double buffering to prevent tearing. While one framebuffer is being displayed, the CPU can write to the other. A register controls which buffer is active. 
- -## Debug Interface - -The system includes a UART-based debug peripheral for development: - -- Protocol: 115200 baud, 8N1 -- Commands: halt, resume, reset, read/write registers, read/write memory -- See [../ai/debug-protocol.md](../ai/debug-protocol.md) for protocol details - -Debug tool is in `tools/debugger/` (written in Go). - -## System Block Diagram - -``` -┌─────────────────────────────────────────────────────┐ -│ Top Level (gpu.v) │ -│ │ -│ ┌──────────┐ ┌──────────────┐ │ -│ │ CPU Core │────────▶│ Memory │ │ -│ │ (RV32I) │ │ Interface │ │ -│ └──────────┘ │ (MIG DDR3) │ │ -│ │ └──────────────┘ │ -│ │ │ -│ ├──────────────▶ Framebuffer ──────┐ │ -│ │ │ │ -│ └──────────────▶ Debug Peripheral │ │ -│ ▼ │ -│ ┌──────────┐ │ -│ │ VGA Out │────▶ Monitor -│ └──────────┘ │ -└─────────────────────────────────────────────────────┘ -``` - -## Development Roadmap - -Current focus is on getting the system booting and running basic programs: - -1. **DDR3 Memory** (current) - Get MIG working for CPU memory access -2. **Boot ROM** - Simple boot code to initialize system -3. **Framebuffer Integration** - Connect video output to DDR3-backed buffer -4. **Peripherals** - Add game-specific I/O (buttons, audio, etc.) - -Once the basic system is working, the focus will shift to writing programs and games for it. \ No newline at end of file diff --git a/docs/getting-started.md b/docs/getting-started.md deleted file mode 100644 index ad3fda0..0000000 --- a/docs/getting-started.md +++ /dev/null @@ -1,105 +0,0 @@ -# Getting Started - -This guide will help you set up the test environment and run your first simulation. - -## Prerequisites - -For testing and simulation, you'll need: - -1. **Verilator** (for simulation) - ```bash - # Ubuntu/Debian - sudo apt-get install verilator - - # macOS - brew install verilator - ``` - -2. **Python 3** (for test framework) - ```bash - # Ubuntu/Debian - sudo apt-get install python3 python3-pip - - # macOS - brew install python3 - ``` - -3. 
**cocotb** (Python test framework) - ```bash - pip3 install cocotb pytest - ``` - -4. **Go** (for debug tools, optional) - ```bash - # Ubuntu/Debian - sudo apt-get install golang - - # macOS - brew install go - ``` - -## Running Your First Test - -1. Clone the repository and navigate to the test directory: - ```bash - cd tests - ``` - -2. Run the complete test suite: - ```bash - make - ``` - - You should see output like: - ``` - Running CPU unit tests... - ✓ ALU test passed - ✓ Register file test passed - ... - All tests passed! - ``` - -3. Run specific test categories: - ```bash - make cpu # CPU tests only - make integration # Integration tests only - ``` - -4. Clean build artifacts: - ```bash - make clean - ``` - -## Understanding Test Output - -When tests run, you'll see: - -- **PASS**: Test succeeded -- **FAIL**: Test failed (check error messages) - -## Project Structure at a Glance - -``` -hdl/ # All Verilog source files -├── cpu/ # CPU core modules -├── debug_peripheral/ # Debug interface -└── *.v # Other system modules - -tests/ # All tests -├── cpu/unit_tests/ # Test individual modules -└── cpu/integration_tests/ # Test full instructions - -tools/ # Utilities -├── debugger/ # UART debug tool -└── compiler/ # Toolchain (TBD) - -docs/ -├── ai/ # Technical specs -└── everyone/ # This guide -``` - -## Next Steps - -- **To understand the CPU**: See [../ai/cpu-architecture.md](../ai/cpu-architecture.md) -- **To write tests**: See [../ai/test-guide.md](../ai/test-guide.md) -- **To debug hardware**: See [../ai/debug-protocol.md](../ai/debug-protocol.md)
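
Review note: the memory map and UART figures that this diff moves into `CLAUDE.md` can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration. The addresses, resolution, clock, and baud rate are copied from the diff, while every name in it (`REGIONS`, `region_size`, `is_contiguous`, `framebuffer_bytes`, `uart_clocks_per_bit`) is hypothetical and not taken from the repository. Rounding the UART divisor up is also an assumption: it reproduces the documented 706 clocks/bit, whereas rounding to nearest would give 705, and the diff does not show which one the HDL uses.

```python
import math

# Addresses copied from the memory map this diff adds to CLAUDE.md.
# The names below are hypothetical; they are not taken from the repository.
REGIONS = [
    ("boot_rom", 0x80000000, 0x80000FFF),
    ("ram", 0x80001000, 0x87F1DFFF),
    ("framebuffer_0", 0x87F1E000, 0x87F8EFFF),
    ("framebuffer_1", 0x87F8F000, 0x87FFFFFF),
]


def region_size(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


def is_contiguous(regions):
    """True when each region starts one byte after the previous one ends."""
    return all(prev[2] + 1 == cur[1] for prev, cur in zip(regions, regions[1:]))


def framebuffer_bytes(width=640, height=480, bits_per_pixel=12, align=4096):
    """Packed framebuffer size, rounded up to the alignment boundary."""
    raw = width * height * bits_per_pixel // 8
    return -(-raw // align) * align  # ceiling division, then scale back up


def uart_clocks_per_bit(clock_hz=81_250_000, baud=115_200):
    """Clocks per UART bit, rounded up (an assumed rounding mode)."""
    return math.ceil(clock_hz / baud)


if __name__ == "__main__":
    for name, start, end in REGIONS:
        print(f"{name:14s} {start:#010x}-{end:#010x} {region_size(start, end):>11,d} bytes")
    print("contiguous:", is_contiguous(REGIONS))
    print("framebuffer bytes:", framebuffer_bytes())
    print("uart clocks/bit:", uart_clocks_per_bit())
```

Under these assumptions the four regions are contiguous, and each framebuffer range is exactly the 4 KB-aligned size of a packed 640x480x12bpp image, which agrees with the 462,848-byte figure in the deleted `docs/architecture.md`.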