
COREP Reporting Assistant

AI-powered regulatory reporting assistant for UK banks. Converts natural-language scenarios into populated COREP templates using GPT-4o-mini, regulatory text retrieval, and automated validation.

Prototype Scope: Demonstrates C 01.00 (Own Funds) template only

🎯 Features

  • Regulatory Text Retrieval - Hybrid search (keyword + semantic) over PRA Rulebook & EBA COREP instructions
  • LLM Integration - GPT-4o-mini generates structured JSON with justifications
  • Validation Engine - Automated checks for mandatory fields, ranges, and cross-field consistency
  • HTML Rendering - Color-coded templates with hover tooltips
  • Audit Logging - Complete JSON audit trail for compliance
  • CLI Interface - Command-line tool for rapid testing
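To illustrate what "hybrid search (keyword + semantic)" can mean, here is a minimal in-memory sketch. It is not the repository's `retrieval/search.py`, which runs against PostgreSQL and pgvector; the function names, the term-overlap keyword score, and the `alpha` weighting are all assumptions made for illustration.

```python
import math


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that also occur in the text (a crude keyword signal)."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_rank(query: str, query_vec: list, docs: list, alpha: float = 0.5, top_k: int = 5) -> list:
    """Rank docs, given as (doc_id, text, embedding) tuples, by a weighted blend of both signals.

    alpha is a hypothetical weight: 1.0 means keyword only, 0.0 means semantic only.
    """
    scored = []
    for doc_id, text, vec in docs:
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine_similarity(query_vec, vec)
        scored.append((score, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]
```

A weighted sum is only one way to combine the two signals; rank-fusion schemes are a common alternative when the score scales differ.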

🏗️ Architecture

Natural Language → Retrieval  → LLM           → Validation → HTML + Audit
     Query         (pgvector)   (GPT-4o-mini)   (Rules)      (Jinja2)

🌍 Using the Deployed Prototype (No Setup Required)

If you just want to test the assistant without setting anything up locally, call the hosted API directly:

Quick Test via cURL

curl -X POST https://corep-assistant.onrender.com/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the CET1 capital?",
    "scenario": "Bank has £500m in ordinary shares.",
    "template": "C_01_00"
  }'
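The same request can be sent from Python. The sketch below uses only the standard library and mirrors the cURL call above; the helper names `build_analyze_request` and `analyze` are hypothetical, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://corep-assistant.onrender.com/api/analyze"


def build_analyze_request(question: str, scenario: str, template: str = "C_01_00",
                          url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the cURL example."""
    payload = {"question": question, "scenario": scenario, "template": template}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(question: str, scenario: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_analyze_request(question, scenario)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `analyze("What is the CET1 capital?", "Bank has £500m in ordinary shares.")` should return the parsed response as a dictionary, provided the deployed service is reachable.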

🚀 Quick Start Locally

Prerequisites

  • Python 3.12+
  • PostgreSQL 14+ with pgvector
  • OpenAI API key

Installation

# 1. Clone repository
git clone https://github.com/Yashsingh045/COREP-Assistant.git
cd COREP-Assistant

# 2. Install PostgreSQL (macOS); the pgvector extension must also be installed for this PostgreSQL version
brew install postgresql@14
brew services start postgresql@14

# 3. Create database
createdb corep_assistant

# 4. Backend setup
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 5. Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

# 6. Initialize database and populate sample data
python db/schema.py
python populate_db_mock.py  # Uses mock embeddings (run populate_db.py instead for real embeddings)

# 7. Start backend
python main.py

Backend runs at http://localhost:8000

Detailed Usage Flow

Follow these steps to experience the full prototype workflow:

Step 1: Input Scenario via CLI

Open a new terminal and ask a regulatory question about a capital scenario:

source backend/venv/bin/activate
python cli/query.py \
  --question "What is the Common Equity Tier 1 capital?" \
  --scenario "The bank has £500m in ordinary shares and £200m in retained earnings."

Step 2: Review Structured JSON Output

The CLI will display a JSON response containing:

  • Populated Fields: Row 010 populated with £700m.
  • Justification: Explanation of why the shares and earnings were combined per CRR Article 26.
  • Validation: Flags whether the totals reconcile and whether any mandatory fields are missing.
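The £700m figure follows directly from the scenario text. As a worked check, the arithmetic for this simplified scenario (which has no deductions or adjustments) looks like this; the variable and helper names are illustrative only.

```python
# Amounts taken from the scenario text, in GBP.
ordinary_shares = 500_000_000
retained_earnings = 200_000_000

# In this simplified scenario, row 010 (CET1) is the sum of the two items.
row_010 = ordinary_shares + retained_earnings


def format_gbp_millions(amount: int) -> str:
    """Format a GBP amount as whole millions, e.g. 700000000 -> '£700m'."""
    return f"£{amount / 1_000_000:,.0f}m"


print(format_gbp_millions(row_010))  # £700m
```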

Step 3: Check Audit Trails

Every query is logged for compliance. Browse the recent logs:

python cli/view_logs.py --limit 5

Step 4: Visualize as HTML Template

To see what a human analyst would see in a COREP form:

cd backend
python test_render.py

Open the generated file at file:///tmp/corep_c01_sample.html in your browser to see the color-coded table and hover tooltips.

Step 5: Run Batch Test Scenarios

To see how the engine handles complex or incomplete data:

bash tests/test_scenarios.sh

📖 Usage

CLI Query

source backend/venv/bin/activate

python cli/query.py \
  --question "What are the Tier 1 capital components?" \
  --scenario "Bank has £500m CET1 capital and £100m AT1 instruments"

Output: JSON with populated fields, justifications, and validation warnings

View Audit Logs

python cli/view_logs.py          # Show 10 recent logs
python cli/view_logs.py --limit 20
python cli/view_logs.py --log-id 20260209_123456_789012
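The log ID in the last command looks like a timestamp with microseconds. Assuming that reading is right, a JSON audit record could be written along these lines. This is a sketch, not the repository's `audit/logger.py`: the record layout, the one-file-per-query naming, and both function names are assumptions.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_log_id(now: datetime) -> str:
    """Build an ID shaped like 20260209_123456_789012 (date, time, microseconds)."""
    return now.strftime("%Y%m%d_%H%M%S_%f")


def write_audit_record(log_dir: Path, request: dict, response: dict,
                       now: Optional[datetime] = None) -> Path:
    """Write one query/response pair to <log_dir>/<log_id>.json and return the path."""
    now = now or datetime.now()
    log_id = make_log_id(now)
    record = {
        "log_id": log_id,
        "timestamp": now.isoformat(),
        "request": request,    # question, scenario, template as sent
        "response": response,  # fields and validation warnings as returned
    }
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{log_id}.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Timestamp-based IDs sort chronologically by filename, which would make a "most recent N logs" view a simple directory listing.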

Run Test Scenarios

bash tests/test_scenarios.sh

🔌 API Endpoints

Core Endpoints

Endpoint        Method   Description
/health         GET      Health check with system info
/api/retrieve   POST     Retrieve regulatory paragraphs
/api/analyze    POST     Analyze scenario and generate COREP output
/api/render     POST     Render COREP output as HTML

Example: Analyze Scenario

curl -X POST http://localhost:8000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What are the capital components?",
    "scenario": "Bank has £500m CET1 and £100m AT1",
    "template": "C_01_00",
    "top_k": 5
  }'

Response:

{
  "template": "C_01_00",
  "fields": [
    {
      "row": "010",
      "metric_name": "Common Equity Tier 1 capital",
      "value": 500000000.0,
      "currency": "GBP",
      "status": "populated",
      "justification": "Bank has £500m CET1 capital...",
      "source_paragraphs": ["CRR Article 26", "COREP C0100_010"]
    }
  ],
  "validation_warnings": []
}
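Client code may want typed access to this response. The project lists Pydantic output models in `llm/schema.py`; the sketch below instead uses standard-library dataclasses so that it runs without dependencies. The class and function names are hypothetical, and the element type of `validation_warnings` is left open because the example only shows an empty list.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TemplateField:
    """One populated (or missing) row of the template, mirroring the JSON keys above."""
    row: str
    metric_name: str
    value: Optional[float]
    currency: str
    status: str
    justification: str
    source_paragraphs: list = field(default_factory=list)


@dataclass
class AnalyzeResponse:
    template: str
    fields: list
    validation_warnings: list


def parse_analyze_response(raw: str) -> AnalyzeResponse:
    """Parse the JSON text returned by /api/analyze into dataclasses."""
    data = json.loads(raw)
    return AnalyzeResponse(
        template=data["template"],
        fields=[TemplateField(**item) for item in data["fields"]],
        validation_warnings=list(data.get("validation_warnings", [])),
    )
```

Because `TemplateField(**item)` is strict about key names, an unexpected key in the response raises a `TypeError`, which surfaces schema drift early.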

📁 Project Structure

COREP-Assistant/
├── backend/
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration management
│   ├── requirements.txt
│   ├── api/                    # API endpoints
│   │   ├── analyze.py         # Scenario analysis
│   │   ├── retrieve.py        # Text retrieval
│   │   └── render.py          # HTML rendering
│   ├── db/                     # Database
│   │   ├── schema.py          # PostgreSQL + pgvector schema
│   │   └── loader.py          # Data loading utilities
│   ├── llm/                    # LLM integration
│   │   ├── client.py          # OpenAI API wrapper
│   │   ├── prompts.py         # Prompt templates
│   │   └── schema.py          # Pydantic output models
│   ├── retrieval/              # Retrieval system
│   │   ├── embeddings.py      # OpenAI embeddings
│   │   └── search.py          # Hybrid search
│   ├── validation/             # Validation engine
│   │   └── engine.py          # Validation rules
│   ├── renderer/               # HTML rendering
│   │   └── template.py        # Jinja2 templates
│   └── audit/                  # Audit logging
│       └── logger.py          # JSON audit logger
├── cli/
│   ├── query.py               # CLI query tool
│   └── view_logs.py           # Log viewer
├── data/
│   └── pra_corep_c01.json     # Sample regulatory text (10 docs)
├── tests/
│   └── test_scenarios.sh      # E2E test scenarios
└── logs/                       # Audit trail (generated)

🧪 Testing

Unit Tests

cd backend
source venv/bin/activate

# Test validation engine
python test_validation.py

# Test HTML rendering
python test_render.py

End-to-End Tests

# Run all 4 test scenarios
bash tests/test_scenarios.sh

Test Scenarios:

  1. Basic CET1 + AT1 capital
  2. Complete own funds with T2
  3. Missing Tier 2 data
  4. Edge case: Zero AT1

🎨 HTML Output

The /api/render endpoint generates professional HTML with:

  • Color-coded status:
    • 🟢 Green (populated)
    • 🔴 Red (missing)
    • 🟡 Yellow (inconsistent)
  • Hover tooltips with justifications and regulatory sources
  • Validation warnings section
  • Responsive design
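The color coding and tooltips can be sketched with plain string formatting. The project renders with Jinja2; this example swaps in the standard library's `html.escape` to stay dependency-free. The hex colors, the status strings `missing` and `inconsistent`, and the function names are assumptions.

```python
from html import escape

# Hypothetical background colors for each status in the legend above.
STATUS_COLORS = {
    "populated": "#d4edda",     # green
    "missing": "#f8d7da",       # red
    "inconsistent": "#fff3cd",  # yellow
}


def render_row(field: dict) -> str:
    """Render one template field as a table row; the title attribute acts as the hover tooltip."""
    color = STATUS_COLORS.get(field.get("status"), "#ffffff")
    tooltip = escape(field.get("justification", ""), quote=True)
    value = field.get("value")
    value_text = "" if value is None else f"{value:,.0f}"
    return (
        f'<tr style="background:{color}" title="{tooltip}">'
        f'<td>{escape(field["row"])}</td>'
        f'<td>{escape(field["metric_name"])}</td>'
        f"<td>{value_text}</td></tr>"
    )


def render_table(fields: list) -> str:
    """Wrap the rendered rows in a bare table element."""
    rows = "\n".join(render_row(f) for f in fields)
    return f"<table>\n{rows}\n</table>"
```

Escaping the justification matters here because it is LLM-generated text placed inside an HTML attribute.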

📊 Validation Rules

  1. Mandatory Fields - Rows 010, 030, 050 must be populated
  2. Numeric Ranges - Detects negative/unreasonable values
  3. Data Types - Ensures capital fields are numeric
  4. Consistency - Validates:
    • T1 (030) = CET1 (010) + AT1 (020)
    • Total (050) = T1 (030) + T2 (040)
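The mandatory-field, negative-value, and consistency rules above can be expressed compactly. The following is a sketch rather than the code in `validation/engine.py`: the function name, the warning wording, and the tolerance are assumptions, and the data-type rule is left out for brevity.

```python
from typing import Optional

MANDATORY_ROWS = ("010", "030", "050")
# (total_row, component_a, component_b), taken from the consistency rules above.
SUM_RULES = (("030", "010", "020"), ("050", "030", "040"))


def validate(values: dict, tolerance: float = 0.01) -> list:
    """Return a list of warning strings for a mapping of row code -> Optional[float]."""
    warnings = []
    for row in MANDATORY_ROWS:
        if values.get(row) is None:
            warnings.append(f"Row {row} is mandatory but missing")
    for row, value in values.items():
        if value is not None and value < 0:
            warnings.append(f"Row {row} is negative: {value}")
    for total, a, b in SUM_RULES:
        total_value: Optional[float] = values.get(total)
        a_value: Optional[float] = values.get(a)
        b_value: Optional[float] = values.get(b)
        if total_value is None or a_value is None or b_value is None:
            continue  # a sum cannot be checked while an input is missing
        expected = a_value + b_value
        if abs(total_value - expected) > tolerance:
            warnings.append(f"Row {total} ({total_value}) != row {a} + row {b} ({expected})")
    return warnings
```

For example, `validate({"010": 500.0, "020": 100.0, "030": 600.0, "040": 50.0, "050": 650.0})` returns an empty list, while setting row 030 to 700.0 produces warnings for both sum rules, because row 050 no longer equals row 030 plus row 040.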

🔍 Sample Data

10 regulatory documents (PRA Rulebook + EBA COREP):

  • CRR Articles on capital definitions
  • COREP C 01.00 instructions
  • Own funds calculation rules

📝 Environment Variables

# .env file
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://localhost/corep_assistant
OPENAI_MODEL=gpt-4o-mini
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
ENVIRONMENT=development

🚧 Limitations

  • Prototype scope: C 01.00 template only
  • Mock embeddings: Because of OpenAI quota limits, the demo uses random embeddings, so its semantic rankings do not reflect real similarity
  • Sample data: 10 regulatory documents (production would need full rulebook)
  • No authentication: Not production-ready
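To make the mock-embedding limitation concrete, here is one way random stand-in vectors could be generated. The repository's `populate_db_mock.py` may work differently; seeding from a hash of the text (so the same text always maps to the same vector) is a choice made for this sketch, and the function name is hypothetical. The dimension of 1536 matches the default output size of `text-embedding-3-small`.

```python
import hashlib
import math
import random

EMBEDDING_DIM = 1536  # default output size of text-embedding-3-small


def mock_embedding(text: str, dim: int = EMBEDDING_DIM) -> list:
    """Return a unit-length pseudo-random vector derived deterministically from the text."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```

Such vectors exercise the storage and query path end to end, but similar texts do not end up close together, so only the keyword half of the hybrid search carries meaning.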

🔮 Future Enhancements

  • Support for all COREP templates (C 02.00, C 03.00, etc.)
  • Real-time OpenAI embeddings (requires quota increase)
  • React frontend UI
  • Multi-user authentication
  • Export to Excel/PDF
  • Regulatory update tracking

📚 Tech Stack

Component    Technology
Backend      FastAPI, Python 3.12
Database     PostgreSQL 14 + pgvector
LLM          OpenAI GPT-4o-mini
Embeddings   OpenAI text-embedding-3-small
Validation   Pydantic
Templates    Jinja2
CLI          Python argparse, httpx

📄 License

Prototype for demonstration purposes.

👤 Author

Yash Singh - GitHub


Built with ❤️ for regulatory reporting automation

About

COREP Assistant supports the interpretation and preparation of COREP (Common Reporting) regulatory returns.
