Evala is an internal AI experimentation workspace for teams that need more than a chat box.
It organizes prompt work around a simple product flow:
Tasks -> Prompts -> Experiments -> Library
The gallery below is generated from the actual app UI with Playwright and published to docs/screenshots/latest. It covers guest flows, invitations, registration, email verification for unverified accounts, the main workspace, prompt iteration, experiments, library views, admin pages, and account screens in both themes.
Dark theme gallery
Light theme gallery
Instead of keeping prompts in chats, docs, or random notes, Evala gives teams one place to:
- define business-facing AI tasks
- version prompt drafts over time
- run quick tests, compare runs, and batch experiments
- validate structured outputs
- review quality manually and automatically
- promote strong prompt versions into a reusable internal library
- manage workspace-level model connections, permissions, and audit visibility
The result is a portfolio project that behaves much closer to a real internal product than a typical AI demo.
Evala is easiest to understand through concrete internal workflows:
An operations team defines a task for incoming support tickets, stores representative examples as scenarios, iterates on classification prompts, and compares prompt versions before promoting the most reliable one into the shared library.
A service team creates a prompt workflow that turns long customer threads into short internal summaries, tests tone and structure on real examples, and keeps an experiment trail that shows which prompt version actually improved clarity.
A communications or back-office team drafts prompts that rewrite rough internal text into a more consistent business tone, reviews outputs manually, and keeps approved versions reusable across the workspace instead of rewriting the same prompt from scratch.
The strongest seeded example is a customer support email summarization workflow that starts with a weak baseline prompt and ends with a reusable handoff prompt promoted into the internal library.
- full write-up: docs/case-study.md
- quality lift in the seeded compare run: average score improved from 3.0 to 4.5
- business outcome: faster support triage, clearer urgency handling, and more consistent ownership
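The quality lift above is an average over per-criterion manual review scores. A minimal sketch of that arithmetic, assuming 1-5 scores per criterion where higher is better (the function name and score values are illustrative, not the app's actual API):

```php
<?php

// Average a manual review across its criteria, rounded to one decimal.
function averageReviewScore(array $criterionScores): float
{
    return round(array_sum($criterionScores) / count($criterionScores), 1);
}

// Illustrative scores for a weak baseline prompt.
$baseline = ['clarity' => 3, 'correctness' => 3, 'completeness' => 3, 'tone' => 3, 'hallucination_risk' => 3];

// Illustrative scores for an improved prompt version.
$improved = ['clarity' => 5, 'correctness' => 4, 'completeness' => 5, 'tone' => 4, 'hallucination_risk' => 4.5];

echo averageReviewScore($baseline); // prints 3
echo averageReviewScore($improved); // prints 4.5
```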
```mermaid
flowchart LR
    UI[Vue 3 + Inertia UI] --> HTTP[Laravel controllers + form requests]
    HTTP --> Services[Workflow services]
    Services --> DB[(Workspace data)]
    Services --> Jobs[Queued experiment jobs]
    Jobs --> Providers[LLM providers]
    Jobs --> Gepa[GEPA optimizer]
    Jobs --> Events[Reverb updates]
    Events --> UI
```
This is a workflow-driven Laravel app rather than a thin wrapper around one model API. A more detailed breakdown lives in docs/architecture.md.
Key engineering decisions:
- experiments are created first and executed asynchronously after commit
- models are validated against a workspace-specific whitelist on the server
- realtime channels are authorized with workspace membership rules
- experiment retries are limited to transient upstream failures
- evaluation data is stored next to prompt versions and experiment runs
- provider integrations live behind contracts instead of leaking into controllers
- optimization still returns to a human review flow
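One of the decisions above, limiting retries to transient upstream failures, can be sketched as a small classifier over provider HTTP status codes. This is a standalone illustration of the idea, not the app's actual implementation:

```php
<?php

// Decide whether a failed provider call is worth retrying.
// Timeouts, rate limits, and 5xx responses are transient; auth and
// validation errors are permanent and should fail the run immediately.
function isTransientFailure(int $statusCode): bool
{
    return match (true) {
        $statusCode === 408,          // request timeout
        $statusCode === 429 => true,  // rate limited
        $statusCode >= 500 => true,   // upstream server error
        default => false,             // 4xx client errors and the rest
    };
}

var_dump(isTransientFailure(503)); // bool(true)  -> retry
var_dump(isTransientFailure(401)); // bool(false) -> fail permanently
```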
On a fresh clone, the fastest way to see the project working is:
```shell
php scripts/demo.php
```

What this command does:

- installs Composer dependencies if vendor/ is missing
- installs frontend dependencies if node_modules/ is missing
- creates .env from .env.example
- switches a fresh local install to SQLite demo defaults
- runs migrations and seeds the workspace demo data
- builds production assets
- starts the Laravel server, queue worker, and Reverb websocket server
The local app opens at http://127.0.0.1:8000.
Useful variants:
- `php scripts/demo.php --setup-only` to prepare the project without starting services
- `php scripts/demo.php --run-only` to start the services after setup
- `php scripts/demo.php --with-gepa` to also install the local GEPA optimization runtime
Evala is built around the idea that AI work inside a company should be:
- structured instead of ad hoc
- testable instead of intuitive-only
- visible instead of hidden in chat history
- reusable instead of repeatedly reinvented
In the UI, business workflows are called Tasks. In the backend model, they are stored as `UseCase`.
Each task can have:
- scenarios with expected output
- prompts
- multiple prompt versions
- experiment history
- evaluations
- best-performing prompt signals
- Create prompts with an initial version in one request
- Maintain prompt history with change summaries and model preferences
- Run quick draft tests without creating a full experiment
- Promote approved prompt versions into a shared prompt library
- `single` experiments for one prompt on one input
- `compare` experiments for multiple prompt versions on the same input
- `batch` experiments across saved scenarios
- queued execution with progress tracking and retry classification
- realtime experiment updates via Laravel Reverb
- manual review with clarity, correctness, completeness, tone, and hallucination risk
- structured JSON output validation
- automatic checks against expected text fragments and JSON subsets
- analytics summaries for prompts, models, and use cases
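The JSON-subset check mentioned above can be understood as a recursive comparison: every key in the expected structure must appear in the model output with a matching (or recursively matching) value, while extra output keys are allowed. A minimal standalone sketch of that idea (not the app's actual evaluator):

```php
<?php

// Return true when every key/value in $expected also appears in $actual.
// Extra keys in $actual are allowed; nested arrays are compared recursively.
function jsonSubsetMatches(array $expected, array $actual): bool
{
    foreach ($expected as $key => $value) {
        if (!array_key_exists($key, $actual)) {
            return false;
        }
        if (is_array($value)) {
            if (!is_array($actual[$key]) || !jsonSubsetMatches($value, $actual[$key])) {
                return false;
            }
        } elseif ($actual[$key] !== $value) {
            return false;
        }
    }
    return true;
}

// Example: a model output passes if it contains the expected fields.
$output   = json_decode('{"urgency":"high","owner":"billing","summary":"Refund delayed"}', true);
$expected = ['urgency' => 'high', 'owner' => 'billing'];

var_dump(jsonSubsetMatches($expected, $output)); // bool(true)
```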
- multi-workspace structure
- team switching and role-based access
- workspace-scoped AI connection management
- audit visibility for important actions
- start a prompt optimization run from a saved prompt version
- reuse eligible scenarios as train/validation examples
- run a GEPA-backed optimization job
- create a derived prompt draft from the best candidate
Most prompt work inside teams breaks down quickly:
- prompts live in chat history
- nobody remembers which version actually worked
- experiments are repeated manually
- outputs are hard to compare
- business stakeholders cannot see what improved and why
Evala turns that into a proper internal workflow. It is meant to feel like the kind of AI tool a digital unit or product team could actually use for experimentation, demos, and internal learning.
Start from a business task, not a model picker. The task defines the context, goal, and test data before prompt iteration begins.
Prompts can evolve through multiple saved versions with explicit metadata, notes, and a preferred model.
Instead of guessing whether a prompt is better, users can run structured experiments and compare outputs directly.
Good prompt versions are not just "saved". They are reviewed, scored, and then moved into a safer reuse layer through the library.
Prompt optimization is treated as another workflow step, not magic. It starts from a real prompt version and real scenarios, then produces a derived draft that can still be reviewed by a human.
The seeded examples are intentionally business-facing:
- Customer Email Summarization
- Ticket Categorization
- Rewrite for Business Tone
- Meeting Note Summarization
These scenarios make the system easier to demo to both technical and non-technical audiences.
- PHP 8.2+
- Laravel 12
- MariaDB / MySQL
- Laravel Reverb
- queued jobs for experiment processing
- Vue 3
- Inertia.js
- Vite
- Tailwind CSS
- Blade app shell
- mock provider for local development
- OpenAI-compatible provider integration
- Python-backed GEPA runtime for prompt optimization
- Repository folder name: PromptFactory
- Product name in the app: Evala
- Additional planning notes: PLAN.md
- Architecture notes: docs/architecture.md
- UX flow reference: docs/user-life-cycle-map.md
- PHP 8.2+ with SQLite support for the default local demo
- Composer
- Node.js 20+ and npm
- optional MySQL / MariaDB if you do not want to use the SQLite demo database
- optional Playwright Chromium if you want to regenerate the README screenshots
- optional Python 3.11+ only if you want to run the local GEPA optimization runtime
```shell
php scripts/demo.php
```

This is the intended portfolio demo path. It uses SQLite by default on a fresh install, seeds the workspace with business-facing examples, builds the frontend, and starts the app.
```shell
composer setup
```

That command prepares dependencies, .env, database schema, seeded data, and built assets without launching the long-running processes.
If you want to manage the environment yourself:
```shell
composer install
npm ci
cp .env.example .env
php artisan key:generate
php artisan migrate --seed
npm run build
```

Standard local loop:

```shell
composer run dev
```

Realtime experiment updates:

```shell
php artisan reverb:start
```

The demo bootstrap uses built assets. For active frontend work, use `composer run dev` instead of the one-command demo launcher.
- showcase@evala.local / password
- admin@promptlab.local / password
- team@promptlab.local / password
- unverified@evala.local / password
The seeded invitation flow is available at /join/evala-showcase-invite.
```shell
php artisan test
```

You can also use:

- `composer test`
- `npm run build`
- `npm run ui:screenshots`

Evala can regenerate its GitHub screenshots directly from the browser UI.

```shell
npm run build
npm run ui:screenshots:install
npm run ui:screenshots
```

Default behavior:
- reads APP_URL from .env or .env.example
- signs in with the seeded showcase@evala.local and unverified@evala.local accounts where needed
- uses the seeded evala-showcase-invite token for invitation flows
- resolves the seeded customer-support showcase flow from the app API
- captures guest auth, invitation, verification, workspace, prompt, experiment, library, admin, and account pages
- publishes both light and dark theme galleries
- stores a timestamped run in interface-screenshots-auto
- archives the previous docs/screenshots/latest bundle into docs/screenshots/archive before replacing it
- keeps docs/screenshots/archive out of the public repository via .gitignore
- republishes the current README gallery into docs/screenshots/latest/light and docs/screenshots/latest/dark
Useful overrides:
- `SCREENSHOT_BASE_URL`
- `SCREENSHOT_AUTH_EMAIL`
- `SCREENSHOT_AUTH_PASSWORD`
- `SCREENSHOT_UNVERIFIED_EMAIL`
- `SCREENSHOT_UNVERIFIED_PASSWORD`
- `SCREENSHOT_INVITATION_TOKEN`
- `SCREENSHOT_VIEWPORT`
- `SCREENSHOT_OUTPUT_DIR`
- `SCREENSHOT_PUBLISH_DIR`
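Assuming the script reads these overrides from the process environment, a capture run against a non-default host with the seeded showcase account could look like this (values are the seeded defaults, shown for illustration):

```shell
# Point the screenshot run at a specific app URL and account
SCREENSHOT_BASE_URL=http://127.0.0.1:8000 \
SCREENSHOT_AUTH_EMAIL=showcase@evala.local \
npm run ui:screenshots
```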
Possible next steps:
- richer automatic evaluation heuristics
- prompt diffs between versions
- CSV dataset import
- comments around experiments and approvals
- additional provider integrations
- exportable experiment history
- stronger approval policies
This project is shared as a portfolio and internal-tool showcase built on top of the Laravel ecosystem.