VisionXai is an advanced AI-powered image analysis platform that combines Google's Gemini models with intelligent web search capabilities to provide comprehensive visual content understanding and interactive Q&A functionality.
VisionXai integrates cutting-edge machine learning models with a modern web application architecture, featuring a responsive Angular frontend, a robust FastAPI backend, and an advanced LangGraph-based LLM processing pipeline. The system enables users to upload images and receive detailed AI-generated analysis, enhanced with real-time web search when additional context is needed.
Contains the core AI/ML logic with LangGraph workflow orchestration:
- Advanced image analysis using Google Gemini 2.5 Flash
- Intelligent web search integration with Tavily API
- Stateful conversation management with memory persistence
- Bounding box detection and visual analysis
- Modular graph-based workflow architecture
Modern Angular application with server-side rendering:
- Responsive design with Tailwind CSS
- Real-time streaming response support
- Dynamic state management
- Nx monorepo tooling for build optimization
- Jest testing framework
FastAPI-based REST API service:
- Image processing and base64 encoding
- LangGraph workflow execution
- CORS-enabled cross-origin support
- Asynchronous streaming endpoints
- Integration with LLM analysis pipeline
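For the base64 step specifically, the encoding the backend applies before handing images to the LLM pipeline can be sketched with the standard library (function names here are illustrative, not the actual implementation):

```python
import base64

def encode_image(image_bytes: bytes) -> str:
    """Encode raw image bytes as a base64 string for the LLM pipeline."""
    return base64.b64encode(image_bytes).decode("ascii")

def decode_image(data: str) -> bytes:
    """Invert the encoding, e.g. when re-rendering an annotated image."""
    return base64.b64decode(data)
```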
- Python 3.8 or higher
- Node.js 18 or higher and npm
- Google API Key for Gemini models
- Tavily API Key for web search functionality

Clone the repository:

```bash
git clone https://github.com/manu042k/VisionXai.git
cd VisionXai
```
Set up environment variables:

Create a `.env` file in both the `Backend` and `LLM` directories:

```
GOOGLE_API_KEY=your_google_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
```
Install Backend dependencies:

```bash
cd Backend
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
pip install -r requirements.txt
```
Install Frontend dependencies:

```bash
cd Frontend
npm install
```
Navigate to the Backend directory and start the FastAPI server:

```bash
cd Backend
source env/bin/activate  # On Windows: env\Scripts\activate
uvicorn app.main:app --reload
```

The backend API will be available at http://localhost:8000.
Navigate to the Frontend directory and start the development server:

```bash
cd Frontend
npx nx serve Frontend
```

The frontend application will be available at http://localhost:4200.
Run tests using Jest:

```bash
cd Frontend
npx nx test Frontend
```

Run example scripts to test the image analysis pipeline:
```bash
cd LLM/examples
python run_analyzer.py
```

The ImageAnalyzer class provides sophisticated visual understanding capabilities:
- Multi-turn Conversation Memory: Maintains context across multiple queries about the same image
- Bounding Box Detection: Identifies and analyzes specific regions within images
- Intelligent Search Integration: Automatically determines when web search is needed for accurate responses
- Comprehensive Summarization: Generates detailed descriptions with proper citations
- Streaming Support: Real-time response generation for enhanced user experience
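The multi-turn memory behaviour can be illustrated with a minimal in-process sketch (the real implementation uses LangGraph's MemorySaver; the class and method names below are illustrative only):

```python
class ConversationMemory:
    """Minimal per-thread message store, analogous in spirit to MemorySaver."""

    def __init__(self):
        self._threads = {}

    def append(self, thread_id: str, role: str, text: str) -> None:
        """Record one turn under a conversation thread."""
        self._threads.setdefault(thread_id, []).append((role, text))

    def history(self, thread_id: str):
        # Prior turns are replayed so follow-up questions about the same
        # image keep their context; unknown threads start empty.
        return list(self._threads.get(thread_id, []))
```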
The system uses LangGraph to orchestrate a stateful workflow:
- Memory Loading & Image Encoding: Retrieves conversation history and processes base64 image data
- Initial Query Analysis: Examines the user's question to understand intent
- Bounding Box Detection: Identifies objects and regions of interest
- Search Decision: Determines if external context is required
- Web Search: Fetches relevant information using Tavily API (when needed)
- Final Analysis: Generates comprehensive response incorporating all available context
- Memory Persistence: Stores conversation for future interactions
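Without reproducing the LangGraph wiring itself, the control flow of the steps above can be sketched as plain Python over a shared state dict (node names, fields, and the search heuristic here are illustrative assumptions, not the actual implementation):

```python
State = dict  # shared state threaded through the workflow nodes

def load_memory(state: State) -> State:
    # Retrieve prior turns for this conversation (empty on the first call).
    state.setdefault("history", [])
    return state

def analyze_query(state: State) -> State:
    # A real implementation asks the LLM; here we simply flag questions
    # about current information as needing a web search.
    state["needs_search"] = "latest" in state["question"].lower()
    return state

def web_search(state: State) -> State:
    # Stand-in for the Tavily call, invoked only when required.
    state["context"] = f"search results for: {state['question']}"
    return state

def final_analysis(state: State) -> State:
    # Combine image understanding with any fetched context, then persist.
    ctx = state.get("context", "image only")
    state["answer"] = f"answer({ctx})"
    state["history"].append((state["question"], state["answer"]))
    return state

def run(state: State) -> State:
    # Linear pipeline with one conditional branch, mirroring the steps above.
    for node in (load_memory, analyze_query):
        state = node(state)
    if state["needs_search"]:
        state = web_search(state)
    return final_analysis(state)
```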
- `GET /`: Health check endpoint
- `POST /chat/`: Synchronous image analysis endpoint
- `POST /chat/stream/`: Streaming image analysis endpoint
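A client can exercise these endpoints over plain HTTP. The payload below is a sketch: the actual field names are defined by the Pydantic models in `app/models.py`, so treat `image` and `question` as assumptions:

```python
import base64
import json
import urllib.request

def build_chat_payload(image_bytes: bytes, question: str) -> bytes:
    # Assumed payload shape; check app/models.py for the real field names.
    body = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "question": question,
    }
    return json.dumps(body).encode("utf-8")

def post_chat(payload: bytes, url: str = "http://localhost:8000/chat/") -> str:
    # Synchronous call; use /chat/stream/ for incremental responses.
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```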
- FastAPI for REST API framework
- LangChain and LangGraph for LLM orchestration
- Google Generative AI (Gemini 2.5 Flash)
- Tavily Python SDK for web search
- Uvicorn ASGI server
- Angular 17+ with TypeScript
- Tailwind CSS for styling
- Nx for monorepo management
- Jest for unit testing
- Server-side rendering support
- Google Gemini 2.5 Flash for vision and language
- LangGraph for stateful workflow management
- Memory persistence with MemorySaver
The backend uses a modular architecture with clear separation of concerns:
- `app/main.py`: FastAPI application and route definitions
- `app/models.py`: Pydantic models for request/response validation
- `app/config.py`: Environment configuration loading
- `app/graphs/image_analyzer.py`: Core LangGraph workflow implementation
- `app/core/states.py`: TypedDict state definitions
- `app/tools/web_search.py`: Tavily search integration
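To illustrate the request/response shapes (the actual definitions live in `app/models.py` and use Pydantic; the field names below are assumptions), a dependency-free sketch with dataclasses:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChatRequest:
    """Assumed request shape for POST /chat/ (see app/models.py)."""
    image: str                       # base64-encoded image data
    question: str                    # natural-language query about the image
    thread_id: Optional[str] = None  # ties follow-ups to prior turns

@dataclass
class ChatResponse:
    """Assumed response shape."""
    answer: str
    sources: List[str] = field(default_factory=list)  # web-search citations
```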
The frontend follows Angular best practices with Nx tooling:
- Component-based architecture
- Reactive state management
- Lazy loading for performance
- Responsive design patterns
- Comprehensive test coverage
The user interface features an intuitive design with seamless AI integration, allowing users to upload images and interact through natural language queries.
This demonstration shows the complete workflow from image upload to AI-powered analysis with integrated web search capabilities, highlighting the real-time response streaming.

