A Streamlit application that combines video analysis and web search using Google's Gemini 2.5 models. The agent analyzes uploaded videos and answers questions by combining visual understanding with web search results.
- Video analysis using Gemini 2.5 Flash/Pro
- Web research integration via DuckDuckGo
- Support for multiple video formats (MP4, MOV, AVI)
- Real-time video processing
- Combined visual and textual analysis
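
As a rough illustration of the format support listed above, the sketch below checks a file name against the three listed extensions. The helper `is_supported_video` and the constant `SUPPORTED_VIDEO_EXTENSIONS` are hypothetical names used only for this example; the app itself may validate uploads differently (for instance, Streamlit's `st.file_uploader` accepts a `type` list of allowed extensions).

```python
from pathlib import Path

# Extensions taken from the feature list in this README.
# Hypothetical constant: the app's real validation may differ.
SUPPORTED_VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def is_supported_video(filename: str) -> bool:
    """Return True if the file name ends in one of the listed video extensions.

    The comparison is case-insensitive, so "clip.MP4" is accepted.
    """
    return Path(filename).suffix.lower() in SUPPORTED_VIDEO_EXTENSIONS


if __name__ == "__main__":
    for name in ("demo.MP4", "holiday.mov", "notes.txt"):
        print(name, is_supported_video(name))
```

A check like this only inspects the file name, not the file contents, so it is a convenience filter rather than a guarantee that the video can be decoded.
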
- Clone the GitHub repository
git clone https://github.com/Hasan8123/Multimodal-AI-Agent.git
cd Multimodal-AI-Agent
- Install the required dependencies:
pip install -r requirements.txt
- Get your Google Gemini API Key
- Sign up for a Google AI Studio account and obtain your API key.
- Set your Gemini API key as an environment variable
GOOGLE_API_KEY=your_api_key_here
- Run the Streamlit App
streamlit run multimodal_agent.py
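
Since the setup steps above depend on `GOOGLE_API_KEY` being set, a small startup check can make a missing key easier to diagnose. The following is a minimal sketch, not the app's actual code: `load_gemini_api_key` is a hypothetical helper, and the optional `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def load_gemini_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Gemini API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration: reads GOOGLE_API_KEY from the
    given mapping, defaulting to the process environment.
    """
    source = os.environ if env is None else env
    # Strip whitespace so a value like " abc" (space after '=') still works.
    key = source.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set. Obtain a key from Google AI Studio "
            "and set it as an environment variable before running the app."
        )
    return key


if __name__ == "__main__":
    try:
        load_gemini_api_key()
        print("GOOGLE_API_KEY found.")
    except RuntimeError as err:
        print(f"Configuration error: {err}")
```

Calling a check like this before starting the agent surfaces a configuration problem immediately instead of at the first model request.
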