
Hasan8123/Multimodal-AI-Agent


🧬 Multimodal AI Agent

A Streamlit application that combines video analysis and web search using Google's Gemini 2.5 models. The agent analyzes uploaded videos and answers questions by combining visual understanding with web search results.

Features

  • Video analysis using Gemini 2.5 Flash/Pro
  • Web research integration via DuckDuckGo
  • Support for multiple video formats (MP4, MOV, AVI)
  • Real-time video processing
  • Combined visual and textual analysis
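An upload check can enforce the supported formats listed above. The sketch below is a hypothetical helper: `is_supported_video` and `SUPPORTED_VIDEO_EXTENSIONS` are illustrative names, not taken from the repository. It accepts only the MP4, MOV, and AVI extensions, case-insensitively.

```python
from pathlib import Path

# Hypothetical constant mirroring the formats listed in the Features section.
SUPPORTED_VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def is_supported_video(filename: str) -> bool:
    """Return True if the file's extension is one of the supported video formats.

    The comparison is case-insensitive, so "clip.MP4" is accepted.
    Only the extension is checked, not the actual container contents.
    """
    return Path(filename).suffix.lower() in SUPPORTED_VIDEO_EXTENSIONS
```

In a Streamlit app, a similar restriction can also be passed to `st.file_uploader` through its `type` argument, for example `type=["mp4", "mov", "avi"]`. Neither approach verifies that the file is actually a valid video.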

How to Get Started

  1. Clone the GitHub repository:
git clone https://github.com/Hasan8123/Multimodal-AI-Agent.git
cd Multimodal-AI-Agent
  2. Install the required dependencies:
pip install -r requirements.txt
  3. Get a Google Gemini API key (for example, from Google AI Studio).
  4. Set your API key as the GOOGLE_API_KEY environment variable:
export GOOGLE_API_KEY=your_api_key_here
  5. Run the Streamlit app:
streamlit run multimodal_agent.py
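The setup above assumes the app reads the key from the `GOOGLE_API_KEY` environment variable. A minimal sketch of that lookup follows; the function name `get_google_api_key` is hypothetical. It fails fast with a readable error instead of letting the first Gemini call fail with an authentication error.

```python
import os


def get_google_api_key() -> str:
    """Read the Gemini API key from the GOOGLE_API_KEY environment variable.

    Surrounding whitespace is stripped, so a stray space (for example from
    "GOOGLE_API_KEY= key" in a .env file) does not end up in the key.
    Raises RuntimeError if the variable is unset or empty.
    """
    key = os.environ.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set. Export it before running "
            "`streamlit run multimodal_agent.py`."
        )
    return key
```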

About

This Multimodal AI Agent is a Streamlit application that uses Gemini 2.0 Flash to analyze video content alongside real-time web research. Users upload a video, ask a question, and receive an answer that combines visual insights from the video with current information from the web.
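One way to combine the two sources is to fold the web search results into the text prompt that accompanies the video. The sketch below assumes the search results have already been reduced to short text snippets. `build_analysis_prompt` and its wording are hypothetical and are not the repository's actual prompt.

```python
from typing import List


def build_analysis_prompt(user_query: str, web_snippets: List[str]) -> str:
    """Build a single text prompt that pairs the user's question with web results.

    The video itself is assumed to be sent to the model separately; this
    function only prepares the accompanying instructions and context.
    """
    query = user_query.strip()
    if not query:
        raise ValueError("user_query must not be empty")

    lines = [
        "Analyze the uploaded video and answer the question below.",
        "Use the web search results as supporting context and say when you rely on them.",
        "",
        f"Question: {query}",
        "",
        "Web search results:",
    ]
    if web_snippets:
        # Number each snippet so the model can refer to individual results.
        lines.extend(f"{i}. {s.strip()}" for i, s in enumerate(web_snippets, start=1))
    else:
        lines.append("(none)")
    return "\n".join(lines)
```

The resulting string would be sent to Gemini together with the uploaded video. The exact SDK call the app makes is not shown here.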
