Skip to content

saadkh1/Textual-and-Visual-Image-Retrieval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

CLIP-Based Image and Text Retrieval

This project demonstrates how to leverage the CLIP model to encode images and text descriptions into embeddings, enabling efficient retrieval of images based on text queries or other image inputs.

Overview

CLIP (Contrastive Language–Image Pre-Training) is a powerful model developed by OpenAI that creates a shared vector space for images and text. This allows for a variety of tasks such as retrieval (finding the image most similar to a given query) and zero-shot classification (identifying which labels fit an image best).

Key Components

  • Image Encoder: Generates a vector representation from an image.
  • Text Encoder: Generates a vector representation from a text description.

By encoding both images and text into the same vector space, CLIP enables effective similarity comparisons between images and text.

CLIP Illustration

For more information about the CLIP model, please visit the official CLIP repository and the FashionCLIP repository.

Image Retrieval

How Retrieval Works

  1. Encoding the Query: Encode a search query using the FashionCLIP text encoder or an image using the FashionCLIP image encoder.
  2. Finding Similarity: Compare the encoded query or image with image vectors using a dot product. Higher dot product values indicate greater similarity between text and image.

Usage

This notebook is also available on Google Colab. You can access it here.

Training the CLIP Model

For those interested in learning how to train the CLIP model, please refer to my other repository: CLIP Dual Encoder.

Applications

This notebook can be used for various applications, including:

  • Image Retrieval: Find images that match a given text query.
  • Recommendation Systems: Leverage image and text similarity capabilities for personalized recommendations.
  • Zero-Shot Classification: Identify suitable labels for an image without needing a pre-trained classifier for each label.

About

Utilizing CLIP Architecture for Efficient Image Retrieval: Seamlessly Find Similar Images or Retrieve Images from Textual Queries

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors