This directory contains scripts for generating embeddings from various multi-modal models. These embeddings are used for downstream retrieval tasks.
**clip_model.py** — generates CLIP embeddings for images or text.

```
python clip_model.py --modality [text|image] --dataset [coco|flickr]
```

**flava_model.py** — generates FLAVA embeddings for images, text, or both.

```
python flava_model.py --dataset [coco|flickr] --modality [image|text|both] --batch-img [int] --batch-text [int]
```

**miniLM_model.py** — generates MiniLM embeddings (text only).

```
python miniLM_model.py --modality text --dataset [coco|flickr]
```

**test_preflmr.py** — runs PreFLMR indexing and embedding generation.

```
python test_preflmr.py --dataset [coco|flickr] --checkpoint [path] --image-processor [path] --index-root [path] --experiment [name] --index-name [name] --nbits [int] --doc-maxlen [int] --use-gpu
```

**uniir_model.py** — generates UniIR embeddings, with support for different variants (CLIP-SF, BLIP-FF).

```
python uniir_model.py --dataset [coco|flickr] --variant [clip_sf|blip_ff] --modality [all|image|text|joint] --batch-img [int] --batch-text [int] --fp16 --w3 [float] --w4 [float] --suffix [str]
```

**uniir_model_new.py** — newer version of the UniIR embedding-generation script.

```
python uniir_model_new.py --dataset [coco|flickr] --modality [all|image|text|joint] --batch-img [int] --batch-text [int] --fp16 --w3 [float] --w4 [float]
```
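The downstream retrieval step mentioned above is, at its simplest, a nearest-neighbor search over the generated embeddings by cosine similarity. The sketch below illustrates the idea with toy vectors and pure Python; it is not part of these scripts, and the names (`cosine`, `retrieve`, the toy `corpus`) are illustrative only — a real pipeline would load the saved embedding files and use a vectorized or indexed search (e.g. NumPy or FAISS) instead.

```python
# Toy sketch of cosine-similarity retrieval over precomputed embeddings.
# The vectors here are placeholders; in practice they would be loaded
# from the files produced by the scripts above.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, corpus, top_k=2):
    """Return the top_k corpus ids ranked by similarity to the query."""
    scores = {item_id: cosine(query, vec) for item_id, vec in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical image embeddings keyed by id.
corpus = {
    "img_0": [1.0, 0.0, 0.0],
    "img_1": [0.0, 1.0, 0.0],
    "img_2": [0.9, 0.1, 0.0],
}
query = [1.0, 0.05, 0.0]  # e.g. a text embedding in the same space
print(retrieve(query, corpus))  # -> ['img_0', 'img_2']
```

Cross-modal retrieval (text-to-image or image-to-text) works the same way, provided the query and corpus embeddings come from a shared embedding space, as with CLIP, FLAVA, or UniIR.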