[IMPROVE] Embedding model encoding time for search and file upload #441
Open
Description
Background
Search and file upload both call TransformerModel.get_instance() directly at request time, so the first request after startup pays the cost of loading the model.
Current State
balancer-main/server/api/services/sentencetTransformer_model.py
Lines 1 to 22 in 75c1a14
    from sentence_transformers import SentenceTransformer
    import logging

    logger = logging.getLogger(__name__)


    class TransformerModel:
        _instance = None

        def __new__(cls):
            if cls._instance is None:
                logger.info("Loading SentenceTransformer model")
                cls._instance = super(TransformerModel, cls).__new__(cls)
                cls._instance.model = SentenceTransformer(
                    'paraphrase-MiniLM-L6-v2')
            return cls._instance

        @classmethod
        def get_instance(cls):
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance
Acceptance Criteria
- The model is warm (already loaded) before the first search or upload request arrives.
Approach
- Preload the SentenceTransformer model at Django startup, before traffic is routed to the application instance.
- Add tests for the embedding services, pulling the core logic apart from model loading to make it easier to test.
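The two points above could be sketched roughly as follows. This is a minimal illustration, not the code in the related PR: `EmbeddingModelHolder`, `preload()`, and the stand-in loader are hypothetical names, and the real loader would be something like `lambda: SentenceTransformer('paraphrase-MiniLM-L6-v2')`. Injecting the loader lets tests run without downloading a model. In Django, one place such a `preload()` call might live is an `AppConfig.ready()` method, though where exactly to hook it in is a deployment decision.

```python
import logging
import threading
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class EmbeddingModelHolder:
    """Hypothetical holder: separates how the model is built from when it is loaded."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader              # injected so tests can pass a cheap fake
        self._model: Optional[Any] = None
        self._lock = threading.Lock()      # guards against two threads loading at once

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it on first use if it is not warm yet."""
        if self._model is None:
            with self._lock:
                if self._model is None:    # re-check after acquiring the lock
                    logger.info("Loading embedding model")
                    self._model = self._loader()
        return self._model

    def preload(self) -> None:
        """Load the model eagerly; intended to be called once at startup."""
        self.get()


# Stand-in loader that counts how often it runs (assumption: replaces the real model).
load_calls = {"n": 0}


def fake_loader() -> str:
    load_calls["n"] += 1
    return "fake-model"


holder = EmbeddingModelHolder(fake_loader)
holder.preload()                 # startup: the model becomes warm here
first = holder.get()             # request time: no further loading
second = holder.get()
```

With this split, a test can assert that the loader runs exactly once and that request-time calls reuse the warm instance.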
References
Risks and Rollback
If preloading the embedding model at startup fails, fall back to lazy loading by wrapping the preload in a try/except block.
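A rough sketch of that fallback, under the assumption that a failed preload should be logged but must not abort startup. `LazyModel`, `preload_at_startup`, and `flaky_loader` are hypothetical names used only for illustration; the loader simulates one failed attempt followed by a successful one.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class LazyModel:
    """Hypothetical wrapper: caches the model after the first successful load."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None

    def get(self) -> Any:
        if self._model is None:
            # If the loader raises, nothing is cached and the next call retries.
            self._model = self._loader()
        return self._model


def preload_at_startup(lazy_model: LazyModel) -> bool:
    """Try to warm the model; on failure, log and leave loading to the first request."""
    try:
        lazy_model.get()
    except Exception:
        logger.exception("Embedding model preload failed; falling back to lazy load")
        return False
    return True


# Stand-in loader (assumption): fails on the first attempt, succeeds afterwards.
attempts = {"n": 0}


def flaky_loader() -> str:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("simulated load failure")
    return "fake-model"


lazy = LazyModel(flaky_loader)
warmed = preload_at_startup(lazy)   # startup attempt fails, application keeps running
model = lazy.get()                  # first request triggers the lazy load instead
```

Returning a boolean from the preload step is one option for surfacing the failure to health checks or metrics; the issue itself does not prescribe this.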
Screenshots / Recordings
Related PR
Balancer PR #461