This project is a transformer-based framework for generating dense and sparse vector embeddings of text and multimodal data. It serves as a library for fine-tuning models to perform semantic similarity tasks, retrieval, and reranking.
The system is distinguished by its support for diverse architectural patterns, including bi-encoders for fast similarity search and cross-encoders for high-precision reranking. It provides dedicated pipelines for multimodal embeddings, mapping text and images into a shared vector space, and implements knowledge distillation to compress large models into smaller, lower-latency versions.
The framework covers a broad range of capabilities including model training and optimization, semantic search execution, and text analysis. It includes tools for contrastive-loss training, negative mining, and multilingual model extensions, as well as utilities for semantic clustering, paraphrase identification, and extractive summarization.
Users can publish trained weights and configurations to a central model hub for versioning and sharing.