This project is a framework for training and deploying transformer-based models that map text, images, audio, and video into dense or sparse vector representations. It serves as a multimodal embedding library and semantic search engine: relevant documents are retrieved by comparing the similarity of their embedding vectors to the query's, so matching is based on meaning rather than exact wording. The framework also provides specialized tools for cross-encoder reranking, which computes more precise relevance scores for query-document pairs to refine search results, and for vector quantization, which compresses embedding vectors to reduce memory usage and increase retrieval speed. The project covers