1 repository
Workflows for cleaning, slicing, and normalizing unstructured documents for embedding.
Distinct from Document Stores: covers the preparation and slicing of raw text before storage, not the database storage mechanism itself.
Explore 1 GitHub repository in Data & Databases · Document Preprocessing Pipelines.
llm-universe is a structured learning resource and technical guide focused on the development of large language model applications. It serves as a curriculum for mastering model orchestration, the creation of autonomous conversational agents, and the implementation of retrieval-augmented generation systems. The project provides detailed instructions on connecting model APIs with memory and tools to create execution chains. It specifically covers the construction of retrieval pipelines, including the process of cleaning raw documents, generating embeddings, and integrating vector databases to
Provides detailed instructions on cleaning and slicing diverse document types before storing them in vector databases.
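As a rough illustration of what "cleaning and slicing" can look like before text reaches a vector database, here is a minimal Python sketch. It is not taken from llm-universe: the function names `clean_text` and `split_text`, the character-based chunking, and the default sizes are all assumptions chosen for illustration.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize Unicode and collapse stray whitespace (hypothetical helper)."""
    # NFKC folds compatibility characters (e.g. full-width letters) to canonical forms.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim whitespace around line breaks and collapse blank lines.
    text = re.sub(r"\s*\n\s*", "\n", text)
    return text.strip()


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Slice text into fixed-size character chunks that overlap (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text,
        # so no trailing chunk merely repeats the overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    cleaned = clean_text("Hello   world \n\n  foo\t bar  ")
    for chunk in split_text(cleaned, chunk_size=8, overlap=2):
        print(repr(chunk))
```

The overlap keeps some shared context between neighboring chunks, which is a common choice when chunks are embedded independently. Real pipelines often split on sentence or paragraph boundaries rather than raw character counts.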