LEANN is a framework for local retrieval-augmented generation (RAG) and vector indexing. It is used to build local knowledge bases and source code search engines that combine large language models with retrieved private data to generate context-aware responses.
The project distinguishes itself with two components: a vision-model-based document layout extractor that parses complex figures and diagrams in PDFs, and a source code search engine that uses structure-aware chunking to preserve function and class boundaries. It also implements the Model Context Protocol to integrate real-time data sources into the retrieval pipeline.
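To illustrate what structure-aware chunking means in practice, here is a minimal sketch that splits Python source at top-level function and class boundaries using the standard-library `ast` module. It is not LEANN's actual implementation; the function name `chunk_python_source` and the shape of the returned dictionaries are assumptions made for this example.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class.

    Hypothetical helper for illustration only; not part of LEANN's API.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            chunks.append(
                {
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start_line": start,
                    "end_line": end,
                    "text": "\n".join(lines[start - 1 : end]),
                }
            )
    return chunks
```

Each chunk holds a complete definition, so a function body is never cut in half the way it can be with fixed-size windows. A fuller chunker would also need to handle module-level code, very long classes, and languages other than Python.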
The system provides hybrid retrieval that combines semantic search, exact keyword matching, and boolean metadata filtering. It can index diverse data sources, including web browsing history, communication logs, and technical documentation.
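One way such a hybrid query could work is sketched below: metadata filters remove non-matching documents first, then a weighted sum of cosine similarity and keyword overlap ranks the rest. This is a toy illustration under stated assumptions, not LEANN's API. The `Doc` class, `hybrid_search`, the `alpha` weight, and the equality-only filter are all hypothetical, and the embeddings are hand-written vectors rather than model outputs.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(docs, query_embedding, keywords=(), filters=None, alpha=0.7, top_k=3):
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score.

    Hypothetical helper: `filters` is an AND of exact metadata matches.
    """
    filters = filters or {}
    scored = []
    for doc in docs:
        # Metadata filter: skip documents that fail any required key/value.
        if any(doc.metadata.get(k) != v for k, v in filters.items()):
            continue
        semantic = cosine(query_embedding, doc.embedding)
        # Exact keyword match on whitespace-separated, lowercased tokens.
        words = set(doc.text.lower().split())
        keyword = (
            sum(kw.lower() in words for kw in keywords) / len(keywords)
            if keywords
            else 0.0
        )
        scored.append((alpha * semantic + (1 - alpha) * keyword, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


docs = [
    Doc("reset the router password", [1.0, 0.0], {"source": "docs"}),
    Doc("chat about lunch plans", [0.0, 1.0], {"source": "chat"}),
    Doc("router firmware update notes", [0.6, 0.8], {"source": "docs"}),
]
hits = hybrid_search(docs, [1.0, 0.0], keywords=["router"], filters={"source": "docs"})
```

Here the chat log entry is excluded by the metadata filter, and the two remaining documents are ordered by the combined score. Real systems typically replace the keyword term with a proper lexical scorer such as BM25 and tune the weighting empirically.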