Why is opendataloader-project/opendataloader-pdf a recommended repository in the Document Splitters category?

Functions as a document loader that integrates structured PDF content into the LangChain orchestration framework.

Why is mikefarah/yq a recommended repository in the Document Splitters category?

Separates individual items from a collection into distinct documents within a single output stream.

Why is tmc/langchaingo a recommended repository in the Document Splitters category?

Ships a pipeline of loaders and text splitters to transform diverse file formats into chunked data.
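To make "text splitters ... into chunked data" concrete, here is a minimal, stdlib-only Go sketch of fixed-size chunking with overlap, one common splitting strategy. It is not langchaingo's API: `chunkText`, `chunkSize`, and `overlap` are hypothetical names, and the sketch measures size in runes rather than tokens.

```go
package main

import (
	"fmt"
)

// chunkText is a hypothetical helper that splits text into chunks of at most
// chunkSize runes, where consecutive chunks share `overlap` runes so that
// context spanning a boundary appears in both chunks. Working on runes (not
// bytes) avoids cutting multi-byte UTF-8 characters in half.
func chunkText(text string, chunkSize, overlap int) ([]string, error) {
	if chunkSize <= 0 {
		return nil, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	if overlap < 0 || overlap >= chunkSize {
		return nil, fmt.Errorf("overlap must be in [0, chunkSize), got %d", overlap)
	}
	runes := []rune(text)
	step := chunkSize - overlap // how far each new chunk advances
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // this chunk reached the end of the text
		}
	}
	return chunks, nil
}

func main() {
	chunks, err := chunkText("abcdefghij", 4, 1)
	if err != nil {
		panic(err)
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

With a chunk size of 4 and an overlap of 1, "abcdefghij" yields "abcd", "defg", and "ghij". A recursive splitter would additionally try to break on paragraph, line, or word boundaries; this sketch cuts at fixed positions for simplicity.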

3 repositories


Tools for separating collection items into distinct documents within a stream.

Distinct from Document Subscriptions: no related category applied, because splitting documents is a core data-processing function.

Explore 3 awesome GitHub repositories matching data & databases · Document Splitters.


opendataloader-project/opendataloader-pdf
GitHub stars: 25,769
This project is a PDF data extraction tool and document preprocessor designed to convert PDF files into structured formats such as Markdown, JSON, and HTML. It functions as an OCR document parser for scanned files, an accessibility automator for generating PDF/UA compliant metadata, and a loader for AI orchestration frameworks like LangChain. The software distinguishes itself through specialized handling of complex document elements, including the conversion of mathematical formulas into LaTeX and the generation of natural-language descriptions for charts and images. It utilizes recursive seg…
Functions as a document loader that integrates structured PDF content into the LangChain orchestration framework.
Java · a11y · accessibility · ai
mikefarah/yq
GitHub stars: 14,913
This tool is a command-line processor designed for querying, updating, and transforming structured data files. It functions as a versatile engine for manipulating YAML, JSON, TOML, and XML documents, allowing users to perform complex operations directly from the terminal. By utilizing a path-based expression language, it enables precise navigation and modification of data structures within configuration files and infrastructure-as-code workflows. What distinguishes this tool is its ability to perform in-place document mutations while preserving original formatting, comments, and metadata. It…
Separates individual items from a collection into distinct documents within a single output stream.
Go · bash · cli · csv
tmc/langchaingo
GitHub stars: 9,416
langchaingo is an LLM application framework for Go designed for building language model-powered applications and autonomous agents. It serves as an orchestration library and tool integration framework that allows developers to link prompt sequences and model calls into complex, multi-step workflows. The project provides a toolkit for implementing retrieval-augmented generation pipelines by processing unstructured documents and retrieving relevant context via vector search. It includes a dedicated integration layer for indexing high-dimensional embeddings and performing similarity searches acr…
Ships a pipeline of loaders and text splitters to transform diverse file formats into chunked data.
Go

Awesome Document Splitters GitHub Repositories

opendataloader-project/opendataloader-pdf

mikefarah/yq

tmc/langchaingo
