3 Repos
Converting diverse dialogue and QA datasets into a consistent multi-turn conversation structure.
Distinct from Data Format Converters: Distinct from tensor or trace normalization; focuses on natural language dialogue structure.
Explore 3 awesome GitHub repositories matching data & databases · Conversation Format Normalization. Refine with filters or upvote what's useful.
This project is a collection of educational resources and technical guides focused on the development and implementation of large language models. It provides a comprehensive curriculum covering transformer architectures, training methods, and deployment strategies. The materials provide detailed instructions for building autonomous agents using reasoning loops and tool integration, as well as guides for fine-tuning models through supervised learning and preference optimization. It also includes tutorials for constructing retrieval augmented generation pipelines and implementing transformer m
Applies templates and control tokens to structure multi-turn dialogues for consistent model interaction.
AdalFlow is an autonomous AI agent framework and LLM application library designed for building modular workflows. It serves as a model-agnostic interface and RAG pipeline orchestrator, allowing users to develop ReAct agents that utilize iterative reasoning and external tool execution to solve complex tasks. The project distinguishes itself through a prompt optimization system that uses textual gradient descent to automatically refine prompt templates and few-shot examples. It treats model feedback as a differentiable signal, enabling a form of LLM backpropagation to iteratively improve output
Converts raw chat completion streams into a standardized format for consistent event handling.
MNBVC is a dataset pipeline and toolkit designed for the collection, cleaning, and normalization of massive text and code corpora used to train large language models. It provides specialized tools for harvesting source code, commit histories, and repository metadata from version control platforms, alongside a multilingual text corpus collector for gathering parallel text and academic papers. The project distinguishes itself through comprehensive capabilities for processing diverse document types, including a PDF-to-text converter that transforms complex layouts and formulas into structured JS
Converts specialized test data into a consistent multi-turn conversation format.