ParlAI

ParlAI is a conversational AI research framework designed for training, evaluating, and sharing dialogue models using a unified interface for datasets and agents. It functions as a PyTorch-based training platform and a dialogue data collection system, providing a centralized model zoo for the distribution of versioned pretrained agents.

The project distinguishes itself through a knowledge-grounded retrieval system that combines dense and sparse indexing to ground responses in external information. It also provides a comprehensive infrastructure for gathering human-AI interaction data via integrated crowdsourcing workflows, comparative evaluations, and human-model chat facilitation.

The framework covers a broad range of capabilities, including multimodal dialogue development for visual content, safety classification for toxicity detection, and complex model evaluation through self-chat simulations. It supports diverse data management tasks such as disk-based dataset streaming, multi-task weighted sampling, and the implementation of custom teacher agents.

The system is implemented in Python and utilizes a centralized registry to manage pretrained model checkpoints and metadata.

Features

Conversational AI Frameworks - Provides a comprehensive framework for building, training, and evaluating interactive conversational AI models.

Conversational Model Training - Provides a comprehensive platform for training dialogue agents on specified datasets with configurable hyperparameters.

Sparse-Dense Hybrid Retrievers - Combines sparse TF-IDF and dense FAISS indexes to ground conversational responses in external knowledge sources.
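
As a rough illustration of how sparse and dense scores might be combined, the sketch below interpolates a TF-IDF cosine score with a dense dot-product score over a toy corpus. It does not use ParlAI or FAISS: the names `tfidf_vectors`, `cosine`, and `hybrid_rank`, the `alpha` weight, and the hand-written dense vectors are all assumptions made for the example, and an exhaustive dot product stands in for a FAISS index lookup.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (token -> weight) for tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency, so every weight stays positive.
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    vecs = [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query_tokens, query_dense, docs, doc_dense, alpha=0.5):
    """Rank document indices by alpha * sparse + (1 - alpha) * dense (hypothetical rule)."""
    vecs, idf = tfidf_vectors(docs)
    q_sparse = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_tokens).items()}
    scored = []
    for i, (sparse_vec, dense_vec) in enumerate(zip(vecs, doc_dense)):
        sparse = cosine(q_sparse, sparse_vec)
        # Exhaustive dot product; a real system would query a dense index instead.
        dense = sum(x * y for x, y in zip(query_dense, dense_vec))
        scored.append((alpha * sparse + (1 - alpha) * dense, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

Linear interpolation is only one way to merge the two signals; the source does not say which combination rule the framework uses.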

Conversational Response Generation - Produces AI-driven conversational responses using sequence-to-sequence generators.

Long-term Memory Stores - Implements persistent storage mechanisms to retain personal knowledge and conversation history across multiple dialogue turns.

Response Grounding - Synthesizes responses grounded in verified facts retrieved from external knowledge bases.

Hybrid Retrieval Engines - Combines FAISS and TF-IDF indexing to ground conversational responses in external information sources.

Agent Class Abstractions - Enables the creation of custom models by inheriting base agent classes and defining specific training and evaluation logic.

Pretrained Agent Execution - Enables the execution of saved agent models within target environments to observe and evaluate behavioral performance.

Streaming Dataset Loaders - Streams large dialogue datasets from disk in chunks to maintain memory efficiency during training.
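
The idea of chunked streaming can be sketched with a plain generator. This is an illustration only, not ParlAI's loader: `stream_chunks` and `chunk_size` are hypothetical names, and the file is assumed to hold one example per line.

```python
from itertools import islice


def stream_chunks(path, chunk_size=1000):
    """Yield lists of at most chunk_size lines without reading the whole file."""
    with open(path, encoding="utf-8") as f:
        while True:
            # islice pulls only the next chunk_size lines from the file iterator.
            chunk = [line.rstrip("\n") for line in islice(f, chunk_size)]
            if not chunk:
                return
            yield chunk
```

Because each chunk is discarded once the caller moves on, peak memory depends on `chunk_size` rather than on the size of the dataset.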

Generative Model Training Tools - Provides a command-line interface for training and fine-tuning generative models on custom datasets.

Pre-trained Model Zoos - Maintains a centralized repository of pretrained neural network architectures ready for deployment or fine-tuning.

Model Registries - Provides a centralized registry for distributing pretrained models with versioned checkpoints and metadata.

Dialogue Task Abstractions - Defines dialogue tasks as teacher agents that provide observations and labels through a standardized interface.

Pretrained Model Zoos - Provides programmatic access to download and load pretrained dialogue models from a shared repository.

Multi-Task Sampling - Supports training and evaluation across multiple datasets by sampling tasks according to configurable weights.
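
Weighted task sampling can be sketched in a few lines of standard-library Python. The function `sample_tasks` and the task names in the usage note are hypothetical; the snippet only shows the sampling rule, not how the framework schedules batches.

```python
import random


def sample_tasks(weights, num_samples, seed=0):
    """Draw task names with probability proportional to their weights."""
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=num_samples)
```

With weights such as `{"task_a": 1, "task_b": 3}`, roughly three out of four draws should come from `task_b`.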

PyTorch Training Loops - Implements PyTorch-based training loops for conversational agents with support for multi-task weighted sampling.

Model Evaluation and Benchmarking - Offers integrated tools for measuring agent performance through automated benchmarks and human-in-the-loop evaluation workflows.

Model Fine-Tuning - Optimizes pretrained transformer models for new dialogue tasks by initializing from stored zoo checkpoints.

Dialogue Data Collectors - Provides a comprehensive system for gathering human-AI interaction data via crowdsourcing and interactive chat interfaces.

Unified Data Access Interfaces - Provides a unified interface to load and stream diverse conversational datasets across different sources.

Custom Teacher Agents - Provides an abstraction for teacher agents that format and serve dataset examples to student models.

Dialogue Episode Loops - Manages multi-turn conversation flows using a world object to control turn order and termination.
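
A stripped-down version of such a loop is sketched below. It does not import ParlAI: `ToyTeacher`, `EchoStudent`, `ToyWorld`, and `run_episode` are stand-ins written for this example, and the message fields (`text`, `labels`, `episode_done`) are assumed here to be plain dictionary keys.

```python
class ToyTeacher:
    """Serves (text, label) pairs from one scripted episode."""

    def __init__(self, episode):
        self.episode = episode
        self.index = 0
        self.last_reply = None

    def act(self):
        text, label = self.episode[self.index]
        self.index += 1
        done = self.index == len(self.episode)
        return {"text": text, "labels": [label], "episode_done": done}

    def observe(self, reply):
        self.last_reply = reply


class EchoStudent:
    """Trivial agent that replies with the upper-cased input."""

    def observe(self, message):
        self.last = message

    def act(self):
        return {"text": self.last["text"].upper()}


class ToyWorld:
    """Controls turn order: teacher speaks, student replies, until the episode ends."""

    def __init__(self, teacher, student):
        self.teacher, self.student = teacher, student
        self.done = False
        self.log = []

    def parley(self):
        query = self.teacher.act()
        self.student.observe(query)
        reply = self.student.act()
        self.teacher.observe(reply)
        self.log.append((query["text"], reply["text"]))
        self.done = query["episode_done"]


def run_episode(world):
    """Call parley() until the world reports that the episode is finished."""
    while not world.done:
        world.parley()
    return world.log
```

Keeping turn order and termination in the world object means the agents themselves only need `observe` and `act`.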

Persona-Conditioned Generation - Generates dialogue responses conditioned on specific persona descriptions to maintain character consistency.

Conversational Model Benchmarking - Measures the quality of conversational models against specific validation datasets to assess performance.

Conversational AI Frameworks - Serves as a research framework for training, evaluating, and sharing neural dialogue models using a unified interface.

AI Safety Guardrails - Detects toxic or unsafe content in single-turn and multi-turn conversations using pretrained safety classifiers.

Crowdsourcing Submission Approvers - Programmatically approves or rejects completed crowdsourcing tasks based on specific completion criteria.

Baseline Model Architectures - Runs pre-built dialogue models or human baselines to establish performance benchmarks for comparison.

Batched Response Generation - Produces multiple model completions for fixed utterances by pairing models with datasets and logging the results.

Command-Line Interfaces - Provides a terminal-based conversational interface for interacting with trained AI models in real time.

Retrieval-Augmented Generation - Combines a DPR retriever with a BART generator to produce responses grounded in Wikipedia passages.

Crowdsourced Dialogue Collection - Collects natural language dialogue data from human workers through custom tasks deployed on external platforms.

Pretrained Model Snapshots - Maintains a centralized model zoo providing versioned pretrained checkpoints for immediate deployment and benchmarking.

Automated Dataset Evaluation - Computes standard performance metrics on a held-out dataset to measure model quality after training.

Dialogue Speaker Identification - Predicts which character spoke a given utterance based on conversation history and character profiles.

Dynamic Dialogue Logic - Enables the creation of custom agents that adjust responses based on real-time input instead of fixed logs.

Encoder-Decoder Architectures - Implements sequence-to-sequence networks in which an LSTM-based encoder maps the input sequence to an intermediate representation and an LSTM-based decoder generates the output sequence from it.

External LLM API Wrappers - Integrates external GPT-3 APIs as wrapper agents for dialogue reply generation.

External Language Model Wrappers - Wraps external language model APIs as drop-in agents to integrate them into the research framework.

Crowdsourced Model Evaluators - Integrates dialogue models with external crowdsourcing platforms to collect human judgments.

Human Agent Connectors - Connects human agents to dialogue worlds via messaging platforms to facilitate data collection.

Human-Human Dialogue Collectors - Sets up multi-turn conversations between two human participants to collect natural dialogue data.

Image-Grounded Dialogue Generators - Generates conversational responses conditioned on image inputs for multimodal dialogue.

Interactive AI Conversations - Gathers dialogue data from human participants interacting with a model during a task.

Agentic Interaction Patterns - Defines interaction loops and environments to manage how multiple agents exchange messages in sequences or batches.

Beam Search Implementations - Implements beam search decoding with configurable beam size and n-gram blocking for sequence generation.
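
To make the two knobs concrete, here is a toy beam search with n-gram blocking over a hand-written next-token distribution. It is a sketch under simplifying assumptions, not the framework's decoder: `beam_search`, `has_repeated_ngram`, `toy_step`, and the `block_ngram` parameter name are hypothetical, and no length normalization is applied.

```python
import math


def has_repeated_ngram(tokens, n):
    """Return True if any n-gram occurs more than once in tokens."""
    seen = set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            return True
        seen.add(gram)
    return False


def beam_search(step_fn, beam_size=2, max_len=5, block_ngram=0, eos="</s>"):
    """Keep the beam_size best prefixes per step; drop prefixes that repeat an n-gram."""
    beams = [(0.0, [])]  # (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for tok, logp in step_fn(tokens).items():
                new_tokens = tokens + [tok]
                if block_ngram > 0 and has_repeated_ngram(new_tokens, block_ngram):
                    continue  # n-gram blocking: discard this extension
                candidates.append((score + logp, new_tokens))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            (finished if tokens[-1] == eos else beams).append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # unfinished prefixes still compete at max_len
    return max(finished, key=lambda c: c[0])[1] if finished else []


def toy_step(tokens):
    """Hypothetical model that always prefers to repeat the token 'la'."""
    return {"la": math.log(0.6), "di": math.log(0.3), "</s>": math.log(0.1)}
```

With `block_ngram=0` the toy model repeats its favorite token until `max_len`; with `block_ngram=1` every extension that reuses a token is pruned, so the search is pushed toward a non-repeating sequence that ends in the end-of-sequence token.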

Model Bias Mitigation - Implements specialized training techniques to prevent dialogue models from defaulting to societal or gender biases.

Autoregressive Model Interfaces - Provides a standard interface for autoregressive models to automate their training and evaluation processes.

Dialogue Consistency Optimization - Uses specialized training methods to reduce contradictions and improve the coherence of chatbot responses.

Model Self-Chat Generation - Simulates conversations between two model instances and logs the dialogue for analysis.

Persona-Conditioned Dialogue - Loads and executes dialogue datasets where conversations are conditioned on specific user profiles or personas.

Word Embeddings - Incorporates pretrained fastText or GloVe vectors to initialize model embeddings for improved linguistic representation.

Response Ranking Logic - Ranks candidate replies to select the most appropriate response for a given conversational context.

Conversation Comparators - Presents two full conversations side by side so that human workers can select the better one.

Task-Oriented Dialogue Simulation - Simulates goal-directed conversations by interacting with API schemas to resolve specific user requests.

Training and Testing Splits - Implements dataset partitioning into training, validation, and testing sets via file path suffixes.
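
One way to picture suffix-based splits is a helper that maps a datatype string to a file path. The helper name `split_path`, the `_train.txt` / `_valid.txt` / `_test.txt` naming pattern, and the handling of modifiers such as `:stream` are assumptions for this sketch, not a confirmed description of the framework's behavior.

```python
def split_path(base_path, datatype):
    """Return the file for a split, e.g. 'data/mytask' + 'train' -> 'data/mytask_train.txt'."""
    # Assumed convention: anything after ':' (such as 'stream') is a modifier, not a split name.
    split = datatype.split(":")[0]
    if split not in {"train", "valid", "test"}:
        raise ValueError(f"unknown split: {split!r}")
    return f"{base_path}_{split}.txt"
```

A single base path then selects all three partitions, and switching splits only requires changing the datatype string.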

Vocabulary Generators - Generates token-to-index mappings from task text data for model vectorization.
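
A token-to-index mapping of this kind can be sketched as follows. The functions `build_vocab` and `vectorize`, whitespace tokenization, lowercasing, and the special token names are illustrative assumptions rather than the framework's dictionary implementation.

```python
from collections import Counter


def build_vocab(texts, min_freq=1, specials=("__null__", "__unk__")):
    """Map tokens to indices: special tokens first, then by descending frequency."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {tok: i for i, tok in enumerate(specials)}
    # Break frequency ties alphabetically so the mapping is deterministic.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def vectorize(text, vocab, unk="__unk__"):
    """Convert text to a list of indices, using the unknown-token index for unseen words."""
    return [vocab.get(tok, vocab[unk]) for tok in text.lower().split()]
```

Reserving an unknown-token index lets the model vectorize text containing words that never appeared in the task data.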

Agent Behavior Simulation - Simulates conversations between agents using specific models and personas to evaluate their conversational behavior.

Human-Model Chat Evaluators - Facilitates interactive sessions where humans converse with model agents and annotate responses.

Multimodal Dialogue and Interaction - Supports the development of conversational agents capable of processing and generating responses grounded in visual content.

Model Card Generation - Produces structured documentation describing a model's intended use, training data, and evaluation results via standardized model cards.

Conversation Annotators - Presents pre-recorded conversations to humans for annotating speaker responses using checkboxes.

Dataset Loading - Provides specialized mechanisms for loading conversational datasets from JSON files into the research framework.

Crowdsourcing Task Builders - Enables the creation of custom dialogue logic and worker onboarding by subclassing base world and blueprint classes.

Conversational Task Definitions - Defines new conversational tasks by implementing dedicated data building scripts and teacher agents.

Worker Qualification Management - Filters crowdsourced participants through onboarding stages and qualification checks to ensure data quality.

Crowdsourcing Task Blueprints - Implements a YAML-configurable overworld-subworld pattern for defining multi-stage human evaluation tasks.

Interactive Model Interfaces - Provides a live web-based interface to send messages to trained models and inspect generated responses and metadata.

Repository: facebookresearch/ParlAI (archived)
