Instructor | Awesome Repository

Instructor is a framework designed for structured data extraction, validation, and language model integration. It functions as a library that transforms unstructured text into validated, type-safe objects by leveraging schema definitions and model-specific tool-calling capabilities. By acting as a validation middleware, the project ensures that language model outputs strictly conform to defined data structures.

The library distinguishes itself through a robust validation-based retry loop that automatically re-submits failed responses with error feedback to iteratively correct schema compliance. It provides a provider-agnostic client abstraction that normalizes diverse model interfaces into a unified execution layer, while its schema-driven prompt synthesis automatically generates model instructions by introspecting class definitions and field annotations. Additionally, the framework supports polymorphic schema mapping for complex data structures and enables incremental stream processing to yield validated objects in real-time as they are generated.

Beyond its core extraction capabilities, the project offers a comprehensive suite of tools for managing the full lifecycle of model interactions. This includes support for asynchronous execution, multimodal data processing, and extensive observability features such as token usage tracking and event-driven lifecycle hooks. Developers can also utilize built-in mechanisms for caching, safety management, and automated error recovery to maintain reliable production workflows.

The library is distributed as a Python package and provides a unified interface that extends existing client objects without requiring modifications to their original source code.

Features

Structured Data Extraction - Transforms unstructured text into validated, type-safe objects using schema definitions and model-specific tool-calling capabilities.
Structured Output Parsers - Provides a framework for transforming unstructured text into validated, type-safe objects using schema definitions and automated validation-based retry loops.
LLM Integration Frameworks - Provides a unified interface for interacting with diverse language model providers while maintaining consistent extraction and error handling.
Model Provider Interfaces - Provides a unified client interface to interact with diverse language model providers, allowing developers to switch backends without changing core logic.

Features

Structured Data Extraction - Transforms unstructured text into validated, type-safe objects using schema definitions and model-specific tool-calling capabilities.
Structured Output Parsers - Provides a framework for transforming unstructured text into validated, type-safe objects using schema definitions and automated validation-based retry loops.
LLM Integration Frameworks - Provides a unified interface for interacting with diverse language model providers while maintaining consistent extraction and error handling.
Model Provider Interfaces - Provides a unified client interface to interact with diverse language model providers, allowing developers to switch backends without changing core logic.

The library is distributed as a Python package and provides a unified interface that extends existing client objects without requiring modifications to their original source code.