# microsoft/data-formulator

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/microsoft-data-formulator).**

14,907 stars · 1,357 forks · TypeScript · MIT

## Links

- GitHub: https://github.com/microsoft/data-formulator
- Homepage: https://arxiv.org/abs/2408.16119
- awesome-repositories: https://awesome-repositories.com/repository/microsoft-data-formulator.md

## Description

Data Formulator is an automated data analysis and visualization platform that uses large language models to interpret natural language instructions for data preparation and reporting. It functions as an interactive workbench where users can clean, filter, and aggregate datasets while simultaneously generating visual representations. By combining conversational interfaces with automated transformation tools, the system enables users to explore data patterns and refine schemas without manual coding.
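The sketch below roughly illustrates the clean, filter, and aggregate steps described above. It applies a filter and then a group-and-sum over plain row objects. The `Row` type and the `filterRows` and `sumBy` helpers are hypothetical names chosen for this example. They are not Data Formulator's API.

```typescript
// Hypothetical row type: each record maps a column name to a string or number.
type Row = Record<string, string | number>;

// Filter step: keep only the rows for which the predicate returns true.
function filterRows(rows: Row[], keep: (row: Row) => boolean): Row[] {
  return rows.filter(keep);
}

// Aggregate step: group rows by `groupKey` and sum the numeric column `valueKey`.
function sumBy(rows: Row[], groupKey: string, valueKey: string): Row[] {
  const totals = new Map<string | number, number>();
  for (const row of rows) {
    const key = row[groupKey];
    totals.set(key, (totals.get(key) ?? 0) + Number(row[valueKey]));
  }
  // One output row per group, in first-seen order (Map preserves insertion order).
  const result: Row[] = [];
  totals.forEach((total, key) => {
    result.push({ [groupKey]: key, [valueKey]: total });
  });
  return result;
}

// Usage: drop invalid negative amounts, then total the remaining amounts per region.
const sales: Row[] = [
  { region: "east", amount: 10 },
  { region: "west", amount: 5 },
  { region: "east", amount: 7 },
  { region: "west", amount: -1 },
];
const byRegion = sumBy(filterRows(sales, (r) => Number(r.amount) >= 0), "region", "amount");
console.log(byRegion); // east totals 17, west totals 5
```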

The platform distinguishes itself through an agentic architecture that translates natural language queries into executable data transformation scripts. It maintains a reactive pipeline that links data cleaning operations directly to visualization rendering, ensuring that every modification to the underlying structure triggers an immediate visual update. The system also supports structured data extraction, utilizing specialized parsing models to convert unstructured inputs like images, text, and web content into normalized tabular formats.
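The reactive link between data transformations and chart rendering can be modeled as a small publish/subscribe pipeline. Whenever the source rows or the current transformation change, the pipeline recomputes the derived table and pushes it to every subscriber, such as a chart renderer. `ReactivePipeline` and its methods are hypothetical and only illustrate the pattern. They do not show the project's actual state management.

```typescript
// Hypothetical types for a table, a transformation over it, and a downstream consumer.
type Row = Record<string, string | number>;
type Transform = (rows: Row[]) => Row[];
type Listener = (derived: Row[]) => void;

// Minimal reactive pipeline: (source rows, current transform) -> derived rows -> listeners.
class ReactivePipeline {
  private source: Row[];
  private transform: Transform;
  private listeners: Listener[] = [];

  constructor(source: Row[], transform: Transform = (rows) => rows) {
    this.source = source;
    this.transform = transform;
  }

  // Register a consumer such as a chart renderer; it receives the current result right away.
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
    listener(this.transform(this.source));
  }

  // Changing the transformation re-derives the table and notifies every consumer.
  setTransform(transform: Transform): void {
    this.transform = transform;
    this.emit();
  }

  // Changing the underlying data does the same.
  setSource(source: Row[]): void {
    this.source = source;
    this.emit();
  }

  private emit(): void {
    const derived = this.transform(this.source);
    for (const listener of this.listeners) listener(derived);
  }
}

// Usage: a stand-in "chart" that records how many marks it would draw on each render.
const renders: number[] = [];
const pipeline = new ReactivePipeline([{ x: 1 }, { x: 2 }, { x: 3 }]);
pipeline.subscribe((rows) => renders.push(rows.length));
pipeline.setTransform((rows) => rows.filter((r) => Number(r.x) > 1));
pipeline.setSource([{ x: 5 }, { x: 0 }]);
console.log(renders); // 3, then 2, then 1
```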

Beyond its core analysis capabilities, the platform provides a sandboxed environment for secure code execution and supports stateful session serialization to persist interaction history. Users can connect to various data sources, including local files and cloud storage, to ingest data for iterative exploration. The project is written primarily in TypeScript and offers both a conversational interface and command-line automation for managing analysis workflows.
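Session serialization can be modeled as a JSON round trip over a versioned snapshot of analysis threads. The `Session` and `Step` shapes, the `version` field, and the helper names below are assumptions made for illustration. They are not the project's on-disk format.

```typescript
// Hypothetical snapshot format: one entry per step of each analysis thread.
interface Step {
  instruction: string; // the user's natural-language request
  code: string; // the transformation script produced for that request
}

interface Session {
  version: 1; // bump when the format changes so old files can be rejected or migrated
  threads: Step[][];
}

function serializeSession(session: Session): string {
  return JSON.stringify(session);
}

function restoreSession(text: string): Session {
  const parsed = JSON.parse(text) as Partial<Session>;
  // Reject snapshots written in an unknown format instead of failing later.
  if (parsed.version !== 1 || !Array.isArray(parsed.threads)) {
    throw new Error("unsupported or corrupt session snapshot");
  }
  return parsed as Session;
}

// Usage: save a thread, restore it after a "restart", and keep extending it.
const saved = serializeSession({
  version: 1,
  threads: [[{ instruction: "show all rows", code: "return rows;" }]],
});
const restored = restoreSession(saved);
restored.threads[0].push({
  instruction: "only the east region",
  code: "return rows.filter(r => r.region === 'east');",
});
console.log(restored.threads[0].length); // 2
```

Checking the version on restore lets a loader fail fast on an incompatible snapshot rather than resuming a thread with missing fields.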

## Tags

### Data & Databases

- [Data Transformation Tools](https://awesome-repositories.com/f/data-databases/data-transformation-tools.md) — Provides a platform for cleaning and shaping datasets while simultaneously generating interactive charts through natural language instructions and direct manipulation.
- [Data Analysis Environments](https://awesome-repositories.com/f/data-databases/data-analysis-environments.md) — Provides an environment that uses large language models to interpret user queries for iterative data preparation and visual representation tasks.
- [Data Visualization](https://awesome-repositories.com/f/data-databases/data-analysis-visualization/visualization-frameworks-libraries/data-visualization.md) — Generates and refines interactive charts and reports by interpreting user instructions to select the best visual representation for data.
- [Conversational Data Exploration](https://awesome-repositories.com/f/data-databases/data-collections-datasets/conversational-data-exploration.md) — Analyzes datasets through natural language queries to iteratively refine insights and explore data patterns without writing complex code. ([source](https://cdn.jsdelivr.net/gh/microsoft/data-formulator@main/README.md))
- [Analysis Interfaces](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/document-processing-tools/llm-powered-parsers/analysis-interfaces.md) — Uses large language models to interpret natural language instructions for automated data preparation and reporting.
- [Automated Data Extraction](https://awesome-repositories.com/f/data-databases/automated-data-extraction.md) — Employs parsing models to convert unstructured inputs like images or text into normalized tabular formats for analysis.
- [Natural Language Querying](https://awesome-repositories.com/f/data-databases/data-visualization-charts/natural-language-querying.md) — Creates and refines interactive charts and reports by interpreting user instructions to select appropriate chart types and apply stylistic adjustments. ([source](https://cdn.jsdelivr.net/gh/microsoft/data-formulator@main/README.md))
- [Data Visualization Platforms](https://awesome-repositories.com/f/data-databases/data-visualization-platforms.md) — Combines natural language instructions with automated data transformation to clean datasets and generate interactive charts.
- [Data Cleaning Utilities](https://awesome-repositories.com/f/data-databases/data-cleaning-utilities.md) — Prepares raw datasets for analysis by filtering, aggregating, and modifying schemas through a combination of manual and automated tools.
- [Structured Data Extraction](https://awesome-repositories.com/f/data-databases/structured-data-extraction.md) — Parses and converts information from screenshots, websites, text blocks, and images into clean datasets using automated agents. ([source](https://cdn.jsdelivr.net/gh/microsoft/data-formulator@main/README.md))
- [Data Source Connections](https://awesome-repositories.com/f/data-databases/data-integration-synchronization/data-integration/data-source-connections.md) — Establishes reusable connections to databases, warehouses, cloud storage, and local files to ingest structured data for analysis. ([source](https://cdn.jsdelivr.net/gh/microsoft/data-formulator@main/README.md))
- [Data Parsing and Extraction](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-transformation/data-parsing-extraction.md) — Employs specialized parsing models to convert unstructured inputs like images or text into normalized tabular formats for downstream analysis.
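The sketch below makes "normalized tabular formats" concrete. It takes records like those a parsing model might emit, with inconsistent key spelling and missing fields, and folds them into a single table. Keys are trimmed, lower-cased, and snake-cased. The columns are the union of all keys in first-seen order, and absent values become `null`. These normalization rules and the `toTable` helper are hypothetical.

```typescript
type Cell = string | number | null;

interface Table {
  columns: string[];
  rows: Cell[][];
}

// Hypothetical normalization rule: " Birth Year" -> "birth_year".
function normalizeKey(key: string): string {
  return key.trim().toLowerCase().replace(/\s+/g, "_");
}

// Fold loosely structured records into one table: union of columns, null for missing values.
function toTable(records: Array<Record<string, string | number>>): Table {
  const columns: string[] = [];
  const cleaned = records.map((record) => {
    const out: Record<string, string | number> = {};
    for (const key of Object.keys(record)) {
      const column = normalizeKey(key);
      if (columns.indexOf(column) === -1) columns.push(column); // first-seen column order
      out[column] = record[key];
    }
    return out;
  });
  const rows = cleaned.map((rec) =>
    columns.map((column): Cell => (column in rec ? rec[column] : null)),
  );
  return { columns, rows };
}

// Usage: two records as an extraction step might return them, with inconsistent keys.
const table = toTable([
  { Name: "Ada", "Birth Year": 1815 },
  { "name ": "Alan", Field: "CS" },
]);
console.log(table.columns); // name, birth_year, field
console.log(table.rows); // ["Ada", 1815, null] and ["Alan", null, "CS"]
```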

### Artificial Intelligence & ML

- [Interactive Workbenches](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/data-ingestion-preparation/data-preparation-tools/interactive-workbenches.md) — Provides an interactive workspace for filtering, aggregating, and modifying data schemas with immediate visual feedback on charts and graphs.
- [Data Preparation Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/data-ingestion-preparation/data-preparation-tools.md) — Cleans and shapes raw datasets by filtering, aggregating, and modifying schemas with immediate visual feedback on the resulting charts.

### Part of an Awesome List

- [Data Science Agents](https://awesome-repositories.com/f/awesome-lists/ai/data-science-agents.md) — AI-assisted tool for iterative data transformation and visualization.
- [Visualization and Analysis](https://awesome-repositories.com/f/awesome-lists/ai/visualization-and-analysis.md) — AI-assisted tool for iterative data transformation and visualization.

### Development Tools & Productivity

- [LLM-Driven](https://awesome-repositories.com/f/development-tools-productivity/project-scaffolding-config-code-generation/code-generation/llm-driven.md) — Translates natural language instructions into executable data transformation scripts that manipulate datasets and generate visual specifications.
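LLM-driven script generation is often structured as a generate, execute, and repair loop. The model proposes a script, the script runs, and any error message goes into the next request. The sketch below assumes that pattern, which the source does not confirm as Data Formulator's exact agent loop. `GenerateScript` is a hypothetical, synchronous stand-in for a model call and is stubbed here. `new Function` compiles the script in-process purely for illustration. The project description says generated code runs in an isolated container.

```typescript
type Row = Record<string, string | number>;

// Hypothetical stand-in for a model call: receives the instruction and the previous
// error message (null on the first attempt) and returns a JavaScript function body
// that reads `rows` and returns the transformed rows.
type GenerateScript = (instruction: string, lastError: string | null) => string;

function runWithRepair(
  instruction: string,
  rows: Row[],
  generate: GenerateScript,
  maxAttempts = 3,
): Row[] {
  let lastError: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = generate(instruction, lastError);
    try {
      // Compiles the body in-process for illustration only; this is NOT a sandbox.
      const transform = new Function("rows", body) as (rows: Row[]) => Row[];
      const result = transform(rows);
      if (!Array.isArray(result)) throw new Error("script did not return an array");
      return result;
    } catch (err) {
      // Feed the failure back so the next generation can correct it.
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`no working script after ${maxAttempts} attempts: ${lastError}`);
}

// Usage: a stub "model" whose first script has a typo and whose second is fixed.
let calls = 0;
const stubModel: GenerateScript = (_instruction, lastError) => {
  calls++;
  return lastError === null
    ? "return rows.fliter(r => r.x > 1);" // typo: throws a TypeError when run
    : "return rows.filter(r => r.x > 1);";
};
const kept = runWithRepair("keep rows where x > 1", [{ x: 1 }, { x: 2 }, { x: 3 }], stubModel);
console.log(kept.length, calls); // 2 2
```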

### Graphics & Multimedia

- [Declarative Visualization Grammars](https://awesome-repositories.com/f/graphics-multimedia/visualization-mapping/declarative-visualization-grammars.md) — Uses a structured schema to define chart properties and mappings, allowing the system to programmatically adjust visual outputs based on data changes.
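Under a Vega-Lite-style grammar, a chart is a plain object whose `mark` and `encoding` derive from the data fields. Regenerating that object after the data changes updates the chart. The `$schema` URL and the `mark`, `encoding`, `field`, and `type` keys follow Vega-Lite conventions. The `pickMark` heuristic and `buildSpec` helper are hypothetical simplifications, not the project's chart selection logic.

```typescript
type FieldType = "quantitative" | "nominal" | "temporal";
type Mark = "bar" | "line" | "point";

// A Vega-Lite-style chart description: data plus a mark and field encodings.
interface ChartSpec {
  $schema: string;
  data: { values: Array<Record<string, unknown>> };
  mark: Mark;
  encoding: {
    x: { field: string; type: FieldType };
    y: { field: string; type: FieldType };
  };
}

// Hypothetical heuristic for choosing a mark from the encoded field types.
function pickMark(xType: FieldType, yType: FieldType): Mark {
  if (xType === "temporal" && yType === "quantitative") return "line";
  if (xType === "nominal" && yType === "quantitative") return "bar";
  return "point";
}

function buildSpec(
  values: Array<Record<string, unknown>>,
  x: { field: string; type: FieldType },
  y: { field: string; type: FieldType },
): ChartSpec {
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
    data: { values },
    mark: pickMark(x.type, y.type),
    encoding: { x, y },
  };
}

// Usage: the same builder yields a bar chart for categories and a line chart
// once the data is reshaped into a time series.
const bars = buildSpec(
  [{ region: "east", amount: 17 }, { region: "west", amount: 5 }],
  { field: "region", type: "nominal" },
  { field: "amount", type: "quantitative" },
);
const lines = buildSpec(
  [{ month: "2024-01", amount: 12 }, { month: "2024-02", amount: 10 }],
  { field: "month", type: "temporal" },
  { field: "amount", type: "quantitative" },
);
console.log(bars.mark, lines.mark); // bar line
```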

### Security & Cryptography

- [Agentic Session Persistence](https://awesome-repositories.com/f/security-cryptography/identity-access-management/session-management/stateful-session-persistence/agentic-session-persistence.md) — Saves workspaces and analysis threads across restarts to help users revisit, compare, and continue previous data exploration tasks. ([source](https://cdn.jsdelivr.net/gh/microsoft/data-formulator@main/README.md))

### Software Engineering & Architecture

- [Reactive Data Stores](https://awesome-repositories.com/f/software-engineering-architecture/architectural-design-patterns/state-management/reactive-subscription-systems/reactive-data-stores.md) — Links data cleaning operations directly to visualization rendering so that every modification to the underlying structure triggers an immediate visual update.

### Programming Languages & Runtimes

- [Sandboxed Code Execution Environments](https://awesome-repositories.com/f/programming-languages-runtimes/runtime-execution-environments/runtime-environments/runtimes/sandboxed-code-execution-environments.md) — Runs generated data processing code in an isolated container to ensure system security while performing complex transformations on user datasets.
