30 open-source projects similar to microsoft/data-formulator, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Data Formulator alternative.
This project is a Python-based framework that functions as a generative AI agent for programmatic data analysis. It enables users to interact with structured data sources through natural language prompts, translating these requests into executable code to perform analysis, data cleaning, and visualization. By maintaining conversational context across multi-turn interactions, the system allows for iterative exploration and the building of complex data narratives. The framework distinguishes itself through a robust semantic layer and secure execution model. It maps raw datasets to descriptive m
Matplotlib is a Python data visualization library and 2D plotting engine used to generate publication-quality figures and charts from numerical data. It serves as a numerical graphics library and data visualization toolkit for mapping data to visual elements. The library provides capabilities for producing static, animated, and interactive visualizations. This includes creating high-resolution figures for professional documents, generating moving graphics to illustrate data evolution over time, and building dynamic plots for interactive data exploration. The toolkit supports scientific plott
Vega is a reactive visualization engine that translates structured specifications into interactive, browser-based graphical representations. It functions as a declarative grammar for data visualization, allowing users to define complex charts and maps through a JSON-based configuration format rather than imperative code. The system operates on a dataflow-based reactive graph that automatically propagates updates through the visualization whenever input data or user interactions change. By integrating a modular transformation pipeline, the engine handles data filtering, sorting, and aggregatio
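To make the declarative, JSON-based idea concrete, here is a minimal sketch in Python that assembles a small Vega-style bar chart specification as a plain dictionary and serializes it to JSON. The helper name `build_bar_chart_spec`, the field names `category` and `amount`, and the filter threshold are assumptions chosen for illustration, not anything taken from the project listing.

```python
import json


def build_bar_chart_spec(rows, min_amount=0):
    """Return a Vega-style spec (as a dict) describing a bar chart.

    `rows` is a list of {"category": str, "amount": number} records.
    The field names and this helper are hypothetical, for illustration only.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "width": 300,
        "height": 150,
        "data": [
            {
                "name": "table",
                "values": rows,
                # A transform step declared as data, not imperative code:
                # keep only records above the threshold.
                "transform": [
                    {"type": "filter", "expr": f"datum.amount > {min_amount}"}
                ],
            }
        ],
        "scales": [
            {
                "name": "xscale",
                "type": "band",
                "domain": {"data": "table", "field": "category"},
                "range": "width",
            },
            {
                "name": "yscale",
                "domain": {"data": "table", "field": "amount"},
                "range": "height",
            },
        ],
        "marks": [
            {
                "type": "rect",
                "from": {"data": "table"},
                "encode": {
                    "enter": {
                        "x": {"scale": "xscale", "field": "category"},
                        "width": {"scale": "xscale", "band": 1},
                        "y": {"scale": "yscale", "field": "amount"},
                        "y2": {"scale": "yscale", "value": 0},
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    spec = build_bar_chart_spec(
        [{"category": "A", "amount": 28}, {"category": "B", "amount": 55}]
    )
    print(json.dumps(spec, indent=2))
```

The script only produces the JSON document; rendering it would still require a Vega runtime in the browser or another host.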
Altair is a declarative data visualization library for Python that generates Vega-Lite specifications. It functions as a tool for mapping data to graphical marks using a high-level syntax, allowing users to describe the desired visual outcome instead of writing imperative drawing commands. The framework enables the creation of interactive charts and graphics, including linked views and filtered displays that respond to user input in real time. It supports the design of multi-view dashboards by combining visualizations into layered or faceted layouts. The library provides capabilities for sta
Redash is a self-hosted analytics platform and SQL data visualization tool. It provides a web-based SQL query editor for writing, executing, and scheduling database queries, and functions as a business intelligence dashboard for monitoring metrics via visual widgets. The platform distinguishes itself through its data source connectors, which integrate with various SQL, NoSQL, and API-based stores to retrieve information for analysis. It enables self-service analytics by allowing users to run queries with dynamic parameters and supports shared data reporting via public links or embedded dashbo
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono
Plotly.py is a comprehensive framework for building production-ready data applications and interactive dashboards directly from Python code. It functions as both a high-performance visualization library for browser-based charts and a full-stack tool for transforming analytical scripts into responsive, web-based interfaces. By abstracting away the need for manual HTML or JavaScript, it allows developers to define complex layouts and functional logic using modular, reusable components. The framework distinguishes itself through a robust architecture that handles event orchestration and state sy
This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer. The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
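🔥 An intelligent data question-answering system built on large language models and RAG; a powerful tool for conversational data analysis. Text-to-SQL Generation via LLMs using RAG.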
Miller is a command-line data processor used for filtering, transforming, and aggregating name-indexed tabular data. It functions as a tool for querying and reshaping records across multiple file formats, serving as a converter between CSV, JSON, and YAML. The tool distinguishes itself by using a name-indexed data model, allowing users to manipulate fields by name rather than numeric position. It utilizes single-pass streaming algorithms to compute statistics and summaries on large datasets that exceed available system memory. Its capabilities cover data transformation and analysis, includin
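The name-indexed, single-pass approach described for Miller can be sketched roughly in Python. This is not Miller's own code or command syntax; the function `streaming_mean` and the sample column names `shape` and `size` are hypothetical. The sketch reads records one at a time, addresses fields by name, and keeps only a running count and sum per group, so memory use does not grow with the number of rows.

```python
import csv
import io
from collections import defaultdict


def streaming_mean(lines, group_field, value_field):
    """Compute per-group means in one pass, holding only running totals."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    # DictReader yields each record as a name -> value mapping, so fields
    # are addressed by name rather than by column position.
    for record in csv.DictReader(lines):
        key = record[group_field]
        counts[key] += 1
        sums[key] += float(record[value_field])
    return {key: sums[key] / counts[key] for key in counts}


if __name__ == "__main__":
    sample = io.StringIO("shape,size\ncircle,2\nsquare,4\ncircle,6\n")
    print(streaming_mean(sample, "shape", "size"))
```

Because the columns are looked up by name, reordering them in the input does not change the result.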
Crawlee-python is a web crawling framework for building scalable scrapers using Python. It serves as a comprehensive tool for web scraping automation, providing a system to extract structured data from websites using both lightweight HTTP requests and headless browser automation. The framework is distinguished by its anti-bot evasion capabilities, which include browser fingerprint impersonation and tiered proxy rotation to bypass detection systems and solve challenges such as Cloudflare. It also incorporates artificial intelligence for autonomous website navigation and schema-based data extra
ggplot2 is an R data visualization library and statistical graphics engine. It implements a grammar of graphics that functions as a declarative plotting framework, allowing users to specify what a plot should contain rather than how to draw it. The system builds visualizations by mapping data variables to visual aesthetics through a structured set of layering rules. This approach enables the composition of complex graphics by stacking independent components, such as geometric objects and scales, on top of a shared coordinate system. The framework supports scientific plotting and exploratory
This is a grammar of graphics visualization library used to build charts by mapping tabular data to visual marks. It functions as an SVG data visualization tool and an exploratory data analysis API, allowing users to render complex visualizations and geographic maps. The library features a GeoJSON map renderer that projects spherical coordinates into two-dimensional pixel space and an Apache Arrow visualization interface for high-efficiency data processing. Its capability surface covers data transformation through binning and grouping, visual encoding via automatic scale inference and color
unioffice is a comprehensive document processing suite that provides a PDF document processor, an Open XML document library, a document security toolkit, and a document content extractor. It is designed to programmatically create, read, and modify Word, Excel, and PowerPoint files, as well as generate and edit PDF documents. The project is distinguished by its native language implementation of the Open XML standard, which removes native binary dependencies to simplify container deployments. It features advanced capabilities for digital document security, including hardware-based PDF signing,
FriendsDontLetFriends is a scientific data visualization guide and framework designed to help users create accurate plots while avoiding common data representation mistakes. It provides a collection of scripts and guidelines for selecting distribution plots, color scales, and layouts that accurately represent complex experimental data. The project distinguishes itself through specialized toolkits for revealing hidden patterns in large datasets. It includes systems for heatmap optimization via dimension reordering and outlier management, as well as spatial layout algorithms to improve the inte
MPAndroidChart is an Android charting library and data visualization framework that provides a set of reusable view components for rendering statistical data. It enables the display of numerical datasets through various chart types, including line, bar, pie, radar, bubble, and candlestick charts. The library focuses on an interactive graphing workflow, allowing users to explore complex data sets through scaling, panning, and animations. It includes specific support for financial charting to track market trends and price movements, as well as tools for building mobile dashboards.
Charts is a mobile data visualization library designed for rendering interactive graphical representations of complex datasets. It provides a declarative configuration interface that maps data structures to visual components, supporting a variety of chart types including line, bar, pie, scatter, and radar plots. The library distinguishes itself through a hardware-accelerated drawing layer that ensures high-performance rendering across mobile platforms. It features a gesture-driven transformation engine that enables users to pan, zoom, and scale views, alongside an interpolated animation syste
This project is a collection of responsive CSS Grid dashboard templates and a data visualization UI kit. It provides a set of HTML layouts designed for building analytics interfaces and monitoring views for KPIs and business metrics that adapt to different screen sizes. The toolkit is library-agnostic, allowing the connection of static HTML templates to any external data source or third-party charting library without requiring custom adapter code. It uses a template-driven approach to separate the visual structure of the dashboard from the underlying data. The capabilities cover the assembly
Rath is an LLM-powered data analytics platform and augmented analytics engine designed for automated data exploration and visualization. It serves as a self-service tool for discovering patterns within large datasets, translating natural language queries into visual charts, and identifying causal relationships between variables using graphical models. The platform distinguishes itself through an automated data visualization system that recommends optimal chart types and layouts to minimize perception errors. It integrates large language models to enable natural language data querying and empl
Hellocharts-android is a data visualization library and charting framework for Android applications. It provides a collection of custom view components used to render datasets as visual elements, such as line, column, and pie charts. The library supports interactive visualizations that allow users to navigate data through touch gestures, including pinching, scrolling, and panning. It also includes built-in capabilities for animating data points and chart elements to create smooth visual transitions during dataset updates. The framework covers a broad range of visualization needs, including c
nvd3 is a data visualization framework and reusable web graphing library. It provides a collection of interactive charting components built on top of the D3.js library to render complex datasets as graphics within a web browser. The library functions as a wrapper for D3.js, offering predefined chart types and modular templates. This implementation allows for the creation of custom data graphs and web dashboards without requiring the author to write low-level SVG code from scratch. The system utilizes SVG-based vector rendering and attribute-driven styling to generate visualizations. It incor
This project is a client-side data visualization framework and SVG charting library used to render responsive, interactive charts in a web browser. It functions as a lightweight utility for generating scalable vector graphics and data annotations without external dependencies. The library enables the creation of custom SVG charts with adjustable colors and animations to meet specific design requirements. It supports dynamic data updates and the addition of markers, regions, and tooltips to provide context to specific data points. The system covers broad capability areas including responsive
DeepAnalyze is an autonomous data science agent and research pipeline designed to transform raw datasets into comprehensive analysis reports. It operates by generating and executing Python code to perform data preparation, modeling, and visualization. The system utilizes a secure, containerized execution environment to run generated scripts in isolation from the host system. It includes a benchmarking tool to evaluate the accuracy and performance of large language models against standardized data science tasks and a standardized API gateway for managing model completions and file uploads. Th
sigma.js is a JavaScript graph visualization library and WebGL network renderer designed for drawing large-scale network graphs in web browsers. It functions as a high-performance engine capable of rendering network structures containing thousands of nodes and edges interactively. The library provides a customizable graph engine that allows for the creation of specialized visualizations using low-level graphics primitives and custom drawing layers. It supports the rendering of diverse node shapes, such as images or pie charts, and enables the integration of graph visualizations as overlays on
DBeaver is a universal database client and administration environment designed for managing diverse relational and non-relational database systems. It provides a unified graphical interface that enables users to perform data manipulation, schema migration, and performance monitoring across multiple platforms. By utilizing a standardized driver abstraction layer, the application translates generic requests into database-specific commands, ensuring consistent interaction regardless of the underlying technology. The project distinguishes itself through an extensible, plugin-based architecture th
This project is a declarative visualization library and geospatial framework designed for rendering large-scale data sets within web browsers. It functions as a high-performance graphics engine that leverages hardware acceleration to display complex 2D and 3D visual layers, enabling the visualization of millions of data points through a structured, component-based syntax. The framework distinguishes itself through its ability to synchronize custom data visualizations with third-party mapping platforms. By managing camera states and coordinate systems, it allows developers to overlay high-perf
WrenAI is a platform designed to enable natural language interaction with relational and analytical databases. By combining a text-to-SQL engine with semantic data modeling, it allows users to explore structured data through plain language questions, removing the requirement for manual code generation. The system functions by grounding natural language requests in a predefined business logic layer rather than raw database schemas. This semantic approach, supported by context-aware prompt engineering, ensures that generated queries remain consistent and accurate across an organization. The pla
Plotnine is a data visualization library for Python based on the Grammar of Graphics. It serves as a declarative statistical plotting framework and multi-panel plotting engine, allowing users to create complex charts by mapping data variables to visual properties such as position, color, and size. The project is distinguished by its use of a layered composition model and a statistical transformation engine that performs aggregations and computations before rendering visuals. It features a comprehensive system for multi-panel faceting, which enables the splitting of a single visualization into
ggplot2 is a data visualization library for R based on a formal grammar of graphics. It provides a declarative plotting framework that allows users to create complex graphics by combining geometric objects, statistical summaries, and coordinate systems. The system is distinguished by a layered approach to composition, where visualizations are built incrementally by stacking independent geometric, statistical, and coordinate layers. It utilizes a hierarchical styling engine to manage non-data elements such as backgrounds, fonts, and margins, and includes a multi-panel faceting tool for splitti
GoLearn is a machine learning library for the Go programming language. It provides a supervised learning framework and a toolkit for building, training, and evaluating predictive models through a standardized interface. The project implements a data frame system that loads CSV files into structured grids for matrix operations. It includes a preprocessing library for discretizing continuous variables and a model evaluation toolkit that utilizes confusion matrices and cross-validation to measure precision and recall. The library covers data engineering and management, including the ability to
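For readers unfamiliar with how a confusion matrix yields precision and recall, the following is a small sketch of the general technique. It is written in Python rather than Go and does not use or mirror GoLearn's API; the function names and sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def precision_recall(matrix, label):
    """Derive precision and recall for one label from the pair counts."""
    true_pos = matrix[(label, label)]
    # Predicted as `label` but actually something else.
    false_pos = sum(n for (a, p), n in matrix.items() if p == label and a != label)
    # Actually `label` but predicted as something else.
    false_neg = sum(n for (a, p), n in matrix.items() if a == label and p != label)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    actual = ["a", "a", "b", "b", "a"]
    predicted = ["a", "b", "b", "b", "a"]
    matrix = confusion_matrix(actual, predicted)
    print(precision_recall(matrix, "a"))
    print(precision_recall(matrix, "b"))
```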