12 Repos

Awesome GitHub Repositories: Dataframe Engines

Support for loading and processing tabular data using dataframe abstractions.

Distinguishing note: Focuses on dataframe-specific loading rather than general file parsing.

Explore 12 awesome GitHub repositories matching data & databases · Dataframe Engines. Refine with filters or upvote what's useful.


nushell/nushell
39,743 · View on GitHub
Nushell is a cross-platform shell and programming language designed to treat all input and output as structured data rather than raw text streams. By enforcing data types and command signatures, it provides a consistent environment for building robust, pipeline-oriented workflows. The shell allows users to chain commands that pass structured objects between stages, enabling complex data processing and automation tasks that remain predictable across different operating systems. What distinguishes the project is its focus on interactive data exploration and modular extensibility. Users can quer
Supports importing data as eager or lazy dataframes for optimized query execution.
Rust · nushell · rust · shell
wesm/pydata-book
24,668 · View on GitHub
This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis. The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to commun
Provides dataframe-based relational modeling for filtering, joining, and aggregating structured datasets.
Jupyter Notebook
Vonng/ddia
22,648 · View on GitHub
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Provides engines for cleaning and transforming tabular data using dataframe abstractions.
Python · book · database · ddia
Kanaries/pygwalker
15,628 · View on GitHub
Pygwalker is a library that transforms tabular data into interactive, drag-and-drop interfaces for exploratory analysis and visualization. It functions as a grammar-based framework that translates user interactions into declarative chart definitions, allowing for the creation of dynamic data exploration environments directly within notebooks or embedded web applications. The system distinguishes itself by offloading heavy analytical computations to backend kernels, which maintains responsiveness when visualizing large datasets. It supports the serialization of visual states into portable conf
Provides interactive drag-and-drop visualization capabilities specifically for dataframe-based tabular data.
Python · data-analysis · data-exploration · dataframe
modin-project/modin
10,389 · View on GitHub
Modin is a distributed dataframe library and parallel data processing engine designed to handle large datasets that exceed system memory. It functions as a distributed computing framework that parallelizes data manipulation tasks across multiple CPU cores or clusters to increase throughput and avoid memory errors. The project mirrors the Pandas API, allowing for the distribution of data workflows without changing core code logic. It utilizes a pluggable backend interface, which enables users to switch between different distributed execution engines to optimize performance based on available h
Provides a distributed dataframe engine for loading and processing tabular data that exceeds system memory.
Python · analytics · data-science · dataframe
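The idea of splitting a table into partitions, processing each partition concurrently, and combining the partial results can be sketched in plain Python. This is only an illustration of the general technique, not Modin's implementation: the helper names (`split`, `partial_sum`, `parallel_sum`) and the `amount` column are hypothetical, and a thread pool stands in for the distributed execution engines the description mentions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, float]


def split(rows: List[Row], n_parts: int) -> List[List[Row]]:
    """Split rows into at most n_parts contiguous partitions."""
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_sum(partition: List[Row]) -> float:
    """Aggregate one partition independently of the others."""
    return sum(row["amount"] for row in partition)


def parallel_sum(rows: List[Row], n_parts: int = 4) -> float:
    """Sum the hypothetical 'amount' column partition by partition."""
    partitions = split(rows, n_parts)
    # Threads are a stand-in here; a real engine would typically hand
    # partitions to separate processes or cluster workers instead.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Combine the per-partition results into the final answer.
    return sum(partials)


rows = [{"amount": float(i)} for i in range(1, 101)]
total = parallel_sum(rows)
print(total)
```

Because each partition is aggregated on its own, only the small partial results need to be brought together at the end, which is what makes this pattern suitable for data that is spread across workers.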
lancedb/lancedb
9,031 · View on GitHub
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Ingests Pandas DataFrames directly into tables to bridge vector storage and data analysis workflows.
HTML · approximate-nearest-neighbor-search · image-search · nearest-neighbor-search
apache/datafusion
8,908 · View on GitHub
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Provides a lazy DataFrame API for building and executing analytic queries programmatically.
Rust · arrow · big-data · dataframe
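To make the notion of a lazy DataFrame API concrete, here is a minimal sketch in plain Python. It is not DataFusion's API: the `LazyFrame` class and its `filter`, `select`, and `collect` methods are hypothetical names chosen for illustration. The point is only that building the query records a plan, and nothing is computed until the result is requested.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class LazyFrame:
    """Hypothetical lazy frame: records steps, runs them only on collect()."""
    rows: List[Row]
    plan: List[Step] = field(default_factory=list)

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Append a step to the plan; no rows are touched yet.
        step = lambda rows: [r for r in rows if predicate(r)]
        return LazyFrame(self.rows, self.plan + [step])

    def select(self, *columns: str) -> "LazyFrame":
        # Projection is also deferred until collect().
        step = lambda rows: [{c: r[c] for c in columns} for r in rows]
        return LazyFrame(self.rows, self.plan + [step])

    def collect(self) -> List[Row]:
        # Execute the recorded steps in order and return the result.
        result = self.rows
        for step in self.plan:
            result = step(result)
        return result


data = [
    {"city": "Oslo", "sales": 120},
    {"city": "Lima", "sales": 80},
    {"city": "Pune", "sales": 200},
]

query = LazyFrame(data).filter(lambda r: r["sales"] > 100).select("city")
result = query.collect()
print(result)
```

A real engine would typically inspect and optimize the recorded plan before running it; this sketch simply replays the steps in the order they were added.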
saulpw/visidata
8,834 · View on GitHub
VisiData is a terminal-based interactive data analysis tool and browser designed for exploring, filtering, and sorting large tabular datasets. It functions as a structured data inspector that loads and flattens complex formats like JSON, XML, and PCAP into interactive sheets, as well as a terminal file manager for navigating directories and performing staged filesystem operations. The project distinguishes itself by rendering data visualizations, such as scatter plots and histograms, directly in the terminal using Unicode Braille characters. It provides a Python-based data wrangling environme
Integrates with Pandas dataframe abstractions to load and process complex tabular data.
Python · cli · csv · datajournalism
polakowo/vectorbt
6,720 · View on GitHub
VectorBT is a vectorized trading strategy backtesting framework that simulates thousands of strategy configurations in a single pass over historical price data. It operates as a parameter optimization engine, a portfolio performance analyzer, a technical indicator calculator, and a financial data fetcher, all built around a DataFrame-centric data model that uses NumPy broadcasting for signal alignment and compiled code acceleration for performance. The framework distinguishes itself through its ability to run large-scale parameter sweeps by constructing every combination of strategy parameter
Represents all financial time series, signals, and portfolio states as pandas DataFrames.
Python · algorithmic-trading · algorithmic-traiding · backtesting
bukosabino/ta
4,890 · View on GitHub
This is a pandas-based technical analysis library and financial feature engineering tool. It serves as a vectorized indicator calculator that transforms raw price and volume data into derived metrics for time series analysis. The library uses a NumPy-based engine to perform mathematical operations across entire arrays, avoiding iterative loops to maintain high performance. It organizes technical indicators into a modular class hierarchy with a consistent interface, allowing for bulk feature generation and the direct appending of results as new columns to a pandas DataFrame. The system covers
Integrates computed indicators directly into pandas DataFrame structures while preserving time series alignment.
Jupyter Notebook · financial · fundamental-analysis · momentum
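Appending an indicator as a new column while keeping it aligned with the original rows can be illustrated without any third-party library. The sketch below does not use the ta library or pandas; the `sma` helper, the dict-of-lists table, and the sample prices are assumptions made for illustration. It computes a simple moving average and pads the warm-up period with `None` so the new column has exactly one value per date.

```python
from typing import Dict, List, Optional


def sma(values: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average with the same length as the input.

    Positions without a full window are None, so row i of the
    output always corresponds to row i of the input.
    """
    out: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


# A tiny column-oriented table standing in for a DataFrame.
table: Dict[str, list] = {
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "close": [10.0, 12.0, 14.0, 16.0],
}

# Append the indicator as a new column aligned with the dates.
table["sma_2"] = sma(table["close"], window=2)

for date, close, avg in zip(table["date"], table["close"], table["sma_2"]):
    print(date, close, avg)
```

Keeping the output the same length as the input is the design choice that matters here: every derived column can be read row by row against the original dates without any re-indexing.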
freqtrade/freqtrade-strategies
4,861 · View on GitHub
This is a library of cryptocurrency trading algorithms and technical analysis strategies designed for use with the Freqtrade trading bot. The project provides a collection of pre-defined rules and mathematical indicators used to automate the buying and selling of digital assets. The repository focuses on algorithmic trading strategies and bot-driven asset management to remove manual execution from cryptocurrency trades. It enables quantitative trading analysis by allowing the development and testing of rule-based logic against historical market data. The system utilizes class-based strategy
Uses pandas dataframes to perform vectorized calculations on historical candle data for fast technical indicator analysis.
Python · bitcoin · cryptocurrency · freqtrade-strategies
jmcnamara/XlsxWriter
3,911 · View on GitHub
XlsxWriter is a library for generating spreadsheets in the XLSX format, functioning as an Excel workbook writer and file generator. It provides the capability to write data, apply cell formatting, and build complex layouts across multiple worksheets. The project distinguishes itself with a memory-optimized writing mode that flushes large datasets to disk row-by-row, enabling the creation of files exceeding 4 GB while minimizing RAM consumption. It also includes a specialized mechanism for embedding binary project files and digital signatures to enable VBA macros and signed scripts within work
Supports writing external dataframes to specific worksheets and exact cell coordinates.
Python · charts · libxlsxwriter · pandas