17 repositories
Mechanisms for appending computed results as new columns to tabular data structures.
Distinct from Distributed Dataframes: Existing candidates focus on disk storage or distributed dataframes, not the specific act of adding columns to an in-memory pandas DataFrame.
Explore 17 GitHub repositories matching data & databases · DataFrame Integration.
Perspective is a columnar data analytics engine and high-performance visualization component powered by WebAssembly. It provides a system for analyzing and visualizing large or streaming datasets through interactive data grids and charts, utilizing a compiled binary to achieve near-native performance within the browser. The project distinguishes itself through a WebSocket-based data streaming interface and deep Apache Arrow integration, which minimize memory overhead when synchronizing tables between servers and clients. It acts as a remote query proxy capable of translating visualization con
Converts pandas or polars DataFrame objects into internal high-performance tables while preserving indexing.
FastUI is a server-driven UI system and Pydantic UI framework that transforms backend data models into functional web interfaces. It operates as a model-based frontend generator where the server controls the layout and behavior of the user interface through structured data schemas, enabling a low-code approach to web development. The project allows for the definition of visual hierarchies and component properties on the backend, using a JSON-based protocol to communicate UI structure between the server and client. It utilizes schema-driven generation to automate the creation of interfaces, in
Displays tabular data from models with configurable columns, interactive links, and formatted fields.
Jeesite is a full-stack low-code development framework designed for building enterprise administrative portals using Spring Boot, MyBatis, and Vue. It functions as a comprehensive platform for creating administrative dashboards with integrated role-based access control and organizational data permission systems. The framework distinguishes itself through a combination of automated CRUD code generation and an integrated RAG platform that connects large language models to enterprise data via vector stores. It further incorporates a BPMN-based workflow engine to automate complex business process
Provides interactive data tables featuring sorting, pagination, and frozen columns for efficient administrative data management.
Mesop is a stateful, declarative Python web UI framework and component library designed for building interactive web applications and AI demos. It allows for the construction of data-driven interfaces and chat systems using only Python, removing the need to write separate HTML or CSS. The framework is specifically tailored for AI application development, offering dedicated tools for conversational UI design and the creation of dashboards for large language model applications. It distinguishes itself with a visual UI editor for real-time property adjustments and the ability to embed custom Jav
Renders data frames as interactive tables with sticky headers, columns, and clickable cells.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Converts Spark DataFrames into offline segment files and writes them to a specified filesystem path for ingestion.
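dtale is a web-based interactive grid and visualization tool for pandas DataFrames, designed for exploratory data analysis. It provides a browser-based interface for analyzing tabular data structures, allowing users to compute statistics, detect outliers, and calculate correlations without writing code by hand. The project runs as an embedded data viewer that can be integrated into web applications through an iframe or custom routes, with specific support for Django, Flask, and Streamlit. It enables dataset exploration by combining an interactive data grid with a data visualization library capable of producing histograms, box plots, and 3D scatter plots. The platform covers a broad range of data management and analysis capabilities, including tabular data cleaning, reshaping, and interactive filtering. It includes observational tools for missing-data analysis, correlation computation, and predictive power scoring. For session management, it supports multi-instance tracking and state persistence across concurrent worker processes. The interface is protected by username and password authentication and supports data ingestion from delimited files, spreadsheets, and ArcticDB datastores.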
Connects to high-performance ArcticDB datastores to load and filter large-scale dataframes.
This is a pandas-based technical analysis library and financial feature engineering tool. It serves as a vectorized indicator calculator that transforms raw price and volume data into derived metrics for time series analysis. The library uses a NumPy-based engine to perform mathematical operations across entire arrays, avoiding iterative loops to maintain high performance. It organizes technical indicators into a modular class hierarchy with a consistent interface, allowing for bulk feature generation and the direct appending of results as new columns to a pandas DataFrame. The system covers
Appends computed indicator results as new columns to a pandas DataFrame to maintain time series alignment.
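As a minimal sketch of what "appending a computed result as a new column" can look like in plain pandas: the helper name `add_sma`, the `close` column, and the simple moving average are illustrative assumptions, not the API of the library described above. The new column is computed from an existing one and shares the frame's index, so each value stays aligned with its timestamp.

```python
import pandas as pd


def add_sma(df: pd.DataFrame, window: int = 3, column: str = "close") -> pd.DataFrame:
    """Return a copy of df with a simple moving average appended as a new column.

    Hypothetical helper for illustration only.
    """
    out = df.copy()  # leave the caller's frame untouched
    # rolling().mean() returns a Series with the same index as the input,
    # so the assignment keeps every value aligned with its timestamp.
    out[f"sma_{window}"] = out[column].rolling(window).mean()
    return out


prices = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 13.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
enriched = add_sma(prices, window=3)
```

The first `window - 1` rows of the new column are NaN because a full window is not yet available; the original `prices` frame is not modified.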
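Mimesis is a Python synthetic data generator for creating realistic fake datasets and mock data for software testing and development. It acts as a schema-based dataset generator that can produce structured records and relational datasets, and can also serve as a production data anonymization tool that replaces sensitive information with synthetic values. The library is distinguished by comprehensive multilingual support, allowing locale-specific information to be generated to simulate regional user profiles. It ensures reproducibility through seeded deterministic data generation, creating consistent datasets across runs. The tool covers a wide range of synthetic content, including personal identities, financial data, geographic addresses, network metadata, and scientific sequences. Its functionality extends to data transformation through conditional logic and pipelines, as well as integration with DataFrames and the factory pattern. It also supports generating standardized system codes, cryptographic tokens, and binary file mocks. The framework is extensible through custom data providers and field handlers, allowing users to integrate domain-specific logic and external JSON files for specialized data generation.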
Generates synthetic columns for use in tabular data structures like pandas DataFrames.
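The idea of seeded, reproducible synthetic columns can be sketched with the standard library alone. This is not Mimesis's API; the function name `synthetic_columns` and the column names are assumptions made for illustration.

```python
import random


def synthetic_columns(n_rows: int, seed: int) -> dict:
    """Build a dict of equal-length columns from a seeded generator.

    Hypothetical helper: the same seed always yields the same columns.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return {
        "user_id": list(range(1, n_rows + 1)),
        "age": [rng.randint(18, 90) for _ in range(n_rows)],
        "plan": [rng.choice(["free", "pro", "team"]) for _ in range(n_rows)],
    }


first = synthetic_columns(5, seed=42)
second = synthetic_columns(5, seed=42)
```

A dict of equal-length lists like this can be passed directly to the `pandas.DataFrame` constructor, with each key becoming a column.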
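statsforecast is a high-performance statistical time series forecasting library designed to produce point forecasts and prediction intervals. It acts as a distributed time series framework, using a C-based forecasting engine and an automatic model selector to identify and fit the best statistical model for each unique series in a dataset. The system also includes a time series anomaly detector that flags anomalous data points by comparing observations against probabilistic prediction intervals. The project is distinguished by its ability to handle massively parallel forecasting across millions of independent series. It achieves this through distributed computing frameworks, multi-core parallel execution, and compiled C kernels that accelerate the core ARIMA and exponential smoothing logic. The system further uses a long-format data layout and lazily evaluated data pipelines to optimize large-scale processing and reduce memory overhead. The library provides a comprehensive set of models, including AutoARIMA, various exponential smoothing methods for intermittent or seasonal demand, Theta decomposition, and GARCH volatility modeling for financial risk. It covers broader functional areas such as multivariate forecasting with exogenous variables, time series decomposition, and model evaluation through historical cross-validation and sliding-window analysis. The library integrates with high-performance data structures such as Polars and provides utilities for serving saved models as REST endpoints for network-accessible forecasting.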
Integrates with Polars data structures to accelerate memory management and processing during forecasting.
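Plotnine is a Python data visualization library based on the Grammar of Graphics. It acts as a declarative statistical plotting framework and multi-panel plotting engine, allowing users to create complex charts by mapping data variables to visual properties such as position, color, and size. The project is characterized by its layered composition model and a statistical transformation engine that performs aggregation and computation before rendering visuals. It has a comprehensive multi-panel faceting system that splits a single visualization into a grid of subplots based on categorical variables. The library covers a wide range of functionality, including multiple geometric representations for distribution plots, area charts, and scatter plots, as well as geospatial visualization for rendering geographic boundaries. It provides rich tools for scale mapping, coordinate projection, and theme-based styling, separating data-driven elements from non-data aesthetic properties. The framework uses a Matplotlib backend for rendering and integrates with tabular DataFrames through piping operations.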
Integrates tabular dataframes via piping operations, converting external pandas or polars objects into internal plotting formats.
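aws-sdk-pandas is a Python library that integrates pandas DataFrames with AWS services, acting as a cloud data ETL tool and data lake connector. It provides a unified interface for moving and transforming data between in-memory DataFrames and cloud storage, databases, and data warehouses. The project stands out as a distributed computing orchestrator that can submit pandas-based workloads to EMR clusters and serverless processing environments. It further specializes in coordinating distributed data processing through Ray cluster initialization to handle datasets that exceed single-machine memory. The library covers a wide range of functionality, including object storage management for S3, SQL query execution for Athena and Redshift, and integration with NoSQL, graph, and time series databases. It also includes utilities for metadata management through the Glue catalog, OpenSearch data indexing, and managing business intelligence assets in QuickSight. Other features include retrieving secrets, analyzing CloudWatch logs, and managing data quality rulesets.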
Wraps multiple cloud service APIs to convert remote query results directly into Pandas dataframes.
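dcat-admin is a Laravel admin panel framework for quickly building data-driven administrative interfaces. It acts as a CRUD generator and backend scaffolding tool that automatically generates create, read, update, and delete interfaces from database table schemas. The system stands out through its plugin-based extension architecture and its ability to run multiple independent admin instances within a single installation. It provides dedicated tools for mapping external APIs to forms and tables, as well as an event-driven form lifecycle for executing custom logic during parsing and submission. The framework covers a wide range of functional areas, including role-based access control for managing hierarchical permissions, a comprehensive suite of data management grids with inline editing, and multi-step form workflows. It also includes data visualization tools for operational dashboards and various content handling utilities for chunked large-file uploads and rich text editing. Command-line utilities are provided to automate the generation of admin components and action classes.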
Renders database records in an expandable tree structure with lazy-loading for child nodes.
Vizro is a low-code Python framework for building production-ready data visualization applications. It functions as a UI orchestrator that allows users to define multi-page analytical dashboards through structured configurations in Python, YAML, or JSON, reducing the need for extensive frontend engineering. The project distinguishes itself through generative AI integration, utilizing a model context protocol server to translate natural language descriptions into validated dashboard configurations, charts, and layouts. It also features a decoupled data cataloging system that separates data sou
Displays dataframes in interactive tables with pre-configured sorting and pagination.
This project is a Python library that wraps official NBA endpoints to retrieve player, team, and game statistics as structured data. It serves as a programmatic interface for fetching professional basketball league records and real-time scoreboards via HTTP requests. The library integrates with Pandas to transform raw JSON responses from sports servers into DataFrames for statistical analysis and data science. It functions as a data retrieval utility for tracking league-wide performance trends and scouting professional basketball players. The tool covers a broad range of capabilities includi
Transforms raw JSON responses from sports servers into Pandas DataFrames for statistical analysis and data science.
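A rough sketch of the JSON-to-DataFrame step, assuming a response that carries column names under `headers` and row values under `rowSet`. That payload shape, the sample values, and the function name `payload_to_frame` are assumptions for illustration; the listing does not specify the actual response format or the library's own functions.

```python
import json

import pandas as pd

# Hypothetical response body; real payloads may be shaped differently.
raw = '{"headers": ["PLAYER", "PTS", "AST"], "rowSet": [["A", 30, 5], ["B", 22, 9]]}'


def payload_to_frame(text: str) -> pd.DataFrame:
    """Parse a JSON string and build a DataFrame from its header and row lists."""
    payload = json.loads(text)
    # Each inner list in "rowSet" becomes one row; "headers" names the columns.
    return pd.DataFrame(payload["rowSet"], columns=payload["headers"])


stats = payload_to_frame(raw)
```

Once the data is in a DataFrame, ordinary pandas operations such as `stats["PTS"].mean()` or `stats.sort_values("AST")` apply.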
This is a structured deep learning curriculum for programmers, delivered as a collection of Jupyter notebooks. It teaches the fundamentals of training neural networks for computer vision, natural language processing, tabular data analysis, and collaborative filtering using PyTorch and the fastai library. The course is designed to be hands-on, guiding learners from building a training loop from scratch to fine-tuning pretrained models for a variety of practical tasks. The curriculum distinguishes itself by covering the full lifecycle of a deep learning project, from data preparation and augmen
Reads column values from DataFrame rows as labels for supervised learning tasks.
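This project is a collection of accessible, reusable interface components built for the Svelte framework. It serves as a complete design system implementation, providing a standardized toolkit for building responsive, inclusive user interfaces that conform to an established design language and accessibility guidelines. The library is distinguished by deep integration with the Svelte framework, using compiler-based transformations to optimize component rendering and reactive state synchronization. It has a robust theme management system that applies visual styles through CSS custom properties, allowing themes to be switched dynamically at runtime. In addition, the library uses portal-based rendering for floating UI elements to ensure that overlays are not clipped by parent container constraints. The component suite covers a wide range of interface needs, including structured data table management, dynamic form building with integrated validation, and responsive layout containers. It also provides dedicated utilities for tracking screen breakpoints, managing application state persistence, and delivering user notifications through inline or modal systems. The library is designed to support efficient development workflows by stripping unused styles during the build and optimizing asset delivery.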
Renders structured datasets into sortable, interactive tables with defined headers and row identifiers.
React Base Table is a library of reusable interface components designed for building complex, responsive data grids within web applications. It provides a high-performance foundation for rendering large datasets by utilizing window-based row virtualization, which ensures the user interface remains responsive even when displaying extensive collections of data. The library distinguishes itself through flexible layout and navigation capabilities, including support for hierarchical data structures that can be rendered as expandable tree rows. It allows for precise control over table geometry thro
Organizes and renders nested data structures as expandable tree rows to allow exploration of parent-child relationships.
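The general technique behind expandable tree rows can be sketched independently of any UI library: walk the nested data depth-first and emit a flat list of rows, descending into a node's children only when that node is marked as expanded. The sketch below is in Python with hypothetical names (`visible_rows`, `expanded`, the `id` and `children` keys); it is not the API of the component library described above.

```python
def visible_rows(nodes, expanded, depth=0):
    """Yield flat row dicts for the nodes currently visible in a tree table.

    nodes: list of dicts with an "id" and an optional "children" list.
    expanded: set of ids whose children should be shown.
    """
    for node in nodes:
        children = node.get("children", [])
        yield {"id": node["id"], "depth": depth, "has_children": bool(children)}
        if node["id"] in expanded:
            # Only descend into nodes the user has expanded.
            yield from visible_rows(children, expanded, depth + 1)


tree = [
    {"id": "a", "children": [{"id": "a1"}, {"id": "a2", "children": [{"id": "a2x"}]}]},
    {"id": "b"},
]
collapsed = list(visible_rows(tree, expanded=set()))
opened = list(visible_rows(tree, expanded={"a"}))
```

With nothing expanded, only the top-level rows `a` and `b` are produced; expanding `a` inserts `a1` and `a2` at depth 1 while `a2x` stays hidden until `a2` is expanded too. The `depth` field is what a renderer would typically use to indent each row.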