6 个仓库
Accessing data using integer-based slicing.
Distinguishing note: Focuses on positional access rather than label-based access.
Explore 6 awesome GitHub repositories matching data & databases · Position-Based Data Selection. Refine with filters or upvote what's useful.
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
Supports standard integer-based slicing for precise data retrieval.
Python is a high-level, interpreted programming language designed for readability and versatility. It operates via a bytecode-based virtual machine and manages memory automatically through reference-counting garbage collection. The language supports multiple programming paradigms, including object-oriented, imperative, and functional styles, and provides a comprehensive standard library for system operations, networking, and data handling. The language is distinguished by its dynamic nature, allowing for runtime object introspection and metaclass-driven class creation. It utilizes protocol-ba
Python retrieves specific items or sub-sequences from a collection using zero-based index positions or range-based slicing.
Dask 是一个并行计算框架和分布式任务调度器,旨在将 Python 数据科学工作流从单机扩展到大型集群。它作为一个集群资源管理器,通过将任务及其依赖项表示为有向无环图来编排计算逻辑。这种架构允许系统在管理复杂执行要求的同时,自动将工作负载分配到可用硬件上。 该项目通过一个延迟评估引擎脱颖而出,该引擎将数据操作推迟到明确请求时才执行,从而实现全局图优化和高效的资源分配。它结合了内存感知数据溢出功能,以防止在处理超过可用内存的数据集时系统崩溃,并利用任务图融合将操作序列组合成单个执行步骤,从而最大限度地减少调度开销和节点间通信。 该平台为大规模数据分析提供了全面的功能面,包括对分布式机器学习、高性能计算集成和并行数据处理的支持。它提供了用于集群生命周期管理、性能分析和任务执行实时监控的广泛工具。用户可以在各种基础设施上部署这些环境,包括本地硬件、云提供商、容器化系统和高性能计算集群。
Extracts specific columns from a dataset using integer-based positional indexing while maintaining the underlying distributed structure.
Hexyl is a colored hex dump utility and binary data viewer for the terminal. It allows for the inspection of binary files by rendering contents as a colored hex dump to distinguish between different byte categories, such as printable text, whitespace, and null bytes. The tool includes a C-style hex exporter that transforms binary data into C include files for direct integration into source code. It supports visual layout customization through configurable panels and borders, as well as the ability to define colors for byte categories and offsets using terminal colors or RGB hex codes via envi
Enables precise slicing of binary streams by specifying start offsets and data lengths.
Danfo.js 是一个 JavaScript 数据分析和预处理库,提供高性能的标签化数据结构。它实现了数据帧(DataFrames)和序列(Series),以支持复杂的数据分析、统计计算和结构化表格数据的操作。 该项目作为一个机器学习预处理库,提供用于分类标签编码、独热编码(One-hot encoding)以及数值特征缩放和标准化的实用程序。它特别促进了将标签化数据结构转换为张量(Tensors)以进行模型训练和评估的过程。 该库涵盖了广泛的能力,包括描述性统计、合并和连接等关系操作以及时间序列处理。它包括用于数据清洗、过滤和分组的工具,以及用于直接从数据帧生成交互式图表和绘图的视觉化界面。 该系统支持通过 CSV、JSON 和 Excel 格式导入和导出数据。
Retrieves specific data subsets using integer indices, arrays of positions, or slice notation.
This is a Python library providing sorted list, set, and dictionary data structures that maintain their order automatically during insertions and deletions. The library provides a sorted list for fast random access and logarithmic lookups, a sorted set for unique elements and set-theoretic operations, and a sorted dictionary for managing key-value pairs where keys remain sorted. These collections support custom sorting logic through user-defined key functions to determine the order of elements. Core capabilities include positional indexing, range queries, and the use of bisection methods to
Enables retrieving elements by their integer position using optimized lookups across internal sublists.