15 repositories
Utilities for parsing and serializing tabular data.
Explore 15 GitHub repositories matching data & databases · CSV Processing.
Polars is a high-performance columnar data processing library for analytical workflows. It organizes data into typed columns and uses the Apache Arrow memory format, which enables zero-copy data sharing and cache-friendly, vectorized operations. The engine handles large tabular datasets, with local and distributed runtimes that scale from a single machine to multi-node clusters. The project distinguishes itself through a lazy query engine that constructs abstract …
Reads and writes CSV files to and from datasets using standard file-based operations.
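To illustrate the columnar layout described above, here is a minimal pure-Python sketch, not Polars' API: CSV rows are read into one typed list per column using an explicitly supplied schema, and an aggregate then touches only the column it needs. The `read_columns` helper and the schema format are assumptions for illustration.

```python
import csv
import io
from typing import Callable

def read_columns(text: str, schema: dict[str, Callable[[str], object]]) -> dict[str, list]:
    """Parse CSV text into a column-oriented dict: one typed list per column."""
    columns: dict[str, list] = {name: [] for name in schema}
    for row in csv.DictReader(io.StringIO(text)):
        for name, cast in schema.items():
            # Convert each cell with the column's declared type.
            columns[name].append(cast(row[name]))
    return columns

data = "city,population\nOslo,709000\nBergen,291000\n"
cols = read_columns(data, {"city": str, "population": int})
# The aggregate reads a single contiguous column, not whole rows.
total = sum(cols["population"])
print(total)  # 1000000
```

Real columnar engines store each column in contiguous typed buffers; Python lists only approximate that layout.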
PapaParse is a delimited-text processing library that converts CSV into JSON objects or arrays and serializes structured data back into CSV. It can process very large datasets by streaming input incrementally in chunks, which prevents memory overload. It includes an automatic delimiter detector that identifies separator characters without manual configuration, and it can use web workers to offload parsing to background threads, keeping …
Converts structured JSON data back into delimited text (Papa.unparse), the reverse of parsing.
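The chunk-based streaming described above can be sketched with Python's standard `csv` module (this is not PapaParse's API): rows are read lazily from a file-like object and handed to the caller in fixed-size batches, so only one chunk is in memory at a time. `iter_chunks` and `chunk_size` are hypothetical names.

```python
import csv
import io
from itertools import islice
from typing import Iterator, TextIO

def iter_chunks(stream: TextIO, chunk_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield lists of at most chunk_size row dicts, reading the stream lazily."""
    reader = csv.DictReader(stream)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

stream = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")
sizes = [len(chunk) for chunk in iter_chunks(stream, chunk_size=2)]
print(sizes)  # [2, 2, 1]
```

With a real file, pass `open(path, newline="")` instead of the `StringIO`.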
Jackson is a Java data-binding framework and multi-format serializer that maps data structures to and from native Java objects. It includes a JSON data-binding library and a streaming parser that reads and writes data as discrete tokens, so large datasets can be processed with minimal memory. The project distinguishes itself through a bytecode serialization accelerator that replaces standard reflection with generated bytecode to speed up data binding. A module-based extensibility model supports a wide range of formats beyond JSON, including XML, YAML, CSV, TOML, and …
Parses and serializes tabular CSV data, with handling for header rows and empty rows.
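Header and empty-row handling can be sketched in Python, assuming one simple policy (not Jackson's configuration): the first row supplies the column names, and any row whose cells are all blank is skipped. The `parse_with_header` helper is hypothetical.

```python
import csv
import io

def parse_with_header(text: str) -> list[dict[str, str]]:
    """Use the first row as the header and drop rows whose cells are all blank."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    records = []
    for row in reader:
        # A blank line yields [] and a line like "," yields ['', '']; skip both.
        if not any(cell.strip() for cell in row):
            continue
        records.append(dict(zip(header, row)))
    return records

text = "sku,qty\nA1,3\n\n,\nB2,5\n"
print(parse_with_header(text))
# [{'sku': 'A1', 'qty': '3'}, {'sku': 'B2', 'qty': '5'}]
```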
jc converts the plain-text output of command-line utilities, system tools, log formats, and text tables into structured JSON. It also converts file formats such as CSV, INI, XML, and YAML into JSON for programmatic use. The project ships dedicated parsers for Unix commands and system tools such as df, blkid, and various package managers, along with specialized converters for web server logs, Common Log Format, and Common Event Format strings. The tool covers broad …
Transforms comma-separated values files into JSON by detecting delimiters and using the first row as headers.
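Delimiter detection plus header-based conversion can be sketched with the standard library's heuristic `csv.Sniffer` (this is not jc's implementation). The candidate delimiter set and the 4096-character sample size are assumptions, and sniffing can guess wrong on small or irregular inputs.

```python
import csv
import io
import json

def csv_to_json(text: str) -> str:
    """Detect the delimiter, treat the first row as headers, and emit a JSON array."""
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    rows = csv.DictReader(io.StringIO(text), dialect=dialect)
    return json.dumps(list(rows))

print(csv_to_json("name;port\nweb;80\ndb;5432\n"))
# [{"name": "web", "port": "80"}, {"name": "db", "port": "5432"}]
```

All values stay strings; a converter that emits numbers needs a separate type-inference step.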
elasticsearch-dump is a command-line tool for importing, exporting, and transferring data between Elasticsearch and OpenSearch instances. It works as an index dump utility that saves documents, mappings, and analyzers to local files or standard output. It can move data between clusters, using local files as an intermediary, and can flatten nested JSON documents into CSV files for external analysis. Custom JavaScript functions can modify or anonymize documents during transfer. The utility covers data extraction …
Transforms JSON search documents into CSV format by flattening nested data structures.
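The flattening step can be sketched in Python, assuming dotted column names such as `user.geo.country` (a common convention, not necessarily elasticsearch-dump's exact output). `flatten` is a hypothetical helper, and the sketch assumes every document yields the same keys.

```python
import csv
import io

def flatten(doc: dict, prefix: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into one level with dotted keys; other values are kept as single cells."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat

docs = [
    {"_id": "1", "user": {"name": "ann", "geo": {"country": "NO"}}},
    {"_id": "2", "user": {"name": "bo", "geo": {"country": "SE"}}},
]
rows = [flatten(d) for d in docs]
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
# _id,user.name,user.geo.country
# 1,ann,NO
# 2,bo,SE
```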
Proxyman is a cross-platform HTTP debugging proxy that captures, inspects, and modifies HTTP, HTTPS, and WebSocket traffic. It functions as a man-in-the-middle proxy, decrypting SSL/TLS traffic to allow real-time inspection and modification of encrypted requests and responses. The tool is designed for debugging web and mobile applications, with capabilities for API mocking and simulation, scriptable traffic modification, and team collaboration on network logs. What distinguishes Proxyman is its deep integration with mobile and cross-platform development workflows. It provides automated …
Exports captured requests and sessions as HAR, CSV, or Postman files for sharing.
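A HAR capture is JSON whose `log.entries` array holds one request/response pair per entry, so converting it to CSV amounts to selecting fields per entry. The sketch below is not Proxyman's exporter; the chosen columns and the `har_to_csv` name are assumptions.

```python
import csv
import io
import json

def har_to_csv(har_text: str) -> str:
    """Write one CSV row per HAR entry with a few commonly used fields."""
    entries = json.loads(har_text)["log"]["entries"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["started", "method", "url", "status", "time_ms"])
    for e in entries:
        writer.writerow([
            e["startedDateTime"],
            e["request"]["method"],
            e["request"]["url"],
            e["response"]["status"],
            e["time"],  # total elapsed time of the entry, in milliseconds
        ])
    return out.getvalue()

har = json.dumps({"log": {"version": "1.2", "entries": [{
    "startedDateTime": "2024-01-01T00:00:00.000Z", "time": 42.5,
    "request": {"method": "GET", "url": "https://example.com/api"},
    "response": {"status": 200}}]}})
print(har_to_csv(har))
```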
This project is a GIS toolset built around a comprehensive dataset of China's administrative divisions, covering provinces, cities, districts, and townships. It transforms coordinate systems and converts boundary data into standard formats. It distinguishes itself by converting administrative boundary data between CSV, GeoJSON, Shapefile, and SQL, and it includes utilities for transforming coordinates between the GCJ-02, BD-09, WGS-84, and CGCS2000 standards to ensure accuracy across …
Transforms tabular CSV data into JSON structures to populate frontend multi-level dropdown menus.
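Turning flat CSV rows into a cascading-dropdown tree means grouping rows by successive columns. This is a minimal sketch, assuming columns named `province`, `city`, and `district` and a `{"label", "children"}` node shape; both are hypothetical, not the project's actual schema, and the sample rows are placeholders.

```python
import csv
import io
import json

def build_tree(text: str, levels: list[str]) -> list[dict]:
    """Group rows by successive columns into nested {"label", "children"} nodes."""
    root: list[dict] = []
    for row in csv.DictReader(io.StringIO(text)):
        siblings = root
        for level in levels:
            label = row[level]
            # Reuse an existing node with this label, or create it.
            node = next((n for n in siblings if n["label"] == label), None)
            if node is None:
                node = {"label": label, "children": []}
                siblings.append(node)
            siblings = node["children"]
    return root

data = "province,city,district\nA,A1,A1a\nA,A1,A1b\nA,A2,A2a\n"
tree = build_tree(data, ["province", "city", "district"])
print(json.dumps(tree))
```

The linear search per level keeps the sketch short. For large datasets, keep a dict index per level so lookups cost O(1).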
Webhook.site is a web-based tool that captures, inspects, and debugs incoming HTTP requests and emails sent to a unique URL, with no server setup required. It can also mock APIs, generating them from OpenAPI specifications, and alter HTTP responses, headers, and status codes for testing. Beyond inspection, it serves as a platform for webhook automation and workflow orchestration, triggering multi-step automations (including database queries, SSH commands, and HTTP calls) when a webhook is received. The service …
Webhook.site fetches requests from the API, flattens nested JSON into columns, and generates a CSV file with variable fields.
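Records with varying fields can share one CSV if the header is the union of all keys, with blanks where a record lacks a field. This sketch assumes the records were already flattened to single-level dicts; `records_to_csv` and the sample keys are hypothetical.

```python
import csv
import io

def records_to_csv(records: list[dict]) -> str:
    """Write records with differing keys; the header is the union of all keys."""
    fieldnames: list[str] = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:  # keep first-seen column order
                fieldnames.append(key)
    out = io.StringIO()
    # restval fills the cell for any header key a record does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

print(records_to_csv([
    {"method": "POST", "content.id": 7},
    {"method": "GET", "query.page": 2},
]))
# method,content.id,query.page
# POST,7,
# GET,,2
```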
csvkit is a composable, Unix-style command-line toolkit for converting, filtering, and analyzing CSV files directly from the terminal. It provides focused, single-purpose commands that can be chained with pipes into larger data-processing workflows. Its architecture includes a column-type inference engine that detects data types automatically and a streaming design for handling tabular data efficiently. The toolkit distinguishes itself through an SQL abstraction layer that lets users run SQL queries directly against CSV files without …
Outputs CSV data as JSON, enabling interchange with web and application formats.
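Running SQL directly against a CSV can be sketched by loading it into an in-memory SQLite table. This uses Python's `sqlite3` module, not csvkit's own machinery. The table name `data` and the `query_csv` helper are assumptions, and header names are assumed not to contain double quotes.

```python
import csv
import io
import sqlite3

def query_csv(text: str, sql: str, table: str = "data") -> list[tuple]:
    """Load CSV into an in-memory SQLite table and run a SQL query against it."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{name}"' for name in header)
    # Columns are declared without types; the query casts where numbers are needed.
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

text = "region,sales\nnorth,10\nsouth,5\nnorth,7\n"
print(query_csv(text, "SELECT region, SUM(CAST(sales AS INTEGER)) FROM data GROUP BY region ORDER BY region"))
# [('north', 17), ('south', 5)]
```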
GAM is a command-line tool for administering Google Workspace and Cloud Identity. It translates command-line arguments into structured API calls, enabling administrators to manage users, groups, organizational units, and domain settings across a Google Workspace environment. The tool handles authentication through OAuth2 flows, service accounts, and workload identity federation, and supports multi-tenant configurations for managing multiple domains or cloud projects from a single installation. GAM distinguishes itself through its batch processing and automation capabilities. It can process …
Executes bulk administrative commands against target lists sourced from CSV files or Google Sheets.
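The row-driven bulk pattern can be sketched as template expansion: each CSV row fills placeholders in a command template, giving one argument list per target. The `{column}` placeholder syntax, the `expand_commands` helper, and the command words are hypothetical, not GAM's documented syntax. The sketch only builds argument lists and executes nothing.

```python
import csv
import io

def expand_commands(text: str, template: list[str]) -> list[list[str]]:
    """Build one argv list per CSV row by filling {column} placeholders."""
    commands = []
    for row in csv.DictReader(io.StringIO(text)):
        commands.append([token.format_map(row) for token in template])
    return commands

targets = "email,ou\nann@example.com,/Sales\nbo@example.com,/Support\n"
# Hypothetical command words; only the substitution mechanism matters here.
template = ["admin-tool", "update", "user", "{email}", "ou", "{ou}"]
for argv in expand_commands(targets, template):
    print(argv)
```

Producing argument lists rather than shell strings means values with spaces or shell metacharacters can later be passed to `subprocess.run` without quoting problems.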
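Mapshaper is a tool for processing, simplifying, and converting geographic vector data, available as a command-line interface, a web browser tool, and a Node.js library. It acts as a coordinate projector, a vector data converter, and a web map asset optimizer, designed to convert spatial datasets between coordinate reference systems and file formats. The project is known for topology-preserving geometry simplification, which reduces vertex counts while keeping shared boundaries intact to prevent gaps and overlaps. It further optimizes web assets through coordinate quantization and attribute filtering to reduce file size. The system covers a broad range of functionality, including coordinate reprojection using PROJ strings and EPSG codes and data conversion across formats such as Shapefile, GeoJSON, TopoJSON, GeoPackage, and KML. It provides extensive geometry-processing tools for buffering, clipping, dissolving, and repairing topology, as well as data-management utilities for attribute joins, filtering, and transformation. It also includes visual features for generating styled SVG exports, graticules, and proportional-symbol maps. Its spatial-processing functions can be integrated directly into JavaScript applications and build pipelines through its Node.js library.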
Reads and writes plain JSON arrays of objects as records with support for nested paths.
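Nested-path support can be illustrated in the CSV-to-JSON direction: dotted column names become nested objects in each record. This is a hypothetical sketch, not Mapshaper's code. It assumes `.` as the path separator and that no column is both a leaf and a prefix of another column (for example, `props` alongside `props.name`).

```python
import csv
import io
import json

def unflatten(row: dict[str, str], sep: str = ".") -> dict:
    """Turn dotted keys such as 'props.name' into nested objects."""
    nested: dict = {}
    for key, value in row.items():
        parts = key.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # walk or create intermediate objects
        node[parts[-1]] = value
    return nested

text = "id,props.name,props.pop\n1,Alpha,100\n"
records = [unflatten(r) for r in csv.DictReader(io.StringIO(text))]
print(json.dumps(records))
# [{"id": "1", "props": {"name": "Alpha", "pop": "100"}}]
```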
Exportify is a web-based tool that exports Spotify playlists and liked songs to portable CSV files. It is a multi-language export utility that lets users save track, album, and artist details for long-term preservation or data analysis. It supports bulk archiving, saving entire collections of playlists as CSV files bundled into a single ZIP archive, as well as targeted export through playlist search and filtering by ownership or collaboration status. The system …
Transforms JSON-formatted track information from web services into comma-separated value files.
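Bulk archiving of the kind described above (one CSV per playlist, bundled into one ZIP) can be sketched with the standard `csv` and `zipfile` modules. The track dict shape, the column names, and `playlists_to_zip` are assumptions for illustration, not Exportify's or Spotify's data format.

```python
import csv
import io
import zipfile

def playlists_to_zip(playlists: dict[str, list[dict]]) -> bytes:
    """Write each playlist as its own CSV file inside one in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, tracks in playlists.items():
            out = io.StringIO()
            writer = csv.DictWriter(out, fieldnames=["track", "artists", "album"], lineterminator="\n")
            writer.writeheader()
            for t in tracks:
                # Multiple artists are joined into a single cell.
                writer.writerow({"track": t["track"], "artists": ", ".join(t["artists"]), "album": t["album"]})
            archive.writestr(f"{name}.csv", out.getvalue())
    return buffer.getvalue()

data = playlists_to_zip({
    "Road Trip": [{"track": "Song A", "artists": ["X", "Y"], "album": "First"}],
    "Focus": [{"track": "Song B", "artists": ["Z"], "album": "Second"}],
})
print(zipfile.ZipFile(io.BytesIO(data)).namelist())  # ['Road Trip.csv', 'Focus.csv']
```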
qsv is a high-performance command-line toolkit for querying, transforming, and analyzing CSV files. It serves as both a data-wrangling interface and a tabular data profiler, with a query engine that runs SQL statements and joins directly on flat files, with no database required. The project is distinguished by its ability to process datasets larger than available memory, using disk-based external-memory techniques: multithreaded merge sorting, on-disk hash tables for deduplication, and lightweight file indexing for …
Transforms nested or line-delimited JSON structures into flat CSV tables.
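The external merge sort mentioned above is the standard technique for sorting data larger than memory. This single-threaded sketch is not qsv's implementation: it sorts fixed-size batches in memory, spills each sorted run to a temporary file, then lazily k-way merges the runs. `external_sort` and `run_size` are hypothetical names, and keys compare as strings.

```python
import csv
import heapq
import io
import tempfile
from itertools import islice
from typing import TextIO

def external_sort(src: TextIO, dst: TextIO, key_col: str, run_size: int) -> None:
    """Sort CSV rows by one column without holding the whole file in memory."""
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames
    if fieldnames is None:  # empty input: nothing to sort
        return
    runs = []
    # Phase 1: sort fixed-size batches in memory and spill each to a temp file.
    while batch := list(islice(reader, run_size)):
        batch.sort(key=lambda row: row[key_col])
        tmp = tempfile.TemporaryFile("w+", newline="")
        csv.DictWriter(tmp, fieldnames=fieldnames).writerows(batch)
        tmp.seek(0)
        runs.append(tmp)
    # Phase 2: k-way merge; each run is read lazily, one row at a time.
    merged = heapq.merge(
        *(csv.DictReader(tmp, fieldnames=fieldnames) for tmp in runs),
        key=lambda row: row[key_col],
    )
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged)
    for tmp in runs:
        tmp.close()

src = io.StringIO("name,score\ndelta,1\nalpha,2\ncharlie,3\nbravo,4\necho,5\n")
dst = io.StringIO()
external_sort(src, dst, "name", run_size=2)
print(dst.getvalue())
```

Peak memory is bounded by `run_size` rows in phase 1 and by one row per run in phase 2.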
Tippecanoe is a command-line tool that generates optimized vector tiles for web maps. It converts large geospatial datasets, including GeoJSON, CSV, and Geobuf files, into binary vector tiles or MBTiles SQLite databases. It is designed to preserve map performance and visual quality across zoom levels by downsampling the data: simplifying geometries and thinning point density so that tiles do not become overcrowded and stay within size limits. The tool also provides extensive data transformation capabilities, such as …
Transforms geospatial JSON data into CSV format or extracts property keys into structured text.
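Converting GeoJSON point features to CSV involves two details: GeoJSON orders coordinates as [longitude, latitude], and the property keys must be collected into a header. This is a minimal sketch, not Tippecanoe's code. It keeps only Point features, and `points_to_csv` plus the sample data are hypothetical.

```python
import csv
import io
import json

def points_to_csv(geojson_text: str) -> str:
    """Write Point features as CSV rows: lon, lat, then every property key seen."""
    features = json.loads(geojson_text)["features"]
    points = [f for f in features if (f.get("geometry") or {}).get("type") == "Point"]
    keys: list[str] = []
    for f in points:
        for k in f.get("properties") or {}:
            if k not in keys:
                keys.append(k)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["lon", "lat", *keys])
    for f in points:
        lon, lat = f["geometry"]["coordinates"][:2]  # GeoJSON order is [lon, lat]
        props = f.get("properties") or {}
        writer.writerow([lon, lat, *(props.get(k, "") for k in keys)])
    return out.getvalue()

fc = json.dumps({"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [10.75, 59.91]}, "properties": {"name": "a"}},
    {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}, "properties": {}},
]})
print(points_to_csv(fc))
# lon,lat,name
# 10.75,59.91,a
```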
This project is an Amazon web scraper and e-commerce data extractor that retrieves product names, prices, and ratings. It works as a headless browser crawler that converts unstructured content from product listings into structured JSON and CSV. It bypasses CAPTCHAs and other anti-bot challenges using residential proxies, automatic proxy rotation, and modified browser fingerprints that simulate human interaction patterns. The system provides broad web scraping capabilities, …
Transforms structured product data from JSON API responses into portable CSV files.
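Product names often contain commas and quotation marks, which is why CSV export relies on RFC 4180-style quoting: the field is wrapped in quotes and inner quotes are doubled. Python's `csv` writer does this automatically. The field names and `products_to_csv` are hypothetical. The module's default `\r\n` line ending matches RFC 4180; `\n` is used here only for readable output.

```python
import csv
import io

def products_to_csv(products: list[dict]) -> str:
    """Serialize product dicts; fields with commas or quotes are quoted and escaped."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price", "rating"],
                            quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(products)
    return out.getvalue()

text = products_to_csv([{"name": 'Kettle, 1.7L "Pro"', "price": 39.99, "rating": 4.5}])
print(text)
# name,price,rating
# "Kettle, 1.7L ""Pro""",39.99,4.5
```

Reading the output back with `csv.DictReader` recovers the original name exactly.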