10 repositories
Execution of custom scripts to filter, modify, and reshape data payloads within a pipeline.
Distinct from Data Transformation: this category specifically covers custom script logic for reshaping payloads, not general data format conversion.
Explore 10 awesome GitHub repositories matching data & databases · Script-Based Transformations.
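As a rough illustration of the category (not taken from any listed project), the Python sketch below chains user-supplied script steps that filter, modify, and reshape dict payloads. The names `run_pipeline`, `only_errors`, `uppercase_message`, and `reshape`, and the convention that a step returns `None` to drop a payload, are hypothetical.

```python
from typing import Callable, Iterable, Optional

Payload = dict
# Hypothetical convention: a step returns a (new) payload, or None to drop it.
Step = Callable[[Payload], Optional[Payload]]


def run_pipeline(payloads: Iterable[Payload], steps: list[Step]) -> list[Payload]:
    """Pass each payload through every step in order; drop it if any step returns None."""
    out = []
    for payload in payloads:
        current: Optional[Payload] = payload
        for step in steps:
            current = step(current)
            if current is None:
                break  # filtered out by this step
        if current is not None:
            out.append(current)
    return out


# Example "scripts": one filters, one modifies, one reshapes.
def only_errors(p: Payload) -> Optional[Payload]:
    return p if p.get("level") == "error" else None


def uppercase_message(p: Payload) -> Payload:
    return {**p, "msg": p["msg"].upper()}


def reshape(p: Payload) -> Payload:
    return {"summary": f'{p["service"]}: {p["msg"]}'}


if __name__ == "__main__":
    events = [
        {"service": "api", "level": "error", "msg": "timeout"},
        {"service": "db", "level": "info", "msg": "ok"},
    ]
    print(run_pipeline(events, [only_errors, uppercase_message, reshape]))
    # prints [{'summary': 'api: TIMEOUT'}]
```

Keeping each step a plain function of one payload makes steps easy to reorder, test in isolation, or swap for user-provided scripts.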
Huginn is an open-source automation platform that functions as an event-driven task automator and webhook integration engine. It lets users create agents that monitor web data and automate tasks across web services, and it can operate as a self-hosted web scraper and JavaScript workflow orchestrator. The system routes and transforms data between external APIs through a directed graph of event flows. It stands out for running custom JavaScript inside workflows to modify data payloads, and for human-in-the-loop automation that inserts manual judgment or data entry…
Allows custom JavaScript execution to filter and reshape data payloads as they move between agents.
elasticsearch-dump is a command-line tool for importing, exporting, and transferring data between Elasticsearch and OpenSearch instances. It works as an index dump utility that saves documents, mappings, and analyzers to local files or standard output. It can move data between clusters using local files as an intermediary, and it can flatten nested JSON documents into CSV files for external analysis. Custom JavaScript functions can modify or anonymize documents during the transfer. The utility covers data extraction…
Executes custom JavaScript scripts to modify or anonymize document fields during the data migration process.
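elasticsearch-dump's transforms are written in JavaScript, and its exact hook is not reproduced here. The Python sketch below only illustrates the underlying idea, pseudonymizing sensitive fields in each document as it is copied, and assumes a hypothetical document layout with fields under `_source` and a hypothetical `SENSITIVE_FIELDS` list.

```python
import hashlib
from typing import Iterable, Iterator

# Hypothetical list of fields treated as sensitive in this example.
SENSITIVE_FIELDS = ("email", "name")


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a stable, salted SHA-256 digest (truncated for readability)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def anonymize_docs(docs: Iterable[dict], salt: str) -> Iterator[dict]:
    """Yield copies of each document with sensitive _source fields pseudonymized."""
    for doc in docs:
        source = dict(doc.get("_source", {}))  # shallow copy; the input is not mutated
        for field in SENSITIVE_FIELDS:
            if isinstance(source.get(field), str):
                source[field] = pseudonymize(source[field], salt)
        yield {**doc, "_source": source}
```

A salted, deterministic digest maps the same input to the same pseudonym, so joins and aggregations on the anonymized field still work after migration; truncating to 16 hex characters trades some collision resistance for readability.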
Venera is a multi-source content reader and aggregator for browsing and downloading media from remote websites and local files through a unified interface. It acts as a local-remote media manager, synchronizing online content with local storage for offline viewing. A JavaScript-based content parser scrapes and parses data from external web sources, and users can define custom extraction rules in JavaScript to fetch and display content from those sites. The platform covers remote media management…
Uses custom JavaScript definitions to scrape and structure data from external websites.
This project is a research data sharing framework and provenance protocol designed to ensure computational reproducibility. It provides a standardized set of guidelines for transforming raw source data into tidy formats through documented processing scripts and cleaning workflows. What sets it apart is a strict provenance-based packaging system: raw data, processing recipes, and codebooks are organized into a single package, and the original unmodified sources are preserved so that every transformation step can be independently verified.
Converts raw data into tidy formats using reproducible scripts to ensure consistent processing results.
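The framework's own scripts are not shown in this listing. As one hedged example of a typical tidying step, the sketch below melts a hypothetical wide CSV (one column per year) into tidy long-format rows with one observation each; the column names and the `wide_to_long` helper are illustrative assumptions.

```python
import csv
import io

# Hypothetical raw file: one row per country, one column per year (a "wide" layout).
RAW = """country,2019,2020
A,10,12
B,7,
"""


def wide_to_long(raw_csv: str, id_col: str, var_name: str, value_name: str) -> list[dict]:
    """Melt a wide CSV into tidy rows: one row per (id, variable) observation.

    Empty cells are treated as missing and skipped rather than filled in.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    tidy = []
    for row in reader:
        for col, cell in row.items():
            if col == id_col or cell in ("", None):
                continue
            tidy.append({id_col: row[id_col], var_name: col, value_name: float(cell)})
    return tidy


if __name__ == "__main__":
    for record in wide_to_long(RAW, "country", "year", "value"):
        print(record)
```

Regenerating the tidy output from the untouched raw file, rather than editing either by hand, is what keeps each step reproducible in the provenance sense described above.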
Serial Studio is a desktop application for connecting to, decoding, visualizing, and recording data from hardware devices over multiple communication protocols. It functions as an embedded device debugging toolkit that ingests live data from Serial, Bluetooth, CAN, Modbus, MQTT, and network sockets into a unified dashboard, while also serving as a programmatic automation platform with over 320 commands exposed over TCP, gRPC, and MCP for external control. The application distinguishes itself through a scriptable frame pipeline that routes incoming bytes through configurable detection, decoding…
Decodes raw frames using built-in templates, JavaScript, or Lua scripts, then applies per-dataset transforms like filtering and scaling.
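Serial Studio's actual parser interface is not reproduced here. The Python sketch below shows the general shape of such a pipeline under assumed conventions: a hypothetical frame format `$v1,v2,v3;`, a decode step that filters out malformed frames, and hypothetical per-dataset transforms that scale each field by position.

```python
from typing import Callable, Optional

# Hypothetical frame format: "$v1,v2,v3;" with single-character
# start/end delimiters and comma-separated numeric fields.
START, END, SEP = "$", ";", ","


def decode_frame(frame: str) -> Optional[list[float]]:
    """Split one delimited frame into floats; return None if it is malformed."""
    if len(frame) < 2 or not (frame.startswith(START) and frame.endswith(END)):
        return None
    try:
        # Slicing off one character at each end relies on single-char delimiters.
        return [float(x) for x in frame[1:-1].split(SEP)]
    except ValueError:
        return None


# Hypothetical per-dataset transforms, applied by field position.
TRANSFORMS: list[Callable[[float], float]] = [
    lambda c: c,                   # temperature in °C, passed through
    lambda hpa: hpa / 10.0,        # pressure: hPa -> kPa
    lambda raw: raw * 3.3 / 1023,  # 10-bit ADC count -> volts (assumes 3.3 V reference)
]


def process(frame: str) -> Optional[list[float]]:
    """Decode a frame, drop it if the layout is wrong, then scale each dataset."""
    values = decode_frame(frame)
    if values is None or len(values) != len(TRANSFORMS):
        return None  # filter out frames that do not match the expected layout
    return [f(v) for f, v in zip(TRANSFORMS, values)]
```

Separating decoding from per-dataset transforms means a new sensor field only needs one more entry in the transform table, not a change to the frame parser.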
Uptrace is an OpenTelemetry-based observability platform for collecting, storing, and analyzing distributed traces, metrics, and logs. It serves as a centralized logging backend, a distributed tracing system, and a metrics engine for monitoring application performance and system health. It stands out for AI-powered operations features that let users query telemetry data and manage monitoring dashboards in natural language. It also includes dedicated monitoring for generative AI pipelines, tracking token usage and response quality for LLM interactions…
Executes arbitrary expressions to modify telemetry, such as normalizing cardinality or parsing strings.
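Uptrace's own expression syntax is not shown here. As a hedged illustration of the two operations mentioned, the Python sketch below collapses numeric and UUID path segments into placeholders (reducing the number of distinct route values) and parses duration strings into milliseconds; the regular expressions and the `'250ms'`/`'1.5s'` input format are assumptions.

```python
import re

# Hypothetical normalization rules: collapse high-cardinality path segments.
_UUID = re.compile(
    r"/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)"
)
_NUMBER = re.compile(r"/\d+(?=/|$)")


def normalize_route(path: str) -> str:
    """Replace IDs in a URL path with placeholders so spans group by route."""
    path = _UUID.sub("/{uuid}", path)
    return _NUMBER.sub("/{id}", path)


def parse_duration_ms(text: str) -> float:
    """Parse strings like '250ms' or '1.5s' into milliseconds (hypothetical log format)."""
    m = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(ms|s)\s*", text)
    if m is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(m.group(1)), m.group(2)
    return value * 1000 if unit == "s" else value
```

For example, `/users/42/orders/7` and `/users/99/orders/3` both normalize to `/users/{id}/orders/{id}`, so they are counted as one route instead of two.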
hledger is a plain text accounting tool and double-entry ledger manager that stores financial transactions in human-readable text files. It functions as a financial reporting engine for generating balance sheets and income statements, and as a multi-currency investment tracker for managing commodity lots and capital gains. The project distinguishes itself by providing multi-interface data access, allowing users to interact with their financial data via a command-line interface, a terminal user interface, and a web server. It features a market-price valuation system to calculate the current value…
Executes external scripts as readers, processors, or writers to handle custom financial data formats.
qsv is a high-performance command-line toolkit for querying, transforming, and analyzing comma-separated value files. It works as a data-wrangling interface and tabular data profiler, with a query engine that runs SQL statements and joins directly on flat files without requiring a database. It stands out for processing datasets larger than available system memory, using disk-based external-memory techniques such as multithreaded merge sorting, on-disk hash tables for deduplication, and lightweight file indexing for…
Integrates external interpreters and scripting languages to perform complex data wrangling and custom transformations.
Cimoc is a manga reader application and cross-platform ebook viewer designed for reading digital comics and image-based documents. It functions as both an online content aggregator and an offline media library, supporting the display of media from local files and remote web sources. The application integrates various web providers through a custom parser system to fetch and display online content. It includes a synchronization system to save application settings and reading progress to a remote server, maintaining consistency across different devices. Users can customize their reading experience…
Uses custom parsers to standardize diverse remote web data into a common internal format for display.
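Cimoc's parser classes are not reproduced here. The Python sketch below illustrates the general adapter idea: each hypothetical per-site parser maps its own raw payload shape onto one shared `Comic` record, and a registry dispatches by source name. The site names, field names, and `Comic` type are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Comic:
    """Hypothetical common internal format shared by all sources."""
    source: str
    title: str
    chapters: int


# Each hypothetical parser maps one site's raw payload shape onto Comic.
def parse_site_a(raw: dict) -> Comic:
    return Comic(source="site_a", title=raw["name"].strip(), chapters=len(raw["chapter_list"]))


def parse_site_b(raw: dict) -> Comic:
    return Comic(source="site_b", title=raw["info"]["title"], chapters=int(raw["info"]["chapterCount"]))


PARSERS: dict[str, Callable[[dict], Comic]] = {
    "site_a": parse_site_a,
    "site_b": parse_site_b,
}


def to_internal(source: str, raw: dict) -> Comic:
    """Dispatch a raw payload to the parser registered for its source."""
    try:
        parser = PARSERS[source]
    except KeyError:
        raise ValueError(f"no parser registered for {source!r}") from None
    return parser(raw)
```

Because the display layer only ever sees `Comic`, adding a new provider means registering one more parser rather than touching the reader code.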
CyberScraper-2077 is an AI-powered web scraping tool that uses large language models to extract and structure data from websites into organized formats. It functions as an LLM web scraper and AI content parser, transforming unstructured raw web text into specific data schemas. The project distinguishes itself through a suite of anonymity and evasion tools, including proxy rotation, SOCKS-based identity masking, and the ability to route traffic through the Tor network to access hidden onion services. It further includes a bot-detection bypass system that employs stealth parameters and custom…
Uses AI-powered parsing to structure raw web text into specific desired data schemas.