30 open-source projects similar to nonoum/ecl, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best ECL alternative.
WaveStitch is a deep generative framework for conditional time series synthesis. It enables the generation of realistic time series data conditioned on auxiliary features (e.g., labels, metadata) and signal anchors (e.g., partial observations). This codebase provides tools for experimentation…
Olmocr is a distributed document-processing framework that converts PDF and image files into structured Markdown. It is a vision-based document parser: multimodal neural networks interpret complex visual layouts and translate them into standardized text. The system acts as a remote inference orchestrator, offloading heavy document analysis to external servers or cloud APIs to minimize local compute requirements. Its stateless worker architecture decouples document ingestion from inference, allowing for the distribu…
A comprehensive Spark guide, collated from multiple sources, that can be used to learn more about Spark or as an interview refresher.
Fx is a command-line suite for transforming, converting, exploring, and visualizing structured data. It is a terminal utility that supports both automated shell pipelines and interactive navigation of complex, nested data hierarchies. It distinguishes itself with a JavaScript-based engine that runs user-provided logic to filter, map, or modify data fields in a sandboxed runtime. It keeps the interface responsive by decoupling data processing from the display loop, allowing users to explore large datasets through an inte…
Hadoop is a big data infrastructure suite for storing and processing massive datasets across clusters of computers. It pairs a distributed file system, which spreads large files across nodes for fault tolerance and high throughput, with a programming model that processes those datasets in parallel across the cluster. It manages the underlying hardware and software environment required for distributed big data…
Apache IoTDB is a time-series database for the Internet of Things, purpose-built to ingest high-volume data from millions of low-power devices and to store timestamp-value pairs with configurable data types and encoding schemes. It organizes time series data and device metadata in a tree-like hierarchy, enabling efficient management of complex industrial sensor networks. Queries can retrieve time-aligned data across multiple devices, aggregate over time (for example, downsampling), and perform frequency-domain signal analysis. It provides high-throughput…
Apache Spark is a unified engine for large-scale distributed data processing. It combines a SQL analytics engine, a real-time stream processor, a graph processing system, and a distributed machine learning framework, so it can run SQL queries, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and developing predictive models on massive datasets. The engine incorporates relational query e…
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
CMSIS-DSP is an optimized compute library for embedded systems (DSP is in the name for legacy reasons).
Data compression library for embedded/real-time systems
Amazon Kinesis Aggregators provides a simple way to create real time aggregations of data on Amazon Kinesis.
This package provides an interface to the Amazon Kinesis Client Library (KCL) MultiLangDaemon for the .NET Framework.
Amazon Kinesis Client Library for Node.js
Amazon Kinesis Client Library for Python
A Ruby interface for the Amazon Kinesis Client Library. Allows developers to easily create robust applications that process Amazon Kinesis streams in Ruby.
The Amazon Kinesis Connector Library helps Java developers integrate Amazon Kinesis with other AWS and non-AWS services. The current version of the library provides connectors for Amazon DynamoDB, Amazon Redshift, Amazon S3,…
Amazon Kinesis Data Visualization Sample Application
Learning Amazon Kinesis Development
The Kinesis Scaling Utility lets you scale Amazon Kinesis Streams the same way you scale EC2 Auto Scaling groups: up or down by a count or by a percentage of the total fleet. You can also scale to an exact number of shards. When using this API, you do not need to manage how the keyspace is allocated to shards; that is done automatically.
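The scaling options described above reduce to simple arithmetic. The Python sketch below is a minimal illustration, assuming the 128-bit hash keyspace is split evenly across shards. `target_shard_count` and `even_hash_ranges` are hypothetical names, not the utility's API, and the utility's real rounding and range-splitting rules may differ.

```python
# Minimal sketch with hypothetical helpers, not the Kinesis Scaling Utility's API.
# Kinesis maps each partition key through MD5 to a 128-bit integer, so the
# full hash keyspace is the range [0, 2**128 - 1].
MAX_HASH_KEY = 2**128 - 1


def target_shard_count(current, *, by_count=None, by_pct=None, exact=None):
    """Resolve one scaling request to a new shard count (never below 1)."""
    if exact is not None:
        target = exact
    elif by_count is not None:
        # Positive values scale up, negative values scale down.
        target = current + by_count
    elif by_pct is not None:
        # Percentage of the current fleet. round() rounds halves to even;
        # this rounding rule is an assumption, not necessarily the utility's.
        target = round(current * (1 + by_pct / 100))
    else:
        raise ValueError("specify one of by_count, by_pct or exact")
    return max(1, target)


def even_hash_ranges(shards):
    """Split the hash keyspace into `shards` contiguous, near-equal ranges."""
    total = MAX_HASH_KEY + 1
    return [
        (total * i // shards, total * (i + 1) // shards - 1)
        for i in range(shards)
    ]


if __name__ == "__main__":
    new_count = target_shard_count(4, by_pct=50)  # 4 shards plus 50% gives 6
    for start, end in even_hash_ranges(new_count):
        print(f"{start:>39} - {end}")
```

Integer floor division keeps the ranges contiguous and makes the last range end exactly at the top of the keyspace, so every hash key belongs to exactly one shard.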
Amazon Redshift Utils contains utilities, scripts, and views that are useful in a Redshift environment
Amazon Kinesis output plugin for Fluentd
Amazon Redshift Database Loader implemented in AWS Lambda
The Amazon DynamoDB Streams Adapter implements the Amazon Kinesis interface so that your application can use KCL to consume and process data from a DynamoDB stream.
ARCHIVED: Log4J Appender for writing data into a Kinesis Stream
Simple multi-threaded Kinesis Poster and Worker Python examples