Pathway

Pathway is a high-performance data processing framework for building unified batch and streaming pipelines. It orchestrates complex data transformations on a differential dataflow engine that processes updates incrementally. By applying identical logic to static datasets and continuous event streams, the platform provides exactly-once processing semantics and consistent results across diverse data sources.
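
The incremental processing described above can be illustrated with a small, library-free sketch. It is written in plain Python and is not Pathway's API: the class name IncrementalSum and the (key, value, diff) update format are assumptions made for illustration. Each update carries a diff of +1 (insertion) or -1 (retraction), so a correction touches only the affected key and the rest of the result is not recomputed.

```python
from collections import defaultdict


class IncrementalSum:
    """Hypothetical operator: keeps per-key sums current from (key, value, diff) updates."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)

    def apply(self, key, value, diff):
        """Apply one update; diff is +1 for an insertion and -1 for a retraction."""
        self.totals[key] += value * diff
        self.counts[key] += diff
        if self.counts[key] == 0:
            # No rows remain for this key, so it disappears from the output.
            del self.totals[key]
            del self.counts[key]

    def snapshot(self):
        """Return the current result as a plain dict."""
        return dict(self.totals)


def run_example():
    agg = IncrementalSum()
    # An initial "batch" of rows arrives as insertions.
    for key, value in [("a", 10), ("b", 5), ("a", 7)]:
        agg.apply(key, value, +1)
    after_batch = agg.snapshot()
    # Later changes arrive as a stream: one row is corrected, one is deleted.
    agg.apply("a", 7, -1)
    agg.apply("a", 9, +1)
    agg.apply("b", 5, -1)
    return after_batch, agg.snapshot()
```

The same apply method serves the initial load and the later changes, which is the sense in which batch and streaming inputs share one code path in this sketch.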

The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features integrated vector-aware data ingestion, which automates the creation and maintenance of searchable document indexes that update instantly as new data arrives. Developers can connect language models directly into their pipelines, utilizing built-in capabilities for document chunking, embedding generation, and result reranking to maintain synchronized, context-aware information retrieval.
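
As a rough illustration of an index that stays synchronized with its documents, the sketch below chunks text, computes a toy embedding, and updates a searchable index in place when a document changes. It is plain Python and does not use Pathway's API: LiveIndex, chunk_text, and embed are hypothetical names, and the bag-of-words vector is only a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=8):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LiveIndex:
    """Hypothetical in-memory index keyed by (doc_id, chunk number)."""

    def __init__(self):
        self.entries = {}  # (doc_id, chunk_no) -> (chunk text, vector)

    def upsert(self, doc_id, text):
        """Replace all chunks of a document with chunks of its new text."""
        self.remove(doc_id)
        for n, chunk in enumerate(chunk_text(text)):
            self.entries[(doc_id, n)] = (chunk, embed(chunk))

    def remove(self, doc_id):
        """Delete every chunk that belongs to doc_id."""
        for key in [k for k in self.entries if k[0] == doc_id]:
            del self.entries[key]

    def search(self, query, k=1):
        """Return the k best (doc_id, chunk) pairs by cosine similarity."""
        q = embed(query)
        ranked = sorted(self.entries.items(), key=lambda kv: cosine(q, kv[1][1]), reverse=True)
        return [(key[0], chunk) for key, (chunk, _) in ranked[:k]]
```

A caller would invoke upsert whenever a source document is added or edited and remove when it is deleted, so that search results reflect the latest state of the data.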

Beyond its core processing capabilities, the platform provides infrastructure for deploying data applications. A batch workflow can be moved to streaming by changing only its input connectors, and the containerized deployment model allows services to scale across local and cloud environments. The system is designed for large-scale, event-driven tasks and offers one programming model for both analytics and automated content generation workflows.
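
The idea of switching from batch to streaming by changing only the input can be sketched without any library. In the plain-Python example below (not Pathway's connector API; batch_source, stream_source, and word_counts are hypothetical names), the transformation accepts any iterable of rows, so a list and a generator that yields rows one at a time produce the same final result.

```python
def word_counts(rows):
    """Shared transformation: yield the running word counts after each row."""
    counts = {}
    for row in rows:
        for word in row.split():
            counts[word] = counts.get(word, 0) + 1
        yield dict(counts)


def batch_source():
    """Stand-in for a batch connector: all rows are available up front."""
    return ["a b", "b c"]


def stream_source():
    """Stand-in for a streaming connector: rows arrive one at a time."""
    yield "a b"
    yield "b c"


def final_result(source):
    """Drain the transformation and return its last emitted state."""
    result = {}
    for result in word_counts(source):
        pass
    return result
```

Only the source passed to final_result differs between the two modes; word_counts itself is unchanged.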

Features

RAG Pipelines - Constructs retrieval-augmented generation workflows that chunk, rerank, and integrate private data for accurate model responses.
Data Processing Frameworks - Executes high-performance data transformations using a unified engine capable of managing both batch and streaming sources.
Data Stream Processors - Manages complex data transformations on real-time flows via an engine compatible with standard programming environments.
Declarative Pipeline Construction - Defines complex data transformation workflows as static, optimized graphs before execution.
Differential Dataflow Engines - Propagates data updates incrementally through a directed graph of operators to maintain real-time consistency.
Exactly-Once Processing Semantics - Ensures every input record is processed exactly once through reliable checkpointing and deterministic execution.
Stream Processing Engines - Delivers low-latency computation on real-time data streams by applying consistent logic across diverse data sources.
Real-Time Data Processors - Processes continuous data streams in real-time to facilitate immediate event-driven analytics.
Model Provider Adapters - Applies model wrappers to data columns to normalize requests and responses across various language model providers.
Vector-Aware Data Ingestion - Embeds automated document chunking and vector generation directly into data pipelines to keep searchable indexes synchronized.
Vector Data Ingestion Frameworks - Automates the generation and real-time updating of searchable vector data for artificial intelligence applications.
Stream-Oriented Data Pipelines - Transitions batch processing workflows into continuous, real-time streaming operations while preserving core transformation logic.
Vector Search Indexes - Automates the creation and maintenance of searchable document indexes that update instantly as new data arrives.
Incremental State Management - Caches intermediate computation results in memory to eliminate redundant re-processing within data pipelines.
Feature Flagging Systems - Tools for managing feature toggles and conditional code execution in production.
Real-Time AI Pipelines - Connects live data streams to language models for instant, context-aware content generation and analysis.
RAG Frameworks - Performant Python ETL framework with Rust runtime for data ingestion.
Retrieval Augmented Generation - ETL framework for real-time RAG and stream processing.
Data Analysis - Real-time data processing framework.
Data Orchestration - Performant Python ETL framework with a Rust runtime.
Databases and RAG - ETL framework for real-time RAG and pipelines.
Stream Processing - High-performance Python ETL framework powered by a Rust runtime.
Streaming Engines - Unified engine for batch, streaming, and LLM applications.
Distributed Data Platforms - Scales containerized data services across local and cloud environments with robust performance and network connectivity.
Reranking Engines - Evaluates the relevance of retrieved documents against user queries using reranking models to filter significant information.
Vector Document Indexing - Integrates external vector database clients directly into data ingestion workflows to automate real-time document indexing.
Data Application Deployment - Packages data processing services into containerized images to ensure reliable scaling and deployment across infrastructure.
Agentic Systems Frameworks - Supports the creation of autonomous agents by providing the underlying infrastructure for complex, event-driven decision logic.
Unified Batch and Stream Processing Engines - Synchronizes historical record analysis and real-time event ingestion within a single, consistent programming interface.
Document and LLM Preparation - Converts unstructured files into machine-readable segments using specialized parsers optimized for downstream model consumption.
Deployment Management and Strategies - Streamlines the lifecycle management of data-intensive applications through robust orchestration of containerized service releases.
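
The exactly-once entry in the list above mentions checkpointing and deterministic execution. A minimal sketch of that general technique, in plain Python and not taken from Pathway's implementation, is shown below. CheckpointedCounter is a hypothetical name, and the dict stored in self.checkpoint stands in for a durable store. State and input offset are committed together, so after a crash the uncommitted work is discarded and replayed from the last checkpoint, and no record is counted twice.

```python
class CheckpointedCounter:
    """Hypothetical job: sums values from a replayable log with periodic checkpoints."""

    def __init__(self):
        # Stand-in for durable storage; offset and total are always saved together.
        self.checkpoint = {"offset": 0, "total": 0}

    def run(self, log, checkpoint_every=2, fail_at=None):
        """Process the log from the last checkpoint; fail_at simulates a crash."""
        offset = self.checkpoint["offset"]
        total = self.checkpoint["total"]
        for position in range(offset, len(log)):
            if position == fail_at:
                # Progress since the last checkpoint is lost here.
                raise RuntimeError("simulated crash")
            total += log[position]
            if (position + 1) % checkpoint_every == 0:
                # Commit state and offset as one unit.
                self.checkpoint = {"offset": position + 1, "total": total}
        # Final commit once the whole log has been consumed.
        self.checkpoint = {"offset": len(log), "total": total}
        return total
```

Because the sum is deterministic and the log can be replayed, rerunning after the simulated crash yields the same total as an uninterrupted run.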

Name: pathwaycom/pathway
Author: pathwaycom

62,959 stars, 1,677 forks, Python, 20 views, pathway.com

Open-source alternatives to Pathway

Similar open-source projects, ranked by how many features they share with Pathway.

pathwaycom/llm-app (59,341 stars)
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transformation workflows. The framework distinguishes itself through differential dataflow execution, which propagates only changes through a pipeline rather than recomputing entire datasets. It supports distributed state management across worker nodes and utilizes incremental stream p
Language: Jupyter Notebook; tags: chatbot, hugging-face, llm
apache/flink (26,086 stars)
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve
Language: Java
hazelcast/hazelcast (6,570 stars)
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Language: Java; tags: big-data, caching, data-in-motion

Frequently asked questions

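What does pathwaycom/pathway do?

pathwaycom/pathway is a data processing framework for building unified batch and streaming pipelines, with built-in support for real-time AI and retrieval-augmented generation.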

What are the main features of pathwaycom/pathway?

The main features of pathwaycom/pathway are: RAG Pipelines, Data Processing Frameworks, Data Stream Processors, Declarative Pipeline Construction, Differential Dataflow Engines, Exactly-Once Processing Semantics, Stream Processing Engines, Real-Time Data Processors.

What are some open-source alternatives to pathwaycom/pathway?

Open-source alternatives to pathwaycom/pathway include:

pathwaycom/llm-app - This project is a data processing engine and AI application platform designed for building production-grade machine…
apache/flink - Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite…
hazelcast/hazelcast - Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to…
risingwavelabs/risingwave - RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process…
vonng/ddia - This project serves as a comprehensive technical reference for the architecture and design of data-intensive…
openai/chatgpt-retrieval-plugin - This project is a retrieval-augmented generation pipeline designed for building custom ChatGPT plugins that allow…