# pathwaycom/pathway

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/pathwaycom-pathway).**

62,959 stars · 1,677 forks · Python · License: NOASSERTION

## Links

- GitHub: https://github.com/pathwaycom/pathway
- Homepage: https://pathway.com
- awesome-repositories: https://awesome-repositories.com/repository/pathwaycom-pathway.md

## Topics

`batch-processing` `data-analytics` `data-pipelines` `data-processing` `dataflow` `etl` `etl-framework` `iot-analytics` `kafka` `machine-learning-algorithms` `pathway` `python` `real-time` `rust` `stream-processing` `streaming` `time-series-analysis`

## Description

Pathway is a Python data processing framework, backed by a Rust runtime, for building unified batch and streaming pipelines. Its engine is based on differential dataflow: when inputs change, it propagates only the resulting updates instead of recomputing whole results. Because the same transformation logic runs over static datasets and continuous event streams, the two modes produce consistent results across data sources, and the framework is designed to provide exactly-once processing semantics.
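
To make the idea of incremental processing concrete, here is a minimal pure-Python sketch. It does not use Pathway's API; the `IncrementalSum` class and the `(key, value, diff)` update format are illustrative assumptions, chosen to show how one piece of logic can serve both a batch load and a stream of changes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# An update is (key, value, diff): diff=+1 inserts a row, diff=-1 retracts it.
# This format is an assumption made for illustration.
Update = Tuple[str, int, int]


class IncrementalSum:
    """Maintains per-key sums by applying only the changes it is given."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)
        self.counts: Dict[str, int] = defaultdict(int)

    def apply(self, updates: Iterable[Update]) -> Dict[str, int]:
        for key, value, diff in updates:
            self.totals[key] += value * diff
            self.counts[key] += diff
            if self.counts[key] == 0:
                # No rows left for this key, so drop it from the result.
                del self.totals[key]
                del self.counts[key]
        return dict(self.totals)


rows = [("sensor_a", 3, 1), ("sensor_b", 5, 1), ("sensor_a", 4, 1)]

# Batch: the whole dataset arrives as one set of updates.
batch_result = IncrementalSum().apply(rows)

# Streaming: the same rows arrive one at a time.
stream = IncrementalSum()
for row in rows:
    stream_result = stream.apply([row])

# A later correction retracts a row; only that key is touched.
corrected = stream.apply([("sensor_b", 5, -1)])
```

Both modes end in the same state, and the retraction is handled without rescanning earlier rows. A real engine generalizes this to whole operator graphs.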

The framework also targets real-time artificial intelligence and retrieval-augmented generation (RAG) workloads. Its vector-aware data ingestion builds searchable document indexes and keeps them up to date as new data arrives. Developers can call language models from within their pipelines and use built-in components for document chunking, embedding generation, and result reranking, so retrieval stays synchronized with the underlying data.
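
The following sketch illustrates what an index that stays synchronized with its documents can look like. It is plain Python, not Pathway's LLM tooling: `LiveIndex`, `chunk`, and `embed` are hypothetical names, and the bag-of-words "embedding" is a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def chunk(text: str, size: int = 8) -> List[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LiveIndex:
    """Keeps chunk vectors in step with the documents it has been given."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, int], Tuple[str, Counter]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        self.remove(doc_id)  # drop stale chunks before re-indexing
        for i, piece in enumerate(chunk(text)):
            self.entries[(doc_id, i)] = (piece, embed(piece))

    def remove(self, doc_id: str) -> None:
        for key in [k for k in self.entries if k[0] == doc_id]:
            del self.entries[key]

    def search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.entries.values(), key=lambda e: cosine(q, e[1]), reverse=True)
        return [piece for piece, _ in ranked[:k]]


index = LiveIndex()
index.upsert("faq", "Refunds are issued within five business days")
before = index.search("when are refunds issued")

# The document changes; the index reflects it on the next query.
index.upsert("faq", "Refunds are issued within two business days")
after = index.search("when are refunds issued")
```

The key design point is that re-indexing is driven by document changes rather than by a scheduled rebuild, so queries never see chunks from a superseded version.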

Beyond core processing, the platform covers the deployment of data applications. A batch workflow can be moved to streaming by changing its input connectors while the transformation logic stays the same, and pipelines can be packaged as containers and run in local or cloud environments. The same programming model serves large-scale event-driven workloads, from analytics to automated content generation.
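
A rough way to picture the batch-to-streaming switch is a pipeline function that only sees an iterable of rows, with the source swapped underneath it. The sketch below is plain Python under that assumption; `read_static`, `read_streaming`, and `pipeline` are hypothetical names and not Pathway connectors.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def read_static(rows: List[Row]) -> Iterator[Row]:
    """Hypothetical batch connector: yields a fixed dataset and ends."""
    yield from rows


def read_streaming(poll: Callable[[], List[Row]], max_polls: int) -> Iterator[Row]:
    """Hypothetical streaming connector: polls a source repeatedly for new rows."""
    for _ in range(max_polls):
        yield from poll()


def pipeline(source: Iterable[Row]) -> Dict[str, float]:
    """Total order amount per customer; unaware of which connector feeds it."""
    totals: Dict[str, float] = {}
    for row in source:
        customer = str(row["customer"])
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return totals


orders: List[Row] = [
    {"customer": "ada", "amount": 10.0},
    {"customer": "bob", "amount": 2.5},
    {"customer": "ada", "amount": 4.0},
]

batch_totals = pipeline(read_static(orders))

# Simulate a live source that hands out one row per poll.
pending = list(orders)


def poll_once() -> List[Row]:
    return [pending.pop(0)] if pending else []


stream_totals = pipeline(read_streaming(poll_once, max_polls=5))
```

Only the argument passed to `pipeline` differs between the two runs, which is the property the paragraph above describes.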

## Tags

### Artificial Intelligence & ML

- [RAG Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-orchestration/retrieval-augmented-generation/rag-pipelines.md) — Constructs retrieval-augmented generation workflows that chunk, rerank, and integrate private data for accurate model responses.
- [Model Provider Adapters](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/language-model-integrations/model-provider-adapters.md) — Applies model wrappers to data columns to normalize requests and responses across various language model providers. ([source](https://pathway.com/developers/user-guide/llm-xpack/overview))
- [Vector-Aware Data Ingestion](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/language-model-integrations/vector-aware-data-ingestion.md) — Embeds automated document chunking and vector generation directly into data pipelines to keep searchable indexes synchronized.
- [Real-Time AI Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/domain-specific-processing-pipelines/real-time-ai-pipelines.md) — Connects live data streams to language models for instant, context-aware content generation and analysis.
- [Reranking Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-research/retrieval-augmented-generation-systems/reranking-engines.md) — Evaluates the relevance of retrieved documents against user queries using reranking models to filter significant information. ([source](https://pathway.com/developers/user-guide/llm-xpack/overview))
- [Agentic Systems Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks.md) — Supports the creation of autonomous agents by providing the underlying infrastructure for complex, event-driven decision logic. ([source](https://github.com/pathwaycom/pathway))
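
As an illustration of the reranking step named in the tags above, here is a minimal two-stage retrieval sketch in plain Python. The function names are hypothetical, and `phrase_score` is a simple stand-in for a reranking model; none of this is Pathway's own API.

```python
from typing import Callable, List, Tuple


def retrieve(query: str, docs: List[str], k: int) -> List[str]:
    """First stage: cheap recall by counting words shared with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]


def phrase_score(query: str, doc: str) -> float:
    """Stand-in reranker: fraction of adjacent query word pairs found in the doc."""
    words = query.lower().split()
    pairs = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]
    if not pairs:
        return 0.0
    return sum(p in doc.lower() for p in pairs) / len(pairs)


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    keep: int,
) -> List[Tuple[str, float]]:
    """Second stage: rescore the candidates and keep the best ones."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]


docs = [
    "password reset and your account",
    "how to reset your password",
    "billing address update",
]
candidates = retrieve("reset your password", docs, k=2)
top = rerank("reset your password", candidates, phrase_score, keep=1)
```

The first stage cannot tell the two password documents apart because both share all three query words; the second stage prefers the one that preserves the query's word order.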

### Data & Databases

- [Data Processing Frameworks](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks.md) — Executes high-performance data transformations using a unified engine capable of managing both batch and streaming sources. ([source](https://pathway.com/developers/user-guide/introduction/welcome))
- [Data Stream Processors](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/data-stream-processors.md) — Manages complex data transformations on real-time flows via an engine compatible with standard programming environments. ([source](https://cdn.jsdelivr.net/gh/pathwaycom/pathway@main/README.md))
- [Declarative Pipeline Construction](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/declarative-pipeline-construction.md) — Defines complex data transformation workflows as static, optimized graphs before execution.
- [Differential Dataflow Engines](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/differential-dataflow-engines.md) — Propagates data updates incrementally through a directed graph of operators to maintain real-time consistency.
- [Exactly-Once Processing Semantics](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/exactly-once-processing-semantics.md) — Ensures every input record is processed exactly once through reliable checkpointing and deterministic execution.
- [Stream Processing Engines](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/stream-processing-engines.md) — Delivers low-latency computation on real-time data streams by applying consistent logic across diverse data sources.
- [Real-Time Data Processors](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/distributed-processing-frameworks/real-time-data-processors.md) — Processes continuous data streams in real-time to facilitate immediate event-driven analytics. ([source](https://github.com/pathwaycom/pathway))
- [Vector Data Ingestion Frameworks](https://awesome-repositories.com/f/data-databases/data-engineering/vector-ai-data-pipelines/vector-data-ingestion-frameworks.md) — Automates the generation and real-time updating of searchable vector data for artificial intelligence applications.
- [Stream-Oriented Data Pipelines](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/stream-processing-systems/data-streaming/stream-oriented-data-pipelines.md) — Transitions batch processing workflows into continuous, real-time streaming operations while preserving core transformation logic. ([source](https://pathway.com/developers/user-guide/connecting-to-data/switch-from-batch-to-streaming))
- [Vector Search Indexes](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing/vector-search-indexes.md) — Automates the creation and maintenance of searchable document indexes that update as new data arrives.
- [Vector Document Indexing](https://awesome-repositories.com/f/data-databases/database-management-systems/database-engines/vector-databases/vector-document-indexing.md) — Integrates external vector database clients directly into data ingestion workflows to automate real-time document indexing. ([source](https://pathway.com/developers/user-guide/llm-xpack/overview))
- [Unified Batch and Stream Processing Engines](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/unified-batch-and-stream-processing-engines.md) — Synchronizes historical record analysis and real-time event ingestion within a single, consistent programming interface.
- [Document and LLM Preparation](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/document-llm-preparation.md) — Converts unstructured files into machine-readable segments using specialized parsers optimized for downstream model consumption. ([source](https://pathway.com/developers/user-guide/llm-xpack/overview))
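
The exactly-once entry above mentions checkpointing and deterministic execution. One common way to get that effect, sketched here in plain Python, is to store the operator state together with the offset of the last applied event, so that replayed events are skipped after recovery. `CheckpointedCounter` and the `(offset, key, amount)` event shape are assumptions for illustration, not Pathway's persistence mechanism.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, int]  # (offset, key, amount)
Snapshot = Tuple[Dict[str, int], int]


class CheckpointedCounter:
    """Stores state and the last applied offset together, so replays are skipped."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.last_offset = -1

    def process(self, events: List[Event]) -> None:
        for offset, key, amount in events:
            if offset <= self.last_offset:
                continue  # already reflected in state; skip the replayed event
            self.state[key] = self.state.get(key, 0) + amount
            self.last_offset = offset

    def checkpoint(self) -> Snapshot:
        # State and offset are captured as one unit.
        return dict(self.state), self.last_offset

    @classmethod
    def restore(cls, snapshot: Snapshot) -> "CheckpointedCounter":
        restored = cls()
        restored.state = dict(snapshot[0])
        restored.last_offset = snapshot[1]
        return restored


log: List[Event] = [(0, "a", 1), (1, "b", 2), (2, "a", 3), (3, "b", 4)]

worker = CheckpointedCounter()
worker.process(log[:2])
snapshot = worker.checkpoint()
worker.process(log[2:3])  # progress made after the checkpoint is lost in a crash

# Recovery restores the snapshot and replays the log from the start.
recovered = CheckpointedCounter.restore(snapshot)
recovered.process(log)
```

Because the update rule is deterministic and the offset travels with the state, the recovered result matches a single uninterrupted pass over the log.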

### Software Engineering & Architecture

- [Incremental State Management](https://awesome-repositories.com/f/software-engineering-architecture/architectural-design-patterns/state-management/state-logic-and-utilities/incremental-state-management.md) — Caches intermediate computation results in memory to eliminate redundant re-processing within data pipelines.
- [Feature Flagging Systems](https://awesome-repositories.com/f/software-engineering-architecture/feature-flagging-systems.md) ([source](https://github.com/pathwaycom/pathway))
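
To illustrate the Incremental State Management entry above, the sketch below keeps both sides of an inner join in memory, so that each new row only probes rows with a matching key instead of triggering a full re-join. It is a plain-Python illustration with hypothetical names, not Pathway's implementation.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

JoinedRow = Tuple[str, str, str]  # (key, left value, right value)


class IncrementalJoin:
    """Inner join on a key; each side's rows are kept in memory as state."""

    def __init__(self) -> None:
        self.left: DefaultDict[str, List[str]] = defaultdict(list)
        self.right: DefaultDict[str, List[str]] = defaultdict(list)
        self.probes = 0  # rows examined so far, to show work tracks the change size

    def add_left(self, key: str, value: str) -> List[JoinedRow]:
        self.left[key].append(value)
        matches = self.right.get(key, [])
        self.probes += len(matches)
        return [(key, value, r) for r in matches]

    def add_right(self, key: str, value: str) -> List[JoinedRow]:
        self.right[key].append(value)
        matches = self.left.get(key, [])
        self.probes += len(matches)
        return [(key, l, value) for l in matches]


join = IncrementalJoin()
out: List[JoinedRow] = []
out += join.add_left("u1", "ada")
out += join.add_left("u2", "bob")
out += join.add_right("u1", "order-17")

# A new order arrives; only the rows stored under "u1" are examined.
new_rows = join.add_right("u1", "order-18")
```

The trade-off is memory for time: the cached rows occupy space, but each update costs work proportional to its matches rather than to the size of both tables.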

### Part of an Awesome List

- [RAG Frameworks](https://awesome-repositories.com/f/awesome-lists/ai/rag-frameworks.md) — Performant Python ETL framework with Rust runtime for data ingestion.
- [Retrieval Augmented Generation](https://awesome-repositories.com/f/awesome-lists/ai/retrieval-augmented-generation.md) — ETL framework for real-time RAG and stream processing.
- [Data Analysis](https://awesome-repositories.com/f/awesome-lists/data/data-analysis.md) — Real-time data processing framework.
- [Databases and RAG](https://awesome-repositories.com/f/awesome-lists/data/databases-and-rag.md) — ETL framework for real-time RAG and pipelines.
- [Stream Processing](https://awesome-repositories.com/f/awesome-lists/data/stream-processing.md) — High-performance Python ETL framework powered by a Rust runtime.
- [Streaming Engines](https://awesome-repositories.com/f/awesome-lists/devtools/streaming-engines.md) — Unified engine for batch, streaming, and LLM applications.

### DevOps & Infrastructure

- [Distributed Data Platforms](https://awesome-repositories.com/f/devops-infrastructure/infrastructure/distributed-data-platforms.md) — Scales containerized data services across local and cloud environments.
- [Data Application Deployment](https://awesome-repositories.com/f/devops-infrastructure/deployment-management/deployment-strategies/data-application-deployment.md) — Packages data processing services into containerized images to ensure reliable scaling and deployment across infrastructure. ([source](https://cdn.jsdelivr.net/gh/pathwaycom/pathway@main/README.md))
- [Deployment Management and Strategies](https://awesome-repositories.com/f/devops-infrastructure/deployment-management-strategies.md) — Manages the release lifecycle of data-intensive applications by orchestrating containerized service deployments. ([source](https://pathway.com/developers/user-guide/deployment/render-deploy/))
