59,684 stars · 1,611 forks · Python · License: other
pathway.com

Pathway

Features

  • AI Pipeline Orchestrators: A development environment for building automated workflows that connect language models to live data sources for content generation.
  • Declarative Pipeline Construction: Defines complex data transformation workflows as a static graph of operations that the engine optimizes before execution.
  • Exactly-Once Processing Semantics: Guarantees that every input record is reflected in the results exactly once, through checkpointing and deterministic operator execution.
  • Data Processing Pipelines: Run high-performance data transformation tasks on a unified engine that handles both batch and streaming sources and processes every record exactly once.
  • Data Stream Processors: Execute complex data transformations as batch or real-time tasks on a unified engine that stays compatible with standard programming environments for analytics and event processing.
  • Differential Dataflow Engines: Processes data updates incrementally by propagating changes through a directed graph of operators, keeping real-time results consistent.
  • Stream Processing Engines: A high-performance data processing framework that applies the same transformation logic to batch and real-time streaming sources.
  • Unified Batch-Stream Processing Engines: Executes identical logic on static datasets and continuous event streams by treating batch data as a finite stream.
  • Unified Batch and Stream Processors: Supports developing data applications that handle both static historical records and live incoming events with a single, consistent programming model.
  • Enterprise RAG Frameworks: Constructing robust retrieval-augmented generation systems that process, chunk, and rerank documents to answer questions accurately from private data stores.
  • Vector-Aware Data Ingestion: Integrates embedding generation and document chunking directly into the pipeline so that searchable indexes for language models stay synchronized with the source data.
  • Language Model Connectors: Connect data pipelines to external text-generation and embedding services by applying model wrappers to the columns that contain prompts.
  • Vector Data Ingestion Frameworks: A specialized toolset that automates creating and updating searchable document indexes in real time within large-scale data processing pipelines.
  • Streaming Data Pipelines: Convert static batch pipelines into continuous streaming workflows by swapping the input connectors while keeping the transformation logic unchanged.
  • Vector Search Indexes: Automating the creation and maintenance of searchable document indexes that update as soon as new data arrives from external sources.
  • Incremental State Management: Keeps intermediate computation results in memory so that, when only a small portion of the data changes, the pipeline does not recompute everything.
  • Real-Time AI Pipelines: Building automated workflows that connect live data streams to language models for context-aware content generation and analysis as data arrives.
  • Distributed Data Platforms: A deployment-ready infrastructure for scaling containerized data services across local and cloud environments with reliable performance and network connectivity.
  • Reranking Engines: Improve search accuracy by using reranking models to score retrieved documents against the user query and keep only the most relevant ones.
  • AI Pipelines
  • Data Application Deployment: Deploy data processing services to local or cloud environments, either as container images or with standard execution methods, so they scale reliably across different infrastructure setups.
  • Vector Document Indexing: Automate the creation of searchable document indexes that update in real time by integrating external vector database clients directly into data ingestion workflows.
  • Document Chunking Utilities: Convert raw files into structured text and split large documents into smaller segments with specialized parsers and token-based splitters, which improves model performance.
  • Web Service Deployments: Deploy containerized web applications to cloud hosting environments by linking the source code repository and configuring the network ports needed for public access.
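The Declarative Pipeline Construction entry describes defining a workflow as a static graph that is optimized before it runs. Below is a minimal pure-Python sketch of that plan-then-execute idea, not Pathway's API. The `Pipeline` class and its single optimization (fusing adjacent `map` steps) are hypothetical, and the graph is simplified to a linear chain.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Op = Tuple[str, Callable[[Any], Any]]  # ("map" | "filter", function)


class Pipeline:
    """Records operations without running them (hypothetical sketch).

    The plan is a linear chain of ops; run() optimizes it first, then executes.
    """

    def __init__(self, ops: Optional[List[Op]] = None) -> None:
        self.ops: List[Op] = list(ops) if ops else []

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        return Pipeline(self.ops + [("map", fn)])

    def filter(self, pred: Callable[[Any], bool]) -> "Pipeline":
        return Pipeline(self.ops + [("filter", pred)])

    def optimized(self) -> List[Op]:
        """Fuse each run of adjacent map steps into a single map step."""
        plan: List[Op] = []
        for kind, fn in self.ops:
            if kind == "map" and plan and plan[-1][0] == "map":
                prev = plan[-1][1]
                # Default arguments bind prev/fn now, avoiding late binding.
                plan[-1] = ("map", lambda x, f=prev, g=fn: g(f(x)))
            else:
                plan.append((kind, fn))
        return plan

    def run(self, data: Iterable[Any]) -> List[Any]:
        plan = self.optimized()
        out: List[Any] = []
        for item in data:
            for kind, fn in plan:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    break  # filtered out: skip the remaining ops
            else:
                out.append(item)  # only reached if no filter rejected the item
        return out


p = Pipeline().map(lambda x: x + 1).map(lambda x: x * 2).filter(lambda x: x > 4)
print(len(p.optimized()))  # 2: the two maps were fused
print(p.run([1, 2, 3]))    # [6, 8]
```

Nothing runs until `run()` is called, so the optimizer sees the whole plan at once. In this example the two `map` calls collapse into one step.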
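The Exactly-Once Processing Semantics entry credits checkpointing and deterministic operators. The sketch below rests on simplifying assumptions:

- the input is a replayable in-memory log;
- the "durable" checkpoint is simulated with a deep copy;
- `CheckpointedCounter` is a hypothetical class, not part of Pathway.

The point it illustrates is that state and input offset are saved together. Recovery therefore replays exactly the records whose effects were lost.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class State:
    offset: int = 0  # index of the next log record to read
    totals: Dict[str, int] = field(default_factory=dict)


class CheckpointedCounter:
    """Counts words from a replayable log (hypothetical sketch).

    Offset and totals are checkpointed together, so recovery restores both
    and replays only the records processed after the last checkpoint.
    """

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.state = State()
        self.checkpoint = copy.deepcopy(self.state)  # simulated durable storage

    def process(self, n: int) -> None:
        end = min(self.state.offset + n, len(self.log))
        for word in self.log[self.state.offset:end]:
            # Deterministic update: replaying a record reproduces the same effect.
            self.state.totals[word] = self.state.totals.get(word, 0) + 1
        self.state.offset = end

    def commit(self) -> None:
        # State and offset are saved in one step, so they can never disagree.
        self.checkpoint = copy.deepcopy(self.state)

    def crash_and_recover(self) -> None:
        # Uncommitted in-memory progress is lost; restart from the checkpoint.
        self.state = copy.deepcopy(self.checkpoint)


log = ["a", "b", "a", "c", "a"]
p = CheckpointedCounter(log)
p.process(2)
p.commit()             # "a", "b" committed at offset 2
p.process(2)           # "a", "c" processed but not committed
p.crash_and_recover()  # back to offset 2 with {"a": 1, "b": 1}
p.process(10)
p.commit()
print(p.state.totals)  # {'a': 3, 'b': 1, 'c': 1}
```

This covers internal state only. Delivering results exactly once to an external sink also requires idempotent or transactional writes.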
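The Differential Dataflow Engines and Incremental State Management entries describe propagating changes instead of recomputing everything. The sketch below shows the idea with a hypothetical `IncrementalGroupSum` operator:

- it consumes `(key, value, diff)` updates, where `diff` is `+1` for an insertion and `-1` for a retraction;
- it emits `(key, old, new)` changes only for the groups that actually changed.

This is a teaching example in plain Python, not the engine's implementation.

```python
from typing import Dict, List, Optional, Tuple

Update = Tuple[str, int, int]                      # (key, value, diff)
Change = Tuple[str, Optional[int], Optional[int]]  # (key, old sum, new sum)


class IncrementalGroupSum:
    """SUM(value) GROUP BY key, maintained from (key, value, diff) updates."""

    def __init__(self) -> None:
        self.sums: Dict[str, int] = {}
        self.counts: Dict[str, int] = {}  # live rows per key

    def _current(self, key: str) -> Optional[int]:
        # A group with no live rows has no output row.
        return self.sums[key] if self.counts.get(key, 0) > 0 else None

    def apply(self, updates: List[Update]) -> List[Change]:
        touched = sorted({key for key, _, _ in updates})
        before = {k: self._current(k) for k in touched}
        for key, value, diff in updates:
            # Only the affected groups change; other groups are never revisited.
            self.sums[key] = self.sums.get(key, 0) + value * diff
            self.counts[key] = self.counts.get(key, 0) + diff
        changes: List[Change] = []
        for k in touched:
            if self.counts.get(k, 0) == 0:
                self.sums.pop(k, None)
                self.counts.pop(k, None)
            after = self._current(k)
            if after != before[k]:
                changes.append((k, before[k], after))
        return changes


agg = IncrementalGroupSum()
first = agg.apply([("a", 5, 1), ("b", 2, 1), ("a", 1, 1)])
second = agg.apply([("a", 5, -1)])  # retracting one row updates one group
third = agg.apply([("b", 2, -1)])   # the last row of "b" is retracted
print(first)   # [('a', None, 6), ('b', None, 2)]
print(second)  # [('a', 6, 1)]
print(third)   # [('b', 2, None)]
```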
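The unified batch and stream entries rest on one observation. If a transformation assumes only an iterable of events, a finite list (batch) and an unbounded generator (stream) can run through the same code. In this minimal sketch, `running_totals` and the generator `live_source` are hypothetical stand-ins for a transformation and a live connector.

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_totals(events: Iterable[Tuple[str, int]]) -> Iterator[Dict[str, int]]:
    """Yield the per-key running total after each event.

    Only iteration is assumed, so batch and streaming inputs share this logic.
    """
    totals: Dict[str, int] = {}
    for key, amount in events:
        totals[key] = totals.get(key, 0) + amount
        yield dict(totals)  # snapshot of the current result


# Batch: a finite, fully materialized dataset.
batch = [("a", 1), ("b", 2), ("a", 3)]
batch_final = list(running_totals(batch))[-1]


def live_source() -> Iterator[Tuple[str, int]]:
    # Hypothetical stand-in for a live connector; a real one would not end.
    yield from [("a", 1), ("b", 2), ("a", 3)]


stream_final = None
for snapshot in running_totals(live_source()):
    stream_final = snapshot  # a streaming consumer would act on every update

print(batch_final == stream_final)  # True
```

Replacing `batch` with `live_source()` changes only the input, not the transformation. This mirrors the Streaming Data Pipelines entry's point about swapping connectors while keeping the logic.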
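The Language Model Connectors entry describes applying a model wrapper to the column that holds prompts. This minimal sketch works over a list of dict rows and assumes the wrapper is any callable from text to text. Both `apply_model` and `echo_model` are hypothetical, and `echo_model` replaces a real call to an external service.

```python
from typing import Callable, Dict, List


def apply_model(
    rows: List[Dict[str, object]],
    prompt_column: str,
    output_column: str,
    model: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Return new rows with `model` applied to each value in `prompt_column`.

    The input rows are not modified; each output row gains `output_column`.
    """
    return [{**row, output_column: model(str(row[prompt_column]))} for row in rows]


def echo_model(prompt: str) -> str:
    # Hypothetical stand-in for a request to an external LLM or embedding API.
    return f"summary of: {prompt}"


rows = [{"id": 1, "prompt": "Q3 sales report"}, {"id": 2, "prompt": "incident log"}]
result = apply_model(rows, "prompt", "response", echo_model)
print(result[0]["response"])  # summary of: Q3 sales report
```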
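Several entries (Vector Search Indexes, Vector Document Indexing, Vector-Aware Data Ingestion) describe indexes that stay in sync as documents change. The sketch below shows that behavior under two assumptions: embeddings are already computed by some embedding model, and brute-force cosine similarity replaces an approximate nearest-neighbor structure. `LiveVectorIndex` is hypothetical.

```python
import math
from typing import Dict, List


class LiveVectorIndex:
    """Brute-force cosine index whose contents change as documents change."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, doc_id: str, vector: List[float]) -> None:
        # Inserting or re-embedding a document takes effect immediately.
        self._vectors[doc_id] = vector

    def delete(self, doc_id: str) -> None:
        self._vectors.pop(doc_id, None)

    @staticmethod
    def _cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector: List[float], k: int = 3) -> List[str]:
        ranked = sorted(
            self._vectors,
            key=lambda doc_id: self._cosine(vector, self._vectors[doc_id]),
            reverse=True,
        )
        return ranked[:k]


index = LiveVectorIndex()
index.upsert("doc1", [1.0, 0.0])
index.upsert("doc2", [0.0, 1.0])
before = index.query([0.9, 0.1], k=1)  # ['doc1']
index.upsert("doc1", [0.0, 1.0])       # doc1 was edited; its embedding changed
index.upsert("doc3", [1.0, 0.1])
after = index.query([0.9, 0.1], k=1)   # ['doc3']
index.delete("doc3")
```

Each query sees the latest upserts and deletions. This is the property the "update as soon as new data arrives" wording refers to.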
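The Reranking Engines entry describes scoring retrieved documents against the query and keeping the best ones. In the sketch below:

- `overlap_score`, the fraction of query terms found in a document, is a deliberately simple stand-in for a reranking model such as a cross-encoder;
- the `rerank` helper and its `top_k` and `min_score` parameters are hypothetical.

```python
import re
from typing import Callable, List, Tuple


def overlap_score(query: str, doc: str) -> float:
    """Fraction of query terms found in the document (stand-in for a model)."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    d_terms = set(re.findall(r"\w+", doc.lower()))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float] = overlap_score,
    top_k: int = 2,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every candidate, drop low scorers, and keep the top_k best."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    kept = [item for item in scored if item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


candidates = [
    "The engine supports streaming joins",
    "Batch jobs run nightly",
    "Streaming joins update incrementally",
]
results = rerank("streaming joins update", candidates, top_k=2, min_score=0.5)
for doc, s in results:
    print(f"{s:.2f}  {doc}")
# 1.00  Streaming joins update incrementally
# 0.67  The engine supports streaming joins
```

Passing a different `score` callable swaps in another model without changing the filtering and selection logic.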
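The Document Chunking Utilities entry mentions token-based splitters. Below is a minimal sketch of fixed-size chunking with overlap. It assumes whitespace-separated words as tokens, whereas a real splitter would use the embedding model's tokenizer. `chunk_tokens` is hypothetical.

```python
from typing import List


def chunk_tokens(text: str, max_tokens: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most max_tokens tokens.

    Consecutive chunks share `overlap` tokens so context that straddles a
    boundary appears in both. Whitespace splitting stands in for a tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


print(chunk_tokens("one two three four five six seven", max_tokens=3, overlap=1))
# ['one two three', 'three four five', 'five six seven']
```

Overlap trades some duplicated tokens for fewer sentences cut off between chunks. The step size is `max_tokens - overlap`, which is why `overlap` must stay below `max_tokens`.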