# Data Integration and ETL Platforms

> Search results for `move data between any source and destination with connectors` on awesome-repositories.com. 115 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/move-data-between-any-source-and-destination-with-connectors

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/move-data-between-any-source-and-destination-with-connectors).**

## Results

- [cube-js/cube](https://awesome-repositories.com/repository/cube-js-cube.md) (20,251 ⭐) — Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools.

The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orchestrates these interactions by mapping questions to the underlying semantic model, ensuring that AI-generated insights remain accurate and context-aware. Furthermore, Cube is designed for multi-tenant environments, offering robust infrastructure isolation, row-level security, and dynamic context injection to ensure that data access is strictly governed and personalized for every user or tenant.

Beyond its core modeling and AI features, the platform includes a comprehensive suite of tools for performance optimization, including automated pre-aggregation caching and asynchronous query queuing. It supports a wide range of data sources and deployment models, from self-hosted containers to managed cloud environments. The system also provides extensive programmatic control over report management, dashboard publishing, and user identity synchronization, making it suitable for embedding interactive analytics directly into custom software applications.
- [any-listen/any-listen](https://awesome-repositories.com/repository/any-listen-any-listen.md) (2,578 ⭐) — Any Listen is a TypeScript-based open-source tool designed to capture and process audio from any source on a system. It provides a unified interface for listening to and recording audio output, enabling users to monitor or save sound from applications, browsers, or system-wide audio streams. The project focuses on making audio interception accessible and straightforward for developers and power users.

The tool distinguishes itself with a cross-platform approach to audio capture, running on multiple operating systems without complex configuration. It handles various audio backends and formats, so users can pick the method best suited to their environment, and it emphasizes simple setup and usage.

The repository's documentation guides users through installation, configuration, and common use cases. It covers the available audio sources, output formats, and integration options, which makes it easier to adapt the tool to specific needs.
- [sfu-db/connector-x](https://awesome-repositories.com/repository/sfu-db-connector-x.md) (2,561 ⭐) — Connector-X is a high-performance SQL data extraction library and bridge for transferring relational database records into memory-efficient data structures. It functions as a parallel database connector and federated query engine capable of executing and joining queries across multiple remote database connections to aggregate data locally.

The project distinguishes itself through a zero-copy approach to data loading, which writes SQL query results into the destination memory structures without intermediate copies. It maximizes throughput by splitting a query into partitions that are fetched on parallel threads, downloading columnar and numerical data concurrently to speed up ingestion.

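A minimal sketch of the range-partitioning idea, using Python's built-in `sqlite3` and a thread pool rather than Connector-X itself (which is written in Rust and writes results without intermediate copies). The query is wrapped in subqueries bounded on an integer column, each partition is fetched on its own thread, and the results are concatenated in order. The helpers `partition_queries`, `fetch_partition`, and `parallel_read` are hypothetical names; in practice the `lo`/`hi` bounds would typically come from a `MIN`/`MAX` query on the partition column.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def partition_queries(query: str, column: str, lo: int, hi: int, n: int) -> list[str]:
    """Split `query` into up to n subqueries covering the integer range [lo, hi] on `column`."""
    # Note: column name and bounds are interpolated directly, so they must be trusted values.
    step = max(1, (hi - lo + 1) // n)
    queries, start = [], lo
    for i in range(n):
        if start > hi:
            break  # fewer distinct values than requested partitions
        # The last partition absorbs any remainder of the range.
        end = hi if i == n - 1 else min(hi, start + step - 1)
        queries.append(
            f"SELECT * FROM ({query}) AS t WHERE t.{column} BETWEEN {start} AND {end}"
        )
        start = end + 1
    return queries


def fetch_partition(db_path: str, sql: str) -> list[tuple]:
    # Each worker opens its own connection instead of sharing one across threads.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def parallel_read(db_path: str, query: str, column: str, lo: int, hi: int, n: int) -> list[tuple]:
    """Fetch all partitions concurrently and concatenate them in partition order."""
    sqls = partition_queries(query, column, lo, hi, n)
    with ThreadPoolExecutor(max_workers=max(1, len(sqls))) as pool:
        parts = list(pool.map(lambda sql: fetch_partition(db_path, sql), sqls))
    return [row for part in parts for row in part]
```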
The system covers broad capabilities in data integration and extensibility, including a pluggable connector interface for custom data source and destination definitions. It provides a type-mapping translation layer to convert source-specific database types into compatible destination formats, specifically supporting high-efficiency ingestion from SQL into Python data frames.
- [datahub-project/datahub](https://awesome-repositories.com/repository/datahub-project-datahub.md) (12,141 ⭐) — DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations.

The platform distinguishes itself through its focus on grounding artificial intelligence and autonomous agents in verified enterprise context. It provides specialized capabilities to inject provenance-aware lineage, business definitions, and quality signals into AI prompts, ensuring that generated insights are accurate and trustworthy. Through a policy-as-code governance engine, it enforces access controls and compliance rules directly within the metadata graph, allowing for programmatic oversight of data assets across hybrid environments.

Beyond its core identity, the project offers a comprehensive suite of tools for data discovery, observability, and lifecycle management. It includes features for automated lineage extraction, impact analysis, and semantic search, enabling users to navigate data dependencies and resolve quality issues efficiently. The platform also supports collaborative workflows, allowing teams to manage business glossaries, certify data assets, and automate access requests through integrated communication channels.

DataHub is built to scale, utilizing a distributed architecture that allows storage, search, and graph processing layers to operate independently. It provides standardized interfaces and a bridge-based connector framework to facilitate integration with heterogeneous data sources and external AI agent frameworks.
- [metabase/metabase](https://awesome-repositories.com/repository/metabase-metabase.md) (47,696 ⭐) — Metabase is a business intelligence platform designed to connect to various storage systems and relational databases for data exploration, visualization, and reporting. It provides a centralized environment where users can build queries through a graphical interface or raw code, transforming raw information into interactive dashboards and charts. The platform is built to support self-service analytics, allowing non-technical team members to extract insights without requiring deep knowledge of database syntax.

The platform distinguishes itself through a metadata-driven modeling layer that abstracts complex database schemas into user-friendly business entities. It includes an automated workflow engine that enables users to trigger external processes and update records directly from the interface, bridging the gap between data analysis and operational action. For organizations requiring external distribution, the software provides an embedded analytics solution that allows secure integration of dashboards into third-party websites and applications, supported by sandboxing to isolate visual components.

Beyond core visualization, the system incorporates artificial intelligence to assist with query generation and data summarization through natural language interactions. It maintains strict data governance through granular role-based access control, ensuring that permissions are managed consistently across all connected information assets. The platform handles the full lifecycle of data retrieval, including orchestration, caching, and translation of high-level inputs into database-specific syntax.
- [sinaptik-ai/pandas-ai](https://awesome-repositories.com/repository/sinaptik-ai-pandas-ai.md) (23,197 ⭐) — This project is a Python-based framework that functions as a generative AI agent for programmatic data analysis. It enables users to interact with structured data sources through natural language prompts, translating these requests into executable code to perform analysis, data cleaning, and visualization. By maintaining conversational context across multi-turn interactions, the system allows for iterative exploration and the building of complex data narratives.

The framework distinguishes itself through a robust semantic layer and secure execution model. It maps raw datasets to descriptive metadata and relationships, which improves the accuracy of natural language interpretation. To ensure secure operation, all generated data processing code is executed within isolated, sandboxed environments. Users can further refine the system's behavior by registering custom skills, defining semantic schemas, and integrating external vector databases to provide domain-specific context and few-shot learning capabilities.

The platform supports a comprehensive suite of data operations, including cross-source integration, automated transformation, and feature engineering. It provides a unified interface for connecting to various language model providers and data sources, such as local files and relational databases. Users can audit the underlying code logic generated by the system, configure deterministic outputs for reproducibility, and export visualizations directly to local storage.
- [move-language/move](https://awesome-repositories.com/repository/move-language-move.md) (0 ⭐) — This was the home of the Move language from inception to ~2022. This repository is no longer maintained, but development continues in https://github.com/move-language/move-on-aptos and https://github.com/move-language/move-sui.
- [amruthpillai/reactive-resume](https://awesome-repositories.com/repository/amruthpillai-reactive-resume.md) (38,613 ⭐) — This project is a web-based platform designed for creating, managing, and sharing professional resumes. It functions as a structured document builder that integrates artificial intelligence to assist with content generation, editing, and analysis. Users can maintain a collection of resumes, customize their visual presentation through various templates, and export them into multiple formats for job applications.

The platform distinguishes itself through its autonomous AI agent capabilities, which can perform research, suggest incremental edits, and apply data patches directly to documents. It also provides a secure, self-hostable environment that allows users to maintain full control over their data and infrastructure. The system supports advanced authentication methods, including passkeys and federated identity providers, ensuring that personal and professional information remains protected.

Beyond core editing, the application includes tools for document organization, such as tagging, filtering, and legacy data migration. It features a robust document generation engine that separates content from design, allowing for precise layout control and styling. Users can share their resumes via password-protected public URLs and monitor document performance through integrated analytics.

The application is designed for containerized deployment, utilizing Docker Compose to facilitate consistent installation across private infrastructure. It includes built-in health monitoring and feature flagging to manage system performance and functionality without requiring code redeployments.
- [microsoft/data-formulator](https://awesome-repositories.com/repository/microsoft-data-formulator.md) (14,907 ⭐) — Data Formulator is an automated data analysis and visualization platform that uses large language models to interpret natural language instructions for data preparation and reporting. It functions as an interactive workbench where users can clean, filter, and aggregate datasets while simultaneously generating visual representations. By combining conversational interfaces with automated transformation tools, the system enables users to explore data patterns and refine schemas without manual coding.

The platform distinguishes itself through an agentic architecture that translates natural language queries into executable data transformation scripts. It maintains a reactive pipeline that links data cleaning operations directly to visualization rendering, ensuring that every modification to the underlying structure triggers an immediate visual update. The system also supports structured data extraction, utilizing specialized parsing models to convert unstructured inputs like images, text, and web content into normalized tabular formats.

Beyond its core analysis capabilities, the platform provides a sandboxed environment for secure code execution and supports stateful session serialization to persist interaction history. Users can connect to various data sources, including local files and cloud storage, to ingest information for iterative exploration. The project is distributed as a TypeScript-based tool, offering both a conversational interface and command-line automation for managing analysis workflows.
- [airbytehq/airbyte](https://awesome-repositories.com/repository/airbytehq-airbyte.md) (21,472 ⭐) — Airbyte is a data integration platform designed to synchronize information between diverse applications, databases, and data warehouses. It functions as an extract, transform, and load orchestrator that manages automated data movement workflows across cloud, on-premise, and hybrid environments. The platform provides a standardized interface for connectors, enabling the movement of structured and unstructured data while maintaining stateful checkpoints for reliable incremental syncing.

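The stateful-checkpoint pattern behind incremental syncing can be sketched as follows, assuming each record carries a cursor field (here a hypothetical `updated_at`) and that newly added rows always get a larger cursor value. The reader emits records interleaved with state messages, and the consumer advances the saved cursor only after every record before a state message has been written, so a restarted sync resumes from the last checkpoint. This illustrates the general technique only; it is not Airbyte's connector protocol or SDK, and `SyncState`, `read_incremental`, and `run_sync` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class SyncState:
    # Highest cursor value whose records have been durably written (None = never synced).
    cursor: Optional[int] = None


def read_incremental(rows: list[dict], cursor_field: str, state: SyncState,
                     batch_size: int) -> Iterator[tuple[str, object]]:
    """Yield ("RECORD", row) messages, with a ("STATE", cursor) checkpoint after each batch."""
    pending = sorted(
        (r for r in rows if state.cursor is None or r[cursor_field] > state.cursor),
        key=lambda r: r[cursor_field],
    )
    for i, row in enumerate(pending, start=1):
        yield ("RECORD", row)
        # Checkpoint after every full batch and after the final record.
        if i % batch_size == 0 or i == len(pending):
            yield ("STATE", row[cursor_field])


def run_sync(rows: list[dict], cursor_field: str, state: SyncState,
             batch_size: int = 2) -> list[dict]:
    """Consume messages, 'writing' records and committing state only at checkpoints."""
    written = []
    for kind, payload in read_incremental(rows, cursor_field, state, batch_size):
        if kind == "RECORD":
            written.append(payload)
        else:
            # Every record before this STATE message has been written, so it is safe to advance.
            state.cursor = payload
    return written
```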
The platform distinguishes itself through a containerized architecture that isolates connectors to prevent dependency conflicts and a log-based change capture system that monitors source databases for real-time modifications. It includes a dedicated connectivity layer that exposes enterprise data and system actions to artificial intelligence agents, allowing for context-aware operations and automated decision-making. Users can manage schema evolution automatically and extend the platform's capabilities by developing custom integration modules using provided software development kits.

Beyond core synchronization, the system supports enterprise-grade data governance, including role-based access control, audit logging, and centralized authentication management. It offers comprehensive observability tools to track sync performance and latency, alongside infrastructure-as-code support for automating pipeline deployments. The platform is built to scale compute resources dynamically, accommodating both high-frequency incremental updates and large-scale historical data backfills.
- [modelcontextprotocol/modelcontextprotocol](https://awesome-repositories.com/repository/modelcontextprotocol-modelcontextprotocol.md) (8,458 ⭐) — Model Context Protocol is a standardized framework for connecting large language models to external data sources and executable tools. It enables the creation of a universal interface where servers expose tools, resources, and prompts that can be discovered and utilized by various AI clients.

The protocol uses JSON-RPC messages and is transport-agnostic, supporting standard input/output for local processes and HTTP with server-sent events for remote connections. It emphasizes security and control: model sampling is delegated to the client, so servers can request completions without holding their own API keys, and tool execution on local systems requires explicit user approval.

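A small sketch of the message framing, assuming the stdio transport's convention that each JSON-RPC message is serialized as a single line of JSON terminated by a newline. `tools/list` and `tools/call` are MCP method names; the tool name `get_weather` and its arguments are hypothetical, and the helper functions are illustrative rather than part of any MCP SDK.

```python
import itertools
import json
from typing import Any, Optional

_request_ids = itertools.count(1)  # JSON-RPC request ids must be unique within a session


def make_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    msg: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def encode_stdio(msg: dict) -> bytes:
    # Compact json.dumps output contains no raw newlines (newlines inside strings are
    # escaped), so the trailing newline unambiguously ends the message.
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


def decode_stdio(data: bytes) -> list[dict]:
    """Split a byte stream into messages, one JSON document per line."""
    return [json.loads(line) for line in data.decode("utf-8").split("\n") if line.strip()]
```

For example, `encode_stdio(make_request("tools/call", {"name": "get_weather", "arguments": {"city": "Oslo"}}))` produces one newline-terminated line that a client could write to a server's standard input.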
The system covers broad capabilities including agentic workflow orchestration, URI-based resource mapping for filesystem and database access, and the delivery of interactive HTML-based user interfaces. It also includes comprehensive support for asynchronous task management, enterprise identity integration via OAuth and SSO, and a registry system for server discovery and versioning.

The project provides client and server SDKs, alongside automated scaffolding tools for generating project structures and server boilerplate.
- [raamcosta/compose-destinations](https://awesome-repositories.com/repository/raamcosta-compose-destinations.md) (3,400 ⭐) — Annotation processing library for type-safe Jetpack Compose navigation with no boilerplate.
- [fosrl/pangolin](https://awesome-repositories.com/repository/fosrl-pangolin.md) (21,255 ⭐) — Pangolin is a zero-trust remote access platform designed to provide secure, identity-aware connectivity to private network resources. It functions as a cloud-native network controller that orchestrates encrypted tunnels, traffic routing, and access policies across distributed environments. By leveraging WireGuard for secure data transport, the platform enables authenticated access to internal web applications, terminal sessions, and remote desktops without exposing services to the public internet.

The platform distinguishes itself through a declarative infrastructure model that synchronizes network state using version-controlled manifests. It supports complex connectivity requirements through peer-to-peer NAT traversal, which facilitates direct encrypted connections between nodes, with automatic fallback to server-based relaying when necessary. Additionally, it provides browser-based access to remote resources, eliminating the need for local client software for many common administrative and service-access tasks.

Beyond its core tunneling capabilities, the platform includes a comprehensive suite of tools for traffic management, security, and observability. It features granular access control policies based on user identity, geolocation, and network attributes, alongside automated certificate management and multi-factor authentication. The system also provides extensive monitoring, audit logging, and alerting capabilities to track infrastructure health and security events across multi-site deployments.

Pangolin is designed for containerized and multi-site environments, offering flexible deployment options through standard packaging and automated reconciliation workflows.
- [nocodb/nocodb](https://awesome-repositories.com/repository/nocodb-nocodb.md) (63,466 ⭐) — NocoDB is a visual platform that transforms relational databases into collaborative, spreadsheet-style workspaces. By acting as a headless database backend, it provides a unified environment for designing database structures, managing record relationships, and interacting with data without requiring manual SQL queries. The platform normalizes interactions across various SQL and NoSQL data sources, allowing users to manage complex datasets through a centralized interface.

The project distinguishes itself by automatically generating RESTful and GraphQL APIs from existing database schemas, enabling external applications to interact with data programmatically. It features a robust event-driven engine that monitors database state changes to trigger webhooks and execute custom logic within a sandboxed automation runtime. This allows for the creation of complex business workflows that synchronize information across third-party services based on real-time data updates.

Beyond its core management capabilities, the platform offers a flexible view abstraction layer that renders data in multiple formats, including grids, kanban boards, galleries, forms, and calendars. It supports team collaboration through shared workspaces and provides tools for data visualization, schema design, and automated record manipulation.

Comprehensive documentation is available to guide users through the API reference, script creation, and integration workflows.
- [plotly/plotly.py](https://awesome-repositories.com/repository/plotly-plotly-py.md) (18,270 ⭐) — Plotly.py is a comprehensive framework for building production-ready data applications and interactive dashboards directly from Python code. It functions as both a high-performance visualization library for browser-based charts and a full-stack tool for transforming analytical scripts into responsive, web-based interfaces. By abstracting away the need for manual HTML or JavaScript, it allows developers to define complex layouts and functional logic using modular, reusable components.

The framework distinguishes itself through a robust architecture that handles event orchestration and state synchronization automatically. It utilizes a centralized dependency graph to trigger backend functions in response to user inputs, while maintaining persistent session states to ensure data consistency. Its visualization engine leverages hardware-accelerated primitives to render massive, multi-dimensional datasets, supporting specialized requirements such as 3D scientific modeling and real-time data streaming.

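The dependency-graph idea can be illustrated with a generic reactive callback registry: each callback declares its input and output ids, and changing an input re-runs every downstream callback in topological order. This is a hypothetical sketch built on Python's standard `graphlib` module, not Plotly's (or Dash's) actual callback API; `CallbackGraph`, `callback`, and `update` are illustrative names, and cyclic dependencies would raise `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


class CallbackGraph:
    """Hypothetical reactive store: outputs are recomputed when their inputs change."""

    def __init__(self) -> None:
        self.values: dict[str, Any] = {}
        # output id -> (function, list of input ids)
        self.callbacks: dict[str, tuple[Callable[..., Any], list[str]]] = {}

    def callback(self, output: str, inputs: list[str]):
        """Decorator registering a function that computes `output` from `inputs`."""
        def register(fn):
            self.callbacks[output] = (fn, inputs)
            return fn
        return register

    def update(self, key: str, value: Any) -> set[str]:
        """Set an input value and re-run all transitively dependent callbacks."""
        self.values[key] = value
        # Collect every output that depends, directly or indirectly, on `key`.
        dirty: set[str] = set()
        frontier = [key]
        while frontier:
            changed = frontier.pop()
            for out, (_, ins) in self.callbacks.items():
                if changed in ins and out not in dirty:
                    dirty.add(out)
                    frontier.append(out)
        # Order dirty outputs so each runs after any dirty outputs it reads from.
        graph = {out: {i for i in self.callbacks[out][1] if i in dirty} for out in dirty}
        for out in TopologicalSorter(graph).static_order():
            fn, ins = self.callbacks[out]
            self.values[out] = fn(*(self.values[i] for i in ins))
        return dirty
```

Registering a `squared` output computed from a `slider` input, plus a `label` that reads both, and then calling `update("slider", 3)` recomputes `squared` before `label`.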
Beyond core visualization, the platform provides extensive capabilities for enterprise-grade application development. This includes integrated security protocols for user access management, tools for background task execution to maintain responsiveness during heavy computations, and automated deployment pipelines for hosting applications in scalable environments. It also supports complex data operations, such as filtering and pivoting, within high-performance grid components, and offers utilities for debugging, testing, and generating annotated analytical reports.
- [duplicati/duplicati](https://awesome-repositories.com/repository/duplicati-duplicati.md) (14,283 ⭐) — Duplicati is a self-hosted backup server designed to perform encrypted, incremental, and compressed backups to a wide range of local, network, and cloud-based storage providers. It functions as a background service that automates recurring data protection tasks, ensuring that only changed data blocks are stored to maximize efficiency and minimize bandwidth usage.

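The block-level deduplication behind storing only changed data can be sketched as follows: data is split into fixed-size blocks, each block is identified by its SHA-256 hash, and a block is stored only if its hash has not been seen before. A backup version is then just the ordered list of block hashes. The `BlockStore` class and the tiny block size are hypothetical choices for illustration; encryption, compression, and remote storage, which Duplicati also handles, are omitted.

```python
import hashlib


class BlockStore:
    """Hypothetical content-addressed block store for incremental backups."""

    def __init__(self, block_size: int = 4) -> None:  # tiny block size, for demonstration only
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}  # SHA-256 hex digest -> block contents

    def backup(self, data: bytes) -> tuple[list[str], int]:
        """Return (manifest of block hashes, number of newly stored blocks)."""
        manifest, uploaded = [], 0
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # unchanged blocks are referenced, not re-sent
                self.blocks[digest] = block
                uploaded += 1
            manifest.append(digest)
        return manifest, uploaded

    def restore(self, manifest: list[str]) -> bytes:
        """Reassemble a backup version from its ordered block hashes."""
        return b"".join(self.blocks[d] for d in manifest)
```

Backing up `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` stores three blocks the first time and only one the second, while both versions remain restorable.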
The project distinguishes itself through a centralized management console that allows for the orchestration of multiple distributed backup agents from a single web-based dashboard. It supports multi-tenant management, enabling the organization of users and resources into hierarchical structures for delegated access and data isolation. Furthermore, it provides robust security features, including AES-256 encryption for data at rest, support for OIDC and SAML2 authentication, and provider-level immutability protections to prevent unauthorized modification of backup archives.

Beyond its core backup capabilities, the system includes comprehensive tools for data lifecycle management, such as automated retention policies, versioning, and integrity verification. It offers flexible configuration through both a graphical interface and a command-line utility, supporting automation scripting and dry-run simulations to verify workflows before execution. The software also handles complex environments by managing locked files and providing metadata indexing to ensure rapid restoration even if the primary configuration database is unavailable.

Duplicati is available through various installation formats, including native system packages, portable archives, and containerized deployments, allowing it to run in diverse operating environments.
- [hinell/move.nvim](https://awesome-repositories.com/repository/hinell-move-nvim.md) (14 ⭐) — Gain the power to move lines and blocks and auto-indent them! Updated fork of fedepujol/move.nvim.
- [benyamindsmith/ig.degree.betweenness](https://awesome-repositories.com/repository/benyamindsmith-ig-degree-betweenness.md) (40 ⭐) — Implementation of the "Node Degree+Edge" Betweenness Community Detection Algorithm for 'igraph' Objects with R
- [chartbrew/chartbrew](https://awesome-repositories.com/repository/chartbrew-chartbrew.md) (3,641 ⭐) — Chartbrew is a self-hosted business intelligence platform and data visualization engine designed to transform raw data from SQL databases and external API endpoints into interactive charts and dashboards. It serves as a tool for building analytics dashboards that monitor business metrics and KPIs through a privately hosted environment.

The platform distinguishes itself with an embedded analytics workflow, allowing users to generate secure, time-limited shared links and iframes to display private charts on external websites. It also provides programmatic chart generation via API and integrates with services such as Google Analytics and OpenAI.

The system covers a broad range of capabilities, including multi-tenant resource isolation, automated dataset refreshing via job queues, and result caching. It includes security features such as symmetric data encryption, token-based authentication, and role-based access control for team management. Additionally, the platform supports automated data monitoring with webhook alerts based on chart thresholds.

The application is packaged for consistent deployment using Docker containerization and supports one-click installation via cloud marketplace images.
- [langchain-ai/langchainjs](https://awesome-repositories.com/repository/langchain-ai-langchainjs.md) (17,818 ⭐) — LangChain.js is a framework for building, executing, and monitoring stateful agentic applications. It provides an orchestration engine that models workflows as directed graphs, allowing developers to connect language models, data sources, and external tools into modular, multi-step processes.

The platform distinguishes itself through its focus on stateful execution and human-in-the-loop control. It manages agent lifecycles by persisting execution state across threads, enabling fault tolerance and the ability to pause workflows at designated breakpoints for manual review or modification. This architecture supports both autonomous agent orchestration and complex multi-agent systems, with built-in capabilities for streaming real-time execution updates and managing long-term memory.

Beyond core orchestration, the project offers a comprehensive suite of tools for the entire application lifecycle. This includes integrated observability for tracing and evaluating agent performance, schema-enforced data serialization for reliable communication, and extensive support for deployment, security, and infrastructure management.

The project provides a TypeScript-based software development kit and a command-line interface to facilitate local development, testing, and deployment of agentic workflows.
- [e2b-dev/awesome-ai-agents](https://awesome-repositories.com/repository/e2b-dev-awesome-ai-agents.md) (25,903 ⭐) — This project is a curated repository and directory focused on the artificial intelligence agent ecosystem. It serves as a centralized knowledge base for developers and researchers to discover frameworks, platforms, and autonomous software entities designed for reasoning, planning, and executing complex tasks.

The directory distinguishes itself through a community-driven curation model, where contributors maintain and update the collection via a distributed version control system. This collaborative approach ensures that the index remains current with the latest academic resources, open-source projects, and commercial tools, all organized through a structured categorical taxonomy.

The collection covers a broad range of technical domains, including multi-agent system orchestration, autonomous workflow automation, and general agent development. By aggregating these high-quality references, the repository facilitates the evaluation of technologies for building self-directed digital workers and complex autonomous systems.

The information is structured using lightweight markup files and rendered as a static site to provide a consistent and accessible interface for global users.
- [kiyoon/repeatable-move.nvim](https://awesome-repositories.com/repository/kiyoon-repeatable-move-nvim.md) (25 ⭐) — Make move commands repeatable with `;` and `,`
- [dataease/dataease](https://awesome-repositories.com/repository/dataease-dataease.md) (23,420 ⭐) — DataEase is an open-source, self-hosted business intelligence platform designed for building interactive data visualizations and managing analytical reporting. It provides a centralized environment where users can construct dashboards through a drag-and-drop interface, connecting to diverse data sources including relational databases, data warehouses, and external APIs.

The platform distinguishes itself through its focus on embedded analytics and enterprise-grade governance. It allows for the seamless integration of charts, dashboards, and management modules into third-party web applications using secure iframe containers and token-based authentication. To support complex organizational needs, it includes granular role-based access control, row-level data filtering, and hierarchical organization management, ensuring that data remains secure and isolated across different departments.

Beyond core visualization, the system offers extensive automation and connectivity features. It supports automated report scheduling and distribution, cross-source data modeling, and a plugin-based architecture that allows for the addition of custom data sources and visualization types. The platform also includes robust monitoring tools, such as threshold-based alerting and execution logging, to maintain operational visibility over automated tasks.

The system is built to be highly configurable, offering options for platform branding, global variable definitions, and comprehensive identity management through integrations with external authentication providers.
- [livekit/livekit](https://awesome-repositories.com/repository/livekit-livekit.md) (19,358 ⭐) — LiveKit is a comprehensive framework for building and orchestrating real-time, multimodal AI agents that interact with users through voice, video, and text. It provides a centralized, event-driven architecture to manage the entire lifecycle of automated participants, from initialization and session state management to graceful shutdown. By utilizing a selective forwarding unit, the platform efficiently routes media streams between participants and agents, ensuring low-latency communication and secure, token-based authentication for all connections.

The platform distinguishes itself through its modular pipeline-based media processing, which chains specialized speech-to-text, language, and text-to-speech services into cohesive workflows. It includes advanced capabilities for real-time voice activity detection, enabling natural turn-taking and interruption handling, alongside remote procedure call tooling that allows agents to execute external functions or access local resources during a conversation. Developers can further extend these interactions by integrating photorealistic virtual avatars that synchronize visual expressions with the agent's audio output.

Beyond core conversational logic, the system offers extensive support for telephony integration, allowing agents to connect to public networks via SIP for inbound and outbound calling. It provides a robust suite of observability and monitoring tools to track agent performance, connection quality, and session events, ensuring reliability in production environments. The platform also includes specialized utilities for task automation, such as capturing and validating structured user data, and supports multi-step workflow orchestration to handle complex, context-aware interactions.

The project provides a command-line interface for scaffolding, deploying, and testing agent applications, with documentation available in machine-readable formats to assist in development.
- [erikrhanson/problem-solving-with-algorithms-and-data-structures-using-python](https://awesome-repositories.com/repository/erikrhanson-problem-solving-with-algorithms-and-data-structures-using-python.md) (0 ⭐) — Problem Solving with Algorithms and Data Structures Using Python.
- [victoriametrics/victoriametrics](https://awesome-repositories.com/repository/victoriametrics-victoriametrics.md) (16,343 ⭐) — VictoriaMetrics is a high-performance, scalable time series database and observability platform designed for long-term storage and analysis of metric, log, and trace data. It functions as a unified backend for monitoring ecosystems, offering full compatibility with industry-standard protocols and query languages. The system is built to handle massive data volumes through a distributed architecture that supports horizontal scaling and efficient data lifecycle management.

The platform distinguishes itself through a storage engine that utilizes consistent hashing for data sharding and log-structured merge trees to optimize write throughput and disk space. It provides robust multi-tenant isolation, allowing organizations to segment data and alerting configurations by account or project while maintaining secure, partitioned access. By offloading long-term data to object storage while retaining local caching, it balances cost-effective persistence with high-performance query execution.

The system covers the entire observability lifecycle, including automated metric scraping, log aggregation, and distributed tracing. It features a sophisticated alerting and recording engine that supports dynamic rule evaluation and high-availability execution. Additionally, the project includes a Kubernetes operator that automates the deployment, configuration, and lifecycle management of monitoring components, ensuring consistent observability across containerized environments.

VictoriaMetrics is distributed as a set of container-native services and can be managed via declarative resource definitions within Kubernetes clusters.
- [anthropics/claude-code](https://awesome-repositories.com/repository/anthropics-claude-code.md) (132,728 ⭐) — Anthropic's terminal-native AI coding agent.
- [mysql/mysql-connector-nodejs](https://awesome-repositories.com/repository/mysql-mysql-connector-nodejs.md) (159 ⭐) — MySQL Connector/Node.js is a MySQL connector for Node.js that uses the X Protocol, introduced in MySQL 5.7.12.
- [clickhouse/clickhouse](https://awesome-repositories.com/repository/clickhouse-clickhouse.md) (48,229 ⭐) — ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring.

The platform distinguishes itself through advanced storage and execution techniques, including vectorized query processing and a merge tree storage engine that maintains performance during massive insertions. It features adaptive subcolumn mapping for semi-structured data and supports native vector search for machine learning and generative AI applications. To facilitate efficient data movement, the engine utilizes zero-copy shared memory buffers, minimizing overhead when interacting with external analytical tools or processing diverse file formats like Parquet, JSON, and Arrow.

Beyond its core storage and processing capabilities, the project provides a comprehensive suite of tools for observability, security, and data integration. It includes built-in support for natural language querying, automated workflow orchestration for AI agents, and extensive diagnostic features for query plan inspection. The platform also offers robust cloud infrastructure management, including support for private networking, compliant deployment strategies, and integrated billing consolidation.
- [unstructured-io/unstructured](https://awesome-repositories.com/repository/unstructured-io-unstructured.md) (14,019 ⭐) — Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows.

The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture that supports directed acyclic graph orchestration, allowing users to chain complex transformation pipelines while maintaining metadata, spatial context, and hierarchical relationships across extracted elements.

The system covers a broad capability surface, including extensive connectivity to cloud storage, databases, and collaboration platforms, alongside robust data export options for vector databases and search indices. It enforces enterprise security standards through isolated multi-tenant infrastructure, role-based access control, and private network connectivity, ensuring that sensitive data remains secure throughout the entire transformation lifecycle.

Operational visibility is maintained through integrated job monitoring, event-driven notification systems, and audit logging. The platform is designed for deployment within private cloud environments, supporting scalable, asynchronous processing of high-volume document batches.
- [portswigger/webinspect-connector](https://awesome-repositories.com/repository/portswigger-webinspect-connector.md) (0 ⭐) — Binary-only repository for the HP WebInspect Connector, authored by HP.
- [aws/serverless-application-model](https://awesome-repositories.com/repository/aws-serverless-application-model.md) (9,560 ⭐) — AWS Serverless Application Model (SAM) is an infrastructure-as-code tool and serverless deployment orchestrator that provides a shorthand syntax for defining serverless infrastructure. It transforms concise resource declarations into full AWS CloudFormation templates, automating the provisioning of cloud functions, APIs, and databases.

The project distinguishes itself by using a macro-based transformation system to expand simplified resource types into detailed infrastructure components. It includes an automated permission mapping system that translates high-level resource interaction intents into scoped identity and access management policies.

The toolset covers local development and testing through containerized simulation and function invocation, as well as deployment automation, including real-time cloud syncing and stack parameterization. It also supports importing existing resources into a stack and exporting stack outputs for cross-stack references.
- [numfocus/getting-started-with-open-source](https://awesome-repositories.com/repository/numfocus-getting-started-with-open-source.md) (0 ⭐) — This repository contains documents and resources on getting started with Open Source projects.
- [getredash/redash](https://awesome-repositories.com/repository/getredash-redash.md) (28,653 ⭐) — Redash is a self-hosted analytics platform and SQL data visualization tool. It provides a web-based SQL query editor for writing, executing, and scheduling database queries, and functions as a business intelligence dashboard for monitoring metrics via visual widgets.

The platform distinguishes itself through its data source connectors, which integrate with various SQL, NoSQL, and API-based stores to retrieve information for analysis. It enables self-service analytics by allowing users to run queries with dynamic parameters and supports shared data reporting via public links or embedded dashboards.

The system covers a broad range of capabilities, including a data visualization engine for creating charts and maps, automated data alerting for monitoring query thresholds, and role-based access control for managing user permissions. It also includes utilities for database schema browsing and exporting query results.

Administration is supported through a command-line interface for system tasks and database schema initialization.
- [mysql/mysql-connector-cpp](https://awesome-repositories.com/repository/mysql-mysql-connector-cpp.md) (704 ⭐) — MySQL Connector/C++ is a MySQL database connector for C++. It lets you develop C++ and C applications that connect to MySQL Server.
- [heyputer/puter](https://awesome-repositories.com/repository/heyputer-puter.md) (42,318 ⭐) — Puter is a browser-based desktop environment and cloud-native development platform that provides a virtualized graphical workspace. It enables developers to build and deploy full-stack web applications by integrating cloud storage, authentication, and serverless backend logic directly into the browser, eliminating the need for traditional server infrastructure.

The platform distinguishes itself through a unified cloud storage layer and a distributed network runtime that facilitates peer-to-peer communication and cross-origin resource fetching. It features a sophisticated cross-window orchestration framework that coordinates state, user actions, and lifecycle events between isolated browser windows, allowing for complex, multi-component application workflows.

Beyond its core desktop and storage capabilities, the system includes a comprehensive suite of artificial intelligence tools, including conversational response generation, image and video creation, and speech synthesis. It also provides a serverless backend platform that executes event-driven functions and manages persistent key-value storage, all accessible through a consistent programmatic interface.

The project offers extensive documentation and examples covering AI integration, authentication, and object management to assist developers in building scalable applications.
- [strongloop/loopback](https://awesome-repositories.com/repository/strongloop-loopback.md) (13,159 ⭐) — LoopBack is a Node.js API framework used to build RESTful services and backend applications. It functions as a model-driven API generator that automatically maps predefined data models to network endpoints to create standardized web interfaces.

The project features a database abstraction layer that unifies access across diverse SQL databases, NoSQL stores, and remote data sources. It includes a backend application scaffolder using command-line generators to automate the creation of project structures and data connectors. Additionally, it provides an API authentication system to manage application identities and an access control system that restricts resource access via authentication and authorization lists.

The framework covers broader capabilities including the generation of native client SDKs for multiple platforms and the implementation of mobile backend infrastructure for push notifications, geolocation, and cloud file storage. It also supports the integration of third-party middleware for monitoring and instrumentation.
- [martinmoene/any-lite](https://awesome-repositories.com/repository/martinmoene-any-lite.md) (0 ⭐) — any-lite is a single-file, header-only implementation of a C++17-like `any` type for C++98, C++11, and later.
- [great-expectations/great_expectations](https://awesome-repositories.com/repository/great-expectations-great-expectations.md) (11,558 ⭐) — Great Expectations is a data quality testing framework and observability platform designed to monitor the reliability of data pipelines. It provides a structured environment for defining, documenting, and automating data quality assertions, allowing teams to validate datasets against expected structure and content before they move through downstream processes.

The project distinguishes itself through a declarative domain-specific language that stores quality rules as version-controlled configuration files. It utilizes an execution engine abstraction to translate these high-level assertions into native queries for various data processing frameworks, while a rendering engine automatically transforms these rules and validation outcomes into human-readable documentation for stakeholders.

The platform supports a broad range of operational capabilities, including the ability to connect to diverse data sources and persist metadata and validation results across distributed environments. It integrates directly into existing orchestration pipelines to automate recurring quality checks, track data health trends over time, and trigger notifications when datasets deviate from established benchmarks.
- [illacloud/illa-builder](https://awesome-repositories.com/repository/illacloud-illa-builder.md) (12,268 ⭐) — Illa-builder is a low-code internal tool builder and API integration platform used to create business applications and admin panels. It functions as a database GUI dashboard and visual workflow automator, allowing users to connect to databases and external APIs to manage data and automate business processes.

The platform provides a self-hosted app framework that can be deployed on private infrastructure via Docker. It enables the creation of custom dashboards and CRMs while maintaining full control over data and hosting.

The system includes a visual drag-and-drop canvas for designing user interfaces with pre-built components. It covers data integration for SQL and NoSQL sources, real-time collaborative editing, and event-driven workflow automation triggered by schedules or webhooks.
- [calcom/cal.com](https://awesome-repositories.com/repository/calcom-cal-com.md) (45,760 ⭐) — Cal.com is a comprehensive scheduling infrastructure platform designed to manage availability, booking workflows, and calendar synchronization across multiple users and external services. It provides a backend service for automated appointment scheduling, enabling the creation, confirmation, and management of booking lifecycles through a centralized state machine. The platform also offers embeddable user interface components that allow developers to integrate interactive booking experiences directly into third-party websites.

What distinguishes the platform is its extensible app ecosystem and intelligent automation capabilities. Developers can build custom integrations using a modular plugin architecture, while an AI-driven interface allows for complex scheduling operations and configuration updates via natural language commands. The system includes a sophisticated event routing engine that automatically assigns meetings to hosts based on availability, round-robin rules, and organizational hierarchy, supported by real-time webhook orchestration to keep external systems synchronized.

The platform covers a broad capability surface including CRM data synchronization, granular role-based access control, and secure OAuth-based integration management. It supports advanced booking configurations, such as prefilling form data and monitoring state changes, alongside specialized tools for Salesforce connectivity, including assignment traceability and fuzzy account matching. Users can also leverage local or remote server hosting options to maintain control over their infrastructure and security configurations.
- [awslabs/amazon-kinesis-connectors](https://awesome-repositories.com/repository/awslabs-amazon-kinesis-connectors.md) (326 ⭐) — The Amazon Kinesis Connector Library helps Java developers integrate Amazon Kinesis with other AWS and non-AWS services. The current version of the library provides connectors for Amazon DynamoDB, Amazon Redshift, Amazon S3,…
- [coollabsio/coolify](https://awesome-repositories.com/repository/coollabsio-coolify.md) (57,055 ⭐) — Coolify is a self-hosted platform-as-a-service that provides a centralized management interface for deploying, configuring, and monitoring containerized applications and databases on private infrastructure. It functions as a visual control plane, automating the end-to-end lifecycle of services from source code to production. By managing container orchestration, networking, and resource allocation, it lets users keep full control over their own hardware while streamlining software delivery.

The platform distinguishes itself through its agentless architecture: it connects to remote servers over SSH to run administrative tasks, so no persistent agent has to be installed on the managed machines. It integrates directly with version control systems to trigger automated build and deployment pipelines, including temporary, isolated preview environments for every pull request. This workflow is supported by a declarative engine that uses templates to standardize the deployment of complex multi-container architectures and persistent database engines.

Beyond core orchestration, the system handles the operational requirements of hosted services by managing dynamic reverse-proxy routing and automated SSL certificate lifecycles. It provides a comprehensive suite of infrastructure management tools, including browser-based terminal access for debugging, automated system dependency installation, and persistent state management via a central database. These capabilities ensure that infrastructure remains synchronized and consistent across multiple remote environments.
- [sghall/react-move](https://awesome-repositories.com/repository/sghall-react-move.md) (6,564 ⭐) — React Move is a declarative animation library for React that animates components by interpolating between start and end states with configurable timing and easing. It provides data-driven transitions for single elements, groups, lists, and SVG elements, supporting staggered timing, custom interpolation for non-numeric values like colors and paths, and drag-and-drop reordering of list items.

The library distinguishes itself through its support for custom interpolation functions that replace default numeric interpolation, keyed array reconciliation for tracking items as they enter, update, or leave, and lifecycle callback hooks that fire at transition start, interruption, or completion. It also offers per-property timing configuration, nested state namespaces for organizing animated attributes, and staggered transition scheduling for choreographed visual effects across multiple elements.

Beyond its core animation capabilities, React Move handles SVG chart animation for elements like bars, lines, and axes, and supports collapsible tree animations with smooth expansion and collapse of nodes. The library provides control over transition timing with independent duration, delay, and easing values for each animated property or group of properties.
- [redpanda-data/redpanda](https://awesome-repositories.com/repository/redpanda-data-redpanda.md) (12,248 ⭐) — Redpanda is a distributed event streaming engine designed to serve as a high-performance, drop-in replacement for Apache Kafka in existing event-driven architectures. It provides a foundation for building and scaling applications that require reliable data movement, analytical querying, and strict operational compliance across both cloud and self-managed environments.

The platform distinguishes itself through a shared-nothing architecture that uses thread-per-core execution and a non-blocking asynchronous I/O engine to maximize throughput. It maintains data consistency through a consensus-based replication model and implements Kafka wire-protocol compatibility, so existing ecosystem tools can interact with the system without modification. To optimize resource usage, the platform features a zero-copy data path and automated tiered storage that offloads historical log segments to object storage while presenting a unified view to consumers.

Beyond core streaming, the platform includes integrated governance and orchestration capabilities for connecting autonomous agents to data flows. It provides granular identity management and execution controls to secure agent interactions, alongside auditing tools that record immutable logs of system actions. The infrastructure also supports real-time analytical querying across live and historical data streams to facilitate immediate operational insights.
- [keen/dashboards](https://awesome-repositories.com/repository/keen-dashboards.md) (11,038 ⭐) — This project is a collection of responsive CSS Grid dashboard templates and a data visualization UI kit. It provides a set of HTML layouts designed for building analytics interfaces and monitoring views for KPIs and business metrics that adapt to different screen sizes.

The toolkit is library-agnostic, allowing the connection of static HTML templates to any external data source or third-party charting library without requiring custom adapter code. It uses a template-driven approach to separate the visual structure of the dashboard from the underlying data.

The capabilities cover the assembly of responsive grid layouts and the embedding of charts into predefined cells. These frontend templates are provided as containerized images to ensure consistent hosting and delivery across different server environments.
- [amnn/move-mode](https://awesome-repositories.com/repository/amnn-move-mode.md) (0 ⭐) — move-mode is an Emacs major mode for editing smart contracts written in the Move programming language. It supports Emacs 25.1 and above (tested on Emacs for Mac OS X 25.1-1 and Emacs Mac Port 28.1).
- [duckdb/duckdb](https://awesome-repositories.com/repository/duckdb-duckdb.md) (38,805 ⭐) — DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation.

The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adaptive query optimization to dynamically select execution plans at runtime and utilizes zero-copy ingestion to map external data formats directly into memory. To facilitate integration with analytical programming environments, the system supports high-performance data exchange through standardized memory formats and provides specialized connectors for Python, R, and Java.

The project covers a broad capability surface, including advanced relational join operations, incremental result streaming for large datasets, and flexible data ingestion from various file formats. It supports complex data types and provides a comprehensive command-line interface for interactive session management and batch processing. The codebase is designed for portability, offering single-file amalgamation to simplify integration into external projects and build systems.
- [ijzerenhein/react-native-magic-move](https://awesome-repositories.com/repository/ijzerenhein-react-native-magic-move.md) (968 ⭐) — Create magical move transitions between scenes in react-native 🐰🎩✨
- [letta-ai/letta](https://awesome-repositories.com/repository/letta-ai-letta.md) (21,168 ⭐) — Letta is a framework for building, deploying, and managing autonomous AI agents that maintain persistent state across long-term interactions. It provides a comprehensive suite of primitives for defining agents with configurable personas, modular memory blocks, and tool-use capabilities, enabling them to retain user preferences and conversation history over extended sessions.

The platform distinguishes itself through its advanced memory management and orchestration capabilities. It allows agents to autonomously update their own memory, perform retrieval-augmented generation, and coordinate complex multi-agent workflows through hierarchical delegation. By supporting both local and remote execution environments, it enables developers to build stateful agents that can be managed programmatically via API or integrated into existing automation pipelines.

The system includes a robust set of administrative and security features, such as human-in-the-loop approval for tool execution, multi-tenant identity management, and automated performance evaluation suites. These tools allow for the creation of reproducible agent blueprints, version-controlled deployments, and detailed observability into agent reasoning and memory integrity.

The project is distributed as a Python-based framework, providing official SDKs and a command-line interface to facilitate integration into development workflows and production environments.
