73 repositories
Configuration utilities for establishing connections to external databases and APIs.
Distinguishing note: Focuses on the connectivity configuration rather than the data itself.
73 GitHub repositories matching Data & Databases · Data Source Connectivity Tools.
This project is a structured educational resource and technical guide for designing and implementing autonomous systems using large language models. It provides a comprehensive curriculum and code samples focused on agentic design patterns, autonomous development, and the creation of systems capable of planning and executing multi-step tasks. The resource details the implementation of agentic retrieval-augmented generation, where models autonomously plan and refine data searches. It covers a wide array of orchestrators and design patterns, including metacognitive reflection for self-correctin
Implements methods for connecting language models to diverse data sources like vector databases and SQL stores for dynamic context.
Minds Platform is an automation system and application platform designed for building and deploying custom AI tools and workflows. It functions as a machine learning integration layer and self-hosted orchestrator that connects predictive models and large language models to external data sources. The platform enables the execution of multi-step tasks that read and write data to automate reports and operational activities. It supports deployment across cloud, on-premises, and virtual private cloud environments to maintain control over models and data. Capabilities include event-driven workflow
Standardises connections to diverse data sources using a common interface for AI model read and write operations.
ToolJet is a low-code development platform designed for building and deploying internal business applications. It provides a visual interface where users can drag and drop components to design layouts, connect to various data sources, and execute custom logic. The platform is built on a containerized architecture, ensuring that applications remain portable and consistent across different cloud and server environments. The platform distinguishes itself through integrated artificial intelligence capabilities that assist in the generation of user interfaces, database schemas, and data queries fr
Configures connection details for external databases, APIs, and cloud storage services to enable cross-environment data querying.
graphql-engine is an automated GraphQL API engine that transforms database tables and relationships into a queryable GraphQL schema. It functions as a federation gateway and mapper, instantly generating APIs with built-in filtering, pagination, and mutations from existing databases and remote schemas. The project distinguishes itself through a fine-grained access control layer that enforces row-level and field-level permissions. It further provides a real-time data subscription server that converts standard queries into live streams and a system for triggering event-driven webhooks and notifi
Links a unified API layer to multiple external data stores through a system of specialized data connectors.
Danswer is an LLM application framework and RAG engine that provides a self-hosted interface for connecting large language models to private data. It serves as an enterprise AI chat interface and agent orchestrator, enabling the creation of specialized assistants with custom instructions and knowledge bases. The platform differentiates itself through an observability dashboard for tracking query history and token consumption, as well as a white-labeled interface for customized branding. It includes a multi-step research workflow for producing long-form reports and a sandboxed environment for
Includes utilities for establishing connections to external databases and APIs to index private information.
Redash is a self-hosted analytics platform and SQL data visualization tool. It provides a web-based SQL query editor for writing, executing, and scheduling database queries, and functions as a business intelligence dashboard for monitoring metrics via visual widgets. The platform distinguishes itself through its data source connectors, which integrate with various SQL, NoSQL, and API-based stores to retrieve information for analysis. It enables self-service analytics by allowing users to run queries with dynamic parameters and supports shared data reporting via public links or embedded dashbo
Enables the retrieval and aggregation of information across various data store types for comprehensive analysis.
Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools. The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orches
Provides configuration utilities for establishing secure connections to external data sources.
Cube is a semantic layer data platform that maps raw SQL databases to standardized business metrics and dimensions. It functions as a SQL dialect translator, converting abstract semantic queries into optimized SQL statements for various cloud data warehouses. The platform operates as a multi-tenant data gateway, isolating information and security permissions for different customers within a single deployment. It includes a relational caching engine that stores pre-aggregated query results to reduce latency and decrease the load on primary data warehouses. The system provides a REST-based int
Enables connecting modeled semantic data to external business intelligence software for visualization in dashboards.
Qwen-code is an AI-powered development framework designed for orchestrating intelligent coding agents within terminal and IDE environments. It provides a comprehensive infrastructure for automating software maintenance, code generation, and complex refactoring tasks by managing multi-agent workflows and persistent session states. The system is built to handle both interactive development and automated background processes, ensuring that agents can execute shell commands and file operations safely within isolated, sandboxed environments. What distinguishes this project is its focus on granular
Connects to remote services, databases, and APIs to retrieve data within the development environment.
DB-GPT is an AI-driven database management system that uses agentic reasoning to execute data tasks. It converts natural language prompts into executable database queries and combines structured database records with unstructured knowledge bases to provide grounded analysis. The system orchestrates multi-step reasoning chains that integrate database queries, custom scripts, and external tool calls. It allows for the packaging of domain knowledge into reusable analysis skills and executes generated code within sandboxed environments for system safety. The platform covers data orchestration ac
Orchestrates a unified analysis view by integrating structured SQL databases, flat files, and external APIs.
DB-GPT is an agentic data analysis platform and business intelligence AI that functions as a large language model data assistant. It provides a text-to-SQL interface and a sandboxed code execution environment to translate natural language into executable database queries and Python scripts. The platform utilizes iterative agentic reasoning to plan and execute multi-step data analysis workflows through tool calls. It features a modular skill-based extension system that allows domain knowledge and analysis workflows to be packaged into reusable functional components. The system integrates data
Integrates relational databases, spreadsheets, and unstructured documents into a unified interface for cross-origin analysis.
TradingAgents-CN is a multi-agent framework designed for autonomous financial market analysis and automated trading execution. It functions as a containerized orchestrator that leverages large language models to perform complex reasoning, research, and decision-making tasks within financial environments. The platform distinguishes itself through a modular architecture that integrates diverse artificial intelligence providers and financial data sources into a unified pipeline. It provides granular control over agent behavior through prompt-driven logic configuration and multi-model orchestrati
Provides configuration utilities for establishing connections to external market data providers and APIs.
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Provides standardized interfaces and drivers for executing distributed SQL queries across heterogeneous data environments.
This project is a comprehensive framework for engineering financial data pipelines, designed to automate the collection, cleaning, and synchronization of large-scale market datasets. It functions as a quantitative trading data engine, providing the infrastructure necessary to manage historical and real-time asset pricing information for research and machine learning workflows. The system distinguishes itself through a configuration-driven approach to orchestration, allowing users to manage complex data acquisition tasks across multiple financial providers. It features resilient middleware tha
Manages connections to financial data sources by defining authentication, rate limits, and caching policies.
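As a rough illustration of what a configuration-driven connection definition of this kind might look like, the sketch below models one data source with its authentication, rate limit, and caching policy. It is not taken from the project: `SourceConfig`, `load_sources`, and every field name are hypothetical, chosen only to mirror the three concerns named above.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical connection settings for one external data provider."""

    name: str
    api_key_env: str              # name of the environment variable holding the credential
    max_requests_per_minute: int  # rate limit the client should respect
    cache_ttl_seconds: int        # how long a cached response stays valid

    def __post_init__(self) -> None:
        # Reject settings that would make the derived values meaningless.
        if self.max_requests_per_minute <= 0:
            raise ValueError("max_requests_per_minute must be positive")
        if self.cache_ttl_seconds < 0:
            raise ValueError("cache_ttl_seconds must not be negative")

    def min_interval_seconds(self) -> float:
        """Smallest gap between two requests that stays within the rate limit."""
        return 60.0 / self.max_requests_per_minute


def load_sources(raw: Mapping[str, Mapping[str, object]]) -> Dict[str, SourceConfig]:
    """Build one SourceConfig per entry of a plain mapping (e.g. parsed YAML or JSON)."""
    return {name: SourceConfig(name=name, **fields) for name, fields in raw.items()}
```

A mapping such as `{"demo": {"api_key_env": "DEMO_API_KEY", "max_requests_per_minute": 120, "cache_ttl_seconds": 300}}` would yield a `SourceConfig` whose `min_interval_seconds()` is 0.5. Keeping only the name of the environment variable in the config, rather than the secret itself, is one common way to keep credentials out of version control.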
typeahead.js is a JavaScript autocomplete library used to build searchable input fields that provide real-time suggestions from local or remote data sources. It functions as a client-side suggestion engine and an asynchronous search interface, providing a customizable UI toolkit for managing search suggestion menus. The library focuses on data aggregation and performance, allowing the combination of multiple local or remote datasets into a single interface grouped by category. It utilizes rate-limited asynchronous fetching to prevent API overloading and employs search data prefetching and loc
Combines results from multiple local and remote datasets into a single interface grouped by category.
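The idea of merging several datasets into one suggestion list grouped by category can be sketched in a few lines. This is a Python illustration of the general technique only, not typeahead.js code or its API: the `suggest` function, its substring matching rule, and the `limit` parameter are assumptions made for the example.

```python
from typing import Dict, List, Mapping, Sequence


def suggest(
    query: str,
    datasets: Mapping[str, Sequence[str]],
    limit: int = 5,
) -> Dict[str, List[str]]:
    """Return case-insensitive substring matches from each dataset, keyed by category.

    Categories with no matches are omitted, and each category is capped at `limit` items.
    """
    needle = query.lower()
    grouped: Dict[str, List[str]] = {}
    for category, items in datasets.items():
        matches = [item for item in items if needle in item.lower()][:limit]
        if matches:
            grouped[category] = matches
    return grouped
```

Here each dataset is a plain in-memory list. In a real autocomplete widget some datasets would be fetched remotely, which is where rate limiting and prefetching come in.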
Amass is a network attack surface mapper and reconnaissance framework designed to discover and map the external, internet-facing infrastructure of a target organization. It functions as an open source intelligence tool that identifies public network boundaries and locates hidden or forgotten subdomains to define an organization's total reachable footprint. The project utilizes passive-source data aggregation from external APIs and public databases alongside active DNS brute-forcing and recursive subdomain expansion. It employs a graph-based asset mapping system to visualize the relationships
Aggregates asset information from multiple external APIs and public databases to identify subdomains.
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Deletes configured data source connections to stop further ingestion.
This project provides a TypeScript software development kit for the Model Context Protocol, a standard designed to facilitate bidirectional communication between AI applications and external data sources or tools. It serves as a foundational framework for building both clients and servers, enabling language models to interact with external systems through a unified, decoupled interface. The SDK distinguishes itself by implementing a transport-agnostic connection layer that supports both local standard input-output streams and remote HTTP endpoints. It utilizes a JSON-RPC message bus to manage
Establishes client-server connections to integrate external tools and data into language model workflows.
Sublist3r is a subdomain enumeration tool and passive reconnaissance framework designed to discover subdomains by querying search engines and public intelligence sources. It functions as a security tool for identifying the digital footprint of a target domain. The project provides both passive enumeration through multi-source API aggregation and active discovery via a DNS brute force tool. It includes a TCP port scanner to identify active services and open ports on discovered subdomains, facilitating attack surface mapping. The tool can be used as a standalone utility or as a Python security
Collects subdomain data by aggregating results from multiple search engines and public intelligence sources.
BrasilAPI is a REST API gateway that aggregates and exposes official Brazilian public data from fragmented government sources. It functions as a multi-provider data aggregator that normalizes heterogeneous information into a standardized JSON schema for consistent delivery. The system utilizes a multi-provider fallback pipeline to ensure reliable data resolution, querying several external APIs in sequence if a primary provider fails. It also incorporates a caching proxy gateway to reduce latency and avoid redundant requests for frequently accessed public data. The platform covers a broad ran
Retrieves and aggregates information from diverse external government APIs to provide a unified data view.
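The fallback behaviour described above (try providers in sequence, move to the next when one fails, cache what was resolved) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not BrasilAPI's implementation: `FallbackResolver`, `failing_provider`, and `backup_provider` are hypothetical names, and the cache is a simple in-memory dictionary with no expiry.

```python
from typing import Callable, Dict, List, Sequence

# A provider is any callable that takes a lookup key and returns a record,
# raising an exception when it cannot answer.
Provider = Callable[[str], dict]


class FallbackResolver:
    """Query providers in order and cache the first successful answer (hypothetical sketch)."""

    def __init__(self, providers: Sequence[Provider]) -> None:
        self._providers: List[Provider] = list(providers)
        self._cache: Dict[str, dict] = {}

    def resolve(self, key: str) -> dict:
        # Serve repeated lookups from the cache to avoid redundant requests.
        if key in self._cache:
            return self._cache[key]
        errors: List[Exception] = []
        for provider in self._providers:
            try:
                result = provider(key)
            except Exception as exc:  # any failure means "try the next provider"
                errors.append(exc)
                continue
            self._cache[key] = result
            return result
        raise LookupError(f"all {len(errors)} providers failed for {key!r}")


def failing_provider(key: str) -> dict:
    """Stand-in for a primary source that is currently unavailable."""
    raise ConnectionError("primary unavailable")


def backup_provider(key: str) -> dict:
    """Stand-in for a secondary source that answers successfully."""
    return {"key": key, "source": "backup"}
```

With `FallbackResolver([failing_provider, backup_provider])`, a call to `resolve("example")` skips the failing provider and returns the backup's record. A second call with the same key is answered from the cache without contacting either provider.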