43 repos

Awesome GitHub repositories, curated.

A community-curated directory of interesting public GitHub repositories. Ask in plain English and AI ranks the results by relevance. Save what you find.


opencv/opencv
86,238
OpenCV is a comprehensive computer vision library designed for real-time performance and cross-platform deployment. Its native core uses multi-threading and automatic memory management to handle computationally intensive tasks such as image processing and machine learning model inference.

The library is built around a data-oriented matrix type, with proxy array abstractions that give functions a single, consistent interface for multidimensional data. Factory-style algorithm interfaces and runtime type dispatch keep the API stable over time and make cross-language bindings possible, so developers can integrate high-performance vision capabilities into diverse hardware and software environments.

Its functional scope also includes saturation-aware arithmetic for pixel-level operations and standardized error handling. It keeps a clean integration surface through namespace-encapsulated structures and strict coding standards. Technical documentation is generated from standardized inline comments, and a comprehensive unit test suite helps ensure reliability across versions.
c-plus-plus, computer-vision, deep-learning
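Saturation-aware arithmetic means a pixel result is clamped to the valid range of its type instead of wrapping around. The sketch below illustrates the idea for 8-bit values in plain Python. It is not OpenCV code (in C++, OpenCV exposes this behavior through `cv::saturate_cast`), and the helper names are hypothetical.

```python
def saturate_u8(value: float) -> int:
    """Round to the nearest integer and clamp into the uint8 range [0, 255]."""
    return max(0, min(255, int(round(value))))


def add_saturating(a: list[int], b: list[int]) -> list[int]:
    """Add two rows of 8-bit pixels, clamping overflow at 255."""
    return [saturate_u8(x + y) for x, y in zip(a, b)]


def add_wrapping(a: list[int], b: list[int]) -> list[int]:
    """Add two rows of 8-bit pixels with modular (wrap-around) overflow, for comparison."""
    return [(x + y) % 256 for x, y in zip(a, b)]


row_a = [250, 10, 128]
row_b = [10, 10, 200]
print(add_saturating(row_a, row_b))  # [255, 20, 255]
print(add_wrapping(row_a, row_b))    # [4, 20, 72]
```

With wrapping, a bright pixel plus a small increment turns nearly black. This is why clamping is the usual default for image arithmetic.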
rasbt/LLMs-from-scratch
85,529
This repository is an educational framework for building large language models from the ground up. It follows a structured curriculum through the end-to-end lifecycle of model development, including data processing, architecture design, and optimization. Because each component is implemented directly rather than delegated to high-level LLM libraries, learners come away understanding the fundamental mechanics of these models.

The project builds neural network components and training loops from first principles. It uses tensor-based computation to express network layers as explicit mathematical transformations. This approach exposes the mechanics of weight updates and loss minimization and gives a deeper conceptual understanding of modern machine learning architectures.

The content is organized as a series of executable notebooks for incremental learning. Each chapter lives in its own directory, which keeps concerns separated and simplifies dependency management. The code runs in several environments, including local Python, Docker containers, and cloud platforms, and is designed to work on conventional hardware.
ai, artificial-intelligence, chatbot
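The mechanics of weight updates and loss minimization can be shown at the smallest possible scale: a linear model fitted by hand-written gradient descent on mean squared error. This is a hypothetical plain-Python sketch, not code from the repository, which builds GPT-style models with tensors. `train`, the learning rate, and the step count are illustrative choices.

```python
def predict(w: float, b: float, x: float) -> float:
    """A single 'layer' written as a pure function of its parameters and input."""
    return w * x + b


def mse_gradients(w, b, xs, ys):
    """Analytic gradients of mean squared error with respect to w and b."""
    n = len(xs)
    errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
    dw = sum(2 * e * x for e, x in zip(errors, xs)) / n
    db = sum(2 * e for e in errors) / n
    return dw, db


def train(xs, ys, lr=0.05, steps=2000):
    """Plain gradient descent: step each parameter against its gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = mse_gradients(w, b, xs, ys)
        w -= lr * dw
        b -= lr * db
    return w, b


# Data generated from y = 2x + 1; training should recover w ≈ 2, b ≈ 1.
w, b = train([0, 1, 2, 3], [1, 3, 5, 7])
print(round(w, 3), round(b, 3))  # 2.0 1.0
```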
firecrawl/firecrawl
84,034
Firecrawl is a web data extraction platform that converts unstructured web content into clean, LLM-ready formats such as Markdown or JSON. It acts as an autonomous crawler and scraper that can map entire domains, navigate recursively, and carry out complex data-gathering tasks. Headless browser orchestration lets it handle dynamic, JavaScript-heavy pages and capture content that only appears after scripts run.

The platform focuses on agentic workflows. Its programmatic interface lets autonomous agents perform live web research, interact with pages, and execute multi-step navigation. Its distributed crawling infrastructure scales data collection across multiple nodes, and asynchronous queueing manages concurrency and long-running jobs. It also integrates with agent frameworks through standardized protocols, connecting to AI-powered clients and automated pipelines.

Beyond core extraction, the project provides developer tools for site mapping, batch scraping, and web search. Features include stateful session persistence, webhook notifications, and configurable crawl depth, which give fine-grained control over how information is retrieved and processed. Comprehensive API documentation and SDKs support integration into backend services and local development environments. The crawling infrastructure can be self-hosted on private networks or used as a managed cloud service.
ai, ai-agents, ai-crawler
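Configurable crawl depth limits how far recursive navigation can move from the start page. Below is a minimal breadth-first sketch of that idea over an in-memory link graph. The `site` dictionary stands in for fetched pages, and nothing here reflects Firecrawl's actual implementation or API.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]], max_depth: int) -> list[str]:
    """Breadth-first crawl that stops following links beyond max_depth hops."""
    seen = {start}
    visited = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)  # a real crawler would fetch and convert the page here
        if depth == max_depth:
            continue
        for next_url in links.get(url, []):
            if next_url not in seen:  # never enqueue a page twice
                seen.add(next_url)
                queue.append((next_url, depth + 1))
    return visited


site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/"],
    "/blog": ["/docs/api"],
    "/docs/api": ["/docs/api/v2"],
}
print(crawl("/", site, max_depth=1))  # ['/', '/docs', '/blog']
print(crawl("/", site, max_depth=2))  # ['/', '/docs', '/blog', '/docs/api']
```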
microsoft/ML-For-Beginners
83,800
This project is an open-source educational curriculum that gives developers a structured path into machine learning. Its study materials guide learners through fundamental concepts, algorithms, and the practical implementation of models.

The curriculum is built around interactive Jupyter notebooks, in which students run code cells inside narrative documents and see the results immediately. To connect theory and practice, the repository integrates cloud-based resource provisioning and containerized development environments, so learners can deploy infrastructure and keep dependencies consistent across machines. The content spans data science, cloud-native AI deployment, and applications powered by large language models. It is organized into independent modules that can be followed in any order.

The repository is written mainly in Markdown, which keeps it portable and easy to collaborate on. It also serves as a hub for a wider series of educational resources covering AI-assisted software development, agentic workflows, and modern orchestration frameworks.
data-science, education, machine-learning
punkpeye/awesome-mcp-servers
81,101
This project is a centralized directory of servers for the Model Context Protocol (MCP). It is a curated collection of standardized connectors that link AI models to external software, databases, and APIs. Because it lists servers that expose their tools through a common protocol, it helps developers find integrations that support dynamic tool discovery and structured context injection for AI agents.

The directory focuses on the protocol-level interoperability that autonomous agents need to work with heterogeneous remote services. MCP uses a decoupled request-response pattern and a bidirectional capability handshake, so AI hosts and servers negotiate supported features and operational constraints before any tool is invoked. This design supports stateless server implementations that can be scaled and deployed independently.

The collection covers business productivity, data science, infrastructure management, and developer utilities. Listed connectors let agents query databases securely, execute code, automate desktops, and manage persistent memory. The repository is a community-driven resource for developers who want to extend their agents with modular, plug-and-play integrations.
ai, mcp
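A capability handshake can be modeled as a JSON-RPC 2.0 request/response pair in which each side learns what the other supports before any tool call. This sketch is deliberately simplified. The `initialize` method name follows MCP, but representing capabilities as a flat list of strings and intersecting them is an assumption made for illustration. Real MCP messages carry richer, object-shaped capability fields and version information.

```python
import json


def make_initialize_request(request_id: int, client_capabilities: set[str]) -> dict:
    """Client side: announce which optional features the host supports."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {"capabilities": sorted(client_capabilities)},
    }


def handle_initialize(request: dict, server_capabilities: set[str]) -> dict:
    """Server side: reply with only the features both sides support."""
    offered = set(request["params"]["capabilities"])
    agreed = sorted(offered & server_capabilities)
    return {"jsonrpc": "2.0", "id": request["id"], "result": {"capabilities": agreed}}


request = make_initialize_request(1, {"tools", "sampling"})
response = handle_initialize(request, {"tools", "resources"})
print(json.dumps(response))
# {"jsonrpc": "2.0", "id": 1, "result": {"capabilities": ["tools"]}}
```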
hacksider/Deep-Live-Cam
79,568
Deep-Live-Cam is a generative video tool for real-time face manipulation and cinematic enhancement. It runs as a local-first AI runtime, so all media processing happens on the user's own hardware. Footage stays private, and no external service is needed for processing. Its pipeline supports live face swapping and interactive video modification during streaming sessions as well as on pre-recorded media.

A hardware abstraction layer routes compute work to the available accelerator, such as CUDA or CoreML backends. This supports multi-face mapping, where different source faces are applied to several subjects at once, and mouth-preserving swaps that keep the original lip movements in sync with speech. For visual fidelity, the engine uses mask-based blending and generative detail restoration to fit source features onto the target's facial geometry.

Beyond face swapping, the application includes cinematic tools such as real-time color grading and frame interpolation. Chunked memory handling and frame-based stream processing keep performance stable and help avoid crashes under heavy workloads. The interface is designed for focused live work, with distraction-free modes and automatic management of the projection window.
ai, ai-deep-fake, ai-face
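Mask-based blending composites a generated face into the target frame using per-pixel weights, so edges fade smoothly instead of showing a hard seam. Below is a minimal sketch on a single row of grayscale values. Real pipelines work on full color images with array libraries, and this function is hypothetical, not taken from Deep-Live-Cam.

```python
def blend_with_mask(source: list[float], target: list[float], mask: list[float]) -> list[int]:
    """Blend two grayscale pixel rows: mask 1.0 keeps source, 0.0 keeps target."""
    if not (len(source) == len(target) == len(mask)):
        raise ValueError("source, target and mask must have the same length")
    blended = []
    for s, t, m in zip(source, target, mask):
        m = min(1.0, max(0.0, m))  # clamp mask weights into [0, 1]
        blended.append(round(m * s + (1.0 - m) * t))
    return blended


# A feathered mask fades from the swapped face (200) into the original frame (40).
print(blend_with_mask([200, 200, 200, 200], [40, 40, 40, 40], [1.0, 0.75, 0.25, 0.0]))
# [200, 160, 80, 40]
```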
fighting41love/funNLP
78,999
This project is a community-driven knowledge base of natural language processing and large language model resources. It serves as a central index of tools, libraries, and research materials, organized as structured, version-controlled documentation to help developers navigate the evolving AI ecosystem.

The repository also aggregates resources for model evaluation and benchmarking. These include tools for comparing multiple conversational agents side by side and methods for optimizing large language models. Its emphasis on low-resource training and efficient inference helps users find strategies for running large models on constrained hardware.

The collection is maintained through community contributions and peer review, and it links directly to the external projects it lists. This makes discovery easier across fragmented technical domains for anyone building multi-model conversational interfaces or automated text-processing workflows.
browser-use/browser-use
78,576
Browser-use is a framework for building autonomous agents that follow natural-language instructions to navigate web interfaces, interact with them, and extract data. It acts as an orchestration layer between large language models and browser automation protocols, so it can run complex multi-step workflows without depending on brittle, hand-written selectors. It also works as a headless browser controller, with a programmatic interface for managing browser instances and executing fine-grained interactions.

The framework translates high-level intent into concrete browser actions. To make pages usable by a model, it serializes complex page structures into simplified text. Stateful session persistence lets agents stay authenticated across long-running tasks. Remote browser orchestration scales automation in the cloud, with support for stealth configurations and proxy management.

The platform also provides tooling for structured data extraction and workflow integration. It supports a variety of model configurations and lets developers define custom tools that extend the agent's interaction logic. The documentation includes quickstart guides for command-line use and examples of integrating browser automation into larger systems.
ai-agents, ai-tools, browser-automation
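Serializing a page for a model typically means discarding layout markup and keeping a numbered list of the elements an agent can act on. The sketch below shows that idea with Python's standard `html.parser`. The output format (`[index] <tag> label`) and the chosen set of interactive tags are assumptions for illustration. This is not browser-use's actual serializer.

```python
from html.parser import HTMLParser

# Tags treated as actionable; an assumption for this sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements and their visible text, ignoring layout tags."""

    def __init__(self):
        super().__init__()
        self.elements = []  # each entry: [tag, attributes, text]
        self._open = None   # index of the element currently receiving text

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append([tag, dict(attrs), ""])
            # <input> is a void element, so it never contains text.
            self._open = None if tag == "input" else len(self.elements) - 1

    def handle_endtag(self, tag):
        if self._open is not None and self.elements[self._open][0] == tag:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self.elements[self._open][2] += data.strip()


def serialize_page(html: str) -> str:
    """Render the page as numbered lines an LLM can refer to by index."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    collector.close()
    lines = []
    for index, (tag, attrs, text) in enumerate(collector.elements):
        label = text or attrs.get("placeholder") or attrs.get("aria-label") or attrs.get("name") or ""
        lines.append(f"[{index}] <{tag}> {label}".rstrip())
    return "\n".join(lines)


page = '<div class="nav"><a href="/login">Log in</a><input name="q" placeholder="Search"><button>Go</button></div>'
print(serialize_page(page))
# [0] <a> Log in
# [1] <input> Search
# [2] <button> Go
```

The agent can then answer with an action such as "click element 2", which the controller maps back to the real DOM node.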
hoppscotch/hoppscotch
77,888
Hoppscotch is an open-source API development ecosystem for building, testing, and debugging REST, GraphQL, and real-time APIs. It runs in the browser, as a desktop application, and from the command line, so developers can manage the whole API lifecycle from one environment.

Its interface is command-driven, with a global spotlight palette and keyboard shortcuts that speed up complex workflows. Requests can be manipulated and validated with JavaScript scripts and test assertions that execute in a sandboxed runtime. AI-assisted tools generate request payloads, test scripts, and documentation. The platform can also import existing API definitions and collections from other formats.

Beyond testing, it offers collaborative workspaces where teams organize, share, and synchronize API collections and environment variables. It supports many authorization methods, proxy interception of network requests, and enterprise features such as SCIM user provisioning and activity auditing. Self-hosted deployment uses containers, giving consistent behavior across development and production environments.
api, api-client, api-rest
netdata/netdata
77,812
Netdata is a distributed observability platform for real-time infrastructure monitoring and performance tracking. Its agent collects system, container, and application metrics at per-second resolution. It offers both local visualization and centralized aggregation across complex, multi-cloud environments.

The platform pushes intelligence to the edge. Machine learning models trained locally on each node detect anomalies automatically, without manual configuration or an external query engine. Data persistence is local-first, and only metadata is synchronized securely to a cloud-connected management plane, so granular metric data stays on the host. Parent-child node relationships let deployments scale horizontally with unified monitoring and alerting across distributed infrastructure.

Beyond collection and analysis, Netdata supports automated troubleshooting through natural-language queries and metric correlation. Its modular data collection engine uses thread-per-core execution for low latency and runs isolated external processes to support heterogeneous applications. It also provides automatic service discovery, multiple deployment options, and built-in diagnostic utilities that keep large clusters visible and connected. It can be installed through package managers, automated scripts, source builds, or container orchestration.
ai, alerting, cncf
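Netdata's own per-node models are not reproduced here. As a stand-in, this sketch flags per-second samples that deviate from a rolling window by more than a z-score threshold. This is a deliberately simpler technique than the models Netdata trains. The window size, threshold, and sample data are arbitrary assumptions.

```python
from statistics import mean, pstdev


def rolling_zscore_anomalies(samples, window=10, threshold=3.0):
    """Return indices whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: any change at all is treated as anomalous.
            if samples[i] != mu:
                flagged.append(i)
        elif abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Per-second CPU readings with one spike at index 11.
cpu = [10, 11, 10, 11, 10, 11, 10, 11, 10, 11, 10, 50, 11]
print(rolling_zscore_anomalies(cpu))  # [11]
```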
tensorflow/models
77,684
This repository is a centralized collection of state-of-the-art deep learning architectures and reference implementations for research and application development. It covers computer vision and natural language processing. It offers pre-built models and training pipelines for tasks ranging from image classification and object detection to sequence modeling.

A flexible training harness manages the full training lifecycle, from data ingestion through backpropagation. Training scales across distributed hardware using collective communication primitives, and configuration-driven experiments keep hyperparameters separate from source code. Models are composed from hierarchical classes, and checkpoint-based state persistence lets interrupted runs resume. Together, these keep research workflows modular, reproducible, and fault-tolerant.

The implementations demonstrate standard patterns for building and deploying neural networks, including graph-based execution optimized for hardware acceleration. The repository serves as a reference for deep learning best practices, with documented examples for vision, language, and training-loop management.
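Checkpoint-based persistence lets an interrupted training run resume from the last saved state rather than starting over. The sketch below shows the pattern with a JSON file and a dummy update rule. The file format, the `save_every` interval, and the helper names are hypothetical. TensorFlow itself provides dedicated checkpointing utilities, such as `tf.train.Checkpoint`, rather than code like this.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, step: int, params: dict) -> None:
    """Write training state to a temp file, then atomically rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "params": params}, f)
    os.replace(tmp_path, path)  # a crash mid-write never leaves a truncated checkpoint


def load_checkpoint(path: str) -> tuple[int, dict]:
    """Resume from the last checkpoint, or start fresh if none exists."""
    if not os.path.exists(path):
        return 0, {"w": 0.0}
    with open(path) as f:
        state = json.load(f)
    return state["step"], state["params"]


def train(path: str, total_steps: int, save_every: int = 5) -> tuple[int, dict]:
    step, params = load_checkpoint(path)
    while step < total_steps:
        params["w"] += 1.0  # stand-in for one optimizer update
        step += 1
        if step % save_every == 0:
            save_checkpoint(path, step, params)
    return step, params


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "ckpt.json")
    print(train(ckpt, 7))   # (7, {'w': 7.0}); last checkpoint was written at step 5
    print(train(ckpt, 12))  # resumes from step 5 and finishes at (12, {'w': 12.0})
```

Work done after the last checkpoint (steps 6 and 7 above) is lost and recomputed. The save interval trades I/O cost against how much work a crash can discard.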
nomic-ai/gpt4all
77,146
GPT4All is a cross-platform runtime for running large language models locally on consumer hardware. Its optimized C++ inference backend enables private, offline AI interactions with no internet connection or cloud service required. The project covers the whole local model lifecycle, including discovering, downloading, and configuring model weights.

Its integrated retrieval-augmented generation engine indexes local documents as vector embeddings. In chat sessions, the model can then draw on private files, notes, and spreadsheets to give grounded, relevant answers. A local HTTP server exposes an OpenAI-compatible API, so developers can plug these self-hosted models into existing applications and workflows.

The project also includes a graphical desktop application for end users and a Python SDK for programmatic access. Both support configuring model parameters, monitoring performance, and managing local embedding pipelines for custom semantic search. The software ships as a single application package, with documentation covering installation and local setup.
ai-chat, llm-inference
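At its core, retrieval over local documents compares a query embedding with stored document embeddings and passes the closest matches to the model as context. Below is a minimal sketch using cosine similarity over hand-written toy vectors. The vectors, file names, and `retrieve` helper are hypothetical. They do not show how GPT4All stores or scores embeddings internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k document names whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda name: cosine_similarity(query, index[name]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings"; a real system would produce these with an embedding model.
index = {
    "meeting-notes.txt": [0.9, 0.1, 0.0],
    "budget.xlsx": [0.0, 1.0, 0.2],
    "project-plan.md": [0.7, 0.3, 0.1],
}
print(retrieve([1.0, 0.0, 0.0], index))  # ['meeting-notes.txt', 'project-plan.md']
```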
elastic/elasticsearch
76,163
Elasticsearch is a distributed search engine and document store for high-performance indexing and retrieval of large volumes of unstructured data. It also serves as a centralized analytics platform. Its schema-flexible architecture organizes data into searchable indices, while a distributed consensus mechanism maintains global cluster state.

The platform combines full-text, vector, and hybrid search with machine learning features, supporting complex statistical aggregations, geospatial analysis, and automated anomaly detection. Its storage architecture supports tiered data lifecycles that place data on hot, warm, and cold nodes to balance query performance against long-term retention.

Beyond search and storage, it underpins observability use cases such as centralized log analysis, application performance monitoring, and infrastructure health diagnostics. It also supports security operations, including threat detection and endpoint protection. Cluster management, data ingestion, and query execution are all exposed through REST APIs, and extensive documentation covers the APIs for search, indexing, security, and cluster administration.
elasticsearch, java, search-engine
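Tiered lifecycles move indices to cheaper hardware as they age. The rule can be sketched as a simple age-to-tier mapping. The thresholds and the `tier_for` helper are illustrative assumptions. In practice, Elasticsearch applies such rules through index lifecycle management (ILM) policies configured via its REST API, not through application code.

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds; in Elasticsearch these rules live in an
# index lifecycle management (ILM) policy, not in application code.
TIER_RULES = [
    (timedelta(days=7), "hot"),
    (timedelta(days=30), "warm"),
    (timedelta(days=365), "cold"),
]


def tier_for(created: datetime, now: datetime) -> str:
    """Pick the storage tier for an index based on its age."""
    age = now - created
    for max_age, tier in TIER_RULES:
        if age < max_age:
            return tier
    return "delete"  # past the retention window


now = datetime(2024, 6, 1)
for days in (1, 10, 100, 400):
    print(days, tier_for(now - timedelta(days=days), now))
# 1 hot
# 10 warm
# 100 cold
# 400 delete
```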
d2l-ai/d2l-zh
75,708
This project is an open-source, interactive textbook (the Chinese edition of Dive into Deep Learning) that teaches deep learning through a code-first curriculum. Its structured learning path covers foundational mathematics, modern neural network architectures, and practical optimization techniques, so practitioners learn complex concepts through hands-on experimentation.

Technical explanations are interleaved with executable Jupyter notebooks, so readers can modify code and hyperparameters and see the effect immediately. The curriculum spans computer vision and natural language processing, and the materials can be run locally or in cloud environments.

Topics include end-to-end model training pipelines, advanced sequence modeling, and computational performance optimization. The book also covers core primitives such as automatic differentiation, layer construction, and parameter management, so readers gain both theoretical understanding and implementation skill. It is published as a live, interactive textbook, with guides for environment setup and cloud resource management.
book, chinese, computer-vision
zed-industries/zed
75,634
Zed is an AI-native, high-performance code editor built for responsiveness and keyboard-centric workflows. It integrates autonomous agents and predictive models directly into the editor to automate complex engineering tasks such as refactoring and code generation.

Rendering is GPU-accelerated, and an asynchronous, multi-threaded architecture keeps interaction low-latency even in large projects. Real-time multi-user collaboration is built in and uses conflict-free replicated data types (CRDTs) to keep shared editing sessions in sync. Inline assistance and agentic workflows can be powered either by locally executed models, for privacy, or by external AI services.

As a standards-compliant Language Server Protocol client, Zed provides language-aware diagnostics, completions, and structural navigation. Its modular design lets developers install language extensions, define keybindings, and use command-driven navigation to tailor the editor to their needs.
gpui, rust-lang, text-editor
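CRDTs let replicas apply edits independently and still converge once they exchange state. Collaborative text editing generally needs a sequence CRDT, which is more involved. This sketch instead shows the core convergence property with a simple last-writer-wins map, where ties are broken by replica ID. `LWWMap` and its methods are hypothetical and are not Zed's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """Last-writer-wins map: each key keeps the write with the highest (clock, replica) pair."""
    replica: str
    clock: int = 0
    entries: dict = field(default_factory=dict)  # key -> (clock, replica, value)

    def set(self, key, value):
        self.clock += 1
        self.entries[key] = (self.clock, self.replica, value)

    def merge(self, other: "LWWMap") -> None:
        """Merging is commutative, associative and idempotent, so replicas converge."""
        for key, entry in other.entries.items():
            current = self.entries.get(key)
            if current is None or entry[:2] > current[:2]:
                self.entries[key] = entry
        # Advance past every clock seen so later local writes win.
        self.clock = max(self.clock, other.clock)

    def get(self, key):
        return self.entries[key][2]


alice, bob = LWWMap("alice"), LWWMap("bob")
alice.set("title", "draft")
bob.set("title", "final")  # concurrent write with the same clock value
alice.merge(bob)
bob.merge(alice)
print(alice.get("title"), bob.get("title"))  # final final  ("bob" > "alice" breaks the tie)
```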
mlabonne/llm-course
75,340
This project is an educational curriculum and engineering handbook covering the lifecycle of large language models. It is a structured knowledge base for machine learning practitioners. It spans the mathematical and architectural foundations of transformer models as well as the practical side of supervised instruction fine-tuning and preference-based alignment.

The repository goes deep on advanced composition and optimization techniques. It explains weight-space model merging and mixture-of-experts strategies, along with low-precision quantization and inference optimization for managing hardware requirements. It also covers building autonomous agents that orchestrate tool use and retrieval-augmented generation pipelines that ground model outputs in external data.

The content runs from foundational deep learning concepts and neural network design to deploying, evaluating, and securing models in production. It includes a curated collection of technical articles, blog posts, and interactive notebooks that track current research trends and experimental methods in generative AI.
course, large-language-models, llm
Developer-Y/cs-video-courses
74,064
This project is a community-driven directory of university-level computer science video lectures. It aggregates academic courses into a structured learning path for students and professionals studying at their own pace across a wide range of disciplines.

Contributors expand and update the list through standard GitHub pull-request workflows. Everything lives in a single Markdown file whose internal anchor links form a hierarchical table of contents, making it easy to find a specific subject in the long index.

The collection ranges from foundational topics such as mathematics and data structures to specialized areas such as machine learning, distributed systems, and quantum computing. By curating expert-led lectures, it acts as a central knowledge base for independent learners. GitHub renders the Markdown file as a readable web page.
algorithms, bioinformatics, computational-biology
infiniflow/ragflow
73,425
This project is a retrieval-augmented generation (RAG) platform for building, managing, and deploying knowledge-based AI applications. It provides one environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that carry out multi-step reasoning workflows. By combining document intelligence with advanced retrieval pipelines, it produces grounded, verifiable answers backed by traceable citations.

Its strengths are deep document understanding and knowledge orchestration. It parses complex documents, including tables and images, and uses graph-based indexing to improve reasoning over large collections. Users can configure multiple recall strategies with fused re-ranking to improve retrieval accuracy. The system maintains context through multi-turn dialogue management and flexible tool-use frameworks.

The architecture is a modular set of containerized microservices that works with both local inference engines and external language model APIs. Document ingestion and indexing run as asynchronous tasks, which keeps the system responsive under heavy load. A standardized model abstraction layer eases integration with existing language model ecosystems.

Developers can work with the platform through RESTful endpoints and Python client libraries that cover the full lifecycle of agents, datasets, and knowledge graphs. Deployment is flexible, with configurable environment settings and support for custom container images for local development and portable infrastructure.
agent, agentic, agentic-ai
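Fused re-ranking merges the result lists from several recall strategies, for example keyword search and vector search. One common fusion method is reciprocal rank fusion (RRF), sketched below. RRF is named here as a standard technique, not as RAGFlow's documented algorithm, which may weight scores differently. The constant `k = 60` is the value conventionally used with RRF, and the chunk IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["chunk-1", "chunk-2", "chunk-3"]
vector_hits = ["chunk-3", "chunk-1", "chunk-4"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['chunk-1', 'chunk-3', 'chunk-2', 'chunk-4']
```

Because RRF uses only ranks, it can combine retrievers whose raw scores are on incompatible scales.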
redis/redis
73,096
Redis is an in-memory key-value database that delivers sub-millisecond latency for reads and writes. It serves as a distributed cache, message broker, NoSQL document store, and vector database. An event-driven, single-threaded loop processes requests, and durability comes from an append-only log and asynchronous snapshots.

Beyond strings, Redis natively supports hashes, lists, sets, and sorted sets, as well as hierarchical JSON documents and high-dimensional vector embeddings. It supports advanced operational patterns such as active-active deployment for global distribution, real-time data streaming, and probabilistic data structures for large-scale statistics. A pluggable indexing engine adds semantic similarity search and full-text retrieval.

For managing distributed state, Redis offers master-replica replication, automated cluster management, and security controls such as access control lists and TLS encryption. Developers use language-specific client libraries, some of which offer connection multiplexing and object mapping, or the command-line interface for administration and scripting.

Redis can be installed through standard package managers and run as a self-managed cluster or a managed cloud service. Observability tooling includes performance analysis, slow log monitoring, and bulk data management.
cache, caching, database
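An append-only log provides durability by recording every write command, so the dataset can be rebuilt after a restart by replaying the log. The sketch below shows that idea with a three-command toy store. The whitespace-separated text format and `TinyStore` are hypothetical simplifications. Redis's actual append-only file records commands in its RESP protocol format.

```python
def apply_command(data: dict, command: str) -> None:
    """Execute one write command against an in-memory dict."""
    op, *args = command.split()
    if op == "SET":
        data[args[0]] = args[1]
    elif op == "DEL":
        data.pop(args[0], None)
    elif op == "INCR":
        data[args[0]] = str(int(data.get(args[0], "0")) + 1)
    else:
        raise ValueError(f"unsupported command: {op}")


class TinyStore:
    def __init__(self):
        self.data: dict[str, str] = {}
        self.log: list[str] = []  # stands in for the on-disk append-only file

    def execute(self, command: str) -> None:
        apply_command(self.data, command)  # raises before logging if the command is invalid
        self.log.append(command)


def replay(log: list[str]) -> dict:
    """Rebuild the dataset from scratch, as on restart."""
    data: dict[str, str] = {}
    for command in log:
        apply_command(data, command)
    return data


store = TinyStore()
for cmd in ["SET user alice", "INCR visits", "INCR visits", "DEL user"]:
    store.execute(cmd)
print(replay(store.log))  # {'visits': '2'}
```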
awesomedata/awesome-public-datasets
72,846
This project is a community-maintained, open-access directory of high-quality public datasets. It gives researchers, developers, and data scientists a central place to find reliable data sources across many industries and scientific fields for exploratory analysis, machine learning model training, or data-intensive applications.

The directory is deliberately lightweight. It needs no backend infrastructure and indexes resources in a platform-agnostic way. Datasets are grouped under a hierarchical, topic-based taxonomy that makes it easy to browse domains ranging from climate science and economics to healthcare and computer networks. Community contributions, peer review, and version-controlled updates keep the links accurate and relevant.

Coverage includes specialized datasets for physics, geographic information systems, natural language processing, and time-series analysis. The whole index is maintained as human-readable Markdown, which keeps contributions transparent and the content easy to access.
aaron-swartz, awesome-public-datasets, datasets
