Which open-source GitHub repositories match “Model evaluation and LLM observability”?

mlflow/mlflow is the closest match — mlflow/mlflow on GitHub. Other strong matches: microsoft/vscode-copilot-chat, huggingface/open-r1, langchain-ai/deepagents, hiyouga/llamafactory.

Why does microsoft/vscode-copilot-chat match “Model evaluation and LLM observability”?

This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automa…

Why does huggingface/open-r1 match “Model evaluation and LLM observability”?

Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development…

Why does langchain-ai/deepagents match “Model evaluation and LLM observability”?

Deepagents is an LLM agent orchestration platform and stateful application server designed for deploying and managing AI agents built with computational graphs. It provides a containerized runtime environment that handles agent execution, state persistence, and the versioning of AI assistants. The…

Why does hiyouga/llamafactory match “Model evaluation and LLM observability”?

LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a singl…

Model evaluation and LLM observability

Tools and frameworks for monitoring performance, tracking metrics, and evaluating the output quality of large language models.

Find the best repos with AI.We'll search the best matching repositories with AI.

mlflow/mlflow
mlflow/mlflow
26,554View on GitHub
PythonAgent Evaluation ToolsAI GatewaysExperiment Tracking
View on GitHub26,554
microsoft/vscode-copilot-chat
microsoft/vscode-copilot-chat
9,493View on GitHub
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
TypeScriptAI Coding AssistantsAI-Powered Code GenerationAI-Powered Development Environments
View on GitHub9,493
huggingface/open-r1
huggingface/open-r1
26,326View on GitHub
Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning. The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test
PythonCode-Integrated Training FrameworksLarge Scale Training SuitesReasoning Model Training Suites
View on GitHub26,326
langchain-ai/deepagents
langchain-ai/deepagents
25,006View on GitHub
Deepagents is an LLM agent orchestration platform and stateful application server designed for deploying and managing AI agents built with computational graphs. It provides a containerized runtime environment that handles agent execution, state persistence, and the versioning of AI assistants. The platform distinguishes itself through deep integration with the Model Context Protocol, allowing agents to function as servers that expose tools and capabilities to external clients. It features a sophisticated observability suite for capturing execution traces, performing LLM-based evaluations agai
PythonAgent Communication ProtocolsAgent Orchestration PlatformsAI Agent Infrastructure
View on GitHub25,006
hiyouga/llamafactory
hiyouga/LlamaFactory
72,213View on GitHub
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The project distinguishes itself by offering a low-code visual dashboard that enables users to configure experiments and monitor performance metrics in real time without writing extensive custom scripts. It also features a configuration-driven orchestration system that decouples experim
PythonExperiment TrackingLanguage Model Fine-TuningLarge Language Model Fine-Tuning Frameworks
View on GitHub72,213
microsoft/jarvis
microsoft/JARVIS
24,854View on GitHub
JARVIS is a system for large language model task orchestration, deployment management, and automation benchmarking. It utilizes a task orchestrator to decompose complex requests into actionable steps and coordinates various expert models to synthesize final responses. The project includes an AI model deployment manager to handle the local deployment of expert models across different hardware scales. It further provides an AI workflow API consisting of web endpoints used to trigger automated task workflows and retrieve results from model selection stages. The framework incorporates an automat
PythonTask Planning SystemsAutomation Capability BenchmarksAutomation Success Metrics
View on GitHub24,854
openai/clip
openai/CLIP
33,779View on GitHub
CLIP is a neural network architecture designed to map visual and textual data into a shared latent vector space. By utilizing transformer-based feature extraction and multi-modal tokenization, the system aligns images and natural language strings, enabling cross-modal similarity analysis and semantic classification. The project functions as a zero-shot classification engine, identifying image content by calculating the cosine similarity between visual features and arbitrary text labels without requiring task-specific retraining. Beyond inference, it serves as a research toolkit for evaluating
Jupyter NotebookContrastive Learning ModelsZero-Shot Inference EnginesComputer Vision Evaluation Tools
View on GitHub33,779
midudev/jscamp
midudev/jscamp
3,811View on GitHub
jscamp is a full-stack web development and education project focused on mastering JavaScript, TypeScript, and AI integration. It provides a structured curriculum and interactive exercises covering language fundamentals, frontend engineering, and backend API development. The project distinguishes itself through the implementation of autonomous AI agents capable of complex task automation, such as modifying files, managing servers, and executing API calls. It includes advanced AI development tools for conversational querying, real-time code suggestions, and automated repository analysis to gene
JavaScriptAutonomous Agent LoopsFull-Stack CurriculaProgramming Courses
View on GitHub3,811
stanfordnlp/dspy
stanfordnlp/dspy
35,325View on GitHub
DSPy is a declarative programming framework designed for building complex language model applications. It treats model interactions as modular, composable programs, allowing developers to define task logic through typed class schemas rather than relying on manually written prompts. By organizing workflows into hierarchical, reusable Python objects, the framework enables the construction of sophisticated AI systems that manage state and execution flow independently. The framework distinguishes itself through an automated optimization engine that iteratively refines prompt instructions and few-
PythonDeclarative AI FrameworksAgentic Orchestration FrameworksAI Signature Definitions
View on GitHub35,325
googlecloudplatform/generative-ai
GoogleCloudPlatform/generative-ai
12,700View on GitHub
This project is a development platform for managing the lifecycle of generative artificial intelligence models. It provides a unified environment for accessing, fine-tuning, and deploying large language models, serving as an orchestrator that handles the integration of diverse models into custom applications. The platform distinguishes itself by offering a managed infrastructure for hosting and scaling models, which removes the requirement for manual server maintenance or configuration. It includes integrated tools for supervised fine-tuning and vector embedding optimization, allowing for the
Jupyter NotebookGenerative AI DevelopmentGenerative AI ModelsModel Deployment Management
View on GitHub12,700
fighting41love/funnlp
fighting41love/funNLP
81,299View on GitHub
This project is a community-driven knowledge base and curated repository focused on natural language processing and large language model development. It serves as a centralized index for high-quality tools, libraries, and research materials, organizing technical resources into structured, version-controlled documentation to assist developers in navigating the evolving artificial intelligence ecosystem. The repository distinguishes itself by acting as an aggregator for AI model evaluation and benchmarking. It provides access to tools that enable the simultaneous comparison of multiple conversa
PythonAwesome ListLanguage Model IntegrationsVersion-Controlled Knowledge Bases
View on GitHub81,299
evidentlyai/evidently
evidentlyai/evidently
7,137View on GitHub
Evidently is an AI observability platform and evaluation framework designed to quantify the performance of machine learning models and large language models. It functions as a monitoring tool for detecting data drift and quality degradation in tabular datasets, while providing a specialized analyzer for the faithfulness and correctness of retrieval augmented generation systems. The project distinguishes itself through an evaluation framework that utilizes judge models and custom rubrics to score language model outputs. It includes tools for iterative prompt optimization and the generation of
Jupyter NotebookAI Observability TracingAI Evaluation FrameworksData Drift Detectors
View on GitHub7,137
jingyaogong/minimind
jingyaogong/minimind
51,834View on GitHub
This project is a comprehensive framework for the entire lifecycle of transformer-based language models, supporting everything from foundational pretraining to specialized deployment. It provides a modular toolkit for defining neural network architectures, managing data preparation pipelines, and executing training routines across various scales. The framework is designed to handle the full model development process, including supervised fine-tuning, behavioral alignment, and the integration of agentic capabilities. What distinguishes this framework is its focus on efficient training and adva
PythonModel Training ToolkitsAgentic FrameworksAgentic Training Frameworks
View on GitHub51,834
internlm/opencompass
InternLM/opencompass
7,096View on GitHub
OpenCompass is a comprehensive evaluation platform, benchmarking suite, and distributed model evaluator designed to measure the performance and accuracy of large language models. It provides a framework for benchmarking both open-source and API-based models against diverse datasets using standardized metrics and reproducible pipelines. The project features an automated judging framework that uses language models as judges to score and verify the quality of generated text. It includes a performance leaderboard system for comparing the relative capabilities of various models across industry-sta
PythonLLM EvaluationModel Evaluation FrameworksDistributed Task Orchestration
View on GitHub7,096
mlabonne/llm-course
mlabonne/llm-course
80,178View on GitHub
This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as well as the practical implementation of supervised instruction fine-tuning and preference-based model alignment. The repository distinguishes itself by providing a deep dive into advanced model composition and optimization techniques. It details methodologies for weight-space mode
AI Research RepositoriesAwesome ListFine-Tuning Strategies
View on GitHub80,178
lianjiatech/belle
LianjiaTech/BELLE
8,273View on GitHub
BELLE is a specialized implementation of Chinese conversational large language models, encompassing a full instruction tuning framework. It provides a pipeline for training, evaluating, and deploying models optimized for natural language understanding and dialogue tasks in the Chinese language. The project is distinguished by its integrated approach to model refinement, combining the curation of multi-million entry instruction datasets with a distributed training pipeline. This pipeline supports both full fine-tuning and low-rank adaptation to optimize conversational performance. The system
HTMLChinese Conversational LLMsAutomated Output EvaluationDistributed Fine-Tuning
View on GitHub8,273
haotian-liu/llava
haotian-liu/LLaVA
24,465View on GitHub
LLaVA is a multimodal large language model architecture designed to process and interpret both image and text inputs to generate natural language responses. It functions as a research-oriented platform for visual instruction tuning, providing a framework to align language models with human intent through training on diverse datasets of paired images and text queries. The system distinguishes itself through a specialized vision-language training pipeline that connects visual data to language models using projection layers and instruction-based fine-tuning. It supports distributed inference by
PythonMultimodal Large Language ModelsVision-Language PipelinesVisual Instruction Tuning
View on GitHub24,465
lm-sys/fastchat
lm-sys/FastChat
39,472View on GitHub
FastChat is a training and serving platform for large language models that provides an integrated toolkit for fine-tuning, hosting, and benchmarking chatbots. It functions as an inference server capable of hosting multiple models and exposing them via a standardized API for chat applications. The platform distinguishes itself through a distributed model controller that manages worker nodes and routes requests across a hardware-agnostic inference layer supporting various accelerators. It includes a dedicated evaluation framework for assessing model quality using automated judges, multi-turn di
PythonLarge Language Model ServingChatbot Hosting ServicesDistributed Model Orchestration
View on GitHub39,472
unslothai/unsloth
unslothai/unsloth
66,628View on GitHub
Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade hardware. The platform distinguishes itself through hand-optimized kernels and automated computational graph techniques that maximize hardware throughput. It supports advanced training methodologies, including reinforcement learning for reasoning and efficient adapter-based fin
PythonLanguage Model TrainingCustom Kernel AcceleratorsEfficient Training Pipelines
View on GitHub66,628
comet-ml/opik
comet-ml/opik
17,787View on GitHub
Opik is an observability and evaluation platform designed for generative AI applications and agentic workflows. It provides a centralized environment for tracing execution flows, managing prompt templates, and monitoring production performance, allowing teams to gain visibility into complex model interactions and tool usage without requiring manual application code changes. The platform distinguishes itself through its integrated approach to the AI development lifecycle, combining distributed trace instrumentation with automated evaluation frameworks. It supports model-as-a-judge scoring, syn
PythonAI Observability and EvaluationLLM ObservabilityAI Evaluation Frameworks
View on GitHub17,787
tatsu-lab/stanford_alpaca
tatsu-lab/stanford_alpaca
30,266View on GitHub
This project provides an end-to-end framework for adapting large language models to follow user instructions through supervised fine-tuning. It functions as a comprehensive training pipeline that enables the creation of specialized assistant models by minimizing the difference between predicted outputs and target responses within structured instruction datasets. The framework distinguishes itself by integrating synthetic data generation with memory-efficient training techniques. It utilizes powerful language models to iteratively expand small sets of human-written seeds into diverse, high-qua
PythonInstruction Fine-Tuning FrameworksInstruction TuningInstruction Tuning Frameworks
View on GitHub30,266
agenta-ai/agenta
Agenta-AI/agenta
3,860View on GitHub
Agenta is a Prompt Ops lifecycle manager and prompt management platform that decouples prompt engineering from application code. It serves as a centralized system for developing, versioning, and deploying prompt templates and model configurations across different environments. The platform functions as an AI agent orchestrator with a visual interface for building agent workflows and connecting models to external tools. It further acts as an evaluation framework and observability tool, utilizing OpenTelemetry to capture execution traces, monitor latency, and track token costs. The system cove
TypeScriptPrompt Management SystemsPrompt Management SystemsPrompt Registries
View on GitHub3,860
zai-org/chatglm-6b
zai-org/ChatGLM-6B
41,039View on GitHub
ChatGLM-6B is a generative AI inference engine designed for local execution of transformer-based language models. It provides a comprehensive runtime environment that allows users to load and run pre-trained neural network weights directly on their own hardware, ensuring data privacy and independence from external cloud services. The project distinguishes itself through a hardware-agnostic execution backend that supports deployment across diverse environments, including standard processors, Apple Silicon, and multi-GPU configurations. It incorporates advanced optimization techniques such as w
PythonAutoregressive Inference EnginesLocal Inference EnginesModel Runtimes
View on GitHub41,039
arize-ai/phoenix
Arize-ai/phoenix
8,605View on GitHub
Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing telemetry and experiments. The platform differentiates itself through a specialized embedding visualization tool used to detect data drift and optimize vector search. It provides a comprehensive evaluation suite that utilizes judge-based evaluators and ground-truth datasets to score model outputs, and
Jupyter NotebookLLM EvaluationLLM Evaluation FrameworksAI Observability Tracing
View on GitHub8,605
deepseek-ai/deepseek-v3
deepseek-ai/DeepSeek-V3
103,753View on GitHub
DeepSeek-V3 is a large language model that provides comprehensive resources for model utilization, including technical specifications, pre-trained weights, and evaluation benchmarks. The project details the core transformer architecture, including parameter counts and multi-token prediction modules, while supporting native 8-bit floating-point quantization. The repository offers extensive support for local and distributed inference through integration with multiple frameworks and engines. It includes documentation for deploying the model across various hardware configurations, such as GPUs an
PythonModel WeightsInference FrameworksFrontier Models
View on GitHub103,753
confident-ai/deepeval
confident-ai/deepeval
13,733View on GitHub
Deepeval is a framework for testing and evaluating large language model applications. It provides a suite of tools for executing automated regression tests, validating model output quality against defined standards, and tracing the execution of complex agent workflows. By integrating these capabilities into development pipelines, the platform ensures consistent performance and reliability throughout the software lifecycle. The platform distinguishes itself through its focus on programmatic validation and observability. It utilizes secondary language models to score output quality and employs
PythonAI Regression Testing SuitesAutomated Assertion ValidatorsLLM Evaluation
View on GitHub13,733
shap/shap
shap/shap
25,049View on GitHub
SHAP is an explainable AI toolkit that provides a game theoretic framework for interpreting machine learning model predictions. It functions as a feature attribution engine, decomposing model outputs into the sum of individual feature effects to clarify how specific input variables influence a final decision. By assigning importance values to these inputs, the library enables users to understand the logic behind complex predictive models. The project distinguishes itself through its versatility and specialized calculation methods. It operates as a model-agnostic diagnostic library, capable of
Jupyter NotebookExplainable AI ToolkitsFeature Attribution MethodsGame Theoretic Explainability
View on GitHub25,049
datawhalechina/llm-universe
datawhalechina/llm-universe
13,269View on GitHub
llm-universe is a structured learning resource and technical guide focused on the development of large language model applications. It serves as a curriculum for mastering model orchestration, the creation of autonomous conversational agents, and the implementation of retrieval-augmented generation systems. The project provides detailed instructions on connecting model APIs with memory and tools to create execution chains. It specifically covers the construction of retrieval pipelines, including the process of cleaning raw documents, generating embeddings, and integrating vector databases to
Jupyter NotebookLLM Application Development CurriculaLLM TutorialsRetrieval-Augmented Generation
View on GitHub13,269
deepinsight/insightface
deepinsight/insightface
29,002View on GitHub
InsightFace is a comprehensive deep learning framework designed for face recognition, biometric identity verification, and feature extraction. It provides a specialized engine for one-to-one verification and one-to-many identification tasks, utilizing convolutional neural networks to transform raw image pixels into high-dimensional vector embeddings. The project includes a complete toolkit for detecting, aligning, and processing facial data to ensure consistent identity discrimination. Beyond core recognition, the platform distinguishes itself through an extensive model management and optimiz
PythonBiometric AuthenticationBiometric EnginesEmbedding Computation
View on GitHub29,002
comet-ml/comet-llm
comet-ml/comet-llm
19,673View on GitHub
Comet LLM is an observability platform and evaluation framework designed for large language model applications and agentic workflows. It functions as a system for tracing, monitoring, and debugging execution flows while providing tools for prompt optimization and the enforcement of AI safety guardrails. The platform distinguishes itself through a combination of model-based scoring and heuristic metrics to quantify output quality and detect hallucinations. It includes a dedicated prompt and agent optimizer with an interactive playground for refining templates and tool configurations. For retri
PythonLLM ObservabilityAgent Debugging ToolsAgentic Workflow Debuggers
View on GitHub19,673
nomic-ai/gpt4all
nomic-ai/gpt4all
77,375View on GitHub
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a comprehensive ecosystem for managing the entire model lifecycle, including discovery, downloading, and configuration of local weights. What distinguishes the platform is its integrated retrieval-augmented generation engine, which allows users to index local documents into semantic vect
C++C++ Inference BackendsLanguage Model OrchestrationLocal AI Inference
View on GitHub77,375
ibm/mcp-context-forge
IBM/mcp-context-forge
3,310View on GitHub
mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified set of tools, prompts, and resources. The project distinguishes itself through a multi-protocol translation bridge that converts communication between standard I/O, SSE, gRPC, and REST to enable interoperability between disparate tool servers. It includes a comprehensive LLM evaluation framework for
PythonAggregator ServersAI Protocol TranslationAI Tool Federation
View on GitHub3,310
karpathy/nanochat
karpathy/nanochat
55,103View on GitHub
Nanochat is a lightweight execution environment designed for training and running language models on standard consumer hardware. It functions as both a neural network training framework and an inference engine, enabling users to perform backpropagation-based training and model execution directly on general-purpose processors without the need for dedicated graphics hardware. The project distinguishes itself through a suite of optimization tools that prioritize efficiency on local machines. By utilizing memory-mapped weight loading and CPU-optimized vector math, it maximizes throughput for inte
PythonLocal Inference RuntimesTransformer Inference EnginesTraining Frameworks
View on GitHub55,103
datawhalechina/tiny-universe
datawhalechina/tiny-universe
4,505View on GitHub
Tiny Universe is an educational monorepo that delivers multiple independent implementations of core AI subsystems as self-contained Jupyter notebooks. It provides from-scratch constructions of foundational architectures including a complete Transformer model built from the original paper specification, a denoising diffusion probabilistic model for image generation, and a ReAct-style autonomous agent framework that equips an LLM with tools for planning and multi-step task execution. The project distinguishes itself by covering the full lifecycle of modern AI systems through hands-on implementa
Jupyter NotebookJupyter Notebook CollectionsTransformer Architecture ImplementationAgentic Reasoning Frameworks
View on GitHub4,505
huggingface/transformers
huggingface/transformers
161,630View on GitHub
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and
PythonAPI FrameworksByte Pair EncodingsHybrid
View on GitHub161,630
coleam00/archon
coleam00/Archon
13,728View on GitHub
Archon is an artificial intelligence agent automation engine designed to orchestrate complex development workflows. It functions as a platform for chaining multi-step tasks into directed graphs, allowing developers to standardize and execute repeatable coding patterns through declarative configuration files. The system distinguishes itself by maintaining stateful context across long-running sessions and executing operations within isolated, containerized worktrees to prevent file interference. It integrates with external language models and provides a centralized registry for sharing and inst
PythonAgentic Workflow AutomationAI Workflow OrchestratorsAutomated Development Workflows
View on GitHub13,728
ultralytics/ultralytics
ultralytics/ultralytics
58,468View on GitHub
Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification. By utilizing a modular architecture, the platform allows users to swap model components to balance inference speed and accuracy requirements for diverse applications. The framework distinguishes itself through its support for real-time processing and flexible deployment. It in
PythonComputer VisionModel Training and Inference EnginesComputer Vision Training Frameworks
View on GitHub58,468
explodinggradients/ragas
explodinggradients/ragas
14,400View on GitHub
Ragas is an evaluation framework and performance benchmark designed to quantify the quality of retrieval augmented generation pipelines. It functions as an application optimizer to identify bottlenecks in language model workflows using automated metrics and model-based scoring. The framework includes a system for generating synthetic datasets that mimic production scenarios and edge cases to create realistic test cases. It enables reference-free assessment, allowing the evaluation of response quality by analyzing grounding in the provided context without requiring gold-standard labels. The s
PythonRAG Evaluation FrameworksLLM EvaluationLLM Test Pair Generators
View on GitHub14,400
qwenlm/qwen3
QwenLM/Qwen3
27,324View on GitHub
Qwen3 is a transformer-based large language model designed as a generative AI foundation for understanding, reasoning, and generating human language. It functions as a comprehensive ecosystem for model training, fine-tuning, and production-ready inference, providing the underlying architecture and weights necessary to build diverse artificial intelligence applications. The project distinguishes itself through extensive support for model quantization and distributed inference, enabling efficient execution across a wide range of hardware from consumer-grade devices to scalable cloud infrastruct
PythonGenerative AI FoundationsLarge Language ModelsModel Training Frameworks
View on GitHub27,324
aishwaryanr/awesome-generative-ai-guide
aishwaryanr/awesome-generative-ai-guide
24,755View on GitHub
This project is a community-driven knowledge repository and technical learning resource focused on the field of generative artificial intelligence. It serves as a centralized hub for developers and practitioners to access curated research, tutorials, and foundational concepts necessary for building and deploying modern artificial intelligence applications. The platform distinguishes itself through a collaborative, distributed contribution model that aggregates diverse learning materials into a structured, searchable knowledge base. It covers a wide range of specialized topics, including retri
HTMLAwesome ListGenerative AI Skill PathsLarge Language Model Tutorials
View on GitHub24,755
meta-llama/llama
meta-llama/llama
59,464View on GitHub
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on your own hardware. The system distinguishes itself through specialized memory and computation management techniques, including memory-mapped weight loading and quantization-aware inference, which allow for efficient execution on standard consumer hardware. It utilizes a stateles
PythonInference EnginesLarge Language Model RuntimesLocal Inference Engines
View on GitHub59,464
lightning-ai/litgpt
Lightning-AI/litgpt
13,431View on GitHub
LitGPT is a training and deployment framework for large language models, providing a suite of tools for pretraining, finetuning, quantizing, evaluating, and serving models within a production environment. It includes a dedicated training pipeline for adapting pretrained models to specific tasks, a quantization tool for reducing weight precision, and an inference server for hosting models via web interfaces. The framework supports high-performance model development through custom architecture implementation and the use of predefined recipes to standardize pretraining and finetuning. It enables
PythonLarge Language Model Training FrameworksFinetuning WorkflowsInstruction Tuning
View on GitHub13,431
ggml-org/llama.cpp
ggml-org/llama.cpp
116,799View on GitHub
Llama.cpp is an inference engine designed for the local execution of text-based and multimodal language models on consumer hardware. It provides a core environment for running models that process both text and image inputs, utilizing hardware-accelerated backends to optimize performance across diverse CPU and GPU architectures. The project distinguishes itself by offering a lightweight HTTP server that adheres to standard API specifications, enabling chat completion, embeddings, and reranking services. It includes a suite of tools for model quantization and conversion, which reduces memory us
C++Hardware Abstraction LayersText-Only Inference EnginesMultimodal Inference Engines
View on GitHub116,799
alirezadir/machine-learning-interviews
alirezadir/Machine-Learning-Interviews
8,455View on GitHub
This project is a comprehensive machine learning interview guide and technical study resource designed for individuals preparing for machine learning and AI engineering roles. It provides a collection of materials and practice problems covering core algorithms, theoretical fundamentals, and the implementation of neural network architectures. The resource serves as a technical reference for generative AI development, focusing on the design and optimization of large language models and diffusion systems. It includes frameworks for system design, covering the architecture of production machine l
Jupyter NotebookML Interview PreparationSystem Design Interview PreparationAlgorithm Implementations
View on GitHub8,455
open-mmlab/mmdetection
open-mmlab/mmdetection
32,756View on GitHub
This project is a modular research toolkit designed for developing, training, and evaluating deep learning models for object detection, segmentation, and video instance tracking. It provides a flexible training engine that manages complex neural network execution, including distributed training, custom lifecycle hooks, and weight optimization. The framework is built around a hierarchical configuration system that allows users to define architectures, data pipelines, and training hyperparameters through composable, inheritable files. The project distinguishes itself through its highly modular
PythonComputer Vision ToolkitsObject DetectionTraining Pipelines
View on GitHub32,756
openai/evals
openai/evals
18,702View on GitHub
Evals is a framework designed for automating, managing, and executing repeatable benchmarking suites to analyze the quality and performance of language models. It provides a platform for running standardized tests to measure model accuracy and track behavioral changes over time. The system distinguishes itself through a modular architecture that uses a standardized adapter layer to normalize inputs and outputs, allowing different models to be swapped and tested interchangeably. It supports the creation of custom benchmarks using proprietary data, enabling quality assurance on sensitive tasks
PythonLLM EvaluationModel Performance BenchmarkingModel Testing
View on GitHub18,702
ultralytics/yolov5
ultralytics/yolov5
57,528View on GitHub
YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning to high-speed inference and deployment. The framework utilizes a modular neural architecture, allowing users to swap backbone and head components to tailor models for specific visual tasks. What distinguishes this project is its focus on production-ready deployment and model ef
PythonComputer VisionObject DetectionReal-Time
View on GitHub57,528
keras-team/keras
keras-team/keras
64,094View on GitHub
Keras is a high-level deep learning framework designed for constructing and training neural networks through the composition of modular, functional layers. It serves as a comprehensive modeling toolkit that provides standardized procedures for defining, evaluating, and deploying complex architectures. By utilizing a directed acyclic graph approach, the framework allows users to build intricate models with multiple inputs, outputs, and shared layers, ensuring consistent numerical execution through functional state management. The project distinguishes itself as a multi-backend machine learning
PythonFrameworksModel DefinitionArchitectures
View on GitHub64,094

Model evaluation and LLM observability

mlflow/mlflow

microsoft/vscode-copilot-chat

huggingface/open-r1

langchain-ai/deepagents

hiyouga/LlamaFactory

microsoft/JARVIS

openai/CLIP

midudev/jscamp

stanfordnlp/dspy

GoogleCloudPlatform/generative-ai

fighting41love/funNLP

evidentlyai/evidently

jingyaogong/minimind

InternLM/opencompass

mlabonne/llm-course

LianjiaTech/BELLE

haotian-liu/LLaVA

lm-sys/FastChat

unslothai/unsloth

comet-ml/opik

tatsu-lab/stanford_alpaca

Agenta-AI/agenta

zai-org/ChatGLM-6B

Arize-ai/phoenix

deepseek-ai/DeepSeek-V3

confident-ai/deepeval

shap/shap

datawhalechina/llm-universe

deepinsight/insightface

comet-ml/comet-llm

nomic-ai/gpt4all

IBM/mcp-context-forge

karpathy/nanochat

datawhalechina/tiny-universe

huggingface/transformers

coleam00/Archon

ultralytics/ultralytics

explodinggradients/ragas

QwenLM/Qwen3

aishwaryanr/awesome-generative-ai-guide

meta-llama/llama

Lightning-AI/litgpt

ggml-org/llama.cpp

alirezadir/Machine-Learning-Interviews

open-mmlab/mmdetection

openai/evals

ultralytics/yolov5

keras-team/keras