# karpathy/autoresearch

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/karpathy-autoresearch).**

87,119 stars · 12,617 forks · Python

## Links

- GitHub: https://github.com/karpathy/autoresearch
- awesome-repositories: https://awesome-repositories.com/repository/karpathy-autoresearch.md

## Description

Autoresearch is an autonomous machine learning research agent and architecture search framework. It employs a closed-loop system to programmatically rewrite training and architecture source code to discover optimal language model configurations.
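The closed loop described above can be sketched as a greedy accept-or-discard search. This is a minimal illustration, not the project's actual code: the names `refine`, `toy_propose`, and `toy_evaluate` are hypothetical, and a small config dictionary stands in for the source code that the agent would really rewrite.

```python
import random
from typing import Callable, Dict, Tuple

Config = Dict[str, int]


def refine(
    start: Config,
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    n_trials: int,
) -> Tuple[Config, float]:
    """Greedy loop: keep a candidate only if it lowers the score."""
    best, best_score = start, evaluate(start)
    for _ in range(n_trials):
        candidate = propose(best)      # stand-in for "agent edits the code"
        score = evaluate(candidate)    # stand-in for "train and measure"
        if score < best_score:         # lower is better, as with a loss
            best, best_score = candidate, score
    return best, best_score


def toy_propose(config: Config, rng: random.Random) -> Config:
    """Hypothetical edit: nudge one hyperparameter up or down by one."""
    return {"depth": max(1, config["depth"] + rng.choice([-1, 1]))}


def toy_evaluate(config: Config) -> float:
    """Hypothetical objective with its minimum at depth == 8."""
    return float((config["depth"] - 8) ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    best, score = refine({"depth": 4}, lambda c: toy_propose(c, rng), toy_evaluate, 50)
    print(best, score)
```

Because a candidate is only accepted when it improves the score, the tracked best score never gets worse from one trial to the next.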

The system iteratively modifies code and evaluates performance metrics to improve model quality against a target objective. It tracks validation bits per byte, a metric normalized by the byte length of the text rather than by token count, so architectural changes can be compared fairly regardless of vocabulary size.
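As a rough illustration of why bits per byte does not depend on vocabulary size, the sketch below converts a summed cross-entropy loss (in nats) into bits and divides by the number of UTF-8 bytes in the evaluated text. The function name `bits_per_byte` and the example numbers are assumptions made for illustration, not taken from the repository.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_nll_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    text_bytes = 4000
    # Hypothetical tokenizer A: 1000 tokens at a mean loss of 2.0 nats per token.
    bpb_a = bits_per_byte(1000 * 2.0, text_bytes)
    # Hypothetical tokenizer B: 2000 tokens at a mean loss of 1.0 nats per token.
    bpb_b = bits_per_byte(2000 * 1.0, text_bytes)
    # Per-token losses differ by 2x, yet both give about 0.721 bits per byte.
    print(bpb_a, bpb_b)
```

The per-token loss would rank the two hypothetical models differently, while the byte-normalized metric shows that they compress the same text equally well.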

The framework manages the full training workflow on a single GPU, using Git branches to isolate experimental changes and to record successful improvements. Each run is held to a fixed time budget, and the framework keeps structured logs of performance metrics and memory usage across all trials.
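One way to picture the fixed time budget and the structured log is the sketch below. It is an assumption-laden illustration: `run_with_budget`, `append_result`, and the tab-separated column layout are hypothetical, and the injectable `clock` parameter exists only so the loop can be exercised without waiting on real time.

```python
import csv
import io
import time
from typing import Callable, Dict, TextIO


def run_with_budget(
    step: Callable[[], float],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, float]:
    """Call step() repeatedly until the wall-clock budget is used up."""
    start = clock()
    steps, last_metric = 0, float("nan")
    while clock() - start < budget_s:
        last_metric = step()  # stand-in for one training step
        steps += 1
    return {"steps": steps, "last_metric": last_metric, "elapsed_s": clock() - start}


def append_result(
    out: TextIO, trial: str, val_bpb: float, peak_mem_mb: float, kept: bool
) -> None:
    """Append one tab-separated row (hypothetical column layout) to a log."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(
        [trial, f"{val_bpb:.4f}", f"{peak_mem_mb:.1f}", "keep" if kept else "discard"]
    )


if __name__ == "__main__":
    stats = run_with_budget(lambda: 1.0, budget_s=0.01)
    log = io.StringIO()
    append_result(log, "trial-001", 0.9876, 1024.0, kept=True)
    print(stats["steps"], log.getvalue(), end="")
```

Holding every trial to the same budget means a change is judged by what it achieves in equal time, so a slower but nominally better architecture does not win by default.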

## Tags

### Artificial Intelligence & ML

- [Autonomous Research Agents](https://awesome-repositories.com/f/artificial-intelligence-ml/autonomous-research-agents.md) — Functions as an autonomous agent that iteratively rewrites source code and evaluates metrics to improve model performance.
- [Coding Agents](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/integration-deployment/ai-agent-tooling/coding-agents.md) — Employs AI agents to programmatically read and edit training source code to optimize model configurations.
- [Model Evaluation Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-evaluation-and-validation/model-evaluation-metrics.md) — Uses validation bits per byte as a specialized metric to compare architectural changes independently of vocabulary size.
- [Large Language Model Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization.md) — Uses AI agents to iteratively modify training code and architectures to optimize the performance of language models.
- [Training and Evaluation Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/training-and-evaluation-pipelines.md) — Provides automated workflows for executing model training, iteration, and validation on a single GPU.
- [Automated Code Refinement Loops](https://awesome-repositories.com/f/artificial-intelligence-ml/model-feedback-loops/automated-code-refinement-loops.md) — Iteratively modifies training code and evaluates metrics to automatically improve model performance.
- [Model Performance Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/profiling-and-benchmarking/model-performance-optimization.md) — Adjusts training code and model architecture to minimize validation bits per byte and maximize performance. ([source](https://github.com/karpathy/autoresearch/blob/master/analysis.ipynb))
- [Model Training Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-optimizers.md) — Automatically modifies training code and evaluates results to optimize model accuracy and convergence. ([source](https://github.com/karpathy/autoresearch#readme))
- [Neural Architecture Search](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-architecture-search.md) — Implements an automated process to test and discover optimal model configurations to minimize validation loss.
- [Performance Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/performance-metrics.md) — Calculates and tracks performance indicators like validation bits per byte to measure the impact of architectural changes.
- [Experiment Tracking](https://awesome-repositories.com/f/artificial-intelligence-ml/experiment-tracking.md) — Logs performance metrics and memory usage across all trials to maintain a detailed history of the research process. ([source](https://github.com/karpathy/autoresearch/blob/master/program.md))
- [Model Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization.md) — Improves language model quality by tracking validation bits per byte across different architectural iterations.
- [Training Efficiency](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/training-efficiency.md) — Evaluates training efficiency using vocabulary-size-independent metrics to compare architectural configurations fairly. ([source](https://github.com/karpathy/autoresearch/blob/master/README.md))

### Part of an Awesome List

- [Automated Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/automated-machine-learning.md) — Provides an autonomous framework for architecture search and hyperparameter optimization using AI agents.
- [Architecture Search Frameworks](https://awesome-repositories.com/f/awesome-lists/ai/neural-architecture-search/architecture-search-frameworks.md) — Provides a framework that uses AI agents to iteratively modify training code and architectures to optimize performance.

### Scientific & Mathematical Computing

- [Iterative Feedback Loops](https://awesome-repositories.com/f/scientific-mathematical-computing/research-analysis-workflows/research-and-data-analysis-tools/research-and-analysis-tools/research-automation-tools/iterative-feedback-loops.md) — Implements an automated system that refines model code through iterative cycles of modification and performance evaluation.

### Software Engineering & Architecture

- [Automated Architecture Search](https://awesome-repositories.com/f/software-engineering-architecture/code-modification-systems/automated-architecture-search.md) — Programmatically rewrites training and architecture source code to autonomously discover optimal model configurations.
- [Training Workflow Orchestrators](https://awesome-repositories.com/f/software-engineering-architecture/training-workflow-orchestrators.md) — Manages the execution, versioning, and monitoring of small language model experiments on a single GPU.

### User Interface & Experience

- [Close Buttons](https://awesome-repositories.com/f/user-interface-experience/ui-components/feedback-overlay-components/close-buttons.md) — Iteratively modifies training code and evaluates metrics to automatically improve model performance based on target objectives.

### DevOps & Infrastructure

- [Execution Time Limits](https://awesome-repositories.com/f/devops-infrastructure/execution-rate-limiters/execution-time-limits.md) — Limits each experimental training run to a specific time window to evaluate efficiency across different configurations.
- [Version Control and Management](https://awesome-repositories.com/f/devops-infrastructure/version-control-management.md) — Manages experimental iterations using version control branches to isolate changes and track improvements. ([source](https://github.com/karpathy/autoresearch/blob/master/program.md))
- [Branch-Based Isolation](https://awesome-repositories.com/f/devops-infrastructure/version-control-management/version-control-workflows/branch-based-isolation.md) — Utilizes Git branches to isolate experimental architectural changes and track successful model training iterations.
