1 repo
Automated testing and verification of model outputs using secondary models or benchmarks.
Distinguishing note: Focuses on using AI to evaluate AI, distinct from traditional software unit testing.
Explore 1 GitHub repository matching Testing & Quality Assurance · Model-Based Evaluation.
This project is a comprehensive educational resource and technical guide focused on the development, optimization, and application of large language models. It provides a structured curriculum for mastering prompt engineering, ranging from foundational principles of instruction design to advanced techniques for improving model reasoning, accuracy, and reliability. The guide distinguishes itself by offering deep technical insights into agentic workflows and autonomous system design. It covers the implementation of multi-step reasoning chains, tool integration through function calling, and stat
Uses automated evaluation loops to validate and refine model outputs against predefined criteria.
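As a rough illustration of what such an evaluation loop might look like, here is a minimal sketch. It is not taken from the listed repository. The names `refine_loop`, `evaluate`, `toy_model`, and `CRITERIA` are hypothetical, and the "model" is a stub standing in for a real LLM call. A secondary judge model could be swapped in for the plain predicate checks used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A criterion is a (name, check) pair; check returns True when the output passes.
Criterion = Tuple[str, Callable[[str], bool]]
# Hypothetical generator signature: (prompt, names of failed criteria) -> output.
Generator = Callable[[str, List[str]], str]


@dataclass
class EvalResult:
    output: str
    failures: List[str]
    rounds: int

    @property
    def passed(self) -> bool:
        return not self.failures


def evaluate(output: str, criteria: List[Criterion]) -> List[str]:
    """Return the names of the criteria that the output fails."""
    return [name for name, check in criteria if not check(output)]


def refine_loop(generate: Generator, prompt: str,
                criteria: List[Criterion], max_rounds: int = 3) -> EvalResult:
    """Generate, check against the criteria, and retry with feedback."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[str] = []
    output, failures = "", []
    for round_no in range(1, max_rounds + 1):
        output = generate(prompt, feedback)
        failures = evaluate(output, criteria)
        if not failures:
            return EvalResult(output, failures, round_no)
        # Feed the failed criteria back into the next generation attempt.
        feedback = failures
    return EvalResult(output, failures, max_rounds)


def toy_model(prompt: str, feedback: List[str]) -> str:
    """Stub standing in for a real model call (assumption for illustration)."""
    answer = "paris"
    if "capitalized" in feedback:
        answer = answer.capitalize()
    if "ends_with_period" in feedback:
        answer += "."
    return answer


CRITERIA: List[Criterion] = [
    ("capitalized", lambda s: s[:1].isupper()),
    ("ends_with_period", lambda s: s.endswith(".")),
]

if __name__ == "__main__":
    result = refine_loop(toy_model, "What is the capital of France?", CRITERIA)
    print(result.passed, result.rounds, result.output)
```

The loop stops as soon as every criterion passes or when `max_rounds` is exhausted, so a caller can tell a validated output from a best-effort one by checking `passed`.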