1 repo
Specialized testing suites for assessing the reasoning, tool usage, and output quality of autonomous AI agents.
Distinguishing note: Unlike general model evaluation, this category focuses on multi-step agentic workflows and tool-use verification.
Category: Artificial Intelligence & ML · Agent Evaluation Tools.
Analyze agent performance by defining test datasets and custom scorers to assess both final outputs and intermediate tool usage.
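To make the idea of scoring both final outputs and intermediate tool usage concrete, here is a minimal sketch in plain Python. It is not the API of the listed repository: every name in it (`EvalCase`, `AgentRun`, `ToolCall`, `output_match`, `tool_sequence_match`, `evaluate`, `stub_agent`) is hypothetical, and the agent is a hard-coded stub standing in for a real one.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    """One intermediate tool invocation recorded during a run (hypothetical shape)."""
    name: str
    args: dict


@dataclass
class AgentRun:
    """What the agent produced: its final answer plus the tools it called, in order."""
    final_output: str
    tool_calls: list[ToolCall] = field(default_factory=list)


@dataclass
class EvalCase:
    """One row of the test dataset."""
    prompt: str
    expected_output: str
    expected_tools: list[str]


# A scorer maps (case, run) to a score between 0.0 and 1.0.
Scorer = Callable[[EvalCase, AgentRun], float]


def output_match(case: EvalCase, run: AgentRun) -> float:
    """Score the final output only: 1.0 on an exact (whitespace-trimmed) match."""
    return 1.0 if run.final_output.strip() == case.expected_output.strip() else 0.0


def tool_sequence_match(case: EvalCase, run: AgentRun) -> float:
    """Score the intermediate steps: 1.0 if the tools were called in the expected order."""
    called = [call.name for call in run.tool_calls]
    return 1.0 if called == case.expected_tools else 0.0


def evaluate(
    agent: Callable[[str], AgentRun],
    dataset: list[EvalCase],
    scorers: dict[str, Scorer],
) -> dict[str, float]:
    """Run the agent on every case and return the mean score per scorer."""
    totals = {name: 0.0 for name in scorers}
    for case in dataset:
        run = agent(case.prompt)
        for name, scorer in scorers.items():
            totals[name] += scorer(case, run)
    return {name: total / len(dataset) for name, total in totals.items()}


def stub_agent(prompt: str) -> AgentRun:
    """Stand-in agent (assumption): replace with a call to a real agent."""
    if "weather" in prompt:
        return AgentRun("Sunny", [ToolCall("get_weather", {"city": "Paris"})])
    # Gives the right answer but skips the calculator tool.
    return AgentRun("4", [])


DATASET = [
    EvalCase("What is the weather in Paris?", "Sunny", ["get_weather"]),
    EvalCase("What is 2 + 2?", "4", ["calculator"]),
]

SCORERS = {"output_match": output_match, "tool_sequence_match": tool_sequence_match}

results = evaluate(stub_agent, DATASET, SCORERS)
print(results)
```

In this sketch the stub agent answers both prompts correctly but skips the expected tool on the second one, so the two scorers disagree. That gap between output quality and tool-use correctness is what an output-only evaluation would miss.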