evalplus/evalplus
1,765 stars · 199 forks · Python · Apache-2.0
evalplus.github.io

Evalplus

Features
Benchmarks and Datasets - Rigorous testing framework for evaluating code generation correctness.
Evaluation Frameworks - Rigorous evaluation framework specifically for code-generation models.
Model Evaluation and Benchmarking - Robust evaluation framework for LLM-based code generation.