30 open-source projects similar to ai45lab/openrt, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best OpenRT alternative.
This project is a technical curriculum and development guide focused on large language model prompt engineering, fine-tuning, and the creation of retrieval augmented generation applications. It serves as a comprehensive resource for developers to master crafting precise instructions and textual patterns to improve the quality and predictability of model outputs. The material covers the end-to-end workflow of adapting open-source models to specific datasets and integrating language models with vector databases to generate responses based on private information. It also provides a systematic ap
ECCV 2024 BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
This repository is no longer maintained and deprecated in favour of the repository at https://github.com/dsbowen/strong_reject. Please refer to that repository for full paper replication including human evaluation details and using the fine-tuned version of the StrongReject evaluator.
Welcome to JailbreakZoo, a dedicated repository focused on the jailbreaking of large models (LMs), encompassing both large language models (LLMs) and vision language models (VLMs). This project aims to explore the vulnerabilities, exploit methods, and defense mechanisms associated with these…
📰 Latest News 📰 - 🗡️ What is HarmBench 🛡️ - 🌐 Overview 🌐 - ☕ Quick Start ☕ - ⚙️ Installation - 🛠️ Running the Evaluation Pipeline - ➕ Using your own models in HarmBench - ➕ Using your own red teaming methods in HarmBench - 🤗 Classifiers - ⚓ Documentation ⚓ - 🌱 HarmBench's Roadmap 🌱 -…
The train, validation, and test splits of the GPT2Shape dataset can be found in the folder gpt2shape.
This repo provides the source code & data of our paper: Evaluating Object Hallucination in Large Vision-Language Models (EMNLP 2023).
This is the official repository of our paper ScanQA: 3D Question Answering for Spatial Scene Understanding (CVPR 2022) by Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, and Motoki Kawanabe. We propose a new 3D spatial understanding task for 3D question answering (3D-QA). In the 3D-QA task, models…
🤗 Dataset | 📖 arXiv | GitHub Atsuyuki Miyai 1   Jingkang Yang 2   Jingyang Zhang 3   Yifei Ming 4   Qing Yu 1,5   Go Irie 6   Sharon Yixuan Li 4   Hai Li 3   Ziwei Liu 2 Kiyoharu Aizawa 1 1 The University of Tokyo  2 S-Lab, Nanyang Technological…
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
✨✨CVPR 2025 Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Authors: Liwei Jiang, Kavel Rao ⭐, Seungju Han ⭐, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri (⭐ co-second authors)
ICLR 2025 ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation
NAACL 2025 Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation. It is designed for comprehensive and accurate evaluation of language model jailbreak attempts. Currently, jailbreak-evaluation supports evaluating a language model jailbreak…
Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
We introduce the task of dense captioning in 3D scans from commodity RGB-D sensors. As input, we assume a point cloud of a 3D scene; the expected output is the bounding boxes along with the descriptions for the underlying objects. To address the 3D object detection and description problems, we…
An easy-to-use Python framework to generate adversarial jailbreak prompts.
SciGraphQA: Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs
This repository contains the code for the paper "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks" by Abhinav Rao, Sachin Vashistha, Atharva Naik, Somak Aditya, and Monojit Choudhury, accepted at LREC-COLING 2024.