| Description | Source | Size |
|---|---|---|
| (Jailbreak) prompts created by real-world users | Reddit, Discord, websites, open datasets | 15,140 prompts, including 1,405 jailbreak prompts |
| Questions that LLMs should not answer, covering 13 forbidden scenarios from the OpenAI usage policy | GPT-4-generated, based on the OpenAI usage policy | 390 questions |
| Safe/unsafe images from the Web, including both real-world and AI-generated examples | LAION-5B (real-world) and Lexica.art (AI-generated), human-labeled across 11 categories | 10,146 images |