Data

I am committed to open-sourcing data to advance research and foster the collaborative development of safer, more equitable, and unbiased AI systems.

Below are datasets my collaborators and I have built together. Feel free to explore and use them in your work! 😊

🌐 Real-World Data from Online Social Platforms
prompts jailbreak LLMs 10.7K downloads
"Do Anything Now": Characterizing and Evaluating In‑The‑Wild Jailbreak Prompts on Large Language Models (CCS'24)
Description: (Jailbreak) prompts created by real-world users
Source: Reddit, Discord, websites, open datasets
Size: 15,140 prompts, including 1,405 jailbreak prompts
questions LLMs 3.8K downloads
"Do Anything Now": Characterizing and Evaluating In‑The‑Wild Jailbreak Prompts on Large Language Models (CCS'24)
Description: Questions that LLMs should not answer, covering 13 forbidden scenarios from OpenAI usage policy
Source: GPT-4 generated, based on OpenAI usage policy
Size: 390 questions
metadata GPTs AI agents
GPTracker: A Large-Scale Measurement of Misused GPTs (S&P'25)
Description: GPT (user-customized ChatGPT) metadata spanning four categories: basic information, GPT builders, user feedback, and GPT configurations
Source: The official GPT Store, collected bi-weekly
Size: 755,297
prompts images 30.2K downloads
Prompt Stealing Attacks Against Text-to-Image Generation Models (USENIX'24)
Description: User-crafted prompts with generated images
Source: Lexica.art
Size: 61,467
images safety 8.7K downloads
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images (CCS'25)
Description: Safe/unsafe images from the Web, including both real-world and AI-generated examples
Source: LAION-5B (real) and Lexica.art (AI), human-labeled across 11 categories
Size: 10,146 images
text LLMs OSNs
Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media (ACL'25)
Description: Real/AI-generated social media posts
Source: Medium, Quora, Reddit
Size: 845,497 posts
🤖 Synthetic Data for Model Safety
hate speech LLMs
HateBench: Benchmarking Hate Speech Detectors on LLM‑Generated Content and Hate Campaigns (USENIX'25)
Description: Hate speech dataset generated by LLMs, covering 34 identity groups
Source: LLMs (GPT‑3.5, GPT‑4, Vicuna, Baichuan2, Dolly2, OPT), manually annotated
Size: 7,838
hateful memes VLMs
From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models (USENIX'25)
Description: Responses of VLMs to hateful memes, annotated for informativeness and soundness
Source: VLMs (InstructBlip, ShareGPT-4V, LLaVA, CogVLM), manually annotated
Size: 27,373
unsafe images text-to-image models
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models (CCS'23)
Description: Unsafe images generated by text-to-image models
Source: Generated from harmful prompts (4chan, Lexica.art) and safe prompts (COCO-based)
Size: 800 images