Selected Publications

  • “Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
    Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang; arXiv
    pdf | arXiv | Website | Dataset (Hugging Face) | Code (GitHub)
    🏆 Best Machine Learning and Security Paper in Cybersecurity Award 2025
    🎙️ Coverage: New Scientist, German Federal Office for Information Security, NIST, Deutschlandfunk Nova, Spektrum.de

  • Prompt Stealing Attacks Against Text-to-Image Generation Models
    Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang; arXiv
    pdf | arXiv | Slides | Video | Dataset (Hugging Face) | Code
    🏆 Recognized in Award
    🎙️ Coverage: German Federal Office for Information Security, NIST, CISPA News

  • HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
    Xinyue Shen, Yixin Wu, Yiting Qu, Michael Backes, Savvas Zannettou, Yang Zhang; arXiv
    pdf | arXiv | Website | Dataset (Hugging Face) | Code | Artifact Appendix
    📦 Artifact Badges: Available, Functional, Results Reproduced
    🎙️ Coverage: CNIL

  • GPTracker: A Large-Scale Measurement of Misused GPTs
    Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang; arXiv
    pdf | Dataset | Code
    ✨ Our findings helped the platform owner take down thousands of misused GPTs.

  • When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
    Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang; conf
    pdf | arXiv | Website

What’s new?

  • (2025.07) Our paper titled “UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images” got accepted by ACM CCS 2025.
  • (2025.07) Our paper “‘Do Anything Now’: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models” won the Best Machine Learning and Security Paper in Cybersecurity Award 2025!
  • (2025.07) I will serve on the Program Committee of SaTML 2026.
  • (2025.06) I will be giving an invited talk at the LLMApp Workshop @FSE 2025. See you in Norway!
  • (2025.06) I will serve on the Program Committee of ICWSM 2026.
  • (2025.05) Three papers got accepted by ACL 2025. See you in Vienna!
  • (2025.05) I will be giving an invited talk at CNIL Privacy Research Day 2025 on LLM-driven threats in the hate speech domain (HateBench). See you in Paris!
  • (2025.05) I will serve on the Program Committee of AISec 2025.
  • (2025.03) Thrilled to be selected as a 2025 ML and Systems Rising Star!
  • (2025.03) Our paper titled “GPTracker: A Large-Scale Measurement of Misused GPTs” got accepted by IEEE S&P 2025. See you in San Francisco!
  • (2025.03) Our paper titled “On the Effectiveness of Prompt Stealing Attacks on In-The-Wild Prompts” got accepted by IEEE S&P 2025. See you in San Francisco!
  • (2025.01) Thrilled to be selected as a 2025 KAUST Rising Star in AI!
  • (2025.01) Our paper titled “HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns” got accepted by USENIX Security 2025. See you in Seattle!
  • (2025.01) Our paper titled “From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models” got accepted by USENIX Security 2025. See you in Seattle!
  • (2024.11) My popular-science novel “When Trojan Virus Meets Military Training” won the “Outstanding Popular Science Work Award” from the China Science Writers Association. May it inspire a love of cybersecurity in young readers ;D
  • (2024.09) I will serve on the Program Committee of ICWSM 2025.
  • (2024.07) I will serve on the Program Committee of USENIX Security 2025.