Hi, I am Xinyue (pronounced “Shin-Yueh” 🔊). I am a final-year Ph.D. candidate at CISPA Helmholtz Center for Information Security, advised by Michael Backes and Yang Zhang. I earned my B.S. from the University of Electronic Science and Technology of China (UESTC). Before joining CISPA, I worked for two years as an algorithm engineer at Alibaba.
My research interests lie in Trustworthy AI, with a focus on the security, safety, and responsibility of generative AI systems. My recent work focuses on three main directions:
Understanding real-world misuse of AI systems, such as in-the-wild jailbreaks and LLM agent abuse.
Proactively detecting and mitigating harmful outputs from AI systems, such as hate speech, hateful memes, unsafe images, stereotypes, and AI-generated content (AIGC).
Identifying emerging security risks, such as prompt stealing attacks and knowledge file leakage.
Recognitions and Awards: My research has been acknowledged by Google, Microsoft, and OpenAI, and featured in major media outlets such as New Scientist and Deutschlandfunk Nova. My work has been integrated into major AI systems, including Nvidia’s Garak and OpenAI’s GPT-4.5, o3-mini, and o1, earning 3K+ GitHub stars and 53K+ downloads on Hugging Face. I have been honored with several awards, including the KAUST Rising Star in AI (2025), Machine Learning and Systems Rising Star (2025), Heidelberg Laureate Forum Young Researcher (2024), and Best Machine Learning and Security Paper in Cybersecurity Award (2025).
Teaching, Mentoring, and Outreach: I am passionate about teaching and mentoring. To help lower the barriers to starting research or pursuing a Ph.D. in this area, I host weekly office hours open to everyone (please sign up via Calendly!). I also write sci-fi novels and popular-science articles to make AI and cybersecurity more accessible to the general public, especially the next generation.
“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang;
Prompt Stealing Attacks Against Text-to-Image Generation Models
Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang;
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Xinyue Shen, Yixin Wu, Yiting Qu, Michael Backes, Savvas Zannettou, Yang Zhang;
📦 Artifact Badges: Available, Functional, Results Reproduced
GPTracker: A Large-Scale Measurement of Misused GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang;
✨ Our findings helped the platform owner take down thousands of misused GPTs
When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang;