Hi, I am Xinyue (pronounced “Shin-Yueh” 🔊). I am a final-year Ph.D. candidate at CISPA Helmholtz Center for Information Security, advised by Michael Backes and Yang Zhang. I earned my B.S. from the University of Electronic Science and Technology of China (UESTC). Before joining CISPA, I worked for two years as an algorithm engineer at Alibaba.
My research interests lie in Trustworthy Machine Learning, with a particular focus on the Security and Safety of Large Language Models (LLMs). I work towards a future where AI systems are built securely, safely, and responsibly, ensuring they are resistant to misuse and aligned with human values. To achieve this, my recent work focuses on three main directions:
Understanding user-driven misuse in real-world AI systems: I develop frameworks to systematically uncover, characterize, and evaluate how users misuse AI systems, for example through in-the-wild jailbreak attacks and LLM agent misuse.
Proactively detecting and mitigating AI system misuse: I design and assess detection mechanisms to identify and defend against harmful outputs from AI systems, such as hate speech, hateful memes, unsafe images, stereotypes, and AI-generated content (AIGC), across diverse AI systems (e.g., LLMs, VLMs, and T2I models).
Identifying emerging security risks in the broader AI ecosystem: As AI systems see broader adoption, a more extensive AI ecosystem is emerging. I examine new security risks within this expanding ecosystem, such as prompt stealing attacks and knowledge file leakage.
Recognitions and Awards: My research has been acknowledged by Google, Microsoft, and OpenAI, and featured in major media outlets such as New Scientist and Deutschlandfunk Nova. My work has been integrated into major AI systems, including Nvidia’s Garak and OpenAI’s GPT-4.5, o3-mini, and o1, and has earned 3K+ GitHub stars and 45K+ downloads on Hugging Face. I have been honored with several awards, including the Best Machine Learning and Security Paper in Cybersecurity Award (2025), Machine Learning and Systems Rising Star (2025), KAUST Rising Star in AI (2025), and Heidelberg Laureate Forum Young Researcher (2024).
Teaching, Mentoring, and Outreach: I am passionate about teaching, mentoring, and helping students learn how to do research, especially students from underrepresented groups. I have been a guest lecturer for three courses at CISPA & Saarland University, teaching both undergraduates and graduates. To help lower the barriers to starting research or pursuing a Ph.D. in this area, I host weekly office hours open to everyone (please sign up via Calendly!). I also write sci-fi novels and popular-science articles to make AI and Cybersecurity more accessible to the general public, especially the next generation.
“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang;
Prompt Stealing Attacks Against Text-to-Image Generation Models
Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang;
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Xinyue Shen, Yixin Wu, Yiting Qu, Michael Backes, Savvas Zannettou, Yang Zhang;
📦 Artifact Badges: Available, Functional, Results Reproduced
GPTracker: A Large-Scale Measurement of Misused GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang;
✨ Our findings helped the platform owner take down thousands of misused GPTs
When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang;