Publications
The complete publication list can be found on Google Scholar.
Recent Preprints
- "Humans welcome to observe": A First Look at the Agent Social Network Moltbook
Yukun Jiang*, Yage Zhang*, Xinyue Shen*, Michael Backes, Yang Zhang
[arXiv: 2602.10127]
[website]
[dataset]
Coverage: [TechXplore]
[AI Era]
- Real Money, Fake Models: Deceptive Model Claims in Shadow APIs
Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang
[arXiv: 2603.01919]
Coverage: [Jiqizhixin]
- Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks
Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang
[arXiv: 2603.04459]
- OrgAgent: Organize Your Multi-Agent System like a Company
Yiru Wang, Xinyue Shen, Yaohui Han, Michael Backes, Pin-Yu Chen, Tsung-Yi Ho
[arXiv: 2604.01020]
- HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
Yukun Jiang, Yage Zhang, Michael Backes, Xinyue Shen, Yang Zhang
[arXiv: 2604.15415]
[code]
[dataset]
2026
- Open Schrödinger's Closed Box: Identifying Retrieval Augmented Generation in API-Accessible Large Language Model Services
Yukun Jiang, Xinyue Shen, Michael Backes, Zheng Li, Yang Zhang
ACL 2026
[pdf]
- The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training
Rui Zhang, Hongwei Li, Yun Shen, Xinyue Shen, Wenbo Jiang, Guowen Xu, Yang Liu, Michael Backes, Yang Zhang
ACL 2026 (Findings)
[pdf]
2025
- GPTracker: A Large-Scale Measurement of Misused GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang
IEEE S&P 2025
[pdf]
[dataset]
[code]
- On the Effectiveness of Prompt Stealing Attacks on In-The-Wild Prompts
Yicong Tan, Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang
IEEE S&P 2025
[pdf]
[code]
- HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Xinyue Shen, Yixin Wu, Yiting Qu, Michael Backes, Savvas Zannettou, Yang Zhang
USENIX Security 2025
[pdf]
[website]
[dataset]
[code]
Coverage: [CNIL, France's Data Protection Authority]
- From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models
Yihan Ma, Xinyue Shen, Yiting Qu, Ning Yu, Michael Backes, Savvas Zannettou, Yang Zhang
USENIX Security 2025
[pdf]
[dataset]
[code]
- When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang
ACL 2025
[pdf]
[arXiv]
[website]
- Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media
Zhen Sun, Zongmin Zhang, Xinyue Shen, Ziyi Zhang, Yule Liu, Michael Backes, Yang Zhang, Xinlei He
ACL 2025
[pdf]
[dataset]
[code]
- JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang
ACL 2025
[pdf]
[website]
[code]
(Oral)
- UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang
ACM CCS 2025
[pdf]
[website]
[dataset]
[code]
2024
- "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang
ACM CCS 2024
[pdf]
[website]
[dataset]
[code]
Best Machine Learning & Security Paper in Cybersecurity Award 2025
Coverage: [New Scientist]
[German Federal Office for Information Security]
[NIST]
[Deutschlandfunk Nova]
[Spektrum.de]
- MGTBench: Benchmarking Machine-Generated Text Detection
Xinlei He, Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang
ACM CCS 2024
[pdf]
[dataset]
[code]
- Prompt Stealing Attacks Against Text-to-Image Generation Models
Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang
USENIX Security 2024
[pdf]
[video]
[dataset]
[code]
Recognized by Microsoft Vulnerability Severity Classification for AI Systems
Coverage: [German Federal Office for Information Security]
[NIST]
[CISPA News]
- The Death and Life of Great Prompts: Analyzing the Evolution of LLM Prompts from the Structural Perspective
Yihan Ma, Xinyue Shen, Yixin Wu, Boyang Zhang, Michael Backes, Yang Zhang
EMNLP 2024
[pdf]
- ModScan: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities
Yukun Jiang, Zheng Li, Xinyue Shen, Yugeng Liu, Michael Backes, Yang Zhang
EMNLP 2024
[pdf]
- Games and Beyond: Analyzing the Bullet Chats of Esports Livestreaming
Yukun Jiang, Xinyue Shen, Rui Wen, Zeyang Sha, Junjie Chu, Yugeng Liu, Michael Backes, Yang Zhang
ICWSM 2024
[pdf]
[poster]
2023
2022
- On Xing Tian and the Perseverance of Anti-China Sentiment Online
Xinyue Shen, Xinlei He, Michael Backes, Jeremy Blackburn, Savvas Zannettou, Yang Zhang
ICWSM 2022
[pdf]
Before PhD
- Evil Under the Sun: Understanding and Discovering Attacks on Ethereum Decentralized Applications
Liya Su*, Xinyue Shen*, Xiangyu Du, Xiaojing Liao, XiaoFeng Wang, Luyi Xing, Baoxu Liu
USENIX Security 2021
[pdf]
* Equal contribution.