DarkPatterns-LLM: A Benchmark for Detecting Manipulative and Harmful Behaviors in LLMs
Status: Under review, 2026
DarkPatterns-LLM introduces a benchmark suite for systematically evaluating manipulative and harmful behaviors in large language models. The benchmark supports rigorous safety assessment, comparative analysis across models, and more transparent evaluation of model behavior in adversarial and high-risk interaction settings.
Recommended citation: Asif, S., Loguan, I., Asif, S., & Khan, H. (2025). "DarkPatterns-LLM: A Benchmark for Detecting Manipulative and Harmful Behaviors in LLMs." Under review.
Download Paper
