"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak

Publication
Proceedings of the International Conference on Computational Linguistics (COLING 2025), pages 2144–2162