Publications

You can also find my articles on my Google Scholar profile.

Safety is Essential for Responsible Open-Ended Systems

Ivaxi Sheth, Jan Wehner, Sahar Abdelnabi, Ruta Binkyte, Mario Fritz. Forthcoming in SSI-FM ICLR 2025 Workshop, 2025

Open-ended AI is a growing paradigm in which AI continuously explores novel and interesting artifacts. This position paper describes safety challenges specific to open-ended AI and how they can be mitigated.

[Article] [PDF]

Immunization against harmful fine-tuning attacks

Domenic Rosati, Jan Wehner, Kai Williams, Lukasz Bartoszcze, Hassan Sajjad, Frank Rudzicz. In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LLMs can be fine-tuned with harmful data to remove their safeguards. We formalize the problem and set out conditions for a solution.

[Article] [PDF]