Posts by Collection

portfolio

publications

Immunization against harmful fine-tuning attacks

Domenic Rosati, Jan Wehner, Kai Williams, Lukasz Bartoszcze, Hassan Sajjad, Frank Rudzicz. In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LLMs can be fine-tuned on harmful data to remove their safeguards. We formalize this problem and set out conditions for a solution.

[Article] [PDF]

Safety is Essential for Responsible Open-Ended Systems

Ivaxi Sheth, Jan Wehner, Sahar Abdelnabi, Ruta Binkyte, Mario Fritz. Forthcoming in SSI-FM ICLR 2025 Workshop, 2025

Open-ended AI is a growing paradigm in which AI systems continuously explore novel and interesting artifacts. This position paper describes the specific safety challenges of open-ended AI and how they can be mitigated.

[Article] [PDF]

talks

teaching

Teaching experience 1

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

This is a description of a teaching experience. You can use markdown like any other post.