About me

I’m a recent Computer Science graduate from the University of Cambridge.

I aim to understand how cognitive abilities form in deep neural networks. Motivated by both curiosity about their internal mechanisms and the challenges of AI safety, I currently explore how mechanistic¹ interpretability can support these goals. I also appreciate the awesome, research-driven engineering² behind highly performant deep learning systems!

Please don’t hesitate to reach out! I’m very happy to hear about your research, tell you a bit about mine, or just chat about anything interesting!

Research updates

  • 2025-09: Activation Transport Operators will be presented as a Spotlight at the Mechanistic Interpretability Workshop at NeurIPS 2025 in San Diego, US! 🎉🇺🇸
  • 2025-08: Our recent work “Activation Transport Operators” on transporting SAE features across transformer layers is available! [arxiv | code] (A minimal sketch of the idea follows this list.)
  • 2025-07: Presented my MPhil dissertation at MobiUK 2025 in Edinburgh, UK! 🇬🇧 [poster]
  • 2024-07: Presented preliminary results for Data-Efficient Task Unlearning at EEML 2024 in Novi Sad, Serbia! 🇷🇸 [poster]
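
To give a flavour of the 2025-08 update above: a transport operator carries SAE feature activations from one transformer layer to a later one. Below is a minimal sketch of that idea, fitting a linear map by ordinary least squares on synthetic data; the shapes, names, and fitting procedure are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

# Toy stand-ins for SAE feature activations collected at a source layer
# and a later target layer, over n_tokens tokens (shapes are illustrative).
rng = np.random.default_rng(0)
n_tokens, n_features = 4096, 512
f_src = rng.standard_normal((n_tokens, n_features))
true_T = rng.standard_normal((n_features, n_features)) / np.sqrt(n_features)
f_tgt = f_src @ true_T + 0.01 * rng.standard_normal((n_tokens, n_features))

# Fit a linear "transport operator" T by least squares:
# minimise ||f_src @ T - f_tgt||^2 over T.
T, *_ = np.linalg.lstsq(f_src, f_tgt, rcond=None)

# How much of the target features does the transported source explain?
resid = f_tgt - f_src @ T
r2 = 1.0 - resid.var() / f_tgt.var()
print(f"variance explained by the linear transport: {r2:.3f}")
```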

More about me

Throughout my year-long MPhil degree, I worked alongside the CaMLSys group, where I focused on Federated Learning. My dissertation explored how several institutions with limited computational resources can collaborate on training a joint foundation language model. During my studies, I also benchmarked the inner workings of torch.compile() and explored KV-caching strategies in LLM inference. On the more theoretical front, I looked into the phenomenon of attention sinks³ in transformers and studied the concept of dynamic tokenisation. What a year it was!
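
For a sense of what a torch.compile() benchmark can look like, here is a minimal sketch that times an eager versus a compiled forward pass of a toy MLP. Only torch.compile itself is the real API; the model, sizes, and timing loop are illustrative, and the actual benchmarking dug much deeper into the compiler’s internals.

```python
import time
import torch

# Toy model and input; sizes are illustrative.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)
x = torch.randn(64, 1024)

compiled = torch.compile(model)  # uses the default "inductor" backend

def bench(fn, iters=50):
    fn(x)  # warm-up; triggers compilation for the compiled variant
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters

print(f"eager:    {bench(model) * 1e3:.2f} ms/iter")
print(f"compiled: {bench(compiled) * 1e3:.2f} ms/iter")
```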

Previously, I worked with researchers from UCL NLP on BritLLM – a joint effort towards producing freely available Large⁴ Language Models for UK languages⁵. In my undergraduate dissertation, I focused on the poor availability of LLMs for low-resource languages and worked on language model adaptation methods for African languages. We also explored Data-Efficient Task Unlearning in LMs, a method for improving the safety of language models by removing their undesired capabilities.
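
To make the unlearning setting concrete, here is a generic sketch of one common baseline: gradient ascent on a “forget” batch balanced against ordinary training on a “retain” batch. This illustrates the problem setup only; it is not the Data-Efficient Task Unlearning method, and the model, data, and trade-off weight are placeholders.

```python
import torch

# Placeholder model and batches; in practice these would be a language
# model and batches of tokenised text.
model = torch.nn.Linear(128, 128)
loss_fn = torch.nn.MSELoss()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

forget_x, forget_y = torch.randn(32, 128), torch.randn(32, 128)
retain_x, retain_y = torch.randn(32, 128), torch.randn(32, 128)

for step in range(100):
    opt.zero_grad()
    # Ascend on the forget batch (negated loss) to remove the capability...
    forget_loss = -loss_fn(model(forget_x), forget_y)
    # ...while descending on the retain batch to preserve general behaviour.
    retain_loss = loss_fn(model(retain_x), retain_y)
    (forget_loss + 1.0 * retain_loss).backward()  # 1.0 = illustrative trade-off
    opt.step()
```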

Notes

  1. mechanistic? 

  2. Awesome Engineering! 

  3. Write-up in progress. Update: It’s arrived! 

  4. As of the time of writing (2024-08-27), a model with 3 billion parameters counts as large. Update: As of 2025-07-28, a 3B model is still pretty big. 

  5. Isn’t this just… English?! Explore our work to see what other languages are spoken in the UK!