Same Parts, Different Wiring: Mechanistic Interpretability of Moral Fine-Tuning
An exploration of how moral fine-tuning changes LLMs
Machine learning notes, course projects, and research writeups on supervised learning, data analysis, evaluation, and model behavior.
Two things live here, mostly. I took a graduate ML course and wrote a retrospective covering all of it: supervised learning, reinforcement learning, and the theory behind them. The other posts are notes on papers I've read and topics I keep coming back to.
This is the general-purpose end of what I work on. Deep learning, GPUs, and AI safety have their own pages.