Recent readings
ML Architecture and papers
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Really important article, seemingly a breakthrough in peering inside LLMs. A nice adjunct to it is the less dense An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs. My big worry about this is …