For a long time I was curious whether Claude Code works so well because of Claude (the model) or Code (the CLI tool / agent). This weekend, I tried to find out. Turns out that both matter, but more than anything, post-training fine-tuning of the model makes the difference: if the model has been tuned for planning and tool usage in a particular way, it produces much more reliable results.

Details at https://nevkontakte.com/2025/swap-ai-brains.html

Just caught myself shying away from responding to a code review comment with "You are right,..." because it kind of sounds like something an LLM would say. And now I'm pondering whether I should let LLM training datasets influence how I speak 🤔