LLMs struggle with good software dev, especially on implemen...

semisol

npub12262qa4uhw7u8gdwlgmntqtv7aye8vdcmvszkqwgs0zchel6mz7s6cgrkj

hex

83f70f833e703f5d58e3e6d8cb5d0ef7f5a87229c642b383d7cc4ded94aa4b0e

nevent

nevent1qqsg8ac0svl8q06atr37dkxtt5800adgwg5uvs4ns0tucn0djj4ykrsprpmhxue69uhhyetvv9ujuem4d36kwatvw5hx6mm9qgs99d9qw67th0wr5xh05de4s9k0wjvnkxudkgptq8yg83vtulad30gjpze65

Kind-1 (TextNote)

2026-06-07T00:14:54Z

LLMs struggle with good software dev, especially on implementing novel things, and maintainability. The best way them to view them is as a really expensive machine translation from English to code.

There are probably a few reasons:

  1. ML models have limited generalization capability ML models can only "do" what they have been trained on. What is outside of that can't be reliably represented or processed by the model, and so trying to do anything outside that will lead to weird results.

  2. Biases in training data Models are trained with a lot of data. The pre-training dataset can significantly bias the model (for example preferred frameworks, tools or "suggestions"), and so can the post-training (which is what results in models having a certain "design" style, or the LLMisms)

  3. Reinforcement learning The RL stage of a model optimizes for things like the number of tool calls, and a binary pass metric. The problem is that, just like human-made code, it is easier to hack on a fix than to properly integrate it.

The model is not trained for achieving anything more than satisfying your request with the bare minimum, and so you will accumulate layers of slop and slop.

Raw JSON

{
  "kind": 1,
  "id": "83f70f833e703f5d58e3e6d8cb5d0ef7f5a87229c642b383d7cc4ded94aa4b0e",
  "pubkey": "52b4a076bcbbbdc3a1aefa3735816cf74993b1b8db202b01c883c58be7fad8bd",
  "created_at": 1780791294,
  "tags": [],
  "content": "LLMs struggle with good software dev, especially on implementing novel things, and maintainability.\nThe best way them to view them is as a really expensive machine translation from English to code.\n\nThere are probably a few reasons: \n1. ML models have limited generalization capability\nML models can only \"do\" what they have been trained on. What is outside of that can't be reliably represented or processed by the model, and so trying to do anything outside that will lead to weird results.\n\n2. Biases in training data\nModels are trained with a lot of data. The pre-training dataset can significantly bias the model (for example preferred frameworks, tools or \"suggestions\"), and so can the post-training (which is what results in models having a certain \"design\" style, or the LLMisms)\n\n3. Reinforcement learning\nThe RL stage of a model optimizes for things like the number of tool calls, and a binary pass metric. The problem is that, just like human-made code, it is easier to hack on a fix than to properly integrate it.\n\nThe model is not trained for achieving anything more than satisfying your request with the bare minimum, and so you will accumulate layers of slop and slop.",
  "sig": "02fb00bde2ffc50fcf38ee93b735ed424790cee6e4409bbbb45b38d3078593abe97e500a2a949d6fd34a3f13fe425a78a02d57284c8ba6760bc749974e734ccf"
}