
pubkey: 52b4a076bcbbbdc3...

npub: npub12262qa4uhw7u8gdwlgmntqtv7aye8vdcmvszkqwgs0zchel6mz7s6cgrkj

id (hex): 73654644dbb32a525757d2d20258431f3437af9301e8403fc097138e75e0a9ad

nevent: nevent1qqs8xe2xgndmx2jj2ata95sztpp37dph47fsr6zq8lqfwyuwwhs2ntgprpmhxue69uhhyetvv9ujuem4d36kwatvw5hx6mm9qgs99d9qw67th0wr5xh05de4s9k0wjvnkxudkgptq8yg83vtulad30gzxqcuv

Kind-1 (TextNote)

2026-04-23T20:50:16Z

↳ Replying to: event not found

4a5ed84f10926d05c515d29150b5a0166abd2b0990bc4fad4cfc83aba1ed92fd...

they are effectively the same thing, one just has more tunability.

a basic dot-product-based classifier could be implemented as a classification head simply by setting each output unit's weights to the corresponding embedding.
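A minimal sketch of that equivalence (the embedding matrix and dimensions here are illustrative, not from the note): a bias-free linear head whose weight rows are the class embeddings produces exactly the dot-product similarity scores.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 8, 3

# Hypothetical per-class embeddings and an input feature vector.
class_embeds = rng.normal(size=(n_classes, d))
x = rng.normal(size=d)

# Dot-product classifier: score each class by similarity to its embedding.
dot_scores = class_embeds @ x

# Equivalent classification head: a bias-free linear layer whose
# weight matrix rows are set to the class embeddings.
W = class_embeds.copy()
head_scores = W @ x

assert np.allclose(dot_scores, head_scores)
```

The head form is "the same thing with more tunability": once initialized from the embeddings, its weights can be trained further rather than staying fixed.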

I think a gated MLP may work better, though, and would allow a lower LR for the main weights, which could reduce OOD shift. Compared to a direct head, it would add some nonlinearity and a higher-dimensional intermediate representation.
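A sketch of what such a gated head could look like (function name, gate choice, and dimensions are my assumptions, not specified in the note); in training, this head would take a normal learning rate while the base model's weights get a lower one.

```python
import numpy as np

def gated_mlp_head(x, W_gate, W_up, W_out):
    # Sigmoid-gated intermediate: adds nonlinearity and a wider hidden
    # width than a direct linear head would have.
    gate = 1.0 / (1.0 + np.exp(-(x @ W_gate)))
    h = gate * (x @ W_up)
    return h @ W_out  # class logits

rng = np.random.default_rng(0)
d, d_hidden, n_classes = 8, 32, 3  # d_hidden > d: higher-dim intermediate

W_gate = rng.normal(size=(d, d_hidden)) * 0.1
W_up = rng.normal(size=(d, d_hidden)) * 0.1
W_out = rng.normal(size=(d_hidden, n_classes)) * 0.1

logits = gated_mlp_head(rng.normal(size=d), W_gate, W_up, W_out)
```
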

Training could also be intentionally bottlenecked by using LoRA instead of full fine-tuning on the base model (though that's usually not worth it given these are <1B-parameter models).
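For reference, the LoRA bottleneck amounts to freezing the base weight and training only a low-rank update; a toy numpy version (dimensions and rank are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 16, 16, 2  # r << d: the low-rank bottleneck

W0 = rng.normal(size=(d_out, d_in))    # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

x = rng.normal(size=d_in)
# Effective weight is W0 + B @ A; only A and B would receive gradients.
y = W0 @ x + B @ (A @ x)

# With B zero-initialized, the adapted layer starts identical to the base.
assert np.allclose(y, W0 @ x)
```

The trainable parameter count drops from d_out*d_in to r*(d_in + d_out), which is the intended bottleneck; on a <1B-parameter base model the savings rarely justify the constraint.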

Raw JSON

{
  "kind": 1,
  "id": "73654644dbb32a525757d2d20258431f3437af9301e8403fc097138e75e0a9ad",
  "pubkey": "52b4a076bcbbbdc3a1aefa3735816cf74993b1b8db202b01c883c58be7fad8bd",
  "created_at": 1776977416,
  "tags": [
    [
      "e",
      "bb2b59a5ec8683b22037fe5353dc7511e026ae150c669c4fda770da727da2aee",
      "wss://relay.primal.net/",
      "root",
      "5cdbde0a550fc046a38d75e9fc238094f96b0165b8297c4fa69134ae4ec80024"
    ],
    [
      "e",
      "4a5ed84f10926d05c515d29150b5a0166abd2b0990bc4fad4cfc83aba1ed92fd",
      "wss://nos.lol",
      "reply",
      "5cdbde0a550fc046a38d75e9fc238094f96b0165b8297c4fa69134ae4ec80024"
    ],
    [
      "p",
      "5cdbde0a550fc046a38d75e9fc238094f96b0165b8297c4fa69134ae4ec80024"
    ]
  ],
  "content": "they are effectively the same thing, one just has more tunability.\n\na basic dot-product based classifier could be a classification head just by setting each output’s weights to the embed.\n\nI think a gated MLP may work better though, and would allow a lower LR for the main weights that could reduce OOD shift. Compared to a direct one it would allow some nonlinearity and a higher intermediate representation\n\nCould also intentionally bottleneck training by having LoRA instead of full FT on the base model (but usually not worth it when considering these are \u003c1B param)",
  "sig": "acde57f98dd4905ca175dd35191f9fce7db2fe0b4168f711bf704e1fc696ce6e1bd0d088560dea41dc0ff0de16feee9e46f6fb35ea7a1daf072e077852bb55a5"
}