Author (npub): npub1rfw075gc6pc693w5v568xw4mnu7umlzpkfxmqye0cgxm7qw8tauqfck3t8

Event ID (hex): fdbe47df7a47e05cec4c6a57ebd64384bcbb96bf721ecee9a12ea026f2475c56

Event ID (nevent): nevent1qqs0m0j8maay0czua3xx54lt6epcf09mj6lhy8kwaxsjagpx7fr4c4sprpmhxue69uhhyetvv9ujuem4d36kwatvw5hx6mm9qgsp5h8l2yvdqudzch2x2drn82ae70wdl3qmyndszvhuyrdlq8r477qre6znq

Kind: 1 (TextNote)

Created: 2026-02-22T22:47:00Z

↳ Reply to: event not found (parent id b70320f82c7a220421fc305349b41d4bb039f73a2b159d3d4065c29bb295c850...)

Wouldn't scale to many users, unfortunately. 2-4 users with some optimization would deliver up to 20 tokens/s for most queries, which isn't good, especially since you can't branch out individual agents and are bound by the hardware constraints. Hardware costs, energy use and maintenance would make this a moneydump, I fear. Adding more nodes didn't significantly bump token rates.

But just being able to self-host such a model unquantized would've been unthinkable just 1-2 years ago, even with various hacks (like offloading to SSDs) on consumer hardware alone.

I hope we'll see small groups of people hosting private LLMs sustainably, though. Trusted circles and their oracles, basically. 🌚
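The scaling complaint above is essentially arithmetic: once decoding is memory-bandwidth bound, a node's aggregate token rate is roughly fixed, and concurrent users divide it. A toy sketch of that intuition, with an assumed aggregate rate of 40 tokens/s (a hypothetical number chosen to line up with the "2-4 users at up to 20 tokens/s" figure, not a measurement from the note):

```python
# Illustrative only: assumes a fixed aggregate decode rate once the
# hardware is saturated, shared evenly by concurrent users.
AGGREGATE_TPS = 40.0  # assumed, not measured


def per_user_tps(users: int, aggregate_tps: float = AGGREGATE_TPS) -> float:
    """Per-user tokens/s when `users` requests share a saturated node."""
    return aggregate_tps / users


for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} users -> {per_user_tps(n):5.1f} tok/s each")
```

Under this simple model, adding nodes helps total capacity but not per-request speed, since each decode loop is still bound by a single node's memory bandwidth, which is consistent with the note's observation that more nodes didn't significantly bump token rates.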

Raw JSON

{
  "kind": 1,
  "id": "fdbe47df7a47e05cec4c6a57ebd64384bcbb96bf721ecee9a12ea026f2475c56",
  "pubkey": "1a5cff5118d071a2c5d46534733abb9f3dcdfc41b24db0132fc20dbf01c75f78",
  "created_at": 1771800420,
  "tags": [
    [
      "alt",
      "A short note: Wouldn't scale to many users, unfortunately. 2-4 u..."
    ],
    [
      "e",
      "b70320f82c7a220421fc305349b41d4bb039f73a2b159d3d4065c29bb295c850",
      "wss://relay.primal.net/v1",
      "root",
      "76e36f2fabfbd8c353ffcda8fe07877a0977ee4aa9d9bcc224f02038d73b3787"
    ],
    [
      "p",
      "76e36f2fabfbd8c353ffcda8fe07877a0977ee4aa9d9bcc224f02038d73b3787",
      "wss://relay.chorus.community/"
    ]
  ],
  "content": "Wouldn't scale to many users, unfortunately. 2-4 users with some optimization would deliver up to 20 tokens/s for most queries, which isn't good, especially since you can't branch out individual agents and are bound by the hardware constraints. Hardware costs, energy use and maintenance would make this a moneydump, I fear. Adding more nodes didn't significantly bump token rates. \n\nBut just being able to self host such a model unquantized would've been unthinkable just 1-2 years ago, even with various hacks (like offloading to SSDs) on consumer hardware alone. \n\nI hope we'll see small groups of people hosting private llms sustainably though. Trusted circles and their oracles, basically. 🌚",
  "sig": "39d39c30ec62627c9f0d9a63dac9ddbc0c51d57d1ada63a723ab58647be36bab2d1bf361c124387dd3c81005e938478712f021ae089bc92ea847f213c05ffc19"
}
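Since the raw JSON above is an ordinary NIP-01 event, the displayed id and timestamp can be checked directly. A minimal Python sketch, with the field values copied from the JSON (the id check assumes the `content` string here is byte-exact with the original):

```python
import hashlib
import json
from datetime import datetime, timezone

pubkey = "1a5cff5118d071a2c5d46534733abb9f3dcdfc41b24db0132fc20dbf01c75f78"
created_at = 1771800420
kind = 1
tags = [
    ["alt",
     "A short note: Wouldn't scale to many users, unfortunately. 2-4 u..."],
    ["e",
     "b70320f82c7a220421fc305349b41d4bb039f73a2b159d3d4065c29bb295c850",
     "wss://relay.primal.net/v1",
     "root",
     "76e36f2fabfbd8c353ffcda8fe07877a0977ee4aa9d9bcc224f02038d73b3787"],
    ["p",
     "76e36f2fabfbd8c353ffcda8fe07877a0977ee4aa9d9bcc224f02038d73b3787",
     "wss://relay.chorus.community/"],
]
content = "Wouldn't scale to many users, unfortunately. 2-4 users with some optimization would deliver up to 20 tokens/s for most queries, which isn't good, especially since you can't branch out individual agents and are bound by the hardware constraints. Hardware costs, energy use and maintenance would make this a moneydump, I fear. Adding more nodes didn't significantly bump token rates. \n\nBut just being able to self host such a model unquantized would've been unthinkable just 1-2 years ago, even with various hacks (like offloading to SSDs) on consumer hardware alone. \n\nI hope we'll see small groups of people hosting private llms sustainably though. Trusted circles and their oracles, basically. 🌚"

# NIP-01: the event id is the sha256 of the UTF-8 JSON serialization of
# [0, pubkey, created_at, kind, tags, content], with no whitespace between
# tokens and non-ASCII characters left unescaped.
serialized = json.dumps([0, pubkey, created_at, kind, tags, content],
                        separators=(",", ":"), ensure_ascii=False)
event_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
print(event_id)  # should reproduce the "id" field if the copy is byte-exact

# created_at is plain Unix time; it decodes to the displayed timestamp.
print(datetime.fromtimestamp(created_at, tz=timezone.utc)
      .strftime("%Y-%m-%dT%H:%M:%SZ"))
```

The signature itself would additionally need a Schnorr verification over the id with the author's pubkey (e.g. via a secp256k1 library), which is outside this sketch.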