
Laan Tungir

npub1rmz9gu6de0m0u4ysrn39crrud099ahvfgs6pvasl4hpjr5ud7yus54xv06

hex

10aa6b650849240d72955ec45f039451e9c60d634ce5bf8a33df3382df1a1425

nevent

nevent1qqspp2ntv5yyjfqdw224a3zlqw29r6wxp435eedl3gea7vuzmudpgfgprpmhxue69uhhyetvv9ujuem4d36kwatvw5hx6mm9qgspa3z5wdxuhah72jgpecjup37xhjj7mky5gdqkwc06msep6wxlzwg7wpxwu

Kind-1 (TextNote)

2026-06-28T18:15:53Z

↳ Reply to Laan Tungir (npub1rmz9gu6de0m0u4ysrn39crrud099ahvfgs6pvasl4hpjr5ud7yus54xv06)

Hmmmm, interesting. So you would have to share prompts in common as well? Maybe an encryption prompt and a decryption prompt? So if I were to send y...

It's different from what I was thinking. Here is a Chinese LLM generated example for irony.

A Concrete, Step-by-Step Example

To make this tangible, let's use a toy model with a small vocabulary of 11 words so you can see every number. The real scheme works identically, just with vocabularies of ~50,000 tokens and floating-point probabilities.

Setup

The model: A tiny "LLM" with a fixed vocabulary of 11 words:

{apple, date, banana, cherry, pie, juice, tart, sauce, for, with, and}

At each step, the model assigns a probability to all 11 words based on the context. The probabilities change as the context grows, and they always sum to 1.0. (A real LLM does the same thing, just with ~50,000 tokens instead of 11.)

The shared context/prompt: "I like to eat" — both Alice and Bob have this. It is not secret, just shared.

The shared secret key: Used to seed a pseudorandom number generator (PRNG). The key produces the random stream: 0.15, 0.62, 0.40, ... (uniform random numbers between 0 and 1).
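One possible way to turn a shared key into such a stream is sketched below. The original does not say which PRNG is used, so the HMAC-SHA256 counter construction, the `random_stream` name, and the key value are all assumptions for illustration. The numbers it produces will not match the hand-picked 0.15, 0.62, 0.40 used in this walkthrough.

```python
import hashlib
import hmac

def random_stream(key: bytes, count: int) -> list[float]:
    """Derive `count` uniform numbers in [0, 1) from a shared key (illustrative only)."""
    out = []
    for counter in range(count):
        # HMAC over a counter gives a deterministic, key-dependent block of bytes.
        digest = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        # Keep the top 53 bits so the value is exactly representable and below 1.
        out.append((int.from_bytes(digest[:8], "big") >> 11) / 2**53)
    return out

# Alice and Bob hold the same (hypothetical) key, so they derive the same stream.
alice = random_stream(b"shared secret", 3)
bob = random_stream(b"shared secret", 3)
print(alice == bob)
```

Anyone without the key gets a different stream, which is the property the scheme relies on.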

The secret message Alice wants to send: The bits 0 1 1 (3 bits — in reality this would be a longer message, but let's keep it tiny).


Encoding (Alice's Side)

Step 1: Alice runs the model

Alice feeds the shared context "I like to eat" into the model. The model outputs a probability distribution over all 11 words. After "I like to eat", fruits are the most natural continuation, so they get the highest probabilities:

| Word   | Probability | Interval      |
|--------|-------------|---------------|
| apple  | 0.20        | [0.00, 0.20)  |
| date   | 0.15        | [0.20, 0.35)  |
| banana | 0.10        | [0.35, 0.45)  |
| cherry | 0.05        | [0.45, 0.50)  |
| pie    | 0.20        | [0.50, 0.70)  |
| juice  | 0.12        | [0.70, 0.82)  |
| tart   | 0.08        | [0.82, 0.90)  |
| sauce  | 0.05        | [0.90, 0.95)  |
| for    | 0.03        | [0.95, 0.98)  |
| with   | 0.015       | [0.98, 0.995) |
| and    | 0.005       | [0.995, 1.00) |

This is just normal LLM behavior — the model is predicting what word comes next. Notice the probabilities sum to 1.0, and the intervals partition [0, 1).
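The Interval column is just a running sum of the Probability column. A minimal sketch of that step, using exact fractions so the boundary at 0.50 is not blurred by float rounding (the `intervals` helper and `STEP1` name are made up for this example):

```python
from fractions import Fraction
from itertools import accumulate

# Step-1 probabilities copied from the table above, kept as strings so
# Fraction can parse them exactly.
STEP1 = {
    "apple": "0.20", "date": "0.15", "banana": "0.10", "cherry": "0.05",
    "pie": "0.20", "juice": "0.12", "tart": "0.08", "sauce": "0.05",
    "for": "0.03", "with": "0.015", "and": "0.005",
}

def intervals(dist):
    """Map each word to its half-open interval [lo, hi), in table order."""
    probs = [Fraction(p) for p in dist.values()]
    highs = list(accumulate(probs))        # upper bounds: running sum
    lows = [Fraction(0)] + highs[:-1]      # each lower bound is the previous upper bound
    return {word: (lo, hi) for word, lo, hi in zip(dist, lows, highs)}

table = intervals(STEP1)
for word, (lo, hi) in table.items():
    print(f"{word:7s} [{float(lo):.3f}, {float(hi):.3f})")
```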

Step 2: Alice encodes her first secret bit

Alice's first secret bit is 0. She interprets this as: "look in the first half of the range [0, 1), i.e., [0.00, 0.50)."

The words whose intervals fall within [0.00, 0.50) are:

  • apple: [0.00, 0.20)
  • date: [0.20, 0.35)
  • banana: [0.35, 0.45)
  • cherry: [0.45, 0.50)

Alice uses her next random number from the shared key, 0.15, to pick within [0.00, 0.50). She scales it into that range: 0.00 + 0.15 × 0.50 = 0.075. The value 0.075 falls in [0.00, 0.20), which is apple.

Alice outputs: "apple"

Text so far: "I like to eat apple"
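A single encoding step might look like the sketch below. It assumes the random number is scaled into the chosen half in every step (point = bit × 0.5 + r × 0.5), which is how the later steps of this walkthrough do it. The `pick_word` helper is a name invented here.

```python
from fractions import Fraction as F

# Step-1 intervals copied from the table above (exact fractions, no float rounding).
STEP1 = [
    ("apple", F("0.00"), F("0.20")), ("date", F("0.20"), F("0.35")),
    ("banana", F("0.35"), F("0.45")), ("cherry", F("0.45"), F("0.50")),
    ("pie", F("0.50"), F("0.70")), ("juice", F("0.70"), F("0.82")),
    ("tart", F("0.82"), F("0.90")), ("sauce", F("0.90"), F("0.95")),
    ("for", F("0.95"), F("0.98")), ("with", F("0.98"), F("0.995")),
    ("and", F("0.995"), F("1.00")),
]

def pick_word(table, bit, r):
    """Scale r in [0, 1) into the half chosen by `bit`, then find the word covering that point."""
    point = F(bit, 2) + r / 2  # bit 0 -> [0, 0.5), bit 1 -> [0.5, 1)
    for word, lo, hi in table:
        if lo <= point < hi:
            return word
    raise ValueError("intervals do not cover [0, 1)")

print(pick_word(STEP1, 0, F("0.15")))  # apple
```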

Step 3: Alice runs the model again

Now the context is "I like to eat apple". The model outputs a new distribution over all 11 words. After "apple", food preparations like pie and tart become more likely, while the other fruits become less likely:

| Word   | Probability | Interval      |
|--------|-------------|---------------|
| apple  | 0.15        | [0.00, 0.15)  |
| date   | 0.10        | [0.15, 0.25)  |
| banana | 0.08        | [0.25, 0.33)  |
| cherry | 0.07        | [0.33, 0.40)  |
| for    | 0.06        | [0.40, 0.46)  |
| with   | 0.04        | [0.46, 0.50)  |
| pie    | 0.20        | [0.50, 0.70)  |
| tart   | 0.16        | [0.70, 0.86)  |
| juice  | 0.08        | [0.86, 0.94)  |
| sauce  | 0.04        | [0.94, 0.98)  |
| and    | 0.02        | [0.98, 1.00)  |

Alice's next secret bit is 1. This means "look in the second half [0.50, 1.00)."

The words whose intervals fall within [0.50, 1.00) are:

  • pie: [0.50, 0.70)
  • tart: [0.70, 0.86)
  • juice: [0.86, 0.94)
  • sauce: [0.94, 0.98)
  • and: [0.98, 1.00)

Alice uses her next random number, 0.62, to pick within [0.50, 1.00). She scales it into that range: 0.50 + 0.62 × 0.50 = 0.81. The value 0.81 falls in [0.70, 0.86), which is tart.

Alice outputs: "tart"

Text so far: "I like to eat apple tart"

Step 4: Alice runs the model again

Context: "I like to eat apple tart". Model output — after "apple tart", connectors and prepositions like "with" and "and" become the most likely continuations, while the fruits and preparations drop in probability:

| Word   | Probability | Interval      |
|--------|-------------|---------------|
| apple  | 0.10        | [0.00, 0.10)  |
| date   | 0.08        | [0.10, 0.18)  |
| banana | 0.07        | [0.18, 0.25)  |
| cherry | 0.06        | [0.25, 0.31)  |
| pie    | 0.08        | [0.31, 0.39)  |
| juice  | 0.06        | [0.39, 0.45)  |
| for    | 0.05        | [0.45, 0.50)  |
| tart   | 0.06        | [0.50, 0.56)  |
| sauce  | 0.04        | [0.56, 0.60)  |
| with   | 0.18        | [0.60, 0.78)  |
| and    | 0.22        | [0.78, 1.00)  |

Alice's next secret bit is 1. Look in the second half [0.50, 1.00):

The words whose intervals fall within [0.50, 1.00) are:

  • tart: [0.50, 0.56)
  • sauce: [0.56, 0.60)
  • with: [0.60, 0.78)
  • and: [0.78, 1.00)

Random number 0.40 → scaled: 0.50 + 0.40 × 0.50 = 0.70. Falls in [0.60, 0.78), which is with.

Alice outputs: "with"

Text so far: "I like to eat apple tart with"

Alice is done

Alice sends Bob the text: "I like to eat apple tart with"

(Plus whatever padding/continuation she wants to make it look like a complete sentence.)

To any observer, this looks like someone generated a sentence about food. Nothing suspicious.
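Alice's whole procedure can be sketched as a short loop. The "model" here is only a lookup table holding the three distributions from this walkthrough, and `MODEL`, `distribution`, and `encode` are names made up for the sketch. It assumes the random number is scaled into the chosen half at every step.

```python
from fractions import Fraction as F
from itertools import accumulate

# Toy "model": context -> "word prob word prob ..." in table order,
# copied from the three tables in this walkthrough.
MODEL = {
    "I like to eat":
        "apple 0.20 date 0.15 banana 0.10 cherry 0.05 pie 0.20 juice 0.12 "
        "tart 0.08 sauce 0.05 for 0.03 with 0.015 and 0.005",
    "I like to eat apple":
        "apple 0.15 date 0.10 banana 0.08 cherry 0.07 for 0.06 with 0.04 "
        "pie 0.20 tart 0.16 juice 0.08 sauce 0.04 and 0.02",
    "I like to eat apple tart":
        "apple 0.10 date 0.08 banana 0.07 cherry 0.06 pie 0.08 juice 0.06 "
        "for 0.05 tart 0.06 sauce 0.04 with 0.18 and 0.22",
}

def distribution(context):
    """Return [(word, lo, hi), ...] for the given context."""
    tokens = MODEL[context].split()
    words, probs = tokens[0::2], [F(p) for p in tokens[1::2]]
    highs = list(accumulate(probs))
    lows = [F(0)] + highs[:-1]
    return list(zip(words, lows, highs))

def encode(bits, stream, context):
    """Append one word per secret bit to the context and return the text."""
    for bit, r in zip(bits, stream):
        point = F(bit, 2) + r / 2  # pick the half with the bit, then scale r into it
        word = next(w for w, lo, hi in distribution(context) if lo <= point < hi)
        context += " " + word
    return context

text = encode([0, 1, 1], [F("0.15"), F("0.62"), F("0.40")], "I like to eat")
print(text)  # I like to eat apple tart with
```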


Decoding (Bob's Side)

Bob has:

  • The received text: "I like to eat apple tart with"
  • The same model
  • The same shared context: "I like to eat"
  • The same shared key (so the same random stream: 0.15, 0.62, 0.40, ...)

Step D1: Bob runs the model

Bob feeds "I like to eat" into the model. He gets the same distribution Alice got:

| Word   | Probability | Interval      |
|--------|-------------|---------------|
| apple  | 0.20        | [0.00, 0.20)  |
| date   | 0.15        | [0.20, 0.35)  |
| banana | 0.10        | [0.35, 0.45)  |
| cherry | 0.05        | [0.45, 0.50)  |
| pie    | 0.20        | [0.50, 0.70)  |
| juice  | 0.12        | [0.70, 0.82)  |
| tart   | 0.08        | [0.82, 0.90)  |
| sauce  | 0.05        | [0.90, 0.95)  |
| for    | 0.03        | [0.95, 0.98)  |
| with   | 0.015       | [0.98, 0.995) |
| and    | 0.005       | [0.995, 1.00) |

Bob sees that Alice chose apple, which is in the interval [0.00, 0.20).

Now Bob asks: "Which half was apple in?" Apple is in [0.00, 0.20), which is in the first half [0.00, 0.50).

First half → bit 0

Bob recovers bit: 0
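Bob's per-word check can be sketched as follows. The `recover_bit` helper is a name invented for this example. It relies on the toy tables being arranged so that no interval straddles 0.50, and it raises an error if one does.

```python
from fractions import Fraction as F

# Step-1 intervals copied from the table above.
STEP1 = [
    ("apple", F("0.00"), F("0.20")), ("date", F("0.20"), F("0.35")),
    ("banana", F("0.35"), F("0.45")), ("cherry", F("0.45"), F("0.50")),
    ("pie", F("0.50"), F("0.70")), ("juice", F("0.70"), F("0.82")),
    ("tart", F("0.82"), F("0.90")), ("sauce", F("0.90"), F("0.95")),
    ("for", F("0.95"), F("0.98")), ("with", F("0.98"), F("0.995")),
    ("and", F("0.995"), F("1.00")),
]

def recover_bit(table, word):
    """Return 0 if the word's interval lies in [0, 0.5), 1 if it lies in [0.5, 1)."""
    lo, hi = next((lo, hi) for w, lo, hi in table if w == word)
    half = F(1, 2)
    if hi <= half:
        return 0
    if lo >= half:
        return 1
    raise ValueError(f"interval of {word!r} straddles 0.5; the toy split cannot decode it")

print(recover_bit(STEP1, "apple"))  # 0
```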

Step D2: Bob runs the model again

Bob appends "apple" to context: "I like to eat apple". Model output:

| Word   | Probability | Interval      |
|--------|-------------|---------------|
| apple  | 0.15        | [0.00, 0.15)  |
| date   | 0.10        | [0.15, 0.25)  |
| banana | 0.08        | [0.25, 0.33)  |
| cherry | 0.07        | [0.33, 0.40)  |
| for    | 0.06        | [0.40, 0.46)  |
| with   | 0.04        | [0.46, 0.50)  |
| pie    | 0.20        | [0.50, 0.70)  |
| tart   | 0.16        | [0.70, 0.86)  |
| juice  | 0.08        | [0.86, 0.94)  |
| sauce  | 0.04        | [0.94, 0.98)  |
| and    | 0.02        | [0.98, 1.00)  |

Bob sees Alice chose tart, interval [0.70, 0.86).

"Which half was tart in?" [0.70, 0.86) is in the second half [0.50, 1.00).

Second half → bit 1

Bob recovers bit: 1

Step D3: Bob runs the model again

Context: "I like to eat apple tart". Model output:

| Word   | Probability | Interval      |
|--------|-------------|---------------|
| apple  | 0.10        | [0.00, 0.10)  |
| date   | 0.08        | [0.10, 0.18)  |
| banana | 0.07        | [0.18, 0.25)  |
| cherry | 0.06        | [0.25, 0.31)  |
| pie    | 0.08        | [0.31, 0.39)  |
| juice  | 0.06        | [0.39, 0.45)  |
| for    | 0.05        | [0.45, 0.50)  |
| tart   | 0.06        | [0.50, 0.56)  |
| sauce  | 0.04        | [0.56, 0.60)  |
| with   | 0.18        | [0.60, 0.78)  |
| and    | 0.22        | [0.78, 1.00)  |

Bob sees Alice chose with, interval [0.60, 0.78).

"Which half?" [0.60, 0.78) is in the second half [0.50, 1.00).

Second half → bit 1

Bob recovers bit: 1


Result

Bob has recovered: 0 1 1 — exactly the message Alice sent.
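The decoding side can be sketched end to end in the same spirit. The lookup-table `MODEL` and the `decode` function are illustrative names, and the `n_bits` argument is an assumption: the walkthrough does not say how Bob knows where the message ends if Alice adds padding words.

```python
from fractions import Fraction as F
from itertools import accumulate

# Toy "model": context -> "word prob word prob ..." in table order,
# copied from the three tables in this walkthrough.
MODEL = {
    "I like to eat":
        "apple 0.20 date 0.15 banana 0.10 cherry 0.05 pie 0.20 juice 0.12 "
        "tart 0.08 sauce 0.05 for 0.03 with 0.015 and 0.005",
    "I like to eat apple":
        "apple 0.15 date 0.10 banana 0.08 cherry 0.07 for 0.06 with 0.04 "
        "pie 0.20 tart 0.16 juice 0.08 sauce 0.04 and 0.02",
    "I like to eat apple tart":
        "apple 0.10 date 0.08 banana 0.07 cherry 0.06 pie 0.08 juice 0.06 "
        "for 0.05 tart 0.06 sauce 0.04 with 0.18 and 0.22",
}

def distribution(context):
    """Return {word: (lo, hi)} for the given context."""
    tokens = MODEL[context].split()
    words, probs = tokens[0::2], [F(p) for p in tokens[1::2]]
    highs = list(accumulate(probs))
    lows = [F(0)] + highs[:-1]
    return {w: (lo, hi) for w, lo, hi in zip(words, lows, highs)}

def decode(text, context, n_bits):
    """Recover n_bits bits from the words that follow the shared context."""
    bits = []
    for word in text[len(context):].split()[:n_bits]:
        lo, hi = distribution(context)[word]
        bits.append(0 if hi <= F(1, 2) else 1)  # which half holds the word's interval
        context += " " + word
    return bits

print(decode("I like to eat apple tart with", "I like to eat", 3))  # [0, 1, 1]
```

Written this way, `decode` takes no key or random stream as input: in the half-splitting toy scheme the bit is determined by the word and the distribution alone.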


Diagram

 ┌──────────────────────────────────────────────────────────────────┐
 │  ALICE (ENCODER)                                                  │
 │                                                                  │
 │  Secret bits: 0, 1, 1                                            │
 │  Shared context: "I like to eat"                                 │
 │  Shared key → random stream: 0.15, 0.62, 0.40                    │
 │                                                                  │
 │  For each bit:                                                   │
 │    1. Run model with current context → distribution              │
 │    2. Use bit to pick which HALF of [0,1) to look in             │
 │    3. Use random# to pick a word within that half                │
 │    4. Append word to context, repeat                             │
 │                                                                  │
 │  Output: "I like to eat apple tart with"                         │
 └──────────────────────────────────────────────────────────────────┘
                              │
                              │  (sent over channel — looks innocent)
                              ▼
 ┌──────────────────────────────────────────────────────────────────┐
 │  BOB (DECODER)                                                   │
 │                                                                  │
 │  Received text: "I like to eat apple tart with"                  │
 │  Shared context: "I like to eat"   (SAME as Alice)               │
 │  Shared key → random stream: 0.15, 0.62, 0.40  (SAME as Alice)   │
 │                                                                  │
 │  For each word:                                                  │
 │    1. Run model with current context → SAME distribution         │
 │    2. Find which interval the word falls in                      │
 │    3. Determine which HALF that interval is in                   │
 │    4. That half = the recovered bit                              │
 │    5. Append word to context, repeat                             │
 │                                                                  │
 │  Recovered bits: 0, 1, 1  ✓                                      │
 └──────────────────────────────────────────────────────────────────┘

Key Points

  1. The LLM never saw the secret message. The secret bits controlled which word was picked from the distribution. The LLM just produced distributions.

  2. The output looks natural. "I like to eat apple tart with" reads like the start of an ordinary sentence about food. A censor seeing this has no reason to be suspicious.

  3. Both sides run the same model with the same context. This is why they get the same distributions. If Bob had a different model or different context, the distributions would differ and decoding would fail.

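  4. The shared key provides the random stream. Alice uses it to choose a word within the selected half. Note that in this toy version Bob never needs the stream to decode, so a censor with the same model and context could recover the bits too. For the encoding to be secure, the key also has to hide the message itself, for example by encrypting or masking the bits before they are encoded, so that whatever the censor decodes looks random without the key.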

  5. Each token encodes exactly 1 bit in this toy example (because we split into 2 halves). In reality, with arithmetic coding over a 50,000-token vocabulary, you can encode multiple bits per token: roughly 2–4 bits per token for a real LLM, depending on how uncertain the model is at each step.

  6. The "half" splitting is the simplified version. The real scheme (from the Meteor paper, eprint.iacr.org/2021/686) uses full arithmetic coding, which is more efficient — it doesn't waste the probability mass the way a 2-way split does. But the principle is the same: the secret bits steer the selection within the model's probability distribution.
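To make "how uncertain the model is" in point 5 concrete, one can compute the Shannon entropy of a next-word distribution, which gives a rough upper bound on how many bits a single token could carry without distorting the distribution. The sketch below does this for the step-1 table. Treating entropy as the per-token capacity is an idealization, and real schemes will achieve somewhat less.

```python
import math

# Step-1 probabilities from the first table of the walkthrough.
probs = [0.20, 0.15, 0.10, 0.05, 0.20, 0.12, 0.08, 0.05, 0.03, 0.015, 0.005]

# Shannon entropy in bits: H = -sum(p * log2(p)).
entropy = -sum(p * math.log2(p) for p in probs)
print(f"{entropy:.2f} bits of uncertainty in the step-1 distribution")
```

For this table the entropy comes out at about 3 bits, while the half-splitting scheme extracts only 1 bit per word.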

Raw JSON

{
  "kind": 1,
  "id": "10aa6b650849240d72955ec45f039451e9c60d634ce5bf8a33df3382df1a1425",
  "pubkey": "1ec454734dcbf6fe54901ce25c0c7c6bca5edd89443416761fadc321d38df139",
  "created_at": 1782670553,
  "tags": [
    [
      "e",
      "b36d106bf285829e448fb45f165de0cf15b7ca2fe35cbebb96e6675fa6915816",
      "wss://relay.primal.net/",
      "root",
      "675b84fe75e216ab947c7438ee519ca7775376ddf05dadfba6278bd012e1d728"
    ],
    [
      "e",
      "ea811c52ba2ce6e3b9fe234c679c917feae3d4e86fdcd5094903f35bfd77902b",
      "wss://relay.laantungir.net/",
      "reply",
      "1ec454734dcbf6fe54901ce25c0c7c6bca5edd89443416761fadc321d38df139"
    ],
    [
      "p",
      "675b84fe75e216ab947c7438ee519ca7775376ddf05dadfba6278bd012e1d728"
    ]
  ],
  "content": "It's different from what I was thinking. Here is a Chinese LLM generated example for irony.\n\n\n## A Concrete, Step-by-Step Example\n\nTo make this tangible, let's use a **toy model with a small vocabulary of 11 words** so you can see every number. The real scheme works identically, just with vocabularies of ~50,000 tokens and floating-point probabilities.\n\n### Setup\n\n**The model:** A tiny \"LLM\" with a fixed vocabulary of 11 words:\n\n`{apple, date, banana, cherry, pie, juice, tart, sauce, for, with, and}`\n\nAt each step, the model assigns a probability to **all 11 words** based on the context. The probabilities change as the context grows, and they always sum to 1.0. (A real LLM does the same thing, just with ~50,000 tokens instead of 11.)\n\n**The shared context/prompt:** `\"I like to eat\"` — both Alice and Bob have this. It is not secret, just shared.\n\n**The shared secret key:** Used to seed a pseudorandom number generator (PRNG). The key produces the random stream: `0.15, 0.62, 0.40, ...` (uniform random numbers between 0 and 1).\n\n**The secret message Alice wants to send:** The bits `0 1 1` (3 bits — in reality this would be a longer message, but let's keep it tiny).\n\n---\n\n### Encoding (Alice's Side)\n\n#### Step 1: Alice runs the model\n\nAlice feeds the shared context `\"I like to eat\"` into the model. The model outputs a probability distribution over all 11 words. 
After \"I like to eat\", fruits are the most natural continuation, so they get the highest probabilities:\n\n| Word   | Probability | Interval      |\n|--------|-------------|---------------|\n| apple  | 0.20        | [0.00, 0.20)  |\n| date   | 0.15        | [0.20, 0.35)  |\n| banana | 0.10        | [0.35, 0.45)  |\n| cherry | 0.05        | [0.45, 0.50)  |\n| pie    | 0.20        | [0.50, 0.70)  |\n| juice  | 0.12        | [0.70, 0.82)  |\n| tart   | 0.08        | [0.82, 0.90)  |\n| sauce  | 0.05        | [0.90, 0.95)  |\n| for    | 0.03        | [0.95, 0.98)  |\n| with   | 0.015       | [0.98, 0.995) |\n| and    | 0.005       | [0.995, 1.00) |\n\nThis is just normal LLM behavior — the model is predicting what word comes next. Notice the probabilities sum to 1.0, and the intervals partition [0, 1).\n\n#### Step 2: Alice encodes her first secret bit\n\nAlice's first secret bit is `0`. She interprets this as: \"look in the **first half** of the range [0, 1), i.e., [0.00, 0.50).\"\n\nThe words whose intervals fall within [0.00, 0.50) are:\n- apple: [0.00, 0.20)\n- date: [0.20, 0.35)\n- banana: [0.35, 0.45)\n- cherry: [0.45, 0.50)\n\nAlice uses her next random number from the shared key, `0.15`, to pick within [0.00, 0.50). The value `0.15` falls in [0.00, 0.20), which is **apple**.\n\n**Alice outputs: \"apple\"**\n\nText so far: `\"I like to eat apple\"`\n\n#### Step 3: Alice runs the model again\n\nNow the context is `\"I like to eat apple\"`. The model outputs a new distribution over all 11 words. 
After \"apple\", food preparations like pie and tart become more likely, while the other fruits become less likely:\n\n| Word   | Probability | Interval      |\n|--------|-------------|---------------|\n| apple  | 0.15        | [0.00, 0.15)  |\n| date   | 0.10        | [0.15, 0.25)  |\n| banana | 0.08        | [0.25, 0.33)  |\n| cherry | 0.07        | [0.33, 0.40)  |\n| for    | 0.06        | [0.40, 0.46)  |\n| with   | 0.04        | [0.46, 0.50)  |\n| pie    | 0.20        | [0.50, 0.70)  |\n| tart   | 0.16        | [0.70, 0.86)  |\n| juice  | 0.08        | [0.86, 0.94)  |\n| sauce  | 0.04        | [0.94, 0.98)  |\n| and    | 0.02        | [0.98, 1.00)  |\n\nAlice's next secret bit is `1`. This means \"look in the **second half** [0.50, 1.00).\"\n\nThe words whose intervals fall within [0.50, 1.00) are:\n- pie: [0.50, 0.70)\n- tart: [0.70, 0.86)\n- juice: [0.86, 0.94)\n- sauce: [0.94, 0.98)\n- and: [0.98, 1.00)\n\nAlice uses her next random number, `0.62`, to pick within [0.50, 1.00). She scales it into that range: `0.50 + 0.62 × 0.50 = 0.81`. The value `0.81` falls in [0.70, 0.86), which is **tart**.\n\n**Alice outputs: \"tart\"**\n\nText so far: `\"I like to eat apple tart\"`\n\n#### Step 4: Alice runs the model again\n\nContext: `\"I like to eat apple tart\"`. 
Model output — after \"apple tart\", connectors and prepositions like \"with\" and \"and\" become the most likely continuations, while the fruits and preparations drop in probability:\n\n| Word   | Probability | Interval      |\n|--------|-------------|---------------|\n| apple  | 0.10        | [0.00, 0.10)  |\n| date   | 0.08        | [0.10, 0.18)  |\n| banana | 0.07        | [0.18, 0.25)  |\n| cherry | 0.06        | [0.25, 0.31)  |\n| pie    | 0.08        | [0.31, 0.39)  |\n| juice  | 0.06        | [0.39, 0.45)  |\n| for    | 0.05        | [0.45, 0.50)  |\n| tart   | 0.06        | [0.50, 0.56)  |\n| sauce  | 0.04        | [0.56, 0.60)  |\n| with   | 0.18        | [0.60, 0.78)  |\n| and    | 0.22        | [0.78, 1.00)  |\n\nAlice's next secret bit is `1`. Look in the second half [0.50, 1.00):\n\nThe words whose intervals fall within [0.50, 1.00) are:\n- tart: [0.50, 0.56)\n- sauce: [0.56, 0.60)\n- with: [0.60, 0.78)\n- and: [0.78, 1.00)\n\nRandom number `0.40` → scaled: `0.50 + 0.40 × 0.50 = 0.70`. Falls in [0.60, 0.78), which is **with**.\n\n**Alice outputs: \"with\"**\n\nText so far: `\"I like to eat apple tart with\"`\n\n#### Alice is done\n\nAlice sends Bob the text: **\"I like to eat apple tart with\"**\n\n(Plus whatever padding/continuation she wants to make it look like a complete sentence.)\n\nTo any observer, this looks like someone generated a sentence about food. Nothing suspicious.\n\n---\n\n### Decoding (Bob's Side)\n\nBob has:\n- The received text: `\"I like to eat apple tart with\"`\n- The same model\n- The same shared context: `\"I like to eat\"`\n- The same shared key (so the same random stream: `0.15, 0.62, 0.40, ...`)\n\n#### Step D1: Bob runs the model\n\nBob feeds `\"I like to eat\"` into the model. 
He gets the **same** distribution Alice got:\n\n| Word   | Probability | Interval      |\n|--------|-------------|---------------|\n| apple  | 0.20        | [0.00, 0.20)  |\n| date   | 0.15        | [0.20, 0.35)  |\n| banana | 0.10        | [0.35, 0.45)  |\n| cherry | 0.05        | [0.45, 0.50)  |\n| pie    | 0.20        | [0.50, 0.70)  |\n| juice  | 0.12        | [0.70, 0.82)  |\n| tart   | 0.08        | [0.82, 0.90)  |\n| sauce  | 0.05        | [0.90, 0.95)  |\n| for    | 0.03        | [0.95, 0.98)  |\n| with   | 0.015       | [0.98, 0.995) |\n| and    | 0.005       | [0.995, 1.00) |\n\nBob sees that Alice chose **apple**, which is in the interval [0.00, 0.20).\n\nNow Bob asks: \"Which half was apple in?\" Apple is in [0.00, 0.20), which is in the **first half** [0.00, 0.50).\n\n**First half → bit `0`**\n\n**Bob recovers bit: `0`** ✓\n\n#### Step D2: Bob runs the model again\n\nBob appends \"apple\" to context: `\"I like to eat apple\"`. Model output:\n\n| Word   | Probability | Interval      |\n|--------|-------------|---------------|\n| apple  | 0.15        | [0.00, 0.15)  |\n| date   | 0.10        | [0.15, 0.25)  |\n| banana | 0.08        | [0.25, 0.33)  |\n| cherry | 0.07        | [0.33, 0.40)  |\n| for    | 0.06        | [0.40, 0.46)  |\n| with   | 0.04        | [0.46, 0.50)  |\n| pie    | 0.20        | [0.50, 0.70)  |\n| tart   | 0.16        | [0.70, 0.86)  |\n| juice  | 0.08        | [0.86, 0.94)  |\n| sauce  | 0.04        | [0.94, 0.98)  |\n| and    | 0.02        | [0.98, 1.00)  |\n\nBob sees Alice chose **tart**, interval [0.70, 0.86).\n\n\"Which half was tart in?\" [0.70, 0.86) is in the **second half** [0.50, 1.00).\n\n**Second half → bit `1`**\n\n**Bob recovers bit: `1`** ✓\n\n#### Step D3: Bob runs the model again\n\nContext: `\"I like to eat apple tart\"`. 
Model output:\n\n| Word   | Probability | Interval      |\n|--------|-------------|---------------|\n| apple  | 0.10        | [0.00, 0.10)  |\n| date   | 0.08        | [0.10, 0.18)  |\n| banana | 0.07        | [0.18, 0.25)  |\n| cherry | 0.06        | [0.25, 0.31)  |\n| pie    | 0.08        | [0.31, 0.39)  |\n| juice  | 0.06        | [0.39, 0.45)  |\n| for    | 0.05        | [0.45, 0.50)  |\n| tart   | 0.06        | [0.50, 0.56)  |\n| sauce  | 0.04        | [0.56, 0.60)  |\n| with   | 0.18        | [0.60, 0.78)  |\n| and    | 0.22        | [0.78, 1.00)  |\n\nBob sees Alice chose **with**, interval [0.60, 0.78).\n\n\"Which half?\" [0.60, 0.78) is in the **second half** [0.50, 1.00).\n\n**Second half → bit `1`**\n\n**Bob recovers bit: `1`** ✓\n\n---\n\n### Result\n\nBob has recovered: `0 1 1` — exactly the message Alice sent.\n\n---\n\n## Diagram\n\n```\n ┌──────────────────────────────────────────────────────────────────┐\n │  ALICE (ENCODER)                                                  │\n │                                                                  │\n │  Secret bits: 0, 1, 1                                            │\n │  Shared context: \"I like to eat\"                                 │\n │  Shared key → random stream: 0.15, 0.62, 0.40                    │\n │                                                                  │\n │  For each bit:                                                   │\n │    1. Run model with current context → distribution              │\n │    2. Use bit to pick which HALF of [0,1) to look in             │\n │    3. Use random# to pick a word within that half                │\n │    4. 
Append word to context, repeat                             │\n │                                                                  │\n │  Output: \"I like to eat apple tart with\"                         │\n └──────────────────────────────────────────────────────────────────┘\n                              │\n                              │  (sent over channel — looks innocent)\n                              ▼\n ┌──────────────────────────────────────────────────────────────────┐\n │  BOB (DECODER)                                                   │\n │                                                                  │\n │  Received text: \"I like to eat apple tart with\"                  │\n │  Shared context: \"I like to eat\"   (SAME as Alice)               │\n │  Shared key → random stream: 0.15, 0.62, 0.40  (SAME as Alice)   │\n │                                                                  │\n │  For each word:                                                  │\n │    1. Run model with current context → SAME distribution         │\n │    2. Find which interval the word falls in                      │\n │    3. Determine which HALF that interval is in                   │\n │    4. That half = the recovered bit                              │\n │    5. Append word to context, repeat                             │\n │                                                                  │\n │  Recovered bits: 0, 1, 1  ✓                                      │\n └──────────────────────────────────────────────────────────────────┘\n```\n\n---\n\n## Key Points\n\n1. **The LLM never saw the secret message.** The secret bits controlled *which word was picked* from the distribution. The LLM just produced distributions.\n\n2. **The output looks natural.** \"I like to eat apple tart with\" is a perfectly normal sentence. A censor seeing this has no reason to be suspicious.\n\n3. 
**Both sides run the same model with the same context.** This is why they get the same distributions. If Bob had a different model or different context, the distributions would differ and decoding would fail.\n\n4. **The shared key provides the random stream.** This is needed for the encoding to be secure — without it, the censor could run the same model and try to decode. With the key, the censor can't reproduce the random choices.\n\n5. **Each token encodes roughly 1 bit** in this toy example (because we split into 2 halves). In reality, with arithmetic coding over a 50,000-word vocabulary, you can encode **multiple bits per token** — roughly 2–4 bits per token for a real LLM, depending on how uncertain the model is at each step.\n\n6. **The \"half\" splitting is the simplified version.** The real scheme (from the Meteor paper, eprint.iacr.org/2021/686) uses full arithmetic coding, which is more efficient — it doesn't waste the probability mass the way a 2-way split does. But the principle is the same: the secret bits steer the selection within the model's probability distribution.",
  "sig": "4b07aa3820c314a16a7f0aff7d56677fa76161b0a062b263dd3ace32477e9b213c5dac64f0f6e048483293230010942c7b0c45af6cedb992852c47801958e9d6"
}