The Residual Keyspace

Every wallet in this post was drained years ago. This is a retrospective on dead wallets: no live funds were recovered, no usable secret or address list is published, and any currently funded address that turned up (none did) sits behind a restricted denylist, so its key is never saved and I’d never see it anyway.

A brainwallet is the most appealing bad idea in cryptocurrency key management. Instead of storing a 256-bit private key, guarding it and fearing its loss, you remember a passphrase, and the key is the hash of that phrase: priv = SHA256(passphrase). Nothing is written down; the wallet lives in your head and reconstitutes itself on demand. The catch is that the same derivation is available to everyone, so the key is only as strong as the passphrase is hard to guess, and guessing runs offline at millions of tries a second with nobody to rate-limit it and nobody to alert.
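To make the derivation concrete, here is a minimal sketch of priv = SHA256(passphrase) and the public key that follows from it. It is an illustration only, not the tooling used for this post: the function names are mine, the curve arithmetic is a plain double-and-add rather than anything constant-time, and the choice of the compressed public-key encoding is an assumption. Turning the public key into an address (a hash followed by Base58Check encoding) is left out.

```python
import hashlib

# secp256k1 domain parameters (public constants of the curve Bitcoin uses).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its negation
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def scalar_mult(k, point=G):
    """Double-and-add scalar multiplication (illustrative, not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def brainwallet_private_key(passphrase: str) -> int:
    """priv = SHA256(passphrase), read as a big-endian integer."""
    digest = hashlib.sha256(passphrase.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def compressed_public_key(priv: int) -> bytes:
    """33-byte compressed encoding of priv * G (encoding choice is an assumption)."""
    if not 0 < priv < N:
        raise ValueError("private key out of range")
    x, y = scalar_mult(priv)
    return bytes([2 + (y & 1)]) + x.to_bytes(32, "big")
```

Nothing in this chain is secret or slow: one hash and one scalar multiplication per guess, with no salt and no stretching, which is why guessing can run offline at the rates described above.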

A decade ago, Vasek, Bonneau, Castellucci, Keith, and Moore measured what happens to such wallets in practice.¹ People chose guessable phrases, and a small market of automated “drainers” liquidated each wallet within minutes of its first deposit. The attack of the day was wordlists expanded by mangling rules, the same machinery as offline password cracking, packaged for this purpose as Castellucci’s brainflayer.² In the years since, that side of the problem has barely moved. The open tooling still chews through wordlists, and newer work on weak crypto keys went after different mechanisms, biased ECDSA nonces³ or broken RNG seeds, rather than better passphrase guessing.

Password modeling, meanwhile, changed completely. Melicher et al. showed that a character-level neural network estimates password guessability more accurately than the Markov models and grammars that came before it,⁴ and a line of GAN and transformer generators followed. All of that work is scored the same way: train on one leaked-password corpus, then count how many held-out leaked passwords the model reproduces inside some guess budget. The test set is always a list of human passwords. I wanted to point one of these models at a test set where a correct guess is worth money and the answer key is written on a public ledger.

This is the companion to an earlier post, where I scanned old drives for raw private keys lying in unzeroed free space. That search was over bytes; this one is over the much smaller, much stranger space of things a person would choose to type and expect to remember.

The question

The drainers of 2013 ran wordlists. A wordlist is a finite list, and rules only stretch it so far. So there is a set of passphrases that were weak enough for a person to pick and remember, weak enough to be guessable in principle, and yet absent from the lists everyone was running. Call it the residual keyspace: the funded wallets that a decade of wordlist attacks left on the table. I want to know how big it is, what it looks like, and whether any real money ever sat in it.

There is an obvious objection, so I will put it up front. The oracle I check against, the set of funded brainwallet addresses, is itself assembled from a decade of public cracking results.⁵ So in a strict sense nothing I find is unknown to everyone; it is unknown to the specific wordlist-and-rules record I diff against. “Novel” here means “my generator reaches it and the public list does not,” which makes every count below a lower bound on the real residual, not a measurement of it. I would rather state that plainly than dress the number up.

A neural net as a generator

The setup is small. I train a character-level LSTM on a leaked-password corpus, then sample passphrases from it one character at a time. For each candidate I derive the Bitcoin address the brainwallet construction would produce and check it against the funded oracle, using brainflayer’s bloom filter followed by a sorted-index confirmation so there are no false positives. As a baseline I train OMEN, an ordered-Markov enumerator,⁶ on the same corpus and run it the same way. Then I plot recovered wallets against generation budget for both.
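The two-stage check can be sketched in a few lines. This is a toy stand-in, not brainflayer's implementation: the class names, the filter size, and the way bit positions are derived are all assumptions made for illustration. The idea it shows is the one described above: a bloom filter rejects almost every candidate cheaply, and an exact lookup in a sorted index removes the filter's false positives.

```python
import bisect
import hashlib


class BloomFilter:
    """Tiny bloom filter; sizes here are illustrative, not tuned."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size_bits = size_bits  # assumed to be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive the bit positions from 4-byte slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class FundedOracle:
    """Bloom pre-filter plus exact confirmation in a sorted list."""

    def __init__(self, address_hashes):
        self.sorted_hashes = sorted(set(address_hashes))
        self.bloom = BloomFilter()
        for h in self.sorted_hashes:
            self.bloom.add(h)

    def contains(self, h: bytes) -> bool:
        if not self.bloom.might_contain(h):
            return False  # definite miss, the common case
        # The filter can say "maybe" for a non-member; confirm exactly.
        i = bisect.bisect_left(self.sorted_hashes, h)
        return i < len(self.sorted_hashes) and self.sorted_hashes[i] == h
```

The filter never produces a false negative, so the only cost of the second stage is one binary search on the rare candidates that pass the first.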

At a budget of $10^9$ candidates, the LSTM recovers 6,552 funded brainwallets and 40 passphrases absent from the wordlist record. OMEN recovers 2,920 and 19. The LSTM passes OMEN’s final novel count at around $1.5 \times 10^7$ candidates and its final funded count near $1.5 \times 10^8$, so it reaches OMEN’s funded total for roughly a sixth of the budget, and its novel total sooner still. That gap is real, though the comparison is rougher than the ratio suggests: the LSTM is sampled with replacement and pays for its own duplicates, while OMEN enumerates distinct candidates in order, so the two curves are not measured on quite the same axis.⁷ The headline I trust is the qualitative one. A model that learned the shape of human passwords reaches funded keys that a model of local character statistics does not.
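The axis mismatch is easy to see on a toy example. The sketch below is not the experiment: the universe of strings, the Zipf-like weights, and the function name are invented for illustration. It charges every generated candidate against the budget, so a sampler that repeats itself spends guesses without gaining ground, while an enumerator that emits distinct candidates in probability order does not.

```python
import itertools
import random


def hits_within_budget(candidates, targets, budget):
    """Count distinct targets among the first `budget` candidates.

    Duplicates consume budget but cannot score a second time.
    """
    found = set()
    for cand in itertools.islice(candidates, budget):
        if cand in targets:
            found.add(cand)
    return len(found)


# Toy universe: 1,000 strings with Zipf-like weights; every 10th is a target.
universe = [f"pw{i}" for i in range(1000)]
weights = [1.0 / (i + 1) for i in range(1000)]
targets = set(universe[::10])

rng = random.Random(0)  # fixed seed so the run is repeatable
sampled = rng.choices(universe, weights=weights, k=100)  # with replacement
enumerated = universe[:100]  # distinct, most probable first

sampled_hits = hits_within_budget(sampled, targets, 100)
enumerated_hits = hits_within_budget(enumerated, targets, 100)
distinct_sampled = len(set(sampled))  # how much of the budget was not wasted
```

On the other hand, the sampler can land on a low-probability string the enumerator would not reach for a long time, which is part of why the two curves are not directly interchangeable.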

What the wordlists could not say

The interesting part is which passphrases land in the LSTM-only column, because they sort themselves into a handful of structural families, and the families have a common property: a fixed-order Markov model cannot represent them.

A Markov model predicts the next character from the last few. That window is all it knows. So it handles password1 and letmein happily, because those are local statistics, and it is blind to anything whose structure lives outside the window:

Keyboard walks like 1q2w3e4r, where the pattern is spatial adjacency on the physical keyboard rather than letter-to-letter likelihood.
Running sequences: Fibonacci strings, full-alphabet runs, and arithmetic progressions, where each token depends on a sum or a position rather than on its neighbour.
Embedded whitespace and case structure: the difference between testing123 and testing 123, or a name capitalised the way a person capitalises it.
Multilingual lexemes: German words like sonnenblume and keineahnung, or ghjcnjgfhjkm, which is Russian typed by its Cyrillic key positions while the keyboard layout was set to Latin QWERTY.
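Two of these families can be made concrete with a short sketch. Both helpers are illustrative and the names are mine: the walk detector treats the keyboard as an unstaggered grid, which is a simplifying assumption, and the layout table covers only the lowercase letter keys of the standard Russian ЙЦУКЕН layout.

```python
# Letter and digit rows of a QWERTY keyboard, treated as an unstaggered grid.
QWERTY_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {
    ch: (r, c)
    for r, row in enumerate(QWERTY_ROWS)
    for c, ch in enumerate(row)
}


def is_keyboard_walk(s: str) -> bool:
    """True if every consecutive pair of keys is distinct and grid-adjacent."""
    s = s.lower()
    if len(s) < 2 or any(ch not in KEY_POS for ch in s):
        return False
    for a, b in zip(s, s[1:]):
        (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
        if a == b or abs(r1 - r2) > 1 or abs(c1 - c2) > 1:
            return False
    return True


# Same physical keys, two layouts: Latin QWERTY and Russian ЙЦУКЕН.
LATIN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
CYRILLIC_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"
LATIN_TO_CYRILLIC = str.maketrans(LATIN_KEYS, CYRILLIC_KEYS)


def as_typed_in_cyrillic(s: str) -> str:
    """Map each Latin key to the Cyrillic letter on the same physical key."""
    return s.lower().translate(LATIN_TO_CYRILLIC)


print(is_keyboard_walk("1q2w3e4r"))          # True
print(is_keyboard_walk("password1"))         # False
print(as_typed_in_cyrillic("ghjcnjgfhjkm"))  # простопароль
```

In both cases the regularity lives in the keyboard, not in the character sequence, so a model that only sees the last few Latin characters has nothing to condition on.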

None of these are exotic. Each is the kind of string a person picks because it feels memorable and looks random. The neural model recovers them because it learned the global shape of how people compose secrets, the part of the distribution the wordlist era could see only where someone had already typed an example into a list.

The money

A residual full of clever-looking but empty wallets would be a curiosity. So for every re-discovered passphrase I looked up the peak balance its address ever held, and plotted it against the guessability the LSTM assigns the phrase, measured in bits as $-\log_2 p$.
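For a character-level model, $p$ is the product of the per-character probabilities, so the bit score is a sum of per-character terms. The sketch below assumes the model exposes those conditional probabilities (including the end-of-string step); the function names and the example probabilities are invented for illustration.

```python
import math


def guessability_bits(char_probs):
    """-log2 of the string probability, given per-character conditionals.

    `char_probs` is assumed to hold P(next char | prefix) for each
    character of the string, plus the end-of-string step.
    """
    return -sum(math.log2(p) for p in char_probs)


def expected_occurrences(bits: float, samples: int) -> float:
    """Expected number of times a string of this score appears in `samples` draws."""
    return samples * 2.0 ** -bits


# Made-up conditionals for a three-step string: 1 + 2 + 1 = 4 bits.
print(guessability_bits([0.5, 0.25, 0.5]))   # 4.0
# A 30-bit string is expected about once in 2**30 draws.
print(expected_occurrences(30, 2 ** 30))     # 1.0
```

Since $\log_2 10^9 \approx 29.9$, a sampling budget of $10^9$ is expected to produce a 30-bit string about once, which gives a rough sense of where the reachable band ends.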

Value turns out to be close to independent of guessability. The empty passphrase, around 15 bits, once held roughly 50 BTC. But deadsheep held about 14 BTC at 28 bits, and 8964009 and ludogay held single-digit BTC up around 30 to 33 bits, well inside the band the neural model reaches and the wordlists did not.⁸ Real money sat behind secrets across the whole range, well past the sub-15-bit trivia that the early lists already covered.

The distribution underneath is heavily concentrated. A dense floor of wallets sits at about 0.000055 BTC, an amount too small to be worth the fee to move it. That floor is honeypot dust: tiny tracer payments people spray onto known-weak addresses to watch who sweeps them, and how fast. Above that floor, a few wallets hold almost everything: the top ten addresses account for the overwhelming majority of the value, and a single empty-string wallet is a large fraction of the total on its own. The median funded wallet is, in money terms, a rounding error.

The drain

By this point the model has done its job: the funded keyspace is larger than the wordlists found, and real money sat in it. The rest is on-chain forensics. Every funded brainwallet leaves a history, and that history is a play-by-play of an arms race.

Group the wallets by the year of their first real deposit and measure how long the money sat before something swept it. In 2011 the median time-to-sweep is days. In 2012 it is around twenty hours. From 2013 on it is near-instant, and the share of wallets emptied within a minute of funding climbs from near zero to roughly 58% that year, then up toward 100% by the end of the decade.⁹ The inflection lines up with the public brainwallet-cracking tooling becoming a commodity. Before it, sweeping a brainwallet was something a few people did slowly; after, it was a saturated market of bots racing each other to the same keys, and the gap between funding and theft collapsed to the time it takes to see a transaction confirm.
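The grouping itself is simple once each wallet has a funding time and a sweep time. The sketch below assumes those two timestamps have already been extracted per wallet; the function name, the record shape, and the sample rows in the usage note are hypothetical, not data from this post.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def sweep_stats_by_year(events):
    """Median time-to-sweep and share swept within a minute, per funding year.

    `events` is an iterable of (funded_at, swept_at) datetime pairs,
    one per wallet, using the wallet's first real deposit.
    """
    delays = defaultdict(list)
    for funded_at, swept_at in events:
        delays[funded_at.year].append((swept_at - funded_at).total_seconds())
    return {
        year: {
            "wallets": len(d),
            "median_seconds": median(d),
            "share_within_minute": sum(x <= 60 for x in d) / len(d),
        }
        for year, d in sorted(delays.items())
    }
```

Called on a list such as `[(datetime(2013, 5, 1, 12, 0, 0), datetime(2013, 5, 1, 12, 0, 30))]`, it returns one entry per funding year. Reporting the wallet count alongside the median makes thin years visible instead of letting a handful of wallets pass for a trend.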

Anatomy of one theft

To see how short “instant” really is, follow one wallet. In May 2013 someone tests a brainwallet built from the word deadsheep with a 0.007 BTC deposit, and it is swept the same minute it confirms. Six weeks later, on the 24th of June, 14.92 BTC comes off an exchange, sits briefly in a fresh address, and 14.29 BTC of it goes into the deadsheep wallet at 03:58. A bot that has been watching the address since that first test deposit empties it in the same block, and moves the coins to an address where they have sat, untouched, ever since.¹⁰

Figure: The same bot had swept a 0.007 BTC test deposit to this wallet six weeks earlier, the minute it landed. It was watching the whole time.

The funding and the theft are in the same block. There is no window in which the money is safe, because the address was already on a watchlist before a single coin arrived. The owner is not slow or careless in any way that a faster reaction would have fixed; the word they chose was the vulnerability, and it was exploited the instant it held value.

The coins that never moved

That the coins never move again is not a quirk of this one wallet. It is the pattern, and it points at something odd about who is doing the draining.

The best-documented brainwallet loss is a round 50 BTC, deposited in 2015 behind a blank passphrase by someone who expected an empty string to be too obvious for anyone to try.¹¹ It plays out on-chain just as the forum thread describes: the 50 BTC arrives from an exchange withdrawal and is swept inside the minute by a drainer that forum regulars tie to the handle amaclin. The coins come to rest in an address that, a decade later, still holds 55 BTC and has never spent a satoshi. deadsheep’s 14.29 BTC sits frozen in the same way.

Both thefts end the same way: the thieves take real money and then never touch it. Perhaps they lost the keys, the way their victims did. Perhaps cashing out a tagged, endlessly-watched address is more trouble than it is worth. Either way, a good deal of the stolen brainwallet money was taken and never spent. Years later it is still sitting on the chain, untouched.

What was actually stolen

Clustering the sweeps by shared spending¹² resolves the thefts into a few thousand operators, most of them one-off sweeps of a single victim. The brainwallet drainers and the nonce-reuse drainers, the people exploiting biased ECDSA signatures, never share a transaction; as far as the chain shows they are different operators working the same blockchain for different weaknesses. A few are industrial: one brainwallet cluster appears in more than 180,000 sweep transactions across a decade.
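Clustering by shared spending is usually implemented as a union-find over transaction inputs: addresses that appear together as inputs to one transaction are assumed to share an owner. The sketch below shows that heuristic on plain lists of input addresses; the function name and input shape are mine, and it makes no attempt at the refinements a real analysis would need.

```python
def cluster_by_co_spending(transactions):
    """Group addresses that ever appear together as inputs to one transaction.

    `transactions` is an iterable of input-address lists, one per transaction.
    Returns a list of address sets, one per cluster.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        inputs = list(inputs)
        for addr in inputs:
            find(addr)  # register single-input spenders too
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())
```

The merge is transitive: if one transaction spends from A and B and another from B and C, all three land in one cluster. The heuristic only ever merges, so an operator who never co-spends across two sets of addresses shows up as two clusters.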

Add up everything those clusters moved and the brainwallet total comes to about 10,400 BTC. It is tempting to read that as the size of the theft, but almost all of it is an artifact of how the counting works.

A sweeping bot does not take one wallet and stop. It rolls everything it has collected so far into each new transaction, so a single sweep of a fresh dust deposit also re-spends the bot’s entire running pile. Sum the outputs of every sweep transaction and you count that pile once per transaction, thousands of times over.

Figure: The bot rolls its whole pile into each new sweep, so summing transaction outputs counts that pile again every time. Throughput, not theft: the real number is the 201 BTC at the top, counted once.

Counted the other way, by the money that ever actually sat in a victim wallet rather than by transaction outputs, the number is small. About 214 BTC was ever deposited into a funded brainwallet, and once the honeypot dust is stripped, 201 BTC of that was real. So the theft is around 200 BTC, not 10,000. The gap is not owner gambling and not fees, which between them come to under half a percent of the gross; it is the same coins counted again and again.¹³
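A toy simulation shows how far the two ways of counting can drift apart. The amounts below are invented, fees are ignored, and the function name is mine; the only thing the sketch models is a bot that spends its whole running pile together with each new deposit.

```python
def simulate_rolling_sweeps(deposits_sat):
    """Return (gross_outputs, deposited) in satoshis for a rolling-pile bot.

    Each sweep transaction spends the bot's existing pile together with
    the newly funded victim output, so the pile reappears in every
    transaction's output. Fees are ignored.
    """
    pile = 0
    gross_outputs = 0
    for amount in deposits_sat:
        pile += amount          # new output = old pile + fresh deposit
        gross_outputs += pile   # summing outputs re-counts the old pile
    return gross_outputs, pile


# One real deposit of 1 BTC followed by 1,000 dust deposits of 5,500 satoshis.
deposits = [100_000_000] + [5_500] * 1_000
gross, deposited = simulate_rolling_sweeps(deposits)
print(gross / 1e8, deposited / 1e8)
```

In this toy run a little over 1 BTC of deposits produces roughly a thousand BTC of summed outputs, because the single real deposit is re-counted in every later dust sweep. Counting deposits into victim wallets avoids the problem entirely.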

The dust pads the transaction count the same way it pads the gross. Of the roughly 1.2 million brainwallet deposits ever swept, all but 736 sit below 0.001 BTC: honeypot bait, not stored value. The entire 200 BTC of real theft lives in those 736 deposits.

That leaves the question of where the 200 BTC went. Some of it never moved, like the coins still frozen behind deadsheep and the blank-passphrase wallet. The rest was cashed out, and the largest operation did it in the open, depositing its pile straight to a major exchange’s hot wallet, an address with tens of thousands of transactions of its own. There was no mixer and no laundering chain.

The result

The residual keyspace is real, it had money in it, and the money is gone. A neural network reaches funded brainwallets that a decade of wordlist attacks missed, the wallets it uniquely reaches are the ones with structure a Markov model cannot encode, and some of them held real value. None of which helps anyone today, because the drainers got there years ago and the wallets have been empty ever since. The failure here is categorical: there is no version of “human-memorable” that survives a model which has learned what humans find memorable.

The method generalises beyond brainwallets. The standard way to evaluate a password model is to ask how well it reproduces a list of leaked passwords. Pointing the same model at a funded-key oracle turns that around: instead of asking how well the model predicts what people typed, we measure what the people who already attacked this space failed to type, and we get a ground truth in money rather than in held-out accuracy. The same lens fits any system that turns a remembered secret into a cryptographic key, as long as the derivation is cheap to check in bulk, which is the property that makes brainwallets measurable and a salted, stretched scheme thankfully not.

I didn’t expect to find any coins in this exercise, and I didn’t. For what it’s worth, I’d suggest anyone walking down the same path gate the temptation in their code, so that private keys for still-funded wallets are not kept. It’s better not to be hit with the temptation of a life-changing theft. For me, no amount of cash is worth the worry. Of course, your situation may be different. And if you do happen to benefit legitimately from this post, a few LBMA Good Delivery gold bars by way of thanks would be graciously accepted.

Footnotes

1. M. Vasek, J. Bonneau, R. Castellucci, C. Keith, and T. Moore, “The Bitcoin Brain Drain: Examining the Use and Abuse of Bitcoin Brain Wallets,” in Financial Cryptography and Data Security (FC 2016), LNCS 9603, Springer, 2017, pp. 609-618.
2. R. Castellucci, “Cracking Cryptocurrency Brainwallets,” DEF CON 23, 2015. The tool is at ryancdotorg/brainflayer.
3. J. Breitner and N. Heninger, “Biased Nonce Sense: Lattice Attacks against Weak ECDSA Signatures in Cryptocurrencies,” in Financial Cryptography and Data Security (FC 2019), LNCS 11598, Springer, 2019, pp. 3-20.
4. W. Melicher, B. Ur, S. M. Segreti, S. Komanduri, L. Bauer, N. Christin, and L. F. Cranor, “Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks,” in 25th USENIX Security Symposium, 2016, pp. 175-191.
5. The funded-address set is consolidated from prior brainwallet cracking efforts, so it is closer to “funded addresses that someone has already cracked or listed” than to “all funded brainwallet addresses that ever existed.” Both the size of that set and the exact composition of the wordlist record I diff against to call a find “novel” need pinning down before any of these counts should be quoted as more than a lower bound.
6. M. Dürmuth, F. Angelstorf, C. Castelluccia, D. Perito, and A. Chaabane, “OMEN: Faster Password Guessing Using an Ordered Markov Enumerator,” in Engineering Secure Software and Systems (ESSoS 2015), LNCS 8978, Springer, 2015, pp. 119-132.
7. Ancestral sampling from the LSTM produces about 50% duplicate candidates at this budget, which are charged against it. OMEN enumerates distinct candidates in probability order. A clean comparison would either enumerate the LSTM’s highest-probability strings or sample OMEN, and would add at least one stronger baseline, a grammar-based generator or a rules-based cracker, rather than a single Markov model. The sixfold figure is read off one sampling run; it has no error band yet.
8. The guessability axis is the LSTM’s own assigned probability, so it is the model scoring the same secrets it generated, which makes it a self-consistent ranking rather than an independent strength estimate. The novel finds themselves cannot be placed on this plot at all, since they were never funded in a wallet I can price; the value-versus-guessability picture is drawn from the re-discovered funded set and stands in for the rest.
9. The early years are thin. 2011 rests on a handful of wallets and should not carry weight on its own. A dip in 2017, where the median sweep time rises briefly, comes from a small sample of that year’s deposits and is shown rather than smoothed away. The robust signal is the 2013 collapse and the saturation that follows.
10. Reconstructed from a full archival Bitcoin node. The 14.29 BTC funding traces back to a multi-input consolidation of the kind exchanges produce; the same-block sweep and the destination’s subsequent decade of silence are both on the public ledger. Every address here is historical and long drained.
11. The loss is documented in a 2015 Bitcoin forum thread, “50 BTC lost because of blank passphrase.” Matching it to the chain gives the exchange withdrawal that funded it, the sweep inside the minute, and a collector address that still holds 55 BTC unspent today. The amaclin attribution comes from forum users, not from anything the chain alone proves.
12. S. Meiklejohn, M. Pomarole, G. Jordan, K. Levchenko, D. McCoy, G. M. Voelker, and S. Savage, “A Fistful of Bitcoins: Characterizing Payments Among Men with No Names,” in Internet Measurement Conference (IMC 2013), ACM, 2013, pp. 127-140.
13. The theft-versus-throughput split leans on heuristics for dust, owner-relocation, and pipeline flow, and the figure for the nonce-reuse mechanism is rougher than the brainwallet one, so I trust the brainwallet roughly-fifty-times gap more than the combined hundred. Co-input clustering also cannot see operators who deliberately keep their inputs apart, so “no overlap between mechanisms” is a statement about what the clustering reveals, not a proof of disjoint operators.
