Keycarver

Code on GitHub at captainpete/keycarver. CUDA build available for larger images.

I had downtime, a junk drawer full of dusty 4GB USB drives, and a half-remembered suspicion that one of them contained a backup of some Bitcoin keys I’d misplaced years earlier. The drives had been formatted, but the free space hadn’t been zeroed. The bytes might still be there, just unindexed.

The usual tool for this kind of recovery is PhotoRec-style file carving: scan for known format signatures and reconstruct files. The problem is that early Bitcoin wallets used BerkeleyDB or SQLite, both of which produce a lot of false positives from partial or corrupted structures. I was getting piles of plausible-looking files that turned out to be noise.

So I switched approaches. A Bitcoin private key is 32 bytes. If one existed on disk in raw big-endian format, it had to appear somewhere in the image as a contiguous 32-byte sequence. Testing every possible sequence directly is brute force, but it’s also exact: no format parsing, no signature matching, no partial reconstruction.

The pipeline

I ran a full Bitcoin node to download all the blocks, then scanned every block file for P2PKH and P2WPKH transactions, extracting the public key hashes (PKHs) and storing them in RocksDB for deduplication. That gave me a set of 20-byte hashes covering every known P2PKH and P2WPKH address on the blockchain.1
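The extraction itself comes down to pattern-matching two fixed script templates. Here is an illustrative Python sketch (the `extract_pkh` helper is mine, not the project's; the real scanner lives in the Rust codebase):

```python
def extract_pkh(script):
    """Return the 20-byte public key hash from a P2PKH or P2WPKH
    scriptPubKey, or None if the script matches neither template."""
    # P2PKH: OP_DUP OP_HASH160 PUSH20 <pkh> OP_EQUALVERIFY OP_CHECKSIG
    if len(script) == 25 and script[:3] == b"\x76\xa9\x14" \
            and script[23:] == b"\x88\xac":
        return script[3:23]
    # P2WPKH: OP_0 PUSH20 <pkh>
    if len(script) == 22 and script[:2] == b"\x00\x14":
        return script[2:]
    return None
```

Because both templates carry the same 20-byte RIPEMD160(SHA256(PK)) payload, the two address types can share one index.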

With that in hand, I imaged each drive with ddrescue and ran a sliding 32-byte window across the image, testing each position as a candidate private key:

  1. Check that it lies in the valid scalar range [1, n) for the secp256k1 curve order (a cheap first filter: values at or above n are vanishingly rare, but this immediately rejects the all-zero windows that dominate formatted drives)
  2. Derive the compressed public key
  3. Compute the PKH: RIPEMD160(SHA256(PK))
  4. Look up the PKH in the address index

If the lookup matches, the byte sequence is a valid private key corresponding to a known Bitcoin address.

[Pipeline diagram: reader thread (sliding 32-byte window over a memory-mapped disk image) → LRU cache (duplicate byte sequences skipped before dispatch) → worker pool (SK → PK via secp256k1, MPHF index lookup, parallelised) → match]

The address index

The lookup step is where most of the engineering went. There are roughly a billion known P2PKH addresses; at 20 bytes each, that’s about 20 GB of raw hashes before any hash-set overhead. A hash set would work, but the memory footprint is large. I wanted O(1) lookups from a compact, memory-mappable structure.

The solution was a minimal perfect hash function (MPHF). Given the full set of known PKHs, an MPHF assigns each one a unique integer offset, with no collisions and no wasted slots. The index is then a flat binary file: at each offset, the PKH itself is stored. A lookup works like this: hash the query PKH to get an offset, read the PKH at that position, and check if they match. If they do, the address is in the index.2

Both the MPHF and the index file are memory-mapped at scan time, so the OS handles paging and parallel lookups need no locking.

Building the index takes a few steps. After deduplication in RocksDB, the key space is partitioned into 64 ranges and PKHs are streamed into staging files. The MPHF is built from those using boomphf, which supports chunked parallel construction. Then the staging files are fed back through the MPHF to write the flat index. Staging files and the RocksDB database are then cleaned up.
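The build-then-lookup mechanics are easier to see in miniature. A Python sketch with a toy stand-in for the MPHF — the real project uses boomphf and memory-mapped files; the `ToyMphf` class below is mine, and only mimics the interface, including the "unknown keys map to an arbitrary in-range slot" behaviour from footnote 2:

```python
import hashlib

class ToyMphf:
    """Toy stand-in for a minimal perfect hash: each known key gets a
    unique offset in [0, n); unknown keys fall through to an arbitrary
    in-range slot, as with a real MPHF queried outside its key set."""
    def __init__(self, keys):
        self.n = len(keys)
        self._slot = {k: i for i, k in enumerate(sorted(keys))}

    def query(self, key):
        if key in self._slot:
            return self._slot[key]
        # arbitrary but deterministic in-range slot for unknown keys
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % self.n

def build_index(pkhs):
    """Second pass of the build: stream every PKH back through the MPHF
    and write it at its assigned offset in a flat byte array."""
    mphf = ToyMphf(pkhs)
    index = bytearray(mphf.n * 20)
    for pkh in pkhs:
        off = mphf.query(pkh) * 20
        index[off:off + 20] = pkh
    return mphf, bytes(index)

def contains(mphf, index, pkh):
    """Lookup: hash to an offset, then verify the stored PKH matches."""
    off = mphf.query(pkh) * 20
    return index[off:off + 20] == pkh
```

Since every slot holds exactly the PKH that hashed to it, a non-member query lands on a slot holding a different PKH and the comparison fails — which is footnote 2's no-false-positives argument in code.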

Scan performance

The scan runs at around 330k keys per second on a 32-core machine. Most of that time is in the secp256k1 point multiplication (SK → PK). Hashing is cheap by comparison.

Disk images, especially from formatted drives, contain a lot of repeated byte sequences: zeroed regions, filesystem metadata written multiple times, repeated patterns from old file contents. Testing the same 32-byte sequence twice is wasted work, so there’s a small LRU cache on the reader side, keyed on the raw bytes, that filters duplicates before they reach the worker threads. On my drives this cut a significant fraction of the candidate load.
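A sketch of that reader-side filter, assuming a simple bounded LRU (the real cache's size and eviction details are implementation specifics I'm glossing over here):

```python
from collections import OrderedDict

def dedup_windows(image, window=32, capacity=1 << 20):
    """Slide a window over the image, yielding each candidate byte
    sequence only if it isn't in the LRU of recently seen sequences."""
    seen = OrderedDict()
    for i in range(len(image) - window + 1):
        cand = image[i:i + window]
        if cand in seen:
            seen.move_to_end(cand)      # refresh recency, skip dispatch
            continue
        seen[cand] = None
        if len(seen) > capacity:
            seen.popitem(last=False)    # evict least recently used
        yield i, cand
```

On an all-zero region this collapses thousands of identical windows into a single candidate before any EC work happens.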

EC multiplication parallelizes well on a GPU. I added a CUDA path that runs the full SK→PKH pipeline in a single kernel, one thread per byte offset, against a precomputed table of 256 EC points loaded at startup.3 Output from each GPU batch feeds into CPU-side index lookups via a double-buffer pipeline, so the two sides overlap rather than taking turns. On an RTX 3090 this reaches around 166M keys/sec, roughly 500x the CPU rate.
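The table trick from footnote 3, sketched on the CPU in Python (the actual kernel is CUDA; this only shows why a 256-entry table of precomputed doublings turns each scalar multiply into additions alone):

```python
# secp256k1 field prime and generator
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Affine point addition on y^2 = x^3 + 7 over F_P (None = identity)."""
    if a is None: return b
    if b is None: return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

# Precompute TABLE[i] = 2^i * G once; the kernel loads the analogous
# 256-point table at startup.
TABLE, pt = [], G
for _ in range(256):
    TABLE.append(pt)
    pt = point_add(pt, pt)

def scalar_mul_table(k):
    """k * G using only additions of table entries: no doublings in the
    per-key hot path, at most 256 point additions per scalar."""
    acc = None
    for i in range(256):
        if (k >> i) & 1:
            acc = point_add(acc, TABLE[i])
    return acc
```

All doublings are paid once up front; the per-key work is then a pure add chain, which maps cleanly onto one GPU thread per byte offset.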

The result

I recovered previously lost keys. None corresponded to addresses with any balance.

It’s worth noting the implication in reverse: anyone with physical access to those drives after they left my possession could have run the same scan. Unzeroed free space on drives that once held Bitcoin wallets is a real exposure. If the wallets had been funded, formatting wouldn’t have been enough.

If it turns up a loaded wallet, a few LBMA Good Delivery gold bars by way of thanks would be graciously accepted.

Footnotes

  1. P2PKH (pay-to-public-key-hash) is the standard transaction format for early Bitcoin. P2WPKH (native SegWit) uses the same underlying 20-byte hash structure, RIPEMD160(SHA256(public key)), so both can be indexed identically. P2SH and Taproot aren’t covered.

  2. The MPHF guarantees no collisions for keys in the set, but for keys outside it can return any offset. A key not in the index might hash to the same slot as a real PKH. The stored-PKH comparison handles this: if the hash maps an unknown query to slot 42, and slot 42 contains a different PKH, the comparison fails. No false positives.

  3. The table stores G, 2G, 4G, …, 2^255·G. Each scalar multiply becomes at most 256 point additions drawn from the table, with no doublings. Register pressure limits SM occupancy to 1–2 warps on the 3090. The kernel consistently finishes batches well ahead of the CPU-side MPHF lookups (64 rayon threads doing random memory accesses across the index); that lookup is what caps throughput.

@misc{hollows2025keycarve,
  author = {Hollows, Peter},
  title  = {{Keycarver}},
  year   = {2025},
  month  = jan,
  url    = {https://dojo7.com/2025/01/08/keycarver/}
}