Replies: 1 comment
-
AHA.... |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Superflat - bridge builder with only flat files
One of the purposes of the bridge node is to build the TTL flat files, which let clients know how long an output will live. Currently in the code this is handled the following way:
add outpoint:position_in_block to a database or map
query that map to get the position in the block that created it
remove the entry from the db / map
seek to the right block height, then seek to the txo position, then write the lifespan of the txo
One sort of theme with utreexo is that we're getting rid of databases, which is why using levelDB or even maps in the code is not great. Also, levelDB slows things way down.
In the case of utcd where we already are keeping track of a UTXO set, the position_in_block data is just a tiny extra 2 bytes to stick on the end of the UTXO, which we're already putting into a database anyway. But in the standalone bridgenode software, this 2 byte datum is the only reason to keep a database at all. Here's a way to get the same flat TTL files without a database. It will take more disk space, and might be a little slower (or faster, because of less GC? hard to guess, have to try it) but will not need a map or DB and not take up much ram.
Make a new flat file, alongside blk00000.dat, rev00000.dat, and proof00000.dat. I guess you could stick it inside proof00000.dat, and maybe we should, but for explanation I'll call it txids00000.dat. Inside txids00000.dat, the data is grouped by blocks, and within blocks data is grouped into transactions. Sounds familiar, but the tx entries are the following:
txid
,position_in_block
txid
is the 32 byte hash (but doesn't have to be 32 bytes, described later), andposition_in_block
is not the position of the transaction in the block in terms of being the nth transaction, but the position of the transaction's first output in the block in terms of being the nth output.Also, importantly, the txids are not in the order they appear in the bitcoin block. They are sorted.
Example:
You have a block with 2 txs. A coinbase, (hash:
aabbccdd
...) with 2 outputs, and a regular tx (hash:00112233
...) with 3.The txids block will look like:
{
00112233
, 2}, {aabbccdd
, 0}The non-coinbase tx appears first because it has a lower txid, and it's first output starts at position 2 (0 & 1 are taken up by the coinbase tx's 2 outputs)
Back of the envelope, say every mainnet block has 2K txs, with 32B txids and 2B positions, that's (32+2)*2K = 68KB per block or 45GB for all 666K blocks. But no because it's way less in the beginning; there's something like 600M tx ever, so 600M * 34B = 20GB. But also there's no need for all 32 bytes of the txid. No txids collide the first 16 bytes anywhere in the blockchain, let alone within a single block. So can cut that in half and only use a 16B prefix, then it's 600M * 18B = 11GB or so. Actually 8 byte txid prefix might be OK; we can deal with rare collisions by doing a slower search. With 8 bytes, you have 600M * 10B = 6GB, which seems reasonable to have for this.
This txids data is useless after the TTL file has been populated, so it can be compacted, with entries being removed once all the utxos in a transaction have been spent (which is easy to check for when you're writing a TTL). But that compaction would need to move everything around and is probably not worth it just to save a few gigabytes. Maybe do it very infrequently or manually if people want to clear up space.
So the new process would be:
append to the txids file, and a little offset file (4bytes per block)
seek into the txids file, do an interpolation search for the txid in the block
with the utxo position in block, write to the TTL flat file in the right place
Without any extra data, you'd have to do a linear search in the block to find the txid you want. (Also segment the data into txs, and hash everything you pass by in that linear search!) With this, you have to do an interpolation search through the txid prefixes, which in a typical 2000 transaction block will be log(log(2048)) = 3.459 seeks. Also those seeks are right next to each other so on a spinning disk this is probably pretty quick.
I can probably code this up this week. It doesn't seem too hard and flat file are kind of fun.
Beta Was this translation helpful? Give feedback.
All reactions