Superflat - bridge builder with only flat files #247

adiabat · 2021-02-08T18:51:53Z

adiabat
Feb 8, 2021

Superflat - bridge builder with only flat files

One of the purposes of the bridge node is to build the TTL flat files, which let clients know how long an output will live. Currently in the code this is handled the following way:

block comes in
dedupe the block
for all the block's outputs:
add outpoint:position_in_block to a database or map
for all all the block's inputs:
query that map to get the position in the block that created it
remove the entry from the db / map
seek to the right block height, then seek to the txo position, then write the lifespan of the txo

One sort of theme with utreexo is that we're getting rid of databases, which is why using levelDB or even maps in the code is not great. Also, levelDB slows things way down.

In the case of utcd where we already are keeping track of a UTXO set, the position_in_block data is just a tiny extra 2 bytes to stick on the end of the UTXO, which we're already putting into a database anyway. But in the standalone bridgenode software, this 2 byte datum is the only reason to keep a database at all. Here's a way to get the same flat TTL files without a database. It will take more disk space, and might be a little slower (or faster, because of less GC? hard to guess, have to try it) but will not need a map or DB and not take up much ram.

Make a new flat file, alongside blk00000.dat, rev00000.dat, and proof00000.dat. I guess you could stick it inside proof00000.dat, and maybe we should, but for explanation I'll call it txids00000.dat. Inside txids00000.dat, the data is grouped by blocks, and within blocks data is grouped into transactions. Sounds familiar, but the tx entries are the following:

txid, position_in_block

txid is the 32 byte hash (but doesn't have to be 32 bytes, described later), and position_in_block is not the position of the transaction in the block in terms of being the nth transaction, but the position of the transaction's first output in the block in terms of being the nth output.

Also, importantly, the txids are not in the order they appear in the bitcoin block. They are sorted.

Example:

You have a block with 2 txs. A coinbase, (hash: aabbccdd...) with 2 outputs, and a regular tx (hash: 00112233...) with 3.
The txids block will look like:
{00112233, 2}, {aabbccdd, 0}

The non-coinbase tx appears first because it has a lower txid, and it's first output starts at position 2 (0 & 1 are taken up by the coinbase tx's 2 outputs)

Back of the envelope, say every mainnet block has 2K txs, with 32B txids and 2B positions, that's (32+2)*2K = 68KB per block or 45GB for all 666K blocks. But no because it's way less in the beginning; there's something like 600M tx ever, so 600M * 34B = 20GB. But also there's no need for all 32 bytes of the txid. No txids collide the first 16 bytes anywhere in the blockchain, let alone within a single block. So can cut that in half and only use a 16B prefix, then it's 600M * 18B = 11GB or so. Actually 8 byte txid prefix might be OK; we can deal with rare collisions by doing a slower search. With 8 bytes, you have 600M * 10B = 6GB, which seems reasonable to have for this.

This txids data is useless after the TTL file has been populated, so it can be compacted, with entries being removed once all the utxos in a transaction have been spent (which is easy to check for when you're writing a TTL). But that compaction would need to move everything around and is probably not worth it just to save a few gigabytes. Maybe do it very infrequently or manually if people want to clear up space.

So the new process would be:

block comes in
dedupe
with block outputs:
append to the txids file, and a little offset file (4bytes per block)
with block inputs:
seek into the txids file, do an interpolation search for the txid in the block
with the utxo position in block, write to the TTL flat file in the right place

Without any extra data, you'd have to do a linear search in the block to find the txid you want. (Also segment the data into txs, and hash everything you pass by in that linear search!) With this, you have to do an interpolation search through the txid prefixes, which in a typical 2000 transaction block will be log(log(2048)) = 3.459 seeks. Also those seeks are right next to each other so on a spinning disk this is probably pretty quick.

I can probably code this up this week. It doesn't seem too hard and flat file are kind of fun.

Shymaa-Arafat · 2021-04-30T23:03:40Z

Shymaa-Arafat
Apr 30, 2021

the position_in_block data is just a tiny extra 2 bytes to stick on the end of the UTXO, which we're already putting into a database anyway.

AHA....
It's strange that u ur self is the person who suggested this idea with different approaches txid or DB or ...?
(you replied with a different opinion in #266?)
.
There's no feedback on what u accomplished here?
I think also if u r going to process transactions somehow maybe the replacement thought in my last comment in issue #257
(since there is something like energy law here : the end of say n1 UTXOs is the beginning of another say n2 UTXOs, could we replace instead of delete?)

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Superflat - bridge builder with only flat files #247

{{title}}

Replies: 1 comment

{{title}}

Select a reply

Superflat - bridge builder with only flat files #247

adiabat Feb 8, 2021