Implement streaming lz4 compression #1611

Open
zhwufd wants to merge 9 commits into master

@zhwufd zhwufd commented Nov 20, 2021

 Compress method    Compress size(B)   Compress time(us) Decompress time(us)     Compress throughput(MB/s)   Decompress throughput(MB/s)                Compress ratio
          Snappy                 128            0.630420            0.676840                    193.633312                    180.353278                    37.500000%
            Gzip                 128            4.886340            0.804860                     24.981952                    151.666517                    47.656250%
            Zlib                 128            4.376520            0.665340                     27.892095                    183.470575                    38.281250%
             Lz4                 128            0.614140            0.359200                    198.766263                    339.839400                    58.115935%

          Snappy                1024            0.690060            0.697920                   1415.184911                   1399.247048                     8.789062%
            Gzip                1024            7.144860            1.752040                    136.680425                    557.385962                     6.640625%
            Zlib                1024            6.703100            1.304000                    145.688189                    748.897623                     5.468750%
             Lz4                1024            0.781000            0.520440                   1250.400128                   1876.417070                     7.810039%

          Snappy               16384            4.155420            4.253955                   3760.149466                   3673.052421                     4.962158%
            Gzip               16384           55.165820           19.621680                    283.236974                    796.313070                     0.781250%
            Zlib               16384           48.379395           13.014209                    322.968077                   1200.610811                     0.708008%
             Lz4               16384            3.494336            3.105762                   4471.521994                   5030.971921                     1.227403%

          Snappy               32768            7.633301            8.515039                   4093.903921                   3669.977292                     4.824829%
            Gzip               32768          105.522363           37.225879                    296.145756                    839.469770                     0.537109%
            Zlib               32768           92.468164           24.807813                    337.954152                   1259.683819                     0.500488%
             Lz4               32768            6.346191            4.902539                   4924.213280                   6374.248038                     1.037690%

          Snappy              524288          121.535938          143.765625                   4114.009488                   3477.882839                     4.747772%
            Gzip              524288         2128.425000          564.970312                    234.915489                    885.002254                     0.305939%
            Zlib              524288         1917.210938          367.104688                    260.795508                   1362.009304                     0.303650%
             Lz4              524288          105.067188           66.967187                   4758.859658                   7466.343125                     0.857538%

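LZ4 decompression throughput is higher than Snappy, Gzip, and Zlib at every tested size. LZ4 compression throughput is also the highest at every size except 1024 B, where Snappy is faster (about 1415 MB/s vs. 1250 MB/s).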

@zhwufd zhwufd changed the title from "Add streaming lz4 compression" to "Implement streaming lz4 compression" Nov 22, 2021
@zhwufd zhwufd closed this Nov 22, 2021
@zhwufd zhwufd reopened this Nov 22, 2021
@zhwufd zhwufd closed this Nov 25, 2021
@zhwufd zhwufd reopened this Nov 25, 2021
Author

zhwufd commented Nov 25, 2021

It builds fine locally with the Makefile, but the travis-ci build fails.

@@ -0,0 +1,2495 @@
/*
Member

The LICENSE needs to be updated.

Author

The LICENSE has been updated.

Member

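This file needs to be wrapped in an additional namespace. Otherwise, if the application also uses lz4, the duplicate symbols will conflict at link time.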
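
A minimal sketch of the namespace suggestion, not the PR's actual code. The namespace names `app_side` and `brpc_vendored` are hypothetical, and the function bodies are stand-ins that only return a number; the point is that two definitions with the same name can coexist once each lives in its own namespace.

```cpp
// Stand-in for the lz4 copy that an application links on its own.
namespace app_side {
inline int LZ4_versionNumber() { return 10904; }  // illustrative value only
}  // namespace app_side

// Stand-in for a copy vendored into the library under a private namespace
// (the namespace name is an assumption, not taken from the PR).
namespace brpc_vendored {
inline int LZ4_versionNumber() { return 10903; }  // illustrative value only
}  // namespace brpc_vendored

// Both definitions are distinct C++ symbols, so neither overrides the other.
inline bool VersionsAreIndependent() {
    return app_side::LZ4_versionNumber() != brpc_vendored::LZ4_versionNumber();
}
```

Note that this only works if the vendored declarations have C++ linkage. Upstream `lz4.h` wraps its declarations in `extern "C"` when compiled as C++, and that wrapper would presumably have to be dropped in the vendored copy for the namespace to take effect.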

size_t ref_cnt = in.backing_block_num();
LZ4_stream_t* lz4_stream = LZ4_createStream();
butil::IOBuf block_buf;
std::vector<size_t> block_metas;
Member

Don't use size_t; its size is not fixed across architectures. Use int32 or int16 here.
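
One way to act on this, sketched under assumptions: keep `size_t` for in-memory sizes but narrow to a fixed-width type with a range check before anything goes on the wire. The helper name `ToBlockSize32` is hypothetical and does not appear in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Narrows an in-memory size to a fixed-width 32-bit value suitable for a
// wire format. Returns false instead of truncating when the value is too big.
inline bool ToBlockSize32(size_t n, int32_t* out) {
    if (n > static_cast<size_t>(std::numeric_limits<int32_t>::max())) {
        return false;
    }
    *out = static_cast<int32_t>(n);
    return true;
}
```

With this, the metadata vector could hold `int32_t` values, so both ends of the connection agree on 4 bytes per field regardless of architecture.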

block_metas.emplace_back(src_block_size);
}
size_t nblocks = block_metas.size() / 2;
out->append(&nblocks, sizeof(size_t));
Member

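This can't be written out directly; the value needs to be converted to network byte order. Also, as above, size_t can't be used here.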
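
A sketch of what writing the block count in network byte order could look like. It uses `std::string` as a stand-in for the output buffer and explicit shifts rather than a platform call; the helper names `AppendU32BE` and `ReadU32BE` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends v as 4 bytes, most significant byte first (network byte order).
inline void AppendU32BE(std::string* out, uint32_t v) {
    const char bytes[4] = {
        static_cast<char>((v >> 24) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>(v & 0xFF),
    };
    out->append(bytes, sizeof(bytes));
}

// Reads 4 big-endian bytes starting at offset. Returns false if fewer than
// 4 bytes are available there.
inline bool ReadU32BE(const std::string& in, size_t offset, uint32_t* v) {
    if (offset > in.size() || in.size() - offset < 4) {
        return false;
    }
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(in.data()) + offset;
    *v = (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) |
         static_cast<uint32_t>(p[3]);
    return true;
}
```

Because the bytes are produced by shifts, the encoded form is the same on little-endian and big-endian hosts.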

return false;
}
std::vector<size_t> block_metas(nblocks * 2, 0);
buf_iter.copy_and_forward(block_metas.data(), nblocks * 2 * sizeof(size_t));
Member

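All of this has to be implemented as proper serialization. If compactness matters here, varint encoding could be used (protobuf should have a similar interface).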
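
For reference, a self-contained sketch of base-128 varint encoding, the scheme protobuf uses for integers. This is a hand-rolled illustration with hypothetical helper names, not protobuf's own interface; in real code protobuf's `CodedOutputStream::WriteVarint32` would likely be the better choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends v as a base-128 varint: 7 payload bits per byte, low bits first,
// with the high bit set on every byte except the last.
inline void AppendVarint32(std::string* out, uint32_t v) {
    while (v >= 0x80) {
        out->push_back(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out->push_back(static_cast<char>(v));
}

// Decodes one varint starting at *pos and advances *pos past the bytes it
// consumed. Returns false on truncated or over-long input; *pos may have
// moved in that case.
inline bool ReadVarint32(const std::string& in, size_t* pos, uint32_t* v) {
    uint32_t result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (*pos >= in.size()) {
            return false;  // ran out of input mid-value
        }
        const uint32_t byte = static_cast<unsigned char>(in[*pos]);
        ++*pos;
        if (shift == 28 && (byte & 0x70) != 0) {
            return false;  // payload bits would not fit in 32 bits
        }
        result |= (byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            *v = result;
            return true;
        }
    }
    return false;  // continuation bit still set after 5 bytes
}
```

Small block sizes then take one or two bytes instead of a fixed 8-byte `size_t`, and the encoding is independent of host byte order.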

return false;
}
LZ4_streamDecode_t* lz4_stream_decode = LZ4_createStreamDecode();
char* in_scratch = new char[max_block];
Member

Use DEFINE_SMALL_ARRAY; in most cases this shouldn't be very large.
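
The idea behind the suggestion is to keep small scratch buffers on the stack and only fall back to the heap for large ones. The sketch below illustrates that pattern with a hypothetical `ScratchBuffer` class; it is a stand-in written with the standard library, not the macro's actual definition.

```cpp
#include <cstddef>
#include <memory>

// Scratch buffer that lives inside the object (typically on the stack) when
// the requested size fits in kInlineBytes, and on the heap otherwise.
template <size_t kInlineBytes>
class ScratchBuffer {
public:
    explicit ScratchBuffer(size_t n)
        : heap_(n > kInlineBytes ? new char[n] : nullptr), size_(n) {}

    char* data() { return heap_ ? heap_.get() : inline_; }
    size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

private:
    char inline_[kInlineBytes];     // used when size_ <= kInlineBytes
    std::unique_ptr<char[]> heap_;  // used otherwise; freed automatically
    size_t size_;
};
```

Besides avoiding an allocation in the common case, the automatic cleanup removes the need for a matching `delete[]` on every early-return path.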

LZ4_freeStreamDecode(lz4_stream_decode);
return false;
}
out->append_user_data(out_buf, dst_block_size, [](void *d) {
Member

Can't IOBuf as a zero-copy stream be used here? Does this have to rely on user_data?

@@ -386,6 +387,11 @@ static void GlobalInitializeOrDieImpl() {
if (RegisterCompressHandler(COMPRESS_TYPE_SNAPPY, snappy_compress) != 0) {
exit(1);
}
const CompressHandler lz4_compress =
{ Lz4Compress, Lz4Decompress, "lz4" };
Member

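lz4 has several modes: frame, block, and stream (as I recall, block seems to have the highest compression ratio). This should be named something like lz4s rather than plain lz4.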

butil::IOBufAsZeroCopyOutputStream wrapper(&serialized_pb);
if (res.SerializeToZeroCopyStream(&wrapper)) {
return Lz4Compress(serialized_pb, buf);
}
Member

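I suggest writing a document first to discuss the wire format for this compression; the current one does not seem to be the most efficient. Combined with a zero-copy stream, it should be possible to compress while serializing (at the very least, the wire format needs to leave room for that implementation). As it stands, an extra copy of intermediate data is still constructed.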
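
To make the wire-format point concrete, here is one possible layout, offered purely as an assumption for discussion and not as the PR's format: each block carries its own length prefix and a zero-length frame marks the end. A writer can then emit blocks as they are produced, without knowing the block count up front. Payloads here are plain strings standing in for compressed blocks, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Writes v as 4 big-endian bytes.
inline void PutU32(std::string* out, uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out->push_back(static_cast<char>((v >> shift) & 0xFF));
    }
}

// Reads 4 big-endian bytes at *pos and advances *pos on success.
inline bool GetU32(const std::string& in, size_t* pos, uint32_t* v) {
    if (*pos > in.size() || in.size() - *pos < 4) return false;
    uint32_t r = 0;
    for (int i = 0; i < 4; ++i) {
        r = (r << 8) | static_cast<unsigned char>(in[*pos + i]);
    }
    *pos += 4;
    *v = r;
    return true;
}

// Appends one frame: [length][payload]. Empty payloads are rejected because
// a zero length is reserved for the end marker.
inline bool AppendFrame(std::string* out, const std::string& payload) {
    if (payload.empty() || payload.size() > 0xFFFFFFFFu) return false;
    PutU32(out, static_cast<uint32_t>(payload.size()));
    out->append(payload);
    return true;
}

// Appends the end-of-stream marker (a zero-length frame).
inline void AppendEnd(std::string* out) { PutU32(out, 0); }

// Parses frames until the end marker. Returns false on truncated input or
// on trailing bytes after the marker.
inline bool ReadFrames(const std::string& in, std::vector<std::string>* frames) {
    size_t pos = 0;
    for (;;) {
        uint32_t len = 0;
        if (!GetU32(in, &pos, &len)) return false;
        if (len == 0) return pos == in.size();
        if (in.size() - pos < len) return false;
        frames->emplace_back(in, pos, len);
        pos += len;
    }
}
```

The trade-off compared with a header that lists all block sizes first is that the reader discovers the number of blocks only while parsing, in exchange for the writer never having to buffer the whole message.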

Contributor

ehds commented Nov 15, 2023

Are there plans to continue following up on this PR?

Author

zhwufd commented Nov 16, 2023

Are there plans to continue following up on this PR?

I'll optimize it further this week.

@wanghenshui

Are there plans to continue following up on this PR?

4 participants