Contributor

@ivkalita ivkalita commented Nov 24, 2025

What this PR does / why we need it:

Adds a columnar reader for the (log)pointers section.

The new reader is slower than the existing RowReader because the data is translated twice (columns->rows->rows for RowReader vs columns->rows->columns->rows for the new reader). The plan is to remove the double translation in the future.
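
To make the extra hop concrete, here is a toy sketch of a rows->columns->rows round trip. The `row` and `columns` types and both helper functions are hypothetical stand-ins and not the types used in this PR; the only point is that each hop copies every value once more, which is roughly where extra time and allocations can come from.

```go
package main

import "fmt"

// row is a hypothetical, simplified pointer record (not the real type).
type row struct {
	StreamID int64
	Path     string
}

// columns holds the same data column by column (hypothetical layout).
type columns struct {
	StreamIDs []int64
	Paths     []string
}

// toColumns transposes rows into columns: the extra rows->columns hop.
func toColumns(rows []row) columns {
	c := columns{
		StreamIDs: make([]int64, 0, len(rows)),
		Paths:     make([]string, 0, len(rows)),
	}
	for _, r := range rows {
		c.StreamIDs = append(c.StreamIDs, r.StreamID)
		c.Paths = append(c.Paths, r.Path)
	}
	return c
}

// toRows transposes columns back into rows: the final columns->rows hop.
// It assumes both columns have the same length.
func toRows(c columns) []row {
	rows := make([]row, len(c.StreamIDs))
	for i := range rows {
		rows[i] = row{StreamID: c.StreamIDs[i], Path: c.Paths[i]}
	}
	return rows
}

func main() {
	in := []row{{StreamID: 1, Path: "a"}, {StreamID: 2, Path: "b"}}
	out := toRows(toColumns(in)) // every value is copied twice on the way
	fmt.Println(len(out) == len(in))
}
```

Dropping either hop removes one full copy of the data, which is the motivation for getting rid of the double translation later.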

Read section benchmarks

goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/dataobj/metastore
cpu: Apple M4
                                     │ result.old.txt │           result.new.txt            │
                                     │     sec/op     │    sec/op     vs base               │
ReadSections/single_index_file-10        157.2µ ±  5%   161.5µ ± 10%       ~ (p=0.105 n=10)
ReadSections/multiple_index_files-10     3.199m ± 11%   3.333m ± 10%  +4.19% (p=0.023 n=10)
geomean                                  709.1µ         733.8µ        +3.47%

                                     │ result.old.txt │           result.new.txt            │
                                     │      B/op      │     B/op      vs base               │
ReadSections/single_index_file-10        542.9Ki ± 0%   535.8Ki ± 0%  -1.31% (p=0.000 n=10)
ReadSections/multiple_index_files-10     28.74Mi ± 0%   27.75Mi ± 0%  -3.47% (p=0.000 n=10)
geomean                                  3.904Mi        3.810Mi       -2.39%

                                     │ result.old.txt │           result.new.txt           │
                                     │   allocs/op    │  allocs/op   vs base               │
ReadSections/single_index_file-10         2.369k ± 0%   2.498k ± 0%  +5.45% (p=0.000 n=10)
ReadSections/multiple_index_files-10      112.8k ± 0%   111.9k ± 0%  -0.87% (p=0.000 n=10)
geomean                                   16.35k        16.72k       +2.24%

RowReader vs Reader comparison benchmarks

                                                │ result.old.txt │            result.new.txt            │
                                                │     sec/op     │    sec/op     vs base                │
Readers/1k_pointers,_10_stream_IDs/Reader-10        265.8µ ±  5%   278.3µ ± 15%   +4.70% (p=0.001 n=10)
Readers/1k_pointers,_200_stream_IDs/Reader-10       686.2µ ± 12%   965.1µ ±  2%  +40.65% (p=0.000 n=10)
Readers/10k_pointers,_200_stream_IDs/Reader-10      695.1µ ±  2%   977.2µ ±  1%  +40.57% (p=0.000 n=10)
Readers/100k_pointers,_200_stream_IDs/Reader-10     714.4µ ± 16%   982.6µ ±  1%  +37.55% (p=0.000 n=10)
geomean                                             548.6µ         712.6µ        +29.90%

                                                │ result.old.txt │            result.new.txt             │
                                                │      B/op      │     B/op       vs base                │
Readers/1k_pointers,_10_stream_IDs/Reader-10        623.4Ki ± 0%    496.2Ki ± 0%  -20.41% (p=0.000 n=10)
Readers/1k_pointers,_200_stream_IDs/Reader-10       876.2Ki ± 0%   1443.3Ki ± 0%  +64.72% (p=0.000 n=10)
Readers/10k_pointers,_200_stream_IDs/Reader-10      914.1Ki ± 0%   1475.1Ki ± 0%  +61.37% (p=0.000 n=10)
Readers/100k_pointers,_200_stream_IDs/Reader-10     914.5Ki ± 0%   1475.4Ki ± 0%  +61.34% (p=0.000 n=10)
geomean                                             822.0Ki         1.091Mi       +35.92%

                                                │ result.old.txt │           result.new.txt            │
                                                │   allocs/op    │  allocs/op   vs base                │
Readers/1k_pointers,_10_stream_IDs/Reader-10         8.557k ± 0%   8.738k ± 0%   +2.11% (p=0.000 n=10)
Readers/1k_pointers,_200_stream_IDs/Reader-10        12.58k ± 0%   20.07k ± 0%  +59.54% (p=0.000 n=10)
Readers/10k_pointers,_200_stream_IDs/Reader-10       12.59k ± 0%   20.08k ± 0%  +59.46% (p=0.000 n=10)
Readers/100k_pointers,_200_stream_IDs/Reader-10      12.59k ± 0%   20.08k ± 0%  +59.46% (p=0.000 n=10)
geomean                                              11.43k        16.31k       +42.66%
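
For readers checking the numbers: the `vs base` column is the relative change between the old and new values, and the `geomean` rows are geometric means of the per-benchmark values. The sketch below recomputes two of them from the rounded sec/op figures in the ReadSections table. The helper names are mine, and because the inputs are already rounded, the results can differ from the benchstat output in the last digit.

```go
package main

import (
	"fmt"
	"math"
)

// geomean returns the geometric mean of xs.
// xs must be non-empty and contain only positive values.
func geomean(xs []float64) float64 {
	sumLog := 0.0
	for _, x := range xs {
		sumLog += math.Log(x)
	}
	return math.Exp(sumLog / float64(len(xs)))
}

// pctChange returns the relative change from base to cur, in percent.
func pctChange(base, cur float64) float64 {
	return (cur - base) / base * 100
}

func main() {
	// sec/op values copied from the ReadSections table, in microseconds.
	oldUs := []float64{157.2, 3199}
	newUs := []float64{161.5, 3333}

	fmt.Printf("geomean old: %.1fµ\n", geomean(oldUs))
	fmt.Printf("geomean new: %.1fµ\n", geomean(newUs))
	fmt.Printf("multiple_index_files delta: %+.2f%%\n", pctChange(oldUs[1], newUs[1]))
}
```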

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@ivkalita ivkalita force-pushed the ivkalita/metastore-columnar-pointers-reader branch 4 times, most recently from a31b118 to 159273e Compare November 28, 2025 10:30
Contributor Author

I have a strong feeling this will look better in pkg/dataobj/sections/pointers than here (especially given that iter.go already exists there). Still, I'd like to modify the readers for the other sections first to better understand how to refactor them.

_ = cfg.TargetPageSize.Set("2MB")
_ = cfg.TargetObjectSize.Set("1GB")
_ = cfg.BufferSize.Set("16MB")
_ = cfg.TargetSectionSize.Set("128MB")
Contributor Author

@ivkalita ivkalita Nov 28, 2025

make format made this change

Comment on lines +24 to +27
_ = cfg.TargetPageSize.Set("128KB")
_ = cfg.TargetObjectSize.Set("64MB")
_ = cfg.BufferSize.Set("2MB")
_ = cfg.TargetSectionSize.Set("16MB")
Contributor Author

@ivkalita ivkalita Nov 28, 2025

make format made this change

@ivkalita ivkalita force-pushed the ivkalita/metastore-columnar-pointers-reader branch from 159273e to 8190ded Compare November 28, 2025 10:37
@ivkalita ivkalita marked this pull request as ready for review November 28, 2025 10:37
@ivkalita ivkalita requested a review from a team as a code owner November 28, 2025 10:37
@ivkalita ivkalita requested a review from benclive November 28, 2025 10:37
return nil
}

type streamSectionPointerBuilder struct {
Contributor

Do you need a new type here? Could you create a buf of pointers.SectionPointer instead?

actual, err := arrowtest.TableRows(memory.DefaultAllocator, actualTable)
require.NoError(t, err)

expected := arrowtest.Rows{
Contributor

I don't think this is correct: a range of 25-55 should cover both the path2 and path3 rows. A timestamp range should match any stream it overlaps with; it doesn't have to cover the stream completely.
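
To spell out the difference between the two semantics, here is a minimal sketch of an overlap check next to a full-coverage check. The `timeRange` type, the inclusive bounds, and the stream ranges in `main` are assumptions made for illustration; they are not taken from the test data or from the reader in this PR.

```go
package main

import "fmt"

// timeRange is a hypothetical closed interval [Start, End].
type timeRange struct {
	Start, End int64
}

// overlaps reports whether a and b share at least one instant.
func overlaps(a, b timeRange) bool {
	return a.Start <= b.End && b.Start <= a.End
}

// covers reports whether a fully contains b.
func covers(a, b timeRange) bool {
	return a.Start <= b.Start && b.End <= a.End
}

func main() {
	query := timeRange{Start: 25, End: 55}

	// Made-up stream ranges, chosen only to show each case.
	streams := []timeRange{
		{Start: 10, End: 30}, // overlaps the query, not covered by it
		{Start: 30, End: 50}, // fully covered by the query
		{Start: 50, End: 70}, // overlaps the query, not covered by it
		{Start: 60, End: 80}, // no overlap
	}

	for _, s := range streams {
		fmt.Printf("stream %d-%d: overlaps=%t covers=%t\n",
			s.Start, s.End, overlaps(query, s), covers(query, s))
	}
}
```

With coverage semantics only the second stream would match; with overlap semantics the first three do.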

2 participants