Add views for metrics about pageserver requests #9008

Open · wants to merge 4 commits into base: main
Conversation

hlinnaka (Contributor)

The metrics include, among other things, a histogram of how long we have to wait for GetPage requests, the number of reconnects, and the number of requests sent.

The metrics are not yet exported anywhere, but you can query them manually.

This is what the view looks like:

postgres=# select * from neon_perf_counters ;
                   metric                   |  value  
--------------------------------------------+---------
 getpage_wait_seconds_count                 |    6881
 getpage_wait_seconds_sum                   | 0.93566
 getpage_wait_seconds_bucket{le="0.000020"} |       0
 getpage_wait_seconds_bucket{le="0.000030"} |       0
 getpage_wait_seconds_bucket{le="0.000060"} |    1324
 getpage_wait_seconds_bucket{le="0.000100"} |    2886
 getpage_wait_seconds_bucket{le="0.000200"} |    5589
 getpage_wait_seconds_bucket{le="0.000300"} |    6611
 getpage_wait_seconds_bucket{le="0.000600"} |    6875
 getpage_wait_seconds_bucket{le="0.001"}    |    6880
 getpage_wait_seconds_bucket{le="0.002"}    |    6880
 getpage_wait_seconds_bucket{le="0.003"}    |    6880
 getpage_wait_seconds_bucket{le="0.006"}    |    6880
 getpage_wait_seconds_bucket{le="0.010"}    |    6880
 getpage_wait_seconds_bucket{le="0.020"}    |    6881
 getpage_wait_seconds_bucket{le="0.030"}    |    6881
 getpage_wait_seconds_bucket{le="0.060"}    |    6881
 getpage_wait_seconds_bucket{le="0.100"}    |    6881
 getpage_wait_seconds_bucket{le="0.200"}    |    6881
 getpage_wait_seconds_bucket{le="0.300"}    |    6881
 getpage_wait_seconds_bucket{le="0.600"}    |    6881
 getpage_wait_seconds_bucket{le="1"}        |    6881
 getpage_wait_seconds_bucket{le="2"}        |    6881
 getpage_wait_seconds_bucket{le="3"}        |    6881
 getpage_wait_seconds_bucket{le="6"}        |    6881
 getpage_wait_seconds_bucket{le="10"}       |    6881
 getpage_wait_seconds_bucket{le="20"}       |    6881
 getpage_wait_seconds_bucket{le="30"}       |    6881
 getpage_wait_seconds_bucket{le="60"}       |    6881
 getpage_wait_seconds_bucket{le="100"}      |    6881
 getpage_wait_seconds_bucket{le="+Inf"}     |    6881
 prefetch_requests_total                    |      67
 sync_requests_total                        |    6815
 pageserver_requests_sent_total             |    6899
 pageserver_requests_disconnects_total      |       0
 pageserver_send_flushes_total              |    6899
 prefetch_misses_total                      |       0
 prefetch_discards_total                    |       0
 file_cache_hits_total                      |       0
(39 rows)
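For illustration, a percentile can be estimated from cumulative buckets like these the same way Prometheus's histogram_quantile() does it, by linear interpolation within the bucket that crosses the target rank. The sketch below is not part of this PR; the helper name is made up, and the bucket values are copied from the output above.

```python
# Sketch: estimate a quantile from cumulative histogram buckets, mirroring
# Prometheus's histogram_quantile() linear interpolation. Illustrative only.

def histogram_quantile(q, buckets):
    """buckets: list of (le, cumulative_count), sorted by le ascending,
    ending with (float('inf'), total_count)."""
    total = buckets[-1][1]
    if total == 0:
        return float('nan')
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if le == float('inf'):
                return prev_le  # cannot interpolate into the +Inf bucket
            if count == prev_count:
                return le       # empty bucket, nothing to interpolate
            # linear interpolation within the crossing bucket
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le

# A few buckets from the view output above (le in seconds, cumulative count)
buckets = [
    (0.00002, 0), (0.00003, 0), (0.00006, 1324), (0.0001, 2886),
    (0.0002, 5589), (0.0003, 6611), (0.0006, 6875), (0.001, 6880),
    (float('inf'), 6881),
]
p50 = histogram_quantile(0.5, buckets)  # median GetPage wait estimate
```

With these counts the median lands between the 100 µs and 200 µs bucket boundaries, which matches eyeballing the table above.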

@hlinnaka hlinnaka requested review from a team as code owners September 16, 2024 08:51
@orca-security-us (bot) left a comment

Orca Security Scan Summary

All checks passed: Infrastructure as Code, Secrets, and Vulnerabilities each reported 0 high, 0 medium, 0 low, and 0 info issues.


github-actions bot commented Sep 16, 2024

4977 tests run: 4813 passed, 0 failed, 164 skipped (full report)


Flaky tests (7), across Postgres 14, 15, 16, and 17 (see the full report for details)

Code coverage* (full report)

  • functions: 31.8% (7415 of 23300 functions)
  • lines: 49.8% (59571 of 119717 lines)

* collected from Rust tests only


This comment is automatically updated with the latest test results.
367de54 at 2024-09-17T19:41:09.997Z

@MMeent (Contributor) commented Sep 16, 2024

getpage_wait_seconds_bucket{le="0.000200"}

Can't this use additional columns for dimensions? I'm not a fan of Prometheus's format and would appreciate not exposing such opinionated names.

@hlinnaka (Contributor, Author)

getpage_wait_seconds_bucket{le="0.000200"}

Can't this use additional columns for dimensions? I'm not a fan of Prometheus's format and would appreciate not exposing such opinionated names.

I actually wrote it that way at first, with an explicit "bucket" column. But to then convert them to the prometheus metrics format in the exporter, you need a pretty complex SQL query. I found it easier to do that in C code directly.

@MMeent (Contributor) commented Sep 16, 2024

But to then convert them to the prometheus metrics format in the exporter, you need a pretty complex SQL query

Why can't the exporter do the table-to-metrics transformation? Shouldn't it be able to handle that transformation by itself?

@hlinnaka (Contributor, Author)

But to then convert them to the prometheus metrics format in the exporter, you need a pretty complex SQL query

Why can't the exporter do the table-to-metrics transformation? Shouldn't it be able to handle that transformation by itself?

You can give the exporter an arbitrary SQL query and do the transformation there.

@MMeent (Contributor) commented Sep 16, 2024

You can give the exporter an arbitrary SQL query and do the transformation there.

I know you can do that, but shouldn't the exporter itself know how to treat (the absence of) labels like le in results?

E.g. if I query select metric as __name__, bucket_range_max as le, count as value from my_metrics_view, shouldn't it automatically translate any non-null value of le into a label? AFAIK, that's what happens for every other exported metric with labels.

See e.g. https://github.com/neondatabase/neon/blob/0a8c5e1214fcd3f59767a6ca4adeb68612977e51/vm-image-spec.yaml#L439C1-L449C85 where labels are read from columns.
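For reference, the shape MMeent describes would look roughly like the following sql_exporter collector entry. This is a hypothetical sketch, not the actual vm-image-spec.yaml contents: the collector name, metric name, query, and column names are illustrative, though key_labels and values are real sql_exporter collector fields that map result columns to labels and metric values.

```yaml
collectors:
  - collector_name: neon_perf_counters_sketch   # hypothetical name
    metrics:
      - metric_name: getpage_wait_seconds_bucket
        type: counter
        help: 'GetPage wait-time histogram bucket (cumulative).'
        key_labels: [le]    # label value read from the "le" result column
        values: [value]     # metric value read from the "value" column
        query: |
          SELECT bucket_le::text AS le, value
          FROM neon_perf_counters
          WHERE metric = 'getpage_wait_seconds_bucket'
```

With this mapping the view only needs a plain bucket column; the exporter attaches the le label itself, so no Prometheus-style names leak into SQL.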

Comment on lines +18 to +21
cur.execute("SELECT * FROM neon_perf_counters")
cur.execute("SELECT * FROM neon_backend_perf_counters")
Member

Surely these would be all zero?

Should the getpage wait thresholds be checked against the pageserver's getpage metric buckets, to verify they are in sync?

@hlinnaka (Contributor, Author)

Surely these would be all zero?

You mean, in this test, because it hasn't done anything? No, even just connecting to Postgres requires fetching some pages for the system catalogs that are needed for authentication.

Should the getpage wait thresholds be checked against the pageserver's getpage metric buckets, to verify they are in sync?

That would be difficult to automate. There's always some network latency, so the latencies measured from the compute are expected to be somewhat higher. But they can also be lower, if prefetching is effective; these reported wait times start the clock when we enter the smgrread() call and know that we need to read a page. If a prefetch request has been issued for the page earlier, the response might already be in flight.

Comparing the values as measured from pageserver is a very valuable thing to do manually, though. It can tell a lot about network latency and how effective the prefetching is.

@hlinnaka (Contributor, Author)

You can give the exporter an arbitrary SQL query and do the transformation there.

I know you can do that, but shouldn't the exporter itself know how to treat (the absence of) labels like le in results?

E.g. if I query select metric as __name__, bucket_range_max as le, count as value from my_metrics_view, shouldn't it automatically translate any non-null value of le into a label? AFAIK, that's what happens for every other exported metric with labels.

See e.g. https://github.com/neondatabase/neon/blob/0a8c5e1214fcd3f59767a6ca4adeb68612977e51/vm-image-spec.yaml#L439C1-L449C85 where labels are read from columns.

A-ha, now I understand. I didn't know sql_exporter can do that. Yeah, that makes sense, I'll do that.

@ololobus (Member)

There is an Epic from John for that #8926

pgxn/neon/neon.control: outdated, resolved
pgxn/neon/neon_perf_counters.h: outdated, resolved
Comment on lines +42 to +44
uint64 getpage_wait_us_count;
uint64 getpage_wait_us_sum;
uint64 getpage_wait_us_bucket[NUM_GETPAGE_WAIT_BUCKETS];
Member

Can we add per-shard and/or per-pageserver host labels here? I think this info could be useful, and it's also requested in the mentioned Epic.

@hlinnaka (Contributor, Author)

Hmm, it's not impossible, but it would be a bit more complicated. I'm going to punt on that for now so that we have something; we can add it in a follow-up PR.

* backend had to reconnect. Note that this doesn't count the first
* connection in each backend, only reconnects.
*/
uint64 pageserver_disconnects_total;
Member

Are there any other error reasons except disconnections? Like WAL waiting timeout? Should it be a separate pageserver_requests_errors then? Or just replace pageserver_disconnects_total with it?

@hlinnaka (Contributor, Author)

The pageserver could send an error, or a bogus response which is turned into an error in the compute. But those would result in transaction abort and the error would be logged. The correct number for those errors is zero, and we should rely on logs for them.

Comment on lines 14 to 16
-- Show various metrics, for each backend. Note that the values are not
-- reset when a backend exits. When a new backend starts with the backend
-- ID, it will continue accumulating the values from where the old backend
Member

NIT: The current implementation seems ambiguous to me. I see that you reused the same shared struct for both cumulative and per-backend stats; that's why the counters cannot be reset, but is the end result handy? procno is an internal concept that is hard to reason about.

@hlinnaka (Contributor, Author)

The view exposes 'pid' as well, which is pretty convenient to use, e.g. to show the stats of just your own backend:

select * from neon_backend_perf_counters where pid=pg_backend_pid();

@hlinnaka (Contributor, Author)

You can give the exporter an arbitrary SQL query and do the transformation there.

I know you can do that, but shouldn't the exporter itself know how to treat (the absence of) labels like le in results?
E.g. if I query select metric as __name__, bucket_range_max as le, count as value from my_metrics_view, shouldn't it automatically translate any non-null value of le into a label? AFAIK, that's what happens for every other exported metric with labels.
See e.g. https://github.com/neondatabase/neon/blob/0a8c5e1214fcd3f59767a6ca4adeb68612977e51/vm-image-spec.yaml#L439C1-L449C85 where labels are read from columns.

A-ha, now I understand. I didn't know sql_exporter can do that. Yeah, that makes sense, I'll do that.

Done. It now looks like this:

postgres=# select * from neon_perf_counters ;
                metric                 | bucket_le |  value   
---------------------------------------+-----------+----------
 getpage_wait_seconds_count            |           |      300
 getpage_wait_seconds_sum              |           | 0.048506
 getpage_wait_seconds_bucket           |     2e-05 |        0
 getpage_wait_seconds_bucket           |     3e-05 |        0
 getpage_wait_seconds_bucket           |     6e-05 |       71
 getpage_wait_seconds_bucket           |    0.0001 |      124
 getpage_wait_seconds_bucket           |    0.0002 |      248
 getpage_wait_seconds_bucket           |    0.0003 |      279
 getpage_wait_seconds_bucket           |    0.0006 |      297
 getpage_wait_seconds_bucket           |     0.001 |      298
 getpage_wait_seconds_bucket           |     0.002 |      298
 getpage_wait_seconds_bucket           |     0.003 |      298
 getpage_wait_seconds_bucket           |     0.006 |      300
 getpage_wait_seconds_bucket           |      0.01 |      300
 getpage_wait_seconds_bucket           |      0.02 |      300
 getpage_wait_seconds_bucket           |      0.03 |      300
 getpage_wait_seconds_bucket           |      0.06 |      300
 getpage_wait_seconds_bucket           |       0.1 |      300
 getpage_wait_seconds_bucket           |       0.2 |      300
 getpage_wait_seconds_bucket           |       0.3 |      300
 getpage_wait_seconds_bucket           |       0.6 |      300
 getpage_wait_seconds_bucket           |         1 |      300
 getpage_wait_seconds_bucket           |         2 |      300
 getpage_wait_seconds_bucket           |         3 |      300
 getpage_wait_seconds_bucket           |         6 |      300
 getpage_wait_seconds_bucket           |        10 |      300
 getpage_wait_seconds_bucket           |        20 |      300
 getpage_wait_seconds_bucket           |        30 |      300
 getpage_wait_seconds_bucket           |        60 |      300
 getpage_wait_seconds_bucket           |       100 |      300
 getpage_wait_seconds_bucket           |  Infinity |      300
 getpage_prefetch_requests_total       |           |       69
 getpage_sync_requests_total           |           |      231
 getpage_prefetch_misses_total         |           |        0
 getpage_prefetch_discards_total       |           |        0
 pageserver_requests_sent_total        |           |      323
 pageserver_requests_disconnects_total |           |        0
 pageserver_send_flushes_total         |           |      323
 file_cache_hits_total                 |           |        0
(39 rows)

And that can be converted to Prometheus-style format with a fairly simple query:

postgres=# select case when bucket_le is null then metric when bucket_le = 'Infinity' then format('%s{le="+Inf"}', metric) else format('%s{le="%s"}', metric, bucket_le::numeric) end, value from neon_perf_counters ;
                  format                   |  value   
-------------------------------------------+----------
 getpage_wait_seconds_count                |      312
 getpage_wait_seconds_sum                  | 0.051847
 getpage_wait_seconds_bucket{le="0.00002"} |        0
 getpage_wait_seconds_bucket{le="0.00003"} |        0
 getpage_wait_seconds_bucket{le="0.00006"} |       71
 getpage_wait_seconds_bucket{le="0.0001"}  |      124
 getpage_wait_seconds_bucket{le="0.0002"}  |      250
 getpage_wait_seconds_bucket{le="0.0003"}  |      288
 getpage_wait_seconds_bucket{le="0.0006"}  |      309
 getpage_wait_seconds_bucket{le="0.001"}   |      310
 getpage_wait_seconds_bucket{le="0.002"}   |      310
 getpage_wait_seconds_bucket{le="0.003"}   |      310
 getpage_wait_seconds_bucket{le="0.006"}   |      312
 getpage_wait_seconds_bucket{le="0.01"}    |      312
 getpage_wait_seconds_bucket{le="0.02"}    |      312
 getpage_wait_seconds_bucket{le="0.03"}    |      312
 getpage_wait_seconds_bucket{le="0.06"}    |      312
 getpage_wait_seconds_bucket{le="0.1"}     |      312
 getpage_wait_seconds_bucket{le="0.2"}     |      312
 getpage_wait_seconds_bucket{le="0.3"}     |      312
 getpage_wait_seconds_bucket{le="0.6"}     |      312
 getpage_wait_seconds_bucket{le="1"}       |      312
 getpage_wait_seconds_bucket{le="2"}       |      312
 getpage_wait_seconds_bucket{le="3"}       |      312
 getpage_wait_seconds_bucket{le="6"}       |      312
 getpage_wait_seconds_bucket{le="10"}      |      312
 getpage_wait_seconds_bucket{le="20"}      |      312
 getpage_wait_seconds_bucket{le="30"}      |      312
 getpage_wait_seconds_bucket{le="60"}      |      312
 getpage_wait_seconds_bucket{le="100"}     |      312
 getpage_wait_seconds_bucket{le="+Inf"}    |      312
 getpage_prefetch_requests_total           |       69
 getpage_sync_requests_total               |      243
 getpage_prefetch_misses_total             |        0
 getpage_prefetch_discards_total           |        0
 pageserver_requests_sent_total            |      335
 pageserver_requests_disconnects_total     |        0
 pageserver_send_flushes_total             |      335
 file_cache_hits_total                     |        0
(39 rows)
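The same view-to-exposition-format transformation can also be done client-side instead of in SQL. A minimal sketch, assuming rows shaped like the view's (metric, bucket_le, value) columns; the function name is made up, and the %g-style float formatting differs cosmetically from the ::numeric cast above (it keeps scientific notation like 2e-05, which is still a valid Prometheus label value):

```python
# Sketch: render (metric, bucket_le, value) rows as Prometheus exposition
# lines, mirroring the SQL CASE expression above. bucket_le is None for
# plain counters and float('inf') for the +Inf histogram bucket.
import math

def to_prometheus(rows):
    lines = []
    for metric, bucket_le, value in rows:
        if bucket_le is None:
            lines.append(f'{metric} {value}')
        elif math.isinf(bucket_le):
            lines.append(f'{metric}{{le="+Inf"}} {value}')
        else:
            # :g drops trailing zeros, analogous to the ::numeric cast
            lines.append(f'{metric}{{le="{bucket_le:g}"}} {value}')
    return lines

# A few rows copied from the view output above
rows = [
    ('getpage_wait_seconds_count', None, 312),
    ('getpage_wait_seconds_bucket', 2e-05, 0),
    ('getpage_wait_seconds_bucket', float('inf'), 312),
]
print('\n'.join(to_prometheus(rows)))
```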
