Add views for metrics about pageserver requests #9008

Open · wants to merge 4 commits into base: main
Conversation

hlinnaka (Contributor)

The metrics include, among other things, a histogram of how long we have to wait for GetPage requests, the number of reconnects, and the number of requests sent.

The metrics are not yet exported anywhere, but you can query them manually.

This is what the view looks like:

postgres=# select * from neon_perf_counters ;
                   metric                   |  value  
--------------------------------------------+---------
 getpage_wait_seconds_count                 |    6881
 getpage_wait_seconds_sum                   | 0.93566
 getpage_wait_seconds_bucket{le="0.000020"} |       0
 getpage_wait_seconds_bucket{le="0.000030"} |       0
 getpage_wait_seconds_bucket{le="0.000060"} |    1324
 getpage_wait_seconds_bucket{le="0.000100"} |    2886
 getpage_wait_seconds_bucket{le="0.000200"} |    5589
 getpage_wait_seconds_bucket{le="0.000300"} |    6611
 getpage_wait_seconds_bucket{le="0.000600"} |    6875
 getpage_wait_seconds_bucket{le="0.001"}    |    6880
 getpage_wait_seconds_bucket{le="0.002"}    |    6880
 getpage_wait_seconds_bucket{le="0.003"}    |    6880
 getpage_wait_seconds_bucket{le="0.006"}    |    6880
 getpage_wait_seconds_bucket{le="0.010"}    |    6880
 getpage_wait_seconds_bucket{le="0.020"}    |    6881
 getpage_wait_seconds_bucket{le="0.030"}    |    6881
 getpage_wait_seconds_bucket{le="0.060"}    |    6881
 getpage_wait_seconds_bucket{le="0.100"}    |    6881
 getpage_wait_seconds_bucket{le="0.200"}    |    6881
 getpage_wait_seconds_bucket{le="0.300"}    |    6881
 getpage_wait_seconds_bucket{le="0.600"}    |    6881
 getpage_wait_seconds_bucket{le="1"}        |    6881
 getpage_wait_seconds_bucket{le="2"}        |    6881
 getpage_wait_seconds_bucket{le="3"}        |    6881
 getpage_wait_seconds_bucket{le="6"}        |    6881
 getpage_wait_seconds_bucket{le="10"}       |    6881
 getpage_wait_seconds_bucket{le="20"}       |    6881
 getpage_wait_seconds_bucket{le="30"}       |    6881
 getpage_wait_seconds_bucket{le="60"}       |    6881
 getpage_wait_seconds_bucket{le="100"}      |    6881
 getpage_wait_seconds_bucket{le="+Inf"}     |    6881
 prefetch_requests_total                    |      67
 sync_requests_total                        |    6815
 pageserver_requests_sent_total             |    6899
 pageserver_requests_disconnects_total      |       0
 pageserver_send_flushes_total              |    6899
 prefetch_misses_total                      |       0
 prefetch_discards_total                    |       0
 file_cache_hits_total                      |       0
(39 rows)
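For illustration, a percentile can be estimated from cumulative buckets like these the same way Prometheus's histogram_quantile() does it, by linear interpolation within the bucket that crosses the target rank. The sketch below is not part of this PR; the helper name is made up, and the bucket values are copied from the output above.

```python
# Sketch: estimate a quantile from cumulative histogram buckets, mirroring
# Prometheus's histogram_quantile() linear interpolation. Illustrative only.

def histogram_quantile(q, buckets):
    """buckets: list of (le, cumulative_count), sorted by le ascending,
    ending with (float('inf'), total_count)."""
    total = buckets[-1][1]
    if total == 0:
        return float('nan')
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if le == float('inf'):
                return prev_le  # cannot interpolate into the +Inf bucket
            if count == prev_count:
                return le       # empty bucket, nothing to interpolate
            # linear interpolation within the crossing bucket
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le

# A few buckets from the view output above (le in seconds, cumulative count)
buckets = [
    (0.00002, 0), (0.00003, 0), (0.00006, 1324), (0.0001, 2886),
    (0.0002, 5589), (0.0003, 6611), (0.0006, 6875), (0.001, 6880),
    (float('inf'), 6881),
]
p50 = histogram_quantile(0.5, buckets)  # median GetPage wait estimate
```

With these counts the median lands between the 100 µs and 200 µs bucket boundaries, which matches eyeballing the table above.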

@hlinnaka hlinnaka requested review from a team as code owners September 16, 2024 08:51
@orca-security-us (bot) left a comment

Orca Security Scan Summary

All checks passed: Infrastructure as Code, Secrets, and Vulnerabilities each reported 0 high, 0 medium, 0 low, and 0 info issues.


github-actions bot commented Sep 16, 2024

4977 tests run: 4813 passed, 0 failed, 164 skipped (full report)


Flaky tests (7), across Postgres 14, 15, 16, and 17 (see the full report for details)

Code coverage* (full report)

  • functions: 31.8% (7415 of 23300 functions)
  • lines: 49.8% (59571 of 119717 lines)

* collected from Rust tests only


This comment is automatically updated with the latest test results.
367de54 at 2024-09-17T19:41:09.997Z

@MMeent (Contributor) commented Sep 16, 2024

getpage_wait_seconds_bucket{le="0.000200"}

Can't this use additional columns for dimensions? I'm not a fan of Prometheus's format and would appreciate not exposing such opinionated names.

@hlinnaka (Contributor, Author)

getpage_wait_seconds_bucket{le="0.000200"}

Can't this use additional columns for dimensions? I'm not a fan of Prometheus's format and would appreciate not exposing such opinionated names.

I actually wrote it that way at first, with an explicit "bucket" column. But to then convert them to the prometheus metrics format in the exporter, you need a pretty complex SQL query. I found it easier to do that in C code directly.

@MMeent (Contributor) commented Sep 16, 2024

But to then convert them to the prometheus metrics format in the exporter, you need a pretty complex SQL query

Why can't the exporter do the table-to-metrics transformation? Shouldn't it be able to handle that transformation by itself?

@hlinnaka (Contributor, Author)

But to then convert them to the prometheus metrics format in the exporter, you need a pretty complex SQL query

Why can't the exporter do the table-to-metrics transformation? Shouldn't it be able to handle that transformation by itself?

You can give the exporter an arbitrary SQL query and do the transformation there.

@MMeent (Contributor) commented Sep 16, 2024

You can give the exporter an arbitrary SQL query and do the transformation there.

I know you can do that, but shouldn't the exporter itself know how to treat (the absence of) labels like le in results?

E.g. if I query select metric as __name__, bucket_range_max as le, count as value from my_metrics_view, shouldn't it automatically translate any non-null value of le into a label? AFAIK, that's what happens for every other exported metric with labels.

See e.g. https://github.com/neondatabase/neon/blob/0a8c5e1214fcd3f59767a6ca4adeb68612977e51/vm-image-spec.yaml#L439C1-L449C85 where labels are read from columns.
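For reference, the shape MMeent describes would look roughly like the following sql_exporter collector entry. This is a hypothetical sketch, not the actual vm-image-spec.yaml contents: the collector name, metric name, query, and column names are illustrative, though key_labels and values are real sql_exporter collector fields that map result columns to labels and metric values.

```yaml
collectors:
  - collector_name: neon_perf_counters_sketch   # hypothetical name
    metrics:
      - metric_name: getpage_wait_seconds_bucket
        type: counter
        help: 'GetPage wait-time histogram bucket (cumulative).'
        key_labels: [le]    # label value read from the "le" result column
        values: [value]     # metric value read from the "value" column
        query: |
          SELECT bucket_le::text AS le, value
          FROM neon_perf_counters
          WHERE metric = 'getpage_wait_seconds_bucket'
```

With this mapping the view only needs a plain bucket column; the exporter attaches the le label itself, so no Prometheus-style names leak into SQL.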

Comment on lines +18 to +21
cur.execute("SELECT * FROM neon_perf_counters")
cur.execute("SELECT * FROM neon_backend_perf_counters")
Member

Surely these would be all zero?

Should the getpage wait thresholds be checked against the pageserver's getpage metric buckets, to verify they are in sync?

@hlinnaka (Contributor, Author)

Surely these would be all zero?

You mean, in this test, because it hasn't done anything? No, even just connecting to Postgres requires fetching some pages for the system catalogs that are needed for authentication.

Should the getpage wait thresholds be checked against the pageserver's getpage metric buckets, to verify they are in sync?

That would be difficult to automate. There's always some network latency, so the latencies measured from the compute are expected to be somewhat higher. But they can also be lower, if prefetching is effective; these reported wait times start the clock when we enter the smgrread() call and know that we need to read a page. If a prefetch request has been issued for the page earlier, the response might already be in flight.

Comparing the values as measured from pageserver is a very valuable thing to do manually, though. It can tell a lot about network latency and how effective the prefetching is.

@hlinnaka (Contributor, Author)

You can give the exporter an arbitrary SQL query and do the transformation there.

I know you can do that, but shouldn't the exporter itself know how to treat (the absence of) labels like le in results?

E.g. if I query select metric as __name__, bucket_range_max as le, count as value from my_metrics_view, shouldn't it automatically translate any non-null value of le into a label? AFAIK, that's what happens for every other exported metric with labels.

See e.g. https://github.com/neondatabase/neon/blob/0a8c5e1214fcd3f59767a6ca4adeb68612977e51/vm-image-spec.yaml#L439C1-L449C85 where labels are read from columns.

A-ha, now I understand. I didn't know sql_exporter can do that. Yeah, that makes sense, I'll do that.

@ololobus (Member)

There is an Epic from John for that #8926

pgxn/neon/neon.control: outdated, resolved
pgxn/neon/neon_perf_counters.h: outdated, resolved
Comment on lines +42 to +44
uint64 getpage_wait_us_count;
uint64 getpage_wait_us_sum;
uint64 getpage_wait_us_bucket[NUM_GETPAGE_WAIT_BUCKETS];
Member

Can we add per-shard and/or per-pageserver host labels here? I think this info could be useful, and it's also requested in the mentioned Epic.

@hlinnaka (Contributor, Author)

Hmm, it's not impossible, but it would be a bit more complicated. I'm going to punt on that for now so that we have something; we can add it in a follow-up PR.

* backend had to reconnect. Note that this doesn't count the first
* connection in each backend, only reconnects.
*/
uint64 pageserver_disconnects_total;
Member

Are there any other error reasons except disconnections? Like WAL waiting timeout? Should it be a separate pageserver_requests_errors then? Or just replace pageserver_disconnects_total with it?

@hlinnaka (Contributor, Author)

The pageserver could send an error, or a bogus response which is turned into an error in the compute. But those would result in transaction abort and the error would be logged. The correct number for those errors is zero, and we should rely on logs for them.

Comment on lines 14 to 16
-- Show various metrics, for each backend. Note that the values are not
-- reset when a backend exits. When a new backend starts with the backend
-- ID, it will continue accumulating the values from where the old backend
Member

NIT: The current implementation seems ambiguous to me. I see that you reused the same shared struct for both cumulative and per-backend stats; that's why the counters cannot be reset, but is the end result handy? procno is an internal concept that is hard to reason about.

@hlinnaka (Contributor, Author)

The view exposes 'pid' as well, which is pretty convenient to use, e.g. to show the stats of just your own backend:

select * from neon_backend_perf_counters where pid=pg_backend_pid();

@hlinnaka (Contributor, Author)

You can give the exporter an arbitrary SQL query and do the transformation there.

I know you can do that, but shouldn't the exporter itself know how to treat (the absence of) labels like le in results?
E.g. if I query select metric as __name__, bucket_range_max as le, count as value from my_metrics_view, shouldn't it automatically translate any non-null value of le into a label? AFAIK, that's what happens for every other exported metric with labels.
See e.g. https://github.com/neondatabase/neon/blob/0a8c5e1214fcd3f59767a6ca4adeb68612977e51/vm-image-spec.yaml#L439C1-L449C85 where labels are read from columns.

A-ha, now I understand. I didn't know sql_exporter can do that. Yeah, that makes sense, I'll do that.

Done. It now looks like this:

postgres=# select * from neon_perf_counters ;
                metric                 | bucket_le |  value   
---------------------------------------+-----------+----------
 getpage_wait_seconds_count            |           |      300
 getpage_wait_seconds_sum              |           | 0.048506
 getpage_wait_seconds_bucket           |     2e-05 |        0
 getpage_wait_seconds_bucket           |     3e-05 |        0
 getpage_wait_seconds_bucket           |     6e-05 |       71
 getpage_wait_seconds_bucket           |    0.0001 |      124
 getpage_wait_seconds_bucket           |    0.0002 |      248
 getpage_wait_seconds_bucket           |    0.0003 |      279
 getpage_wait_seconds_bucket           |    0.0006 |      297
 getpage_wait_seconds_bucket           |     0.001 |      298
 getpage_wait_seconds_bucket           |     0.002 |      298
 getpage_wait_seconds_bucket           |     0.003 |      298
 getpage_wait_seconds_bucket           |     0.006 |      300
 getpage_wait_seconds_bucket           |      0.01 |      300
 getpage_wait_seconds_bucket           |      0.02 |      300
 getpage_wait_seconds_bucket           |      0.03 |      300
 getpage_wait_seconds_bucket           |      0.06 |      300
 getpage_wait_seconds_bucket           |       0.1 |      300
 getpage_wait_seconds_bucket           |       0.2 |      300
 getpage_wait_seconds_bucket           |       0.3 |      300
 getpage_wait_seconds_bucket           |       0.6 |      300
 getpage_wait_seconds_bucket           |         1 |      300
 getpage_wait_seconds_bucket           |         2 |      300
 getpage_wait_seconds_bucket           |         3 |      300
 getpage_wait_seconds_bucket           |         6 |      300
 getpage_wait_seconds_bucket           |        10 |      300
 getpage_wait_seconds_bucket           |        20 |      300
 getpage_wait_seconds_bucket           |        30 |      300
 getpage_wait_seconds_bucket           |        60 |      300
 getpage_wait_seconds_bucket           |       100 |      300
 getpage_wait_seconds_bucket           |  Infinity |      300
 getpage_prefetch_requests_total       |           |       69
 getpage_sync_requests_total           |           |      231
 getpage_prefetch_misses_total         |           |        0
 getpage_prefetch_discards_total       |           |        0
 pageserver_requests_sent_total        |           |      323
 pageserver_requests_disconnects_total |           |        0
 pageserver_send_flushes_total         |           |      323
 file_cache_hits_total                 |           |        0
(39 rows)

And that can be converted to Prometheus-style format with a fairly simple query:

postgres=# select case when bucket_le is null then metric when bucket_le = 'Infinity' then format('%s{le="+Inf"}', metric) else format('%s{le="%s"}', metric, bucket_le::numeric) end, value from neon_perf_counters ;
                  format                   |  value   
-------------------------------------------+----------
 getpage_wait_seconds_count                |      312
 getpage_wait_seconds_sum                  | 0.051847
 getpage_wait_seconds_bucket{le="0.00002"} |        0
 getpage_wait_seconds_bucket{le="0.00003"} |        0
 getpage_wait_seconds_bucket{le="0.00006"} |       71
 getpage_wait_seconds_bucket{le="0.0001"}  |      124
 getpage_wait_seconds_bucket{le="0.0002"}  |      250
 getpage_wait_seconds_bucket{le="0.0003"}  |      288
 getpage_wait_seconds_bucket{le="0.0006"}  |      309
 getpage_wait_seconds_bucket{le="0.001"}   |      310
 getpage_wait_seconds_bucket{le="0.002"}   |      310
 getpage_wait_seconds_bucket{le="0.003"}   |      310
 getpage_wait_seconds_bucket{le="0.006"}   |      312
 getpage_wait_seconds_bucket{le="0.01"}    |      312
 getpage_wait_seconds_bucket{le="0.02"}    |      312
 getpage_wait_seconds_bucket{le="0.03"}    |      312
 getpage_wait_seconds_bucket{le="0.06"}    |      312
 getpage_wait_seconds_bucket{le="0.1"}     |      312
 getpage_wait_seconds_bucket{le="0.2"}     |      312
 getpage_wait_seconds_bucket{le="0.3"}     |      312
 getpage_wait_seconds_bucket{le="0.6"}     |      312
 getpage_wait_seconds_bucket{le="1"}       |      312
 getpage_wait_seconds_bucket{le="2"}       |      312
 getpage_wait_seconds_bucket{le="3"}       |      312
 getpage_wait_seconds_bucket{le="6"}       |      312
 getpage_wait_seconds_bucket{le="10"}      |      312
 getpage_wait_seconds_bucket{le="20"}      |      312
 getpage_wait_seconds_bucket{le="30"}      |      312
 getpage_wait_seconds_bucket{le="60"}      |      312
 getpage_wait_seconds_bucket{le="100"}     |      312
 getpage_wait_seconds_bucket{le="+Inf"}    |      312
 getpage_prefetch_requests_total           |       69
 getpage_sync_requests_total               |      243
 getpage_prefetch_misses_total             |        0
 getpage_prefetch_discards_total           |        0
 pageserver_requests_sent_total            |      335
 pageserver_requests_disconnects_total     |        0
 pageserver_send_flushes_total             |      335
 file_cache_hits_total                     |        0
(39 rows)
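The same view-to-exposition-format transformation can also be done client-side instead of in SQL. A minimal sketch, assuming rows shaped like the view's (metric, bucket_le, value) columns; the function name is made up, and the %g-style float formatting differs cosmetically from the ::numeric cast above (it keeps scientific notation like 2e-05, which is still a valid Prometheus label value):

```python
# Sketch: render (metric, bucket_le, value) rows as Prometheus exposition
# lines, mirroring the SQL CASE expression above. bucket_le is None for
# plain counters and float('inf') for the +Inf histogram bucket.
import math

def to_prometheus(rows):
    lines = []
    for metric, bucket_le, value in rows:
        if bucket_le is None:
            lines.append(f'{metric} {value}')
        elif math.isinf(bucket_le):
            lines.append(f'{metric}{{le="+Inf"}} {value}')
        else:
            # :g drops trailing zeros, analogous to the ::numeric cast
            lines.append(f'{metric}{{le="{bucket_le:g}"}} {value}')
    return lines

# A few rows copied from the view output above
rows = [
    ('getpage_wait_seconds_count', None, 312),
    ('getpage_wait_seconds_bucket', 2e-05, 0),
    ('getpage_wait_seconds_bucket', float('inf'), 312),
]
print('\n'.join(to_prometheus(rows)))
```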
