Rate limit restructure (server-side) #933
base: master
Conversation
Ha. Interesting - I would have thought the ++ would have caused an error. Good to know!
It did! However, the test was wrong as well: because the sliding window was set to 0, no timestamps were actually stored for the peers. So that part of the code never ran until I started up a node and began to experiment with the config values.
…quickstart documentation
…seemingly causes tests to fail
humaite left a comment:
Looks okay to me at first glance. I will probably re-read it later.
gen_server:stop(LimiterRef).

%% gen_server callbacks
init([Args]) ->
Why use [Args] and not just Args?
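For context on the question, a minimal sketch of the difference (the start_link/1 wrapper shown here is illustrative): the second argument of gen_server:start_link/3 is handed to init/1 verbatim, so wrapping it in a list only changes the pattern init/1 must match.

```erlang
%% Wrapped form: init/1 must unwrap the single-element list.
start_link(Args) ->
    gen_server:start_link(?MODULE, [Args], []).

init([Args]) ->
    {ok, Args}.
%% With gen_server:start_link(?MODULE, Args, []) the callback would
%% simply be init(Args) -> {ok, Args}.
```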
TEST_PATH="$(./rebar3 as ${TEST_PROFILE} path)"

## TODO: Generate path for all apps -> Should we fetch this from somewhere?
APPS=("arweave" "arweave_config" "arweave_limiter" "arweave_diagnostic")
You could remove the bash-style array; it's not really portable and could cause trouble. In a POSIX shell, a list is just a string of words separated by spaces or null characters:

APPS="arweave arweave_config arweave_limiter arweave_diagnostic"

In this case, the code on L523 will simply look like this:

for app in ${APPS}
do
    ...
done

Using bash-specific features is usually not a good idea.
PARAMS="-pa ${TEST_PATH} ${TEST_PATH_BASE} -config ${TEST_CONFIG} -noshell"
TEST_PATH="$(./rebar3 as ${TEST_PROFILE} path)"

## TODO: Generate path for all apps -> Should we fetch this from somewhere?
In theory, those paths are generated by rebar3, but it seems that for tests this was not done correctly.
@@ -0,0 +1,69 @@
-module(ar_http_iface_rate_limiter_middleware).
Can you add a header with a bit of documentation to explain the role of this middleware?
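A minimal sketch of what such a header might look like, based on the middleware's role as discussed in this thread (the wording is illustrative):

```erlang
%%% @doc Cowboy middleware enforcing server-side rate limits.
%%% Runs before the request handlers: maps the requesting IP address and
%%% request path to a Rate Limiter Group, registers the request with the
%%% corresponding limiter process, and rejects the request when the
%%% limiter denies it.
-module(ar_http_iface_rate_limiter_middleware).
```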
config_peer_to_ip_addr({A, B, C, D, _}) -> {A, B, C, D}.

path_to_limiter_ref([<<"chunk">> | _]) -> chunk;
If I understand correctly, this function reads the path from the client request and returns the kind of limiter to use. A data structure like a proplist or a map would probably be easier to modify in the future. Not sure it would be easy to do right now, though.
The problem I see here is that if we add a new path, we will need to add it to this function (at least if it needs a special rate limit).
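For illustration, a map-based variant might look like the sketch below (the route table contents and the default atom are assumptions, not the PR's actual limiter set):

```erlang
%% Hypothetical lookup-table variant of path_to_limiter_ref/1.
-define(LIMITER_ROUTES, #{<<"chunk">> => chunk, <<"tx">> => tx}).

path_to_limiter_ref([Head | _]) ->
    maps:get(Head, ?LIMITER_ROUTES, default);
path_to_limiter_ref([]) ->
    default.
```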
Yes, it's certainly going to take some work to add a new Rate Limiter Group (RLG):

1. Add an RLG process to arweave_limiter_sup.
2. Add RLG config parameters as fields to the arweave_config config record.
   2b) Add defaults for the parameters.
3. Add Rate Limiter routing logic to ar_http_iface_rate_limiter_middleware.

IMHO, this mapping function is quite an idiomatic way to map paths to atoms (no dynamically named atoms, no need for lookups). See the sketch below.
- I'd argue that 2) is a bigger burden than 3) (extending this function).
- I'd like to point out that this PR doesn't aim to introduce dynamic RLGs.
- With a different approach to how we deal with the configuration, I think we can introduce RLGs started in a dynamic manner in the future, if necessary.
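For reference, the function-head style being defended looks like this; only the <<"chunk">> clause appears in the diff above, the other clauses are illustrative:

```erlang
%% Pattern matching on the first path segment; no dynamic atoms, no lookups.
path_to_limiter_ref([<<"chunk">> | _]) -> chunk;
path_to_limiter_ref([<<"tx">> | _]) -> tx;      %% illustrative
path_to_limiter_ref(_) -> default.              %% illustrative fallback
```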
We are not adding a lot of new endpoints, but perhaps documenting this procedure somewhere would be nice.
-export([start/0, ban_peer/2, is_peer_banned/1, cleanup_ban/1]).
-export([start_link/0]).

-ifdef(TEST).
Suggested change:
- -ifdef(TEST).
+ -ifdef(AR_TEST).
We've had some name collisions with the ?TEST define in some dependencies, so we use ?AR_TEST now to indicate whether or not Arweave is being run in test mode (it gets set in rebar.config).
There will be quite a few of these; I can change them all to AR_TEST.
execute(Req, Env) ->
	IPAddr = requesting_ip_addr(Req),
	{ok, Config} = arweave_config:get_env(),
	case lists:member(blacklist, Config#config.disable) of
To be honest, I didn't realize we had a disable blacklist option, and I'm not sure it is currently in use. It looks like it would completely disable rate limiting on the server side. @ldmberman @humaite are you aware of any nodes that use it?
It's possible some large miners use it on internal nodes used to serve data to mining nodes (for example). But I'm not sure it would even be that effective, since by default clients will self-throttle regardless of the server-side behavior.
This is a long way of asking:
- @khetzl did you mean to remove this option? (e.g. because you've replaced it with another configuration option)
- Even if the removal was unintentional... I'm not sure we actually need or want this particular flag. In general the enable and disable options are poorly understood and rarely used, so if we want to disable server-side rate limiting, a more formal config option is a better choice. The only question is whether anyone is using the current flag?
I intended to remove this; we shouldn't allow the public unrestricted access to resources.
Trusted peers can be listed so their requests are limited via a less restrictive Rate Limiter Group. (Currently, no limiting at all.)
Note: even if the disable blacklist flag is enabled in the current release, a global concurrency limit is enforced by cowboy/ranch.
Got it. I agree it's correct to remove. However, it's worth providing some context:
- In cases like this where it is the right call to remove an option, we may still need to flag and communicate the removal for any miners that are relying on it, both in the release notes and in Discord. I suspect this option isn't used much (or at all), so it's probably fine to silently remove.
- Many miners implement their own security in front of their node. Specifically, they will run nodes on a LAN without a direct internet connection - so even if they opt in to using the disable blacklist flag, it doesn't necessarily mean they or we are providing unrestricted access to the public.
- I don't think this flag is used, but if it is, it is likely due to the above. We've had a few miners run large networks with many internal (non publicly accessible) nodes. In the past they tried using local_peers, but since arweave doesn't currently support any sort of wildcards/subnetting, and because node shutdown/boot can take a long time, some of the miners had trouble managing rate limiting for their internal nodes. It's in that scenario where I could see disable blacklist being useful (set up an internal data-sharing node, disable blacklist, and then let all your other nodes connect to it without worrying about needing to restart periodically to update local_peers).

All that said: I do think it's fine to remove. But I wanted to call out that context, as our operating model is often different from that of non-blockchain companies (e.g. we build infrastructure software to be run by 3rd parties as well as by us internally, and those 3rd parties vary from novice to enterprise-level).
Yes, the client-side still relies on this. You're right, it will be removed later. Probably the most important change we want to implement on the client side is peers not relying on their local configuration for throttling, but altering their behaviour (throttling) dynamically according to the remote peer's response headers. That would render the local throttling configuration obsolete.
-define(DEFAULT_HTTP_API_LIMITER_IS_MANUAL_REDUCTION_DISABLED, false).

%% General RLG
-define(DEFAULT_HTTP_API_LIMITER_GENERAL_SLIDING_WINDOW_LIMIT, 150).
These are the defaults that apply across the board to all endpoints unless specifically overridden?
'http_api.limiter.get_previous_vdf_session.is_manual_reduction_disabled' =
	?DEFAULT_HTTP_API_LIMITER_IS_MANUAL_REDUCTION_DISABLED,

'http_api.limiter.metrics.sliding_window_limit' =
@khetzl @humaite What's the recommended approach in the new config system for how deeply to nest configs?
E.g. is there a clear preference between http_api_limiter.metrics.sliding_window_timestamp_cleanup_interval and http_api_limiter.metrics.sliding_window.timestamp_cleanup_interval?
Especially in a JSON format, I could see grouping all the sliding_window or leaky_tick options together making it a little easier to keep everything straight.
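To illustrate the trade-off, assuming (hypothetically) a config representation that supports nesting, the grouped form would collect the sliding_window options under one key; the values below are illustrative:

```erlang
%% Flat keys, as in this PR:
%%   'http_api.limiter.metrics.sliding_window_limit'
%%   'http_api.limiter.metrics.sliding_window_timestamp_cleanup_interval'
%% Hypothetical grouped form:
#{http_api => #{limiter => #{metrics => #{
    sliding_window => #{limit => 150,
                        timestamp_cleanup_interval => 1000}}}}}.
```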
%%% @doc Arweave Rate Limiter.
%%%
%%% `arweave_limiter' module is an interface to the Arweave
%%% Rate Limiter functionality.
Nice. I like this approach - really helps with modularization 👍
reduce_for_peer(LimiterRef, Peer) ->
	Result = gen_server:call(LimiterRef, {reduce_for_peer, Peer}),
	Result == ok andalso prometheus_counter:inc(ar_limiter_reduce_requests_total,
I haven't seen this structure before. Any benefit vs.

ok = Result = gen_server:call(LimiterRef, {reduce_for_peer, Peer}),
prometheus_counter:inc(ar_limiter_reduce_requests_total,

Ah wait, I think I see the distinction. My version would crash; your version would just silently not call prometheus_counter:inc, right?
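A sketch contrasting the two forms (the counter call is simplified from the diff, which passes additional arguments):

```erlang
reduce_for_peer(LimiterRef, Peer) ->
    Result = gen_server:call(LimiterRef, {reduce_for_peer, Peer}),
    %% Form 1: ok = Result asserts success; any non-ok reply crashes the
    %% caller with a badmatch error.
    %% Form 2 (used in the PR): short-circuit evaluation; the counter is
    %% incremented only when Result =:= ok, other replies are ignored.
    Result == ok andalso prometheus_counter:inc(ar_limiter_reduce_requests_total),
    Result.
```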
%% gen_server callbacks
init([Args]) ->
	process_flag(trap_exit, true),
We've gone back and forth on process_flag(trap_exit, true). At one point we were adding it to all gen_servers, and then we switched to only adding it when there was a specific cleanup action we wanted to take on exit.
What's the rationale for trapping the exit here?
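For context on the question, a minimal sketch of when trapping exits matters for a gen_server: without it, an exit signal from the supervisor terminates the process immediately and terminate/2 is never called.

```erlang
init(Args) ->
    %% Trapping exits converts exit signals into messages, so a shutdown
    %% request from the supervisor reaches terminate/2 below.
    process_flag(trap_exit, true),
    {ok, Args}.

terminate(_Reason, _State) ->
    %% Without trap_exit, this cleanup hook would be skipped on shutdown.
    ok.
```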
case Concurrency > ConcurrencyLimit of
	true ->
		%% Concurrency Hard Limit
		?LOG_WARNING([{event, ar_limiter_reject}, {reason, concurrency},
We may want to remove all of these ?LOG_WARNINGs or, at the very least, flip them to ?LOG_DEBUG.
We wrestle with how verbosely to log events that are initiated by clients. We've had issues in the past where a malicious client would attempt to flood a miner's logs via inexpensive HTTP requests.
I guess as a rule of thumb, if a log can be generated by a client, we should aim to have the event that triggers the log be difficult/rare. Although log bloat is a constant battle, and I'm sure there are places in the code where this rule of thumb has been violated.
register_concurrent(
	Peer, FromPid, ConcurrentRequests, ConcurrentMonitors),
{reply, {register, leaky},
	State#{leaky_tokens => NewLeakyTokens,
Question for my own understanding: what's the rationale for => vs. := here? => is like an upsert, right? And := will fail if the key doesn't already exist? Why do we only want to fail on concurrent_monitors?
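For reference, the semantics in question, as a minimal sketch:

```erlang
maps_demo() ->
    M = #{a => 1},
    _ = M#{b => 2},    %% =>: inserts the new key b (upsert)
    _ = M#{a := 3},    %% :=: updates a, which already exists
    M#{c := 4}.        %% :=: crashes with a {badkey,c} error, c is absent
```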
	concurrent_monitors := NewMonitors}}
end;
_ ->
	{NewRequests, NewMonitors} =
It's possible that we hit this branch (i.e. we have capacity under the sliding window limit) while also still having a non-zero Tokens count, right? I.e. we have requests registered against the burst limit and also have space in the sliding window.
I guess I could see this happening if we recently exceeded the sliding window limit and started registering tokens against the burst limit, and then some capacity opened up in the sliding window before we had finished "draining the burst limit bucket".
Is that correct? And that's expected, right?
	Accept
end.

reduce_for_peer(LimiterRef, Peer) ->
Where is this called? I only see it called from ar_http_iface_middleware:handle_post_tx_accepted. I was expecting to see it called whenever an inbound HTTP request is processed, but perhaps I misunderstood.
Ah yeah, looks like I misunderstood. Reading the comment, reduce_for_peer is a special case so that we don't penalize peers for sharing valid transactions with us. The normal leaky bucket and window counter tracking is handled elsewhere.
Maybe add a comment to this function to describe it? The main relevant point that I see is that this function is not the standard way to reduce the request count for a peer, but is an edge-case handler.
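A sketch of the suggested comment; the wording is illustrative, drawn from the discussion above:

```erlang
%% Note: this is not the standard accounting path. It is an edge-case
%% handler that refunds a request slot so peers are not penalized for
%% sharing valid transactions with us; the normal leaky bucket and
%% sliding window tracking happens elsewhere.
reduce_for_peer(LimiterRef, Peer) ->
    gen_server:call(LimiterRef, {reduce_for_peer, Peer}).
```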
	Timestamps.

add_and_order_timestamps(Ts, Timestamps) ->
	lists:reverse(do_add_and_order_timestamps(Ts, lists:reverse(Timestamps))).
Suggested change:
+ %% Timestamps is ordered oldest to newest. However we expect Ts to usually be the most recent timestamp. To optimize for this case we reverse, add (likely to the front), then re-reverse
  lists:reverse(do_add_and_order_timestamps(Ts, lists:reverse(Timestamps))).
Or some other comment just to explain the double-reverse
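To make the optimization concrete, one possible shape of the helper, assuming the reversed list is newest-first (this is an illustration, not necessarily the PR's actual implementation):

```erlang
%% Insert Ts into a newest-first list; the common case is O(1).
do_add_and_order_timestamps(Ts, [Newest | _] = Rev) when Ts >= Newest ->
    [Ts | Rev];
do_add_and_order_timestamps(Ts, [Newest | Rest]) ->
    [Newest | do_add_and_order_timestamps(Ts, Rest)];
do_add_and_order_timestamps(Ts, []) ->
    [Ts].
```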
ok = prometheus_histogram:new([
	{name, ar_limiter_response_time_milliseconds},
	{help, "Time it took for the limiter to respond to requests"},
	{buckets, [infinity]}, %% For meaningful results on this, we need buckets
Without buckets we can still get an average over time, because the histogram includes xxx_count and xxx_sum fields. Most of our internal charts just use this average-over-time value to track performance changes. But if some degree of bucketing would be helpful for this metric (e.g. if the distribution is skewed per timeslice), definitely add them!
We have a broader issue of how to deal with our growing number of metrics, but I think we'll likely need to tackle that holistically; until then, feel free to add whatever metrics you think are necessary.
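If bucketing does turn out to be useful, a sketch of what explicit boundaries might look like with the prometheus library (the boundary values are guesses, not recommendations):

```erlang
ok = prometheus_histogram:new([
    {name, ar_limiter_response_time_milliseconds},
    {help, "Time it took for the limiter to respond to requests"},
    %% Illustrative millisecond boundaries; tune to the observed distribution.
    {buckets, [1, 5, 10, 50, 100, 500, 1000]}
]).
```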
	]},
	{test, [
-		{deps, [{meck, "0.8.13"}]},
+		{deps, [{meck, "1.1.0"}]},
Did upgrading meck address the bug you noticed?
PR looks great! Most of my comments are minor. One annoying comment: unfortunately we're still using tab indenting rather than spaces. We'd like to reformat everything, but until then it would be good to have new code indented with tabs. One question: have you been able to run any performance tests using the new code? My guess is that the impact of the extra logic is minimal. The only potential bottleneck I can see is if the …
Multiple rate limiter actors to be called from middleware/cowboy_handler processes.
Implementation based on Leaky Bucket Token (https://gist.github.com/humaite/21a84c3b3afac07fcebe476580f3a40b) with additional limits for concurrency (similar to the Ranch concurrency limiter).
It could potentially allow complex rule sets to map requests to limiter pools and/or custom settings via the (reworked) ar_config.
The current implementation is not "wired in": the new middleware module is not added to ar_http_iface_server, and the supervisor is not yet part of the ar_node supervision tree. The functionality could be extended with values helpful to regulate clients via the Status 429 response headers.
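As a sketch of that possible extension (the header values are illustrative; retry-after is a standard HTTP header, and the x-ratelimit-* names are a common convention, not something this PR defines):

```erlang
%% Hypothetical 429 reply from the middleware, hinting clients how to throttle.
reply_rate_limited(Req) ->
    cowboy_req:reply(429, #{
        <<"retry-after">> => <<"30">>,          %% seconds until retry
        <<"x-ratelimit-limit">> => <<"150">>,   %% e.g. sliding window limit
        <<"x-ratelimit-remaining">> => <<"0">>
    }, Req).
```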