Add support for SSL error handling #9

minfrin · 2025-07-01T11:01:05Z

Allow the registration of an optional callback using serf_ssl_error_cb_set().
If the callback is registered, return a fixed string describing the error as created by the underlying crypto library.

Example:

[minfrin@rocky9 subversion]$ svn info https://svn.example.com/svn/example/core/
svn: E170013: Unable to connect to a repository at URL 'https://svn.example.com/svn/example/core'
svn: E120170: TLS: error:0308010C:digital envelope routines::unsupported
svn: E120170: TLS: could not parse PKCS12: /home/minfrin/.my-cert.p12

- Allow the registration of an optional callback using serf_ssl_error_cb_set().
- If the callback is registered, return a fixed string describing the error as created by the underlying crypto library.

Example:

[minfrin@rocky9 subversion]$ svn info https://svn.example.com/svn/example/core/
svn: E170013: Unable to connect to a repository at URL 'https://svn.example.com/svn/example/core'
svn: E120171: TLS: error:0308010C:digital envelope routines::unsupported
svn: E120171: Error running context: An error occurred during SSL communication
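For readers who want to see the shape of the mechanism, here is a minimal sketch of an optional error callback stored on a context. It is not serf's code: `mock_ssl_context_t`, `mock_error_cb_set()` and `mock_report_error()` are hypothetical stand-ins, `apr_status_t` is replaced by a plain `int` so the sketch compiles without APR, and the callback shape (baton, status, message) is inferred from the excerpts quoted in this PR.

```c
#include <stddef.h>

/* Stand-in for APR's apr_status_t so the sketch builds without APR. */
typedef int apr_status_t;
#define APR_SUCCESS 0

/* Callback shape inferred from the excerpts in this PR (assumption). */
typedef apr_status_t (*ssl_error_cb_t)(void *baton,
                                       apr_status_t status,
                                       const char *message);

/* Hypothetical mock of the relevant context fields; not serf's struct. */
typedef struct mock_ssl_context_t {
    ssl_error_cb_t error_callback;
    void *error_baton;
} mock_ssl_context_t;

/* Mock setter mirroring the shape of serf_ssl_error_cb_set(). */
static void mock_error_cb_set(mock_ssl_context_t *ctx,
                              ssl_error_cb_t callback,
                              void *baton)
{
    ctx->error_callback = callback;
    ctx->error_baton = baton;
}

/* Mock failure path: the callback is optional, so check before calling. */
static void mock_report_error(mock_ssl_context_t *ctx,
                              apr_status_t status,
                              const char *detail)
{
    if (ctx->error_callback)
        ctx->error_callback(ctx->error_baton, status, detail);
}

/* Example callback: count how many errors were reported via the baton. */
static apr_status_t count_errors(void *baton,
                                 apr_status_t status,
                                 const char *message)
{
    (void)status;
    (void)message;
    ++*(int *)baton;
    return APR_SUCCESS;
}
```

With no callback registered, `mock_report_error()` does nothing; after `mock_error_cb_set(&ctx, count_errors, &counter)` each reported error increments the counter.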

brainy · 2025-07-01T11:08:55Z

buckets/ssl_buckets.c

    ssl_ctx->protocol_userdata = NULL;

+    ssl_ctx->error_callback = NULL;
+    ssl_ctx->error_userdata = NULL;


We call those batons, so: error_baton. Hmph, should do the same for protocol_userdata ⟶ protocol_baton.

It has been made so.

brainy · 2025-07-01T11:09:53Z

serf_bucket_types.h

+ * Callback type for detailed TLS error strings.
+ */
+typedef apr_status_t (*serf_ssl_error_cb_t)(
+    void *data,


Again: void *baton.

brainy · 2025-07-01T11:09:55Z

serf_bucket_types.h

+void serf_ssl_error_cb_set(
+    serf_ssl_context_t *context,
+    serf_ssl_error_cb_t callback,
+    void *data);


And here, too.

brainy · 2025-07-01T11:17:12Z

Since this callback is on the context, it's serialised within the context run right? I assume that the idea is to store those messages at the caller and report them if serf_context_run() fails (or report them as warnings if it doesn't, I guess.)

If that's the case, it would be nice to document that in a docstring in serf_bucket_types.h. Right now it's not at all obvious how that callback should be implemented.

minfrin · 2025-07-01T11:43:54Z

Since this callback is on the context, it's serialised within the context run right? I assume that the idea is to store those messages at the caller and report them if serf_context_run() fails (or report them as warnings if it doesn't, I guess.)

If that's the case, it would be nice to document that in a docstring in serf_bucket_types.h. Right now it's not at all obvious how that callback should be implemented.

All of this is pretty much the caller's problem.

The callback just fires every time there is an error, it's up to the caller to pass a baton that makes sense, and a callback routine that understands the baton.

In existing subversion, the context run keeps going until an SSL error happens, and if an SSL error happens the whole context is shut down and the client gives up. In this case the callback fires first, and subversion (shortly) makes a copy of the message for later. When later comes, any error message from a callback is added to svn_error_t.

Other applications could work completely differently, serf shouldn't care.
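The "copy now, report later" pattern described above can be sketched roughly as follows. This is an illustration only, not the Subversion patch: `error_baton_t`, `save_ssl_error()` and `format_failure()` are hypothetical names, `apr_status_t` is a plain `int` stand-in, and the callback shape is inferred from the excerpts in this PR.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for APR's apr_status_t so the sketch builds without APR. */
typedef int apr_status_t;
#define APR_SUCCESS 0

/* Hypothetical application baton: a private copy of the latest detail. */
typedef struct error_baton_t {
    char detail[256];
    int has_detail;
} error_baton_t;

/* Callback: copy the message immediately, since it is only assumed to be
   valid until the callback returns. Long messages are truncated. */
static apr_status_t save_ssl_error(void *baton,
                                   apr_status_t status,
                                   const char *message)
{
    error_baton_t *eb = baton;
    (void)status;
    snprintf(eb->detail, sizeof(eb->detail), "%s", message);
    eb->has_detail = 1;
    return APR_SUCCESS;
}

/* Called by the application once its context run has failed: put the
   saved detail (if any) in front of the generic failure text. */
static void format_failure(const error_baton_t *eb,
                           const char *generic,
                           char *out,
                           size_t outlen)
{
    if (eb->has_detail)
        snprintf(out, outlen, "TLS: %s\n%s", eb->detail, generic);
    else
        snprintf(out, outlen, "%s", generic);
}
```

The application owns both the baton and the reporting step, which matches the point that the library does not need to know what the caller does with the message.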

brainy · 2025-07-01T11:48:08Z

It's not about Serf's caring, it's about the user knowing the callback will get called (just) before serf_context_run() returns (with an error status?). "Use the source, Luke!" is fine until it suddenly isn't. :)

minfrin · 2025-07-01T12:05:22Z

It's not about Serf's caring, it's about the user knowing the callback will get called (just) before serf_context_run() returns (with an error status?). "Use the source, Luke!" is fine until it suddenly isn't. :)

Does the added description make sense?

It's literally "here is the detail string for the failure you are about to experience, log it, add it to your error stack, do with it as you will".

Unfortunately, the best option would have been for the code not to contain a "Detect more specific errors?" comment followed by no way to return a more specific error. We're doing the best we can with what we have available.

Add the most detailed underlying crypto library error string to the error stack when the context fails due to an SSL failure. SSL errors are no longer reduced to "an error has occurred".

This relies on the serf_ssl_error_cb_t callback as provided by serf in apache/serf#9.

Example:

[minfrin@rocky9 subversion]$ svn info https://svn.example.com/svn/example/core/
svn: E170013: Unable to connect to a repository at URL 'https://svn.example.com/svn/example/core'
svn: E120171: TLS: error:0308010C:digital envelope routines::unsupported
svn: E120171: Error running context: An error occurred during SSL communication

dsahlberg-apache-org · 2025-07-01T16:33:52Z

Maybe this is the same question already asked by Brane, and I'm just not clever enough to understand it. But the whole design seems a little bit convoluted to me.
Why not have something similar to what OpenSSL seems to do: when a Serf function returns non-success, the caller can call something in Serf to say "Give me the first error". This could then be done repeatedly in case there is more than one error.
In the case of OpenSSL errors, that function could be just a wrapper around OpenSSL's ERR_get_error().
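To make the pull-style alternative concrete, here is a rough sketch of a small error queue that the caller drains after a failure. It follows the ERR_get_error() convention of returning the oldest code and 0 when nothing is left. The type and function names (`error_queue_t`, `error_queue_push()`, `error_queue_pop()`) are hypothetical and not part of serf; note that the library has to hold on to the errors until the caller asks for them.

```c
#include <stddef.h>

#define ERROR_QUEUE_MAX 8

/* Hypothetical fixed-size FIFO of error codes; not part of serf. */
typedef struct error_queue_t {
    unsigned long codes[ERROR_QUEUE_MAX];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of stored entries */
} error_queue_t;

/* Record a code. Returns 1 on success, 0 if the queue is full and the
   code was dropped. */
static int error_queue_push(error_queue_t *q, unsigned long code)
{
    if (q->count == ERROR_QUEUE_MAX)
        return 0;
    q->codes[(q->head + q->count) % ERROR_QUEUE_MAX] = code;
    q->count++;
    return 1;
}

/* Remove and return the oldest code, or 0 when the queue is empty. */
static unsigned long error_queue_pop(error_queue_t *q)
{
    unsigned long code;

    if (q->count == 0)
        return 0;
    code = q->codes[q->head];
    q->head = (q->head + 1) % ERROR_QUEUE_MAX;
    q->count--;
    return code;
}
```

A caller would loop on `error_queue_pop()` until it returns 0, in the same way OpenSSL callers loop on ERR_get_error().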

brainy · 2025-07-01T16:47:02Z

Serf is asynchronous, and one serf-context can juggle multiple connections. You need a callback if you want to have a hope in hell of keeping all this straight on the error-reporting side.

That said, however: the callback doesn't know which connection a given error belongs to. That could be a problem. A typical case would be a context with one SSL connection to a server and one connection to an OCSP responder that validates the server cert. It'd be strange if, e.g., OCSP connection errors were reported against a Subversion server.

OK, OCSP connections shouldn't use SSL. Which brings up the point: why would this callback only be used to report SSL errors, when there can be other kinds of error descriptions available for other connection types, and they, too, can't know what actually happened.

minfrin · 2025-07-01T17:36:15Z

Maybe this is the same question already asked by Brane, and I'm just not clever enough to understand it. But the whole design seems a little bit convoluted to me. Why not have something similar to what OpenSSL seems to do: when a Serf function returns non-success, the caller can call something in Serf to say "Give me the first error". This could then be done repeatedly in case there is more than one error. In the case of OpenSSL errors, that function could be just a wrapper around OpenSSL's ERR_get_error().

The problem with that approach is that it "keeps stock".

What you want is to hand over the errors you encounter with the least effort possible. Right now in a trivial implementation you could just write the errors to stderr from inside the callback. No copying, no saving state anywhere, you're done.

Subversion is more sophisticated, it has the ability to stack errors. This lines up nicely where the callback can add an additional error to the stack.
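The "trivial implementation" mentioned above might look something like this sketch. It assumes the (baton, status, message) callback shape seen in the excerpts, uses a plain `int` in place of `apr_status_t`, and treats the baton as an optional `FILE *`; `print_ssl_error()` is a hypothetical name.

```c
#include <stdio.h>

/* Stand-in for APR's apr_status_t so the sketch builds without APR. */
typedef int apr_status_t;
#define APR_SUCCESS 0

/* Write the detail string straight to a stream from inside the callback.
   The baton is an optional FILE *; stderr is used when it is NULL.
   Nothing is copied or retained after the callback returns. */
static apr_status_t print_ssl_error(void *baton,
                                    apr_status_t status,
                                    const char *message)
{
    FILE *out = baton ? (FILE *)baton : stderr;

    fprintf(out, "TLS: %s (status %d)\n", message, (int)status);
    return APR_SUCCESS;
}
```

Because the message is consumed before the callback returns, this variant needs no buffer of its own.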

minfrin · 2025-07-01T18:08:10Z

Serf is asynchronous, and one serf-context can juggle multiple connections. You need a callback if you want to have a hope in hell of keeping all this straight on the error-reporting side.

That said, however: the callback doesn't know which connection a given error belongs to. That could be a problem. A typical case would be a context with one SSL connection to a server and one connection to an OCSP responder that validates the server cert. It'd be strange if, e.g., OCSP connection errors were reported against a Subversion server.

OK, OCSP connections shouldn't use SSL. Which brings up the point: why would this callback only be used to report SSL errors, when there can be other kinds of error descriptions available for other connection types, and they, too, can't know what actually happened.

Options are limited inside serf.

The two places where errors are generated are https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L1538 and https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L1054, and the only context that is available at these locations is serf_ssl_context_t.

This isn't a crisis though. At a future date, we could add an error callback tied to a connection, and let the caller decide what scope of errors they want. In other words, we don't need to delay providing error messages today in the hope of being perfect in the future.

brainy · 2025-07-01T23:43:54Z

buckets/ssl_buckets.c


+        if (err && ctx->error_callback) {
+            char ebuf[256];
+            ERR_error_string_n(err, ebuf, sizeof(ebuf));


Just above, you call ERR_error_string() to log the error, while here, you copy the string to the stack. Is this actually necessary? I'd just document to callback implementors that they have to copy the string inside the callback and call ERR_error_string() just once.

The logging was left unchanged, but really should go.

The logging is the crutch that was supporting the lack of error handling. Libraries should definitely not be logging anything.

I disagree with the "libraries should not be logging" sentiment. They shouldn't dump to stdout or stderr, but Serf doesn't do that -- it writes to whatever logging sink the application cares to provide, so basically integrates with the application's logging infrastructure. I've found this to be extremely useful for debugging; not everything can be done with error handling, no matter how sophisticated. Because logging by its nature leaves an audit trail for states that are not error conditions and can't even be captured when an error occurs.

In this case, I agree, we shouldn't be doing both.

brainy · 2025-07-01T23:55:19Z

buckets/ssl_buckets.c

                        }
+                        else {
+                            err = ERR_get_error();
+                            ERR_clear_error();


You don't pass err to log_ssl_error() and clear it here; doesn't that mean that nothing will happen? Why even call ERR_get_error() here, given that it has no effect?

This was from before we used log_ssl_error(), I've simplified this.

gstein · 2025-07-02T08:52:41Z

If a context is running 4 connections to a server, and they are sharing an SSL context, then this callback attached to the shared context cannot identify the connection, can it? (I kinda recall reading that somewhere)

I'm also thinking that this "error callback" being specialized only to SSL is missing a review of how to do this properly for serf in general. And yes, I DO want a general design, rather than tack on a "single solution today" and "generalize later".

To generalize it (likely beyond scope of desire of this PR), I'd seek some kind of baton/identifier of "what request is this error attached to?" ... I believe the request is the "unit of work" and any error would be associated with that. Could be a response, but that is in response to a request. ... Hmm. I guess one could also associate an error with a connection, or a shared context. eg. timed out a connection setup before we even tried to deliver a request.

In short, I do not think it appropriate to apply a hack to fix one case, when we don't have a conceptual design (even if not fully-implemented) for a serf-wide sophisticated error handling design.

minfrin · 2025-07-02T09:38:38Z

If a context is running 4 connections to a server, and they are sharing an SSL context, then this callback attached to the shared context cannot identify the connection, can it? (I kinda recall reading that somewhere)

This is a general limitation of serf that exists already. When serf fails, it returns apr_status_t. What connection that apr_status_t belongs to, serf doesn't tell you.

I'm also thinking that this "error callback" being specialized only to SSL is missing a review of how to do this properly for serf in general. And yes, I DO want a general design, rather than tack on a "single solution today" and "generalize later".

To generalize it (likely beyond scope of desire of this PR), I'd seek some kind of baton/identifier of "what request is this error attached to?" ... I believe the request is the "unit of work" and any error would be associated with that. Could be a response, but that is in response to a request. ... Hmm. I guess one could also associate an error with a connection, or a shared context. eg. timed out a connection setup before we even tried to deliver a request.

In short, I do not think it appropriate to apply a hack to fix one case, when we don't have a conceptual design (even if not fully-implemented) for a serf-wide sophisticated error handling design.

Right now, I want to contact a vendor and say "I get this message, please help", and not receive from the vendor pages and pages of "it could be this, or it could be that, or maybe this, or maybe that", and then eventually find out that it was actually none of the above, and all of this occurred for the want of an actual error message.

Our community is orders of magnitude more important than the code.

I entirely get the desire to attach this error to "a unit of work", your choices for a unit of work in this code consist of just one structure, serf_ssl_context_t, and that is it.

https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L136C8-L136C26

The parent structure for serf_ssl_context_t is serf_config_t. There are no other "units of work" today, serf just doesn't work like that.

Obviously one can go off on a tangent and rewrite serf to invent new "units of work" and then use this as a basis for a different error handling mechanism. This is however hugely disrespectful to the community, who need proper error messages today, and who cannot be held to ransom until someone is forced to do unrelated work.

We must provide error handling for the serf in front of us today, not some future serf that might exist in the future but doesn't.

gstein · 2025-07-02T09:51:05Z

Please do not lecture me about how serf works. "serf just doesn't work like that" ... I wrote it.

minfrin · 2025-07-02T09:51:24Z

I entirely get the desire to attach this error to "a unit of work", your choices for a unit of work in this code consist of just one structure, serf_ssl_context_t, and that is it.

https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L136C8-L136C26

Looking at this closer:

https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L148

    SSL *ssl;

The serf_ssl_context_t structure represents one connection by definition.

By passing serf_ssl_context_t as the baton, like we do in apache/subversion#31, we tie the error to the connection, thus solving your problem.

gstein · 2025-07-02T10:08:09Z

I'm not sold on this design, but the baton would be svn_ra_serf__connection_t * (for the svn scenario)

minfrin · 2025-07-02T11:25:12Z

I'm not sold on this design, but the baton would be svn_ra_serf__connection_t * (for the svn scenario)

We use svn_ra_serf__connection_t (third parameter conn) as a baton in the svn patch:

#if defined(HAVE_SERF_SSL_ERROR_CB_SET)
          serf_ssl_error_cb_set(conn->ssl_context,
                                ssl_error_cb,
                                conn);
#endif
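As a sketch of why a per-connection baton ties each error to its connection: in the hypothetical code below, `app_connection_t` plays the role that svn_ra_serf__connection_t plays in the svn patch, and `conn_ssl_error()` is an illustrative callback, not the real ssl_error_cb. `apr_status_t` is again a plain `int` stand-in.

```c
#include <stdio.h>

/* Stand-in for APR's apr_status_t so the sketch builds without APR. */
typedef int apr_status_t;
#define APR_SUCCESS 0

/* Hypothetical application connection object used as the baton. */
typedef struct app_connection_t {
    const char *name;        /* label identifying this connection */
    char last_error[256];    /* private copy of the latest TLS detail */
} app_connection_t;

/* The baton identifies the connection, so the error lands on the right
   object even when several connections run in one context. */
static apr_status_t conn_ssl_error(void *baton,
                                   apr_status_t status,
                                   const char *message)
{
    app_connection_t *conn = baton;

    (void)status;
    snprintf(conn->last_error, sizeof(conn->last_error),
             "%s: %s", conn->name, message);
    return APR_SUCCESS;
}
```

Registering the same callback with a different baton per SSL context leaves the other connections' error state untouched.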

dsahlberg-apache-org · 2025-07-02T20:39:17Z

Maybe this is the same question already asked by Brane, and I'm just not clever enough to understand it. But the whole design seems a little bit convoluted to me. Why not have something similar to what OpenSSL seems to do: when a Serf function returns non-success, the caller can call something in Serf to say "Give me the first error". This could then be done repeatedly in case there is more than one error. In the case of OpenSSL errors, that function could be just a wrapper around OpenSSL's ERR_get_error().

The problem with that approach is that it "keeps stock".

What you want is to hand over the errors you encounter with the least effort possible. Right now in a trivial implementation you could just write the errors to stderr from inside the callback. No copying, no saving state anywhere, you're done.

Subversion is more sophisticated, it has the ability to stack errors. This lines up nicely where the callback can add an additional error to the stack.

Thanks (to both Brane and Graham) for their explanations. It makes sense...

dsahlberg-apache-org

I realise this was committed in r1926972, so the PR should really be closed...
However it needs some specific authz in GitHub that I don't seem to have.

dsahlberg-apache-org · 2025-07-15T19:45:13Z

buckets/ssl_buckets.c

+        if (err && ctx->error_callback) {
+            char ebuf[256];
+            ERR_error_string_n(err, ebuf, sizeof(ebuf));
+            ctx->error_callback(ctx->error_baton, ctx->fatal_err, ebuf);


Is it really necessary to use an internal char array and call ERR_error_string_n() to copy the error message into this buffer? The error_callback must copy the message to an application-internal buffer anyway. Wouldn't it be enough to:

char *ebuf = ERR_error_string(err, NULL);
ctx->error_callback(ctx->error_baton, ctx->fatal_err, ebuf);

I agree with this. The documentation for the callback signature/contract should state that the error string lives only until the callback returns. Thus, the callback must use it immediately in some fashion (e.g. print it or copy it) and must not retain the pointer.

As such, serf does not need to copy a portion of the error into a stack-based buffer.

Is it really necessary to use an internal char array and call ERR_error_string_n() to copy the error message into this buffer? The error_callback must copy the message to an application-internal buffer anyway. Wouldn't it be enough to:

char *ebuf = ERR_error_string(err, NULL);
ctx->error_callback(ctx->error_baton, ctx->fatal_err, ebuf);

Alas not.

ERR_error_string() uses an openssl-internal buffer that is overwritten on each call, and is not thread safe. ERR_error_string_n() fixed this.

Ok, good point. Missed that one, thanks!

Reverted to the original code from the PR in r1927348.
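The static-buffer concern raised above can be illustrated without OpenSSL. In this sketch, `describe_static()` is a hypothetical formatter that returns a pointer to one shared static buffer, in the manner of ERR_error_string(e, NULL), and `describe_n()` writes into a caller-supplied buffer in the style of ERR_error_string_n(). The output format is simplified and is not OpenSSL's.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical formatter with one shared static buffer: every call
   overwrites the previous result, and concurrent callers would race. */
static const char *describe_static(unsigned long code)
{
    static char buf[64];

    snprintf(buf, sizeof(buf), "error:%08lX", code);
    return buf;
}

/* Caller-supplied buffer: each caller owns its storage, so results do
   not clobber one another. */
static void describe_n(unsigned long code, char *buf, size_t len)
{
    snprintf(buf, len, "error:%08lX", code);
}
```

Two calls to `describe_static()` return the same pointer, so the first result is lost as soon as the second call runs; two calls to `describe_n()` with separate buffers keep both strings.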

dsahlberg-apache-org · 2025-07-15T20:49:24Z

buckets/ssl_buckets.c

        else {
-            int err = ERR_get_error();
+            err = ERR_get_error();
            ERR_clear_error();


Why move the declaration from here to the top of the function? Better to keep the scope limited whenever possible, just to catch accidental errors (pun intended).
(Yes it obviously should be unsigned long instead of int, so a change would be needed anyhow).

Also agree. All variables should be scoped as tightly as possible.

dsahlberg-apache-org · 2025-08-02T08:46:56Z

The basic patch has been committed to svn in r1926972 with some followups. There is still an ongoing discussion about whether we should change to a more generalised error callback structure, but that can be handled in SVN, since all participants are committers, or with a new PR.

brainy reviewed Jul 1, 2025

The correct term is "baton".

f7c70c9

Add additional detail to the error callback description.

6b6d66e

minfrin mentioned this pull request Jul 1, 2025

Report SSL error messages from serf apache/subversion#32

Open

Add the APR status to the error callback.

217d1dd

brainy reviewed Jul 2, 2025

minfrin added 3 commits July 2, 2025 09:17

Don't log handled errors.

2a9afdf

Remove unnecessary ERR_get_error() and simplify.

4863887

Do not hang indefinitely when the file can't be found, handle the error.

83cb75d

minfrin and others added 4 commits July 4, 2025 12:23

Merge branch 'trunk' into sslcb

329c475

Add error handling to new openssl3+ support.

1c8c2fa

Return a client side error when we cannot parse PKCS12 files. Don't send nothing instead and fail because the server said access is denied.

b881b44

Put certificate failures into context, so that end users have more than just a low level openssl error.

1407bdc

dsahlberg-apache-org reviewed Jul 15, 2025

dsahlberg-apache-org closed this Aug 2, 2025
