Skip to content

Conversation

@minfrin
Copy link
Contributor

@minfrin minfrin commented Jul 1, 2025

  • Allow the registration of an optional callback using serf_ssl_error_cb_set().

  • If the callback is registered, return a fixed string describing the error as created by the underlying crypto library.

Example:

[minfrin@rocky9 subversion]$ svn info https://svn.example.com/svn/example/core/
svn: E170013: Unable to connect to a repository at URL 'https://svn.example.com/svn/example/core'
svn: E120170: TLS: error:0308010C:digital envelope routines::unsupported
svn: E120170: TLS: could not parse PKCS12: /home/minfrin/.my-cert.p12

- Allow the registration of an optional callback using
  serf_ssl_error_cb_set().

- If the callback is registered, return a fixed string
  describing the error as created by the underlying crypto
  library.

Example:

[minfrin@rocky9 subversion]$ svn info https://svn.example.com/svn/example/core/
svn: E170013: Unable to connect to a repository at URL 'https://svn.example.com/svn/example/core'
svn: E120171: TLS: error:0308010C:digital envelope routines::unsupported
svn: E120171: Error running context: An error occurred during SSL communication
ssl_ctx->protocol_userdata = NULL;

ssl_ctx->error_callback = NULL;
ssl_ctx->error_userdata = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call those batons, so: error_baton. Hmph, should do the same for protocol_userdataprotocol_baton.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been made so.

* Callback type for detailed TLS error strings.
*/
typedef apr_status_t (*serf_ssl_error_cb_t)(
void *data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again: void *baton.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

void serf_ssl_error_cb_set(
serf_ssl_context_t *context,
serf_ssl_error_cb_t callback,
void *data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@brainy
Copy link
Member

brainy commented Jul 1, 2025

Since this callback is on the context, it's serialised within the context run right? I assume that the idea is to store those messages at the caller and report them if serf_context_run() fails (or report them as warnings if it doesn't, I guess.)

If that's the case, it would be nice to document that in a docstring in serf_bucket_types.h. Right now it's not at all obvious how that callback should be implemented.

@minfrin
Copy link
Contributor Author

minfrin commented Jul 1, 2025

Since this callback is on the context, it's serialised within the context run right? I assume that the idea is to store those messages at the caller and report them if serf_context_run() fails (or report them as warnings if it doesn't, I guess.)

If that's the case, it would be nice to document that in a docstring in serf_bucket_types.h. Right now it's not at all obvious how that callback should be implemented.

All of this is pretty much the caller's problem.

The callback just fires every time there is an error, it's up to the caller to pass a baton that makes sense, and a callback routine that understands the baton.

In existing subversion, the context run keeps going until an SSL error happens, and if an SSL error happens the whole context is shut down and the client gives up. In this case the callback fires first, and subversion (shortly) makes a copy of the message for later. When later comes, any error message from a callback is added to svn_error_t.

Other applications could work completely differently, serf shouldn't care.

@brainy
Copy link
Member

brainy commented Jul 1, 2025

It's not about Serf's caring, it's about the user knowing the callback will get called (just) before serf_context_run() returns (with an error status?). "Use the source, Luke!" is fine until it suddenly isn't. :)

@minfrin
Copy link
Contributor Author

minfrin commented Jul 1, 2025

It's not about Serf's caring, it's about the user knowing the callback will get called (just) before serf_context_run() returns (with an error status?). "Use the source, Luke!" is fine until it suddenly isn't. :)

Does the added description make sense?

It's literally "here is the detail string for the failure you are about to experience, log it, add it to your error stack, do with it as you will".

Unfortunately the best option was for there to not be a comment "Detect more specific errors?" in the code followed by no way to return a more specific error. We're doing the best we can with what we have available.

minfrin added a commit to minfrin/subversion that referenced this pull request Jul 1, 2025
Add the most detailed underlying crypto library error string
to the error stack when the context fails due to an SSL failure.

SSL errors are no longer reduced to "an error has occurred".

This relies on the serf_ssl_error_cb_t callback as provided
by serf in apache/serf#9.

Example:

[minfrin@rocky9 subversion]$ svn info https://svn.example.com/svn/example/core/
svn: E170013: Unable to connect to a repository at URL 'https://svn.example.com/svn/example/core'
svn: E120171: TLS: error:0308010C:digital envelope routines::unsupported
svn: E120171: Error running context: An error occurred during SSL communication
@dsahlberg-apache-org
Copy link
Collaborator

Maybe this is the same question already asked by Brane, and I'm just not clever enough to understand it. But the whole design seems a little bit convoluted to me.
Why not have something similar to what OpenSSL seems to do: When a Serf function return non-success, it is possible for the caller to call something in Serf "Give me the first error". This could then be done repeatedly in case there are more than 1 error.
In the case of OpenSSL errrors, that function could be just a wrapper around OpenSSL's ERR_get_error().

@brainy
Copy link
Member

brainy commented Jul 1, 2025

Serf is asynchronous, and one serf-context can juggle multiple connections. You need a callback if you want to have a hope in hell of keeping all this straight on the error-reporting side.

That said, however: the callback doesn't know which connection a given error belongs to. That could be a problem. typical case would be a context with one SSL connection to a server and one connection to an OCSP responder that validates the server cert. It'd be strange if, e.g., OCSP connection errors were reported against a Subversion server.

OK, OCSP connections shouldn't use SSL. Which brings up the point: why would this callback only be used to report SSL errors, when there can be other kinds of error descriptions available for other connection types, and they, too, can't know what actually happened.

@minfrin
Copy link
Contributor Author

minfrin commented Jul 1, 2025

Maybe this is the same question already asked by Brane, and I'm just not clever enough to understand it. But the whole design seems a little bit convoluted to me. Why not have something similar to what OpenSSL seems to do: When a Serf function return non-success, it is possible for the caller to call something in Serf "Give me the first error". This could then be done repeatedly in case there are more than 1 error. In the case of OpenSSL errrors, that function could be just a wrapper around OpenSSL's ERR_get_error().

The problem with that approach is that it "keeps stock".

What you want is to hand over the errors you encounter with the least effort possible. Right now in a trivial implementation you could just write the errors to stderr from inside the callback. No copying, no saving state anywhere, you're done.

Subversion is more sophisticated, it has the ability to stack errors. This lines up nicely where the callback can add an additional error to the stack.

@minfrin
Copy link
Contributor Author

minfrin commented Jul 1, 2025

Serf is asynchronous, and one serf-context can juggle multiple connections. You need a callback if you want to have a hope in hell of keeping all this straight on the error-reporting side.

That said, however: the callback doesn't know which connection a given error belongs to. That could be a problem. typical case would be a context with one SSL connection to a server and one connection to an OCSP responder that validates the server cert. It'd be strange if, e.g., OCSP connection errors were reported against a Subversion server.

OK, OCSP connections shouldn't use SSL. Which brings up the point: why would this callback only be used to report SSL errors, when there can be other kinds of error descriptions available for other connection types, and they, too, can't know what actually happened.

Options are limited inside serf.

The two places where errors are generated are https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L1538 and https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L1054, and the only context that is available at these locations are serf_ssl_context_t.

This isn't a crisis though. At a future date, we could add an error callback tied to a connection, and let the caller decide what scope of errors they want. In other words, we don't need to delay providing error messages today in the hope of being perfect in the future.


if (err && ctx->error_callback) {
char ebuf[256];
ERR_error_string_n(err, ebuf, sizeof(ebuf));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just above, you call ERR_error_string() to log the error, while here, you copy the string to the stack. Is this actually necessary? I'd just document to callback implementors that they have to copy the string inside the callback and call ERR_error_string() just once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logging was left unchanged, but really should go.

The logging is the crutch that was supporting the lack of error handling. Libraries should definitely not be logging anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with the "libraries should not be logging" sentiment. They shouldn't dump to stdout or stderr, but Serf doesn't do that -- it writes to whatever logging sink the application cares to provide, so basically integrates with the application's logging infrastructure. I've found this to be extremely useful for debugging; not everything can be done with error handling, no matter how sophisticated. Because logging by its nature leaves an audit trail for states that are not error conditions and can't even be captured when an error occurs.

In this case, I agree, we shouldn't be doing both.

}
else {
err = ERR_get_error();
ERR_clear_error();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't pass err to log_ssl_error() and clear it here; doesn't that mean that nothing will happen? Why even call ERR_get_error() here, given that it has no effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was from before we used log_ssl_error(), I've simplified this.

@gstein
Copy link
Member

gstein commented Jul 2, 2025

If a context is running 4 connections to a server, and they are sharing an SSL context, then this callback attached to the shared context cannot identify the connection, can it? (I kinda recall reading that somewhere)

I'm also thinking that this "error callback" being specialized only to SSL is missing a review of how to do this properly for serf in general. And yes, I DO want a general design, rather than tack on a "single solution today" and "generalize later".

To generalize it (likely beyond scope of desire of this PR), I'd seek some kind of baton/identifier of "what request is this error attached to?" ... I believe the request is the "unit of work" and any error would be associated with that. Could be a response, but that is in response to a request. ... Hmm. I guess one could also associate an error with a connection, or a shared context. eg. timed out a connection setup before we even tried to deliver a request.

In short, I do not think it appropriate to apply a hack to fix one case, when we don't have a conceptual design (even if not fully-implemented) for a serf-wide sophisticated error handling design.

@minfrin
Copy link
Contributor Author

minfrin commented Jul 2, 2025

If a context is running 4 connections to a server, and they are sharing an SSL context, then this callback attached to the shared context cannot identify the connection, can it? (I kinda recall reading that somewhere)

This is a general limitation of serf that exists already. When serf fails, it returns apr_status_t. What connection that apr_status_t belongs to, serf doesn't tell you.

I'm also thinking that this "error callback" being specialized only to SSL is missing a review of how to do this properly for serf in general. And yes, I DO want a general design, rather than tack on a "single solution today" and "generalize later".

To generalize it (likely beyond scope of desire of this PR), I'd seek some kind of baton/identifier of "what request is this error attached to?" ... I believe the request is the "unit of work" and any error would be associated with that. Could be a response, but that is in response to a request. ... Hmm. I guess one could also associate an error with a connection, or a shared context. eg. timed out a connection setup before we even tried to deliver a request.

In short, I do not think it appropriate to apply a hack to fix one case, when we don't have a conceptual design (even if not fully-implemented) for a serf-wide sophisticated error handling design.

Right now, I want to contact a vendor and say "I get this message, please help", and not receive from the vendor pages and pages of "it could be this, or it could be that, or maybe this, or maybe that", and then eventually find out that it was actually none of the above, and all of this occurred for the want of an actual error message.

Our community is orders of magnitude more important than the code.

I entirely get the desire to attach this error to "a unit of work", your choices for a unit of work in this code consist of just one structure, serf_ssl_context_t, and that is it.

https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L136C8-L136C26

The parent structure for serf_ssl_context_t is serf_config_t. There are no other "units of work" today, serf just doesn't work like that.

Obviously one can go off on a tangent and rewrite serf to invent new "units of work" and then use this as a basis for a different error handling mechanism. This is however hugely disrespectful to the community, who need proper error messages today, and who cannot be held to ransom until someone is forced to do unrelated work.

We must provide error handling for the serf in front of us today, not some future serf that might exist in the future but doesn't.

@gstein
Copy link
Member

gstein commented Jul 2, 2025

Please do not lecture me about how serf works. "serf just doesn't work like that" ... I wrote it.

@minfrin
Copy link
Contributor Author

minfrin commented Jul 2, 2025

I entirely get the desire to attach this error to "a unit of work", your choices for a unit of work in this code consist of just one structure, serf_ssl_context_t, and that is it.

https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L136C8-L136C26

Looking at this closer:

https://github.com/apache/serf/blob/trunk/buckets/ssl_buckets.c#L148

    SSL *ssl;

The serf_ssl_context_t structure represents one connection by definition.

By passing serf_ssl_context_t as the baton, like we do in apache/subversion#31, we tie the error to the connection, thus solving your problem.

@gstein
Copy link
Member

gstein commented Jul 2, 2025

I'm not sold on this design, but the baton would be svn_ra_serf__connection_t * (for the svn scenario)

@minfrin
Copy link
Contributor Author

minfrin commented Jul 2, 2025

I'm not sold on this design, but the baton would be svn_ra_serf__connection_t * (for the svn scenario)

We use svn_ra_serf__connection_t (third parameter conn) as a baton in the svn patch:

#if defined(HAVE_SERF_SSL_ERROR_CB_SET)
          serf_ssl_error_cb_set(conn->ssl_context,
                                ssl_error_cb,
                                conn);
#endif

@dsahlberg-apache-org
Copy link
Collaborator

Maybe this is the same question already asked by Brane, and I'm just not clever enough to understand it. But the whole design seems a little bit convoluted to me. Why not have something similar to what OpenSSL seems to do: When a Serf function return non-success, it is possible for the caller to call something in Serf "Give me the first error". This could then be done repeatedly in case there are more than 1 error. In the case of OpenSSL errrors, that function could be just a wrapper around OpenSSL's ERR_get_error().

The problem with that approach is that it "keeps stock".

What you want is to hand over the errors you encounter with the least effort possible. Right now in a trivial implementation you could just write the errors to stderr from inside the callback. No copying, no saving state anywhere, you're done.

Subversion is more sophisticated, it has the ability to stack errors. This lines up nicely where the callback can add an additional error to the stack.

Thanks (to both Brane and Graham) for their explanations. It makes sense...

Copy link
Collaborator

@dsahlberg-apache-org dsahlberg-apache-org left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise this was committed in r1926972, so the PR should really be closed...
However it needs some specific authz in GitHub that I don't seem to have.

if (err && ctx->error_callback) {
char ebuf[256];
ERR_error_string_n(err, ebuf, sizeof(ebuf));
ctx->error_callback(ctx->error_baton, ctx->fatal_err, ebuf);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to use an internal char array and calling ERR_error_string_n to copy the error message to this buffer. The error_callback must copy the message to an application internal buffer anyway. Wouldn't it be enough to:

char *ebuf = ERR_error_string(err, NULL);
ctx->error_callback(ctx->error_baton, ctx->fatal_err, ebuff);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. The documentation for the callback signature/compact should be that the life of the error string lasts only until the callback returns. Thus, the callback must use it immediately in some fashion, and not retain the pointer (eg. print it, or copy it).

As such, serf does not need to copy a portion of the error into a stack-based buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to use an internal char array and calling ERR_error_string_n to copy the error message to this buffer. The error_callback must copy the message to an application internal buffer anyway. Wouldn't it be enough to:

char *ebuf = ERR_error_string(err, NULL); ctx->error_callback(ctx->error_baton, ctx->fatal_err, ebuff);

Alas not.

ERR_error_string() uses an openssl-internal buffer that is overwritten on each call, and is not thread safe. ERR_error_string_n() fixed this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good point. Missed that one, thanks!

Reverted to the original code form the PR in r1927348.

else {
int err = ERR_get_error();
err = ERR_get_error();
ERR_clear_error();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why moving the declaration from here to the top of the function? Better keep scope limited whenever possible, just to catch accidental errors (pun intended).
(Yes it obviously should be unsigned long instead of int, so a change would be needed anyhow).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also agree. All variables should be scoped as tightly as possible.

@dsahlberg-apache-org
Copy link
Collaborator

The basic patch has been committed to svn in r1926972 with some followups. There is still the ongoing discussion if we should change to a more generalised error callback structure but it can be handled in SVN since all participants are committers or with a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants