Inconsistent parsing of message_id and in_reply_to #197

Open
nielspeen opened this issue Feb 3, 2022 · 4 comments
Labels
bug Something isn't working Feature request validated

Comments

@nielspeen

nielspeen commented Feb 3, 2022

Describe the bug

Header::parse removes < and > from message_id, but not from in_reply_to.

A message with Message-ID: <[email protected]> will have $message->message_id="[email protected]"

A message with In-Reply-To: <[email protected]> will have $message->in_reply_to="<[email protected]>"

As a result, the two cannot be compared or used in database queries without first removing < and > from in_reply_to.

Expected behavior

message_id and in_reply_to should use the same format/parsing, so they can be used in queries and comparisons.
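Until both fields share one format, a caller-side workaround is to normalize both values before comparing them. This is a minimal sketch that assumes a hypothetical helper, strip_angle_brackets(), which is not part of php-imap. The IDs are placeholders:

```php
<?php
// Hypothetical helper (not part of php-imap): strips angle brackets and
// surrounding whitespace so message_id and in_reply_to compare directly.
function strip_angle_brackets(string $value): string
{
    return trim(str_replace(['<', '>'], '', $value));
}

// Placeholder values in the two formats described above.
$messageId = 'abc123@example.com';   // format returned for message_id
$inReplyTo = '<abc123@example.com>'; // format returned for in_reply_to

// Normalizing both sides makes the comparison format-independent.
$isReply = strip_angle_brackets($inReplyTo) === strip_angle_brackets($messageId);
var_dump($isReply); // bool(true)
```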

Webklex added the bug (Something isn't working) and validated labels Feb 3, 2022
@Webklex
Owner

Webklex commented Feb 3, 2022

Hi @nielspeen,
many thanks for your report.

I just checked Header.php and you are absolutely right:

php-imap/src/Header.php

Lines 212 to 214 in ed93fe4

if (property_exists($header, 'message_id')) {
    $this->set("message_id", str_replace(['<', '>'], '', $header->message_id));
}

message_id gets special treatment, whereas in_reply_to doesn't.

Best regards,
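This sketch gives in_reply_to and references the same treatment that message_id receives in the snippet above. It assumes $header is a plain object that carries the raw header values. normalize_id_fields() is a hypothetical name, and the returned array stands in for the library's $this->set() calls:

```php
<?php
// Sketch only (not the library's code): apply the message_id treatment
// from Header::parse to all three ID headers. The returned array stands
// in for php-imap's $this->set() calls.
function normalize_id_fields(object $header): array
{
    $normalized = [];
    foreach (['message_id', 'in_reply_to', 'references'] as $field) {
        // Only touch fields that are actually present, as the original does.
        if (property_exists($header, $field)) {
            $normalized[$field] = str_replace(['<', '>'], '', $header->$field);
        }
    }
    return $normalized;
}

$header = (object) [
    'message_id'  => '<b@example.com>',
    'in_reply_to' => '<a@example.com>',
];
print_r(normalize_id_fields($header));
```

References can hold several IDs, and per the RFC so can In-Reply-To. For those fields, str_replace() leaves a space-separated string rather than a list.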

@HelloSebastian
Contributor

HelloSebastian commented Feb 16, 2022

It seems like the same thing is happening with references.

Is there a particular reason for this, or could we remove the less-than/greater-than signs from In-Reply-To and References as well?

PS: Awesome project :)

@HelloSebastian
Contributor

Alternatively, would it be easier not to strip the signs from the message ID?

I have already tried to find out why the angle brackets are present in message IDs at all. Unfortunately, I haven't found anything about it in the RFCs so far.

@HelloSebastian
Contributor

After a long search to determine whether the angle brackets belong to the message ID, I finally found something in the RFC.

According to RFC 2822 (page 25):

Semantically, the angle bracket characters are not part of the msg-id; the msg-id is what is contained between the two angle bracket characters.

Other sources:
RFC 822 Section 3.4.6

Angle brackets ("<" and ">") are generally used to indicate the presence of a one machine-usable reference (e.g., delimiting mailboxes), possibly including source-routing to the machine.

https://stackoverflow.com/a/34811337/10599992

The maximum line length per the RFC you cite is 998 characters. That would include the "Message-ID:" field name, but you can do line folding between the field name and the field body. The line containing the actual Message-ID would then contain a space (the folding whitespace), "<", Message-ID, and ">". Semantically, the angle brackets are not part of the Message-ID. Therefore you end up with a maximum of 998 - 3 = 995 characters.

I therefore conclude that the angle brackets should be removed from all message IDs, since they only serve to delimit the ID itself. We need to handle the In-Reply-To and References headers separately.
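Under that reading, In-Reply-To and References could each be split into a list of bracket-delimited msg-ids, keeping only the content between the brackets. This is a minimal sketch with a hypothetical helper, parse_msg_ids(). It assumes simple, well-formed values and does not handle comments or obsolete syntax:

```php
<?php
// Hypothetical helper: extract every msg-id from an In-Reply-To or
// References value. Per RFC 2822/5322 these fields hold one or more
// "<...>"-delimited IDs; the brackets are delimiters and are dropped.
function parse_msg_ids(string $value): array
{
    // Capture whatever lies between each "<" and its matching ">".
    preg_match_all('/<([^<>]+)>/', $value, $matches);
    return array_map('trim', $matches[1]);
}

// A folded References value with three IDs (placeholders).
$references = "<first@example.com>\r\n <second@example.com> <third@example.com>";
print_r(parse_msg_ids($references));
```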

What do you think about it, @Webklex?

Should we perhaps write a separate class for these three headers, similar to the Address class? That class could then also provide the message IDs with the angle brackets.

I look forward to hearing your thoughts on this!
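The proposed class might look roughly like this. MessageId and its methods are hypothetical and not part of php-imap. The sketch stores the bare ID for comparisons and queries and adds the brackets back on request:

```php
<?php
// Hypothetical value class (not part of php-imap): stores the bare msg-id
// and can render it with or without the angle-bracket delimiters.
final class MessageId
{
    private string $id;

    public function __construct(string $raw)
    {
        // Drop surrounding whitespace, then any leading/trailing < and >.
        $this->id = trim(trim($raw), '<>');
    }

    /** Bare ID, suitable for comparisons and database queries. */
    public function id(): string
    {
        return $this->id;
    }

    /** ID wrapped in angle brackets, as it appears in a header. */
    public function withBrackets(): string
    {
        return '<' . $this->id . '>';
    }

    public function __toString(): string
    {
        return $this->id;
    }
}

$messageId = new MessageId('<abc123@example.com>');
echo $messageId->id(), "\n";           // abc123@example.com
echo $messageId->withBrackets(), "\n"; // <abc123@example.com>
```

References can list several IDs, so that header would likely need a collection of such objects rather than a single one.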

Projects
None yet
Development

No branches or pull requests

3 participants