Normalize whitespace in HTML token diff #198

Open
Mr0grog opened this issue Feb 20, 2025 · 1 comment

Mr0grog commented Feb 20, 2025

Sometimes we wind up showing a change on text that looks unchanged because nearby whitespace changed in a non-visible way. There are just a few main versions of this:

  1. Whitespace changed in a way that’s totally meaningless in HTML (outside of <pre> elements or white-space: pre* styled elements). Since multiple spaces and line breaks all get collapsed into a single space in HTML, they’re not meaningful changes unless in specific contexts. We should normalize them to a single space.
  2. Spaces getting swapped out for non-breaking spaces (or vice versa). These changes are technically different and may have a subtle impact on page layout, but are not semantically different for users. I’m thinking a good way to handle this is to give tokens a diffable text representation (in which non-breaking spaces are replaced by spaces) and a literal text representation (where these types of characters are unchanged). The former is used for comparisons, but the latter is used when stitching the actual diff back together.
  3. Different kinds of fancier spaces get swapped out (hair space, em space, etc.). These are more visible to the user, but still usually not that meaningful for most of this diff’s use cases. The right solution here is probably the same as for (2) above.
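
To make the diffable/literal idea concrete, here is a rough sketch in Python. It is not how the existing diff code is structured: `Token`, `diffable_text`, and `diff_tokens` are hypothetical names made up for illustration, the list of "space-like" characters is an assumption and not exhaustive, and it ignores the `<pre>` / `white-space: pre*` exception from (1), which would need to skip normalization for tokens in those contexts.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher

# Assumption: an illustrative (not exhaustive) set of space-like characters:
# NBSP, U+2000..U+200A (en/em/thin/hair space, etc.), narrow NBSP,
# medium mathematical space, ideographic space.
_SPACE_LIKE = re.compile("[\u00a0\u2000-\u200a\u202f\u205f\u3000]")
# Whitespace that HTML collapses outside of preformatted contexts.
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def diffable_text(literal: str) -> str:
    """Return the form of `literal` that should be used for comparison."""
    text = _SPACE_LIKE.sub(" ", literal)   # cases (2) and (3)
    return _COLLAPSIBLE.sub(" ", text)     # case (1)


@dataclass(frozen=True)
class Token:
    """Hypothetical token holding the original text exactly as it appeared."""
    literal: str

    @property
    def diffable(self) -> str:
        return diffable_text(self.literal)


def diff_tokens(old: list[Token], new: list[Token]) -> list[tuple[str, str]]:
    """Compare on the diffable text, but emit the literal text in the result."""
    matcher = SequenceMatcher(
        a=[t.diffable for t in old],
        b=[t.diffable for t in new],
        autojunk=False,
    )
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged for diff purposes: keep the new version's literal text.
            result.extend(("equal", t.literal) for t in new[j1:j2])
        else:
            result.extend(("delete", t.literal) for t in old[i1:i2])
            result.extend(("insert", t.literal) for t in new[j1:j2])
    return result


old = [Token("Hello\u00a0"), Token("world")]
new = [Token("Hello "), Token("world")]
print(diff_tokens(old, new))
# [('equal', 'Hello '), ('equal', 'world')]
```

One open design question in this sketch: it replaces special spaces before collapsing runs, so a non-breaking space next to a regular space compares equal to a single space. That may or may not be what we want.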

Here’s an example of case (2) above: https://monitoring.envirodatagov.org/page/4415ea86-293e-48ab-9b4f-da2382cc4200/c43894cb-b954-40d7-be18-4ce14a22a90b..e8844efa-fd2b-41d5-a451-1cc54c1d680a



Mr0grog commented Feb 20, 2025

We should probably have something similar in the links diff!
