Skip to content

Conversation

@xworld21
Copy link
Contributor

Fix #2482 and lots of other small issues with the parser.

My intention was to simply 'linearise' the grammar, i.e., make sure it never backtracks when parsing the rules. This should improve performance quite a bit (the parser is now guaranteed to run in linear time). I am wondering how much of that can be done for the maths grammar, which has dramatic performance issues.

However, while writing some tests, I discovered that the parser is very broken. So I fixed the easy stuff (wrong functions, string handling) and added a new test to exercise the basic functionality (right now, just the functions).

There are still major issues, e.g. you can't even evaluate bin(185) without LaTeXML claiming an overflow. Strings should become tokens in case they contain unexpanded primitives. The blacklist is very fragile. Maybe the LaTeXML parser should fall back to the PGF one when it fails.

@dginev
Copy link
Collaborator

dginev commented Jan 12, 2025

Thank you for looking into this! Could you also add the new test files to MANIFEST, so that we get CI working?

The one comment I will leave for archival is that - yes, indeed - the perl handling for pgf math has been all of fragile, buggy and incomplete. But importantly - so has the raw interpretataion of the TeX sources of that package. So incremental progress here is good (and I am especially grateful you took the time to add so many new tests).

@xworld21 xworld21 changed the title Pgfmathparse \pgfmathparse improvements Jan 12, 2025
@xworld21
Copy link
Contributor Author

The math parser is now slightly less broken, but still very broken. The operator precedences are all slightly wrong – they should have been set reading the \pgfmathdeclareoperator commands in the sources.

Still wrong: the factorial operator binds higher than multiplication, but lower than power; the radians 'unit' is between addition and multiplication. Hopefully I got the rest right.

@dginev are there notable examples from arXiv that might need all of this?

@dginev
Copy link
Collaborator

dginev commented Jan 12, 2025

@dginev are there notable examples from arXiv that might need all of this?

Oh yes, certainly, pgf is a very commonly used package. Looking at the loaded files report for ar5iv, pgfmathparser.code.tex gets loaded in 436,976 articles.

If you want to fish for examples, I remember from #2237 that most/all pgfplots use the math parser, and there are ~59,000 articles using that library. That previous PR patch was based on arXiv:2104.00602, and you can find a few other reported pgf-related issues at the ar5iv tracker. But naturally arXiv readers do not distill the problems down to the unit test level. It is possible that converting the pgf/tikz showcase sites are a better way to fish out relevant examples.

But yes, this is very much a sizeable need in arXiv.

@xworld21
Copy link
Contributor Author

Uhm, the <pagination role="newpage"></pagination> error changes with TeX Live version. Is there a standard solution for that particular problem?

@dginev
Copy link
Collaborator

dginev commented Jan 13, 2025

Is there a standard solution for that particular problem?

I am not really familiar with this kind of texlive version mismatch. So is the issue that there are two trailing \pars deposited just before \end{document}?

One simple workaround could be to replace the \par in your test macro with something like \newline I suppose. But would really need to get back to a texlive 2021 to test it out.

@dginev dginev requested review from brucemiller and dginev January 13, 2025 15:43
@dginev
Copy link
Collaborator

dginev commented Jan 13, 2025

@xworld21 OK, tried on a server with an old texlive, it looks like there is an extra \clearpage somehow added to the end of even the smallest pgf-loading document:

\documentclass{article}
\usepackage{pgf}
\begin{document}\end{document}

What you can do for the CI test is to add \let\clearpage\relax after \usepackage{pgf}, as I assume you won't be needing that feature in this particular math-oriented setup. That works in my setup.

@xworld21
Copy link
Contributor Author

What you can do for the CI test is to add \let\clearpage\relax after \usepackage{pgf}

Done!

In the meanwhile, I changed my mind. I can definitely fix the radians operator at least, and implement gcd. I'll do more tweaks and throw in some long expressions without parentheses in the tests, to catch other precedence errors.

@xworld21 xworld21 marked this pull request as ready for review January 18, 2025 18:24
@brucemiller
Copy link
Owner

Interestingly, our raw TeX is gradually good enough that using only the raw pgfmath.code.tex actually works (except for some shading errors), but it is sooooo slow.

@xworld21
Copy link
Contributor Author

By the way, this should be draft as well – I discovered that it breaks pgfplots somehow! Maybe because of ExpandedPartially? I don't know right now, I need to test it more carefully and squash the remaining bugs before calling it done.

@xworld21 xworld21 marked this pull request as draft March 13, 2025 20:43
@xworld21 xworld21 marked this pull request as ready for review November 22, 2025 18:39
@xworld21
Copy link
Contributor Author

I am afraid I didn't take note of which PGF plot was broken by this PR. I tried a few examples after rebasing to master and it looks fine. This needs to be tested in the wild.

Copy link
Collaborator

@dginev dginev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retested with arXiv:2104.00602, which is OK.

However, the dominoes test which is mentioned in the issue regresses - the arrangement seems to flip around. I half-remember this being an old problem before #2237 got merged.

Would be nice to get that back to working before merging here.

@xworld21
Copy link
Contributor Author

However, the dominoes test which is mentioned in the issue regresses - the arrangement seems to flip around. I half-remember this being an old problem before #2237 got merged.

Ok, the problem comes from the line

\pgfpoint{sin(\yy)*(\xx)}{-((\xx)/75)^2+(\zz)/100*(\xx)}

Switching to

\pgfpoint{sin(\yy)*(\xx)}{-1*((\xx)/75)^2+(\zz)/100*(\xx)}

is fine. I guess the unary - is not associating properly and it becomes part of the left operand of ^. Probably.

@xworld21
Copy link
Contributor Author

The dominoes look good now. Ideally, one should really rewrite the grammar to follow pgfmathparser.code.tex more closely rather than patching the existing grammar, but let's shelve that for later.

Further note: I am fairly convinced that back in March, I pasted a pgfplots example without adding a necessary usepackage at the beginning, which made me think that the new parser was breaking things. False alarm, most likely!

dginev added a commit to arXiv/LaTeXML that referenced this pull request Dec 15, 2025
Copy link
Collaborator

@dginev dginev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I can confirm the dominoes test works again.

I am also seeing some healthy (or at least healthier than before) plot conversions when testing on a handful of ar5iv issues that contain pgfplots reports:
dginev/ar5iv#147, dginev/ar5iv#335, dginev/ar5iv#402, dginev/ar5iv#493, dginev/ar5iv#557.

I also checked ar5iv:1804.07114, where the plots are not yet perfect, but the legend placement is better.

Lastly, the new XML test values match the PDF values, read through. The three classes of diffs in that test which I see are:

  • the minor precision differences, as with pi r computing 180.0 in latexml and 179.99962 in pdflatex.
  • the consistency in using floats in the binding, as in sign(0) being 0.0 instead of 0
  • width("Some Lovely Text") seems to show the full \textwidth (469.75) instead of the width of the box (83.99927)

So, to my limited expectations, this PR appears to be an improvement and is good to merge.
Thanks a lot for the expertise!

@dginev
Copy link
Collaborator

dginev commented Dec 15, 2025

I have started testing this PR (with some of the others tucked in) on our arXiv sandboxes.

There seems to be a widespread regression (Fatals with 100 errors) due to no longer parsing some pgfmath epxressions.

As a first example, arXiv:1511.02101 has errors of the kind:

Error:pgfparse:pgfparse Parse of '70.40544pt-\pgfdecoratedcompleteddistance-1sp' failed
	at inclusionvcd.tex; line 399 col 0 - line 399 col 178
	LTX: '70.40544'
	Left: -\pgfdecoratedcompleteddistance-1sp
	Next token is T_CS[\advance] ( == Core::Definition::Primitive[\advance Variable SkipKeyword:by])
	In Core::Definition::Primitive[\pgfmaths... 

I will try to add more details here later today.

@xworld21
Copy link
Contributor Author

xworld21 commented Dec 15, 2025

So, to my limited expectations, this PR appears to be an improvement and is good to merge.

Let me add: there are more inconsistencies. For instance, PGF parses 2^3^2 as (2^3)^2 but 2^-3^2 as 2^(-3^2), and that is not accounted for yet here. However, I am in favour of merging as is because it should be a substantial net improvement, hopefully with negligible regressions. We can add tests and improvements later, and eventually replace the grammar with something that tracks pgfmathparser almost line by line.

By the way, should we merge the new test with the stress_pgfmath, that I somehow did not notice?

@dginev
Copy link
Collaborator

dginev commented Dec 15, 2025

By the way, should we merge the new test with the stress_pgfmath, that I somehow did not notice?

Oh, sure. I think reusing that simplifies admin a little, they are exercising the same functionality.

@dginev
Copy link
Collaborator

dginev commented Dec 16, 2025

With the latest fixes we are closer to (but still a bit weaker than) the tikz-cd arXiv sandbox. That has just under 20,000 articles and at time of writing goes from 8.98% Fatal to 9.99% Fatal articles, as well as from 33.32% Error to 39.97% Error articles.

Also somewhat worryingly the conversions appear to be slower than before (anecdotally for now). That said, the changeset I am testing includes more than only this PR, so likely not all causes are rooted here. I will investigate some more today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

\pgfmathparse calls every function *twice*

3 participants