-
Notifications
You must be signed in to change notification settings - Fork 128
\pgfmathparse improvements #2485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thank you for looking into this! Could you also add the new test files to MANIFEST, so that we get CI working? The one comment I will leave for archival is that - yes, indeed - the perl handling for pgf math has been all of fragile, buggy and incomplete. But importantly - so has the raw interpretataion of the TeX sources of that package. So incremental progress here is good (and I am especially grateful you took the time to add so many new tests). |
55f2867 to
34a7680
Compare
|
The math parser is now slightly less broken, but still very broken. The operator precedences are all slightly wrong – they should have been set reading the Still wrong: the factorial operator binds higher than multiplication, but lower than power; the radians 'unit' is between addition and multiplication. Hopefully I got the rest right. @dginev are there notable examples from arXiv that might need all of this? |
Oh yes, certainly, pgf is a very commonly used package. Looking at the loaded files report for ar5iv, pgfmathparser.code.tex gets loaded in 436,976 articles. If you want to fish for examples, I remember from #2237 that most/all pgfplots use the math parser, and there are ~59,000 articles using that library. That previous PR patch was based on arXiv:2104.00602, and you can find a few other reported pgf-related issues at the ar5iv tracker. But naturally arXiv readers do not distill the problems down to the unit test level. It is possible that converting the pgf/tikz showcase sites are a better way to fish out relevant examples. But yes, this is very much a sizeable need in arXiv. |
34a7680 to
356b535
Compare
356b535 to
cbf2017
Compare
|
Uhm, the |
I am not really familiar with this kind of texlive version mismatch. So is the issue that there are two trailing One simple workaround could be to replace the |
|
@xworld21 OK, tried on a server with an old texlive, it looks like there is an extra \documentclass{article}
\usepackage{pgf}
\begin{document}\end{document}What you can do for the CI test is to add |
cbf2017 to
5977884
Compare
Done! In the meanwhile, I changed my mind. I can definitely fix the radians operator at least, and implement gcd. I'll do more tweaks and throw in some long expressions without parentheses in the tests, to catch other precedence errors. |
5977884 to
542d3ee
Compare
542d3ee to
1985cd8
Compare
|
Interestingly, our raw TeX is gradually good enough that using only the raw |
|
By the way, this should be draft as well – I discovered that it breaks pgfplots somehow! Maybe because of |
Hardcoding the function names in a regex causes the parser to misinterpret function names (e.g. 'cosec' is interpreted greedily as 'cos e' and fails.)
1985cd8 to
d1e00ad
Compare
|
I am afraid I didn't take note of which PGF plot was broken by this PR. I tried a few examples after rebasing to master and it looks fine. This needs to be tested in the wild. |
d1e00ad to
d3d1af4
Compare
dginev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I retested with arXiv:2104.00602, which is OK.
However, the dominoes test which is mentioned in the issue regresses - the arrangement seems to flip around. I half-remember this being an old problem before #2237 got merged.
Would be nice to get that back to working before merging here.
Ok, the problem comes from the line \pgfpoint{sin(\yy)*(\xx)}{-((\xx)/75)^2+(\zz)/100*(\xx)}Switching to \pgfpoint{sin(\yy)*(\xx)}{-1*((\xx)/75)^2+(\zz)/100*(\xx)}is fine. I guess the unary |
d3d1af4 to
cd0b52a
Compare
|
The dominoes look good now. Ideally, one should really rewrite the grammar to follow pgfmathparser.code.tex more closely rather than patching the existing grammar, but let's shelve that for later. Further note: I am fairly convinced that back in March, I pasted a pgfplots example without adding a necessary usepackage at the beginning, which made me think that the new parser was breaking things. False alarm, most likely! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I can confirm the dominoes test works again.
I am also seeing some healthy (or at least healthier than before) plot conversions when testing on a handful of ar5iv issues that contain pgfplots reports:
dginev/ar5iv#147, dginev/ar5iv#335, dginev/ar5iv#402, dginev/ar5iv#493, dginev/ar5iv#557.
I also checked ar5iv:1804.07114, where the plots are not yet perfect, but the legend placement is better.
Lastly, the new XML test values match the PDF values, read through. The three classes of diffs in that test which I see are:
- the minor precision differences, as with
pi rcomputing180.0in latexml and179.99962in pdflatex. - the consistency in using floats in the binding, as in
sign(0)being0.0instead of0 width("Some Lovely Text")seems to show the full\textwidth(469.75) instead of the width of the box (83.99927)
So, to my limited expectations, this PR appears to be an improvement and is good to merge.
Thanks a lot for the expertise!
|
I have started testing this PR (with some of the others tucked in) on our arXiv sandboxes. There seems to be a widespread regression (Fatals with 100 errors) due to no longer parsing some pgfmath epxressions. As a first example, arXiv:1511.02101 has errors of the kind: I will try to add more details here later today. |
Let me add: there are more inconsistencies. For instance, PGF parses By the way, should we merge the new test with the stress_pgfmath, that I somehow did not notice? |
Oh, sure. I think reusing that simplifies admin a little, they are exercising the same functionality. |
|
With the latest fixes we are closer to (but still a bit weaker than) the tikz-cd arXiv sandbox. That has just under 20,000 articles and at time of writing goes from 8.98% Fatal to 9.99% Fatal articles, as well as from 33.32% Error to 39.97% Error articles. Also somewhat worryingly the conversions appear to be slower than before (anecdotally for now). That said, the changeset I am testing includes more than only this PR, so likely not all causes are rooted here. I will investigate some more today. |
Fix #2482 and lots of other small issues with the parser.
My intention was to simply 'linearise' the grammar, i.e., make sure it never backtracks when parsing the rules. This should improve performance quite a bit (the parser is now guaranteed to run in linear time). I am wondering how much of that can be done for the maths grammar, which has dramatic performance issues.
However, while writing some tests, I discovered that the parser is very broken. So I fixed the easy stuff (wrong functions, string handling) and added a new test to exercise the basic functionality (right now, just the functions).
There are still major issues, e.g. you can't even evaluate
bin(185)without LaTeXML claiming an overflow. Strings should become tokens in case they contain unexpanded primitives. The blacklist is very fragile. Maybe the LaTeXML parser should fall back to the PGF one when it fails.