-
My personal problems (or maybe it's better to say "worries") are mainly these two points:
The problem this discussion tries to solve is the smaller one for me - I'd go for Variant 4A for simplicity. It just requires building ctags and then copying the generated code, which isn't really a big problem. The generated blob would have to stay in the repository, but since it would typically be updated once per release, it shouldn't lead to big diffs. In any case, I don't think it's something the ctags project should worry about - it's really up to us what we want to do with ctags in Geany.
-
As @techee wrote, 4A may be the best. Making the .c and .h files is easy.
-
In my judgement only 4A, "build the .h/.c at upgrade", is viable, and if the clone of uctags has packcc that's fine. Pegof would need to be packaged by most distros, since it uses a different build tool and is mostly a one-person project, so that's a future improvement.

But as @techee said, we are somewhat paranoid about performance: the parsers need to run between keystrokes, and stuttering typing is pretty unacceptable. I was told that the generated toml.c is 5000 lines; that's 10% of the total of all the other parsers except C++, and more than a third of the C++ parser. Whilst the size is not a major indicator of speed - maybe only a tiny part of that code runs on each parse - then what is the rest there for? So the concern remains. And it would be compiled into everybody's Geany, whether they use TOML or not. And how big is kotlin.c? Maybe someone who has a copy of toml.c and kotlin.c (not pegof-ed) could post them to a gist so we can all see the gory details.

I can see both arguments about the reviewability of toml.c: @techee is right that it's a major risk vector, and @dolik-rce is right about trusting your compiler, although it's not a widely used one. So for me, since you are both correct, that doesn't make the decision.

Google says the Kotlin LSP is looking for a maintainer; maybe you could improve that, since Geany now has initial support for LSPs 😁
-
That's not C, it's a-cc-embler 😉 ... and TOTALLY unreviewable. But even though it seems pretty simple, it will still be fairly large when compiled.

One of the benefits of LSP is that it isn't compiled into Geany, so no matter how big one LSP is, it won't cost anything to someone not using that language. Whilst the "small and lightweight" Geany ship sailed a long time ago, we always have concerns that we should not expand Geany too much when there are regular posts from users on Raspberry Pi systems. Or a PR to support loadable DLL parsers instead would be a "good thing" ™️, so users only pay the price for the languages they use and we don't need to care how big any parser is. But that is probably a Geany thing, not a uctags thing. And kotlin.c was so big it broke gist 😄

Anyway, since uctags includes the packcc tool, it really doesn't have anything to do; it's up to Geany how it addresses the issue.

PS I was serious about the Kotlin LSP needing help: https://github.com/fwcd/kotlin-language-server?tab=readme-ov-file#this-repository-needs-your-help
-
Kind of suspected that ;-). From what I remember, on a Raspberry Pi 3/4 a 400 LOC Kotlin file started to produce unacceptable slowdowns with the PEG parser (that was probably the unoptimized version). And even if the optimized version is faster for normal editing, I'm quite worried that we can't easily check whether there is some pathological path in the grammar that would produce much worse slowdowns when some specific conditions are met. I don't want to say we should never use a PEG parser in Geany, and I don't want to veto such attempts if others have a different opinion - I'm just not very thrilled about it myself.

To be clear, I'm sure you did a great job with the PEG parser - for me the problem is PEG parsers in general, not your work.
-
@techee What do you think about bison/flex? This is just a question.
-
Or implement an LSP server based on universal-ctags (with all the parsers it contains). I think it would be a super-cool (but huge) project. One could grab what we call the "tag manager" in Geany plus some extra code from the files that use it, and add the JSON-RPC API of LSP on top. I was playing with the idea of implementing it myself, but it's really a big project, and I've just finished the LSP plugin which consumed a huge amount of my time, so I don't plan anything bigger now.
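For anyone wondering what "add the JSON-RPC API of LSP" would roughly involve, here is a minimal Python sketch. It is an illustration only, not Geany or tag-manager code: it speaks the LSP stdio framing and answers `textDocument/documentSymbol` by shelling out to a ctags binary, assuming one built with JSON output support; the symbol-kind mapping and error handling are deliberately simplified.

```python
#!/usr/bin/env python3
# Rough sketch only: a toy LSP server that shells out to universal-ctags
# instead of reusing Geany's tag manager. Assumes a ctags binary built
# with JSON support ("--output-format=json"); not production code.
import json, subprocess, sys
from urllib.parse import urlparse, unquote

def read_message(stream):
    # LSP messages are framed as "Content-Length: N\r\n\r\n<json payload>".
    length = 0
    while True:
        line = stream.readline().decode("ascii").strip()
        if not line:
            break
        if line.lower().startswith("content-length:"):
            length = int(line.split(":", 1)[1])
    return json.loads(stream.read(length)) if length else None

def write_message(stream, payload):
    body = json.dumps(payload).encode("utf-8")
    stream.write(b"Content-Length: %d\r\n\r\n" % len(body) + body)
    stream.flush()

def document_symbols(path):
    # Ask ctags for tags as JSON lines; "--fields=+n" adds line numbers.
    out = subprocess.run(["ctags", "--output-format=json", "--fields=+n",
                          "-f", "-", path], capture_output=True, text=True).stdout
    symbols = []
    for line in out.splitlines():
        tag = json.loads(line)
        if tag.get("_type") != "tag":
            continue
        line_no = max(tag.get("line", 1) - 1, 0)
        symbols.append({
            "name": tag["name"],
            "kind": 12,  # SymbolKind.Function; a real server would map ctags kinds properly
            "location": {"uri": "file://" + path,
                         "range": {"start": {"line": line_no, "character": 0},
                                   "end": {"line": line_no, "character": 0}}},
        })
    return symbols

def main():
    stdin, stdout = sys.stdin.buffer, sys.stdout.buffer
    while True:
        msg = read_message(stdin)
        if msg is None or msg.get("method") == "exit":
            break
        if msg.get("method") == "initialize":
            write_message(stdout, {"jsonrpc": "2.0", "id": msg["id"],
                                   "result": {"capabilities": {"documentSymbolProvider": True}}})
        elif msg.get("method") == "textDocument/documentSymbol":
            path = unquote(urlparse(msg["params"]["textDocument"]["uri"]).path)
            write_message(stdout, {"jsonrpc": "2.0", "id": msg["id"],
                                   "result": document_symbols(path)})
        elif msg.get("method") == "shutdown":
            write_message(stdout, {"jsonrpc": "2.0", "id": msg["id"], "result": None})

if __name__ == "__main__":
    main()
```

A real server built on the tag manager would of course keep everything in-process instead of spawning ctags per request; the sketch only shows the protocol plumbing.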
-
Again, I personally prefer hand-written parsers, which one can easily debug and where one can see what's going on by looking at their code.
-
One point on Bison is that it only handles a subset of syntaxes (LALR(1) by default IIRC, and context-free), so it's OK if your language happens to fit in that mold, but otherwise it's forced into using the GLR extension, and that can have the same overheads as PEG. And there is no guarantee it will handle context sensitivity at all; it may be necessary to hand-roll extras. Sadly nothing is free: to get better performance there is a trade-off of generality, but if the language needs generality the performance goes down. You can't win. My understanding is that several language implementations have moved away from Bison for some of these reasons.

One thing that has been missed in the discussions of the Kotlin parser's speed is that most hand-rolled ctags parsers skip a lot of code; IIUC none of them parse expressions or statements fully, but at least at a quick glance the Kotlin PEG describes the full language, including statements and expressions. So it is likely doing a lot of work that is not useful for the ctags use-case (just parse declarations). Maybe if that could be removed it would be a good deal faster, since most programs have more statements and expressions than declarations. How Bison could be made to skip statements and expressions is also unknown.

Finally, how do PEG and Bison parsers handle incorrect code? Being run between keystrokes, there is no guarantee that the code will parse correctly (since the user hasn't finished typing yet). The ctagsd LSP that @masatake posted could of course push the problem out of Geany 😜.
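To make the point about declaration-only parsing concrete, here is a toy Python scanner. It is nothing like the real ctags Kotlin parser and the regex is far too naive for actual Kotlin; it just illustrates why a parser that only looks for declarations both does very little work and shrugs off half-typed code between keystrokes.

```python
import re

# Toy illustration (not the real ctags parser): a declaration-only scanner
# for Kotlin-ish code. It never tries to parse statements or expressions,
# so half-typed lines simply fail the regex and are skipped.
DECL = re.compile(r"^\s*(?:\w+\s+)*(fun|class|object|interface|val|var)\s+(\w+)")

def scan_declarations(source: str):
    tags = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        m = DECL.match(line)
        if m:
            tags.append((m.group(2), m.group(1), line_no))
        # Anything that is not a declaration (expressions, statements,
        # or code the user has not finished typing) is ignored.
    return tags

broken_buffer = """
class Config(val path: String) {
    fun load(): Map<String, String> {
        val result = mutabl            // <- user is still typing here
"""
print(scan_declarations(broken_buffer))
# [('Config', 'class', 2), ('load', 'fun', 3), ('result', 'val', 4)]
```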
-
Geany developers are somewhat reluctant to adopt PEG-based parsers (see geany/geany#3934 or geany/geany#3034). I do understand most of their points:
- additional tools are required (packcc, and now also optionally pegof) on all supported platforms

Hopefully, at least some of these could be resolved. I have a few ideas which I'd like to share and discuss, even though I'm aware that none of them is perfect. Maybe other people will be able to think of other ways, or improve upon these. Here they are:
1. Extended source code distribution
The code generated by packcc is platform-independent, so there is no need to regenerate it every time the final product (i.e. Geany) is built. There could be some way to distribute the ctags sources with the PEG parsers pre-generated. The build system might need a few tweaks to choose whether it should use the pre-generated files or generate them if they are not present.
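As an illustration of those "few tweaks", a build helper could behave roughly like the sketch below. It is Python with made-up paths; a real implementation would live in the autotools/Makefile rules, and the `packcc -o BASENAME` behaviour (writing BASENAME.c and BASENAME.h) is assumed from its usual usage.

```python
#!/usr/bin/env python3
# Sketch only: prefer a pre-generated parser if the extended source
# distribution shipped one, otherwise fall back to running packcc.
# Paths and file names are invented for the example.
import shutil, subprocess, sys
from pathlib import Path

def ensure_parser(peg_file: Path, build_dir: Path) -> Path:
    build_dir.mkdir(parents=True, exist_ok=True)
    target = build_dir / (peg_file.stem + ".c")
    pregenerated = peg_file.with_suffix(".c")      # e.g. peg/kotlin.c shipped next to peg/kotlin.peg

    if pregenerated.exists():
        shutil.copyfile(pregenerated, target)      # no packcc needed at build time
    elif shutil.which("packcc"):
        # Assumed CLI: packcc -o BASENAME writes BASENAME.c and BASENAME.h
        subprocess.run(["packcc", "-o", str(target.with_suffix("")), str(peg_file)],
                       check=True)
    else:
        sys.exit(f"error: no pre-generated {pregenerated.name} and no packcc in PATH")
    return target

if __name__ == "__main__":
    print(ensure_parser(Path("peg/kotlin.peg"), Path("build/gen")))
```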
Variant 1A: Extended source tarball
The ctags project could provide those sources directly in its releases as another source tarball.
Pros:
Cons:
Variant 1B: External repository
It should be possible to create a separate repository that would mirror all the changes in ctags and automatically replace the PEG files with the generated code.
Pros:
Cons:
2. Distribute ctags as a library
Geany could simply link against ctags in the form of a static or dynamic library.
Pros:
Cons:
3. Keep the generated code in the repository
The parser code could be generated during the development process and kept in version control. This is kind of against good manners, and I suggest it just for the sake of completeness.
Pros:
Cons:
4. Provide a way to generate the parsers easily
One of the main problems is that packcc and pegof are not readily available on most platforms. Even if they can be built for many of them, they can't be easily installed using a package manager, since there are no packages for them.
However, the ctags repository could provide a simple script that would clone the repositories for these tools, compile them, and then use them to generate the source code for the parsers.
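A very rough sketch of what such a script could look like follows. The repository URLs are the upstream ones, but the build commands and binary locations are assumptions rather than tested instructions (each project's README is authoritative), and since ctags already bundles packcc, cloning it may not even be necessary.

```python
#!/usr/bin/env python3
# Sketch of the "simple script" idea: fetch packcc (and optionally pegof),
# build them, and regenerate the PEG-based parsers. Build steps and the
# location of the built binaries are assumptions for illustration only.
import subprocess
from pathlib import Path

TOOLS = {
    "packcc": "https://github.com/arithy/packcc.git",
    "pegof":  "https://github.com/dolik-rce/pegof.git",   # optional optimizer
}

def run(cmd):
    print("+", " ".join(map(str, cmd)))
    subprocess.run(cmd, check=True)

def fetch_and_build(workdir: Path) -> dict:
    binaries = {}
    for name, url in TOOLS.items():
        src = workdir / name
        if not src.exists():
            run(["git", "clone", "--depth=1", url, str(src)])
        build = src / "build-dir"
        # Assumed CMake-based build; adjust per each tool's own instructions.
        run(["cmake", "-S", str(src), "-B", str(build)])
        run(["cmake", "--build", str(build)])
        binaries[name] = build / name                      # assumed output location
    return binaries

def generate_parsers(packcc: Path, peg_dir: Path):
    for peg in sorted(peg_dir.glob("*.peg")):
        # Assumed CLI: packcc -o BASENAME writes BASENAME.c and BASENAME.h
        run([str(packcc), "-o", str(peg.with_suffix("")), str(peg)])

if __name__ == "__main__":
    tools = fetch_and_build(Path("tools"))
    generate_parsers(tools["packcc"], Path("peg"))
```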
Variant 4A: generate parsers on ctags upgrade only
Since Geany just copies the ctags code into its own repository, this would mean the script only needs to be run by the person who upgrades ctags.
Pros:
Cons:
Variant 4B: generate parsers on each build
It would of course also be possible to just copy the *.peg files and run the generator on each build.
Pros:
Cons:
5. Parser-generator-as-a-service
It would be possible to create a web service that would accept a PEG grammar as input and respond with the generated source code. This could be queried either during the ctags upgrade or as part of the build process (so it would have similar pros and cons to the previous variant).
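For illustration, such a service could be as small as the following Python sketch wrapping packcc. It has no authentication, caching, rate limiting or sandboxing, so it only shows the shape of the idea; the `packcc -o` behaviour is assumed as above.

```python
#!/usr/bin/env python3
# Sketch of "parser-generator-as-a-service": POST a PEG grammar, get the
# packcc-generated C source back. Assumes a packcc binary on the server.
import subprocess, tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

class PackccHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        grammar = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        with tempfile.TemporaryDirectory() as tmp:
            peg = Path(tmp) / "parser.peg"
            peg.write_bytes(grammar)
            # Assumed CLI: packcc -o BASENAME writes BASENAME.c and BASENAME.h
            result = subprocess.run(["packcc", "-o", str(peg.with_suffix("")), str(peg)],
                                    capture_output=True)
            if result.returncode != 0:
                self.send_response(400)
                self.end_headers()
                self.wfile.write(result.stderr)
                return
            generated = peg.with_suffix(".c").read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "text/x-csrc")
        self.end_headers()
        self.wfile.write(generated)

if __name__ == "__main__":
    # e.g.: curl --data-binary @peg/kotlin.peg http://localhost:8080/ > kotlin.c
    HTTPServer(("", 8080), PackccHandler).serve_forever()
```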
Pros:
Cons:
Conclusion
None of the ideas is perfect and I honestly don't know which is best. But I hope this might spark some discussion, and that we can come up with something acceptable for everyone. So please, let the ideas flow.