-
Notifications
You must be signed in to change notification settings - Fork 749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
normalize fortran identifiers like gfortran #1957
base: master
Are you sure you want to change the base?
Conversation
this actually looks OK Could you resolve conflicts @gridaphobe ? tia |
Sorry I should have chimed in sooner. As @gridaphobe observed, this will be quite different with the modern Lucene @tarzanek, please can you hold off merging this for now? I have a sequence of serial PRs that right now is waiting for |
hehe, OK, so let's hold on with this PR I will try to have a look at LOC PR and talk to Vlad or Krystof to expedite it |
Should I go ahead and resolve the conflicts now, or do we expect the other PRs to cause their own conflicts? :) |
The other PRs will change the approach that you would use for this. |
Are the other PRs you mentioned merged yet? |
PR #2052 |
@gridaphobe fwiw we were able to merge the PRs that influence this change ... |
Thanks! I noticed, I'll try to update my branch this week. |
I've updated my branch. It passes the unit tests, but I'm having trouble running the indexer now, I get the following lucene error when I run
Any ideas? |
yes, please remove old index, current master is on updated lucene, so you need to index from scratch ... |
The test case failure is innocuous (#2030). |
The stale index failure should be better captured and appropriate message given to the user - might be worth filing a new issue for this. |
Thanks! That was indeed the issue. It looks like the context rendering isn't quite working anymore, it prints the correct lines but does not bold the matched token. I remember having to do some fiddling with the PlainLineTokenizer (Analyzer?) to get that to work originally. Did that part of the code change significantly? |
Nevermind, I sorted it out. Everything appears to be working now from my local tests. One thing that we might want to discuss is that fortran defs/refs must currently be searched for in their normalized form. For example, if I have a fortran function SUBROUTINE FOO
...
END I would have to search for I think it would be nice if the queries |
Hi, @gridaphobe , Thank you for your continued efforts in this area. OpenGrok on Lucene of course allows multiple tokens per offset in order to support synonyms. I have a completed patch from earlier this year for Objective-C support (as well as patches to support code-like synonyms for enhanced non-whitespace searching) that augments the symbol processing in The enhanced
Likely you will be able to hook your normalization routines into these APIs so that the same literal code location (i.e., I'm also hoping you can revise your approach so as not have to modify anything in (The caveat with this approach w.r.t. to Fortran will be that (Also, with the latest changes in |
hmm, getting objective C support would be cool, let's see if we can get it soon |
Aha! I had actually made the same change locally but wasn't sure how to deal with the Lucene side of things, hence my question :) I'll take a look at your branch.
Yes, that definitely would be nicer, let me see if I can work that out.
I see, I have not tested it in that mode (not since the rebase at least) and it seems like a corner case, so I'm happy to remove the support. |
@idodeclare I've rebased my branch against your There's one failing test, which seems to have something to do with reindexing, but it was present in your branch as well. |
Glad to hear! I'll add a few comments directly to that revision 1bfa324. Edit: oh no I guess the comments didn't take. Github seemed to accept them, but now I don't see them on that commit above. I guess it's only PRs that take reviews reliably. |
@idodeclare ack, sorry this fell off my radar. I actually addressed your comments a couple weeks ago, but it turns out I forgot to push them to GitHub :) How do you want to proceed with this? Merge your objective-c branch and then I can open a new PR with my rebased branch? |
@gridaphobe , the objective-c branch will be under review presently, after which you can rebase and PR. |
Fortran is a case-insensitive language, so it's important to have a
consistent normalization scheme in order to recognize cross-language
function calls, e.g. C-to-Fortran.
gfortran lowercases all Fortran identifiers and adds a trailing
underscore, so this is the convention I adopted.
One issue I have found along the way was that the results page for a
"symbol/refs" query stopped printing the matching lines. I was able to
trace this issue to the
Context.getContext
method, which for somereason seems to re-lex the matching file for tokens. It uses a generic
lexer
PlainLineTokenizer
that isn't aware of my normalization scheme,and so the symbols that the lexer extracts don't match the user's
query...
I did a quick and dirty fix to conditionally add the normalization logic
to the
PlainLineTokenizer
, but the whole thing seems a bitstrange. Why are we re-lexing the file instead of storing the relevant
data (filepath, line, etc) in the index, like we do for definitions?