Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docstring indentation is too aggressive when semantic indentation is enabled. #18

Open
dannyfreeman opened this issue Sep 8, 2023 · 14 comments
Labels
bug Something isn't working

Comments

@dannyfreeman
Copy link
Contributor

See commit 177ac05

Bascically, as you type and hit treesit indent functionality tries to unindent everything back to 2 spaces, so it's hard to add nicer indentation inside docstrings.

(defn foo 
  "My docstring
  I like to be indented like this
    If I come out to hear, treesit indent stuff pulls me back
  to this position"
  ...)
@dannyfreeman dannyfreeman added the bug Something isn't working label Sep 8, 2023
@sogaiu
Copy link

sogaiu commented Sep 9, 2023

I tried out your example and I think I see what you're getting at.

A random idea is to use another grammar for text or docstrings - somewhat like what was discussed here.

No idea if there is a "plain text" grammar nor whether performance could be an issue though (^^;


Not sure if the following will be at all useful, but FWIW...

I've wondered about how to handle things in docstrings because sometimes there are snippets of code which inevitably one may feel would be nice to not have to manipulate as plain text.

Sort of related to that, in janet-ts-mode, to get rainbow-delimiters to work well, I ended up tweaking syntax properties of strings (there are two types in janet) like this. Those kinds of tweaks cause a variety of behaviors in Emacs to be different within strings.

I think I learned how to do this from ruby-ts-mode.el. These lines seem relevant.

@dannyfreeman
Copy link
Contributor Author

I tried out your example and I think I see what you're getting at.

A random idea is to use another grammar for text or docstrings - somewhat like what was discussed here.

No idea if there is a "plain text" grammar nor whether performance could be an issue though (^^;

A markdown grammar would probably be the most appropriate. Some lsp clients will parse docstrings as markdown. While markdown is not standard, it could at least be used for things like `code snippets`

Sort of related to that, in janet-ts-mode, to get rainbow-delimiters to work well, I ended up tweaking syntax properties of strings (there are two types in janet) like this. Those kinds of tweaks cause a variety of behaviors in Emacs to be different within strings.

I think I learned how to do this from ruby-ts-mode.el. These lines seem relevant.

I thought those things were exclusively for font-locking, but I'm honestly not quite sure what I'm even looking at without spending some time studying that code. I will give it a more in depth look as I find the time.

dannyfreeman added a commit that referenced this issue Sep 9, 2023
@sogaiu
Copy link

sogaiu commented Sep 9, 2023

I thought those things were exclusively for font-locking,

It's true that font-locking can be affected, however:

A syntax table specifies the syntactic role of each character in a buffer. It can be used to determine where words, symbols, and other syntactic constructs begin and end. This information is used by many Emacs facilities, including Font Lock mode (see Font Lock Mode) and the various complex movement commands (see Motion).

via: https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-Tables.html

So these lines (specifically the put-text-property call):

           ;; tweaking syntax class of some chars for better behavior
           (let ((pos (1+ bt1-start))
                 (stop (1- n-end)))
             (while (< pos stop)
               (when (not (memq (char-after pos)
                                ;; XXX: might need to tweak this further
                                (list ?\s ?\n ?\r ?\t ?\f ?\v
                                      ?\( ?\) ?\[ ?\] ?{ ?})))
                 (put-text-property pos (1+ pos)
                                    'syntax-table
                                    (string-to-syntax "w")))

fiddle with stuff within strings in such a way that a variety of other commands (including motion ones) behave differently than otherwise.

but I'm honestly not quite sure what I'm even looking at without spending some time studying that code. I will give it a more in depth look as I find the time.

Yeah, sorry for the lack of explanation -- it took me quite some time to wrap my head around this stuff (well, to the degree that I succeeded anyway).

Anyway, may be this approach isn't needed.

dannyfreeman added a commit that referenced this issue Sep 9, 2023
@dannyfreeman
Copy link
Contributor Author

Anyway, may be this approach isn't needed.

It may be. The first path I'm going to pursue is using a nested markdown grammar.
I have the initial workings here https://github.com/clojure-emacs/clojure-ts-mode/tree/bug/18/docstring-formatting

That solves another outstanding todo I had for font-locking in docstrings. I might actually push that to main because it is working quite nicely. I haven't touched indentation with it yet.

@sogaiu
Copy link

sogaiu commented Sep 9, 2023

I gave that a try:

clojure-ts-mode-with-markdown-grammar-for-docstrings

The auto-installation of the grammar seemed to work fine :)

dannyfreeman added a commit that referenced this issue Sep 9, 2023
@dannyfreeman
Copy link
Contributor Author

The auto-installation of the grammar seemed to work fine :)

The auto-installation stuff makes me nervous. I'm glad it's been working so far.

I've merged the nested markdown grammar to main. I'm leaving my bug branch for this commit open though. I still need to tackle indentation with it. Haven't worked with nested parser indentation yet.

@sogaiu
Copy link

sogaiu commented Sep 10, 2023

The auto-installation stuff makes me nervous.

Yeah, that along with depending on multiple grammars that might change...perhaps motivation to consider alternative approaches?

Regarding change in grammars, I wonder about the semver-ish idea mentioned here:

  1. point updates are grammar fixes that don't affect queries at all;
  2. minor updates add grammar features (nodes) that need query updates to make use of, but queries work fine with otherwise;
  3. major updates remove/change grammar features (nodes) that break current queries.

I wonder how much of the change in tree-sitter-clojure could be described in terms of 1 and 2. My impression (quite possibly mistaken) is that most of the change was somewhat in the neighborhood of 3. May be it depends a lot on "what stage" a grammar's development is at.

IIUC, the author of the idea has extensive experience looking after the integration of a very large number of grammars in the context of an editor so it seems reasonable to suppose their observations informed the 3-point idea above.

@dannyfreeman
Copy link
Contributor Author

dannyfreeman commented Sep 10, 2023 via email

@sogaiu
Copy link

sogaiu commented Sep 10, 2023

There is some discussion in the mailing list now on how to handle this,
that's actually what prompted me to open the issue you linked below.

I think I read some of it [1] -- thanks for mentioning it.

Personally I enjoy having all the grammars pre-installed from my OS packages.

I like relying on my OS packages for things that don't change much, but for things that end up as dependencies of things I'm working on, I find using OS packages to sometimes be a hindrance. (That's when not using NixOS or Guix though.)

As tree-sitter itself hasn't reached a 1.0 status, as I understand it, there is some feeling among the maintainers that certain things can be changed. Since the grammars depend on tree-sitter in a way, I view them as more in flux.

So yeah, if I were not doing tree-sitter-related work, surely I'd prefer to not be fiddling with different versions of grammars, but I wonder whether at this time how stable of an experience one can expect. Perhaps this also depends on the language in question.

When I use Neovim, I notice rebuilding of grammars from time to time and I do occasionally run into issues where I need to intervene manually. I have not found the error messages to be enough on their own to figure out what to do, but often enough someone else has already asked so a relevant query can lead to a solution (^^; Not great for sure, but this may have something to do with my lack of experience and skill with things vim-ish...

In my experience it is very hard to make a significant change that falls into categories 1 and 2. Even adding nodes can be breaking in their own way. They may not break queries (I think) but they can break things like "node under point" and "parent of current node". That is kind of something we saw with the transpose-sexp issue #17 that was opened earlier. While that was caused by the transpose function changes, I think it's easy to see how adding a new node could cause the it to break.

I wonder if there are any issues / PRs at nvim-treesitter that demonstrate the feasibility of the 3-point idea...My imagination is not succeeding in seeing the idea working well.


[1] Thread subject -> "Re: Update on tree-sitter structure navigation"

@dannyfreeman
Copy link
Contributor Author

dannyfreeman commented Sep 11, 2023 via email

@sogaiu
Copy link

sogaiu commented Sep 11, 2023

Demonstrating the feasibility of the semantic versioning guidelines? I
think my imagination is at a loss too. What might we expect to see? A
grammar successfully releasing a 0.0.X patch update and not breaking
anything? Then a 0.X.0 minor version update, adding nodes, and
everything keeps working?

Something along those lines would be neat evidence, but I was thinking more along the lines of just seeing that changes in particular grammars that happened to fall into categories 1 and 2 (without the explicit version labelling).

I have been wondering how often 1 or 2 has happened (slightly edited):

  1. updates are grammar fixes that don't affect queries at all;
  2. updates add grammar features (nodes) that need query updates to make use of, but queries work fine with otherwise;

I suppose without some effort to arrange for those kinds of changes they are not likely to occur by chance. Was curious whether there was already some kind of "track record".

I have my doubts about any scheme like this for at least the following reasons:

  1. Even if one wanted to follow such a scheme, it seems like it would be extra work to do so and quite possibly difficult / complicated to get right

  2. I don't know what sensible incentive there would be for maintainers to put forth the extra effort

  3. Limited resources suggest this might not be a high priority

I found just getting the grammars I've worked on [1] to behave appropriately to be challenging enough...


[1] ...and these were probably some of the simplest grammars even.

@sogaiu
Copy link

sogaiu commented Sep 11, 2023

I think it's going to be pretty general advice when people report
issues, to say "make sure you can reproduce your issue with this version
of the grammar first".

Sure. What was tricky about Neovim was that I was seeing error messages in contexts where I wasn't expecting to see them at all. I would start Neovim and then suddenly see errors which looked vaguely like they would have something to do with tree-sitter but I wasn't sure (again, that may have a lot to do with my relative vim-ish ignorance).

I haven't hit anything like that with Emacs + tree-sitter yet, but then my ignorance seems to be much less here.

In fact, that is problem something to include in the repo's github issue template.

Sounds like a plan :)

I also have this new ticket open #19 to write some kind of interactive command to help diagnose grammar
issues.

Do you know if any other *-ts-modes are doing anything similar?

@dannyfreeman
Copy link
Contributor Author

dannyfreeman commented Sep 12, 2023 via email

@sogaiu
Copy link

sogaiu commented Sep 12, 2023

I have with other grammars (javascript for sure).
Most of the time indentation just doesn't work, or the entire buffer
does not get font-locked (because a query is now invalid).

Oh I meant that in Emacs it was easier to investigate and potentially fix things (as compared with Neovim) -- possibly because of the existence of edebug but also it seems much easier to look up the source for various functions.

Once I see an error message in Neovim, I might get an idea of what file and line to look at, but from that point, I'm in code I have no idea about with no practically economic and reliable way of observing execution like I can with edebug. Anyway, this is perhaps kind of tangential. Sorry about that.

but we will probably end up pushing a breaking change to the grammar at some point. When that happens and the distro grammars inevitably lag behind I think compatibility issues are likely to occur. I think the"clojure-ts-grammar-doctor" will come in handy for users.

Yeah, it might be a good way to collect some relevant info for reproducing an issue too? May be that's along the lines of what you already stated here:

If the doctor does find issues, perhaps a special buffer clojure-ts-doctor can be shown with details about the issues found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants