Docstring indentation is too aggressive when semantic indentation is enabled. #18

dannyfreeman · 2023-09-08T21:15:29Z

See commit 177ac05

Bascically, as you type and hit treesit indent functionality tries to unindent everything back to 2 spaces, so it's hard to add nicer indentation inside docstrings.

(defn foo 
  "My docstring
  I like to be indented like this
    If I come out to hear, treesit indent stuff pulls me back
  to this position"
  ...)

The text was updated successfully, but these errors were encountered:

sogaiu · 2023-09-09T00:32:28Z

I tried out your example and I think I see what you're getting at.

A random idea is to use another grammar for text or docstrings - somewhat like what was discussed here.

No idea if there is a "plain text" grammar nor whether performance could be an issue though (^^;

Not sure if the following will be at all useful, but FWIW...

I've wondered about how to handle things in docstrings because sometimes there are snippets of code which inevitably one may feel would be nice to not have to manipulate as plain text.

Sort of related to that, in janet-ts-mode, to get rainbow-delimiters to work well, I ended up tweaking syntax properties of strings (there are two types in janet) like this. Those kinds of tweaks cause a variety of behaviors in Emacs to be different within strings.

I think I learned how to do this from ruby-ts-mode.el. These lines seem relevant.

dannyfreeman · 2023-09-09T02:19:18Z

I tried out your example and I think I see what you're getting at.

A random idea is to use another grammar for text or docstrings - somewhat like what was discussed here.

No idea if there is a "plain text" grammar nor whether performance could be an issue though (^^;

A markdown grammar would probably be the most appropriate. Some lsp clients will parse docstrings as markdown. While markdown is not standard, it could at least be used for things like `code snippets`

Sort of related to that, in janet-ts-mode, to get rainbow-delimiters to work well, I ended up tweaking syntax properties of strings (there are two types in janet) like this. Those kinds of tweaks cause a variety of behaviors in Emacs to be different within strings.

I think I learned how to do this from ruby-ts-mode.el. These lines seem relevant.

I thought those things were exclusively for font-locking, but I'm honestly not quite sure what I'm even looking at without spending some time studying that code. I will give it a more in depth look as I find the time.

This uses the "inline" version of the markdown grammar. https://github.com/MDeiml/tree-sitter-markdown see issue #18

sogaiu · 2023-09-09T04:05:24Z

I thought those things were exclusively for font-locking,

It's true that font-locking can be affected, however:

A syntax table specifies the syntactic role of each character in a buffer. It can be used to determine where words, symbols, and other syntactic constructs begin and end. This information is used by many Emacs facilities, including Font Lock mode (see Font Lock Mode) and the various complex movement commands (see Motion).

via: https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-Tables.html

So these lines (specifically the put-text-property call):

           ;; tweaking syntax class of some chars for better behavior
           (let ((pos (1+ bt1-start))
                 (stop (1- n-end)))
             (while (< pos stop)
               (when (not (memq (char-after pos)
                                ;; XXX: might need to tweak this further
                                (list ?\s ?\n ?\r ?\t ?\f ?\v
                                      ?\( ?\) ?\[ ?\] ?{ ?})))
                 (put-text-property pos (1+ pos)
                                    'syntax-table
                                    (string-to-syntax "w")))

fiddle with stuff within strings in such a way that a variety of other commands (including motion ones) behave differently than otherwise.

but I'm honestly not quite sure what I'm even looking at without spending some time studying that code. I will give it a more in depth look as I find the time.

Yeah, sorry for the lack of explanation -- it took me quite some time to wrap my head around this stuff (well, to the degree that I succeeded anyway).

Anyway, may be this approach isn't needed.

This uses the "inline" version of the markdown grammar. https://github.com/MDeiml/tree-sitter-markdown see issue #18

dannyfreeman · 2023-09-09T04:17:22Z

Anyway, may be this approach isn't needed.

It may be. The first path I'm going to pursue is using a nested markdown grammar.
I have the initial workings here https://github.com/clojure-emacs/clojure-ts-mode/tree/bug/18/docstring-formatting

That solves another outstanding todo I had for font-locking in docstrings. I might actually push that to main because it is working quite nicely. I haven't touched indentation with it yet.

sogaiu · 2023-09-09T04:48:21Z

I gave that a try:

The auto-installation of the grammar seemed to work fine :)

This uses the "inline" version of the markdown grammar. https://github.com/MDeiml/tree-sitter-markdown see issue #18

dannyfreeman · 2023-09-09T14:10:13Z

The auto-installation of the grammar seemed to work fine :)

The auto-installation stuff makes me nervous. I'm glad it's been working so far.

I've merged the nested markdown grammar to main. I'm leaving my bug branch for this commit open though. I still need to tackle indentation with it. Haven't worked with nested parser indentation yet.

sogaiu · 2023-09-10T00:11:36Z

The auto-installation stuff makes me nervous.

Yeah, that along with depending on multiple grammars that might change...perhaps motivation to consider alternative approaches?

Regarding change in grammars, I wonder about the semver-ish idea mentioned here:

point updates are grammar fixes that don't affect queries at all;

minor updates add grammar features (nodes) that need query updates to make use of, but queries work fine with otherwise;

major updates remove/change grammar features (nodes) that break current queries.

I wonder how much of the change in tree-sitter-clojure could be described in terms of 1 and 2. My impression (quite possibly mistaken) is that most of the change was somewhat in the neighborhood of 3. May be it depends a lot on "what stage" a grammar's development is at.

IIUC, the author of the idea has extensive experience looking after the integration of a very large number of grammars in the context of an editor so it seems reasonable to suppose their observations informed the 3-point idea above.

dannyfreeman · 2023-09-10T03:57:28Z

sogaiu ***@***.***> writes:

Yeah, that along with depending on multiple grammars that might change...perhaps motivation to consider alternative approaches?

There is some discussion in the mailing list now on how to handle this, that's actually what prompted me to open the issue you linked below. I'm not quite sure what the solution is other than to always install the exact grammar version we want in a unique location, and always load that. I'm not so sure people will like that though. Personally I enjoy having all the grammars pre-installed from my OS packages.

Regarding change in grammars, I wonder about the semver-ish idea mentioned [here](tree-sitter/tree-sitter#2611 (comment)): > 1. point updates are grammar fixes that don't affect queries at all; > 2. minor updates add grammar features (nodes) that need query updates to make use of, but queries work fine with otherwise; > 3. major updates remove/change grammar features (nodes) that break current queries. I wonder how much of the change in tree-sitter-clojure could be described in terms of 1 and 2. My impression (quite possibly mistaken) is that most of the change was somewhat in the neighborhood of 3. May be it depends a lot on "what stage" a grammar's development is at.

In my experience it is very hard to make a significant change that falls into categories 1 and 2. Even adding nodes can be breaking in their own way. They may not break queries (I think) but they can break things like "node under point" and "parent of current node". That is kind of something we saw with the transpose-sexp issue #17 that was opened earlier. While that was caused by the transpose function changes, I think it's easy to see how adding a new node could cause the it to break.

IIUC, the author has extensive experience looking after the integration of a very large number of grammars in the context of an editor so it seems reasonable to suppose their observations informed the idea stated above.

Of course, I would defer to someone with more experience with tree-sitter grammars in this regard. My own is fairly limited. I hope that issue results in something a little more concrete than "every grammar should use git tags with semvar".

…

-- Danny Freeman

sogaiu · 2023-09-10T11:53:32Z

There is some discussion in the mailing list now on how to handle this,
that's actually what prompted me to open the issue you linked below.

I think I read some of it [1] -- thanks for mentioning it.

Personally I enjoy having all the grammars pre-installed from my OS packages.

I like relying on my OS packages for things that don't change much, but for things that end up as dependencies of things I'm working on, I find using OS packages to sometimes be a hindrance. (That's when not using NixOS or Guix though.)

As tree-sitter itself hasn't reached a 1.0 status, as I understand it, there is some feeling among the maintainers that certain things can be changed. Since the grammars depend on tree-sitter in a way, I view them as more in flux.

So yeah, if I were not doing tree-sitter-related work, surely I'd prefer to not be fiddling with different versions of grammars, but I wonder whether at this time how stable of an experience one can expect. Perhaps this also depends on the language in question.

When I use Neovim, I notice rebuilding of grammars from time to time and I do occasionally run into issues where I need to intervene manually. I have not found the error messages to be enough on their own to figure out what to do, but often enough someone else has already asked so a relevant query can lead to a solution (^^; Not great for sure, but this may have something to do with my lack of experience and skill with things vim-ish...

In my experience it is very hard to make a significant change that falls into categories 1 and 2. Even adding nodes can be breaking in their own way. They may not break queries (I think) but they can break things like "node under point" and "parent of current node". That is kind of something we saw with the transpose-sexp issue #17 that was opened earlier. While that was caused by the transpose function changes, I think it's easy to see how adding a new node could cause the it to break.

I wonder if there are any issues / PRs at nvim-treesitter that demonstrate the feasibility of the 3-point idea...My imagination is not succeeding in seeing the idea working well.

[1] Thread subject -> "Re: Update on tree-sitter structure navigation"

dannyfreeman · 2023-09-11T13:04:52Z

sogaiu ***@***.***> writes:

As tree-sitter itself hasn't reached a [1.0 status](tree-sitter/tree-sitter#930), as I understand it, there is some feeling among the maintainers that certain things can be changed. Since the grammars depend on tree-sitter in a way, I view them as more in flux. So yeah, if I were not doing tree-sitter-related work, surely I'd prefer to not be fiddling with different versions of grammars, but I wonder whether at this time how stable of an experience one can expect. Perhaps this also depends on the language in question.

I don't think people in our positions, writing grammars and tooling based on them, can expect much stability. Users may not have the same expectations though. I think with Emacs at least there is a general understanding that the various *-ts-modes are still in an "experimental" stage.

When I use Neovim, I notice rebuilding of grammars from time to time and I do occasionally run into issues where I need to intervene manually. I have not found the error messages to be enough on their own to figure out what to do, but often enough someone else has already asked so a relevant query can lead to a solution (^^; Not great for sure, but this may have something to do with my lack of experience and skill with things vim-ish...

I think it's going to be pretty general advice when people report issues, to say "make sure you can reproduce your issue with this version of the grammar first". In fact, that is problem something to include in the repo's github issue template. I also have this new ticket open #19 to write some kind of interactive command to help diagnose grammar issues.

> In my experience it is very hard to make a significant change that falls into categories 1 and 2. > Even adding nodes can be breaking in their own way. They may not break queries (I think) but they > can break things like "node under point" and "parent of current node". That is kind of something > we saw with the transpose-sexp issue > [#17](#17) that was opened earlier. While > that was caused by the transpose function changes, I think it's easy to see how adding a new node > could cause the it to break. I wonder if there are any issues / PRs at nvim-treesitter that demonstrate the feasibility of the 3-point idea...My imagination is not succeeding in seeing the idea working well.

Demonstrating the feasibility of the semantic versioning guidelines? I think my imagination is at a loss too. What might we expect to see? A grammar successfully releasing a 0.0.X patch update and not breaking anything? Then a 0.X.0 minor version update, adding nodes, and everything keeps working? I'm afraid I'm at a loss as well.

…

-- Danny Freeman

sogaiu · 2023-09-11T14:03:44Z

Demonstrating the feasibility of the semantic versioning guidelines? I
think my imagination is at a loss too. What might we expect to see? A
grammar successfully releasing a 0.0.X patch update and not breaking
anything? Then a 0.X.0 minor version update, adding nodes, and
everything keeps working?

Something along those lines would be neat evidence, but I was thinking more along the lines of just seeing that changes in particular grammars that happened to fall into categories 1 and 2 (without the explicit version labelling).

I have been wondering how often 1 or 2 has happened (slightly edited):

updates are grammar fixes that don't affect queries at all;

updates add grammar features (nodes) that need query updates to make use of, but queries work fine with otherwise;

I suppose without some effort to arrange for those kinds of changes they are not likely to occur by chance. Was curious whether there was already some kind of "track record".

I have my doubts about any scheme like this for at least the following reasons:

Even if one wanted to follow such a scheme, it seems like it would be extra work to do so and quite possibly difficult / complicated to get right
I don't know what sensible incentive there would be for maintainers to put forth the extra effort
Limited resources suggest this might not be a high priority

I found just getting the grammars I've worked on [1] to behave appropriately to be challenging enough...

[1] ...and these were probably some of the simplest grammars even.

sogaiu · 2023-09-11T14:10:47Z

I think it's going to be pretty general advice when people report
issues, to say "make sure you can reproduce your issue with this version
of the grammar first".

Sure. What was tricky about Neovim was that I was seeing error messages in contexts where I wasn't expecting to see them at all. I would start Neovim and then suddenly see errors which looked vaguely like they would have something to do with tree-sitter but I wasn't sure (again, that may have a lot to do with my relative vim-ish ignorance).

I haven't hit anything like that with Emacs + tree-sitter yet, but then my ignorance seems to be much less here.

In fact, that is problem something to include in the repo's github issue template.

Sounds like a plan :)

I also have this new ticket open #19 to write some kind of interactive command to help diagnose grammar
issues.

Do you know if any other *-ts-modes are doing anything similar?

dannyfreeman · 2023-09-12T04:07:55Z

sogaiu ***@***.***> writes:

Sure. What was tricky about Neovim was that I was seeing error messages in contexts where I wasn't expecting to see them at all. I would start Neovim and then suddenly see errors which looked vaguely like they would have something to do with tree-sitter but I wasn't sure (again, that may have a lot to do with my relative vim-ish ignorance). I haven't hit anything like that with Emacs + tree-sitter yet, but then my ignorance seems to be much less here.

I have with other grammars (javascript for sure). Most of the time indentation just doesn't work, or the entire buffer does not get font-locked (because a query is now invalid).

> I also have this new ticket open [#19](#19) to write some kind of interactive command to help diagnose grammar issues. Do you know if any other *-ts-modes are doing anything similar?

I have not seen anything like it yet. I've also not seen any bug reports about grammar issues yet so it's not super high priority right now. As far as I know, any distros packaging tree-sitter-clojure currently use a compatible version with clojure-ts-mode. Great for now, but we will probably end up pushing a breaking change to the grammar at some point. When that happens and the distro grammars inevitably lag behind I think compatibility issues are likely to occur. I think the "clojure-ts-grammar-doctor" will come in handy for users.

…

-- Danny Freeman

sogaiu · 2023-09-12T07:21:14Z

I have with other grammars (javascript for sure).
Most of the time indentation just doesn't work, or the entire buffer
does not get font-locked (because a query is now invalid).

Oh I meant that in Emacs it was easier to investigate and potentially fix things (as compared with Neovim) -- possibly because of the existence of edebug but also it seems much easier to look up the source for various functions.

Once I see an error message in Neovim, I might get an idea of what file and line to look at, but from that point, I'm in code I have no idea about with no practically economic and reliable way of observing execution like I can with edebug. Anyway, this is perhaps kind of tangential. Sorry about that.

but we will probably end up pushing a breaking change to the grammar at some point. When that happens and the distro grammars inevitably lag behind I think compatibility issues are likely to occur. I think the"clojure-ts-grammar-doctor" will come in handy for users.

Yeah, it might be a good way to collect some relevant info for reproducing an issue too? May be that's along the lines of what you already stated here:

If the doctor does find issues, perhaps a special buffer clojure-ts-doctor can be shown with details about the issues found.

dannyfreeman added the bug Something isn't working label Sep 8, 2023

dannyfreeman added a commit that referenced this issue Sep 9, 2023

Use nested markdown grammar for highlighting docstrings

29433a8

This uses the "inline" version of the markdown grammar. https://github.com/MDeiml/tree-sitter-markdown see issue #18

dannyfreeman added a commit that referenced this issue Sep 9, 2023

Use nested markdown grammar for highlighting docstrings

f0eca67

This uses the "inline" version of the markdown grammar. https://github.com/MDeiml/tree-sitter-markdown see issue #18

dannyfreeman added a commit that referenced this issue Sep 9, 2023

Use nested markdown grammar for highlighting docstrings

9af0a6b

This uses the "inline" version of the markdown grammar. https://github.com/MDeiml/tree-sitter-markdown see issue #18

sogaiu mentioned this issue Jan 30, 2024

Smarter function documentation indentation. sogaiu/janet-ts-mode#3

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Docstring indentation is too aggressive when semantic indentation is enabled. #18

Docstring indentation is too aggressive when semantic indentation is enabled. #18

dannyfreeman commented Sep 8, 2023

sogaiu commented Sep 9, 2023

dannyfreeman commented Sep 9, 2023

sogaiu commented Sep 9, 2023 •

edited

Loading

dannyfreeman commented Sep 9, 2023

sogaiu commented Sep 9, 2023

dannyfreeman commented Sep 9, 2023

sogaiu commented Sep 10, 2023 •

edited

Loading

dannyfreeman commented Sep 10, 2023 via email

sogaiu commented Sep 10, 2023

dannyfreeman commented Sep 11, 2023 via email

sogaiu commented Sep 11, 2023 •

edited

Loading

sogaiu commented Sep 11, 2023

dannyfreeman commented Sep 12, 2023 via email

sogaiu commented Sep 12, 2023 •

edited

Loading

Docstring indentation is too aggressive when semantic indentation is enabled. #18

Docstring indentation is too aggressive when semantic indentation is enabled. #18

Comments

dannyfreeman commented Sep 8, 2023

sogaiu commented Sep 9, 2023

dannyfreeman commented Sep 9, 2023

sogaiu commented Sep 9, 2023 • edited Loading

dannyfreeman commented Sep 9, 2023

sogaiu commented Sep 9, 2023

dannyfreeman commented Sep 9, 2023

sogaiu commented Sep 10, 2023 • edited Loading

dannyfreeman commented Sep 10, 2023 via email

sogaiu commented Sep 10, 2023

dannyfreeman commented Sep 11, 2023 via email

sogaiu commented Sep 11, 2023 • edited Loading

sogaiu commented Sep 11, 2023

dannyfreeman commented Sep 12, 2023 via email

sogaiu commented Sep 12, 2023 • edited Loading

sogaiu commented Sep 9, 2023 •

edited

Loading

sogaiu commented Sep 10, 2023 •

edited

Loading

sogaiu commented Sep 11, 2023 •

edited

Loading

sogaiu commented Sep 12, 2023 •

edited

Loading