Website engineering: Rewrite CommandlineHighlight plugin #6429

clemyan · 2024-08-01T16:02:36Z

WORK-IN-PROGRESS

Honestly I didn't envision this change become so big. There is a quite some more work to be done before this PR is ready so it may take some time. I'm opening this now to gather feedback and start the discussion around the open questions I'm encountering.

If you want to review the code, note that I am most likely going to rewrite some of these commits before the PR is marked as ready.

What's the problem this PR addresses?

In the website build chain, we have a custom CommandLineHighlight remark plugin that supercharges yarn calls with syntax highlighting, documentation links, and tooltips using command metadata generated directly from our TS sources.

The plugin parses the command as given in the MDX source, resolve the yarn command being invoked, do some processing on the command metadata, but then it mostly just dumps the data as a React component in JSX (something like this). Then, that component needs more runtime client-side logic to determine what is what so as to correctly render various parts of the command.

IMO this implementation has some maintainablility issues:

It is just unnecessarily complex and not particularly grokable. The data processing is split into two passes, so making sense of the data flow from command metadata to JSX props to rendered HTML takes a huge cognitive load.
It is not flexible. Because part of processing happens in the React component, it places quite a big limitation as to what can be rendered (without modifying the component itself).
Even for stuff it can render, it is difficult to author the JSX to do so by hand because that requires understanding how the React component uses the intermediate data structure given to it as JSX props.

These combined makes this system very difficult to extend and/or work with. In fact, this whole rewrite started as I was trying to make it correctly parse and highlight some commands used in the website like yarn node --inspect $(yarn bin jest) and wasn't able to make it work under the original implementation.

How did you fix it?

Rewrote the whole remark plugin. Under this new plugin, the React components are "dumb" -- they have minimal logic and does little more than styling. The vast majority of the data processing happens at build-time in the remark plugin.

The main "visible" change is that many more things are recognized and processed as commandlines out-of-the-box, without authoring the MDX in special ways. On the other hand, if something isn't recognized and processed as intended, there are many escape hatches to force the plugin to process it certain ways. And, if everything else fails, it is much easier than before to write JSX to get the desired rendering because the React components are "dumb".

Commandline Recognition

The old plugin recognizes the following as commandline:

Code blocks without a specified language, of which every line is either a comment or looks like a yarn command
Inline code that looks like a yarn command

The new plugin is able to process much more stuff, so it needs a better heuristic to reduce false positive and false negatives

For code blocks, it checks whether commandline is the specified language or appears in the metastring. This makes code block highlighting fully opt-in.
For inline code, it checks whether it starts with something that looks like an invocation of one of the known commands. In addition to yarn, there are a few commonly-used commands on the website (e.g. node) that will be recognized.

And there are a few escape hatches for this:

Because code block highlighting is opt-in, there are no false positives and negatives
False positives in inline code can be negated by using <code> elements instead of markdown inline code. The plugin only processes the latter
False negatives in inline code can be forced using a comment immediately before the inline code like this (Still on the fence on the syntax)
```
The {/* commandline */}`grep` command normally isn't recognized
```

More Commandline Forms and Syntaxes

Most commandline syntaxes are now supported, including
- Environment variables YARN_IGNORE_PATH=1 node --prof packages/yarnpkg-cli/bundles/yarn.js
- Pipes yarn exec env | grep COREPACK_ROOT
- Subshells yarn node --inspect $(yarn bin jest).
In many places, commands are written with placeholders like yarn add <package>. Taken literally, these are syntax errors because angled brackets are parsed as redirection. The new plugin treats words surrounded by angled brackets as a simple argument and can correctly style them.
Once a code block or inline code is detected as needing highlight, all commands in it are styled, not just the known commands. Of course there isn't rich metadata to process for those so there are just some very simple heuristics
- For unknown commands, the first arg is the binary, args that start with a dash is an option, everything else is positional. Not perfect but at least those command are styled now.
- For known commands, there are hardcoded lists of a few of their command paths. These are used in addition to the heuristics for unknown commands.

Bare options

A number of places needs styling for "bare options". Take the following paragraph as an example

By default, yarn patch will always reset the patch. If you wish to add new changes, use the --update flag and follow the same procedure as before - your patch will be regenerated.

Here, the "--update" refers to a flag of the yarn patch command. Ideally it should have a tooltip describing it, just as it would have in `yarn patch --update`. In the old plugin that can be done by writing `yarn patch ! --update` in the MDX, but the new plugin uses a simple heuristic to avoid the need to do so: When a flag is mentioned, the command for it is almost always also mentioned nearby. So when the new plugin sees a bare option, it will check if it is a flag for the command definition that was used most recently while processing that file. If it isn't, it will try again with the next definition used.

This resolves the majority of cases, and most of the rest can be fixed by rewording the documentation to include the command.

Open Questions

What should be the syntax to force inline code to be recognized? Currently I use {/* commandline */}`grep` but there are a few alternative I'm considering like `grep # commandline` or using markdown directives :commandline[grep]
It is all well and good to use escape hatches in most MDX files under docs and blog because these are written specifically for the website. Showing the escape hatches used (e.g. JSX) when viewed elsewhere like on GitHub does not matter because the only canonical way to view them is on the website. But there are a few markdown/MDX files where it isn't clear that's the case:
- The changelog already have some Docusaurus and commandline highlight specific syntax (e.g. admonitions, `yarn npm audit ! --no-deprecations`) so I guess that is also intended to be viewed only through the website?
- The Error Codes docs are used by yarn explain so we can't use escape hatches? But yarn explain already does not do so well with it (e.g. try yarn explain YN0002) so I guess it's fine? 🤷
Also, the majority of the cases where the dropping down to JSX is necessary are for distinguishing "implicit yarn run" (e.g. yarn build) vs a command that does not exist (e.g. yarn global, yarn policies set-version). Is there a way we can reduce the need to do that? E.g. a list of known script names and everything else is by default a false command? Or a list of known false commands? Etc.
Lastly, sometimes just saying "writing JSX" isn't a good solution. In particular, commands and options for Yarn commands need the command metadata to provide the tooltips and writing them by hand risk them drifting out of sync. This is especially important for bare options. Of course it generally is good if bare options are accompanied by mentioning the command so the reader can understand better, but what if that isn't the case? Should there be an escape hatch to tell the plugin to inject the metadata? What should that look like?

Checklist

I have read the Contributing Guide.

I have set the packages that need to be released for my changes to be effective.

I will check that all automated PR checks pass before the PR gets reviewed.

Option forms like `-vv`, `-o,--long`, and `--no-bool` are auto-parsable now

Some places use `yarn` to refer to the package itself, and not the command

clemyan added 9 commits July 30, 2024 23:47

feat(docs/clh): Reimplement command line highlight remark plugin

2b9407a

feat(docs/clh): Migrate docs to new CLH

83692f7

feat(docs/clh): Define yarn global as unknown command explicitly

5ef3b11

feat(docs/clh): Add comment activation

609dc10

feat(docs/clh): Add bare option auto-resolution

ab2f220

feat(docs/clh): Add more auto-parsable option forms

4d721d3

Option forms like `-vv`, `-o,--long`, and `--no-bool` are auto-parsable now

feat(docs/clh): Make bare yarn not auto-detectable as inline command

bf9e462

Some places use `yarn` to refer to the package itself, and not the command

feat(docs/clh): Resolve "implicit cwd" to underlying command

bbbfc96

feat(docs/clh): Add raw JSX escape hatch to commandline blocks

3ad6127

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Website engineering: Rewrite CommandlineHighlight plugin #6429

Website engineering: Rewrite CommandlineHighlight plugin #6429

clemyan commented Aug 1, 2024 •

edited

Loading

Website engineering: Rewrite CommandlineHighlight plugin #6429

Are you sure you want to change the base?

Website engineering: Rewrite CommandlineHighlight plugin #6429

Conversation

clemyan commented Aug 1, 2024 • edited Loading

WORK-IN-PROGRESS

What's the problem this PR addresses?

How did you fix it?

Commandline Recognition

More Commandline Forms and Syntaxes

Bare options

Open Questions

Checklist

clemyan commented Aug 1, 2024 •

edited

Loading