🪦 Archived: this document is not maintained. This document was made jointly with micromark, which was later also turned into markdown-rs. At present, I don’t have the bandwidth to maintain 2 reference parsers and a spec.
Markdown 💛 JSX
This document is currently in progress. See also micromark, cmsm, and mdxjs.
- 1 Background
- 2 Overview
- 3 MDX
- 4 Parsing
- 5 State machine
- 5.1 Before MDX block state
- 5.2 Before MDX span state
- 5.3 After MDX block state
- 5.4 After MDX span state
- 5.5 Data state
- 5.6 Before name state
- 5.7 Before closing tag name state
- 5.8 Primary name state
- 5.9 After primary name state
- 5.10 Before member name state
- 5.11 Member name state
- 5.12 After member name state
- 5.13 Before local name state
- 5.14 Local name state
- 5.15 After local name state
- 5.16 Before attribute state
- 5.17 Attribute expression state
- 5.18 Attribute name state
- 5.19 After attribute name state
- 5.20 Before attribute local name state
- 5.21 Attribute local name state
- 5.22 After attribute local name state
- 5.23 Before attribute value state
- 5.24 Attribute value double quoted state
- 5.25 Attribute value single quoted state
- 5.26 Attribute value expression state
- 5.27 Self-closing state
- 5.28 Expression state
- 5.29 Text state
- 5.30 Accent quoted open state
- 5.31 Accent quoted state
- 5.32 Accent quoted close state
- 5.33 Tilde quoted open state
- 5.34 Tilde quoted state
- 5.35 Tilde quoted close state
- 6 Adapter
- 6.1 Enter 'tag' adapter
- 6.2 Enter 'closingSlash' adapter
- 6.3 Enter 'attributeExpression' adapter
- 6.4 Enter 'attributeName' adapter
- 6.5 Enter 'selfClosingSlash' adapter
- 6.6 Exit 'closingSlash' adapter
- 6.7 Exit 'primaryName' adapter
- 6.8 Exit 'memberName' adapter
- 6.9 Exit 'localName' adapter
- 6.10 Exit 'name' adapter
- 6.11 Exit 'attributeName' adapter
- 6.12 Exit 'attributeLocalName' adapter
- 6.13 Exit 'attributeValue' adapter
- 6.14 Exit 'attributeValueExpression' adapter
- 6.15 Exit 'attributeExpression' adapter
- 6.16 Exit 'selfClosingSlash' adapter
- 6.17 Exit 'tag' adapter
- 6.18 Exit 'expression' adapter
- 7 Appendix
- 8 References
- 9 Acknowledgments
- 10 License
MDX is the combination of Markdown with JSX. This document defines a syntax for MDX (without JavaScript, MDXjs does that) by describing how to parse it.
The idea of combining Markdown, JavaScript, and JSX was a collaborative effort by Guillermo Rauch (@rauchg), James K. Nelson (@jamesknelson), John Otander (@johno), Tim Neutkens (@timneutkens), Brent Jackson (@jxnblk), Jessica Stokes (@ticky), and more. Markdown was created by John Gruber (@gruber). CommonMark by John MacFarlane et al. (@jgm) is a popular variant. JSX was created by Sebastian Markbåge et al. (@sebmarkbage) at Facebook, Inc.
Markdown does not have a syntax for custom components. MDX solves this.
There are many languages objectively better than Markdown. However, Markdown is great because:
- It looks like what it means and is relatively easy to read
- Although images are confusing, most stuff is relatively simple to write
- It’s loose and ambiguous: it may not work but you won’t get an error (great for someone posting a comment to a forum if they forgot an asterisk)
Markdown does have a way to extend it, HTML, but that has drawbacks:
- HTML in Markdown is naïve, how it’s parsed sometimes doesn’t make sense
- HTML is unsafe by default, so it’s sometimes (partially) unsupported
- HTML and Markdown don’t mix well, resulting in confusing rules such as blank lines or markdown="1" attributes
- HTML is coupled with browsers, Markdown is useful for other things too
The frontend world has an alternative to HTML: JSX. JSX is great, amongst other things, because:
- It has a relatively familiar syntax (like XML)
- It’s agnostic to semantics and intended for compilers (can have any domain-specific meaning)
- It’s strict and unambiguous (great if an editor forgot a slash somewhere, as they’ll get an error early, instead of a book going to print with broken stuff in it)
This document first talks about the MDX syntax for authors, in the following section. Further sections define the syntax in-depth and for developers. The appendix includes sections on notable differences from Markdown and JSX, and a list of common MDX gotchas.
This section explains MDX for authors.
The smallest MDX example looks like this:
# Hello, world!
It displays a heading saying “Hello, world!” on the page. With MDX you can add components:
<MyComponent># Hello, world!</MyComponent>
MDX syntax can be boiled down to being JSX in Markdown. It’s a superset of Markdown syntax that supports JSX.
Traditionally, Markdown is used to generate HTML. Many developers like writing markup in Markdown as it often looks more like what’s intended and it is typically terser. Instead of the following HTML:
<blockquote>
<p>A block quote with <em>some</em> emphasis.</p>
</blockquote>
You can write the equivalent in Markdown (or MDX) like so:
> A block quote with _some_ emphasis.
Markdown is good for content. MDX supports most standard Markdown syntax. It’s important to understand Markdown in order to learn MDX.
Recently, more and more developers have started using JSX to generate HTML markup. JSX is typically combined with a frontend framework like React or Vue. These frameworks add support for components, which let you change repeating things like the following markup:
<h2>Hello, Venus!</h2>
<h2>Hello, Mars!</h2>
…to JSX (or MDX) like this:
<Welcome name="Venus" />
<Welcome name="Mars" />
JSX is good for components. It makes repeating things more clear and allows for separation of concerns. MDX supports most standard JSX syntax.
MDX is the combination of Markdown and JSX, for example, like so:
<MyComponent>> Block quote</MyComponent>
<MyCodeComponent>
```html
<!doctype html>
```
</MyCodeComponent>
<MyOtherComponent>
# Heading<Footnote id="1" />
- List
- Items
</MyOtherComponent>
<Image
alt='Photo of Lilo sitting in a tiny box'
src='lilo.png'
/>
<also-component {attribute expression} />
<math value={attribute value expression} />
{
block expression
}
The sum of `1 + 1` as calculated by an inline expression is {1 + 1}.
The syntax of MDX within Markdown is formally defined by how to parse it in § 4 Parsing and in further sections, relatively formally in § 7.1 Syntax, and informally by example here.
As MDX is not tied to HTML or JavaScript, the following examples do not show output examples in HTML, but instead show whether they are okay, or whether they crash.
For ease of reading, block elements will be capitalized, whereas span elements will be lowercase, in the following examples. But, casing does not affect parsing.
A block of MDX is an element or expression that is both the first thing on its opening line, and the last thing on its closing line.
A self-closing block tag:
<Component />
The start and end can be on different lines:
<Component
/>
An arbitrary number of lines can be between the start and end:
<Component


/>
This also applies to elements with opening and closing tags:
<Component>
</Component>
Expressions can also be blocks:
{
}
Parent containers of components don’t count when figuring out if something is the first or last thing, such as in a block quote, a list, or in another block component:
> <Component />
- <Component />
<Parent>
<Child />
</Parent>
A span of MDX is an element or expression that is not a block: it’s either not the first thing on its line, or not the last thing, or neither:
This span is preceded by other things: <component />
<component /> This span is followed by other things.
These rules also apply to expressions ({ such as this one }).
An MDX block element can contain further Markdown blocks, whereas an MDX span element can contain further Markdown spans.
On a single line:
<Component>> Block quote</Component>
With generous whitespace:
<Component>

> Block quote

</Component>
With indentation:
<Component>
  > Block quote
</Component>
Spans cannot contain blocks:
<component>> this is not a block quote</component>, because it’s not in a block
element.
Nor is this a <component># heading</component>
Blocks will create paragraphs:
<Component>**Strongly important paragraph in a component**.</Component>
This <component>**is strongly important text in a component**</component> in a
paragraph.
Which gets a bit confusing if you are expecting HTML semantics (to MDX, elements don’t have semantics, so h2 has no special meaning):
<h2>And this is a paragraph in a heading!</h2>
MDX expressions can contain arbitrary data, with the exception that there must be a matching number of opening braces (U+007B LEFT CURLY BRACE ({)) and closing braces (U+007D RIGHT CURLY BRACE (})):
{
This is a fine expression: no opening or closing braces
}
So is this: {{{}}}.
And this, an expression with extra closing braces after it: {}}}.
This example is incorrect, as there are not enough closing braces:
{{{}.
MDX elements and expressions must be closed, and what closes them must be in an expected place:
This example is incorrect, an unclosed tag:
<Component>
This example is incorrect, because the “closing” tag is in fenced code.
<Component>
```js
</Component>
```
This example is incorrect, because the “closing” tag is outside of the block quote:
> <Component>
</Component>
This example is incorrect, because the “closing” tag is not in the paragraph:
A span component <component>
</component>
This example is incorrect, because the “closing” tag is in a different paragraph:
<component>This is one paragraph, with an inline opening tag.
This is another paragraph, with an inline closing tag</component>.
The same rules apply to expressions:
{This is all fine…
…but because there is a dot after the closing brace, it’s not a block, which
results in two paragraphs, which means that the first paragraph has an unclosed
expression}.
MDX elements can have three types of attributes.
Attribute expressions:
<Component {attribute expression} />
Boolean attributes:
<Component boolean another />
Or initialized attributes, with a value.
<Component key="value" other="more" />
Attribute values can also use single quotes:
<Component quotes='single quotes: also known as apostrophes' />
Finally, attribute value expressions can be used with braces:
<Component data={attribute value expression} />
Element names are optional, which is a feature called “fragments”:
<>Fragment block with a paragraph</>
A <>fragment span</> in a paragraph.
The syntax of the name of an element follows the syntax of variables in JavaScript, and dashes are also allowed (but not at the start):
This is fine: <π />.
Also fine: <ab /> (there’s a zero-width non-joiner in there).
Dashes are <c-d /> fine too!
Names can be prefixed with a namespace using a colon:
<svg:rect />
Similar to namespaces, dots can be used to access methods from objects:
<org.acme.example />
(Namespaces and methods cannot be combined).
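The naming rules above can be sketched in a few lines of Python. This is an illustration only: `IDENTIFIER`, `NAME`, and `parse_name` are hypothetical names, and the pattern is an ASCII-only approximation, whereas real MDX names follow the much wider JavaScript identifier rules (so `<π />` would be rejected here).

```python
import re

# ASCII-only approximation of an identifier; the spec defers to JavaScript's
# IdentifierStart / IdentifierPart (plus "-"), which cover far more of Unicode.
IDENTIFIER = r"[A-Za-z_$][A-Za-z0-9_$-]*"
NAME = re.compile(
    rf"(?P<primary>{IDENTIFIER})"
    rf"(?:\s*:\s*(?P<local>{IDENTIFIER})|(?P<members>(?:\s*\.\s*{IDENTIFIER})+))?"
)


def parse_name(source):
    """Split an element name into its primary, member, and local parts."""
    match = NAME.fullmatch(source)
    if match is None:
        raise ValueError(f"not a valid element name: {source!r}")
    members = match.group("members")
    return {
        "primary": match.group("primary"),
        "members": re.findall(IDENTIFIER, members) if members else [],
        "local": match.group("local"),
    }
```

A namespace and members are alternatives in the pattern, so a name such as `a:b.c` fails to match as a whole and raises `ValueError`.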
Similar to names, keys of attributes also follow the same syntax as JavaScript variables, and dashes are also allowed:
This is all fine: <x π ab c-d />.
And namespaces can also be used:
This is all fine: <z xml:lang="de" />.
(Methods don’t work for keys).
Whitespace is mostly optional, except between two identifiers (such as the name and a key, or between two keys):
This is fine: <x/>.
Also fine: <x{attribute expression}/>.
Fine too: <v w=""x=''y z/>.
Most places accept whitespace:
A bit much, but sure: < w / >.
< x >Go ahead< / x >
< z do your = 'thing' >
The states of the MDX state machine have certain effects, such as that they create tokens in the stack and consume characters. The purpose of the state machine is to tokenize. The stack is used by adapters.
The MDX adapter handles tokens, which has further effects, such as validating whether they are conforming and figuring out when parsing is done. The purpose of the adapter is to handle the results of the tokenizer.
To parse MDX is to feed the input character to the state of the state machine, and when not settled, repeat this step.
If parsing crashed with a label, the content is nonsensical and the document cannot be processed. Without a label, no MDX was found.
How MDX, whether it’s found or not, is handled is intentionally undefined and left up to the host parser. When to feed an EOF is similarly undefined.
Host parsers must not support indented code and autolinks, as those conflict with MDX.
A character is a Unicode code point and is represented as a four to six digit hexadecimal number, prefixed with U+ ([UNICODE]).
Whitespace is any character defined as WhiteSpace ([JavaScript]).
Identifier start is any character defined as IdentifierStart, with the restriction that Unicode escape sequences do not apply ([JavaScript]).
Identifier is any character defined as IdentifierPart, with the restriction that Unicode escape sequences do not apply ([JavaScript]).
An EOF character is a conceptual character (as in, not a real character) representing the lack of any further characters in the input.
The input stream consists of the characters pushed into it.
The input character is the first character in the input stream that has not been consumed. Initially, the input character is the first character in the input. Finally, when all characters are consumed, the input character is an EOF.
The stack is a list of tokens that are open, initially empty. The current token is the last token in the stack.
The value of a token is all characters in the input stream from where the token was entered (inclusive) to where it exited (exclusive).
The element stack is a list of elements that are open, initially empty. The current element is the last element in the element stack.
Settled is used to signal when parsing is done, whether it was a success or not, and is initially off. Crashed is used to signal when parsing is unsuccessful, and is initially off.
The state is the way a character is handled.
A variable is declared with let, cleared with unset, or changed with set (to set a value), increment (to add a numeric value), decrement (to subtract a numeric value), append (to add a string value), push (to add a value to a list), or pop (to remove a value from the end of a list).
Which values are used are left to the host programming language, but this definition requires compatibility with [JSON] for primitives (strings, numbers, booleans, and null) and structured types (objects and arrays).
The shared space is an object.
size, sizeOpen, currentAttribute, and currentTag are variables in the shared space.
These variables are available globally to all states and adapters.
Other variables are available locally to a state or adapter and not shared.
To dedent is to remove up to X initial U+0009 CHARACTER TABULATION (HT) or U+0020 SPACE (SP) characters from each non-initial line in the given value, where X is the minimum number of U+0009 CHARACTER TABULATION (HT) or U+0020 SPACE (SP) characters of all non-initial lines that contain other characters.
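As a rough illustration of the dedent definition above, here is a minimal Python sketch. The function name and the choice to split lines on U+000A LINE FEED (LF) only are assumptions of this example, not part of the definition.

```python
def dedent(value):
    """Strip the shared HT/SP indent from every non-initial line."""
    lines = value.split("\n")  # assumption: lines are separated by LF only
    rest = lines[1:]

    def indent_of(line):
        return len(line) - len(line.lstrip("\t "))

    # X: the smallest indent among non-initial lines with other characters.
    sizes = [indent_of(line) for line in rest if line.strip("\t ")]
    smallest = min(sizes) if sizes else 0

    stripped = []
    for line in rest:
        cut = min(smallest, indent_of(line))  # "up to X": only HT/SP is removed
        stripped.append(line[cut:])
    return "\n".join(lines[:1] + stripped)
```

For example, `dedent("{\n    a\n      b\n  }")` removes two characters from each non-initial line, because the closing brace line has the smallest indent.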
To decode is to parse character references as defined in “Character reference state” of § 12.2 Parsing HTML documents ([HTML]).
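In Python, the standard library function `html.unescape` converts named and numeric character references following the HTML 5 rules, so it can stand in for decoding in a sketch. The wrapper names `decode` and `attribute_value` are hypothetical; the second one mirrors how a quoted attribute value drops its first and last characters (the quotes) before decoding.

```python
import html


def decode(value):
    """Decode character references such as &amp; or &#x7B;."""
    return html.unescape(value)


def attribute_value(token_value):
    """Decode a quoted attribute value token, without its quotes."""
    return decode(token_value[1:-1])
```

For example, `attribute_value("'Lilo &amp; Stitch'")` gives `Lilo & Stitch`.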
The MDX state machine and MDX adapter have certain common effects.
To switch to a state is to wait for a character in the given state.
To consume the input character is to move on from it to the next character in the input stream.
To enter a token is to push a new token of the given type to the stack, making it the current token.
To exit is to pop the current token from the stack.
Done is used to mark parsing as settled.
Crash is used to mark parsing as settled and crashed. When crashing with a given label, crashing causes a parse error.
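The common effects above map onto a small amount of bookkeeping. The following Python sketch is one possible shape for it, not a reference implementation: the class name `Context`, the `exited` list, and representing EOF as `None` are all assumptions made for illustration.

```python
class Context:
    """Bookkeeping shared by the states: input, stack, and settle flags."""

    EOF = None  # conceptual character: no further input

    def __init__(self, source):
        self.source = source
        self.index = 0       # position of the input character
        self.stack = []      # open tokens; the last one is the current token
        self.exited = []     # finished tokens, in exit order (for adapters)
        self.settled = False
        self.crashed = False
        self.label = None

    @property
    def character(self):
        """The input character, or EOF when everything is consumed."""
        if self.index < len(self.source):
            return self.source[self.index]
        return self.EOF

    def consume(self):
        self.index += 1

    def enter(self, kind):
        self.stack.append({"type": kind, "start": self.index})

    def exit(self):
        token = self.stack.pop()
        # Value: from where the token was entered up to, not including, here.
        token["value"] = self.source[token["start"]:self.index]
        self.exited.append(token)
        return token

    def done(self):
        self.settled = True

    def crash(self, label=None):
        self.settled = True
        self.crashed = True
        self.label = label
```

Because a token's value ends where it exits, a step such as “consume, and exit” includes the consumed character in the value, whereas “exit” on its own does not include the input character.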
The MDX state machine is used to tokenize MDX blocks and MDX spans. Blocks (also known as flow) make up the structure of the document (such as headings), whereas spans (also known as text or inline) make up the intra-paragraph parts of the flow (such as emphasis).
The initial state varies based on whether flow or text is parsed, and is respectively either Before MDX block state or Before MDX span state.
The final state is switched to by the MDX adapter, which right before completion will switch to either After MDX block state or After MDX span state.
5.1 Before MDX block state
- ↪ U+0009 CHARACTER TABULATION (HT)
  ↪ U+0020 SPACE (SP)
- ↪ Anything else
5.2 Before MDX span state
- ↪ U+003C LESS THAN (<)
  ↪ U+007B LEFT CURLY BRACE ({)
  Switch to Data state
- ↪ Anything else
5.3 After MDX block state
- ↪ U+0009 CHARACTER TABULATION (HT)
  ↪ U+0020 SPACE (SP)
- ↪ EOF
  ↪ U+000A LINE FEED (LF)
  ↪ U+000D CARRIAGE RETURN (CR)
- ↪ Anything else
5.5 Data state
- ↪ U+003C LESS THAN (<)
  Switch to Before name state, enter 'tag', and consume
- ↪ U+007B LEFT CURLY BRACE ({)
  Switch to Expression state, enter 'expression', let size be 1, and consume
- ↪ Anything else
  Switch to Text state and enter 'text'
5.6 Before name state
- ↪ U+002F SLASH (/)
  Switch to Before closing tag name state, enter 'closingSlash', consume, and exit
- ↪ U+003E GREATER THAN (>)
- Switch to Primary name state, enter 'name', enter 'primaryName', and consume
- ↪ Anything else
  Crash 'before name'
5.7 Before closing tag name state
- ↪ U+003E GREATER THAN (>)
- Switch to Primary name state, enter 'name', enter 'primaryName', and consume
- ↪ Anything else
  Crash 'before name'
5.8 Primary name state
- ↪ U+002D DASH (-)
  ↪ Identifier
- ↪ U+002E DOT (.)
  ↪ U+002F SLASH (/)
  ↪ U+003A COLON (:)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Whitespace
  Switch to After primary name state and exit
- ↪ Anything else
  Crash 'in name'
5.9 After primary name state
- ↪ U+002E DOT (.)
- ↪ U+003A COLON (:)
  Switch to Before local name state and consume
- ↪ U+002F SLASH (/)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Identifier start
  Switch to Before attribute state and exit
- ↪ Anything else
  Crash 'after name'
5.10 Before member name state
- Switch to Member name state, enter 'memberName', and consume
- ↪ Anything else
  Crash 'before member name'
5.11 Member name state
- ↪ U+002D DASH (-)
  ↪ Identifier
- ↪ U+002E DOT (.)
  ↪ U+002F SLASH (/)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Whitespace
  Switch to After member name state and exit
- ↪ Anything else
  Crash 'in member name'
5.12 After member name state
- ↪ U+002E DOT (.)
- ↪ U+002F SLASH (/)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Identifier start
  Switch to Before attribute state and exit
- ↪ Anything else
  Crash 'after member name'
5.13 Before local name state
- Switch to Local name state, enter 'localName', and consume
- ↪ Anything else
  Crash 'before local name'
5.14 Local name state
- ↪ U+002D DASH (-)
  ↪ Identifier
- ↪ U+002F SLASH (/)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Whitespace
  Switch to After local name state, exit, and exit
- ↪ Anything else
  Crash 'in local name'
5.15 After local name state
- ↪ U+002F SLASH (/)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Identifier start
- ↪ Anything else
  Crash 'after local name'
5.16 Before attribute state
- ↪ U+002F SLASH (/)
  Switch to Self-closing state, enter 'selfClosingSlash', consume, and exit
- ↪ U+003E GREATER THAN (>)
  Switch to Data state, consume, and exit
- ↪ U+007B LEFT CURLY BRACE ({)
  Switch to Attribute expression state, enter 'attributeExpression', let size be 1, and consume
- Switch to Attribute name state, enter 'attributeName', and consume
- ↪ Anything else
  Crash 'before attribute name'
5.17 Attribute expression state
- ↪ EOF
  Crash 'in attribute expression'
- ↪ U+007B LEFT CURLY BRACE ({)
  Increment size by 1 and consume
- ↪ U+007D RIGHT CURLY BRACE (})
  If size is:
  - ↪ 1
    Switch to Before attribute state, unset size, consume, and exit
  - ↪ Anything else
    Decrement size by 1 and consume
- ↪ Anything else
5.18 Attribute name state
- ↪ U+002D DASH (-)
  ↪ Identifier start
- ↪ U+002F SLASH (/)
  ↪ U+003A COLON (:)
  ↪ U+003D EQUALS TO (=)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Whitespace
  Switch to After attribute name state and exit
- ↪ Anything else
  Crash 'in attribute name'
5.19 After attribute name state
- ↪ U+003A COLON (:)
- ↪ U+003D EQUALS TO (=)
- ↪ U+002F SLASH (/)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Identifier start
- ↪ Anything else
  Crash 'after attribute name'
5.20 Before attribute local name state
- Switch to Attribute local name state, enter 'attributeLocalName', and consume
- ↪ Anything else
  Crash 'before local attribute name'
5.21 Attribute local name state
- ↪ U+002D DASH (-)
  ↪ Identifier start
- ↪ U+002F SLASH (/)
  ↪ U+003D EQUALS TO (=)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Whitespace
- ↪ Anything else
  Crash 'in local attribute name'
5.22 After attribute local name state
- ↪ U+003D EQUALS TO (=)
- ↪ U+002F SLASH (/)
  ↪ U+003E GREATER THAN (>)
  ↪ U+007B LEFT CURLY BRACE ({)
  ↪ Identifier start
- ↪ Anything else
  Crash 'after local attribute name'
5.23 Before attribute value state
- ↪ U+0022 QUOTATION MARK (")
  Switch to Attribute value double quoted state, enter 'attributeValue', and consume
- ↪ U+0027 APOSTROPHE (')
  Switch to Attribute value single quoted state, enter 'attributeValue', and consume
- ↪ U+007B LEFT CURLY BRACE ({)
  Switch to Attribute value expression state, enter 'attributeValueExpression', let size be 1, and consume
- ↪ Anything else
  Crash 'before attribute value'
5.24 Attribute value double quoted state
- ↪ EOF
  Crash 'in attribute value'
- ↪ U+0022 QUOTATION MARK (")
  Switch to Before attribute state, consume, and exit
- ↪ Anything else
5.25 Attribute value single quoted state
- ↪ EOF
  Crash 'in attribute value'
- ↪ U+0027 APOSTROPHE (')
  Switch to Before attribute state, consume, and exit
- ↪ Anything else
5.26 Attribute value expression state
- ↪ EOF
  Crash 'in attribute value expression'
- ↪ U+007B LEFT CURLY BRACE ({)
  Increment size by 1 and consume
- ↪ U+007D RIGHT CURLY BRACE (})
  If size is:
  - ↪ 1
    Switch to Before attribute state, unset size, consume, and exit
  - ↪ Anything else
    Decrement size by 1 and consume
- ↪ Anything else
5.27 Self-closing state
- ↪ U+003E GREATER THAN (>)
  Switch to Data state, consume, and exit
- ↪ Anything else
  Crash 'after self-closing slash'
5.28 Expression state
- ↪ EOF
  Crash 'in attribute value expression'
- ↪ U+007B LEFT CURLY BRACE ({)
  Increment size by 1 and consume
- ↪ U+007D RIGHT CURLY BRACE (})
  If size is:
  - ↪ 1
    Switch to Data state, unset size, consume, and exit
  - ↪ Anything else
    Decrement size by 1 and consume
- ↪ Anything else
5.29 Text state
- ↪ EOF
  Crash 'in element'
- ↪ U+003C LESS THAN (<)
  ↪ U+007B LEFT CURLY BRACE ({)
  Switch to Data state and exit
- ↪ U+0060 GRAVE ACCENT (`)
  Switch to Accent quoted open state, let sizeOpen be 1, and consume
- ↪ U+007E TILDE (~)
  Switch to Tilde quoted open state, let sizeOpen be 1, and consume
- ↪ Anything else
5.30 Accent quoted open state
- ↪ EOF
  Crash 'in code'
- ↪ U+0060 GRAVE ACCENT (`)
  Increment sizeOpen by 1 and consume
- ↪ Anything else
  Switch to Accent quoted state and consume
5.31 Accent quoted state
- ↪ EOF
  Crash 'in code'
- ↪ U+0060 GRAVE ACCENT (`)
  Switch to Accent quoted close state, let size be 1, and consume
- ↪ Anything else
5.32 Accent quoted close state
- ↪ U+0060 GRAVE ACCENT (`)
  Increment sizeOpen by 1 and consume
- ↪ Anything else
  If size is:
  - ↪ sizeOpen
    Switch to Text state, unset sizeOpen, and unset size
  - ↪ Anything else
    Switch to Accent quoted state and unset size
5.33 Tilde quoted open state
- ↪ EOF
  Crash 'in code'
- ↪ U+007E TILDE (~)
  Increment sizeOpen by 1 and consume
- ↪ Anything else
  Switch to Tilde quoted state and consume
5.34 Tilde quoted state
- ↪ EOF
  Crash 'in code'
- ↪ U+007E TILDE (~)
  Switch to Tilde quoted close state, let size be 1, and consume
- ↪ Anything else
5.35 Tilde quoted close state
- ↪ U+007E TILDE (~)
  Increment sizeOpen by 1 and consume
- ↪ Anything else
  If size is:
  - ↪ sizeOpen
    Switch to Text state, unset sizeOpen, and unset size
  - ↪ Anything else
    Switch to Tilde quoted state and unset size
The MDX adapter handles tokens from the MDX state machine, which has further effects, such as validating whether they are conforming and figuring out when parsing is done.
Adapters are defined to handle a token either when a token enters right before it’s pushed to the stack, or when a token exits right after it’s popped off the stack.
The adapter does not define how to construct a syntax tree, but does provide the essentials for that. Constructing syntax trees, whether abstract or concrete, is intentionally undefined.
6.1 Enter 'tag' adapter
- Let currentTag be a new object
- Let name of currentTag be null
- Let close of currentTag be false
- Let selfClosing of currentTag be false
6.2 Enter 'closingSlash' adapter
If there is no current element, crash 'before name' (note: a closing tag with no open elements)
6.3 Enter 'attributeExpression' adapter
If close of currentTag is true, crash 'on closing tag after name' (note: a closing tag with an attribute)
6.4 Enter 'attributeName' adapter
If close of currentTag is true, crash 'on closing tag after name' (note: a closing tag with an attribute)
6.5 Enter 'selfClosingSlash' adapter
If close of currentTag is true, crash 'on closing tag before tag end' (note: a self-closing closing tag)
6.6 Exit 'closingSlash' adapter
Let close of currentTag be true
6.7 Exit 'primaryName' adapter
Let name of currentTag be the value of current token
6.8 Exit 'memberName' adapter
Append U+002E DOT (.) and the value of current token to name of currentTag
6.9 Exit 'localName' adapter
Append U+003A COLON (:) and the value of current token to name of currentTag
6.10 Exit 'name' adapter
If close of currentTag is true and name of currentTag is not the same as name of current element, crash 'on closing tag after name' (note: mismatched tags)
6.11 Exit 'attributeName' adapter
- Let currentAttribute be a new object
- Let name of currentAttribute be the value of current token
- Let value of currentAttribute be null
6.12 Exit 'attributeLocalName' adapter
Append U+003A COLON (:) and the value of current token to name of currentAttribute
6.13 Exit 'attributeValue' adapter
Let value of currentAttribute be the decoded value, excluding its first and last characters, of current token
6.14 Exit 'attributeValueExpression' adapter
Let value of currentAttribute be the dedented value, excluding its first and last characters, of current token
6.15 Exit 'attributeExpression' adapter
- Let currentAttribute be a new object
- Let type of currentAttribute be 'mdxAttributeExpression'
- Let value of currentAttribute be the dedented value, excluding its first and last characters, of current token
6.16 Exit 'selfClosingSlash' adapter
Let selfClosing of currentTag be true
6.17 Exit 'tag' adapter
Note: if there is no current element, the input character is the start of the element’s content. If close of currentTag is true, and there is a single value in the element stack, the first character of the token is the end of the element’s content. The content should be parsed further by the host parser to find nested MDX constructs.
- If close of currentTag is true, pop the current element from the element stack
- Otherwise, if selfClosing of currentTag is false, push currentTag to the element stack
Finally, if there is no current element, switch to either After MDX block state or After MDX span state, based on whether flow or text is parsed.
6.18 Exit 'expression' adapter
Note: if there is no current element, the first character after the start of the token is the start of the expression’s content, and the last character before the end of the token is the end of the expression’s content. The content could be parsed by the host parser.
If there is no current element, switch to either After MDX block state or After MDX span state, based on whether flow or text is parsed.
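The element stack bookkeeping done by the adapters can be illustrated with a condensed Python sketch. It runs over a list of already tokenized tags rather than live tokens, and it folds checks that the adapters perform at different moments (entering a closing slash, exiting a name, exiting a tag) into one loop. The names `Crash`, `tag`, and `check_tags` are hypothetical; the crash labels are the ones used above.

```python
class Crash(Exception):
    """Raised where the adapter would crash with a label."""


def tag(name, close=False, self_closing=False):
    """Build a tag record shaped like currentTag (name is None for fragments)."""
    return {"name": name, "close": close, "selfClosing": self_closing}


def check_tags(tags):
    """Return True when every opened element is closed again."""
    element_stack = []
    for current in tags:
        if current["close"]:
            if not element_stack:
                raise Crash("before name")  # closing tag with no open elements
            if current["selfClosing"]:
                raise Crash("on closing tag before tag end")
            if current["name"] != element_stack[-1]["name"]:
                raise Crash("on closing tag after name")  # mismatched tags
            element_stack.pop()
        elif not current["selfClosing"]:
            element_stack.append(current)
    # An empty stack is the point where an "after MDX" state would be entered.
    return not element_stack
```

For example, `check_tags([tag("Parent"), tag("Child", self_closing=True), tag("Parent", close=True)])` returns `True`, while closing `Parent` with `Other` raises `Crash`.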
The syntax of MDX is described in W3C Backus–Naur form with the following additions:
- A - B: matches any string that matches A but does not match B.
- 'string': same as "string" but with single quotes.
- BREAK: lookahead match for a block break opportunity (either EOF, U+000A LINE FEED (LF), or U+000D CARRIAGE RETURN (CR))
The syntax of MDX is defined as follows. Note, however, that interleaving (mixing) of Markdown and MDX is defined elsewhere.
; Entries
mdxBlock ::= *spaceOrTab (element | expression) *spaceOrTab BREAK
mdxSpan ::= element | expression
element ::= selfClosing | closed
selfClosing ::=
  ; constraint: tag MUST be named, MUST NOT be closing, and MUST be self-closing
  tag
closed ::=
  ; constraint: tag MUST NOT be closing and MUST NOT be self-closing
  tag
  *data
  ; constraint: tag MUST be closing, MUST NOT be self-closing, MUST NOT have
  ; attributes, and either both tags MUST have the same name or both tags MUST
  ; be nameless
  tag
data ::= expression | element | tickQuoted | tildeQuoted | text
tag ::=
  '<' *1closing
  *1(*whitespace name *1attributesAfterIdentifier *1closing)
  *whitespace '>'
attributesAfterIdentifier ::=
  1*whitespace (attributesBoolean | attributesValue) |
  *whitespace attributesExpression |
attributesAfterValue ::=
  *whitespace (attributesBoolean | attributesExpression | attributesValue)
attributesBoolean ::= key *1attributesAfterIdentifier
attributesExpression ::= expression *1attributesAfterValue
attributesValue ::= key initializer *1attributesAfterValue
closing ::= *whitespace '/'
name ::= identifier *1(local | members)
key ::= identifier *1local
local ::= *whitespace ':' *whitespace identifier
members ::= member *member
member ::= *whitespace '.' *whitespace identifier
identifier ::= identifierStart *identifierPart
initializer ::= *whitespace '=' *whitespace value
value ::= doubleQuoted | singleQuoted | expression
expression ::= '{' *(expressionText | expression) '}'
tickQuoted ::=
  tickFence
  ; constraint: nested fence MUST NOT be the same size as the opening fence
  *(tickText | tickFence)
  ; constraint: closing fence MUST be the same size as the opening fence
  tickFence
tildeQuoted ::=
  tildeFence
  ; constraint: nested fence MUST NOT be the same size as the opening fence
  *(tildeText | tildeFence)
  ; constraint: closing fence MUST be the same size as the opening fence
  tildeFence
tickFence ::= 1*'`'
tildeFence ::= 1*'~'
doubleQuoted ::= '"' *doubleQuotedText '"'
singleQuoted ::= "'" *singleQuotedText "'"
spaceOrTab ::= " " | "\t"
text ::= character - '<' - '{' - '`' - '~'
whitespace ::= esWhitespace
doubleQuotedText ::= character - '"'
singleQuotedText ::= character - "'"
tickText ::= character - '`'
tildeText ::= character - '~'
expressionText ::= character - '{' - '}'
identifierStart ::= esIdentifierStart
identifierPart ::= esIdentifierPart | '-'
; Unicode
; Any unicode code point
character ::=
; ECMAScript
; See “IdentifierStart”: <https://tc39.es/ecma262/#prod-IdentifierStart>
esIdentifierStart ::=
; See “IdentifierPart”: <https://tc39.es/ecma262/#prod-IdentifierPart>
esIdentifierPart ::=
; See “Whitespace”: <https://tc39.es/ecma262/#prod-WhiteSpace>
esWhitespace ::=
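The fence constraints on tickQuoted and tildeQuoted in the grammar above can be sketched in Python as follows. This follows the grammar constraints (a closing fence has the same size as the opening fence, and fences of any other size are content) rather than transcribing the state machine step by step; `fenced_end` is a hypothetical helper name.

```python
def fenced_end(source, start, marker="`"):
    """Return the index just past a quoted run, or None if it never closes.

    `start` must point at the first character of the opening fence.
    """
    index = start
    while index < len(source) and source[index] == marker:
        index += 1
    size_open = index - start
    if size_open == 0:
        raise ValueError("no opening fence at start")

    while index < len(source):
        if source[index] != marker:
            index += 1
            continue
        run_start = index
        while index < len(source) and source[index] == marker:
            index += 1
        if index - run_start == size_open:
            return index  # closing fence: same size as the opening fence
        # A fence of any other size is nested and stays part of the content.
    return None  # EOF inside the run: the state machine crashes 'in code'
```

Pass `marker="~"` to handle tilde fences in the same way.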
MDX adds constructs to Markdown but also prohibits certain normal Markdown constructs.
Whether block or inline, HTML in Markdown is not supported.
Character data, processing instructions, declarations, and comments are not supported at all. Instead of HTML elements, use JSX elements.
Incorrect:
# Hello, <span style=color:red>world</span>!
<!--To do: add message-->
<img>
Correct:
# Hello, <span style='color:red'>world</span>!
<img />
Indentation to create code blocks is not supported. Instead, use fenced code blocks.
The reason for this change is so that elements can be indented.
Incorrect:
    console.log(1)
Correct:
```js
console.log(1)
```
Autolinks are not supported. Instead, use links or references.
The reason for this change is that whether something is an element (whether HTML or JSX) or an autolink is ambiguous (Markdown normally treats <svg:rect>, <xml:lang/>, or <svg:circle{...props}> as links).
Incorrect:
See <https://example.com> for more information
Correct:
See [example.com](https://example.com) for more information.
Whereas all Markdown is valid, incorrect MDX will crash.
MDX removes certain constructs from JSX, because JSX is typically mixed with JavaScript whereas MDX is usable without it.
JavaScript comments in JSX are not supported.
Incorrect:
<hi/*comment!*//>
<hello// comment!
/>
Correct:
<hi/>
<hello
/>
JSX elements or JSX fragments as attribute values are not supported.
The reason for this change is that it would be confusing whether Markdown would work.
Incorrect:
<welcome name=<>Venus</> />
<welcome name=<span>Pluto</span> />
Correct:
<welcome name='Mars' />
JSX does not allow U+003E GREATER THAN (>) or U+007D RIGHT CURLY BRACE (}) literally in text: they need to be encoded as character references.
There is no good reason for this (some JSX parsers agree with us and don’t crash either).
In Markdown, U+003E GREATER THAN (>) is used to start a block quote.
Therefore, in MDX, U+003E GREATER THAN (>) and U+007D RIGHT CURLY BRACE (}) are fine literally and don’t need to be encoded.
JSX allows valid JavaScript inside expressions.
We support anything in braces.
Because JSX parses JavaScript, it knows when it sees a U+007D RIGHT CURLY BRACE (}) whether it means the end of the expression, or if there is more JavaScript after it.
As we don’t parse JavaScript, but do want to allow further braces in expressions, we count opening braces (U+007B LEFT CURLY BRACE ({)) and expect just as many closing braces (U+007D RIGHT CURLY BRACE (})) in expressions.
Incorrect:
<punctuation
data={{
'{': false // Left curly brace
}}
/>
Correct:
<punctuation
data={{
'{': false, // Left curly brace
'}': false // Right curly brace
}}
/>
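The brace counting described above is easy to sketch in Python, and it shows why the first example fails: the quoted `{` is counted like any other opening brace. `expression_end` is a hypothetical helper that only counts braces and knows nothing about JavaScript strings or comments.

```python
def expression_end(source, start):
    """Return the index just past the expression opening at `start`, or None."""
    if source[start] != "{":
        raise ValueError("expected '{' at start")
    size = 0
    for index in range(start, len(source)):
        character = source[index]
        if character == "{":
            size += 1
        elif character == "}":
            size -= 1
            if size == 0:
                return index + 1
    return None  # ran out of input with braces still open


incorrect = "{{'{': false}}"
correct = "{{'{': false, '}': false}}"
assert expression_end(incorrect, 0) is None
assert expression_end(correct, 0) == len(correct)
```

The incorrect value has three opening braces but only two closing ones, so the expression never closes. Adding the `'}'` key restores the balance.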
Markdown first looks for blocks (such as a heading) and only later looks for spans (such as emphasis) in those blocks.
This becomes a problem typically in the two cases listed below. However, as MDX has parse errors, parsing will crash, and an error will be presented.
Incorrect:
The plot for the movie was, <span>wait for it…

…that she didn’t die!</span>
Correct:
The plot for the movie was, <span>wait for it…
…that she didn’t die!</span>
Incorrect:
Here’s a cute photo of my cat: <Image
alt='Photo of Lilo sitting in a tiny box'
src='lilo.png'
/
>
Correct:
Here’s a cute photo of my cat: <Image alt='Photo of Lilo sitting in a tiny box' src='lilo.png' />
Or as a block (U+003E GREATER THAN (>) is fine in JSX blocks):
Here’s a cute photo of my cat:
<Image
alt='Photo of Lilo sitting in a tiny box'
src='lilo.png'
/
>
- [Markdown]: CommonMark. J. MacFarlane, et al.
- [HTML]: HTML standard. A. van Kesteren, et al. WHATWG.
- [JavaScript]: ECMAScript language specification. Ecma International.
- [JSON]: The JavaScript Object Notation (JSON) Data Interchange Format. T. Bray. IETF.
- [UNICODE]: The Unicode standard. Unicode Consortium.
Thanks to Gatsby, Inc. for funding the work to define MDX further.
Copyright © 2020 Titus Wormer. This work is licensed under a Creative Commons Attribution 4.0 International License.