Tokenizing CSS is non-trivial, and some packages may deliberately deviate from the specification. This library is not intended to rank tokenizers.
I cannot stress enough that this would be comparing apples to oranges. Different tokenizers are built for different purposes: some do not track source offsets, others do not expose parsed/unescaped values.
It is intended to make it easier to find and resolve issues when that is desirable.
It is also a corpus of CSS that can be used to build a comprehensive test suite for your tokenizer.
```js
import { testCorpus } from '@rmenke/css-tokenizer-tests';

// A specific test case.
const testCase = testCorpus['tests/at-keyword/0001'];

// The CSS source for a test case.
const cssSource = testCase.css;

// The reference tokens for the test case.
const tokens = testCase.tokens;

// Iterate all test cases.
for (const aTestCaseName in testCorpus) {
	const aTestCase = testCorpus[aTestCaseName];
	// aTestCase.css and aTestCase.tokens are available here.
}
```
This test corpus strictly follows the CSS specification. The token type names are taken directly from the specification.
- `type` is the token type name.
- `raw` is the literal representation of the token in the CSS source.
- `startIndex` and `endIndex` are the indices of the first and last character in the CSS source.
- `structured` contains extracted data (numeric values for `number-token`, unescaped `ident` names, ...).
```json
{
	"type": "at-keyword-token",
	"raw": "@foo",
	"startIndex": 0,
	"endIndex": 4,
	"structured": {
		"value": "foo"
	}
}
```
The CSS specification does not require tokenizers to expose this exact interface or the values therein. The corpus is intended as data to verify that a tokenizer works as expected, nothing more.
You choose which bits you want to compare and how. This is also why this package is not a test framework.
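As a minimal sketch of that kind of selective comparison: the snippet below checks only the token type and the unescaped value against a reference token, ignoring offsets. `myTokenize` and its output shape (`kind`, `value`, `start`, `end`) are hypothetical stand-ins for a tokenizer under test, stubbed here for illustration.

```js
// Reference data in the shape this corpus uses (taken from the
// at-keyword example above).
const referenceTokens = [
	{
		type: 'at-keyword-token',
		raw: '@foo',
		startIndex: 0,
		endIndex: 4,
		structured: { value: 'foo' },
	},
];

// Hypothetical tokenizer under test (stubbed for illustration).
// A real tokenizer may use different field names entirely.
function myTokenize(css) {
	return [
		{ kind: 'at-keyword-token', value: 'foo', start: 0, end: 4 },
	];
}

// Compare only the bits this tokenizer exposes:
// the token type and the unescaped value.
function matchesReference(actual, reference) {
	return (
		actual.length === reference.length &&
		actual.every((token, i) =>
			token.kind === reference[i].type &&
			token.value === reference[i].structured?.value
		)
	);
}

console.log(matchesReference(myTokenize('@foo'), referenceTokens)); // true
```

Swapping in a deep-equality check over the full token objects is equally valid when your tokenizer exposes all of the fields; the point is that the choice stays with you.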