Tip
Before starting, make sure a Java environment is set up locally; otherwise, you will not be able to generate parser files from the grammar files. You can check it by running java --version.
-
Install dependencies
pnpm install
-
Compile g4 Files
# Compile all g4 files
pnpm antlr4
# Compile for a specific language
pnpm antlr4 --lang mysql
-
Run Unit Tests
pnpm test
-
Run Benchmark Tests
pnpm benchmark
- src/grammar: Contains g4 files (grammar files)
- src/lib: Generated files from g4 grammar (produced by running pnpm antlr4)
- src/parser: Implementations of SQL Parser classes
- src/parser/common: Base classes and utility methods for SQL Parsers
- test: Unit tests
- benchmark: Benchmark tests
-
Add New Grammar Files
Add the new g4 grammar file to src/grammar/<SQL name>. Name the file in PascalCase. The grammar rules within the file should adhere to the following:
- The root rule should be named program.
- Support parsing multiple SQL statements.
- Enable the case-insensitive option (if the SQL language is case-insensitive).
- Lexer rules for all keywords should be prefixed with KW_ (e.g., KW_SELECT: 'SELECT';). This makes keyword rules easy to distinguish for the autocomplete functionality.
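To illustrate why the KW_ prefix helps, the sketch below shows one way autocomplete code could pick keyword suggestions out of a lexer's symbolic token names. The function name keywordSuggestions and the sample token names are hypothetical and are not taken from dt-sql-parser; the only thing carried over from the rules above is the assumption that every keyword rule starts with KW_.

```typescript
// Hypothetical helper: derive keyword suggestions from symbolic token names.
// Assumes only the naming rule above: keyword lexer rules start with "KW_".
const KEYWORD_PREFIX = "KW_";

function keywordSuggestions(symbolicNames: Array<string | null>): string[] {
    const keywords: string[] = [];
    for (const name of symbolicNames) {
        // Skip unnamed tokens and tokens that are not keywords.
        if (name === null || !name.startsWith(KEYWORD_PREFIX)) continue;
        // Strip the prefix so the suggestion is the bare keyword text.
        keywords.push(name.slice(KEYWORD_PREFIX.length));
    }
    return keywords;
}

// Sample names as a lexer vocabulary might expose them (illustrative only).
const sampleNames = [null, "KW_SELECT", "KW_FROM", "IDENTIFIER", "SEMICOLON", "KW_GROUP"];
const suggestions = keywordSuggestions(sampleNames);
console.log(suggestions);
```

Without a shared prefix, the same filtering would need a hand-maintained list of keyword tokens for each SQL language.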
-
Generate Files from Grammar
Run the following command to generate files from the new grammar:
pnpm antlr4 --lang <SQL name>
Check that the corresponding Lexer, Parser, Listener, and Visitor files are generated in the src/lib/<SQL name>/ directory.
-
Implement SQL Parser Class
Create a file src/parser/<SQL name>/index.ts and implement the corresponding SQL Parser class. This class should extend the BasicSQL base class. Initially, implement the createLexerFromCharStream and createParserFromTokenStream methods; other methods can be left empty for now.
-
Add Basic Unit Tests
Add basic unit tests in test/parser/<SQL name> for:
- Lexer
- Visitor
- Listener
- The parser.validate method
You can reference tests from other SQL parsers.
-
SQL Syntax Unit Tests
Add unit tests for SQL syntax in the test/parser/<SQL name>/syntax directory. Ensure coverage of all SQL syntax rules. It is recommended to add tests based on the official grammar documentation to ensure accuracy.
-
Implement SQLSplitListener
Implement the SQLSplitListener and add the splitListener getter in the SQL Parser class. Also, add unit tests for the parser.splitSQLByStatement method, which splits SQL into individual statements.
-
Autocomplete Features
Implement the processCandidates and preferredRules methods for autocomplete functionality. Familiarize yourself with antlr4-c3 first. Then, add autocomplete-related unit tests in test/parser/<SQL name>/suggestion.
-
Context Information Collection
Implement the SQLEntityCollector class and the createEntityCollector method in the SQL Parser class for SQL context information collection. This enhances the autocomplete functionality. For more details, refer to here. Then, add tests for entity collection methods in test/parser/<SQL name>/contextCollect.
SQL grammar files can be quite complex. If you want to add a new SQL language to dt-sql-parser, it is not recommended to start from scratch. Consider the following sources, listed in order of preference:
-
Official SQL Repositories:
Some official SQL repositories use Antlr4 for SQL parsing. You can find the corresponding grammar files in their source code. For example:
Grammar files from official repositories are generally the most reliable, stable, and performant.
-
Grammar-v4 Repository:
This is the official grammar file repository maintained by Antlr. It includes a variety of SQL grammar files. The files here are typically reliable.
-
Community/Other Open Source Repositories:
Grammar files obtained from the community or other open source repositories may be less reliable and often require significant time to fix grammar issues.