NTI Buddhist Text Reader
A text reader, including a dictionary and tools, for analyzing and managing Buddhist texts in Chinese and Sanskrit. This is a non-profit, open source project.
Goals:
- Create a dictionary that is easy to use for everybody interested in Buddhism, including lay people reading Buddhist texts, students, translators, and academics. Importantly, the goal is to create useful tools rather than authoritative definitions of terms.
- Create tools that are useful for linguistic analysis of Buddhist texts, including identification of specialist Buddhist terms and comparison of Chinese and Sanskrit texts.
- Use the tools to analyze and annotate a number of texts and share the content with the general public.
There are three parts to the project:
- The web user interface. This includes HTML, PHP, and JavaScript files.
- The data. This consists of the dictionary and text files, which are stored as UTF-8, tab-delimited text. There is also a corpus directory containing the literature from which the vocabulary and word sense frequencies are built. These are Chinese and Sanskrit texts from the Buddhist canon and related collections. The corpus includes both part-of-speech (POS) tagged and untagged documents.
- Command line tools for building the vocabulary. These are written in Python and include a POS tagger and an HTML annotation tool.
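The data files are described only as UTF-8, tab-delimited text. As a rough sketch of how a script might read one, the Python function below uses the standard csv module. The column layout (FIELDS), the convention that lines starting with "#" are comments, and the function name load_dictionary are hypothetical assumptions for illustration, not the project's actual file schema or API.

```python
import csv
from typing import Dict, List

# Hypothetical column layout for illustration only; the project's real
# dictionary files may use different or additional fields.
FIELDS = ["id", "simplified", "traditional", "pinyin", "english"]


def load_dictionary(path: str) -> List[Dict[str, str]]:
    """Read a UTF-8, tab-delimited dictionary file into a list of dicts.

    Assumption: blank lines are skipped and lines whose first field starts
    with '#' are treated as comments.
    """
    entries: List[Dict[str, str]] = []
    # The csv module documentation recommends newline="" when opening files.
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quotation marks in glosses as ordinary text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row or row[0].startswith("#"):
                continue
            # Pad short rows so every entry has all the expected keys;
            # extra trailing columns beyond FIELDS are dropped by zip().
            row = row + [""] * (len(FIELDS) - len(row))
            entries.append(dict(zip(FIELDS, row)))
    return entries
```

For example, load_dictionary("words.txt") would return one dict per entry, where "words.txt" is a placeholder file name. Using the csv reader with an explicit tab delimiter, rather than a plain str.split, keeps the parsing rules in one place if the real files need different quoting or escaping.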
The license for the web site and dictionary content is Creative Commons Attribution-Share Alike 3.0. The license for source code and markup templates is Apache 2.0.
Copyright Nan Tien Institute 2013, http://www.nantien.edu.au.