Preliminary Research Results

For five years (2018–2022), a team in Leiden has worked toward two major goals. Its members have included, in alphabetical order, Bai Yu (PhD student), Rafal Felbur (Post-doc), Gregory Forgues (Post-doc), Christopher Handy (programmer), Jiang Yixiu (PhD student), Antonello Palumbo (Post-doc), Jonathan Silk (PI), and Péter-Dániel Szántó (Post-doc), with further assistance from Marieke Meelen (Cambridge), Paul Vierthaler (College of William & Mary), and Sasha Goldstein-Sabbah (Groningen).

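The first has been the development of three resources: an editing environment, a procedure for the automated alignment of Chinese and Tibetan texts, and a bibliographic database.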

The second has encompassed individual projects centered on texts in the Mahāratnakūṭa (MRK), a collection of 49 sūtras preserved as such in the Chinese and Tibetan Buddhist canons.

Editing Environment, Automated Alignment, and Bibliographic Database

a. Editing environment

A critical edition rests on a complete record of all relevant evidence, presented in a meaningful manner. A primary task is the collation of witnesses and the establishment of a text, usually understood as the closest possible reconstruction of some "original." For Chinese and Tibetan translations of Buddhist texts, this means that editors seek the closest possible reconstruction of what left the translator's pen. A critical edition also records both the significant and the insignificant variants from this established text.
The editing environment built by the team (with software development by X-Five) takes as input a collation of multiple witnesses produced by CollateX. It allows editors to select a main text, establish lemmas, classify the variants attested in the witnesses as significant or insignificant, propose emendations, add comments collaboratively with other users, and export the result in a form suitable for a critical edition, online or in print.
Using the tool requires no knowledge of any markup language (TEI, for instance). Although it was developed primarily with Tibetan texts in mind, it is equally usable for Chinese texts, and potentially for texts in other languages as well. The tool is a web-based application, and its code will be published in open access. We expect to launch it by early 2023; once it is launched, we invite fellow scholars to contact us to request user accounts.
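To make the editorial workflow concrete, here is a minimal Python sketch of one step the environment supports: taking a main text as lemma and sorting the other witnesses' readings into significant and insignificant variants. It is not the team's implementation. The alignment-table structure, the witness sigla A, B, C, the Wylie readings, the normalization rule (shad and spacing treated as insignificant), and all function names are hypothetical assumptions made for illustration; in particular, the table is a simplified stand-in, not CollateX's actual output format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# ASSUMPTION: a simplified alignment table in which each column maps a witness
# siglum to its reading at that aligned position (None marks an omission).
# This is a hypothetical structure, not the actual CollateX output format.
Column = Dict[str, Optional[str]]


@dataclass
class ApparatusEntry:
    lemma: Optional[str]
    significant: Dict[str, Optional[str]] = field(default_factory=dict)
    insignificant: Dict[str, Optional[str]] = field(default_factory=dict)


def normalize(reading: Optional[str]) -> Optional[str]:
    """Hypothetical rule: shad ("/" in Wylie) and spacing are insignificant."""
    if reading is None:
        return None
    return " ".join(reading.replace("/", " ").split())


def build_apparatus(table: List[Column], base: str) -> List[ApparatusEntry]:
    """Use the base witness as lemma; classify every other differing reading."""
    entries = []
    for column in table:
        lemma = column[base]
        entry = ApparatusEntry(lemma)
        for siglum, reading in column.items():
            if siglum == base or reading == lemma:
                continue  # the main text itself, or an identical reading
            if normalize(reading) == normalize(lemma):
                entry.insignificant[siglum] = reading
            else:
                entry.significant[siglum] = reading
        entries.append(entry)
    return entries


def format_entry(entry: ApparatusEntry) -> str:
    """Render significant variants as "lemma ] siglum reading; ..."."""
    lemma = entry.lemma if entry.lemma is not None else "om."
    variants = "; ".join(
        f"{siglum} {reading if reading is not None else 'om.'}"
        for siglum, reading in sorted(entry.significant.items())
    )
    return f"{lemma} ] {variants}"


# Invented readings for three hypothetical witnesses, already aligned.
TABLE: List[Column] = [
    {"A": "'di skad bdag gis thos pa", "B": "'di skad bdag gis thos pa",
     "C": "'di skad bdag gis thos pa"},
    {"A": "dus gcig na/", "B": "dus gcig na", "C": "dus gcig na/"},
    {"A": "bcom ldan 'das", "B": "bcom ldan 'das", "C": None},
    {"A": "rgyal po'i khab na", "B": "rgyal po'i khab na",
     "C": "rgyal po khab na"},
]

if __name__ == "__main__":
    for entry in build_apparatus(TABLE, base="A"):
        if entry.significant:
            print(format_entry(entry))
    # prints:
    # bcom ldan 'das ] C om.
    # rgyal po'i khab na ] C rgyal po khab na
```

In this sketch the missing shad in witness B is recorded only as an insignificant variant, while C's omission and its different reading surface in the printed apparatus; a real editor would of course decide such classifications case by case rather than by a single rule.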

b. Automated alignment of texts in Chinese and Tibetan

The team has developed a procedure for the automatic (i.e., computer-generated) alignment of highly similar passages in Chinese and Tibetan translations of Buddhist sūtra literature. Aside from its linguistic interest, a practical way to align translations will save researchers countless hours spent searching for parallel passages. It may also serve as a cross-lingual "shared text detector," that is, it may locate similar material even when that material is in a different language.

The procedure is described in:

Felbur, R., Meelen, M., & Vierthaler, P. (2022). Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan. Journal of Open Humanities Data, 8, 23. DOI:

In our procedure, we first create a cross-lingual embedding space for Chinese and Tibetan. Then, by taking the cosine similarity of average sequence vectors, we produce unsupervised cross-lingual parallel alignments at the word, sentence, and paragraph levels. The procedure is independent of sentence punctuation and relies entirely on semantic value. Our initial results, reported in the article and illustrated with numerous concrete examples, show that the method lays a solid foundation for the future development of a fully fledged information retrieval tool for Chinese and Tibetan, and potentially for other low-resource historical languages.
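The comparison step can be illustrated with a short Python sketch. It assumes the cross-lingual embedding space already exists and shows only how averaging token vectors and taking their cosine similarity could rank candidate Tibetan segments for a Chinese one. The vectors are invented 4-dimensional toy values, not data from the study, and the names `SHARED_SPACE`, `tokenize_zh`, `tokenize_bo`, `average_vector`, and `best_match` are hypothetical; this is a sketch of the general technique, not the published implementation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# ASSUMPTION: invented 4-dimensional toy vectors standing in for word
# embeddings already mapped into one shared Chinese-Tibetan space.
# Real embeddings are learned from corpora and have far more dimensions.
SHARED_SPACE: Dict[str, List[float]] = {
    "佛": [1.0, 0.0, 0.0, 0.0],
    "說": [0.0, 1.0, 0.0, 0.0],
    "法": [0.0, 0.0, 1.0, 0.0],
    "sangs": [0.9, 0.0, 0.0, 0.1],
    "rgyas": [0.9, 0.1, 0.0, 0.0],
    "chos": [0.0, 0.0, 1.0, 0.0],
    "gsungs": [0.0, 0.9, 0.1, 0.0],
    "rgyal": [0.1, 0.0, 0.0, 1.0],
    "bzhugs": [0.2, 0.0, 0.0, 0.8],
}

ZH_PUNCTUATION = set("，。、；：！？")


def tokenize_zh(text: str) -> List[str]:
    """One token per character; punctuation is simply dropped."""
    return [ch for ch in text if not ch.isspace() and ch not in ZH_PUNCTUATION]


def tokenize_bo(text: str) -> List[str]:
    """Wylie syllables; the shad ("/") is treated as an ordinary separator."""
    return text.replace("/", " ").split()


def average_vector(tokens: Sequence[str]) -> Optional[List[float]]:
    """Mean of the vectors of all known tokens (None if no token is known)."""
    vectors = [SHARED_SPACE[t] for t in tokens if t in SHARED_SPACE]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(zh_text: str,
               bo_segments: Sequence[str]) -> Optional[Tuple[float, int]]:
    """Return (score, index) of the Tibetan segment closest to zh_text."""
    query = average_vector(tokenize_zh(zh_text))
    if query is None:
        return None
    scored = []
    for index, segment in enumerate(bo_segments):
        vector = average_vector(tokenize_bo(segment))
        if vector is not None:
            scored.append((cosine(query, vector), index))
    return max(scored) if scored else None


if __name__ == "__main__":
    segments = ["sangs rgyas kyis chos gsungs/", "rgyal po pho brang na bzhugs/"]
    print(best_match("佛說法。", segments))  # best match is segment 0
```

Because the sketch compares bags of tokens rather than punctuated sentences, it does not depend on sentence punctuation either; extending it to word-, sentence-, and paragraph-level matching might involve comparing sliding windows of different sizes, though that is an assumption rather than a description of the published method.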

The team welcomes interest from those who would like to collaborate on building a practical tool based on this design.

c. Bibliographic database

Some 30 years ago, the PI prepared a bibliography of the MRK collection as a text file. On this basis, Paul Vierthaler, in coordination with Rafal Felbur, developed an online bibliographic database that records all 49 texts of the collection and presents, for each text, all relevant information about sources, translations, studies, and so on. (The exceptions are the Larger Sukhāvatīvyūha and the Śrīmālādevīsiṁhanāda: these texts are of great importance in East Asia and consequently have massive bibliographies, which we could not encompass in full.)

The team hopes and expects that users will contribute to the expansion and correction of the database going forward.

Studies of MRK Texts and Related Topics

Each team member has engaged in one or more projects centered on the MRK collection, including producing text editions and studying the historical circumstances in which the collection was compiled in China.

All editions include English translations as well as studies.

Our team has established an Open Access book series with Brill Academic Publishers, the Buddhist Open Philology Project series. Among the volumes in preparation are two dedicated to “Ratnakūṭa Studies.” These will include an edition, in Chinese and Tibetan, of the Gaṅgottarāparipṛcchā; an edition (Chinese) of the Shi’er toutuo jing (Twelve dhutas sūtra), a Chinese composition based in part on the Ratnarāśi; a study of quotations of Mahāratnakūṭa scriptures in Indian anthologies (the Śikṣāsamuccaya, Sūtrasamuccaya, and Mahāsūtrasamuccaya); an edition of the Siṁhaparipṛcchā; and other studies. These volumes will appear in 2023.