The Open Philology project works to create a novel digital environment for the study of Buddhist sūtra literature, with the goal of helping the researcher overcome the limitations of traditional paper editions.

Preliminary Research Results

For five years (2018–2022), a team in Leiden worked toward two major goals. Its members have included, in alphabetical order, Bai Yu (PhD student), Rafal Felbur (Post-doc), Gregory Forgues (Post-doc), Christopher Handy (programmer), Jiang Yixiu (PhD student), Antonello Palumbo (Post-doc), Jonathan Silk (PI), and Péter-Dániel Szántó (Post-doc). The team was also assisted by Marieke Meelen (Cambridge), Paul Vierthaler (College of William & Mary) and Sasha Goldstein-Sabbah (Groningen).

The first has consisted in the development of:

  • An online editing environment for the production of scholarly editions
  • A method for the automated alignment of texts in Chinese and Tibetan
  • An interactive bibliographic database

The second has encompassed individual projects centered on texts found in the Mahāratnakūṭa (MRK) collection of 49 sūtras, preserved as a collection in the Chinese and Tibetan Buddhist canons.

Editing environment. Automated alignment. Bibliographic database

a. Editing environment


The Open Philology Editing Environment (OPEn) web app is live! User instructions: open-philology-editing-environment-user-instructions.pdf

The production of critical editions rests on recording all relevant evidence and presenting it in a meaningful manner. A primary task is the collation of witnesses and the establishment of a text, usually understood to represent the closest possible reconstruction of some "original." In the case of translations of Buddhist texts into Chinese and Tibetan, this means that editors seek the closest possible reconstruction of what left the pen of the translator. However, a critical edition also records significant and insignificant variants to this established text.
The editing environment built by the team (and developed by software developers X-Five) takes the output of a collation of multiple witnesses produced by CollateX and permits editors to select a main text, establish lemmas, determine if variants attested in witnesses are significant or insignificant, offer emendations, make multi-user collaborative comments, and output the result in a form suitable for the production of a critical edition, online or in print.
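
To illustrate the kind of data this workflow handles, the following is a minimal sketch in Python. It is not the OPEn code and does not reproduce CollateX's actual output format: the table layout (one list of aligned tokens per witness, with "" marking a gap), the sigla A, B and C, and the names VariantLocus and collect_variants are all assumptions made for the example. It shows how variant readings might be gathered against an editor-chosen main text and then marked as significant or insignificant.

```python
from dataclasses import dataclass, field


@dataclass
class VariantLocus:
    """One place where at least one witness departs from the main text."""
    position: int
    lemma: str
    readings: dict = field(default_factory=dict)   # reading -> list of sigla
    significant: dict = field(default_factory=dict)  # reading -> editor's decision


def collect_variants(table, base):
    """Compare every witness with the base witness, column by column.

    `table` is a hypothetical, simplified alignment table: it maps each
    siglum to a list of tokens, all lists having the same length.
    """
    loci = []
    for i, lemma in enumerate(table[base]):
        readings = {}
        for siglum, tokens in table.items():
            if siglum != base and tokens[i] != lemma:
                readings.setdefault(tokens[i], []).append(siglum)
        if readings:
            loci.append(VariantLocus(position=i, lemma=lemma, readings=readings))
    return loci


def apparatus_line(locus):
    """Format one locus in a conventional 'lemma] variant sigla' style."""
    parts = [f"{reading or 'om.'} {' '.join(sigla)}"
             for reading, sigla in locus.readings.items()]
    return f"{locus.lemma}] " + "; ".join(parts)


# Invented toy witnesses (transliterated Tibetan syllables).
table = {
    "A": ["sangs", "rgyas", "kyi"],
    "B": ["sangs", "rgyas", "kyis"],
    "C": ["sangs", "rgyas", "kyi"],
}

variants = collect_variants(table, base="A")
# The editor, not the program, decides whether a reading is significant.
variants[0].significant["kyis"] = True
print(apparatus_line(variants[0]))
```

In this sketch the judgment of significance is stored alongside the reading rather than computed, mirroring the point that such decisions rest with the editor.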

Use of this tool does not require knowledge of any mark-up code (thus, no TEI for instance). While the tool was developed primarily with Tibetan texts in mind, it is equally usable for texts in Chinese, and potentially in other languages as well. The tool is a web-based app, and the code has been published in open access on our GitHub page. The tool launched in July 2023. We invite fellow scholars to contact us to open user accounts.

b. Automated alignment of texts in Chinese and Tibetan

The team has developed a procedure for the automatic (i.e. computer-generated) alignment of highly similar sequences of text in Chinese and Tibetan translations of Buddhist sūtra literature. Aside from the linguistic interest of such a task, a practical way to align translations will save researchers the many hours otherwise spent locating parallel passages. It may also act as a cross-lingual "shared text detector," that is, allow similar materials to be located even when they are in different languages.

The procedure is described in:

Felbur, R., Meelen, M., & Vierthaler, P. (2022). Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan. Journal of Open Humanities Data, 8, 23. DOI: http://doi.org/10.5334/johd.86

In our procedure, the first step is to create a cross-lingual embedding space for Chinese and Tibetan. Taking the cosine similarity of average sequence vectors in that space then allows us to produce unsupervised cross-linguistic parallel alignments at word, sentence, and paragraph level. Our procedure is independent of sentence punctuation and based entirely on semantic value. Our initial results, as reported in the article and illustrated with numerous concrete examples, show that our method lays a solid foundation for the future development of a fully-fledged Information Retrieval tool for Chinese and Tibetan, as well as potentially for other low-resource historical languages.
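
The core comparison, cosine similarity of average sequence vectors, can be sketched as follows. This is an illustration under stated assumptions, not the implementation described in the article: the two-dimensional vectors are invented toy values, the token segmentation is arbitrary, and the function names average_vector, cosine_similarity and best_match are hypothetical. The sketch assumes that Chinese and Tibetan tokens have already been mapped into one shared embedding space.

```python
import math


def average_vector(tokens, embeddings):
    """Mean of the embedding vectors of those tokens that have one."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token in the sequence has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_tokens, candidates, embeddings):
    """Return (score, index) of the candidate sequence closest to the query."""
    query = average_vector(query_tokens, embeddings)
    scored = [
        (cosine_similarity(query, average_vector(c, embeddings)), i)
        for i, c in enumerate(candidates)
    ]
    return max(scored)


# Invented toy vectors in a shared space (assumption: real vectors would
# come from a trained cross-lingual embedding model).
embeddings = {
    "佛": [1.0, 0.1],
    "法": [0.1, 1.0],
    "sangs_rgyas": [0.9, 0.2],
    "chos": [0.2, 0.9],
}

chinese_sequence = ["佛"]
tibetan_candidates = [["chos"], ["sangs_rgyas"]]

score, index = best_match(chinese_sequence, tibetan_candidates, embeddings)
print(index, round(score, 3))
```

Because each sequence is reduced to a single averaged vector before comparison, the same function applies unchanged to a word, a sentence, or a paragraph, and nothing in it depends on punctuation.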

The team welcomes interest from those who might want to work together to realize a practical instantiation of this design.

c. Bibliographic database

Some 30 years ago the PI prepared a bibliography of the MRK collection as a text file. On this basis, Paul Vierthaler, in coordination with Rafal Felbur, developed an online bibliographic database in which all 49 texts of the collection are recorded, and all relevant information about sources, translations, studies and so on is presented for each text. (An exception applies to the Larger Sukhāvatīvyūha and the Śrīmālādevisiṁhanāda, texts which are of extreme importance in East Asia and which consequently have a massive bibliography, which we could not encompass in full.)

The team hopes and expects that users will contribute to the expansion and correction of the database going forward.

Studies of MRK Texts and Related Topics

Each team member has engaged in one or more projects centered around the MRK collection, including producing text editions and studying the historical environment of the preparation of the collection in China.

  • Bai Yu (PhD student) is writing a thesis on Imperial Prefaces to Chinese Buddhist scriptures, centering on those composed during the Tang dynasty.
  • Rafal Felbur (Post-doc) is editing, in Chinese and Tibetan, the Sūrataparipṛcchā, number 27 of the MRK, and took the lead in the development of the approach to Tibetan–Chinese alignment, the development of the editing environment, and the elaboration of the bibliographic database.
  • Gregory Forgues (Post-doc) is editing the Acintyabuddhaviṣayanirdeśa, number 38 of the MRK, and further contributed to the fundamental ideas behind our editing environment.
  • Jiang Yixiu (PhD student) is editing the Svapnanirdeśa, MRK number 4.
  • Antonello Palumbo (Post-doc) is exploring historiographical issues around the composition of Buddhist works and Buddhist–State relations in the Tang.
  • Jonathan Silk (PI) edited the Gaṅgottarāparipṛcchā, MRK 31, and is editing, in Tibetan, Chinese and Sanskrit, the Ratnarāśisūtra, MRK number 44, as well as the Kāśyapaparivarta, MRK 43. He further has prepared a number of studies (see below).
  • Péter-Dániel Szántó (Post-doc) is editing in Sanskrit and Tibetan the Tathāgatācintyaguhyanirdeśa, MRK number 3, as well as the Siṁhaparipṛcchā, MRK 37, and among other texts has prepared for publication an edition (Sanskrit and Tibetan) of Nāgārjuna’s Suhṛllekha.

All editions include English translations as well as studies.

Our team has established an Open Access book series with Brill Academic Publishers, the Buddhist Open Philology Project series. Among volumes in preparation are two dedicated to “Ratnakūṭa Studies.” These will include an edition, in Chinese and Tibetan, of the Gaṅgottarāparipṛcchā, an edition (Chinese) of the Shi’er toutuo jing (Twelve dhutas sūtra), a Chinese composition based in part on the Ratnarāśi, a study of quotations of Mahāratnakūṭa scriptures in Indian anthologies (the Śikṣāsamuccaya, Sūtrasamuccaya and Mahāsūtrasamuccaya), the edition of the Siṁhaparipṛcchā, and other studies. These volumes will appear in 2023.