We aim to create a digital toolkit for multilingual alignment across arbitrary texts, using the Chinese, Tibetan, and Sanskrit witnesses of the Mahāratnakūṭa Collection as a proof of concept.
An alignment in our system pairs a string and its location in one text with a related string and its location in another text. For the initial phase of the Open Philology project, this means in practice a set of relationships between Chinese and Tibetan texts, between Chinese texts, and between Tibetan texts. Many of the supposed Sanskrit originals of these texts survive only in fragmentary form and will be added later.
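The definition above can be pictured as a small data structure. The following is only a minimal sketch of one possible representation, not the project's actual schema: the class names `Span` and `Alignment`, every field name, the witness identifiers, and the example score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A string and its location in one text (all names here are hypothetical)."""
    text_id: str   # identifier of the witness the string comes from
    start: int     # character offset at which the string begins
    string: str    # the string itself

    @property
    def end(self) -> int:
        # Exclusive end offset, derived from the start and the string length.
        return self.start + len(self.string)


@dataclass(frozen=True)
class Alignment:
    """A relationship between a span in one text and a span in another."""
    source: Span
    target: Span
    relation: str = "unspecified"  # placeholder label for the kind of relationship
    score: float = 0.0             # placeholder for an automatically assigned score


# Illustrative values only: the identifiers and the score are invented.
chinese = Span(text_id="zh-witness", start=0, string="如是我聞")
tibetan = Span(text_id="bo-witness", start=0, string="འདི་སྐད་བདག་གིས་ཐོས་པ")
alignment = Alignment(source=chinese, target=tibetan, relation="translation", score=0.9)
print(alignment.source.text_id, alignment.source.start, alignment.source.end)
```

Storing an offset together with the string keeps each alignment tied to a specific place in its witness, so a phrase that recurs in the same text can be aligned separately at each occurrence.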
We consider two phrases to be aligned to some degree when they meet any of the following criteria:
We automatically produce and score alignments using a custom genetic algorithm that combines statistical analysis methods with traditional dictionaries and other philological resources.
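As a rough illustration of how dictionary evidence and statistical evidence might be combined in an evolutionary search, the toy sketch below evolves a one-to-one pairing of source and target tokens. It is not the project's algorithm: it is a deliberately simplified, mutation-only variant (no crossover), and the token pairs, the co-occurrence weights, and all function names are assumptions invented for this example.

```python
import random

# Toy inputs, invented for illustration only.
PAIRS = [("如是", "འདི་སྐད"), ("我", "བདག་གིས"), ("聞", "ཐོས་པ")]
SOURCE = [source for source, _ in PAIRS]
TARGET = [PAIRS[2][1], PAIRS[0][1], PAIRS[1][1]]   # deliberately out of order
DICTIONARY = set(PAIRS)                            # stands in for a bilingual dictionary
COOCCURRENCE = {PAIRS[0]: 0.8, PAIRS[2]: 0.6}      # stands in for corpus statistics


def fitness(mapping):
    """Score a candidate: mapping[i] is the TARGET index paired with SOURCE[i]."""
    total = 0.0
    for i, j in enumerate(mapping):
        pair = (SOURCE[i], TARGET[j])
        total += 1.0 if pair in DICTIONARY else 0.0   # dictionary evidence
        total += COOCCURRENCE.get(pair, 0.0)          # statistical evidence
    return total


def mutate(mapping, rng):
    """Return a copy with two positions swapped, so it remains a permutation."""
    child = list(mapping)
    a, b = rng.sample(range(len(child)), 2)
    child[a], child[b] = child[b], child[a]
    return child


def evolve(generations=30, population_size=12, seed=0):
    """Evolve pairings by selection and mutation; return the fittest one found."""
    rng = random.Random(seed)
    population = [rng.sample(range(len(TARGET)), len(TARGET))
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Variation: refill the population with mutated copies of survivors.
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
for i, j in enumerate(best):
    print(SOURCE[i], "->", TARGET[j])
print("score:", fitness(best))
```

Because the fitter half is carried over unchanged in every generation, the best score in the population never decreases. A realistic system would also need to handle unaligned tokens and many-to-many correspondences, which this sketch ignores.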
Our complete alignment system comprises three layers:
Technical stack: an open-source stack composed of Ubuntu Linux, Python, and Django.
Details of the technical implementation of our software designs may be found on our software developer's GitHub page: github.com/handyc