The Corpus

The corpus is part of the project “Lithuanian particles and discourse structure from a synchronic and diachronic perspective”, funded by the Research Council of Lithuania (No. S-MIP-21-18). It comprises works of fiction by the most outstanding Lithuanian writers who lived and wrote in the 19th century: V. Kudirka (1858–1899), V. Pietaris (1850–1902), M. Valančius (1801–1875), and Žemaitė (1845–1921). The corpus contains 342,039 words and is divided into four sub-corpora, one for each author. It is not annotated. The corpus opens a wide range of research avenues for linguists, literary scholars, and researchers in related fields. It is also important for the wider cultural community, as it helps to preserve the heritage of the Lithuanian language and literature.

The corpus contains reprints of works by the authors mentioned above, chosen because they closely resemble the original texts. To select them, a textual analysis of various reprints was conducted, and editions of the works were compared on the basis of manuscript material available in the Reading Rooms of Manuscripts and Rare Books of Vilnius University Library, the digitized catalogs of Vilnius University Library, and the ePaveldas platform at https://www.epaveldas.lt/main. As the corpus was compiled for the study of particles in the Lithuanian language, the first step was to analyse the forms of particles in the different editions.

The corpus can be accessed at https://midas.lt/public-app.html#/researches/private?name=10.18279%2FMIDAS.LT-LIT-19.243978&lang=lt

The inventory of particles examined in the corpus of 19th-century Lithuanian fiction:

Interrogative particles: ar, argi, bau, be, begu, bene, gal, galgi, jaugi, kasžin, kažin, kaži, kiba, rasi

Response particles: aha, je, kad, na, ne, no, nu, taip, teip

Demonstrative particles: tai, taigi, tasis, va, ot, vot, štai, šitai, šit, antai, aure

Emotive particles: o, nagi, ogi, ugi, vai, ei
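
Because the corpus is not annotated, a first pass over the texts can only locate particle candidates by their surface form. The following Python code is a minimal sketch of such a pass, not the project's own tooling: it stores the inventory above as a dictionary and counts matching word forms in a plain-text string. The names PARTICLES, count_particle_candidates, and counts_by_category are hypothetical, the tokenisation (lowercasing and splitting on non-word characters) is an assumption, and "jaugi" and "kasžin" are assumed to be two separate entries. A surface match does not establish that a form is used as a particle in a given sentence, so the counts are candidates for manual review only.

```python
import re
from collections import Counter

# Particle inventory as listed above, keyed by category.
# Assumption: "jaugi" and "kasžin" are treated as two separate entries.
PARTICLES = {
    "interrogative": ["ar", "argi", "bau", "be", "begu", "bene", "gal", "galgi",
                      "jaugi", "kasžin", "kažin", "kaži", "kiba", "rasi"],
    "response": ["aha", "je", "kad", "na", "ne", "no", "nu", "taip", "teip"],
    "demonstrative": ["tai", "taigi", "tasis", "va", "ot", "vot", "štai",
                      "šitai", "šit", "antai", "aure"],
    "emotive": ["o", "nagi", "ogi", "ugi", "vai", "ei"],
}

# Reverse lookup: surface form -> category.
FORM_TO_CATEGORY = {form: category
                    for category, forms in PARTICLES.items()
                    for form in forms}

# In Python 3, \w matches Unicode letters, so forms such as "štai" stay whole.
TOKEN_RE = re.compile(r"\w+")


def count_particle_candidates(text):
    """Count tokens whose lowercased form appears in the inventory."""
    counts = Counter()
    for token in TOKEN_RE.findall(text.lower()):
        if token in FORM_TO_CATEGORY:
            counts[token] += 1
    return counts


def counts_by_category(counts):
    """Aggregate per-form counts into per-category totals."""
    totals = Counter()
    for form, n in counts.items():
        totals[FORM_TO_CATEGORY[form]] += n
    return totals
```

For example, count_particle_candidates("O gal ne?") yields one candidate each for "o", "gal", and "ne". Whether each hit is really a particle in its context still has to be decided by reading the passage.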