Belarusian N-corpus

Name
  • Belarusian N-corpus
Creator
  • Koshchanka, Uladzimir
  • Bulojchyk, Alies
Description
  • The Belarusian N-corpus is a corpus of texts in modern Belarusian with structural and grammatical annotation and certification. The corpus consists of the following subcorpora: 1) Basic corpus (14.8 thousand texts, 43.4 million word usages); 2) Concordance of Belarusian of the 19th century (515 texts, 278 thousand word usages); 3) Belarusian Wikipedia corpus (287 thousand texts, 126 million word usages); 4) Translations corpus (1.22 thousand texts, 6.91 million word usages); 5) Unprocessed texts corpus (68.7 thousand texts, 892 million word usages); 6) Biblical corpus (16 Bible translations into Belarusian and, for comparison, into other languages: Latin, Jewish, Ukrainian, Polish). In total, the corpus comprises 372 thousand texts and 1.07 billion word usages. The basic corpus contains texts of 5 different styles: artistic, scientific, publicistic, official, and religious. Within each style, texts are classified into genres; for example, the artistic style is divided into the following genres: narrative, short novel, ballad, fable, verse, fairy tale, ode, poem, play, novel, plot. As in most Slavic corpora, the Belarusian N-corpus encodes morphological (grammatical) information: initial word forms (lemmas) and grammatical characteristics. The grammatical annotation of the corpus relies on the Lexical and Grammatical Base, a collection of words with morphological and other tags. In the base, each paradigm header provides the identification number of the paradigm, the initial form, and the grammatical features of the word. Where necessary, additional information is recorded: government (for verbs), meaning, remarks. Each inflected form has its own characteristics. The source of the word or word form, stress, spelling, and non-canonical forms are also indicated. To date, the Lexical and Grammatical Base has approximately 304 thousand paradigms and more than 4.4 million word forms.
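    The stated totals can be cross-checked against the per-subcorpus figures. The following is a minimal sketch in Python, assuming the rounded values given in the description; the names SUBCORPORA and corpus_totals are illustrative and not part of the resource. The Biblical corpus is left out because the description gives no text or word counts for it.

```python
# Subcorpus sizes as (texts, word usages), taken from the rounded figures
# in the description. The Biblical corpus is omitted: no counts are given.
SUBCORPORA = {
    "Basic corpus": (14_800, 43_400_000),
    "Concordance of Belarusian of the 19th century": (515, 278_000),
    "Belarusian Wikipedia corpus": (287_000, 126_000_000),
    "Translations corpus": (1_220, 6_910_000),
    "Unprocessed texts corpus": (68_700, 892_000_000),
}


def corpus_totals(subcorpora):
    """Sum the text counts and word-usage counts over all subcorpora."""
    total_texts = sum(texts for texts, _ in subcorpora.values())
    total_words = sum(words for _, words in subcorpora.values())
    return total_texts, total_words


total_texts, total_words = corpus_totals(SUBCORPORA)
print(f"{total_texts / 1e3:.0f} thousand texts")
print(f"{total_words / 1e9:.2f} billion word usages")
```

    With these inputs the sums round to 372 thousand texts and 1.07 billion word usages, which agrees with the totals in the description.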
Collection
  • Belarusian N-corpus
Language
Resource type
  • Lexical resource
Data provider
  • Other

More like this...

The following records may also interest you:
Belarusian grammatical dictionary of the numeral, 2013
The list was created in the Speech synthesis and recognition laboratory of the Belarusian Academy of Sciences on the basis of "The grammatical dictionary of the adjective, pronoun,…
Belarusian grammatical dictionary of the verb, 2013
The register of the dictionary is based on the verb vocabulary of modern Belarusian, taking into account the rules that came into force in 2008. The dictionary articles contain inf…
The dictionary of the Belarusian language, 1987
The dictionary is an academic publication, which most fully reflects the vocabulary of modern Belarusian of the twentieth century. The dictionary serves as a handbook on orthograph…
Grammatical Dictionary Processor
The service allows the user to receive previously loaded and converted to the required format lexicographic data of the grammar dictionary in the form of an HTML table, and to rece…
Morphological lexicon Sloleks 1.0
Sloleks is the reference morphological lexicon for Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains appro…