Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi), http://urn.fi/urn:nbn:fi:lb-2014052711. A 34-volume collection of Finnic oral poetry, lyric, short rhymes, incantations etc., collected and recorded from the 16th century to the 1930s and published mostly between 1908 and 1948, with a…
This resource is available for download in Kielipankki – the Language Bank of Finland. This resource consists of .txt and .wav files in four languages pertaining to the Finnish Christmas Gospel verses Luke 2:1–20. The four languages include Komi-Zyrian (kpv), Erzya (myv), Karelian (krl) and Olonets-Karelian (olo, aka …
This resource is available via Korp in Kielipankki – the Language Bank of Finland. This resource consists of .txt and .wav files in four languages pertaining to the Finnish Christmas Gospel verses Luke 2:1–20. The four languages include Komi-Zyrian (kpv), Erzya (myv), Karelian (krl) and Olonets-Karelian (olo, aka Livv…
The resource is available via Korp in Kielipankki – the Language Bank of Finland. The corpus content has been annotated according to the Universal Dependencies version 2.10 (http://hdl.handle.net/11234/1-4758) for the following Uralic languages: Erzya, Estonian, Finnish, Hungarian, Karelian, Komi-Permyak, Komi-Zyrian,…
The Korp version of Wanca 2016 is a collection of web corpora in small Uralic languages. The collection is composed of 29 sentence corpora in different languages. The corpora have been collected from the Internet using the automated system developed in the Finno-Ugric Languages and the Internet project (SUKI) supported…
HeLI off-the-shelf language identifier with language models for 200 languages. The program reads the <infile>, classifies the language of each line as one of the 200 languages it knows, and writes the results, one ISO 639-3 code per line, into the file <outfile>. It can identify c. 3000 sentences per second using one c…
HeLI off-the-shelf language identifier with language models for 200 languages. The program reads the <infile>, classifies the language of each line as one of the 200 languages it knows, and writes the results, one ISO 639-3 code per line, into the file <outfile>. It can identify c. 3000 sentences per second using one c…
Wanca 2016 is a collection of web corpora in small Uralic languages. The collection is composed of 29 sentence corpora in different languages. The corpora have been collected from the Internet using the automated system developed in the Finno-Ugric Languages and the Internet project (SUKI) supported by the Kone foundat…
HeLI off-the-shelf language identifier with language models for 220 languages. Performance: It can identify c. 600-1700 sentences (averaging c. 150 characters) per second from a file using one core and around 4.3 gigabytes of memory on a modern laptop. Requirements: Java. The software has been created and tested o…
HeLI off-the-shelf language identifier with language models for 200 languages. The program reads the <infile>, classifies the language of each line as one of the 200 languages it knows, and writes the results, one ISO 639-3 code per line, into the file <outfile>. It can identify c. 3000 sentences per second using one c…
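The HeLI records above describe a line-oriented format: each line of <infile> gets exactly one ISO 639-3 code on the matching line of <outfile>. As a rough illustration of how such a pair of files might be consumed, the Python sketch below matches lines to codes and tallies the languages. It does not call HeLI itself; the function names and the sample sentences and codes are invented for this example and are not taken from the tool or its documentation.

```python
from collections import Counter
from itertools import zip_longest


def pair_lines_with_codes(sentences, codes):
    """Pair each input line with the code found on the same line number.

    Assumes the one-code-per-input-line layout described in the record.
    """
    pairs = []
    for sentence, code in zip_longest(sentences, codes):
        if sentence is None or code is None:
            raise ValueError("input and output have different numbers of lines")
        pairs.append((sentence.rstrip("\n"), code.strip()))
    return pairs


def language_counts(pairs):
    """Count how many lines were assigned to each ISO 639-3 code."""
    return Counter(code for _, code in pairs)


# Hypothetical stand-ins for the contents of <infile> and <outfile>.
sample_sentences = ["Hyvää huomenta.\n", "Tere hommikust.\n", "Kiitos paljon.\n"]
sample_codes = ["fin\n", "est\n", "fin\n"]

pairs = pair_lines_with_codes(sample_sentences, sample_codes)
counts = language_counts(pairs)
print(counts.most_common())  # [('fin', 2), ('est', 1)]
```

With real files, the two lists could be read with `open(path, encoding="utf-8").readlines()`; the length check guards against a truncated or misaligned output file.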