Search results

Uralic UD v2.10, Kielipankki Korp version

The resource is available via Korp in Kielipankki – the Language Bank of Finland. The corpus content has been annotated according to the Universal Dependencies version 2.10 (http://hdl.handle.net/11234/1-4758) for the following Uralic languages: Erzya, Estonian, Finnish, Hungarian, Karelian, Komi-Permyak, Komi-Zyrian,…

Erzya Estonian Finnish Hungarian Karelian … (+6)

JRC-Acquis Multilingual Parallel Corpus

The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. As of the beginning of the year 2007, the EU had 27 Member States and 23 o…

Bulgarian Czech Danish German Modern Greek … (+17)

CSLU: 22 Languages Corpus

(Part of The LDC Corpus Catalog)

*Introduction* CSLU: 22 Languages v 1.2 was developed by the Center for Spoken Language Understanding (CSLU) and contains approximately 84 hours of fixed vocabulary and fluent continuous telephone speech in 21 languages and orthographic transcriptions for a subset of the utterances. The corpus is distributed by the L…

Yue Chinese Vietnamese Tamil Swedish Russian … (+16)

HeLI-OTS 1.1

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and classify the language of each line as one of the 200 languages it knows and writes the results, one ISO 639-3 code per line, into file <outfile>. It can identify c. 3000 sentences per second using one c…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)
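
The HeLI-OTS 1.1 description above says the tool writes one ISO 639-3 code per input line. As a rough illustration of working with that output format, the following Python sketch pairs each input line with the code on the same line of the output and tallies the codes. It does not run HeLI-OTS itself. The function names and the sample data are hypothetical, and it assumes only that the output has exactly one code per input line.

```python
from collections import Counter


def pair_lines_with_codes(lines, codes):
    """Pair each input line with the code found on the same line of the output."""
    lines = [line.rstrip("\n") for line in lines]
    codes = [code.strip() for code in codes]
    if len(lines) != len(codes):
        # The description implies a one-to-one line correspondence.
        raise ValueError(f"{len(lines)} input lines but {len(codes)} codes")
    return list(zip(lines, codes))


def count_languages(pairs):
    """Count how many lines were assigned to each language code."""
    return Counter(code for _, code in pairs)


if __name__ == "__main__":
    # Hypothetical sample data standing in for <infile> and <outfile>.
    sample_lines = ["Hyvää huomenta", "Good morning", "Hyvää iltaa"]
    sample_codes = ["fin", "eng", "fin"]
    pairs = pair_lines_with_codes(sample_lines, sample_codes)
    print(count_languages(pairs).most_common())
```

In practice the two lists would come from reading the input and output files line by line. The length check guards against a mismatch, for example if blank lines were handled differently than assumed here.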

HeLI-OTS 1.0

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 2.0

HeLI off-the-shelf language identifier with language models for 220 languages. Performance: It can identify c. 600-1700 sentences (averaging c. 150 characters) per second from a file using one core and around 4.3 gigabytes of memory on a modern laptop. Requirements: Java. The software has been created and tested o…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 1.5

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

Europarl Parallel Corpus

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 21 European languages: Romanic (French, Italian, Spanish, Portuguese, Romanian), Germanic (English, Dutch, German, Danish, Swedish), Slavic (Bulgarian, Czech, Polish, Slovak, Slovene), Finno-Ugric (Finnish…

Finnish Danish German Modern Greek English … (+16)

Opus, Helsinki Korp Version

The Helsinki Korp version of the Opus open parallel corpus (http://opus.lingfil.uu.se/), containing scrambled sentences, has been published in Korp (http://urn.fi/urn:nbn:fi:lb-2016012101). The subcorpora of Opus, Helsinki Korp Version are: OPUS Finnish–Czech, OPUS Finnish–Danish, OPUS Finnish–Dutch, OPUS Finnish–Engli…

Finnish Modern Greek English Swedish German … (+11)

Hunspell

Hunspell is the spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox 3 & Thunderbird, Google Chrome, and it is also used by proprietary software packages, like Mac OS X, InDesign, memoQ, Opera and SDL Trados. Main features: Extended support for language peculiarities; Unicode character encoding, compound…

Hungarian

CLARIN Virtual Language Observatory
