Search results

JRC-Acquis Multilingual Parallel Corpus

The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. As of the beginning of the year 2007, the EU had 27 Member States and 23 o…

Bulgarian Czech Danish German Modern Greek … (+17)

Parallel corpora finely aligned (subsentencial granularity)

(Part of PORTULAN CLARIN)

Text corpus for bilingual concordancing, single- and multi-word translation extraction, machine translation. Languages: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk. Size: 1 G per language (phrases aligned). Domain: Law and Health.

Portuguese Spanish; Cas.. German English French … (+3)

HeLI-OTS 1.0

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and classify the language of each line as one of the 200 languages it knows and writes the results, one ISO 639-3 code per line, into file <outfile>. It can identify c. 3000 sentences per second using one c…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 2.0

HeLI off-the-shelf language identifier with language models for 220 languages. Performance: it can identify c. 600-1700 sentences (averaging c. 150 characters) per second from a file using one core and around 4.3 gigabytes of memory on a modern laptop. Requirements: Java. The software has been created and tested o…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

Europarl Parallel Corpus

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 21 European languages: Romanic (French, Italian, Spanish, Portuguese, Romanian), Germanic (English, Dutch, German, Danish, Swedish), Slavic (Bulgarian, Czech, Polish, Slovak, Slovene), Finno-Ugric (Finnish…

Finnish Danish German Modern Greek English … (+16)

HeLI-OTS 1.4

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 1.1

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

eSpeak

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synt…

Afrikaans Albanian Armenian Catalan; Val.. Croatian … (+37)

HeLI-OTS 1.5

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 1.3

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

CLARIN Virtual Language Observatory
