Search results

HeLI-OTS 1.1

HeLI off-the-shelf language identifier with language models for 200 languages. The program reads the <infile>, classifies the language of each line as one of the 200 languages it knows, and writes the results, one ISO 639-3 code per line, into the file <outfile>. It can identify c. 3000 sentences per second using one c…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 1.0

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 2.0

HeLI off-the-shelf language identifier with language models for 220 languages. Performance: It can identify c. 600-1700 sentences (averaging c. 150 characters) per second from a file using one core and around 4.3 gigabytes of memory on a modern laptop. Requirements: Java. The software has been created and tested o…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 1.5

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 1.4

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

HeLI-OTS 1.3

HeLI off-the-shelf language identifier with language models for 200 languages. The program will read the <infile> and cl…

Zulu Nenets Yiddish Mingrelian Walloon … (+195)

Concreteness and imageability lexicon MEGA.HR-Crossling

(Part of CLARIN.SI data & tools)

The lexicon contains concreteness and imageability predictions of words in 77 languages. The resource is built via supervised machine learning, using average human responses obtained for Croatian lexemes inside the MEGAHR project (http://megahr.ffzg.unizg.hr) as the response variable, and the Facebook cross-lingual wor…

Afrikaans Arabic Azerbaijani Belarusian Bulgarian … (+76)

The Emille Corpus (Beta Release Version)

(Part of OTA Core Collection)

Engineering and Physical Sciences Research Council (EPSRC)

English Gujarati Tamil Hindi Punjabi; Pan.. … (+2)

HinDialect: 26 Hindi-related languages and dialects of the Indic Continuum in North India

(Part of LINDAT / CLARIAH-CZ Data & Tools)

Languages: This is a collection of folksongs for 26 languages that form a dialect continuum in North India and nearby regions, namely Angika, Awadhi, Baiga, Bengali, Bhadrawahi, Bhili, Bhojpuri, Braj, Bundeli, Chhattisgarhi,…

Hindi Marathi Magahi Awadhi Bhojpuri … (+20)

C4Corpus (CC-BY part)

(Part of LRT + Open Submissions Data & Tools)

A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It has been extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.

Afrikaans Arabic Bengali; Ban.. Bulgarian Czech … (+51)

CLARIN Virtual Language Observatory
