Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
See all records Learn more Take a quick tourUse the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
A morphological layer for the German part of the SMULTRON corpus. Layer was annotated according to the STTS tagset and t…
A morphological layer for the German part of the SMULTRON corpus. Layer was annotated according to the STTS tagset and the annotation guidelines of the Tiger corpus. Coordinator: Thomas Müller Annotators: Francesca Caratti, Arne Recknagel This distribution contains a morphological layer for the SMULTRON corpus […
This bilingual thesaurus (French-English), developed at Inist-CNRS, covers the concepts from the emerging COVID-19 outbr…
This bilingual thesaurus (French-English), developed at Inist-CNRS, covers the concepts from the emerging COVID-19 outbreak which reminds the past SARS coronavirus outbreak and Middle East coronavirus outbreak. This thesaurus is based on the vocabulary used in scientific publications for SARS-CoV-2 and other coronaviru…
The item contains models to tune for the WMT16 Tuning shared task for Czech-to-English. CzEng 1.6pre (http://ufal.mff…
The item contains models to tune for the WMT16 Tuning shared task for Czech-to-English. CzEng 1.6pre (http://ufal.mff.cuni.cz/czeng/czeng16pre) corpus is used for the training of the translation models. The data is tokenized (using Moses tokenizer), lowercased and sentences longer than 60 words and shorter than 4 wo…
The file contains the charts, tables and figures serving to delineate the metaphor-metonymy cognitive mechanism behind E…
The file contains the charts, tables and figures serving to delineate the metaphor-metonymy cognitive mechanism behind English denominal verbs. The data was obtained by questionnaires and interviews, which was then documented into charts and tables. Figures submitted mainly provide clear outline and concise outline of …
This resource is the third version of the Italian morphological dictionary for content words (http://hdl.handle.net/1137…
This resource is the third version of the Italian morphological dictionary for content words (http://hdl.handle.net/11372/LRT-2630), encoded in a JSON Lines format. Compared to the previous version, it contains some minor improvements.
In the recent years, Transformer-based models have lead to significant advances in language modelling for natural langua…
In the recent years, Transformer-based models have lead to significant advances in language modelling for natural language processing. However, they require a vast amount of data to be (pre-)trained and there is a lack of corpora in languages other than English. Recently, several initiatives have presented multilingual…
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been…
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Embeddings from word2vec model described in "From Diachronic to Contextual Lexical Semantic Change: Introducing Semantic…
Embeddings from word2vec model described in "From Diachronic to Contextual Lexical Semantic Change: Introducing Semantic Difference Keywords (SDKs) for Discourse Studies". Full reference TBC.
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been…
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.