Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: identifies and categorises syntactic chunks in plain text. Tools in workflow: Freeling shallow parser web service (service provided by the PANACEA project). NOTE: The licence provided covers the U-Compare web servi…
HeLI off-the-shelf language identifier with language models for 200 languages. The program reads the <infile>, classifies the language of each line as one of the 200 languages it knows, and writes the results, one ISO 639-3 code per line, into the file <outfile>. It can identify c. 3000 sentences per second using one c…
ParlaMint 4.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora comprise between 9 and 126 million words and the complete set contains over 1.1 billion words. The…
ParlaMint 3.0 is a multilingual set of 26 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2022, with the individual corpora being between 9 and 125 million words in size. The corpora have extensive metadata, including aspects of the parliament; the speakers (name, ge…
The lexicon contains concreteness and imageability predictions of words in 77 languages. The resource is built via supervised machine learning, using average human responses obtained for Croatian lexemes inside the MEGAHR project (http://megahr.ffzg.unizg.hr) as the response variable, and the Facebook cross-lingual wor…