Welcome to the VLO!
Use the search bar below to search through hundreds of thousands of language resources, or browse all records and use the facets to narrow the results to your area of interest or to discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at [https://dumps.wikimedia.org/]. This amounts to 297 Wikipedias, usually corresponding to individual languages and identified by their ISO codes. Severa…
A page listing all resources in WALS Online which are relevant to the language Wu.
*Introduction* 2008 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains 942 hours of multilingual telephone speech and English interview speech along with transcripts and other materials used as test d…
*Introduction* 2007 NIST Language Recognition Evaluation Supplemental Training Set was developed by the Linguistic Data Consortium (LDC) and the National Institute of Standards and Technology (NIST). It consists of 118 hours of conversational telephone speech segments in the following languages and dialects: Arabic (E…
*Introduction* Mixer 3 Speech was developed by the Linguistic Data Consortium (LDC) and comprises 3,200 hours of audio recordings of conversational telephone speech involving 3,875 speakers and 26 distinct languages. This material was collected by LDC from 2005 to 2007 as part of the Mixer project, and recordings in this…
Dataset on LiLi Wu consisting of 300 hours of audio material collected in LiLi and Suzhou, China. Tone is taken into account. Transcription included (type unknown).