Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
The En_VA_Analyzer is a natural language processing web service that performs automatic detection and classification of specific types of verbal attacks expressed in Tweets written in English against specific targets. The current version of the tool is designed to address verbal attacks against specific groups of ta…
The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate Twitter sentiment classifiers, to compute annotator agreement, or to study the differences between l…
This corpus consists of two files, in Greek and English, of Yanis Varoufakis' tweets, when appointed Minister of Finance. The tweets were written in the period from 05/01/2015 to 27/06/2015 in the Greek language, and from 01/01/2015 to 27/06/2015 in the English language.
This corpus comprises the tweets posted by Belgian, French, British and Spanish Members of the European Parliament since October 2013. It also includes the tweets posted by candidates for a seat in the European Parliament during the 2014 and 2019 election campaigns.
This dataset features all the tweet IDs and labels that were used to model the language of 24 hashtags and to test performance on predicting the hashtags in unseen tweets. This study is described in: Kunneman, F.A., Liebrecht, C.C. & Bosch, A.P.J. van den (2014). The (Un)Predictability of Emotional Hashtags in Twitte…
Presentation by the researchers of the PROMAP research project in the proceedings of the 10th International Conference on Web and Social Media (ICWSM 2016), given at the workshop "Social Media in the Newsroom", which took place in Cologne, Germany. In this presentation, the authors present a platform for automated data proces…
Dataset on English with code-switching to Serbian in the form of 10 hours of audio material, collected in Belgrade, Novi Sad and Nis, Serbia. Data collected online through Facebook and Twitter. Transcription included, type unknown.
This dataset features the training models, emotion classifications and emotion patterns before and after events, related to the paper: F. Kunneman, M. van Mulken and A. Van den Bosch, Anticipointment detection in event tweets (under review). Abstract of the study: We developed a system to detect positive expectation,…
The corpus contains 668,529 tweets (tweet IDs) relevant to "impact investing", accompanied by sentiment labels given by an automated sentiment classifier. Impact investing involves investments made into companies, organizations, and funds with the intention to generate social and environmental impact alongside a fin…
The MIGR-TWIT Corpus is a multilingual corpus of tweets about the topic of migration in Europe. Within the framework of the collaborative research project OLiNDiNUM (Observatoire LINguistique du DIscours NUMérique, Linguistic Observatory of Online Debate) the MIGR-TWIT Corpus is created with the aim of developing langu…