Online Personal Exploration and Navigation of SoNaR

Name
  • Online Personal Exploration and Navigation of SoNaR
Description
  • OpenSoNaR contains the SoNaR500 and SoNaR New Media written Dutch texts from 1950 to 2012. The texts are enriched with Part of Speech and lemma annotations and are available as FoLiA XML files (folia.xml). The system uses BlackLab Server as its back end and WhiteLab as its user interface.
  • The SoNaR corpus was developed within the STEVIN research program.
  • Documentation in English
  • annotation of syntax: Part of Speech tagging; annotator: Frog tagger; set: CGN annotation set; annotator type: automatic
Collection
  • INT
Language
Modality
  • written
Continent
  • Europe
Country
  • Netherlands
Organisation
  • Radboud University Nijmegen
  • Dutch Language Institute (Instituut voor de Nederlandse Taal, INT)
National project
  • CLARIAH
Resource type
  • text
Data provider
  • Instituut voor de Nederlandse Taal
Record identifier
