N-gram Language Models based on DeWaC German web corpus

Name
  • N-gram Language Models based on DeWaC German web corpus
Creator
  • Phonetics Group at Saarland University
Description
  • The resource is a set of n-gram language models at the syllable and phone level. The context length (n-gram length) of the models ranges from 1 to 4 at the syllable level and from 1 to 6 at the phone level. For each n-gram length, two model versions are provided: a forward version, which contains the probability of a unit occurring given the preceding context, and a backward version, which contains the probability of a unit occurring given the following context. Each forward and backward model comes in a version that includes syllable boundary information and a version without syllable boundaries. The models were trained on the DeWaC German web corpus (Baroni and Kilgarriff 2006) using the SRILM language modeling toolkit (Stolcke 2002). Syllabification was performed using an HMM syllable tagger (Schmid, Möbius and Weidenkaff 2007).
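The difference between the forward and backward versions described above can be illustrated with a minimal sketch. It is not the SRILM pipeline used to build the resource: it computes unsmoothed maximum-likelihood estimates over a toy symbol sequence, and the function names, the toy data, and the "." boundary marker are all assumptions made for illustration only.

```python
from collections import Counter


def ngram_counts(units, n):
    """Count every n-gram of length n in a sequence of units."""
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))


def forward_prob(units, context, unit):
    """Unsmoothed estimate of P(unit | preceding context)."""
    context = tuple(context)
    grams = ngram_counts(units, len(context) + 1)
    # Total count of n-grams that start with the given context.
    total = sum(c for g, c in grams.items() if g[:-1] == context)
    return grams[context + (unit,)] / total if total else 0.0


def backward_prob(units, unit, context):
    """Unsmoothed estimate of P(unit | following context)."""
    context = tuple(context)
    grams = ngram_counts(units, len(context) + 1)
    # Total count of n-grams that end with the given context.
    total = sum(c for g, c in grams.items() if g[1:] == context)
    return grams[(unit,) + context] / total if total else 0.0


def strip_boundaries(units, boundary="."):
    """Drop syllable boundary symbols (hypothetical marker '.')."""
    return [u for u in units if u != boundary]


# Toy sequence with hypothetical syllable boundaries, not DeWaC data.
with_boundaries = ["b", "a", ".", "n", "a", ".", "n", "a"]
phones = strip_boundaries(with_boundaries)  # b a n a n a

print(forward_prob(phones, ["a"], "n"))   # 1.0: "a" is always followed by "n"
print(backward_prob(phones, "b", ["a"]))  # 1/3: "a" is preceded by "b" once, by "n" twice
print(forward_prob(with_boundaries, ["a"], "."))  # 1.0 in the boundary-marked version
```

A trained model from a toolkit such as SRILM would additionally apply smoothing and backoff, so its probabilities would differ from these raw relative frequencies.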
Collection
  • Universität des Saarlandes CLARIN-D-Zentrum, Saarbrücken
Subject
  • language model
  • phonetics
  • German
National project
  • CLARIN-D
Resource type
  • collection
Data provider
  • Universität des Saarlandes
Temporal coverage
  • 2009
Record identifier
