CLARIN Virtual Language Observatory: Help
The Virtual Language Observatory (VLO) faceted browser was developed within CLARIN as a means to explore linguistic resources, services and tools available within CLARIN and related communities. Its aim is to provide an easy-to-use interface that allows for a uniform search and discovery process across a large number of resources from a wide variety of domains and providers.
More information can be found on the VLO page on the CLARIN website. For answers to common questions about the VLO, please consult the FAQ. More documentation and other references are listed on the "About" page.
The VLO search interface presents a number of facets, for each of which a value can be selected in order to narrow down the selection of displayed records. For example, to include only records that relate to Spain as a country, open the facet Country and select the value Spain. Next to each available value, a number indicates how many records within the current selection contain that value; in other words, how many records would remain if that value were selected.
Only the values that occur in the current selection (that is, the records that match the already selected values and the optional textual query, described below) are shown. The VLO shows up to ten of the most frequently occurring values for each facet when you click on the facet name. If there are more than ten available values, a link labeled more... leads to a pop-up showing all available values (given the current selections), which can be filtered textually and sorted either alphabetically or by number of matching records. It is also possible to search for facet values by typing (part of) a value in the filter box ('Type to search for more'), which is located below the facet name and above the facet values in the panel next to the search results.
Facets that do not have any matching records given the current selection will not be displayed in the facets panel in the VLO search interface.
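The counting behaviour described above can be illustrated with a small sketch. It is not the VLO implementation: the record layout, the names RECORDS, matches and facet_counts, and the sample data are all hypothetical, and the real index is queried through a search engine rather than in memory.

```python
from collections import Counter

# Hypothetical sample records; each facet maps to a list of values.
RECORDS = [
    {"country": ["Spain"], "language": ["Spanish"]},
    {"country": ["Spain"], "language": ["Catalan"]},
    {"country": ["Finland"], "language": ["Swedish", "Finnish"], "modality": ["Spoken"]},
]


def matches(record, selection):
    """Return True if the record contains every selected facet value."""
    return all(value in record.get(facet, []) for facet, value in selection.items())


def facet_counts(records, selection, limit=10):
    """Count facet values over the records that match the current selection.

    Returns, per facet, up to `limit` (value, count) pairs, most frequent first.
    """
    current = [record for record in records if matches(record, selection)]
    counts = {}
    for record in current:
        for facet, values in record.items():
            # set() ensures a value counts once per record.
            counts.setdefault(facet, Counter()).update(set(values))
    # A facet with no values in the current selection never enters `counts`,
    # which mirrors the way such facets are hidden from the facets panel.
    return {facet: counter.most_common(limit) for facet, counter in counts.items()}
```

With this sample data, selecting Spain for the country facet leaves two records, and the modality facet disappears because neither remaining record has a modality value.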
In addition to navigating the resources by means of the selection of facet values, the VLO faceted browser also allows for searching by means of textual queries.
Such queries are to be entered in the large text box at the top of the main page or faceted browsing page with the button labeled 'Search' next to it.
In its simplest form, a search query consists of one or more terms, separated by a space character. Such queries result in the retrieval of all documents that have one or more occurrences of all of the included search terms. In other words, an AND operator is implied by default.
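As a rough illustration of the implied AND, the following sketch checks whether every term of a simple query occurs in a piece of text. The function name matches_all_terms is hypothetical, and the whitespace tokenisation and case-insensitive comparison are simplifying assumptions; the actual search engine analyses text in more sophisticated ways.

```python
def matches_all_terms(query, text):
    """Return True if every space-separated query term occurs as a word in the text."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in query.split())
```

For instance, matches_all_terms("German acquisition", "Corpus of German language acquisition data") holds, whereas the same query does not match "German newspaper corpus" because the term 'acquisition' is missing.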
The Lucene Query Parser syntax allows for the following boolean operators: 'AND', 'OR', 'NOT', '+' and '-'. It also allows for grouping of terms by means of parentheses. Terms can be combined into phrases by means of double quote characters. The following examples illustrate the usage of these operators. Click any of the following examples to perform that query on the actual data currently in the VLO:
- German AND acquisition (all records that match both terms)
- German acquisition is an equivalent query
- Myanmar OR Birma OR Burma (all records that match at least one of these terms)
- newspaper AND (Czech OR Slovak) (all records that match the term 'newspaper' and at least one of the terms 'Czech' or 'Slovak')
- Portuguese -Brazil (all records that match 'Portuguese' but do not have an occurrence of 'Brazil')
- Portuguese NOT Brazil is an equivalent query
- "sign language" AND (Sweden OR Finland) (all records that match the phrase 'sign language' and at least one of the terms 'Sweden' or 'Finland')
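When queries like the ones above are assembled by a script rather than typed by hand, a few string helpers can keep the operators and grouping consistent. The sketch below is an assumption-laden illustration: the helper names phrase, all_of, any_of and exclude are hypothetical, and no escaping of special characters is attempted.

```python
def phrase(text):
    """Turn a multi-word term into a phrase by wrapping it in double quotes."""
    return f'"{text}"'


def all_of(*parts):
    """Join terms or sub-queries so that all of them must match."""
    return " AND ".join(parts)


def any_of(*parts):
    """Group terms or sub-queries so that at least one of them must match."""
    return "(" + " OR ".join(parts) + ")"


def exclude(query, part):
    """Extend a query so that records matching `part` are left out."""
    return f"{query} -{part}"
```

For example, all_of(phrase("sign language"), any_of("Sweden", "Finland")) yields the last query in the list above, and exclude("Portuguese", "Brazil") yields Portuguese -Brazil.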
In addition to the logical operators, the syntax also allows searching for occurrences of a term within a specific field, such as language or modality. To do so, enter the name of the field and the term to search for, separated by a colon. The asterisk ('*') can be used to achieve partial matches. Quotes are required to match a term that contains spaces.
The following field names are available:
language, country, continent, modality, genre, subject, format,
organisation, resourcetype, keyword, resources.
- resourcetype:"Linguistic corpora" (equivalent to selecting this value below the 'Resource Type' facet)
- language:Sign (all records with a language name that contains the word 'sign')
- country:(Belgium -Netherlands) (all records with Belgium as country but not also Netherlands)
- country:Finland AND language:Swedish AND NOT language:Finnish (records in Finland with resources in the Swedish language but not in Finnish)
- format:video/* (all records with video resources)
- resources:[5 TO *] (all records with at least 5 resources)
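The field queries above follow a regular pattern, which the following sketch captures. The names FIELDS, field_query and at_least are hypothetical; the set of field names is copied from the list above, and the helpers only build query strings without contacting the VLO.

```python
# Field names as listed in this help text.
FIELDS = {
    "language", "country", "continent", "modality", "genre", "subject",
    "format", "organisation", "resourcetype", "keyword", "resources",
}


def _check_field(field):
    """Reject field names that are not in the documented list."""
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field}")


def field_query(field, value):
    """Build a 'field:value' query, quoting the value if it contains spaces."""
    _check_field(field)
    if " " in value:
        value = f'"{value}"'
    return f"{field}:{value}"


def at_least(field, minimum):
    """Build an open-ended range query matching `minimum` or more."""
    _check_field(field)
    return f"{field}:[{minimum} TO *]"
```

For example, field_query("resourcetype", "Linguistic corpora") produces the first query in the list above, and at_least("resources", 5) produces the last one.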
A full overview of syntax features, including options for fuzzy search, ranges, and term boosting, can be found at the Lucene syntax description page.