D4.1 Controlled Vocabularies Specification

This deliverable summarizes the controlled vocabularies to be used in the VHH project as well as the methodologies and processes applied to create them.

In a project such as VHH, which aims at linking images and texts, controlled vocabularies play a crucial role in creating a shared index for different media types. This is particularly true if indexes are created in two ways: manually and through automated analysis tools. A controlled vocabulary is a glossary, not a dictionary: it does not contain every word or phrase actually used in a specific language but rather a carefully selected and well-defined list of terms from a specific domain of that language. Homonyms and synonyms are excluded from this list or, more precisely, related to the terms it contains.
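The distinction between a glossary and a dictionary can be illustrated with a minimal sketch. The class names, terms and definitions below are hypothetical examples chosen for illustration; they are not taken from the VHH vocabularies. The sketch assumes that each concept has exactly one preferred term, that synonyms are related to it rather than listed as entries of their own, and that a label shared by two concepts (a homonym) is rejected until it has been disambiguated.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One vocabulary entry: a preferred term plus the variants related to it."""
    preferred: str
    definition: str
    synonyms: set[str] = field(default_factory=set)


class ControlledVocabulary:
    """Hypothetical helper: maps every known label to exactly one concept."""

    def __init__(self, concepts):
        self._lookup = {}
        for concept in concepts:
            for label in {concept.preferred, *concept.synonyms}:
                key = label.casefold()
                if key in self._lookup:
                    # A label may point to only one concept; a homonym
                    # has to be disambiguated before it can be added.
                    raise ValueError(f"ambiguous label: {label!r}")
                self._lookup[key] = concept

    def resolve(self, term):
        """Return the preferred term, or None if the term is not in the vocabulary."""
        concept = self._lookup.get(term.casefold())
        return concept.preferred if concept else None


# Illustrative entries only (assumed, not from the VHH vocabularies).
vocabulary = ControlledVocabulary([
    Concept("film camera", "Device that records moving images on film stock.",
            {"movie camera", "cine camera"}),
    Concept("newsreel", "Short film reporting on current events.",
            {"news film"}),
])
```

With these entries, `vocabulary.resolve("Movie Camera")` yields the preferred term `film camera`, whereas a term outside the selected domain, such as `tripod`, yields `None`. A manual indexer and an automated analysis tool that both resolve their terms this way end up with the same index entry.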

Controlled vocabularies are at the core of subject headings, thesauri and taxonomies. Indexing has been a key task in library science since its beginnings. Its methods have been adopted as well as adapted by archival science. The art of indexing has always consisted of a certain and sometimes uncertain balance between the literal repetition of terms used in a document and their subsumption under terms that may or may not be used literally in the document. In that sense, even descriptive terms have an analytical dimension insofar as they generate at least some normalization. Anticipating and facilitating search queries are the main functions of every index, in the digital as well as in the analog realm.

Whenever possible, VHH
• uses existing vocabularies;
• adapts existing vocabularies instead of creating new ones;
• embeds newly created vocabularies in existing ontologies;
• adapts existing ontologies instead of creating new ones.
In order to achieve these goals, VHH draws heavily on Linked Open Data (LOD).

Although adapted and created for specific purposes, the VHH Vocabularies and their principles may be used for other projects as well.