OAEI 2007 environment
Trilateral Thesaurus Mapping Task

Abstract

To fulfill the OAEI 2007 environment trilateral thesaurus mapping task, participants are required to align three SKOS thesauri using relations from the SKOS Mapping vocabulary. The results are collected and validated by domain experts.

GEMET-AGROVOC-NALT-GEMET

Current Status

The evaluation is done.

The results of the participants can be found at the following web address: http://www.few.vu.nl/~wrvhage/oaei2007/results
The Gold Standard used to evaluate Precision and Recall can be found at the following web address: http://www.few.vu.nl/~wrvhage/oaei2007/gold_standard

Task

Create an alignment between the SKOS version of

  1. the European Environment Agency (EEA) GEMET thesaurus (±6,500 terms, multilingual: bg, cs, da, de, el, en, en-US, es, et, eu, fi, fr, hu, it, nl, no, pl, ru, sk, sl, sv),
  2. the United Nations Food and Agriculture Organization (FAO) AGROVOC thesaurus (±28,000 terms, multilingual: ar, cs, de, en, es, fr, hu, ja, pt, sk, th, zh) and
  3. the United States National Agricultural Library (NAL) Agricultural thesaurus (±42,000 terms, monolingual: en),

preferably using relations from the SKOS Mapping Vocabulary. This constitutes three separate mappings: GEMET-AGROVOC, AGROVOC-NALT, and NALT-GEMET. For the construction of each mapping any background knowledge may be used, including the third thesaurus.

A specification of the SKOS vocabularies can be found at the SKOS website (http://www.w3.org/2004/02/skos/).
A description of these relations can be found in the SKOS Mapping Vocabulary (http://www.w3.org/2004/02/skos/mapping/).

Participants are advised to use the alignment API to produce the common format for alignments, with the following mapping relations:

http://www.w3.org/2004/02/skos/mapping#narrowMatch
http://www.w3.org/2004/02/skos/mapping#exactMatch
http://www.w3.org/2004/02/skos/mapping#broadMatch

The other relations and boolean combinators of the SKOS Mapping Vocabulary are also allowed, but will not be evaluated for the OAEI 2007 environment trilateral thesaurus mapping task:

http://www.w3.org/2004/02/skos/mapping#minorMatch
http://www.w3.org/2004/02/skos/mapping#majorMatch
http://www.w3.org/2004/02/skos/mapping#AND
http://www.w3.org/2004/02/skos/mapping#OR
http://www.w3.org/2004/02/skos/mapping#NOT

An example broadMatch mapping between AGROVOC “hard cheese” and NALT “cheeses”, in the common format for alignments as produced by the API, looks like this:

<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <Alignment>
    <xml>yes</xml>
    <level>0</level>
    <type>**</type>
    <onto1>http://www.fao.org/aos/agrovoc</onto1>
    <onto2>http://agclass.nal.usda.gov/nalt/2007.xml</onto2>
    <map>
      <Cell>
        <entity1 rdf:resource="http://www.fao.org/aos/agrovoc#16492" />
        <entity2 rdf:resource="http://agclass.nal.usda.gov/nalt/2007.xml#cheeses" />
        <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.0</measure>
        <relation>http://www.w3.org/2004/02/skos/mapping#broadMatch</relation>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>
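
For participants who do not use the alignment API, the same structure can be produced with a general XML library. The following is a minimal sketch in Python, not the API's own implementation: the function name `build_alignment` and the tuple layout of `cells` are assumptions made for illustration, and emitting one `<map>` wrapper per `<Cell>` simply mirrors the example above.

```python
import xml.etree.ElementTree as ET

ALIGN_NS = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"
SKOS_MAP = "http://www.w3.org/2004/02/skos/mapping#"

# Serialize the alignment namespace as the default namespace, as in the example.
ET.register_namespace("", ALIGN_NS)
ET.register_namespace("rdf", RDF_NS)


def build_alignment(onto1, onto2, cells):
    """Return RDF/XML text for an alignment between onto1 and onto2.

    cells is an iterable of (entity1, entity2, relation, measure) tuples
    (a hypothetical layout chosen for this sketch).
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    alignment = ET.SubElement(root, f"{{{ALIGN_NS}}}Alignment")
    header = [("xml", "yes"), ("level", "0"), ("type", "**"),
              ("onto1", onto1), ("onto2", onto2)]
    for tag, text in header:
        ET.SubElement(alignment, f"{{{ALIGN_NS}}}{tag}").text = text
    for entity1, entity2, relation, measure in cells:
        # Assumption: one <map> wrapper per <Cell>, mirroring the example.
        wrapper = ET.SubElement(alignment, f"{{{ALIGN_NS}}}map")
        cell = ET.SubElement(wrapper, f"{{{ALIGN_NS}}}Cell")
        ET.SubElement(cell, f"{{{ALIGN_NS}}}entity1",
                      {f"{{{RDF_NS}}}resource": entity1})
        ET.SubElement(cell, f"{{{ALIGN_NS}}}entity2",
                      {f"{{{RDF_NS}}}resource": entity2})
        measure_el = ET.SubElement(cell, f"{{{ALIGN_NS}}}measure",
                                   {f"{{{RDF_NS}}}datatype": XSD_FLOAT})
        measure_el.text = str(measure)
        ET.SubElement(cell, f"{{{ALIGN_NS}}}relation").text = relation
    return ET.tostring(root, encoding="unicode")


# Usage: the "hard cheese" / "cheeses" mapping from the example above.
xml_text = build_alignment(
    "http://www.fao.org/aos/agrovoc",
    "http://agclass.nal.usda.gov/nalt/2007.xml",
    [("http://www.fao.org/aos/agrovoc#16492",
      "http://agclass.nal.usda.gov/nalt/2007.xml#cheeses",
      SKOS_MAP + "broadMatch", 1.0)],
)
print(xml_text)
```

The output is not pretty-printed; an XML parser reading it should see the same elements and attributes as in the hand-written example.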

The mappings between GEMET-AGROVOC, AGROVOC-NALT, and NALT-GEMET should be submitted by e-mail, in three separate alignment files, to the organizer listed under Organization.

Evaluation procedure

  1. Each participant submits their preliminary mappings, in the common format for alignments, before September 3rd 2007.
  2. Each participant submits their final mappings before October 1st 2007.
  3. A sample of the mappings will be assessed by domain experts at the EEA, FAO, USDA, TNO, and WUR.
  4. The domain experts are required to assess the mappings assigned to them before October 11th 2007.
  5. The results are published before October 11th 2007.
  6. Evaluation measures for the participants' systems are calculated based on this list of reference alignments.
  7. The final list of judgements is given to domain experts and librarians for manual extension to create an official mapping between the thesauri by November 11th 2007.
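
As a rough illustration of step 6, Precision and Recall can be computed with the usual set-based definitions. This is a sketch under stated assumptions, not the organizers' evaluation code: mappings are modelled as (entity1, entity2, relation) tuples, the function name `precision_recall` is hypothetical, and the sample data below is invented purely to exercise the function.

```python
def precision_recall(found, reference):
    """Set-based Precision and Recall of found mappings against a reference.

    Both arguments are iterables of (entity1, entity2, relation) tuples.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data (not taken from the actual gold standard).
reference = {
    ("a:1", "b:1", "exactMatch"),
    ("a:2", "b:2", "exactMatch"),
    ("a:3", "b:3", "broadMatch"),
    ("a:4", "b:4", "narrowMatch"),
    ("a:5", "b:5", "exactMatch"),
}
found = {
    ("a:1", "b:1", "exactMatch"),
    ("a:2", "b:2", "exactMatch"),
    ("a:3", "b:3", "broadMatch"),
    ("a:9", "b:9", "exactMatch"),  # not in the reference
}

precision, recall = precision_recall(found, reference)
print(precision, recall)  # 0.75 0.6
```

Because the relation is part of each tuple, a mapping that pairs the right concepts with the wrong relation counts as incorrect in this sketch.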

Thesauri

The latest SKOS version and a naive OWL Lite translation of the thesauri can be downloaded from the directories listed below (updated June 18th 2007). The OWL version was derived in the same way as for the library case. The conversion SeRQL queries can be downloaded here. Be advised, when you use the OWL version of the thesauri, that both skos:prefLabel and skos:altLabel have been mapped to rdfs:label. The skos:altLabel is often used to represent synonyms, but also to refer to omitted related terms. If you have any questions about the format, or if you prefer the input in a different format, please let me know.
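
To make the effect of the label mapping concrete, the sketch below merges skos:prefLabel and skos:altLabel into a single label list per concept, which is roughly what a matcher sees when both become rdfs:label. It is an illustration only: the concept URI and labels in `SAMPLE` are invented, the function name `merged_labels` is hypothetical, and the actual thesaurus files may serialize concepts differently (for example as rdf:Description elements with an rdf:type).

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical SKOS concept, invented for this example.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/thesaurus#c1">
    <skos:prefLabel xml:lang="en">cheeses</skos:prefLabel>
    <skos:altLabel xml:lang="en">cheese</skos:altLabel>
    <skos:altLabel xml:lang="en">curd products</skos:altLabel>
  </skos:Concept>
</rdf:RDF>"""

LABEL_TAGS = (f"{{{SKOS}}}prefLabel", f"{{{SKOS}}}altLabel")


def merged_labels(rdf_xml):
    """Map each concept URI to one flat list of preferred and alternative labels."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        # After merging, preferred and alternative labels are indistinguishable.
        labels[uri] = [child.text for child in concept if child.tag in LABEL_TAGS]
    return labels


print(merged_labels(SAMPLE))
```

Once merged, an alternative label that refers to an omitted related term looks the same as a true synonym, so label-based matchers working on the OWL version may want to treat rdfs:label matches with some caution.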

GEMET
Download GEMET. (version 1.0, 2005-07)
Read more about GEMET at http://www.eionet.europa.eu/gemet.
The GEMET thesaurus has three types of top concepts: themes, groups, and supergroups. Read more about these classification schemes in the GEMET documentation. In the OWL version these are represented as owl:Class, like normal concepts. In addition to this, they also have an rdf:type gemet:Theme, gemet:Group, or gemet:SuperGroup.

AGROVOC
Download AGROVOC. (version 2007-02-19, updated 2007-06-28)
Read more about AGROVOC at http://www.fao.org/agrovoc.

NAL thesaurus
Download the NAL thesaurus. (version 2007)
Read more about the NAL thesaurus at http://agclass.nal.usda.gov/agt.

Results

Collection     System     Relation type            Precision  Recall
NALT-AGROVOC   Falcon-AO  exactMatch               0.84       0.48
NALT-AGROVOC   DSSim      exactMatch               0.49       0.20
NALT-AGROVOC   X-SOM      exactMatch               0.45       0.06
NALT-AGROVOC   RIMOM      exactMatch               0.62       0.42
NALT-AGROVOC   SCARLET    exactMatch               0.66       0.003
NALT-AGROVOC   SCARLET    broadMatch/narrowMatch   0.25       0.006
NALT-AGROVOC   SCARLET    disjoint                 0.64       0
GEMET-AGROVOC  Falcon-AO  exactMatch               0.88       0.39
GEMET-AGROVOC  DSSim      exactMatch               0.33       0.15
GEMET-NALT     Falcon-AO  exactMatch               0.86       0.3
GEMET-NALT     DSSim      exactMatch               0.44       0.16

A more detailed listing of all the results can be found in this Excel sheet (104KB) and this PDF presentation (3MB).

Organization

Send any questions, comments, or suggestions to:
Willem Robert van Hage