Task #09: Multilevel Semantic Annotation of Catalan and Spanish
June 15th, 2007: Check the 'Systems & Results' section for a preview of the results report.
May 22nd, 2007: The competition is now over. The evaluation has been carried out, and its results will be reported at the SemEval-2007 Workshop. We hope to see you all there!
April 12th, 2007: Updated version of the NER official scorer available. Download it together with updated Semeval task#9 software v1.4.
March 24th, 2007: Full NSD dictionaries relating lemmas and WordNet senses for Catalan and Spanish available. Check the Download section.
March 21st, 2007: NSD note: We only expect sense labels for the nouns marked as targets (nouns labeled with 'CS' are not marked as targets).
March 20th, 2007: Updated version of the official scorer available. Download it together with updated Semeval task#9 software v1.3. Check the Download section.
March 12th, 2007: Test set available at the SemEval-2007 webpage for task #9.
March 12th, 2007: (update) Complete training data following textual order of the predicates available at the SemEval-2007 webpage for task #9. SRL columns must follow the textual order of the predicates. Check the Technical Setting section to get updated information.
March 9th, 2007: Beta version of the official scorer released. Download it together with updated Semeval task#9 software v1.2. Check the Download section. Updated verbal lexicon for the whole train dataset. Check the Download section.
March 5th, 2007: Complete training data available at the SemEval-2007 webpage for task #9. Semeval task#9 software version 1.0 available. Check the Download section.
March 2nd, 2007: Updated verbal lexicon for the whole train dataset. Check the Download section.
February 27th, 2007: Dictionaries relating nouns and WordNet senses for Catalan and Spanish available. Check the Download section.
February 26th, 2007: Evaluation period begins: the first part of the training data is already available at the SemEval-2007 webpage for task #9. Check the Technical Setting section for updated information on the evaluation. The verbal lexicon has also been updated.
February 23rd, 2007: Train and test data release calendar available, check the Download section.
February 21st, 2007: Updated trial dataset, with minor errors fixed, available at the SemEval-2007 webpage for task #9.
February 19th, 2007: Registration is open at the SemEval-2007 registration site.
February 18th, 2007: Evaluation period extended to 4 weeks: from the moment you download the training set, you will have 4 weeks to upload the outputs of your system on the test set. Updated verbal lexicon and full Catalan and Spanish WordNets available in the Download section.
January 10th, 2007: Trial datasets are now available. The whole website has been updated accordingly. The task description has been updated and additional information added. Check the Technical Setting and Download sections.
October 13, 2006: This new website has been posted. Welcome to the task!!!
Acknowledgements
The following people worked very hard on the development of the corpora used in this task, manually annotating all linguistic layers and developing various software tools for their processing. Thanks to all. We owe you a lot!
Juan Aparicio, Manu Bertran, Oriol Borrega, Núria Bufí, Joan Castellví, Maria Jesús Díaz, Marina Lloberes, Difda Monterde, Aina Peris, Lourdes Puiggrós, Marta Recasens, Santi Reig, and Bàrbara Soriano.
General Task Description
Summary
In this task, we aim to evaluate and compare automatic systems for semantic annotation at several levels for the Catalan and Spanish languages. The three semantic levels considered are: semantic roles and verb disambiguation, disambiguation of all nouns, and named entity recognition.
[1, Semantic Role Labeling, SRL] The annotation of semantic roles of verb predicates follows the PropBank style (Palmer et al. 2005; Taulé et al. 2005; Taulé et al. 2006), and the task setting is similar to that of the CoNLL-2005 shared task. Verb disambiguation refers to the assignment of the proper semantic-class tag to the verb, which is a much coarser-grained level than usual sense disambiguation. This tag is composed of the thematic structure number (as indexed in the role-set file for the verb predicate) and the lexico-semantic class, which is used to map the numbered arguments onto semantic roles.
[2, Noun Sense Disambiguation, NSD] The disambiguation of nouns will be similar in shape to an "all-words" disambiguation task, except that only frequent nouns will be treated. The sense repository used for the annotation consists of the current versions of the Catalan and Spanish WordNets.
[3, Named Entity Recognition, NER] The annotation of named entities will include recognition and classification of simple entity types (person, location, organization, etc.), including embedded (nested) entities. We will consider core "strong" entities (e.g., [US]_loc) and "weak" entities, which, by definition, include some strong entity (e.g., the [president of [US]_loc]_per) (Arévalo, Civit & Martí 2004; Arévalo et al. 2002).
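To make the strong/weak distinction concrete, here is a minimal sketch that parses the bracketed notation used in the examples above into nested labeled spans. The bracket syntax is only the illustrative one from this page, not the official data format (the real data uses a column-based encoding; see the Technical Setting section).

```python
# Sketch: parse illustrative bracketed entity notation such as
# "The [president of [US]_loc]_per" into (entity_text, label) pairs,
# innermost (strong) entities first. Not the official task format.

def parse_entities(text):
    """Return a list of (entity_string, label) pairs, innermost first."""
    stack = []      # start offsets (in the bracket-free buffer) of open '['
    out = []        # characters of the text with brackets/labels removed
    entities = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '[':
            stack.append(len(out))
            i += 1
        elif ch == ']':
            start = stack.pop()
            # the label follows the closing bracket as "_xxx"
            j = i + 1
            assert text[j] == '_', "expected a _label after ']'"
            k = j + 1
            while k < len(text) and text[k].isalpha():
                k += 1
            entities.append((''.join(out[start:]), text[j + 1:k]))
            i = k
        else:
            out.append(ch)
            i += 1
    return entities

print(parse_entities("The [president of [US]_loc]_per"))
# → [('US', 'loc'), ('president of US', 'per')]
```

Note how the weak entity ("president of US", tagged per) contains the strong entity ("US", tagged loc), which is exactly the embedding the task evaluates.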
All semantic annotation tasks will be performed on exactly the same corpora for each language. We present all the annotation levels together as a complex global task, since we are interested in approaches that address these problems jointly, possibly taking into account cross-dependencies among them. However, we will also accept systems that approach the annotation in a pipeline style, or that address any of the particular subtasks in either language (3 levels x 2 languages = 6 subtasks).
More particularly, the input for training will consist of a medium-size set of sentences (~150Kwords per language) with gold-standard full syntactic annotation (including function tags) and the semantic annotations of SRL, NSD, and NER, which are the target knowledge to be learned. The full parse trees are provided only to ease the learning process; participants are not required to use them. The test corpus will be about 10 times smaller than the training corpus and will include the full syntactic annotation without the semantic levels, which have to be predicted. The parse trees of the test set will also be the manually revised gold-standard ones. Unfortunately, we did not have time to prepare automatic parsers for both languages to provide automatically generated syntactic input levels, as we initially planned.
Formats are formally described in the Technical Setting section of the task webpage. They are very similar to those of the CoNLL-2005 shared task (column-style presentation of the annotation levels), in order to be able to share evaluation tools and already-developed scripts for format conversion.
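The basic mechanics of a CoNLL-style column file can be sketched as follows: one token per line, whitespace-separated columns, and a blank line separating sentences. The sample below uses made-up Catalan tokens and only three columns for illustration; the official column layout is the one given in the Technical Setting section.

```python
# Sketch of reading a CoNLL-style column file: one token per line,
# whitespace-separated columns, blank lines between sentences.
# The column layout shown here is illustrative, not the official one.

def read_sentences(lines):
    """Yield each sentence as a list of per-token column tuples."""
    sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
        else:
            sentence.append(tuple(line.split()))
    if sentence:                    # flush a trailing sentence
        yield sentence

sample = """\
El      el      d
gat     gat     n
dorm    dormir  v

Hola    hola    i
"""
sents = list(read_sentences(sample.splitlines()))
print(len(sents))          # → 2
print(sents[0][2][1])      # → dormir  (lemma column of the third token)
```

Because each annotation level is just another column, the same reader serves for training and test files alike; systems only need to fill in the missing semantic columns of the test set.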
As previously said, we will use standard evaluation metrics for each of the defined subtasks (SRL, NSD, NER), based on precision/recall/F1 measures, since they are basically recognition tasks. Classification accuracy will also be calculated for verb disambiguation and NSD.
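As a minimal sketch of how the precision/recall/F1 measures work for recognition tasks, predicted and gold annotations can be compared as sets of items (here, hypothetical labeled spans). This mirrors the standard metric definitions only; it is not the official scorer, which is available in the Download section.

```python
# Sketch of precision/recall/F1 over sets of gold vs. predicted items
# (e.g. labeled entity spans). Illustrative only, not the official scorer.

def prf(gold, pred):
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0   # precision
    r = correct / len(gold) if gold else 0.0   # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0 # harmonic mean
    return p, r, f1

# Hypothetical (start, end, label) spans:
gold = {(0, 1, "per"), (3, 4, "loc"), (6, 8, "org")}
pred = {(0, 1, "per"), (3, 4, "org"), (6, 8, "org")}  # one label wrong

print(tuple(round(x, 2) for x in prf(gold, pred)))  # → (0.67, 0.67, 0.67)
```

Note that a span with the wrong label counts as both a false positive and a false negative, which is why a single labeling error lowers precision and recall simultaneously.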
For more information, visit the SemEval-2007 home page.