This section provides partial post-competition information, including:
- Task description and participant papers.
- Full datasets (train + test) and gold standard.
- Participant outputs.
- Baselines and results obtained by the best participant system in the three subtasks on the test set.
The full version of this section will be available after the SemEval-2007 Workshop.
Task description and participant papers
(Links to the papers will be available after the SemEval-2007 Workshop.)
- Task description paper:
- SemEval-2007 Task 09: Multilevel Semantic Annotation of Catalan and Spanish.
Lluís Màrquez, Luis Villarejo, M. Antònia Martí and Mariona Taulé.
- Participant papers:
- UPC: Experiments with Joint Learning within SemEval Task 9.
Lluís Màrquez, Lluís Padró, Mihai Surdeanu and Luis Villarejo.
- ILK2: Semantic Role Labeling of Catalan and Spanish using TiMBL.
Roser Morante and Bertjan Busser.
Full datasets and gold standard
Participant outputs
Only two teams participated, out of roughly a dozen that had initially expressed interest.
- ILK2 (Tilburg University): the SRL system is based on memory-based classification of syntactic constituents
using a rich feature set (including semantic features derived from WordNet generalizations). A post-processing
step based on manual rules was applied to improve results on adjuncts; the memory-based approach is sketched below. [ILK2 output]
- UPC (Technical University of Catalonia): several machine learning algorithms (AdaBoost, SVM, Perceptron) were
used to address the different subtasks. For SRL, the system implements a re-ranking strategy using global features,
with candidates generated by a state-of-the-art base SRL system; the re-ranking idea is sketched below. [UPC output]
No system attempted a joint resolution of several subtasks.
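
The memory-based approach used by ILK2 can be illustrated with a minimal, hypothetical Python sketch: a k-nearest-neighbour classifier over symbolic constituent features with an overlap distance, in the spirit of TiMBL's IB1 algorithm. The feature set, training instances and labels below are invented for illustration and do not come from the actual system.

    from collections import Counter

    def overlap_distance(a, b):
        """Number of feature positions on which two instances disagree."""
        return sum(1 for x, y in zip(a, b) if x != y)

    def classify(instance, memory, k=1):
        """Label a constituent with the majority label of its k nearest stored instances."""
        nearest = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    # Toy memory of (head word, phrase type, position, voice) -> role label pairs
    memory = [
        (("juez", "NP", "before", "active"), "Arg0"),
        (("sentencia", "NP", "after", "active"), "Arg1"),
        (("ayer", "ADV", "after", "active"), "ArgM-TMP"),
    ]
    print(classify(("fiscal", "NP", "before", "active"), memory))  # -> Arg0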
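
The UPC re-ranking strategy can likewise be illustrated with a minimal, hypothetical sketch: a linear model with global features scores complete candidate labelings produced by a base system and keeps the highest-scoring one. The features, weights and candidates below are invented for illustration and do not reflect the actual UPC system.

    def global_features(candidate):
        """Global features of a complete candidate labeling (a list of role labels)."""
        core = sum(1 for r in candidate if r.startswith("Arg") and not r.startswith("ArgM"))
        return {"num_core_args": core,
                "duplicate_arg0": 1.0 if candidate.count("Arg0") > 1 else 0.0}

    def rerank(candidates, weights):
        """Return the candidate with the highest linear score under the weight vector."""
        def score(c):
            return sum(weights.get(f, 0.0) * v for f, v in global_features(c).items())
        return max(candidates, key=score)

    # Two toy candidates from a (hypothetical) base SRL system
    candidates = [["Arg0", "Arg0", "ArgM-TMP"], ["Arg0", "Arg1", "ArgM-TMP"]]
    weights = {"num_core_args": 0.5, "duplicate_arg0": -2.0}
    print(rerank(candidates, weights))  # -> ['Arg0', 'Arg1', 'ArgM-TMP']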
Baselines & results
Baselines and results are presented along two dimensions:
- (a) language ('ca' = Catalan; 'es' = Spanish)
- (b) corpus source ('in' = in-domain corpus; 'out' = out-of-domain corpus)
A 'language.source' pair denotes a particular test set (e.g., 'es.out' is the Spanish out-of-domain test set). Finally, '*' denotes the union of the two subcorpora along either the language or the source dimension.
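
The F1 figures reported below are the harmonic mean of precision and recall. As a quick, illustrative check (this is not the official scorer, which works from unrounded counts), the following Python snippet reproduces the best-system ca.* entry of the NERC table from its precision and recall:

    def f1(precision, recall):
        """Harmonic mean of precision and recall (values given as percentages)."""
        return 2 * precision * recall / (precision + recall)

    # Best system, ca.* test set, NERC subtask (see first table below)
    print(round(f1(80.94, 77.96), 2))  # -> 79.42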
- Baseline and best system results on the NERC subtask:

  Test  | Baseline              | Best system
        | Prec.   Recall  F1    | Prec.   Recall  F1
  ------+-----------------------+----------------------
  ca.*  | 75.85   15.45   25.68 | 80.94   77.96   79.42
  es.*  | 71.88   12.07   20.66 | 70.65   65.69   68.08
  *.in  | 83.06   17.43   28.82 | 78.21   74.04   76.09
  *.out | 68.63   12.20   20.72 | 76.21   72.51   74.31
  *.*   | 74.45   14.11   23.72 | 76.93   73.08   74.96
- Baseline and best system accuracies on the NSD subtask:

  Test  | All words             | Selected words
        | Baseline  Best system | Baseline  Best system
  ------+-----------------------+----------------------
  ca.*  | 85.49     86.47       | 70.06     72.75
  es.*  | 84.22     85.10       | 61.80     65.17
  *.in  | 84.84     86.49       | 67.30     72.24
  *.out | 85.02     85.33       | 67.07     67.87
  *.*   | 84.94     85.87       | 67.19     70.12
- Baseline and best system results on the SRL subtask: semantic class tagging (SC)

  Test  | Baseline | Best system
        | F1       | Prec.   Recall  F1
  ------+----------+----------------------
  ca.*  | 63.99    | 90.25   88.50   89.37
  es.*  | 49.21    | 84.30   83.63   83.83
  *.in  | 52.50    | 84.68   83.11   83.89
  *.out | 60.69    | 90.04   89.08   89.56
  *.*   | 56.60    | 87.12   85.81   86.46
- Baseline and best system results on the SRL subtask: semantic role labeling (SR)

  Test  | Baseline              | Best system
        | Prec.   Recall  F1    | Prec.   Recall  F1
  ------+-----------------------+----------------------
  ca.*  | 83.28   76.88   79.95 | 84.72   82.12   83.40
  es.*  | 81.61   76.05   78.73 | 84.30   83.98   84.14
  *.in  | 82.07   80.70   81.38 | 84.71   84.12   84.41
  *.out | 82.88   71.48   76.76 | 84.26   81.84   83.03
  *.*   | 82.42   76.46   79.32 | 84.50   83.07   83.78
Last update: June 15th, 2007