This section provides partial post-competition information, including:
- Task description and participant papers.
- Full datasets (train + test) and gold standard.
- Participant outputs.
- Baselines and results obtained by the best participant system in the three subtasks on the test set.
The full version of this section will be available after the SemEval-2007 Workshop.
Task description and participant papers
(Links to the papers will be available after the SemEval-2007 Workshop.)
- Task description paper:
- SemEval-2007 Task 09: Multilevel Semantic Annotation of Catalan and Spanish.
Lluís Màrquez, Luis Villarejo, M. Antònia Martí and Mariona Taulé.
- Participant papers:
- UPC: Experiments with Joint Learning within SemEval Task 9.
Lluís Màrquez, Lluís Padró, Mihai Surdeanu and Luis Villarejo.
- ILK2: Semantic Role Labeling of Catalan and Spanish using TiMBL.
Roser Morante and Bertjan Busser.
Full datasets and gold standard
Participant outputs
Only two teams participated, out of roughly a dozen that had initially expressed interest.
- ILK2 (Tilburg University): the SRL system is based on memory-based classification of syntactic constituents
using a rich feature set (including semantic features derived from WordNet generalizations). A post-processing
step based on manual rules was applied to improve results on adjuncts; the memory-based approach is sketched below. [ILK2 output]
- UPC (Technical University of Catalonia): several machine learning algorithms (AdaBoost, SVM, Perceptron) were
used to address the different subtasks. For SRL, the system implements a re-ranking strategy using global features,
with candidates generated by a state-of-the-art base SRL system; the re-ranking idea is sketched below. [UPC output]
No system attempted a joint resolution of several subtasks.
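
The memory-based approach used by ILK2 can be illustrated with a minimal, hypothetical Python sketch: a k-nearest-neighbour classifier over symbolic constituent features with an overlap distance, in the spirit of TiMBL's IB1 algorithm. The feature set, training instances and labels below are invented for illustration and do not come from the actual system.

    from collections import Counter

    def overlap_distance(a, b):
        """Number of feature positions on which two instances disagree."""
        return sum(1 for x, y in zip(a, b) if x != y)

    def classify(instance, memory, k=1):
        """Label a constituent with the majority label of its k nearest stored instances."""
        nearest = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    # Toy memory of (head word, phrase type, position, voice) -> role label pairs
    memory = [
        (("juez", "NP", "before", "active"), "Arg0"),
        (("sentencia", "NP", "after", "active"), "Arg1"),
        (("ayer", "ADV", "after", "active"), "ArgM-TMP"),
    ]
    print(classify(("fiscal", "NP", "before", "active"), memory))  # -> Arg0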
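
The UPC re-ranking strategy can likewise be illustrated with a minimal, hypothetical sketch: a linear model with global features scores complete candidate labelings produced by a base system and keeps the highest-scoring one. The features, weights and candidates below are invented for illustration and do not reflect the actual UPC system.

    def global_features(candidate):
        """Global features of a complete candidate labeling (a list of role labels)."""
        core = sum(1 for r in candidate if r.startswith("Arg") and not r.startswith("ArgM"))
        return {"num_core_args": core,
                "duplicate_arg0": 1.0 if candidate.count("Arg0") > 1 else 0.0}

    def rerank(candidates, weights):
        """Return the candidate with the highest linear score under the weight vector."""
        def score(c):
            return sum(weights.get(f, 0.0) * v for f, v in global_features(c).items())
        return max(candidates, key=score)

    # Two toy candidates from a (hypothetical) base SRL system
    candidates = [["Arg0", "Arg0", "ArgM-TMP"], ["Arg0", "Arg1", "ArgM-TMP"]]
    weights = {"num_core_args": 0.5, "duplicate_arg0": -2.0}
    print(rerank(candidates, weights))  # -> ['Arg0', 'Arg1', 'ArgM-TMP']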
Baselines & results
Baselines and results are presented along two dimensions:
- (a) language ('ca' = Catalan; 'es' = Spanish)
- (b) corpus source ('in' = in-domain corpus; 'out' = out-of-domain corpus)
A 'language.source' pair denotes a particular test set (e.g., 'es.out' is the Spanish out-of-domain test set). Finally, '*' denotes the union of the two subcorpora along either the language or the source dimension.
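
The F1 figures reported below are the harmonic mean of precision and recall. As a quick, illustrative check (this is not the official scorer, which works from unrounded counts), the following Python snippet reproduces the best-system ca.* entry of the NERC table from its precision and recall:

    def f1(precision, recall):
        """Harmonic mean of precision and recall (values given as percentages)."""
        return 2 * precision * recall / (precision + recall)

    # Best system, ca.* test set, NERC subtask (see first table below)
    print(round(f1(80.94, 77.96), 2))  # -> 79.42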
- Baseline and best system results on the NERC subtask:

  Test  | Baseline              | Best system
        | Prec.   Recall  F1    | Prec.   Recall  F1
  ------+-----------------------+----------------------
  ca.*  | 75.85   15.45   25.68 | 80.94   77.96   79.42
  es.*  | 71.88   12.07   20.66 | 70.65   65.69   68.08
  *.in  | 83.06   17.43   28.82 | 78.21   74.04   76.09
  *.out | 68.63   12.20   20.72 | 76.21   72.51   74.31
  *.*   | 74.45   14.11   23.72 | 76.93   73.08   74.96
- Baseline and best system accuracies on the NSD subtask:

  Test  | All words             | Selected words
        | Baseline  Best system | Baseline  Best system
  ------+-----------------------+----------------------
  ca.*  | 85.49     86.47       | 70.06     72.75
  es.*  | 84.22     85.10       | 61.80     65.17
  *.in  | 84.84     86.49       | 67.30     72.24
  *.out | 85.02     85.33       | 67.07     67.87
  *.*   | 84.94     85.87       | 67.19     70.12
- Baseline and best system results on the SRL subtask: semantic class tagging (SC)

  Test  | Baseline | Best system
        | F1       | Prec.   Recall  F1
  ------+----------+----------------------
  ca.*  | 63.99    | 90.25   88.50   89.37
  es.*  | 49.21    | 84.30   83.63   83.83
  *.in  | 52.50    | 84.68   83.11   83.89
  *.out | 60.69    | 90.04   89.08   89.56
  *.*   | 56.60    | 87.12   85.81   86.46
- Baseline and best system results on the SRL subtask: semantic role labeling (SR)

  Test  | Baseline              | Best system
        | Prec.   Recall  F1    | Prec.   Recall  F1
  ------+-----------------------+----------------------
  ca.*  | 83.28   76.88   79.95 | 84.72   82.12   83.40
  es.*  | 81.61   76.05   78.73 | 84.30   83.98   84.14
  *.in  | 82.07   80.70   81.38 | 84.71   84.12   84.41
  *.out | 82.88   71.48   76.76 | 84.26   81.84   83.03
  *.*   | 82.42   76.46   79.32 | 84.50   83.07   83.78
Last update: June 15th, 2007