CoNLL-2004 and CoNLL-2005 Shared Tasks

Semantic Role Labeling


Introduction    F.A.Q.    References    CoNLL Conferences


CoNLL-2005 :       Description&Goal       Examples       Data&Software      Systems&Results 


CoNLL-2004 :      Summary Page (data, systems & results)



A semantic role in language is the relationship that a syntactic constituent has with a predicate. Typical semantic arguments include Agent, Patient, Instrument, etc. and also adjunctive arguments indicating Locative, Temporal, Manner, Cause, etc. aspects. Recognizing and labeling semantic arguments is a key task for answering "Who", "When", "What", "Where", "Why", etc. questions in Information Extraction, Question Answering, Summarization, and, in general, in all NLP tasks in which some kind of semantic interpretation is needed.

The following sentence, taken from the PropBank corpus, exemplifies the annotation of semantic roles:

[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ] [A1 anything of value ] from [A2 those he was writing about ] .
Here, the roles for the predicate accept (that is, the roleset of the predicate) are defined in the PropBank Frames scheme as:
V: verb
A0: acceptor
A1: thing accepted
A2: accepted-from
A3: attribute
AM-MOD: modal
AM-NEG: negation
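As a purely illustrative sketch (not part of the task materials), the bracketed annotation above can be read into role/constituent pairs with a few lines of Python; the regular expression simply follows the `[ROLE tokens ]` bracket syntax of the example sentence:

```python
import re

def parse_props(annotated):
    """Extract (role, constituent) pairs from a bracketed SRL annotation
    such as '[A0 He ] [V accept ] [A1 anything of value ]'."""
    return [(role, text.strip())
            for role, text in re.findall(r"\[(\S+)([^\]]*)\]", annotated)]

sentence = ("[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ] "
            "[A1 anything of value ] from [A2 those he was writing about ] .")

for role, constituent in parse_props(sentence):
    print(role, "->", constituent)
```

Note that tokens outside any bracket (here, "from" and the final period) fill no role of the predicate and are simply skipped.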

The Shared Tasks of CoNLL-2004 and CoNLL-2005 concerned the recognition of semantic roles for the English language, based on PropBank predicate-argument structures. Given a sentence, the task consists of analyzing the propositions expressed by some target verbs of the sentence. In particular, for each target verb, all constituents in the sentence that fill a semantic role of the verb have to be recognized and labeled. We will refer to this problem as Semantic Role Labeling (SRL).

As in all previous CoNLL shared tasks, the general goal is to develop machine learning strategies that address the proposed NLP problem, SRL in the present case. In CoNLL-2004, the goal was to develop SRL systems based on partial parsing information (see the main conclusions and system descriptions and evaluations here). In CoNLL-2005, the main focus of interest was to increase the amount of syntactic and semantic input information, aiming to boost the performance of machine learning systems on the SRL task. Following earlier editions of the shared task, the input contained several levels of annotation apart from the role labeling information: words, POS tags, chunks, clauses, named entities, and parse trees. All participants were encouraged to propose novel learning architectures that better exploit the data structures, relations, and constraints of the problem.
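CoNLL shared-task data is conventionally distributed with one token per line, annotation levels as whitespace-separated columns, and blank lines separating sentences. The sketch below reads such a layout; the two-column sample is hypothetical, and the exact columns of the real CoNLL-2005 files are specified on the Data&Software page:

```python
def read_sentences(lines):
    """Group column-formatted lines into sentences, using blank lines
    as sentence separators (the usual CoNLL layout). Each token line
    is split on whitespace into its annotation columns."""
    sentence = []
    for line in lines:
        if line.strip():
            sentence.append(line.split())
        elif sentence:
            yield sentence
            sentence = []
    if sentence:  # flush a trailing sentence with no final blank line
        yield sentence

# Hypothetical two-column (word, POS) sample; the real files carry
# additional columns (chunks, clauses, named entities, parse trees, props).
sample = ["He PRP", "accepts VBZ", "", "No DT", "thanks NNS", ""]
sentences = list(read_sentences(sample))
```

Reading a whole sentence at a time is what makes it possible to recover multi-token constituents and parse-tree spans from the per-token columns.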

Compared to the shared task of CoNLL-2004, the novelties introduced in the 2005 edition were:

  1. The training corpus was substantially enlarged. This makes it possible to test the scalability of learning-based SRL systems to large datasets and to compute learning curves showing how much data is needed for training.
  2. To evaluate the contribution of full parsing to SRL, the complete syntactic trees produced by several alternative parsers were provided as input information for the task.
  3. To test the robustness of the presented systems, a cross-corpora evaluation was performed using fresh test sets from corpora other than the one used for training.
  4. Though encouraged, taking part in the closed setting was no longer obligatory. Thus, teams developing systems that rely intrinsically on external resources were able to participate. However, no team contributed to the open challenge.

In both editions, participant systems were evaluated in two different categories depending on whether they used only the information strictly contained in the training data (closed challenge) or made use of external sources of information and/or tools (open challenge). The closed setting makes it possible to compare systems under strict conditions, while the open setting aims at exploring the contributions of other sources of information and the limits of current learning-based systems on the SRL task. Participants in the open challenge were encouraged to propose novel ideas for using rich semantic information, e.g., VerbNet, WordNet and other lexico-semantic resources, word sense disambiguation, etc. The use of unlabeled examples might also be considered. No system was presented in the open challenge of either edition.
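In both categories, systems are scored on how well their predicted arguments match the gold annotation. A minimal sketch of argument-level precision, recall, and F1 follows, treating each (role, span) pair as one countable unit; this is illustrative only, and the official scoring software (available from the Data&Software page) defines the exact matching criteria:

```python
def srl_f1(gold, pred):
    """Argument-level precision/recall/F1, counting exact matches of
    (role, span) units. Illustrative sketch of the standard metric."""
    gold, pred = set(gold), set(pred)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy example: arguments as (role, (start, end)) token spans.
gold = {("A0", (0, 0)), ("V", (3, 3)), ("A1", (4, 6))}
pred = {("A0", (0, 0)), ("V", (3, 3))}
p, r, f1 = srl_f1(gold, pred)
```

In the toy example the system finds two of the three gold arguments and predicts nothing spurious, so precision is perfect while recall, and hence F1, is penalized for the missed A1.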


Take a look at the specification of the CoNLL-2005 task at the Description&Goal page.
The data and software of CoNLL-2005 are available at the Data&Software page.
The Systems&Results page contains the evaluation results of the 19 systems presented in CoNLL-2005, as well as the description papers of the systems, the introduction paper of the task, and the talks presented at the Shared Task session.
The CoNLL-2004 Shared Task page overviews the 2004 edition of the SRL task, and includes all materials (data, software, results, papers and talks).


Organization

Shared Task Chairs:
Xavier Carreras and Lluís Màrquez

srlconll <at> lsi.upc.edu

TALP Research Center
Technical University of Catalonia (UPC)


Last update: September 7, 2005. Xavier Carreras, Lluís Màrquez