CoNLL-2005 Shared Task:
Semantic Role Labeling: Description & Goal
The data is a collection of sentences, each of which contains a number
of target verbs and other annotations. The goal of the task is to
develop a machine learning system to recognize participants of the
propositions governed by the target verbs. For simplicity, we will
refer to all kinds of participants in a proposition as arguments,
including adjuncts, references and the realization of the verb itself.
The output, thus, is a set of arguments for each target proposition.
The annotations provided as input to support the recognition consist of
syntactic information and named entities. Following earlier CoNLL
Shared Tasks, the syntactic information consists of part-of-speech
tags, base chunks and clauses. In addition, full syntactic parses are
provided as well. These input annotations are automatically predicted
by the processors described below.
Training and development data are provided to build the learning
system. These datasets contain predicted input annotations and the
correct outputs. The training set will be used for training systems. The
development set will be used to tune parameters of the learning systems.
Evaluation will be performed on a separate test set, which will be
provided with target verbs and predicted input annotations. A system
will be evaluated with respect to precision, recall and the F1 measure
of recognized arguments. For an argument to be correctly recognized,
both the words spanning the argument and its type have to be correct.
The srl-eval.pl program, distributed by the organizers in the software section, is the
official program to compute the scores. As in the CoNLL-2004 edition,
arguments annotating the verb predicate (i.e., V args) will not be
considered in the evaluation.
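For concreteness, the computation behind these scores can be sketched as
follows. This is a minimal illustration only (srl-eval.pl remains the
official scorer); it assumes arguments are already given as
(start, end, type) word spans, with V args removed, and it ignores
discontinuous arguments.

    def evaluate(gold, pred):
        """Precision, recall and F1 over two argument sets.

        gold, pred: sets of (start, end, type) tuples for one proposition.
        An argument is correct only if both its word span and its type
        match exactly.
        """
        correct = len(gold & pred)
        precision = correct / len(pred) if pred else 0.0
        recall = correct / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall > 0 else 0.0)
        return precision, recall, f1

    # Hypothetical proposition: the A1 argument is found but mis-delimited.
    gold = {(0, 5, "A0"), (7, 17, "A1")}
    pred = {(0, 5, "A0"), (7, 10, "A1")}
    print(evaluate(gold, pred))   # (0.5, 0.5, 0.5)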
The datasets are sections of the Wall Street Journal (WSJ) part of the
Penn TreeBank II (TB). We follow the standard WSJ partition used in
syntactic parsing:

  Training    | WSJ Sections 02-21
  Development | WSJ Section 24
  Test        | WSJ Section 23 + fresh sentences
The annotations of predicate-argument structures have been derived from
PropBank (PB), while the preprocessors that predict the input
annotations have been developed within the standard partition of the
WSJ. In this edition, the test set will include a collection of fresh
sentences which are not part of the WSJ. The aim is to evaluate the
robustness of systems on data outside the training corpus.
The Shared Task evaluation is separated into two challenges:
- Closed Challenge. Systems
have to be built strictly with information contained in the training
sections of TB and PB, and tuned with the development section. In most
cases, systems in this evaluation will make use only of the
provided training dataset. However, it is also possible to use any
information in the training sections of TB and PB, and consequently to
use any preprocessor strictly developed within the standard WSJ
partition (this includes the chunkers and clausers of previous Shared
Tasks, as well as most of the syntactic WSJ-parsers). In addition, the
PropBank frames can also be used (see Official
Resources). The aim of this challenge is to produce a ranking of
systems with respect to their F1 measure, and to compare their
performance in a fair environment.
- Open Challenge. Systems
can be developed making use of any kind of external tools and
resources. The only condition is that such tools or resources have not
been developed with the annotations of the test set, either for the
input or the output annotations of the data. In this challenge, we are
interested in learning methods which make use of tools or resources
that might improve the performance. For example, we encourage the use
of rich semantic information, by using WordNet, VerbNet or a WSD
system. The comparison of different systems in this setting may not be
fair, and thus the ranking of systems is not necessarily important.
The input annotations we provide have been computed with the following
processors:
- UPC processors:
- Part-of-Speech (PoS) tagger of (Giménez
and Màrquez, 2003)
- Chunker and clauser of (Carreras
and Màrquez 2003), both developed within the CoNLL-2000 and
CoNLL-2001 Shared Task settings, respectively. The clause boundaries
predicted by this partial parser respect the boundaries of the chunks.
Hence, this processor outputs a well-formed structure of chunks and
clauses.
- Collins parser:
The full parser of (Collins 99),
with "model 2". Predicts WSJ full parses, with information of the
lexical head for each syntactic constituent. The PoS tags (required by
the parser) have been computed with (Giménez
and Màrquez 2003).
- Charniak parser:
The full parser of (Charniak 00).
Predicts PoS tags and WSJ full parses.
- Named Entity Extractor:
The extractor of (Chieu and Ng, 2003), developed within the
CoNLL-2003 Shared Task on Named Entity Recognition. Note: this
processor has not been developed with WSJ training data, and is the
only exception allowed for the closed challenge.
Here is an example of a fully-annotated sentence:
There is one line for each token, and a blank line after the last
token. The columns, separated by spaces, represent different
annotations of the sentence, each given as a tagging along the words.
For structured annotations (named entities, chunks, clauses, parse
trees, arguments), we use the Start-End format.

WORDS----> NE---> POS PARTIAL_SYNT FULL_SYNT------> VS TARGETS PROPS------->
The * DT (NP* (S* (S(NP* - - (A0* (A0*
$ * $ * * (ADJP(QP* - - * *
1.4 * CD * * * - - * *
billion * CD * * *)) - - * *
robot * NN * * * - - * *
spacecraft * NN *) * *) - - *) *)
faces * VBZ (VP*) * (VP* 01 face (V*) *
a * DT (NP* * (NP* - - (A1* *
six-year * JJ * * * - - * *
journey * NN *) * * - - * *
to * TO (VP* (S* (S(VP* - - * *
explore * VB *) * (VP* 01 explore * (V*)
Jupiter (ORG*) NNP (NP*) * (NP(NP*) - - * (A1*
and * CC * * * - - * *
its * PRP$ (NP* * (NP* - - * *
16 * CD * * * - - * *
known * JJ * * * - - * *
moons * NNS *) *) *))))))) - - *) *)
. * . * *) *) - - * *
The Start-End format
represents phrases (chunks, arguments, and syntactic constituents) that
constitute a well-formed bracketing in a sentence (that is, phrases do
not overlap, though they admit embedding). Each tag is of the form
START*END, and represents the phrases that start and end at the
corresponding word. A phrase of type k places an opening parenthesis
"(k" in the START part of its first word, and a closing parenthesis ")"
in the END part of its last word.
Scripts will be provided to transform a column in Start-End format into
other standard formats (IOB1, IOB2, WSJ trees). The Start-End format
used last year (which included the phrase type in both the start and
end parts) will also be compatible with the current software and
scripts.
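As an illustration of what such a conversion involves, here is a minimal
sketch (in Python, with a hypothetical function name) that decodes one
Start-End column into (start, end, label) phrases; the distributed
scripts, not this sketch, are the reference implementation.

    import re

    def decode_start_end(column):
        """Decode a Start-End column into (start, end, label) phrases.

        column: one tag per token, e.g. ["(A0*", "*", ..., "*)"].
        The bracketing is well-formed: phrases may embed but never overlap.
        """
        phrases, stack = [], []
        for i, tag in enumerate(column):
            starts, ends = tag.split("*")
            for label in re.findall(r"\(([^()]+)", starts):
                stack.append((label, i))      # phrase of this type opens here
            for _ in range(ends.count(")")):
                label, start = stack.pop()    # innermost open phrase closes here
                phrases.append((start, i, label))
        assert not stack, "unbalanced bracketing"
        return sorted(phrases)

    # The first PROPS column (target verb "faces") of the example sentence:
    col = ["(A0*", "*", "*", "*", "*", "*)", "(V*)", "(A1*", "*", "*",
           "*", "*", "*", "*", "*", "*", "*", "*)", "*"]
    print(decode_start_end(col))   # [(0, 5, 'A0'), (6, 6, 'V'), (7, 17, 'A1')]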
The different annotations in a sentence are grouped in the following
columns:
- WORDS. The words of the sentence.
- NE. Named Entities.
- POS. PoS tags.
- PARTIAL SYNT. Partial syntax, namely chunks (1st column) and
clauses (2nd column).
- FULL SYNT. Full syntactic tree. Note that this column represents
the following WSJ tree:
(S (NP (DT The)
       (ADJP (QP ($ $) (CD 1.4) (CD billion)))
       (NN robot) (NN spacecraft))
   (VP (VBZ faces)
       (NP (DT a) (JJ six-year) (NN journey)
           (S (VP (TO to)
                  (VP (VB explore)
                      (NP (NP (NNP Jupiter))
                          (CC and)
                          (NP (PRP$ its) (CD 16) (JJ known) (NNS moons))))))))
   (. .))
- VS. VerbNet sense of target verbs. These are hand-crafted
annotations that will be available only for training and development
sets (not for the test set).
- TARGETS. The target verbs of the sentence, in infinitive form.
- PROPS. For each target verb, a column representing the arguments of
the target verb.
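A minimal sketch of a reader for this column layout follows (the column
order is the one listed above, as in the fully-annotated example; the
file name is hypothetical):

    def read_sentences(path):
        """Read the column format: one token per line, whitespace-separated
        columns, and a blank line terminating each sentence.

        Yields one list of rows per sentence, each row being
        [word, ne, pos, chunk, clause, full_synt, vs, target, prop_1, ...],
        with one PROPS column per target verb of the sentence.
        """
        sentence = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    sentence.append(line.split())
                elif sentence:
                    yield sentence
                    sentence = []
        if sentence:                  # file without a trailing blank line
            yield sentence

    # Hypothetical usage:
    # for rows in read_sentences("train.txt"):
    #     words   = [r[0] for r in rows]
    #     targets = [r[7] for r in rows]   # "-" on non-target tokens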
Some notes on Propositions and Arguments
The data includes the following types of arguments:
- A0 .. A5 : arguments associated with a verb predicate, defined in
the PropBank Frames scheme.
- AM-T : adjunct arguments of various sorts, where T
is the type of the adjunct. Types include locative, temporal, manner,
etc.
- AA : causative agents.
- V : the verb of the proposition.
- R-k : a reference to some other argument of A* type. See examples of references in arguments.
- C-k : a continuation phrase of a previously started argument of type k. See examples of discontinuous arguments.
Although we represent the arguments of each proposition in a format
which allows embedding, no embedding is observed in arguments of a
proposition governed by a verb.
Some arguments of a proposition can appear in a sentence split into
several discontinuous phrases. In this case, each phrase of an argument
of type k is represented as a phrase in Start-End format: the first
phrase appears with label k, and the remaining phrases appear with
label "C-k" (Continuation). For a system to correctly recognize a
discontinuous argument, all and only its phrases have to be correctly
recognized.
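To make this matching criterion concrete, the following sketch (a toy,
with hypothetical inputs) groups each "C-k" continuation with its
initial phrase, so that discontinuous arguments are compared as wholes:

    def group_discontinuous(phrases):
        """Group argument phrases into complete arguments.

        phrases: (start, end, label) tuples for one proposition. Each
        "C-k" phrase is attached to the most recent argument of type k;
        the result maps every argument to the set of its spans, so a
        discontinuous argument only matches if all its spans match.
        """
        args = []                           # list of (label, [spans])
        for start, end, label in sorted(phrases):
            if label.startswith("C-"):
                base = label[2:]
                for arg in reversed(args):  # most recent argument of type k
                    if arg[0] == base:
                        arg[1].append((start, end))
                        break
            else:
                args.append((label, [(start, end)]))
        return {(label, frozenset(spans)) for label, spans in args}

    # An argument split as "A1 ... C-A1" is only correct as a whole:
    gold = group_discontinuous([(0, 2, "A1"), (5, 7, "C-A1")])
    pred = group_discontinuous([(0, 2, "A1")])
    print(gold == pred)   # False: the continuation phrase is missing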
Last update: January 28, 2005. Xavier Carreras, Lluís Màrquez.