SVMTool


An Open Source generator of sequential taggers.

SVMTool

Here you can find information about the SVMTool, an open source generator of sequential taggers. The SVMTool was developed by the NLP group of the TALP Research Center at the Universitat Politècnica de Catalunya.

The SVMTool is a simple and effective generator of sequential taggers based on Support Vector Machines. We have applied the SVMTool to a number of NLP problems, such as Part-of-speech Tagging and Base Phrase Chunking, for different languages. The resulting SVM-based tagger is robust and flexible with respect to feature modelling (including lexicalization), trains efficiently with almost no parameters to tune, and can tag thousands of words per second, which makes it practical for real NLP applications. Regarding accuracy, the SVM-based tagger achieves a very competitive 97.2% for English on the Wall Street Journal corpus, which is comparable to the best taggers reported to date.
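To make the idea of feature modelling for a sequential tagger more concrete, the sketch below extracts sliding-window features for one token and tags a sentence from left to right. It is only an illustration: the window size, the feature names (`w(-1)`, `t(-1)`, `suffix3`), the padding symbol and the functions `window_features`, `tag_sentence` and `toy_classifier` are all assumptions made for this example, not part of the SVMTool, and a hand-written rule stands in for the trained SVM classifiers.

```python
from typing import Callable, Dict, List, Sequence

PAD = "_"  # placeholder for positions outside the sentence (assumption)


def window_features(
    words: Sequence[str],
    left_tags: Sequence[str],
    i: int,
    size: int = 2,
) -> Dict[str, str]:
    """Collect features for the token at position i.

    words     -- the full sentence
    left_tags -- tags already assigned to words[0:i]
    size      -- context tokens on each side (illustrative value)
    """
    feats: Dict[str, str] = {}
    # Lexicalized features: word forms inside the window.
    for offset in range(-size, size + 1):
        j = i + offset
        feats[f"w({offset})"] = words[j].lower() if 0 <= j < len(words) else PAD
    # Tag features: only the left context is known when tagging left to right.
    for offset in range(-size, 0):
        j = i + offset
        feats[f"t({offset})"] = left_tags[j] if 0 <= j < len(left_tags) else PAD
    # A simple orthographic feature: the last three characters of the word.
    feats["suffix3"] = words[i][-3:].lower()
    return feats


def tag_sentence(
    words: Sequence[str],
    classify: Callable[[Dict[str, str]], str],
) -> List[str]:
    """Tag greedily from left to right, feeding earlier tags back as features."""
    tags: List[str] = []
    for i in range(len(words)):
        tags.append(classify(window_features(words, tags, i)))
    return tags


def toy_classifier(feats: Dict[str, str]) -> str:
    """Hand-written stand-in for a trained classifier (not an SVM)."""
    if feats["w(0)"] in {"the", "a"}:
        return "DT"
    if feats["t(-1)"] == "DT":
        return "NN"
    if feats["suffix3"].endswith("s"):
        return "VBZ"
    return "NN"


if __name__ == "__main__":
    print(tag_sentence(["The", "dog", "barks"], toy_classifier))
    # ['DT', 'NN', 'VBZ']
```

In a real tagger the `classify` callable would be replaced by a model trained on an annotated corpus; the sketch only shows how each decision can draw on neighbouring words and on previously assigned tags.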

The models are trained with SVMlight, Thorsten Joachims's implementation of Vapnik's Support Vector Machine [Vapnik, 1995]. For further information, visit http://svmlight.joachims.org/

Through this web site you can download the SVMTool software. You can also download several models for tagging different languages, as well as models for noisy and ungrammatical text such as that studied in the FAUST project.

Download

    Application

  • SVMTool v 1.3.2 (Perl) [includes more options for robust tagging]
  • SVMTool v 1.3.1 (Perl) [works with Perl v5.10.0]
  • SVMTool v 1.3 (Perl) [includes 'multiple-column' features]
  • SVMTool++ v 1.1.4 (C++) [includes SVMTlearn]
  • SVMTool++ v 1.0 (C++)
  • SVMTool v 1.2.2 (Perl)
  • SVMTool v 1.2.1 (Perl)

    Models for PoS Tagging

  • models for Catalan [3LB] (based on the 3LB corpus)
  • models for English [WSJ] (based on the Wall Street Journal corpus)
  • models for Robust English (based on the Wall Street Journal corpus and on the FAUST data)
  • models for Robust English using a distance module (based on the Wall Street Journal corpus and on the FAUST data with a modified dictionary)
  • models for Spanish [LEXESP] (based on the LEXESP corpus)

Development

The latest development versions can be downloaded through Subversion.

  • SVMTool:
    • svn co http://svn-rdlab.lsi.upc.edu/subversion/svmtool/public svmtool
  • SVMTool++:
    • svn co http://svn-rdlab.lsi.upc.edu/subversion/svmtool++/public svmtool++

Everyone is allowed to check out (User: reader, Password: reader).

SVMTool is also open to public contribution. If you would like to help with development, please e-mail us so that we can grant you the necessary permissions.

SVMTool Discussion Group

Discussion of features and bugs of this software, as well as information about upcoming updates, takes place on the SVMTool group, to which you can subscribe at:
http://groups.google.es/group/svmt

and post messages at:
SVMT at googlegroups.com

Contributing

The SVMTool library is released under the GNU Lesser General Public License (LGPL) of the Free Software Foundation. This means that it may be linked to and used by commercial software packages. However, the license also requires that any changes or improvements made to the library (and in this case also to the morphological data) be redistributed under LGPL terms.

Thus, if you improve the software or data, whether by adding new functionality, fixing bugs, or building sequential taggers on different data, you cannot distribute the results under conditions different from those stated in the license (i.e. freely and with no usage restrictions).

If you want your changes and improvements to be useful to the many other people using this free software, please contact us.

Documentation

This report describes the installation and usage of the SVMTool:

  • SVMTool v 1.4 Technical Manual   [.ps] [.pdf]

References

Please cite the following paper when referencing this tool in your academic work:

  • Jesús Giménez and Lluís Màrquez. SVMTool: A general POS tagger generator based on Support Vector Machines. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal. 2004.       [.ps]  [.pdf]  [slides]


  • @inproceedings{Gimenez-Marquez-2004,
                author = {Jes\'{u}s Gim\'{e}nez and Llu\'{i}s M\`{a}rquez},
                title = {SVMTool: A general POS tagger generator based on Support Vector Machines},
                booktitle = {Proceedings of the 4th LREC},
                year = {2004},
                address = {Lisbon, Portugal}
    }

On-line Access



[Go to the SVMTool on-line Demo...]

Sponsored by:

The original versions of the tool were partially supported by the Spanish Ministry of Science and Technology (HERMES TIC2000-0335-C03-02, ALIADO TIC2002-04447-C02) and by the European Commission (LC-STAR IST-2001-32216).