Introduction to Natural Language Processing (INLP)



Course 2016-2017. First semester

Midterm exam: November

New: The next session (27 October) will last only 30 minutes

Lecturers

Main teacher, theory classes: Marta Gatius Vila

Problems and laboratory classes: Horacio Rodríguez Hontoria

Attending timetable

Marta Gatius: Wednesdays from 11 to 13. Omega Building. Office 218

email: gatius at cs.upc.edu

Horacio Rodríguez: Omega Building. Office 316

email: horacio at cs.upc.edu



Brief description of the contents

This course is an introduction to the most relevant problems in Natural Language Processing (NLP), the main techniques and resources used to address them, and the theories they are based on. The course also includes an overview of natural language applications.

The course focuses on the two most relevant approaches to NLP: knowledge-based and empirical (both statistical and machine learning).

Main goals




The contents

OUTLINE

1. Introduction to Natural Language Processing

2. Applications

3. Language models

4. Basic levels of linguistic description

5. Syntactic processing

6. Semantic and pragmatic processing

7. Generation




Methodology

There are three types of sessions: theory, exercise and laboratory.

In the theory sessions we will introduce new concepts, together with the challenges they present and the approaches used to address them.

In the exercise sessions we will work on the concepts, techniques and algorithms introduced in the theory sessions.

In the laboratory sessions students will complete small practical assignments using appropriate NLP tools, to practice and reinforce the material from the theory sessions.



Assessment

There will be two exams: a mid-term exam, worth 15% of the final grade, and an end-of-term exam, worth 45%.

Assignments completed by the student during the course are worth 40% of the final grade.

The end-of-term exam will cover all the course contents. For students who fail (or do not sit) the mid-term exam, the end-of-term exam will be worth 60% of their final grade.

In particular, the final grade of the course will be calculated as follows:

Course grade = maximum (mid-term exam grade * 0.15 + end-of-term exam grade * 0.45, end-of-term exam grade * 0.6) + assignments grade * 0.4
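
The grading formula can be checked with a few lines of Python. This is only a minimal sketch: the function name `course_grade` and its parameter names are made up for illustration, and treating a mid-term that was not taken as a mark of 0 is an assumption of the sketch, not an official rule.

```python
from typing import Optional


def course_grade(final_exam: float, assignments: float,
                 midterm: Optional[float] = None) -> float:
    """Combine the three marks using the weights of the grading formula."""
    # Assumption of this sketch: a mid-term that was not taken counts as 0,
    # so the second argument of max() decides the exam part of the grade.
    mid = 0.0 if midterm is None else midterm
    exams = max(0.15 * mid + 0.45 * final_exam, 0.60 * final_exam)
    return exams + 0.40 * assignments


if __name__ == "__main__":
    # Mid-term higher than the end-of-term exam: the mid-term counts.
    print(round(course_grade(final_exam=7.0, assignments=8.0, midterm=9.0), 2))
    # Mid-term lower than the end-of-term exam: only the end-of-term exam counts.
    print(round(course_grade(final_exam=7.0, assignments=8.0, midterm=4.0), 2))
    # Mid-term not taken.
    print(round(course_grade(final_exam=7.0, assignments=8.0), 2))
```

Because of the maximum, the mid-term mark changes the result only when it is higher than the end-of-term exam mark; otherwise the end-of-term exam alone carries the full 60%.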

Frequent questions about the course grading (and their corresponding answers).

Basic bibliography

[1] R. Dale, H. Moisl, H. Somers (eds.) (2000) Handbook of Natural Language Processing, Marcel Dekker, New York.

[2] D. Jurafsky, J. H. Martin (2008) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, Upper Saddle River, N.J.

[3] C. Manning, H. Schütze (1999) Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, Mass.

[4] R. Mitkov (ed.) (2004) The Oxford Handbook of Computational Linguistics, Oxford University Press.

[5] N. A. Smith (2011) Linguistic Structure Prediction, Synthesis Lectures on Human Language Technologies, Vol. 4, No. 2, Morgan & Claypool Publishers.

More bibliography

[6] J. Allen (1995) Natural Language Understanding, Benjamin/Cummings Publishing Company.

[7] NLTK tutorials, http://nltk.sourceforge.net/lite/doc/en/

[8] K. R. Beesley, L. Karttunen (2003) Finite State Morphology, CSLI Publications.

[9] P. Jackson, I. Moulinier (2007) Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Categorization, 2nd edition, John Benjamins.

[10] A. Kornai (2008) Mathematical Linguistics, Springer Verlag.

[11] R. B. Kaplan (ed.) (2010) The Oxford Handbook of Applied Linguistics, 2nd edition, Oxford University Press USA, Oxford Handbooks in Linguistics.

Web links

NLTK, Natural Language Toolkit

http://www.nltk.org/

Association for Computational Linguistics (ACL)

http://www.aclweb.org/

ACL Anthology

http://aclweb.org/anthology-new/

Python

http://python.org/

The Python Papers Anthology

http://pythonpapers.org/

Information Society Technologies (IST)

http://cordis.europa.eu/ist/

Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)

http://www.sepln.org

Oficina del Español en la Sociedad de la Información (OESI)

http://www.cervantes.es/default.htm

TALP (UPC)

http://www.talp.upc.edu

UPC NLP Group

http://nlp.cs.upc.edu

OpenNLP

http://incubator.apache.org/opennlp/

Stanford University NLP resources page

http://nlp.stanford.edu/links/statnlp.html

Mallet, a Java toolbox for statistical NLP developed by Andrew McCallum

http://mallet.cs.umass.edu/index.php/Main_Page

WEKA, an integrated machine learning package

http://www.cs.waikato.ac.nz/ml/weka/

LingPipe

http://alias-i.com/lingpipe/

Additional material

 

SEPLN documentation (accessible at http://www.sepln.org/category/monografia/):

 

1. 2001. Luis Alfonso Ureña López. Resolución de la Ambigüedad Léxica en Tareas de Clasificación Automática de Documentos.

2. 2002. Jose Luis Vicedo González. Recuperación de información de alta precisión: los sistemas de búsqueda de respuestas.

3. 2003. Montserrat Civit Torruella. Criterios de etiquetación y desambiguación morfosintáctica de corpus en español.

4. 2004. Anselmo Peñas Padilla. Técnicas lingüísticas aplicadas a la búsqueda textual multilingüe. Ambigüedad, variación terminológica y multilingüismo.

5. 2005. Iulia Nica. El conocimiento lingüístico en la desambiguación semántica automática.

6. 2007. David Martínez Iraola. Supervised Word Sense Disambiguation: Facing Current Challenges.

7. 2008. Enrique Amigó. Síntesis de Información: Desarrollo y evaluación de un modelo interactivo.

8. 2009. Jesús Ángel Giménez Linares. Empirical Machine Translation and its Evaluation.

9. 2010. Miguel Ángel García Cumbreras. BRUJA: Un sistema de Búsqueda de Respuestas Multilingüe.

10. 2011. Isabel Segura Bedmar. Application of Information Extraction techniques to pharmacological domain: Extracting drug-drug interactions.

11. 2012. Fermín L. Cruz Mata. Extracción de Opiniones sobre Características: Un Enfoque Práctico Adaptable al Dominio.

12. 2013. F. Javier Ortega Rodríguez. Detection of Dishonest Behaviours in On-Line Networks Using Graph-based Ranking Techniques.

 

 

Interesting courses on natural language processing:


MIT (Michael Collins)

Toronto (Gerald Penn)

Johns Hopkins, JHU (Jason Eisner)

Massachusetts Amherst, UMass (Andrew McCallum)


 

List of related courses (by Steven Bird):

http://en.wikipedia.org/wiki/User:Stevenbird/List_of_NLP_Courses

Organization

Session | T/P/L | Date | Content | Material | Recommended readings
1 | T | 15/09/16 | Introduction; Applications of NLP (I) | Introduction; Applications 1 | Part III from [4]; Chapters 9 to 14 from [1]; [9]
2 | P/L | 15/09/16 | Introduction | |
3 | T | 22/09/16 | Applications of NLP (II): Interfaces; Statistical Models of Language | Applications 2; Language Models | Part III from [4]; Chapters 9 to 14 from [1]; [9]; Chapter 4 from [2]; [3]
4 | P/L | 22/09/16 | | |
5 | T | 29/09/16 | Lexical Processing; Finite State Models | Lexical (pdf); Finite State Machines | Chapter 10 from [4]; Chapter 21 from [4]; Chapter 2 from [2]; Chapter 18 from [4]
6 | P | 29/09/16 | | |
7 | T | 06/10/16 | Morphology | Introduction morphology; Morphology (I); Morphology (II) | Chapter 2 from [4]; Chapter 3 from [2]; [8]
8 | P/L | 06/10/16 | | |
9 | T | 13/10/16 | Tagging | Introduction POS tagging; Tagging | Chapters 5 and 6 from [2]
10 | P/L | 13/10/16 | | |
11 | T | 20/10/16 | Hidden Markov models; Syntax | Hidden Markov Model; Syntax |
12 | P/L | 20/10/16 | | |
13 | T | 27/10/16 | Syntactic parsing | Parsing 1; Parsing 2; Parsing 3 | Chapter 4 from [1]; Chapters 12-13 from [2]; Chapter 3 from [3]; Chapter 4 from [4]
14 | P/L | 27/10/16 | | |
15 | T | 03/11/16 | Midterm Exam | |
16 | L | 03/11/16 | | |
17 | T | 10/11/16 | Statistical parsing | Parsing 4; Parsing 5 | Chapter 22 from [1]; Chapter 14 from [2]
18 | P/L | 10/11/16 | | |
21 | T | 17/11/16 | Parsing; Problems | Summary Parsing |
22 | P/L | 17/11/16 | | |
23 | T | 24/11/16 | Semantics; Problems | Semantics 1 | Chapter 17 from [2]
24 | P/L | 24/11/16 | | |
25 | T | 01/12/16 | Semantics; Problems | Semantics 2 |
26 | P/L | 01/12/16 | | |
27 | T | 15/12/16 | Discourse and pragmatics; Problems | Pragmatics and discourse | Chapter 21 from [2]
28 | P/L | 15/12/16 | | |
29 | T | 22/12/16 | Generation; Problems | Generation |
30 | P/L | 22/12/16 | | |






List of assignments to be done

Examples of grammars

Mid-term exam (solved). Course 2014-2015

Mid-term exam (solved). Course 2013-2014

Mid-term exam 2012

Solution of a 2011 midterm exam

Final exam 2012

Solution to the FINAL exam 2012

Final exam 2014

Solution to the final exam 2014

Solution to the final exam 2015

Examples of other final exams

Other examples of previous mid-term exams

Barcelona, November 2016