Master in Data Science (MDS)

Mining Unstructured Data - Spring 2024

Instructors


Sessions

Date Content Lecturer
Week 1 February 12th Introduction to MUD
Jordi
February 13th Lab Session 1
Introduction to laboratory
Task data & other stuff
Carlos / Bardia
Week 2 February 19th Document structure and language
Jordi
February 20th Lab Session 2
Guides and code for task 1 (Document structure and language)
Carlos / Bardia
Week 3 February 26th Words: PoS Tagging
Jordi
February 27th Lab Session 3
Carlos / Bardia
Week 4 March 4th Words: Lexical Semantics
Jordi
March 5th Lab Session 4
Exercise of WordNet similarities
Link to the Colab code
Carlos / Bardia
Week 5 March 11th Word Sequence: Named Entities and Noun Phrases
Exercises about features for word sequence recognition
Jordi
March 12th Lab Session 5
Guidelines for task 2 (ML-based NERC)
Code for task 2
Carlos / Bardia
Week 6 March 18th Sentence: Constituent Parsing
Exercises about constituent parsing
Jordi
March 19th Lab Session 6
Carlos / Bardia
March 25th and 26th No class (Bank Holidays)
Week 7 April 1st No class (Bank Holidays)
April 2nd Lab Session 7
Carlos / Bardia
April 8th and 9th No class - FIB midterm exam week
(No MUD midterm exam, though)
Week 8 April 15th Sentence: Dependency Parsing
Exercises about dependency parsing
Jordi
April 16th Lab Session 8
Guidelines and code for task 3 (ML-based DDI)
Code for task 3
Carlos / Bardia
Week 9 April 22nd Discussion/Review/Exercises
Jordi
April 23rd Lab Session 9
Carlos / Bardia
Week 10 April 29th Word Embeddings
Exercises about Word Embeddings
Salvador
April 30th Lab Session 10
Guidelines and code for task 4 (NN-based NER)
Carlos / Bardia
Week 11 May 6th Word Classification
Exercises about Word Classification
Salvador
May 7th Lab Session 11
Carlos / Bardia
Week 12 May 13th Recurrent NN Language Models
Exercises about RNNs
Salvador
May 14th Lab Session 12
Guidelines and code for task 5 (NN-based DDI)
Carlos / Bardia
Week 13 May 20th No class (Local Holiday). Moved to May 21st
May 21st Transformers
Exercises about transformers
Salvador
Week 14 May 27th Large Language Models
Salvador
May 28th Lab Session 13
Carlos / Bardia

Important dates

March 4th: Delivery of Doc. structure and language report (lab task 1)
April 29th: Delivery of ML-based report (lab tasks 2 and 3)
June 11th: Delivery of NN-based report (lab tasks 4 and 5)
June 17th: Final exam

Solved Exercises

Exercises about Features for Word Sequence Recognition
Exercises about Constituent Parsing
Exercises about Dependency Parsing
Exercises about Word Embeddings
Exercises about Word Classification
Exercises about RNNs

Resources

Complementary Readings

Software