Master in Data Science (MDS)

Mining Unstructured Data - Spring 2025

Instructors


Sessions

Date Content Lecturer
Week 1 February 10th Introduction to MUD
Jordi
February 11th Lab Session 1
Introduction to laboratory
Task data & other stuff
Carlos
Week 2 February 17th Document structure and language
Jordi
February 18th Lab Session 2
Guides and code for task 1 (Document structure and language)
Carlos
Week 3 February 24th Words: PoS Tagging
Jordi
February 25th Lab Session 3
Carlos
Week 4 March 3rd Words: Lexical Semantics Jordi
March 4th Lab Session 4
Exercise on WordNet similarities
Link to the Colab code
Carlos
Week 5 March 10th Distributional Embeddings
+ Exercises about Distributional Embeddings
Salvador
March 11th Lab Session 5
Link to the Colab code
Carlos
Week 6 March 17th Word Sequence: Named Entities and Noun Phrases
+ Exercises about features for word sequence recognition
Jordi
March 18th Lab Session 6
Guidelines for task 2 (ML-based NERC)
Code for task 2
Carlos
Week 7 March 24th Word sequence: Distributional embeddings for word classification
+ Exercises about Word Classification
Salvador
March 25th Lab Session 7
Guidelines and code for task 3 (NN-based NERC)
+ Colab link
Carlos
Week 8 March 31st Sentence: Constituent Parsing
+ Exercises about constituent parsing
Jordi
April 1st Lab Session 8
Carlos
April 7th and 8th No class (FIB midterm exams)
April 14th and 15th No class (Bank Holidays)
Week 9 April 21st No class (Bank Holidays)
April 22nd Lab Session 9
Carlos
Week 10 April 28th MOVED TO NEXT SESSION Jordi
April 29th Lab Session 10
Guidelines and code for task 4 (ML-based DDI)
Code for task 4
Carlos
Week 11 May 5th Sentence: Dependency Parsing
+ Exercises about dependency parsing
Jordi
May 6th Lab Session 11
Carlos
Week 12 May 12th Contextual embeddings: Recurrent NN Language Models
+ Exercises about RNNs
Salvador
May 13th Lab Session 12
Guidelines and code for task 5 (NN-based DDI)
+ Colab link
Carlos
Week 13 May 20th Contextual embeddings: Transformers
+ Exercises about transformers
Salvador
May 21st Lab Session 13 Carlos
Week 14 May 26th Large language models Salvador
May 27th Lab Session 14
Carlos

Important dates

March 3rd: Delivery of Document structure and language report (lab task 1)
April 28th: Delivery of NERC report (lab tasks 2 and 3)
May 27th: Delivery of DDI report (lab tasks 4 and 5)
June 2nd: Final exam

Solved Exercises

Exercises about Features for Word Sequence Recognition
Exercises about Word Embeddings
Exercises about Word Classification
Exercises about Constituent Parsing
Exercises about Dependency Parsing
Exercises about RNNs
Exercises about Transformers

Resources

Complementary Readings

Software