Why do we call it decoding?
Kevin Knight Information Sciences Institute, University of Southern California
The first natural language processing systems had a straightforward goal: decipher coded messages sent by the enemy. Sixty years later, we have many more applications! These include web search, question answering, summarization, speech recognition, and language translation. This talk explores connections between early decipherment research and today's work. We find that many ideas from the earlier era have become core to the field, while others still remain to be picked up and developed.
Unsupervised feature learning and Deep Learning
Andrew Ng Computer Science Department, Stanford University
Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time laboriously hand-engineering the input feature representation. This is often true for learning in NLP, vision, audio, and many other problems. To address this, there has recently been significant interest within machine learning in unsupervised feature learning algorithms, including "deep learning" algorithms, that can automatically learn rich feature representations from unlabeled data. These algorithms build on such ideas as sparse coding, ICA, and deep belief networks, and have proved very effective for learning good feature representations in many problems. Since these algorithms mostly learn from unlabeled data, they also have the potential to learn from vastly increased amounts of data (since unlabeled data is cheap), and therefore perhaps also to achieve vastly improved performance. In this talk, I'll survey the key ideas in this nascent area of unsupervised feature learning and deep learning. I'll outline a few algorithms and describe a few successful applications of these ideas to problems in NLP, audio/speech, vision, and other areas.
Challenges in running a commercial search engine
Amit Singhal Google, Inc.
These are exciting times for Information Retrieval and NLP. Web search engines have brought IR to the masses. It now affects the lives of hundreds of millions of people, and growing, as Internet search companies launch ever more products based on techniques developed in IR and NLP research. The real world poses unique challenges for search algorithms. They operate at unprecedented scales, and over a wide diversity of information. In addition, we have entered an unprecedented world of "Adversarial Information Retrieval". The lure of billions of dollars of commerce, guided by search engines, motivates all kinds of people to try all kinds of tricks to get their sites to the top of the search results. What techniques do people use to defeat IR algorithms? What are the evaluation challenges for a web search engine? How much impact has IR had on search engines? How does Google serve over 250 million queries a day, often with sub-second response times? This talk will show that the world of algorithm and system design for commercial search engines can be described by two of Murphy's Laws: a) If anything can go wrong, it will, and b) If anything cannot go wrong, it will anyway.