Expected knowledge and competences:
Description of the Course
The lectures cover the fundamentals of predictive text mining. They balance key concepts of language processing, such as language modeling and search, with one or two tools for each concept, for example NLTK (Natural Language Toolkit) and Solr (a widely used enterprise search platform). The course concludes with a presentation of emerging technologies such as the IBM Watson computer.
Course Subjects
Subject number | Subject |
1 | Introduction to text mining |
2 | Overview of text mining tools |
3 | Models of text corpora, models of documents, models of sentences. Hierarchical model of language |
3 | Classification: Naïve Bayes, k-NN, rule based |
4 | Introduction to information retrieval and search. Applications |
5 | Search tools: Solr, Lucene, Indri, Luke. Data preparation, indexing, retrieval |
6 | Entity recognition: recognizing people, places and relations. Maximum entropy/logistic regression scoring |
7 | Exploring text corpora: Clustering and Topic Models. Statistical techniques: Expectation Maximization and LDA |
8 | IBM Watson: Architecture for question answering |
9 | Advanced topics: semantics and discourse |
Course structure
The course consists of the following elements:
Final remarks
This is a self-study course. After studying the course material and completing the final test, the student is entitled to a certificate confirming participation in the course “Text mining for predictive analytics”, prepared by the Warsaw School of Computer Science.
Course performance certificate
Lecturer
Prof. Wlodek W. Zadrozny, Associate Professor, Department of Computer Science, University of North Carolina, Charlotte, USA (formerly Senior Researcher and Manager at the IBM T.J. Watson Research Center); Professor at the Warsaw School of Computer Science.