Expected knowledge and competences:
- Textual data representation and adaptation: crawling, data preparation, tokenization, stemming, XML formats, term-document matrices, etc.
- Naive Bayes Classification and its applications. Other classification methods: k-NN, logistic regression.
- Information retrieval and search.
- Examples of applications and their architecture.
- Familiarity with one NLP tool (NLTK or OpenNLP) and with Solr.
Description of the Course
The lectures cover the fundamentals of predictive text mining. They balance key concepts of language processing, such as language modeling and search, with one or two tools for each concept, for example NLTK (Natural Language Toolkit) and Solr (a widely used enterprise search platform). The lectures conclude with a presentation of emerging technologies such as the IBM Watson computer.
- Introduction to text mining
- Overview of text mining tools
- Models of text corpora, models of documents, models of sentences. Hierarchical model of language
- Classification: Naïve Bayes, k-NN, rule based
- Introduction to information retrieval and search
- Search tools: Solr, Lucene, Indri, Luke. Data preparation, indexing, retrieval
- Entity recognition: recognizing people, places and relations. Maximum entropy/logistic regression scoring
- Exploring text corpora: Clustering and Topic Models. Statistical techniques: Expectation Maximization and LDA
- IBM Watson: Architecture for question answering
- Advanced topics: semantics and discourse
The course consists of the following elements:
- Course syllabus.
- Lectures in DVD format.
- PowerPoint presentations.
- Final test.
This is a self-study course. After studying the course material and completing the final test, the student is entitled to a certificate confirming participation in the course “Text mining for predictive analytics”, prepared by the Warsaw School of Computer Science.
Course performance certificate
Prof. Wlodek W. Zadrozny, Associate Professor, Department of Computer Science, University of North Carolina, Charlotte, USA (formerly Senior Researcher and Manager at the IBM T.J. Watson Research Center); Professor at the Warsaw School of Computer Science.