
Text mining for predictive analytics

Course Description

Expected knowledge and competences:

    • Textual data representation and adaptation: crawling, data preparation, tokenization, stemming, XML formats, term-document matrices, etc.
    • Naive Bayes classification and its applications. Other classification methods: k-NN, logistic regression.
    • Information retrieval and search.
    • Examples of applications and their architecture.
    • Familiarity with one NLP tool (NLTK or OpenNLP) and Solr.
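
As an illustration of the first item above, the following is a minimal sketch of tokenization and term-document matrix construction in plain Python. It is not taken from the course material: the function names `tokenize` and `term_document_matrix` and the two-document toy corpus are hypothetical, and the tokenizer is a deliberately simple regular expression rather than the NLTK or OpenNLP tokenizers mentioned in the list.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Sorted union of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Hypothetical toy corpus, for illustration only.
docs = ["Solr indexes text.", "NLTK tokenizes text; Solr searches text."]
vocabulary, matrix = term_document_matrix(docs)

for term, row in zip(vocabulary, matrix):
    print(term, row)
```

Each row of the matrix corresponds to one term and each column to one document. A real pipeline would typically add stemming and stop-word removal before counting.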

Description of the Course
The lectures cover the fundamentals of predictive text mining. They balance key concepts of language processing, such as language modeling and search, with one or two tools for each concept, for example NLTK (Natural Language Toolkit) and Solr (a widely used enterprise search platform). The lectures conclude with a presentation of emerging technologies such as the IBM Watson computer.
Course Subjects

Subject number Subject
1 Introduction to text mining
2 Overview of text mining tools
3 Models of text corpora, models of documents, models of sentences. Hierarchical model of language
3 Classification: Naïve Bayes, k-NN, rule-based
4 Introduction to information retrieval and search. Applications
5 Search tools: Solr, Lucene, Indri, Luke. Data preparation, indexing, retrieval
6 Entity recognition: recognizing people, places and relations. Maximum entropy/logistic regression scoring
7 Exploring text corpora: clustering and topic models. Statistical techniques: Expectation Maximization and LDA
8 IBM Watson: Architecture for question answering
9 Advanced topics: semantics and discourse

Course structure

The course consists of the following elements:

  1. Course syllabus.
  2. Lectures in DVD format.
  3. PowerPoint presentations.
  4. Final test.

Final remarks

This is a self-study course. After studying the course material and completing the final test, the student is entitled to a certificate confirming participation in the course “Text mining for predictive analytics”, prepared by the Warsaw School of Computer Science.

Course performance certificate




Prof. Wlodek W. Zadrozny, Associate Professor, Department of Computer Science, University of North Carolina, Charlotte, USA (formerly Senior Researcher and Manager at the IBM T.J. Watson Research Center); Professor at the Warsaw School of Computer Science.
