Instructor(s): András Benczúr
Weeks: 8-14
Contact hours: 2x2 hours
Credit: 2 credits

Short Description of the Course:
"What data scientists do is make discoveries while swimming in data," as the Harvard Business Review puts it. In the second part of the course, we learn advanced techniques, including kernel methods, recommender systems and network centrality, and are introduced to Big Data tools such as Hadoop. During the course, there will be guest lectures by data scientists from companies in the Budapest area. Students will have the option to define their own data mining projects and work in teams during the semester.

Aim of the Course:
The aim of the course is to discuss advanced data mining techniques, together with relevant knowledge from related disciplines, in support of real-world data mining projects, especially in bioinformatics. By the end of the course, students will be able to analyze biological (genomic, microarray, pathway, protein, chemical) data sets using complex data mining methods.

Prerequisites:
The course requires basic knowledge of data mining (see also the course Data Mining: Models and Algorithms). Background in probability theory, linear algebra and programming is important.

Detailed Program and Class Schedule:

  • Advanced classification methods: bagging, boosting, AdaBoost.
  • More models and algorithms for classification: neural networks, linear separation methods, support vector machines (SVM).
  • Random forest.
  • Recommender systems. Collaborative filtering. Implicit and explicit recommendation.
  • Dimensionality reduction by spectral methods, singular value decomposition, low-rank approximation.
  • Search engines, web information retrieval, PageRank and network mining.
  • Distributed data processing systems, data processing with Hadoop.
  • Text mining, natural language processing.
  • Selected topics connected to student projects (e.g. mining biological, scientific and social media data).
  • Final test.

Method of Instruction:
Handouts, presentations, IPython notebooks, relevant research papers, a course web page, mailing list and wiki. Regular weekly office hours for consultation.

Textbooks:
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Addison-Wesley, 2006.

Jure Leskovec, Anand Rajaraman, Jeff Ullman: Mining of Massive Datasets, Cambridge University Press. Available online at http://www.mmds.org/

Instructor's bio:

András Benczúr (born 1969) is a senior researcher at the Computer Science and Automation Institute of the Hungarian Academy of Sciences (MTA SZTAKI). He is co-founder of the Data Mining and Web Search Group and head of the Informatics Laboratory. He has taught Algorithms and Web Information Retrieval at Eötvös Loránd University and Statistics at Central European University (CEU), Budapest. He received his Ph.D. from MIT in 1997. His primary research areas are information retrieval, data mining and algorithms. He has received the “Young Researcher Award” and the “Béla Gyires Award” of the Hungarian Academy of Sciences. He won a “Yahoo! Faculty Research Grant” in 2006. Benczúr’s group won 1st place at the KDD Cup of the ACM in 1997. He is the author or co-author of more than 30 refereed research papers with over 200 citations. He has served as coordinator and/or principal researcher of several national and international information retrieval and data mining projects.