SIKS/BigGrid Advanced Course on Big Data


The School for Information and Knowledge Systems (SIKS) and SARA, the national High Performance Computing and e-Science Support Center, organize a new two-day tutorial on "Big Data" at the University of Twente. The tutorial covers exciting new developments in cloud computing and data centers, initiated by Google and followed by many others such as Yahoo, Amazon, Microsoft, and Facebook. The course addresses processing terabytes of data on large clusters and covers several core computer science topics adapted for large data centers, such as new file systems (Google File System and Hadoop FS), new programming paradigms (MapReduce), new programming languages and query languages (Sawzall, Pig Latin), and new "noSQL" databases (BigTable, Cassandra, and Dynamo).

Dr. Jimmy Lin, who holds a PhD from MIT, is an associate professor in the iSchool at the University of Maryland. He also has appointments in the Institute for Advanced Computer Studies (UMIACS) and the Department of Computer Science at Maryland. Lin works at the intersection of natural language processing (NLP) and information retrieval (IR), with a recent emphasis on scalable algorithm design and large-data issues. He directs the recently formed Cloud Computing Center, an interdisciplinary group that explores the many aspects of cloud computing as it impacts technology, people, and society. He is also a member of both the Computational Linguistics and Information Processing Lab (CLIP) and the Human-Computer Interaction Lab (HCIL). Lin worked at Cloudera, which aims to bring Hadoop MapReduce to the enterprise, and is currently spending a sabbatical at Twitter.

A major part of the tutorial consists of hands-on experience. Students will solve real large-scale data analysis problems of their choice on a cluster of machines. Students will get access to the SARA Hadoop test cluster, which provides 20 cores for MapReduce and 100 TB of disk space for HDFS. Students are encouraged to bring their own data and to present their results at the end of the second day. The organization will provide several public datasets, such as Wikipedia, the ENRON dataset, White House visitor records, genome data, the ClueWeb09 web crawl, and more.

The tutorial will be given in English and is part of the educational program for SIKS-Ph.D. students. Although the course is primarily intended for SIKS-Ph.D. students, other participants are not excluded. However, the course can host only a limited number of participants, so the number of places for others depends on how many SIKS-Ph.D. students take the course. SIKS students who register before November 1st, 2011 will be given priority.

DATE: November 30 & December 1, 2011

Djoerd Hiemstra (UT)
Evert Lammerts (SARA)
Arjen de Vries (CWI/TUD)

PROGRAM: to be announced shortly

The number of places in this course is limited, and other groups are interested in the topic as well. Early registration is therefore required.

Deadline for registration for SIKS-Ph.D. students: November 1, 2011

After that date, applications to participate will be honoured on a first-come, first-served basis. Applications from other interested groups are of course already welcome. These applicants will be notified as soon as possible whether they can participate.

To register, you are kindly requested to fill in the registration form.

Arrangement 1 includes a single room, all meals, and course material. Arrangement 2 includes two lunches, one dinner, and course material, but no hotel stay and no breakfast.

Arrangement 1 only applies to fully registered SIKS-PhD-students and SIKS-research fellows. Other participants should make their own sleeping arrangements.