Have you heard about Big Data and technologies such as HDFS, MapReduce and Spark? Always wanted to learn these tools but never found concise starting material? Then don't miss this specialization!

About the Specialization

Five concise courses take you from the basics of MapReduce and offline processing to advances in real-time processing, with applications in large-scale machine learning. If you are looking for ways to expand your knowledge of data engineering and analysis at scale, this specialization is right up your street.
Real Dev Environment
In contrast to existing online courses, we give learners access to a real computational cluster on which to build their skills and knowledge.
All-Round Team
Big Data evolves so fast that nobody can keep pace with it alone, and nobody can know everything about it. That is why we have put together a large, well-rounded team. Together we can explain everything.
Good Support
Our mentors are always ready to answer your questions. You can write to us even on weekends and at night.

What should I know?

You need programming experience in Python, and you should complete a basic course on algorithms before starting this specialization. An introductory course on machine learning is not required, but it will make the material of the third course (Machine Learning over Big Data) easier to grasp.

Big Data Essentials: HDFS, MapReduce and Spark RDD
Have you heard so much about HDFS, MapReduce and Spark and always wanted to become familiar with these new tools? Seize the opportunity to get concise starting material. Why take this course?

  • You will learn the basic technologies of the modern Big Data landscape, namely HDFS, MapReduce and Spark
  • Attentive guidance from experienced tutors will help you feel at ease working with the internals of these systems and their applications
  • You will get to know distributed file systems, why they exist and what function they serve
  • This course gives you the opportunity to harness the workhorse of many modern Big Data applications, the MapReduce framework, and apply it to text processing and to solving sample business cases
  • This course is a wonderful opportunity to learn about the next-generation computational framework, Spark. You will also build a strong understanding of Spark's basic concepts
  • All of the above skills and tools will be of great help when you create solutions in finance, social networks, telecommunications and many other fields
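To give a flavour of the MapReduce model applied to text processing, here is a minimal sketch in plain Python. It is an illustration only, not course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and everything runs in a single process, whereas a real framework distributes these phases across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

The map and reduce steps only ever see one line or one key at a time, which is what lets a framework run them in parallel on many machines.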
The assignments, run and checked on a real cluster, will bring your studies closer to real life. A friendly, considerate atmosphere will make your learning smooth and enjoyable. Get ready to work with real datasets alongside real masters!
Go to the course!

Practical Big Data Analysis
Working with huge data volumes is hard, but to move a mountain, you have to deal with a lot of small stones. So, why strain yourself? Leave this job to MapReduce and Spark, and don't forget about the high-level tools we offer you. Tired of struggling to make your big data workflow productive and efficient? Don't rack your brains! See what this course offers you:
  • Writing and executing Hive & Spark SQL queries
  • Reasoning about how the queries are translated into actual execution primitives (be it MapReduce jobs or Spark transformations)
  • Organizing your data in Hive to optimize disk space usage and execution times
  • Constructing Spark DataFrames and using them to write ad-hoc analytical jobs easily
  • Processing large graphs with Spark GraphFrames
  • Debugging, profiling and optimizing Spark application performance
Still in doubt? Check this out. Become a data ninja by taking this course!
Go to the course!
Machine Learning over Big Data
A rapidly changing environment sets new rules and requires new knowledge. To stay at the forefront, you have to upgrade your skills regularly. Machine learning is an essential part of the knowledge that helps you master the cornerstone of the contemporary world: data. Don't know where to start? The answer is one button away. Click it and you will learn how to:
  • Identify practical problems which can be solved with machine learning
  • Build, tune and apply linear models with Spark MLlib
  • Understand methods of text processing
  • Fit decision trees and boost them with ensemble learning
  • Construct your own recommender system
With these skills, you will be able to tackle many practical machine learning tasks. We provide the tools; you choose where to apply them to make this world of machines more intelligent.
Go to the course!
Real-Time Big Data Processing
The question of freshness is the thorniest in the data world. Miss the beat when forecasting traffic jams and you get nothing but angry customers. The price of the same delay when predicting a tsunami is even higher: people's lives.
  • The first half of the course is devoted to streaming frameworks that allow building continuous data applications that process massive amounts of data within seconds. You will learn techniques, technologies and algorithms suitable for building near real-time applications.
  • The second half of the course will show you how to integrate these technologies into your applications. You will explore common strategies for improving data freshness in web applications (by building your own sample application on Flask). As a bonus, you will learn about NoSQL databases: how to work with them and when to use them instead of a traditional RDBMS.
Are you ready for a journey full of knowledge and surprises?
Jump in!
Capstone Project / Culminating Project
Are you ready to close the loop on your Big Data skills and put everything you learned in the previous courses into practice? The Capstone project is your answer.
You will be given a task to combine data from different sources of different types (static distributed dataset, streaming data, SQL or NoSQL storage). Combined, this data will be used to build a predictive model for a financial market (as an example). First, you design a system from scratch and share it with your peers to get valuable feedback. Second, you can make it public, so get ready to receive feedback from the users of your service. Real-world experience without any 3D glasses or mock interviews.
Go to the course!

Meet our team
Alexey Dral
Head of BigData specialization
Head of BigData and ML / Senior Lecturer at MIPT
(ex. Amazon AWS, Yandex, Rambler, Sberbank)
Emeli Dral
Senior Lecturer at MIPT, head of Big Data Department at Yandex Data Factory
(ex. Rambler)
Pavel Mezentsev
Senior Lecturer at MIPT, Senior Data Scientist at PulsePoint
(ex. Rambler)
Natasha Pritykovskaya
Software Developer at Odnoklassniki (Mail.ru)
(ex. Yandex)
Pavel Klemenov
Head of Machine Learning department at NVIDIA, Founder of Moscow Spark
Ilya Trofimov
Assistant Professor at MIPT, head of research team at Yandex
Ivan Puzyrevskiy
Team lead of software development team at Yandex
Evgeniy Ryabenko
Guest Speaker
Senior Data Scientist, Veon
Evgeny Frolov
Guest Speaker
PhD student, Skoltech
Vladimir Lesnichenko
Guest Speaker
R&D team lead, Iponweb
Anton Gorokhov
Course Designer
PhD, Associate Professor at MIPT, Senior SDE at Yandex
(ex. Rambler)
Oleg Suhoroslov
Course Designer
PhD, Senior Researcher at Russian Academy of Sciences (RAS)
(ex. expert at Yandex ~ Senior Principal)
Pavel Akhtyamov
Course Designer
Teacher assistant at ATP dept., MIPT
Developer-Analyst at VicMan
Oleg Ivchenko
Course Designer
Teacher assistant at ATP dept., MIPT
PhD student
Vladimir Kuznetsov
Course Designer
Assistant at P.G. Demidov Yaroslavl State University,
Senior SDE at Confirmit
Asya Roitberg
Producer assistant
Prof. Mikhail Roytberg
Marina Sudarikova
Producer assistant
Evgene Baulin
Assistant at ATP dept., MIPT
Artym Vybornov
Big Data Instructor
Lead Software Engineer, Rambler&Co
Co-author "Big Data for Data Engineers" (Coursera)

Why Big Data
Issues BigData can solve

Reviews from Coursera
There was no other such course online as far as I know. You can find many courses on machine learning or other topics of data analysis, but on the engineering side you were alone so far. With this course you will learn the basics of Hadoop, HDFS, MapReduce and Spark. You will work on many practical assignments that will let you understand those technologies. If you are a software developer willing to move toward data engineering, or a data analyst who wants to learn some engineering topics, this course is for you!
Excellent class to learn the basics of MapReduce, Hadoop, and PySpark. The lectures are very informative. They move at a strong pace, making this class more like a graduate level class in lecture style.

The programming problems are well designed to learn these languages.

The only downside is that code submission can be a bit of an adventure. It's not always clear exactly what the auto-grader is looking for. Aside from this issue, I would recommend this class for people interested in the material.

This course really takes you deep into Hadoop with the technical stuff.
Lectures are very good and I learned a lot.