Online Courses and Tutorials

Onlinecourses.tech provides the latest information on online courses, covering over 45,000 courses and 1 million students.

Learn programming, marketing, data science and more.

Data Analysis and Interpretation

About This Specialization Learn SAS or Python programming, expand your knowledge of analytical methods and applications, and conduct original research to inform complex decisions. The Data Analysis and Interpretation Specialization takes you from data novice to data expert in just four project-based courses. You will apply basic data science tools, including data management and visualization, modeling, and machine learning using your choice of either SAS or Python, including pandas and Scikit-learn. Throughout the Specialization, you will analyze a research question of your choice and summarize your insights. In the Capstone Project, you will use real data to address an important issue in society, and report your findings in a professional-quality report. You will have the opportunity to work with our industry partners, DRIVENDATA and The Connection. Help DRIVENDATA solve some of the world's biggest social challenges by joining one of their competitions, or help The Connection be…

Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud by University of Illinois at Urbana-Champaign



About this course: Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view of the world of Cloud Computing and Big Data! In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up analytics of huge volumes of data that are static or streamed at high velocity and that represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses, information.
We start the first week by introducing some major systems for data analysis, including Spark, and the major frameworks and distributions of analytics applications, including Hortonworks, Cloudera, and MapR. By the middle of week one we introduce HDFS, the robust distributed file system used in many applications like Hadoop, and we finish week one by exploring the powerful MapReduce programming model and how distributed operating systems like YARN and Mesos support a flexible and scalable environment for Big Data analytics.
In week two, the course introduces large scale data storage and the difficulties and problems of consensus in enormous stores that span large numbers of processors, memories, and disks. We discuss eventual consistency, ACID, and BASE, and the consensus algorithms used in data centers, including Paxos and Zookeeper. The course presents distributed key-value stores and in-memory databases like Redis, which are used in data centers for performance. Next we present NoSQL databases. We visit HBase, the scalable, low-latency database that supports database operations in applications that use Hadoop. We then show how Spark SQL can run SQL queries on huge data sets. We finish week two with a presentation on distributed publish/subscribe systems using Kafka, a distributed log messaging system that is finding wide use in connecting Big Data and streaming applications together to form complex systems.
Week three moves to fast data and real-time streaming, and introduces Storm, a technology widely used at companies such as Yahoo. We continue with Spark Streaming, Lambda and Kappa architectures, and a presentation of the Streaming Ecosystem.
Week four focuses on Graph Processing, Machine Learning, and Deep Learning. We introduce the ideas of graph processing and present Pregel, Giraph, and Spark GraphX. Then we move to machine learning with examples from Mahout and Spark; k-means, Naive Bayes, and frequent pattern mining (fpm) are given as examples. Spark ML and MLlib continue the theme of programmability and application construction. The last topic we cover in week four introduces Deep Learning technologies including Theano, TensorFlow, CNTK, MXNet, and Caffe on Spark.
Who is this class for: This course is intended for practitioners. We introduce a wide range of Big Data technologies and frameworks that are very commonly used across the computer industry. We assume you are familiar with some programming language (such as Python or Java) and are now interested in taking your knowledge to the next step by leveraging "frameworks" that do much of the heavy lifting involved in distributed Big Data systems. Most of the code snippets introduced in the lectures can be read as pseudocode.
Created by: University of Illinois at Urbana-Champaign
Taught by: Reza Farivar, Data Engineering Manager at Capital One, Adjunct Research Assistant Professor of Computer Science, Department of Computer Science
Taught by: Roy H. Campbell, Professor of Computer Science, Department of Computer Science

Basic Info
Course 4 of 6 in the Cloud Computing Specialization.
Commitment: There are about 3-4 hours of video lectures per week. Each week's quiz takes about 30 minutes.
Language: English
How To Pass: Pass all graded assignments to complete the course.
Average User Rating: 4.1
Syllabus
WEEK 1
Course Orientation
You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course. 

1 video, 4 readings, 1 practice quiz
  1. Video: Welcome to Cloud Applications, Part 2!
  2. Reading: Syllabus
  3. Reading: About the Discussion Forums
  4. Practice Quiz: Orientation Quiz
  5. Reading: Updating Your Profile
  6. Discussion Prompt: Getting to Know Your Classmates
  7. Reading: Social Media
Module 1: Spark, Hortonworks, HDFS, CAP
In Module 1, we introduce you to the world of Big Data applications. We start by introducing you to Apache Spark, a common framework used for many different tasks throughout the course. We then introduce some Big Data distro packages, the HDFS file system, and finally the idea of batch-based Big Data processing using the MapReduce programming paradigm. 

13 videos, 1 reading
  1. Reading: Module 1 Overview
  2. Video: 1.1.1 Motivation for Spark
  3. Video: 1.1.2 Apache Spark
  4. Video: 1.1.3 Spark Example: Log Mining
  5. Video: 1.1.4 Spark Example: Logistic Regression
  6. Video: 1.1.5 RDD Fault Tolerance
  7. Video: 1.1.6 Interactive Spark
  8. Video: 1.1.7 Spark Implementation
  9. Video: 1.2.1 Introduction to Distros
  10. Video: 1.2.2 Hortonworks
  11. Video: 1.2.3 Cloudera CDH
  12. Video: 1.2.4 MapR Distro
  13. Video: 1.3.1 HDFS Introduction
  14. Video: 1.3.2 YARN and MESOS
Graded: Module 1 Quiz
WEEK 2
Module 2: Large Scale Data Storage
In this module, you will learn about large scale data storage technologies and frameworks. We start by exploring the challenges of storing large data in distributed systems. We then discuss in-memory key/value storage systems, NoSQL distributed databases, and distributed publish/subscribe queues.

22 videos, 1 reading
  1. Reading: Module 2 Overview
  2. Video: Module 2 Introduction
  3. Video: 2.1.1 Introduction to MapReduce with Spark
  4. Video: 2.1.2 MapReduce: Motivation
  5. Video: 2.1.3 MapReduce Programming Model with Spark
  6. Video: 2.1.4 MapReduce Example: Word Count
  7. Video: 2.1.5 MapReduce Example: Pi Estimation & Image Smoothing
  8. Video: 2.1.6 MapReduce Summary
  9. Video: 2.2.1 Eventual Consistency – Part 1
  10. Video: 2.2.2 Eventual Consistency – Part 2
  11. Video: 2.2.3 Consistency Trade-Offs
  12. Video: 2.2.4 ACID and BASE
  13. Video: 2.2.5 Zookeeper and Paxos: Introduction
  14. Video: 2.2.6 Paxos
  15. Video: 2.2.7 Zookeeper
  16. Video: 2.3.1 Cassandra Introduction
  17. Video: 2.3.2 Redis
  18. Video: 2.3.3 Redis Demonstration
  19. Video: 2.4.1 HBase Usage API
  20. Video: 2.4.2 HBase Internals - Part 1
  21. Video: 2.4.3 HBase Internals - Part 2
  22. Video: 2.4.4 Spark SQL
  23. Video: 2.5.1 Kafka
Graded: Module 2 Quiz
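
The MapReduce lectures in this module include a word count example. As a rough illustration of the programming model only, here is a minimal plain-Python sketch that does not use Spark or Hadoop and is not taken from the course materials. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the shuffle step that a real framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Two tiny "documents" stand in for a large distributed data set.
counts = word_count(["big data in the cloud", "Big Data and streaming data"])
print(counts)
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, both phases could in principle run in parallel on different machines, which is the property the MapReduce model relies on.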
WEEK 3
Module 3: Streaming Systems
This module introduces you to real-time streaming systems, also known as Fast Data. We talk about Apache Storm at length, then cover Apache Spark Streaming and the Lambda and Kappa architectures. Finally, we contrast all these technologies as a streaming ecosystem.

18 videos, 1 reading
  1. Reading: Module 3 Overview
  2. Video: Module 3 Introduction
  3. Video: 3.1.1 Streaming Introduction
  4. Video: 3.1.2 "Big Data Pipelines: The Rise of Real-Time"
  5. Video: 3.1.3 Storm Introduction: Protocol Buffers & Thrift
  6. Video: 3.1.4 A Storm Word Count Example
  7. Video: 3.1.5 Writing the Storm Word Count Example
  8. Video: 3.1.6 Storm Usage at Yahoo
  9. Video: 3.2.1 Anchoring and Spout Replay
  10. Video: 3.2.2 Trident: Exactly Once Processing
  11. Video: 3.3.1 Inside Apache Storm
  12. Video: 3.3.2 The Structure of a Storm Cluster
  13. Video: 3.3.3 Using Thrift in Storm
  14. Video: 3.3.4 How Storm Schedulers Work
  15. Video: 3.3.5 Scaling Storm to 4000 Nodes
  16. Video: 3.3.6 Q&A with Bobby Evans (Yahoo) on Storm
  17. Video: 3.4.1 Spark Streaming
  18. Video: 3.4.2 Lambda and Kappa Architecture
  19. Video: 3.4.3 Streaming Ecosystem
Graded: Module 3 Quiz
WEEK 4
Module 4: Graph Processing and Machine Learning
In this module, we discuss the applications of Big Data. In particular, we focus on two topics: graph processing, where massive graphs (such as the web graph) are processed for information, and machine learning, where massive amounts of data are used to train models such as clustering algorithms and frequent pattern mining. We also introduce you to deep learning, where large data sets are used to train neural networks with effective results.

18 videos, 1 reading
  1. Reading: Module 4 Overview
  2. Video: 4.1.1 Graph Processing
  3. Video: 4.1.2 Pregel - Part 1
  4. Video: 4.1.3 Pregel - Part 2
  5. Video: 4.1.4 Pregel - Part 3
  6. Video: 4.1.5 Giraph Introduction
  7. Video: 4.1.6 Giraph Example
  8. Video: 4.1.7 Spark GraphX
  9. Video: 4.2.1 Big Data Machine Learning Introduction
  10. Video: 4.2.2 Mahout: Introduction
  11. Video: 4.2.3 Mahout kmeans
  12. Video: 4.2.4 Mahout: Naïve Bayes
  13. Video: 4.2.5 Mahout: fpm
  14. Video: 4.2.6 Spark Naïve Bayes
  15. Video: 4.2.7 Spark fpm
  16. Video: 4.2.8 Spark ML/MLlib
  17. Video: 4.2.9 Introduction to Deep Learning
  18. Video: 4.2.10 Deep Neural Network Systems
  19. Video: 4.3.1 Closing Remarks
  20. Discussion Prompt: Final Reflections
Graded: Module 4 Quiz
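
The graph processing lectures in this module cover Pregel. As a hedged illustration of the vertex-centric, superstep-based style associated with Pregel, the sketch below propagates the maximum vertex value through a small connected graph. It is a single-process toy written for this page, not code from the course, Giraph, or GraphX; the function name pregel_max and the sample graph are assumptions made for the example.

```python
def pregel_max(graph, values):
    """Propagate the largest vertex value to every vertex of a connected graph.

    graph:  dict mapping each vertex to a list of its neighbors
    values: dict mapping each vertex to its initial numeric value
    """
    values = dict(values)  # work on a copy so the caller's data is untouched

    # Superstep 0: every vertex sends its own value to all of its neighbors.
    inbox = {v: [] for v in graph}
    for vertex, neighbors in graph.items():
        for n in neighbors:
            inbox[n].append(values[vertex])

    # Later supersteps: a vertex that learns of a larger value adopts it and
    # forwards it; otherwise it sends nothing (it "votes to halt").
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for vertex, messages in inbox.items():
            if messages and max(messages) > values[vertex]:
                values[vertex] = max(messages)
                for n in graph[vertex]:
                    outbox[n].append(values[vertex])
        inbox = outbox  # messages are delivered at the next superstep

    return values

# A small undirected path graph a - b - c - d, stored as adjacency lists.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = pregel_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1})
print(result)
```

The computation stops once no messages are in flight, mirroring the idea that a vertex-centric job ends when every vertex has voted to halt.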
How It Works
Coursework
Each course is like an interactive textbook, featuring pre-recorded videos, quizzes and projects.
Help from Your Peers
Connect with thousands of other learners and debate ideas, discuss course material, and get help mastering concepts.
Certificates
Earn official recognition for your work, and share your success with friends, colleagues, and employers.
Creators
University of Illinois at Urbana-Champaign
The University of Illinois at Urbana-Champaign is a world leader in research, teaching and public engagement, distinguished by the breadth of its programs, broad academic excellence, and internationally renowned faculty and alumni. Illinois serves the world by creating knowledge, preparing students for lives of impact, and finding solutions to critical societal needs.

