Big Data Science with the BD2K-LINCS Data Coordination and Integration Center

About this course: The Library of Integrated Network-based Cellular Signatures (LINCS) is an NIH Common Fund program. The idea is to perturb different types of human cells with many kinds of perturbations, such as drugs and other small molecules; genetic manipulations such as knockdown or overexpression of single genes; and manipulation of the extracellular microenvironment, for example, growing cells on different surfaces. These perturbations are applied to various types of human cells, including induced pluripotent stem cells from patients, differentiated into various lineages such as neurons or cardiomyocytes. Then, to better understand the molecular networks affected by these perturbations, changes in the levels of many different variables are measured, including mRNAs, proteins, and metabolites, as well as cellular phenotypic changes such as changes in cell morphology. The BD2K-LINCS Data Coordination and Integration Center (DCIC) is commissioned to organize, analyze, visualize and integrate this data with other publicly available relevant resources.

In this course we briefly introduce the DCIC and the various Centers that collect data for LINCS. We then cover metadata and how metadata is linked to ontologies. We then present data processing and normalization methods used to clean and harmonize LINCS data. This is followed by a discussion of how data is served through RESTful APIs. Most importantly, the course covers computational methods including data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects where students can work together in teams to extract expression signatures from public databases and then query such collections of signatures against LINCS data to predict small molecules as potential therapeutics.

Who is this class for: Learners who want to study various methods of analysis, including unsupervised clustering, gene-set enrichment analysis, interactive data visualization, and supervised machine learning, with application to data from the Library of Integrated Network-based Cellular Signatures (LINCS) program and to other relevant Big Data from high-content molecular omics and phenotype profiling of mammalian cells.

Created by: Icahn School of Medicine at Mount Sinai
Taught by: Avi Ma’ayan, PhD, Director, Mount Sinai Center for Bioinformatics; Professor, Department of Pharmacological Sciences
Commitment: 4-5 hours/week
How To Pass: Pass all graded assignments to complete the course.
Average User Rating: 5.0
The Library of Integrated Network-based Cellular Signatures (LINCS) Program Overview
This module provides an overview of the concept behind the LINCS program and tutorials on how to get started with the LINCS L1000 dataset.

8 videos, 2 readings
  1. Reading: Syllabus
  2. Reading: Grading and Logistics
  3. Video: Layers of Cellular Regulation and Omics Technologies
  4. Video: The Connectivity Map
  5. Video: Geometrical View of the Connectivity Map Concept
  6. Video: LINCS Data and Signature Generation Centers
  7. Video: BD2K-LINCS Data Coordination and Integration Center
  8. Video: Induced Pluripotent Stem Cells (iPSCs)
  9. Video: Introduction to LINCS L1000 Data
  10. Discussion Prompt: LINCS L1000 Data - Practice Exercise
  11. Video: L1000 Characteristic Direction Signature Search Engine (L1000CDS2) Demo
Metadata and Ontologies
This module gives a broad, high-level description of the concepts behind metadata and ontologies and how these are applied to LINCS datasets.

2 videos
  1. Video: Introduction to Metadata and Ontologies | Part 1
  2. Video: Introduction to Metadata and Ontologies | Part 2
Serving Data with APIs
In this module we explain the concept of accessing data through an application programming interface (API). 

2 videos
  1. Video: Accessing and Serving Data through RESTful APIs | Part 1
  2. Video: Accessing and Serving Data through RESTful APIs | Part 2
  3. Discussion Prompt: Accessing Data through the Harmonizome's RESTful API - Practice Exercise
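To make the idea of a RESTful API more concrete, here is a minimal sketch of the client side: a resource is addressed by a URL, requested with HTTP GET, and returned as JSON. The base URL, the `gene` resource name, and the sample response are hypothetical placeholders chosen for illustration; they are not the Harmonizome's actual endpoints, which should be taken from its own API documentation.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://example.org/api/v1"


def build_resource_url(base_url, resource, identifier, **params):
    """Compose a REST-style URL of the form <base>/<resource>/<id>?<query>."""
    url = f"{base_url.rstrip('/')}/{quote(resource)}/{quote(identifier, safe='')}"
    if params:
        # Sort the parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


def parse_json_body(body):
    """Decode a UTF-8 JSON response body into Python objects."""
    return json.loads(body.decode("utf-8"))


def get_json(url, timeout=10):
    """Issue an HTTP GET request and return the decoded JSON payload."""
    with urlopen(url, timeout=timeout) as response:
        return parse_json_body(response.read())


# Build the URL for a single (hypothetical) gene resource.
url = build_resource_url(BASE_URL, "gene", "STAT3", format="json")

# A made-up response body, standing in for what a server might return.
sample_body = b'{"symbol": "STAT3", "name": "example record"}'
record = parse_json_body(sample_body)
print(url)
print(record["symbol"])
```

`get_json` is defined but not called here, because the placeholder URL does not point to a real service; with a real endpoint it would be called as `get_json(url)`.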
Bioinformatics Pipelines
This module describes the important concept of a Bioinformatics pipeline. 

1 video
  1. Video: Analyzing Big Data with Computational Pipelines
  2. Discussion Prompt: Bioinformatics Pipeline - Practice Exercise
The Harmonizome
This module describes a project that integrates many resources that contain knowledge about genes and proteins. The project is called the Harmonizome, and it is implemented as a web-server application available at:  

4 videos
  1. Video: The Harmonizome Concept
  2. Video: Processing Datasets | Part 1
  3. Video: Processing Datasets | Part 2
  4. Video: Processing Datasets | Part 3
  5. Discussion Prompt: Harmonizome - Practice Exercise
Data Normalization
This module describes the mathematical concepts behind data normalization. 

2 videos
  1. Video: Data Normalization | Part 1
  2. Video: Data Normalization | Part 2
  3. Discussion Prompt: Data Normalization - Practice Exercise
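As an illustration of the kind of mathematics involved, the sketch below applies two widely used normalization steps, a log2 transform followed by z-score standardization. These are common choices in expression analysis and are not necessarily the exact methods presented in the videos; the input values are made up.

```python
import math
from statistics import mean, pstdev


def log2_transform(values, pseudocount=1.0):
    """Compress the dynamic range; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def zscore(values):
    """Center on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant vector carries no variation to standardize.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up raw measurements for one gene across five samples.
raw_counts = [0, 1, 3, 7, 15]
logged = log2_transform(raw_counts)
standardized = zscore(logged)
print(logged)
print(standardized)
```

After standardization the values have mean 0 and standard deviation 1, which makes measurements with different scales comparable.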
Data Clustering
This module describes the mathematical concepts behind data clustering, also known as unsupervised learning: the identification of patterns within data without considering the labels associated with the data.

3 videos
  1. Video: Data Clustering | Part 1 | Introduction
  2. Video: Data Clustering | Part 2 | Distance Functions
  3. Video: Data Clustering | Part 3 | Algorithms and Evaluation
  4. Discussion Prompt: Data Clustering - Practice Exercise
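The combination of a distance function and a clustering algorithm can be sketched with a small k-means implementation using Euclidean distance. k-means is only one of several possible algorithms and is chosen here for brevity; the points are made up, and the initial centroids are simply the first k points so that the result is deterministic.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Made-up two-dimensional data with two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Note that no labels are supplied to the algorithm; the grouping emerges from the distances alone.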
Midterm Exam
The Midterm Exam consists of 45 multiple-choice questions that cover modules 1-7. Some of the questions may require you to analyze new datasets with the methods you learned throughout the course.
Graded: Midterm Exam
Enrichment Analysis
This module introduces the important concept of performing gene set enrichment analyses. Enrichment analysis is the process of querying gene sets from genomics and proteomics studies against annotated gene sets collected from prior biological knowledge. 

3 videos
  1. Video: Enrichment Analysis | Part 1
  2. Video: Enrichment Analysis | Part 2
  3. Video: Enrichr Demo
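One common way to score the overlap between a query gene set and an annotated gene set is a one-sided hypergeometric test. The sketch below is an illustration under that assumption and is not presented as the exact statistic used in the course or by Enrichr; the gene names and the background size are made up.

```python
from math import comb


def hypergeom_enrichment_p(overlap, query_size, annotated_size, background_size):
    """Probability of seeing at least `overlap` annotated genes when
    `query_size` genes are drawn without replacement from a background of
    `background_size` genes, of which `annotated_size` are annotated."""
    max_overlap = min(query_size, annotated_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(annotated_size, k)
        * comb(background_size - annotated_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Made-up gene sets with placeholder gene names.
query_set = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
annotated_set = {"GENE_A", "GENE_B", "GENE_C", "GENE_E", "GENE_F"}
background_size = 20  # hypothetical number of genes in the background

overlap = len(query_set & annotated_set)
p_value = hypergeom_enrichment_p(
    overlap, len(query_set), len(annotated_set), background_size
)
print(overlap, p_value)
```

A small p-value suggests the query set shares more genes with the annotated set than expected by chance. In practice the test is repeated across many annotated sets, so a multiple-testing correction is usually applied as well.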
Machine Learning
This module describes the mathematical concepts of supervised machine learning: the process of making predictions from examples that associate observations/features/attributes with one or more properties that we wish to learn/predict.

3 videos
  1. Video: Introduction to Machine Learning | Part 1
  2. Video: Introduction to Machine Learning | Part 2
  3. Video: Introduction to Machine Learning | Part 3
  4. Discussion Prompt: Machine Learning - Practice Exercise
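The idea of learning from labeled examples can be sketched with a k-nearest-neighbors classifier, one of the simplest supervised methods. It is used here only as an illustration and is not necessarily the algorithm taught in the videos; the features and the labels "active" and "inactive" are made up.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among the k closest
    training examples, measured by Euclidean distance."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up training examples: two features per observation, one label each.
train_features = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
    (4.0, 4.2), (4.1, 3.9), (3.9, 4.0),
]
train_labels = ["inactive", "inactive", "inactive", "active", "active", "active"]

prediction_low = knn_predict(train_features, train_labels, (1.1, 1.0))
prediction_high = knn_predict(train_features, train_labels, (4.0, 4.0))
print(prediction_low, prediction_high)
```

Unlike clustering, the labels are part of the input here: the classifier predicts the property of a new observation from examples whose property is already known.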
Benchmarking
This module discusses how bioinformatics pipelines can be compared and evaluated.

2 videos
  1. Video: Benchmarking | Part 1
  2. Video: Benchmarking | Part 2
  3. Discussion Prompt: Benchmarking - Practice Exercise
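One way to compare two pipelines is to score the same set of items with each and measure how well the scores separate known positives from known negatives. The sketch below uses the area under the ROC curve (AUC), computed as the fraction of positive/negative pairs that are ranked correctly. The choice of metric is an assumption made for illustration, and the scores and labels are made up.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive item is scored
    higher than a randomly chosen negative item (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both positive and negative examples are required.")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up ground truth and scores from two hypothetical pipelines.
truth = [1, 1, 0, 1, 0, 0]
pipeline_a_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
pipeline_b_scores = [0.5, 0.4, 0.9, 0.3, 0.8, 0.2]

auc_a = roc_auc(pipeline_a_scores, truth)
auc_b = roc_auc(pipeline_b_scores, truth)
print(auc_a, auc_b)
```

An AUC of 1.0 means perfect separation and 0.5 corresponds to random ranking, so on this toy data the first pipeline would be judged the better of the two.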
Interactive Data Visualization
This module provides programming examples on how to get started with creating interactive web-based data visualization elements/figures. 

4 videos
  1. Video: Interactive Data Visualization with E-Charts
  2. Video: Visualizing Data using Interactive Clustergrams Built with D3.js | Part 1
  3. Video: Visualizing Data using Interactive Clustergrams Built with D3.js | Part 2
  4. Video: Visualizing Data using Interactive Clustergrams Built with D3.js | Part 3
  5. Discussion Prompt: Visualizing Gene Expression Data using Interactive Clustergrams Built with D3.js - Practice Exercise
Crowdsourcing Projects
This final module describes opportunities to work on LINCS related projects that go beyond the course. 

2 videos, 1 reading
  1. Video: Microtasks and GEO2Enrichr Demo
  2. Video: L1000-2-P100 Megatask Challenge
  3. Reading: BD2K-LINCS DCIC Crowdsourcing Portal
Final Exam
The Final Exam consists of 60 multiple-choice questions that cover all of the modules of the course. Some of the questions may require you to analyze new datasets with the methods you learned throughout the course.
Graded: Final Exam
How It Works
Each course is like an interactive textbook, featuring pre-recorded videos, quizzes and projects.
Help from Your Peers
Connect with thousands of other learners and debate ideas, discuss course material, and get help mastering concepts.
Earn official recognition for your work, and share your success with friends, colleagues, and employers.
Icahn School of Medicine at Mount Sinai
The Icahn School of Medicine at Mount Sinai, in New York City, is a leader in medical and scientific training and education, biomedical research, and patient care.

