Data Science

What is Data Science?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data, and to apply that knowledge and those actionable insights across a broad range of application domains. Data science is closely related to data mining, machine learning, and big data.

Data Science Course Content

Module 1 – Data Science Project Lifecycle

  • Recap of Demo
  • Introduction to Types of Analytics
  • Project life cycle

Module 2 – Introduction to Python, R and Basic Statistics

  • Installation of Python IDE
  • Anaconda and Spyder
  • Working with Python: basic commands and examples
  • Introduction to R and RStudio with some basics

Various graphical techniques to understand data

  • Bar plot
  • Histogram
  • Box plot
  • Scatter plot
  • Data types (continuous, discrete, categorical, count, qualitative, and quantitative), how to identify them, and where each applies; further classification of data into Nominal, Ordinal, Interval, and Ratio types
  • Random Variable and its definition
  • Probability and Probability Distribution – Continuous probability distribution / Probability density function and Discrete probability distribution / Probability mass function

Basic Statistics

  • Various sampling techniques 
  • Measure of central tendency
    • Mean / Average
    • Median
    • Mode
  • Measure of Dispersion
    • Variance
    • Standard Deviation
    • Range
  • Expected value of probability distribution
  • Measure of Skewness
  • Measure of Kurtosis
  • Normal Distribution
  • Standard Normal Distribution / Z distribution
  • Z scores and Z table
  • QQ Plot / Quantile-Quantile plot
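
As a small illustration of the measures listed above, the following sketch uses Python's standard `statistics` module. The sample values and the `z_score` helper are made up for illustration and are not course material.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of dispersion (population versions).
variance = statistics.pvariance(sample)
std_dev = statistics.pstdev(sample)
value_range = max(sample) - min(sample)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


z_of_9 = z_score(9, mean, std_dev)
print(mean, median, mode, variance, std_dev, value_range, z_of_9)
```

The population formulas (`pvariance`, `pstdev`) divide by n; `statistics.variance` and `statistics.stdev` divide by n - 1 and are the usual choice when the data is a sample from a larger population.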

Advanced Statistics

  • Sampling Variation
  • Central Limit Theorem
  • Sample size calculator
  • T-distribution / Student’s-t distribution
  • Confidence interval
    • Population parameter – Standard deviation known
    • Population parameter – Standard deviation unknown
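
The known-standard-deviation case above can be sketched with the standard library alone. The function name and the input numbers are assumptions made for this example.

```python
import math
from statistics import NormalDist


def confidence_interval_known_sigma(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    # Z value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs: sample mean 50, known sigma 10, sample size 100.
low, high = confidence_interval_known_sigma(sample_mean=50.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))
```

When the population standard deviation is unknown, the same structure applies with the sample standard deviation and a t quantile (n - 1 degrees of freedom) in place of z. The standard library does not provide a t quantile, so that case is left out of this sketch.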

Module 3 – Hypothesis Testing

Introduction to hypothesis testing and the various test statistics: what the null hypothesis and the alternative hypothesis are, and the types of hypothesis tests.

  • Type I and Type II errors
  • ANOVA
  • Chi-Square test
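
As one example of a test statistic, the sketch below computes the Chi-Square statistic for a contingency table by hand. The observed counts are invented, and the comparison uses the standard critical value of 3.841 for 1 degree of freedom at a 5% significance level.

```python
def chi_square_statistic(observed):
    """Chi-Square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were independent (null hypothesis).
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 table, for illustration only.
observed = [[10, 20],
            [20, 10]]
stat = chi_square_statistic(observed)

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # 1 degree of freedom, alpha = 0.05
reject_null = stat > CRITICAL_VALUE_DF1_ALPHA_05
print(stat, reject_null)
```

Rejecting a null hypothesis that is actually true is a Type I error; failing to reject one that is actually false is a Type II error.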

High-Level overview of Machine Learning

  • Supervised Learning
    • Classifier
    • Regression
  • Unsupervised Learning
    • Clustering

Supervised – Classifiers

Module 4 – Machine Learning Classifiers – KNN

Module 5 – Classifier – Naive Bayes

Module 6 – Decision Tree

Module 7 – Logistic Regression

  • Simple Logistic Regression
  • Multiple Logistic Regression
  • Confusion matrix
    • False Positive, False Negative
    • True Positive, True Negative
    • Sensitivity (Recall), Specificity, F1 score
  • Receiver operating characteristic curve (ROC curve)
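
The confusion-matrix measures above follow directly from the four cell counts. The sketch below uses invented counts; it also computes precision, which the list does not name but which the F1 score is built from.

```python
def classification_metrics(tp, fp, fn, tn):
    """Common metrics derived from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

An ROC curve plots sensitivity against 1 - specificity as the classification threshold varies, so each threshold gives one such set of counts.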

Module 8 – Bagging and Boosting

Module 9 – Black Box Methods

  • Network Topology
  • Support Vector Machines

Module 10 – Survival Analysis

  • Concept with a business case

Module 11 – Forecasting

  • ARMA (Auto-Regressive Moving Average), Order p and q
  • ARIMA (Auto-Regressive Integrated Moving Average), Order p, d and q

Supervised – Regression

Module 12 – Linear Regression

  • Scatter Diagram
  • Correlation Analysis
  • Principles of Regression
  • Ordinary least squares
  • Simple Linear Regression
  • Understanding Overfitting (Variance) vs Underfitting (Bias)
  • LINE assumptions
    • Collinearity (Variance Inflation Factor)
    • Linearity
    • Normality
  • Multiple Linear Regression
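
For simple linear regression, the ordinary least squares estimates have a closed form. The sketch below applies it to a tiny made-up dataset; the function name and data are assumptions for illustration.

```python
def fit_simple_ols(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_ols(xs, ys)
print(slope, intercept)
```

Multiple linear regression generalizes this to several predictors and is normally fitted with a numerical library rather than by hand.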

Module 13 – Polynomial Regression

Module 14 – Decision Tree & Random Forest

Module 15 – Regularization Techniques

  • Lasso and Ridge Regressions

Module 16 – Multinomial Regression

  • Logit and Log Likelihood
    • Category Baselining
    • Modeling Nominal categorical data

Data Mining Unsupervised – Clustering

Module 17 – Data Mining Unsupervised – Clustering

  • Hierarchical Clustering / Agglomerative Clustering
  • K-Means Clustering
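
A minimal sketch of the K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The points and starting centroids are invented, and the starting centroids are fixed rather than random so that the result is reproducible.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain K-Means on tuples of floats, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Hierarchical (agglomerative) clustering differs in that it does not need the number of clusters up front; it repeatedly merges the two closest clusters.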

Module 18 – Dimension Reduction

  • Why dimension reduction
  • Advantages of PCA
  • Calculation of PCA weights
  • 2D Visualization using Principal Components
  • Basics of Matrix algebra
  • SVD – Decomposition of matrix data

Module 19 – Data Mining Unsupervised – Network Analytics

  • Definition of a network (the LinkedIn analogy)
  • Introduction to Google PageRank
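
The idea behind PageRank can be sketched as a power iteration over a small link graph. The three-page graph is invented, and the damping factor of 0.85 is the commonly quoted default, used here as an assumption.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps each node to its outgoing links."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share.
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node, outgoing in links.items():
            if outgoing:
                # Split this node's rank evenly among the pages it links to.
                share = damping * rank[node] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(ranks)
```

In this graph C ends up ranked above B because it receives links from both A and B, while B is linked only from A.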

Module 20 – Data Mining Unsupervised – Association Rules

  • What is Market Basket / Affinity Analysis
  • Measure of association
    • Support
    • Confidence
    • Lift Ratio
  • Apriori Algorithm
  • Sequential Pattern Mining
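
The three measures of association can be computed directly from a list of transactions. The baskets below are invented for illustration, and the function names are this sketch's own.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests the items appear together more often than independence would predict; a lift below 1, as in this toy data, suggests the opposite. The Apriori algorithm uses the support threshold to prune candidate itemsets before rules are scored this way.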

Module 21 – Data Mining Unsupervised – Recommender System

Module 22 – Text Mining

Module 23 – Natural Language Processing

Assignments/Projects/Placement Support
