Data science with R programming

1. History and Overview of R

  • What is R?
  • What is S?
  • The S Philosophy
  • Back to R
  • Basic Features of R
  • Free Software
  • Design of the R System
  • Limitations of R
  • R Resources

2. Getting Started with R

  • Installation
  • Getting started with the R interface

3. R Nuts and Bolts

  • Entering Input
  • Evaluation
  • R Objects
  • Numbers
  • Attributes
  • Creating Vectors
  • Mixing Objects
  • Explicit Coercion
  • Matrices
  • Lists
  • Factors
  • Missing Values
  • Data Frames
  • Names
  • Summary

4. Data Input/Output, Subsetting, Vectorized Operations, and Dates and Times

  • Getting Data In and Out of R
  • Reading and Writing Data
  • Reading Data Files with read.table()
  • Reading in Larger Datasets with read.table()
  • Calculating Memory Requirements for R Objects
  • Using the readr Package
  • Using Textual and Binary Formats for Storing Data
  • Using dput() and dump()
  • Binary Formats
  • Interfaces to the Outside World
  • File Connections
  • Reading Lines of a Text File
  • Reading From a URL Connection
  • Subsetting R Objects
  • Subsetting a Vector
  • Subsetting a Matrix
  • Subsetting Lists
  • Subsetting Nested Elements of a List
  • Extracting Multiple Elements of a List
  • Partial Matching
  • Removing NA Values
  • Vectorized Operations
  • Vectorized Matrix Operations
  • Dates and Times
  • Dates in R
  • Times in R
  • Operations on Dates and Times
  • Summary

5. Managing Data Frames with the dplyr Package

  • Data Frames
  • The dplyr Package
  • dplyr Grammar
  • Installing the dplyr package
  • select()
  • filter()
  • arrange()
  • rename()
  • mutate()
  • group_by()
  • %>%

1. Probability and Statistical Methods:

 Introduction to random variables, probability theory, conditional probability, Bayes' Theorem.

  • Central tendencies (Mean, Median, Mode); Measures of spread (Range, Variance, Standard Deviation); Basics of Probability Distributions; Expectation and Variance of a random variable.
  • Discrete probability distributions: Geometric, Poisson.
  • Continuous probability distributions: Exponential, Normal distribution; t-distribution
  • Central Limit Theorem; Sampling distributions; Confidence Intervals, Hypothesis Testing.
  • Statistical hypothesis testing methods: chi-square test, t-test, z-test, F-test and ANOVA.
  • Covariance and Correlation.
  • Hands-on implementation of each of these methods will be conducted in R.
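
As a taste of the hands-on work, here is a minimal sketch of three of the tests listed above, run in base R. The choice of the built-in mtcars dataset and of these particular variables is an assumption made for illustration, not part of the course material.

```r
# Welch two-sample t-test: does mean mpg differ between
# automatic (am = 0) and manual (am = 1) cars?
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Chi-square test of independence between cylinder count and transmission.
# Some expected cell counts are small in this table, so R would normally
# warn that the approximation may be inaccurate; the warning is silenced here.
tab <- table(mtcars$cyl, mtcars$am)
chi <- suppressWarnings(chisq.test(tab))
print(chi$p.value)

# One-way ANOVA: does mean mpg differ across cylinder groups?
fit <- aov(mpg ~ factor(cyl), data = mtcars)
print(summary(fit))
```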

2. Statistics and Probability in Decision Modelling:

  • Linear Regression and Logistic Regression: two powerful techniques used to solve prediction and classification problems.
  • A brief math refresher on calculus and gradient descent, and on arriving at an optimal or suboptimal solution.
  • Relationships between multiple variables: regression (simple linear and multivariate linear regression) for prediction.
  • Least squares method. 
  • Identifying significant features, feature reduction using AIC, multi-collinearity check, observing influential points, etc. 
  • Checking and validating linear fit, model assumptions and taking actions.
  • Hands-on R session on linear and logistic regression.
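
The two techniques in this module can be sketched in a few lines of base R. This is only an illustrative outline: the mtcars dataset, the chosen predictors, and the 0.5 cutoff are assumptions, not the course's own exercise.

```r
# Linear regression fitted by least squares:
# predict fuel economy (mpg) from weight and horsepower.
lin_fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(lin_fit))
print(AIC(lin_fit))  # AIC can be compared across candidate models

# Logistic regression: classify transmission type (am is 0 or 1) from weight.
log_fit <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(log_fit, type = "response")  # fitted probabilities
pred <- ifelse(prob > 0.5, 1, 0)             # assumed cutoff of 0.5
print(mean(pred == mtcars$am))               # accuracy on the training data
```

Calling summary() on either fitted model shows coefficient significance, and plot(lin_fit) draws the standard diagnostic plots for checking the linear fit.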

3. Algorithms in Machine Learning:

Unsupervised:

  • Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour. (K-Means)
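
A minimal K-Means sketch in base R follows. It uses the built-in iris measurements as a stand-in for customer data, which is an assumption made so the example runs without extra files; the choice of three clusters is likewise illustrative.

```r
# K-means starts from random centers, so fix the seed for reproducibility.
set.seed(42)

# Standardize the four numeric measurements so no single column dominates.
features <- scale(iris[, 1:4])

# Ask for 3 clusters; nstart = 20 tries 20 random starts and keeps the best.
km <- kmeans(features, centers = 3, nstart = 20)

# Compare the discovered groupings with the known species labels.
print(table(cluster = km$cluster, species = iris$Species))
```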

Supervised learning:

  • Decision trees.
  • Support vector machines
  • Random Forest 
  • Ensemble modelling 
  • Bagging & boosting and their impact on bias and variance 
  • AdaBoost
  • XGBoost 

4. Text Mining, Natural Language Processing: 

Introduction to the Fundamentals of information retrieval; Language modeling 

  • n-gram models of language 
  • Smoothing 
  • Probabilistic language models 

Feature engineering: 

  • TF and IDF 
  • Bag-of-words (BoW) technique, word2vec.
  • Thinking about the math behind text; Properties of words; Vector Space Model 
  • Evaluation Metrics for Ranking 
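
To make the TF and IDF item concrete, here is a small base-R sketch that builds a bag-of-words count matrix and weights it by inverse document frequency. The three toy documents and the plain log(N / df) form of IDF are assumptions for illustration; text-mining packages offer several weighting variants.

```r
# Three toy documents (hypothetical example text).
docs <- c(d1 = "the cat sat on the mat",
          d2 = "the dog sat on the log",
          d3 = "cats and dogs")

# Tokenize on whitespace and collect the vocabulary.
tokens <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Term-frequency matrix (documents x terms): raw bag-of-words counts.
tf <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))

# Inverse document frequency: log(N / number of documents containing the term).
idf <- log(length(docs) / colSums(tf > 0))

# Multiply each term column by its IDF weight.
tfidf <- sweep(tf, 2, idf, `*`)
print(round(tfidf, 3))
```

Each row of tfidf is a document vector in the Vector Space Model, so documents can be compared with measures such as cosine similarity.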

Natural Language Processing 

  • Stemming, Phrase identification, word sense disambiguation 
  • POS tagging 
  • Parsing and semantic structures 
  • Coreference resolution 

Topic Modelling using LDA

  • Course duration:  90 min/day
  • No. of Sessions: 45 
  • Weekend batch starting the first week of August
  • Course Fee: Rs 10000/-
