1.History and Overview of R
- What is R?
- What is S?
- The S Philosophy
- Back to R
- Basic Features of R
- Free Software
- Design of the R System
- Limitations of R
- R Resources
2.Getting Started with R
- Installation
- Getting started with the R interface
3.R Nuts and Bolts
- Entering Input
- Evaluation
- R Objects
- Numbers
- Attributes
- Creating Vectors
- Mixing Objects
- Explicit Coercion
- Matrices
- Lists
- Factors
- Missing Values
- Data Frames
- Names
- Summary
4.Getting Data In and Out of R
- Reading and Writing Data
- Reading Data Files with read.table()
- Reading in Larger Datasets with read.table()
- Calculating Memory Requirements for R Objects
- Using the readr Package
- Using Textual and Binary Formats for Storing Data
- Using dput() and dump()
- Binary Formats
- Interfaces to the Outside World
- File Connections
- Reading Lines of a Text File
- Reading From a URL Connection
- Subsetting R Objects
- Subsetting a Vector
- Subsetting a Matrix
- Subsetting Lists
- Subsetting Nested Elements of a List
- Extracting Multiple Elements of a List
- Partial Matching
- Removing NA Values
- Vectorized Operations
- Vectorized Matrix Operations
- Dates and Times
- Dates in R
- Times in R
- Operations on Dates and Times
- Summary
5.Managing Data Frames with the dplyr package
- Data Frames
- The dplyr Package
- dplyr Grammar
- Installing the dplyr package
- select()
- filter()
- arrange()
- rename()
- mutate()
- group_by()
- %>%
1.Probability and Statistical Methods:
Introduction to random variables, probability theory, conditional probability, Bayes' theorem.
- Central tendencies (Mean, Median, Mode); Measures of spread (Range, Variance, Standard Deviation); Basics of Probability Distributions; Expectation and Variance of a variable.
- Discrete probability distributions: Geometric, Poisson.
- Continuous probability distributions: Exponential, Normal distribution; t-distribution
- Central Limit Theorem; Sampling distributions; Confidence Intervals, Hypothesis Testing.
- Statistical hypothesis testing methods: chi-square test, t-test, z-test, F-test, and ANOVA.
- Covariance and Correlation.
- Hands-on implementation of each of these methods will be conducted in R.
2.Statistics and Probability in Decision Modelling:
- Two very powerful techniques, viz., Linear Regression and Logistic Regression, which are used to solve problems in Prediction and Classification.
- A very brief math refresher on calculus and gradient descent, and on arriving at an optimal or suboptimal solution.
- Relationship between multiple variables: Regression (Linear, Multivariate Linear Regression) in prediction.
- Least squares method.
- Identifying significant features, feature reduction using AIC, multicollinearity check, observing influential points, etc.
- Checking and validating linear fit, model assumptions and taking actions.
- Hands-on R session on linear and logistic regression.
3.Algorithms in Machine Learning:
Unsupervised learning:
- Clustering (K-Means): discovering the inherent groupings in the data, such as grouping customers by purchasing behaviour.
Supervised learning:
- Decision trees.
- Support vector machines
- Random Forest
- Ensemble modelling
- Bagging and boosting, and their impact on bias and variance
- AdaBoost
- XGBoost
4.Text Mining and Natural Language Processing:
Introduction to the Fundamentals of information retrieval; Language modeling
- n-gram models of language
- Smoothing
- Probabilistic language models
Feature engineering:
- TF and IDF
- Bag-of-words (BoW) technique, word2vec.
- Thinking about the math behind text; Properties of words; Vector Space Model
- Evaluation Metrics for Ranking
Natural Language Processing
- Stemming, Phrase identification, word sense disambiguation
- POS tagging
- Parsing and semantic structures
- Coreference resolution
Topic Modelling using LDA
- Course duration: 90 min/day
- No. of Sessions: 45
- Weekend batch starting in the first week of August
- Course Fee: Rs 10000/-