Course Syllabus – Machine Learning with Python
1. Python Fundamentals:
- Python Basics
- Take your first steps in the world of Python. Discover the different data types and create your first variable.
- Python Lists
- Get to know the first way to store many different data points under a single name. Create, subset, and manipulate lists in all sorts of ways.
- Functions and Packages
- Learn how to get the most out of other people’s efforts by importing Python packages and calling functions.
- NumPy
- Write fast code with NumPy (Numerical Python), a package for efficiently storing and computing on large amounts of data.
- Matplotlib
- Create different types of visualizations depending on the message you want to convey. Learn how to build complex and customized plots based on real data.
- Control flow and Pandas
- Write conditional constructs to tweak the execution of your scripts and get to know the Pandas DataFrame: the key data structure for Data Science in Python.
2. Probability and Statistical Methods:
- Introduction to random variables, probability theory, conditional probability, Bayes' Theorem.
- Central tendencies (Mean, Median, Mode); Measures of spread (Range, Variance, Standard Deviation); Basics of Probability Distributions; Expectation and Variance of a variable.
- Discrete probability distributions: Geometric, Poisson.
- Continuous probability distributions: Exponential, Normal, t-distribution.
- Central Limit Theorem; Sampling distributions; Confidence Intervals, Hypothesis Testing.
- Statistical hypothesis testing methods: chi-square test, t-test, z-test, F-test, and ANOVA.
- Covariance and Correlation.
- Hands-on implementation of each of these methods will be conducted in R.
3. Statistics and Probability in Decision Modelling:
- Linear Regression and Logistic Regression: two widely used techniques for solving prediction and classification problems, respectively.
- A brief math refresher on calculus and gradient descent, and on how it arrives at an optimal or suboptimal solution.
- Relationship between multiple variables: Regression (Linear, Multivariate Linear Regression) in prediction.
- Least squares method.
- Identifying significant features, feature reduction using AIC, multi-collinearity check, observing influential points, etc.
- Checking and validating linear fit, model assumptions and taking actions.
- Hands-on R session on linear and logistic regression.
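
As an illustration of the least squares method and the gradient descent refresher listed in this module, here is a minimal sketch in Python (used to match the course title, although the hands-on session is listed as R). The data values, function names, learning rate, and step count are all assumptions chosen for illustration. It fits a line y = a * x + b in closed form and then again iteratively, so the two answers can be compared.

```python
def least_squares(xs, ys):
    """Closed-form slope and intercept for y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Minimize the mean squared error of y = a * x + b iteratively."""
    a, b, n = 0.0, 0.0, len(xs)
    for _ in range(steps):
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. a and b.
        grad_a = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Made-up sample data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
exact = least_squares(xs, ys)
approx = gradient_descent(xs, ys)
```

On this convex problem both approaches should agree closely; a learning rate that is too large would make the iterative version diverge instead.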
4. Algorithms in Machine Learning:
Unsupervised learning:
- Clustering (K-Means): discovering the inherent groupings in the data, such as grouping customers by purchasing behaviour.
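
The customer-grouping idea above can be sketched with a small K-Means loop (Lloyd's algorithm) in plain Python. This is an illustration under stated assumptions, not the course's reference implementation: the customer data is invented, the function name `kmeans` is hypothetical, and the first k points are used as initial centroids to keep the run deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption: seed with the first k points. Real implementations
    # usually use random or k-means++ seeding instead.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (visits per month, average basket value)
customers = [(2, 15.0), (12, 95.0), (3, 20.0), (11, 88.0), (1, 12.0), (13, 102.0)]
labels, centroids = kmeans(customers, k=2)
```

The features are left unscaled here for brevity; in practice they would normally be standardized first so that one feature does not dominate the distance.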
Supervised learning:
- Decision trees.
- Support vector machines
- Random Forest
- Ensemble modelling
- Bagging & boosting and their impact on bias and variance
- AdaBoost
- XGBoost
5. Text Mining and Natural Language Processing:
Introduction to the fundamentals of information retrieval; language modeling:
- n-gram models of language
- Smoothing
- Probabilistic language models
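
To make the n-gram and smoothing topics above concrete, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing. The toy corpus, the function names, and the `<s>`/`</s>` boundary markers are assumptions for illustration; add-one is only one of several smoothing schemes.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # Pad each sentence with start and end markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


corpus = ["the cat sat", "the dog sat", "the cat ran"]
unigrams, bigrams = train_bigram_model(corpus)
p_seen = bigram_prob("the", "cat", unigrams, bigrams)    # bigram seen twice
p_unseen = bigram_prob("the", "ran", unigrams, bigrams)  # bigram never seen
```

Without smoothing, the unseen bigram would get probability zero; with add-one smoothing it receives a small non-zero share while seen bigrams remain more likely.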
Feature engineering:
- TF and IDF
- Bag-of-words (BoW) technique, word2vec.
- Thinking about the math behind text; Properties of words; Vector Space Model
- Evaluation Metrics for Ranking
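
The TF and IDF weighting listed above can be sketched in a few lines of plain Python. This is one common variant (relative term frequency times log(N / document frequency)), assumed here for illustration; libraries often use smoothed variants that give different numbers. The documents and the function name `tf_idf` are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # TF is the term's share of the document; IDF is unsmoothed.
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
weights = tf_idf(docs)
```

In the first document, "cat" outweighs "the" even though "the" occurs more often, because "the" appears in more documents and is therefore less distinctive.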
Natural Language Processing
- Stemming, Phrase identification, word sense disambiguation
- POS tagging
- Parsing and semantic structures
- Coreference resolution
Topic Modelling using LDA