### Machine Learning with Python

Course Syllabus – Machine Learning with Python

1. Python Fundamentals:

• Python Basics
• Take your first steps in the world of Python. Discover the different data types and create your first variable.
• Python Lists
• Get to know the first way to store many different data points under a single name. Create, subset and manipulate lists in all sorts of ways.
• Functions and Packages
• Learn how to get the most out of other people’s efforts by importing Python packages and calling functions.
• NumPy
• Write superfast code with Numerical Python, a package to efficiently store and do calculations with huge amounts of data.
• Matplotlib
• Create different types of visualizations depending on the message you want to convey. Learn how to build complex and customized plots based on real data.
• Control flow and Pandas
• Write conditional constructs to tweak the execution of your scripts and get to know the Pandas DataFrame: the key data structure for data science in Python.
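
As a minimal sketch of the first topics above (variables, lists, functions and packages, control flow), the following uses only the Python standard library. The `heights` data and the `describe` helper are made up for illustration; NumPy, Matplotlib and Pandas are left out so the example runs without extra installs.

```python
import math  # a package from the standard library

# A variable and a list (illustrative data, in metres)
tallest = 1.89
heights = [1.73, 1.68, 1.71, tallest]

# Subsetting and manipulating a list
first_two = heights[:2]   # slice: the first two elements
heights.append(1.79)      # add an element at the end


def describe(values):
    """Return the mean of `values` and a label chosen with control flow."""
    mean = math.fsum(values) / len(values)
    if mean > 1.75:
        label = "tall"
    else:
        label = "average"
    return mean, label


mean_height, label = describe(heights)
print(first_two, round(mean_height, 2), label)
```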

2. Probability and Statistical Methods:

Introduction to random variables, probability theory, conditional probability and Bayes' theorem.

• Central tendencies (Mean, Median, Mode); Measures of spread (Range, Variance, Standard Deviation); Basics of Probability Distributions; Expectation and Variance of a variable.
• Discrete probability distributions: Geometric, Poisson.
• Continuous probability distributions: Exponential, Normal distribution; t-distribution
• Central Limit Theorem; Sampling distributions; Confidence Intervals, Hypothesis Testing.
• Statistical hypothesis testing methods: chi-square test, t-test, z-test, F-test and ANOVA.
• Covariance and Correlation.
• Hands-on implementation of each of these methods will be conducted in R.
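
A few of the ideas in this module can be sketched in Python with the standard `statistics` module. The sample values, the `bayes_posterior` helper and the test characteristics (1% prevalence, 99% sensitivity, 5% false-positive rate) are illustrative assumptions, not course material.

```python
import statistics

# Illustrative sample (made-up values)
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendencies
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of spread (population versions)
value_range = max(sample) - min(sample)
variance = statistics.pvariance(sample)
std_dev = statistics.pstdev(sample)


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) by Bayes' theorem; P(B) comes from the law of total probability."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical diagnostic test: how likely is the condition given a positive result?
posterior = bayes_posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(mean, median, mode, value_range, variance, std_dev, round(posterior, 3))
```

Even with a sensitive test, the posterior stays low because the condition is rare, which is the kind of result conditional probability makes precise.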

3. Statistics and Probability in Decision Modelling:

• Two powerful techniques, linear regression and logistic regression, used to solve prediction and classification problems.
• A very brief math refresher on calculus and gradient descent, and on arriving at an optimal or suboptimal solution.
• Relationship between multiple variables: Regression (Linear, Multivariate Linear Regression) in prediction.
• Least squares method.
• Identifying significant features, feature reduction using AIC, multi-collinearity check, observing influential points, etc.
• Checking and validating linear fit, model assumptions and taking actions.
• Hands-on R session on linear and logistic regression.
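
To illustrate how the least squares method and gradient descent relate, here is a minimal Python sketch that fits a straight line both ways. The data, the function names, the learning rate and the step count are all assumptions chosen for the example.

```python
# Illustrative data generated from y = 2x + 1 (no noise)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def least_squares(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimise the mean squared error by batch gradient descent."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


exact = least_squares(xs, ys)
approx = gradient_descent(xs, ys)
print(exact, approx)
```

On this small problem both approaches agree closely; a learning rate that is too large would make the iterative version diverge instead.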

4. Algorithms in Machine Learning:

Unsupervised learning:

• Clustering (K-Means): a clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour.
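
The customer-grouping idea can be sketched with a small pure-Python K-Means (Lloyd's algorithm). The customer data (visits per month, average spend) and the choice of starting centroids are made up; real projects would typically use a library implementation with better initialisation.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm for 2-D points with given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # keep a centroid that lost all its points
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend)
customers = [(1, 10), (2, 12), (1, 11), (8, 50), (9, 52), (10, 51)]
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
print(centroids)
```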

Supervised learning:

• Decision trees
• Support vector machines
• Random Forest
• Ensemble modelling
• Bagging and boosting, and their impact on bias and variance
• XGBoost
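
As a rough illustration of the building block behind decision trees (and, by extension, tree ensembles such as Random Forest), the sketch below finds the best single split on one numeric feature. Using Gini impurity as the split criterion is an assumption made here, and the `hours`/`passed` data is invented.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Made-up data: hours studied and pass/fail outcome
hours = [1, 2, 3, 4, 5, 6]
passed = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(hours, passed)
print(threshold, impurity)
```

A full decision tree applies this search recursively to each side of the split; bagging then averages many such trees trained on resampled data.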

5. Text Mining and Natural Language Processing:

Introduction to the fundamentals of information retrieval; language modelling

• n-gram models of language
• Smoothing
• Probabilistic language models
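
The three topics above can be combined in one small sketch: a bigram (n = 2) language model with add-one (Laplace) smoothing. The toy corpus and function names are assumptions for illustration, and add-one is only one of several smoothing methods.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing.

    Simplification: the vocabulary size counts the padding symbols too.
    """
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


corpus = ["I like Python", "I like data", "you like Python"]
unigrams, bigrams = train_bigram(corpus)
p_seen = bigram_prob("like", "python", unigrams, bigrams)    # bigram occurs in the corpus
p_unseen = bigram_prob("like", "you", unigrams, bigrams)     # bigram never occurs
print(p_seen, p_unseen)
```

Without smoothing the unseen bigram would get probability zero; with it, every bigram keeps a small non-zero probability.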

Feature engineering:

• TF and IDF
• Bag-of-words (BoW) technique, word2vec
• Thinking about the math behind text; Properties of words; Vector Space Model
• Evaluation Metrics for Ranking
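
A minimal sketch of TF-IDF weighting in a vector space model follows, with cosine similarity used to compare documents. The three documents are invented, and the plain `log(N / df)` form of IDF is an assumption; libraries often use smoothed variants that give different numbers.

```python
import math
from collections import Counter

docs = [
    "python makes machine learning easy",
    "machine learning needs data",
    "cats chase mice",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})


def idf(term):
    """Inverse document frequency: log(N / number of documents containing term)."""
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df)


def tfidf_vector(tokens):
    """Vector of tf * idf weights over the shared vocabulary."""
    counts = Counter(tokens)
    return [counts[t] / len(tokens) * idf(t) for t in vocab]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = [tfidf_vector(doc) for doc in tokenized]
sim_01 = cosine(vectors[0], vectors[1])  # documents sharing "machine learning"
sim_02 = cosine(vectors[0], vectors[2])  # documents with no words in common
print(sim_01, sim_02)
```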

Natural language processing:

• Stemming, Phrase identification, word sense disambiguation
• POS tagging
• Parsing and semantic structures
• Coreference resolution

Topic Modelling using LDA

Course details:

• Course duration: 90 min/day
• Number of sessions: 45
• Weekend batch starting in the first week of August
• Course fee: Rs 10000/-
