Project author: comsavvy

Project description:
Week 4 challenge @10Academy
Primary language: Jupyter Notebook
Repository URL: git://github.com/comsavvy/A-B-Hypothesis-testing.git
Created: 2020-08-13T17:57:04Z
Project page: https://github.com/comsavvy/A-B-Hypothesis-testing

A-B-Hypothesis-testing

Task 2.1: Classic and sequential A/B testing analysis

Perform data exploration: count the unique values of the categorical variables, and produce histograms, relational plots, and any other plots needed to understand the data. For each plot you produce, write a description of what it shows in a markdown cell.
Perform hypothesis testing: apply the classical p-value based algorithm and the sequential A/B testing algorithm, for which starter code is provided.
Is the number of data points in the experiment enough to make a reasonable judgement, or should the company run a longer experiment? Remember that running the experiment longer may be costly for many reasons, so you should always optimize the number of samples needed to make a statistically sound decision.
What does your A/B testing analysis tell you? Is brand awareness increased for the exposed group?
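One way to approach the classical test and the sample-size question above is a two-sided two-proportion z-test plus the standard normal-approximation formula for the required group size. The sketch below is only an illustration and is not the starter code mentioned in the task: the function names `two_proportion_z_test` and `required_sample_size`, the default `alpha` and `power` values, and all the counts are assumptions made up for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_proportion_z_test(yes_control, n_control, yes_exposed, n_exposed):
    """Two-sided z-test for a difference between two proportions (pooled variance)."""
    p_control = yes_control / n_control
    p_exposed = yes_exposed / n_exposed
    pooled = (yes_control + yes_exposed) / (n_control + n_exposed)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_exposed))
    z = (p_exposed - p_control) / std_err
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return z, p_value


def required_sample_size(p_baseline, min_detectable_lift, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect an absolute lift in the "Yes" rate."""
    p_target = p_baseline + min_detectable_lift
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    variance_sum = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / min_detectable_lift ** 2)


# Made-up counts for illustration only; these are not the challenge data.
z, p_value = two_proportion_z_test(yes_control=260, n_control=590,
                                   yes_exposed=310, n_exposed=660)
n_per_group = required_sample_size(p_baseline=0.45, min_detectable_lift=0.05)
print(f"z = {z:.3f}, p-value = {p_value:.3f}, needed per group = {n_per_group}")
```

If the observed group sizes fall well below `n_per_group` for the smallest lift the company cares about, a non-significant p-value says little, and that is one argument for a longer experiment.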

Task 2.2: Machine Learning

In at most three statements, formulate the machine learning problem and specify the target variable.
Split the data into 70% training, 20% validation, and 10% test sets.
Based on the reading material provided, apply machine learning to the training data. Train a model with 5-fold cross-validation using each of the following three algorithms:
Logistic Regression
Decision Trees
XGBoost
Define the appropriate loss function for the model using the validation data.
Compute feature importance - what’s driving the model? Which parameters are important predictors for the different ML models? What contributes to the goal of gaining more “Yes” results?
Which data features are relevant to predicting the target variable?
Explain the difference between using A/B testing to test a hypothesis and using machine learning to learn the viability of the same effect.
Explain the purpose of training with k-fold cross-validation instead of using the whole dataset to train the ML models.
What information do you gain using the Machine Learning approach that you couldn’t obtain using A/B testing?
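The 70/20/10 split, the 5-fold cross-validation loop, and a candidate loss function from the tasks above can be sketched with the standard library alone. This is a minimal illustration and not the notebook's implementation: the helper names `split_70_20_10`, `k_fold`, and `log_loss` are hypothetical, log loss is an assumed choice of loss for a binary "Yes"/"No" target, and the row count is invented. In practice, scikit-learn's `train_test_split`, `KFold`, and `log_loss` cover the same ground.

```python
import math
import random


def split_70_20_10(n_rows, seed=0):
    """Shuffle row indices and split them into 70% train, 20% validation, 10% test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = n_rows * 7 // 10
    n_val = n_rows * 2 // 10
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def k_fold(indices, k=5):
    """Yield (fit, held_out) index lists; each index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Invented row count for illustration only.
train, val, test = split_70_20_10(n_rows=1000)
for fold_number, (fit, held_out) in enumerate(k_fold(train, k=5), start=1):
    # A model would be fitted on `fit` and scored on `held_out` here.
    print(f"fold {fold_number}: fit on {len(fit)} rows, score on {len(held_out)} rows")

# A constant 0.5 prediction gives a log loss of ln(2), a useful baseline.
print(log_loss([1, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]))
```

Because every training row is held out exactly once, the averaged fold scores estimate how each algorithm generalizes, and the validation and test sets stay untouched for model selection and the final check.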