Commit e0fd572

Refactored
1 parent 59eadf1 commit e0fd572

1 file changed: +39 -38 lines changed

Module 3 - Advance Data Analysis and Statistics/6. Performing Statistical Test/statistical_test_practical_implementation.ipynb

+39-38
@@ -9,9 +9,7 @@
99
},
1010
{
1111
"cell_type": "markdown",
12-
"metadata": {
13-
"jp-MarkdownHeadingCollapsed": true
14-
},
12+
"metadata": {},
1513
"source": [
1614
"## **Concepts**"
1715
]
@@ -82,15 +80,19 @@
8280
"### **Concept 4 - What is p-value?**\n",
8381
"The p-value, short for \"probability value,\" is a number that helps us understand the strength of evidence against a null hypothesis in hypothesis testing.\n",
8482
"\n",
83+
"**OR**\n",
84+
"\n",
85+
"The p-value quantifies the strength of evidence, i.e. how likely a result at least as extreme as the observed one would be if the null hypothesis were true and only random chance were at work. The smaller the p-value, the less likely the result occurred by random chance and the stronger the evidence for rejecting the null hypothesis.\n",
86+
"\n",
8587
"To say whether the p-value is significant or not, we need a significance threshold called the **significance level**. This threshold is usually set at 0.05. \n",
8688
"\n",
8789
"It's important because it helps control the rate of Type I errors. By setting a threshold, we define the level of evidence needed to reject the null hypothesis. Lower thresholds (e.g., 0.01) require stronger evidence.\n",
8890
"\n",
8991
"\n",
90-
"If the p-value is **BELOW** the threshold (meaning smaller than), then you can infer a **statistically significant relationship** between the input and target variables. \n",
92+
"If the p-value is **BELOW** the threshold (meaning smaller than), then you have **statistically significant evidence**, i.e. the outcome is unlikely to have occurred by random chance alone. \n",
9193
"i.e. For p-value < 0.05, you can Reject the Null Hypothesis\n",
9294
"\n",
93-
"Otherwise, then you can infer **no statistically significant relationship** between the predictor and outcome variables. \n",
95+
"Otherwise, there is **no statistically significant evidence**, i.e. the outcome is consistent with random chance. \n",
9496
"i.e. For p-value >= 0.05, you Fail to Reject the Null Hypothesis\n",
9597
"\n",
9698
"Here's a simple explanation:\n",
@@ -149,9 +151,7 @@
149151
},
150152
{
151153
"cell_type": "markdown",
152-
"metadata": {
153-
"jp-MarkdownHeadingCollapsed": true
154-
},
154+
"metadata": {},
155155
"source": [
156156
"## **Importing all the Required Libraries**"
157157
]
@@ -173,9 +173,7 @@
173173
},
174174
{
175175
"cell_type": "markdown",
176-
"metadata": {
177-
"jp-MarkdownHeadingCollapsed": true
178-
},
176+
"metadata": {},
179177
"source": [
180178
"## **Loading the Data**"
181179
]
@@ -405,9 +403,7 @@
405403
},
406404
{
407405
"cell_type": "markdown",
408-
"metadata": {
409-
"jp-MarkdownHeadingCollapsed": true
410-
},
406+
"metadata": {},
411407
"source": [
412408
"## **Renaming the Columns**"
413409
]
@@ -478,9 +474,7 @@
478474
},
479475
{
480476
"cell_type": "markdown",
481-
"metadata": {
482-
"jp-MarkdownHeadingCollapsed": true
483-
},
477+
"metadata": {},
484478
"source": [
485479
"## **Univariate Analysis - Discrete Data**"
486480
]
@@ -773,7 +767,15 @@
773767
"source": [
774768
"### Chi-Square Test for Goodness-of-fit\n",
775769
"\n",
776-
"Tests whether the observed frequencies of categorical data match the expected frequencies according to a specified distribution."
770+
"Tests whether the observed frequencies of categorical data match the expected frequencies according to a specified distribution.\n",
771+
"\n",
772+
"**Assumptions**\n",
773+
"- Observations in each sample are independent and identically distributed (iid).\n",
774+
"- Observations should be discrete.\n",
775+
"\n",
776+
"**Interpretation**\n",
777+
"- H0: The observed frequencies match the expected frequencies.\n",
778+
"- H1: The observed frequencies do not match the expected frequencies."
777779
]
778780
},
779781
{
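
Review note on the chi-square goodness-of-fit cell above: a minimal sketch of how the stated hypotheses map onto `scipy.stats.chisquare`. The helper name `chi_square_goodness_of_fit`, the die-roll counts, and the 0.05 default are illustrative assumptions, not code from the notebook.

```python
from scipy import stats


def chi_square_goodness_of_fit(observed, expected, significance_level=0.05):
    """Return (statistic, p_value, reject_h0) for a goodness-of-fit test.

    H0: the observed frequencies match the expected frequencies.
    H1: the observed frequencies do not match the expected frequencies.
    """
    # scipy expects observed and expected counts to have the same total.
    stat, p = stats.chisquare(f_obs=observed, f_exp=expected)
    return stat, p, bool(p < significance_level)


# Hypothetical data: 120 rolls of a die, compared with a fair die (20 per face).
observed = [18, 22, 20, 19, 21, 20]
expected = [20] * 6

stat, p, reject_h0 = chi_square_goodness_of_fit(observed, expected)
print('stat=%.3f, p=%.3f, reject H0: %s' % (stat, p, reject_h0))
```

Here the counts are close to the expected values, so the test fails to reject H0. Both lists must sum to the same total, otherwise `stats.chisquare` raises an error in recent SciPy versions.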
@@ -818,9 +820,7 @@
818820
},
819821
{
820822
"cell_type": "markdown",
821-
"metadata": {
822-
"jp-MarkdownHeadingCollapsed": true
823-
},
823+
"metadata": {},
824824
"source": [
825825
"## **Univariate Analysis - Numerical Data**"
826826
]
@@ -841,9 +841,7 @@
841841
{
842842
"cell_type": "code",
843843
"execution_count": 16,
844-
"metadata": {
845-
"scrolled": true
846-
},
844+
"metadata": {},
847845
"outputs": [
848846
{
849847
"name": "stdout",
@@ -1029,12 +1027,13 @@
10291027
"### Kolmogorov-Smirnov Test\n",
10301028
"\n",
10311029
"The Kolmogorov-Smirnov (KS) test is used to check if a sample follows a specific distribution, including normal distribution. \n",
1030+
"\n",
10321031
"**Assumptions**\n",
10331032
"- Observations in each sample are independent and identically distributed (iid).\n",
10341033
"\n",
10351034
"**Interpretation**\n",
10361035
"- H0: the sample follows the specified distribution.\n",
1037-
"- H1: the sample does not have that distribution.\n"
1036+
"- H1: the sample does not have that distribution."
10381037
]
10391038
},
10401039
{
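
Review note on the Kolmogorov-Smirnov cells: the notebook's `kolmogorov_smirnov` helper calls `stats.kstest(data, 'norm')`, which compares the data against the standard normal distribution (mean 0, standard deviation 1). The sketch below illustrates what that implies for data on another scale. The synthetic sample and the standardization step are assumptions for illustration and are not part of the notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: normally distributed, but with mean 50 and std 5.
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=5, size=500)

# Tested against the standard normal N(0, 1), the sample is rejected
# even though its shape is normal, because location and scale differ.
stat_raw, p_raw = stats.kstest(data, 'norm')

# Standardizing first compares only the shape with N(0, 1).
z = (data - data.mean()) / data.std(ddof=1)
stat_std, p_std = stats.kstest(z, 'norm')

print('raw:          stat=%.3f, p=%.3f' % (stat_raw, p_raw))
print('standardized: stat=%.3f, p=%.3f' % (stat_std, p_std))
```

Because the mean and standard deviation are estimated from the same sample, the p-value of the standardized test is only approximate (it tends to be too large). A test designed for normality with estimated parameters may be a better fit in that situation.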
@@ -1044,6 +1043,7 @@
10441043
"outputs": [],
10451044
"source": [
10461045
"def kolmogorov_smirnov(data, significance_level):\n",
1046+
" # You can replace 'norm' with the callable stats.norm.cdf\n",
10471047
" stat, p = stats.kstest(data, 'norm')\n",
10481048
" \n",
10491049
" print('stat=%.3f, p=%.3f' % (stat, p))\n",
@@ -1078,7 +1078,16 @@
10781078
"source": [
10791079
"### One-Sample t-test\n",
10801080
"\n",
1081-
"Tests whether the mean of a single sample is significantly different from a known or hypothesized population mean."
1081+
"Tests whether the mean of a single sample is significantly different from a known or hypothesized population mean.\n",
1082+
"\n",
1083+
"**Assumptions**\n",
1084+
"- Observations in each sample are independent and identically distributed (iid).\n",
1085+
"- Observations are continuous measurements.\n",
1086+
"- Data is normally distributed.\n",
1087+
"\n",
1088+
"**Interpretation**\n",
1089+
"- H0: $\\mu_{pop}=m_o$.\n",
1090+
"- H1: $\\mu_{pop}\\ne m_o$."
10821091
]
10831092
},
10841093
{
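
Review note on the one-sample t-test cell above: a minimal sketch of the two-sided test of H0: $\mu_{pop}=m_o$ using `scipy.stats.ttest_1samp`. The helper name `one_sample_t_test`, the sample values, and the hypothesized means are illustrative assumptions, not code or data from the notebook.

```python
import numpy as np
from scipy import stats


def one_sample_t_test(data, popmean, significance_level=0.05):
    """Return (statistic, p_value, reject_h0) for a two-sided one-sample t-test."""
    stat, p = stats.ttest_1samp(data, popmean)
    return stat, p, bool(p < significance_level)


# Hypothetical continuous measurements whose sample mean is 5.0.
sample = np.array([5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9])

# Hypothesized mean equal to the sample mean: no evidence against H0.
stat_same, p_same, reject_same = one_sample_t_test(sample, popmean=5.0)

# Hypothesized mean far from the sample mean: H0 is rejected.
stat_diff, p_diff, reject_diff = one_sample_t_test(sample, popmean=6.0)

print('m_o=5.0: stat=%.3f, p=%.3f, reject H0: %s' % (stat_same, p_same, reject_same))
print('m_o=6.0: stat=%.3f, p=%.3f, reject H0: %s' % (stat_diff, p_diff, reject_diff))
```

The normality assumption listed in the cell applies to the sample itself. With only a handful of observations, as in this toy sample, that assumption is hard to check and matters more.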
@@ -1122,9 +1131,7 @@
11221131
},
11231132
{
11241133
"cell_type": "markdown",
1125-
"metadata": {
1126-
"jp-MarkdownHeadingCollapsed": true
1127-
},
1134+
"metadata": {},
11281135
"source": [
11291136
"## **Bivariate Analysis - Numerical vs Numerical**"
11301137
]
@@ -1238,9 +1245,7 @@
12381245
},
12391246
{
12401247
"cell_type": "markdown",
1241-
"metadata": {
1242-
"jp-MarkdownHeadingCollapsed": true
1243-
},
1248+
"metadata": {},
12441249
"source": [
12451250
"## **Bivariate Analysis - Categorical vs Categorical**"
12461251
]
@@ -1477,9 +1482,7 @@
14771482
},
14781483
{
14791484
"cell_type": "markdown",
1480-
"metadata": {
1481-
"jp-MarkdownHeadingCollapsed": true
1482-
},
1485+
"metadata": {},
14831486
"source": [
14841487
"## **Bivariate Analysis - Numerical vs Categorical**"
14851488
]
@@ -1735,9 +1738,7 @@
17351738
},
17361739
{
17371740
"cell_type": "markdown",
1738-
"metadata": {
1739-
"jp-MarkdownHeadingCollapsed": true
1740-
},
1741+
"metadata": {},
17411742
"source": [
17421743
"## **This is not the end!**\n",
17431744
"\n",
