Refactored

bansalkanav · bansalkanav · commit e0fd5721bbbf · 2024-02-16T12:56:41.000+05:30
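
Note on the `kolmogorov_smirnov` helper touched in this commit: `stats.kstest(data, 'norm')` compares the sample against the standard normal distribution (mean 0, standard deviation 1), and the reference distribution can be named either by string or by a callable CDF such as `stats.norm.cdf`. The sketch below illustrates this on synthetic data. The sample, the seed, and all variable names are assumptions made for illustration and do not come from the notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: normally distributed, but NOT standard normal.
rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=10.0, size=500)

# The reference distribution can be given by name or as a callable CDF.
stat_name, p_name = stats.kstest(sample, 'norm')
stat_cdf, p_cdf = stats.kstest(sample, stats.norm.cdf)

# Both calls above test against N(0, 1), so unscaled data is rejected even
# though it is normal. Passing the sample's own mean and standard deviation
# through `args` tests the shape instead. Estimating these parameters from
# the same sample makes the reported p-value only approximate.
params = (sample.mean(), sample.std(ddof=1))
stat_fit, p_fit = stats.kstest(sample, 'norm', args=params)

print('standard normal: stat=%.3f, p=%.3f' % (stat_name, p_name))
print('fitted normal:   stat=%.3f, p=%.3f' % (stat_fit, p_fit))
```

If the goal is to check normality in general rather than agreement with N(0, 1), the second form (or standardizing the data first) is usually the one that matches the intent.
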
diff --git a/Module 3 - Advance Data Analysis and Statistics/6. Performing Statistical Test/statistical_test_practical_implementation.ipynb b/Module 3 - Advance Data Analysis and Statistics/6. Performing Statistical Test/statistical_test_practical_implementation.ipynb
@@ -9,9 +9,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Concepts**"
    ]
@@ -82,15 +80,19 @@
     "### **Concept 4 - What is p-value?**\n",
     "The p-value, short for \"probability value,\" is a number that helps us understand the strength of evidence against a null hypothesis in hypothesis testing.\n",
     "\n",
+    "**OR**\n",
+    "\n",
+    "The p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. The smaller the p-value, the less likely it is that the result arose by random chance alone, and the stronger the evidence for rejecting the null hypothesis.\n",
+    "\n",
     "To say whether the p-value is significant or not, we need a significance threshold called the **significance level**. This threshold is usually set at 0.05. \n",
     "\n",
     "It's important because it helps control the rate of Type I errors. By setting a threshold, we define the level of evidence needed to reject the null hypothesis. Lower thresholds (e.g., 0.01) require stronger evidence.\n",
     "\n",
     "\n",
-    "If the p-value is **BELOW** the threshold (meaning smaller than), then you can infer a **statistically significant relationship** between the input and target variables.    \n",
+    "If the p-value is **BELOW** the threshold (i.e. smaller than it), you have **statistically significant evidence** against the null hypothesis, i.e. the outcome is unlikely to be due to chance alone.    \n",
     "i.e. For p-value < 0.05, you can Reject the Null Hypothesis\n",
     "\n",
-    "Otherwise, then you can infer **no statistically significant relationship** between the predictor and outcome variables.  \n",
+    "Otherwise, there is **no statistically significant evidence** against the null hypothesis, i.e. the outcome is consistent with random chance.  \n",
     "i.e. For p-value > 0.05, you Fail to Reject the Null Hypothesis\n",
     "\n",
     "Here's a simple explanation:\n",
@@ -149,9 +151,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Importing all the Required Libraries**"
    ]
@@ -173,9 +173,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Loading the Data**"
    ]
@@ -405,9 +403,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Renaming the Columns**"
    ]
@@ -478,9 +474,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Univariate Analysis - Discrete Data**"
    ]
@@ -773,7 +767,15 @@
    "source": [
     "### Chi-Square Test for Goodness-of-fit\n",
     "\n",
-    "Tests whether the observed frequencies of categorical data match the expected frequencies according to a specified distribution."
+    "Tests whether the observed frequencies of categorical data match the expected frequencies according to a specified distribution.\n",
+    "\n",
+    "**Assumptions**\n",
+    "- Observations in each sample are independent and identically distributed (iid).\n",
+    "- Observations are categorical (discrete) counts, not percentages or continuous measurements.\n",
+    "\n",
+    "**Interpretation**\n",
+    "- H0: The observed frequencies match the expected frequencies.\n",
+    "- H1: The observed frequencies do not match the expected frequencies."
    ]
   },
   {
@@ -818,9 +820,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Univariate Analysis - Numerical Data**"
    ]
@@ -841,9 +841,7 @@
   {
    "cell_type": "code",
    "execution_count": 16,
-   "metadata": {
-    "scrolled": true
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -1029,12 +1027,13 @@
     "### Kolmogorov-Smirnov Test\n",
     "\n",
     "The Kolmogorov-Smirnov (KS) test is used to check if a sample follows a specific distribution, including normal distribution. \n",
+    "\n",
     "**Assumptions**\n",
     "- Observations in each sample are independent and identically distributed (iid).\n",
     "\n",
     "**Interpretation**\n",
     "- H0: the sample has a distribution.\n",
-    "- H1: the sample does not have that distribution.\n"
+    "- H1: the sample does not have that distribution."
    ]
   },
   {
@@ -1044,6 +1043,7 @@
    "outputs": [],
    "source": [
     "def kolmogorov_smirnov(data, significance_level):\n",
+    "    # 'norm' can also be given as the callable stats.norm.cdf; both test against the standard normal N(0, 1)\n",
     "    stat, p = stats.kstest(data, 'norm')\n",
     "    \n",
     "    print('stat=%.3f, p=%.3f' % (stat, p))\n",
@@ -1078,7 +1078,16 @@
    "source": [
     "### One-Sample t-test\n",
     "\n",
-    "Tests whether the mean of a single sample is significantly different from a known or hypothesized population mean."
+    "Tests whether the mean of a single sample is significantly different from a known or hypothesized population mean.\n",
+    "\n",
+    "**Assumptions**\n",
+    "- Observations in each sample are independent and identically distributed (iid).\n",
+    "- Observations are continuous measurements.\n",
+    "- Data is normally distributed.\n",
+    "\n",
+    "**Interpretation**\n",
+    "- H0: $\\mu_{pop}=m_0$ (the population mean equals the hypothesized value $m_0$).\n",
+    "- H1: $\\mu_{pop}\\ne m_0$."
    ]
   },
   {
@@ -1122,9 +1131,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Bivariate Analysis - Numerical vs Numerical**"
    ]
@@ -1238,9 +1245,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Bivariate Analysis - Categorical vs Categorical**"
    ]
@@ -1477,9 +1482,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **Bivariate Analysis - Numerical vs Categorical**"
    ]
@@ -1735,9 +1738,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## **This is not the end!**\n",
     "\n",