Commit 1b213d8

Update Readme.md
1 parent a285d06 commit 1b213d8

File tree

1 file changed

+17
-1
lines changed

Complexity_Learning_curves/Readme.md

+17-1
@@ -1,5 +1,21 @@
11
## Complexity and Learning curves
2-
Complexity and learning curve analyses are some of the most important tasks in a Machine Learning project.
2+
The modern enterprise of data science is powered at its core by advanced statistical modeling and machine learning (ML). Countless courses and articles teach budding data scientists the concepts of training and test sets, cross-validation, and the computation of error metrics such as the confusion matrix and the F1 score.
3+
4+
However, good modeling does not end with achieving high accuracy on the test set. That is just the start. There are tasks, often underappreciated, that can make an ML model robust and ready for production-level scaling. Learners and practitioners of data science should build these tasks into their modeling pipeline as much as possible, to show that they care not only about algorithmic performance but also about how the data science pipeline ultimately helps solve a business or scientific problem.
5+
6+
In this repo, we have notebooks outlining two such simple yet effective techniques and showing how they can be combined with widely popular ML algorithms for a modeling task.
7+
8+
### Learning and complexity curves
9+
10+
Complexity and learning curve analyses are essentially part of the visual analytics that a data scientist must perform, using the available dataset, to compare the merits of various ML algorithms.
11+
12+
Often, a data scientist has a plethora of ML algorithms to choose from for a given business question and a specific data set.
13+
14+
“Should I use logistic regression or k-nearest neighbors for the classification task? How about a Support Vector Machine classifier? But which kernel should I choose?”
15+
16+
Ultimately, a lot of experimentation with the data and algorithms is needed to construct a good methodology. That is the true spirit of the data science process, i.e., experimentation with the core elements (the dataset and the processing algorithms) in a scientific manner. In support of that process, sampling and time complexity analyses can often guide a data scientist, in a systematic manner, to the choice of a suitable ML algorithm for the job at hand.
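
As a rough illustration of a time complexity analysis, the sketch below times brute-force nearest-neighbor prediction as the training set grows. It uses only the Python standard library. The helpers `make_points`, `predict_1nn`, and `time_predictions`, as well as the synthetic two-blob data, are assumptions made for this example and are not taken from the notebooks in this repo.

```python
import random
import time


def make_points(n, seed):
    """Hypothetical synthetic data: two overlapping 2-D Gaussian blobs."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label  # class 0 near (0, 0), class 1 near (2, 2)
        points.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return points


def predict_1nn(train, query):
    """Brute-force 1-nearest-neighbor: scan every training point."""
    nearest = min(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    return nearest[1]


def time_predictions(train_sizes, n_queries=50):
    """Return (training size, seconds to classify all queries) pairs."""
    queries = [point for point, _ in make_points(n_queries, seed=1)]
    timings = []
    for n in train_sizes:
        train = make_points(n, seed=2)
        start = time.perf_counter()
        for query in queries:
            predict_1nn(train, query)
        timings.append((n, time.perf_counter() - start))
    return timings


# Doubling the training size lets the growth trend show up in the timings.
for n, seconds in time_predictions([200, 400, 800, 1600]):
    print(f"n={n:5d}  prediction time={seconds:.4f} s")
```

Plotting the measured time against the training size, and repeating the measurement for each candidate algorithm, gives one way to compare how the candidates scale. Absolute timings depend on the machine, so only the trend is meaningful.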
17+
18+
---
319

420
**Learning curve**: A graph that compares the performance of a model on training and testing data over a varying number of training instances.
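
The definition above can be sketched in a few lines. The example below is a minimal, standard-library-only illustration: it trains a simple k-nearest-neighbor classifier on growing subsets of a training pool and records training and test accuracy for each subset size. The helper names (`make_points`, `predict_knn`, `accuracy`, `learning_curve_points`) and the synthetic data are assumptions for this sketch, not the code used in the notebooks.

```python
import random


def make_points(n, seed):
    """Hypothetical synthetic data: two overlapping 2-D Gaussian blobs."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label  # class 0 near (0, 0), class 1 near (2, 2)
        points.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return points


def predict_knn(train, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_one > len(nearest) else 0


def accuracy(train, data, k):
    """Fraction of points in `data` classified correctly using `train`."""
    correct = sum(predict_knn(train, point, k) == label for point, label in data)
    return correct / len(data)


def learning_curve_points(train_pool, test, sizes, k=5):
    """Return (n, training accuracy, test accuracy) for each training size n."""
    curve = []
    for n in sizes:
        subset = train_pool[:n]  # the first n training instances
        curve.append((n, accuracy(subset, subset, k), accuracy(subset, test, k)))
    return curve


train_pool = make_points(200, seed=0)
test = make_points(100, seed=1)
for n, train_acc, test_acc in learning_curve_points(train_pool, test, [10, 25, 50, 100, 200]):
    print(f"n={n:3d}  train accuracy={train_acc:.2f}  test accuracy={test_acc:.2f}")
```

Plotting the two accuracy columns against `n` gives the learning curve. In practice, a library utility such as scikit-learn's `learning_curve` is a more common choice, since it also averages the scores over cross-validation splits.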
521
