Commit b85bed6

author
ivankozlov98
committed
MLTOOLS-3914 add example files with dataset and columns description
Note: mandatory check (NEED_CHECK) was skipped ref:69aea90efc938f5eb5e4c63669c8a2998d290caa
1 parent 74dac26 commit b85bed6

File tree

8 files changed

+339
-1
lines changed

README.md

+1-1
@@ -18,7 +18,7 @@ It's better to start CatBoost exploring from this basic tutorials.
1818

1919
### Command line
2020

21-
* [Command Line Tutorial](cmdline_tutorial.md)
21+
* [Command Line Tutorial](cmdline_tutorial/cmdline_tutorial.md)
2222
* This tutorial shows how to train and apply model with the command line tool.
2323

2424
## Classification

cmdline_tutorial/cmdline_tutorial.md

+85
@@ -0,0 +1,85 @@
1+
# CatBoost command line tutorial
2+
### Train classification model
3+
4+
Train a classification model in silent mode. Parameters that are not passed on the command line keep their default values.
5+
6+
```
7+
catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --iterations 1000 --learning-rate 0.03
8+
```
9+
10+
### Train regression model on csv file with header
11+
12+
Train a regression model with 1000 trees on a comma-separated pool with a header. Header entries that are not feature names are ignored. In this example, feature names are given both in the column description file and in the dataset header. When names appear in both places, only the ones from the cd-file are used:
13+
14+
```
15+
catboost fit --learn-set train.csv --test-set test.csv --column-description train.cd --loss-function RMSE --iterations 1000 --delimiter="," --has-header
16+
```
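The precedence rule stated above can be sketched in Python. This is only an illustration of the rule, not CatBoost's loader: the helper `resolve_feature_names` is hypothetical, and the header and cd-file names are copied inline from ```train.csv``` and ```train.cd```.

```python
def resolve_feature_names(header, cd_names):
    """Return one name per column: the cd-file name if present, else the header entry."""
    return [cd_names.get(index, name) for index, name in enumerate(header)]


# Header row of train.csv and the names given in train.cd (column index -> name).
header = ["1", "2", "3", "4", "5", "6"]
cd_names = {1: "profession", 2: "name", 3: "season"}

resolved = resolve_feature_names(header, cd_names)
print(resolved)
```

Columns 1 to 3 take their names from the cd-file. The remaining columns have no cd-file name, so this sketch falls back to the header entry for them.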
17+
18+
### Train classification model in verbose mode with multiple error functions
19+
20+
Additional information can be calculated during training, such as the current error on the learn set and the current and best error on the test set. Verbose mode also displays the elapsed and remaining time.
21+
The ```--custom-loss``` parameter logs additional error functions on the learn and test sets at each iteration. By default, the model is saved to the ```model.bin``` file.
22+
23+
```
24+
catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --iterations 1000 --custom-loss="AUC,Precision,Recall" --learning-rate 0.03 --verbose 10
25+
```
26+
Example test\_error.tsv result:
27+
```
28+
iter Logloss AUC Precision Recall
29+
0 0.6913617239 0.5 1 0
30+
1 0.6895846977 0.5520833333 1 0
31+
2 0.6881428049 0.5520833333 1 0
32+
3 0.686666081 0.5520833333 1 0
33+
4 0.6851113844 0.5520833333 1 0
34+
```
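A log like the one above can be inspected with a few lines of Python. The sketch below is a minimal example that assumes the file is tab-separated with the header shown. The helper name `best_iteration` is hypothetical, and the sample rows are copied inline so the snippet runs without the file.

```python
import csv
import io

# The first rows of the example test_error.tsv, assumed to be tab-separated.
TEST_ERROR_TSV = (
    "iter\tLogloss\tAUC\tPrecision\tRecall\n"
    "0\t0.6913617239\t0.5\t1\t0\n"
    "1\t0.6895846977\t0.5520833333\t1\t0\n"
    "2\t0.6881428049\t0.5520833333\t1\t0\n"
    "3\t0.686666081\t0.5520833333\t1\t0\n"
    "4\t0.6851113844\t0.5520833333\t1\t0\n"
)


def best_iteration(tsv_text, metric="Logloss"):
    """Return (iteration, value) for the row with the lowest value of `metric`."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best = min(rows, key=lambda row: float(row[metric]))
    return int(best["iter"]), float(best[metric])


print(best_iteration(TEST_ERROR_TSV))
```

Taking the minimum is appropriate for Logloss, where lower is better. For a metric such as AUC, the maximum would be used instead.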
35+
36+
### Calculate feature importances
37+
38+
The model was saved to ```model.bin``` by default, so that file is passed as the value of the ```-m``` option. The feature analysis data is written to ```feature_strength.tsv```:
39+
40+
```
41+
catboost fstr -m model.bin --input-path train.tsv --cd train.cd --fstr-type PredictionValuesChange -o feature_strength.tsv
42+
```
43+
44+
### Applying the model
45+
46+
Calculate model predictions on ```test.tsv```. The output contains DocId, the predicted probability of class 1, the target column, the ```name``` and ```profession``` columns, and ```#3```, which refers to the column with index 3 in the dataset (zero-based, as in ```train.cd```; here, the season column). The results are written to ```eval.tsv```:
47+
48+
```
49+
catboost calc -m model.bin --input-path test.tsv --cd train.cd -o eval.tsv -T 4 --output-columns DocId,Probability,Target,name,profession,#3
50+
```
51+
52+
Example eval.tsv result:
53+
54+
```
55+
DocId Probability Target name profession #3
56+
0 0.1263071371 0 Alex doctor winter
57+
1 0.1558141636 1 Demid dentist summer
58+
2 0.3505153052 1 Valentin programmer spring
59+
3 0.5650058751 1 Ivan doctor summer
60+
4 0.102227666 0 Ivan dentist spring
61+
```
62+
63+
### Random subspace method
64+
65+
To enable the random subspace method (feature bagging), use the ```--rsm``` parameter:
66+
```
67+
catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --rsm 0.5 --iterations 1000 --learning-rate 0.03
68+
```
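The idea behind the random subspace method can be sketched in Python: at each split selection only a random fraction of the features is considered. This is a conceptual illustration, not CatBoost's implementation. The helper `sample_feature_subset`, the rounding rule, and the placeholder name `column_4` for the unnamed categorical column are all assumptions.

```python
import random


def sample_feature_subset(features, rsm, rng):
    """Pick a random subset holding roughly `rsm` of the features (at least one)."""
    count = max(1, round(len(features) * rsm))
    return rng.sample(features, count)


# Feature columns from train.cd; "column_4" stands in for the unnamed one.
features = ["profession", "name", "season", "column_4"]
rng = random.Random(0)  # fixed seed so repeated runs give the same subsets

# A fresh subset is drawn for every split selection.
for split in range(3):
    print(split, sample_feature_subset(features, 0.5, rng))
```

With four features and a value of 0.5, each draw considers two features.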
69+
70+
### Params file
71+
72+
Training parameters can also be passed in a file:
73+
```
74+
{
75+
"thread_count": 4,
76+
"loss_function": "Logloss",
77+
"iterations": 400
78+
}
79+
```
80+
And run the algorithm as follows:
81+
```
82+
catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --params-file params_file.txt --iterations 1000 --learning-rate 0.03
83+
```
84+
85+
If a parameter is specified both in the file and on the command line, the command-line value is used. This example demonstrates that behavior: ```iterations``` is set to 400 in the file and to 1000 on the command line, so the command-line value of 1000 applies.
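The precedence between the params file and the command line can be modeled as a dictionary merge. The Python sketch below only illustrates the rule and does not reproduce CatBoost's option parsing. The helper `merge_params` is hypothetical, and the file contents are copied inline from ```params_file.txt```.

```python
import json


def merge_params(file_text, cli_overrides):
    """Load parameters from JSON text, then let command-line values override them."""
    params = json.loads(file_text)
    params.update(cli_overrides)  # command-line values win on conflicts
    return params


PARAMS_FILE = '{"thread_count": 4, "loss_function": "Logloss", "iterations": 400}'
cli_overrides = {"iterations": 1000, "learning_rate": 0.03}

effective = merge_params(PARAMS_FILE, cli_overrides)
print(effective)
```

Here `iterations` ends up as 1000, while `thread_count` and `loss_function` keep the values from the file.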

cmdline_tutorial/params_file.txt

+5
@@ -0,0 +1,5 @@
1+
{
2+
"thread_count": 4,
3+
"loss_function": "Logloss",
4+
"iterations": 400
5+
}

cmdline_tutorial/test.csv

+21
@@ -0,0 +1,21 @@
1+
1,2,3,4,5,6
2+
0,lawyer,Ivan,spring,junior,213
3+
1,doctor,Demid,summer,junior,344
4+
1,lawyer,Milan,autumn,junior,4344
5+
1,doctor,Ivan,summer,junior,344334
6+
0,lawyer,Demid,winter,middle,21
7+
1,programmer,Demid,summer,middle,344334
8+
1,doctor,Ivan,summer,middle,4344
9+
0,programmer,Milan,summer,middle,4344
10+
0,programmer,Ivan,summer,middle,344
11+
1,lawyer,Alex,summer,senior,344334
12+
1,lawyer,Ivan,autumn,middle,4344
13+
0,lawyer,Milan,autumn,junior,344
14+
0,lawyer,Alex,spring,junior,344
15+
1,programmer,Alex,spring,middle,344354
16+
1,programmer,Demid,summer,senior,4344
17+
1,lawyer,Ivan,winter,middle,344334
18+
0,lawyer,Ivan,autumn,junior,344334
19+
1,dentist,Ivan,spring,middle,4344
20+
1,dentist,Ivan,autumn,middle,344
21+
0,programmer,Alex,summer,middle,213

cmdline_tutorial/test.tsv

+20
@@ -0,0 +1,20 @@
1+
0 doctor Alex winter senior 344
2+
1 dentist Demid summer junior 344
3+
1 programmer Valentin spring senior 344334
4+
1 doctor Ivan summer middle 344354
5+
0 dentist Ivan spring senior 4344
6+
0 dentist Valentin spring middle 344
7+
0 lawyer Milan winter middle 4344
8+
0 programmer Milan winter senior 344334
9+
1 lawyer Valentin winter senior 344334
10+
1 doctor Demid spring middle 213
11+
0 dentist Valentin winter middle 4344
12+
0 programmer Milan summer senior 344334
13+
1 dentist Valentin summer junior 344334
14+
0 doctor Ivan autumn senior 213
15+
0 dentist Valentin summer senior 213
16+
0 dentist Milan autumn senior 344354
17+
0 lawyer Demid autumn middle 4344
18+
0 programmer Valentin autumn senior 344354
19+
1 lawyer Milan autumn senior 4344
20+
1 lawyer Valentin autumn junior 344

cmdline_tutorial/train.cd

+6
@@ -0,0 +1,6 @@
1+
0 Target
2+
1 Categ profession
3+
2 Categ name
4+
3 Categ season
5+
4 Categ
6+
5 Auxiliary
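To make the layout of the column description file concrete, here is a small Python sketch that reads it into a dictionary. It assumes each line holds a column index, a column type, and an optional name. It splits on any whitespace, so it also assumes names contain no spaces. The helper `parse_column_description` is hypothetical and is not part of CatBoost.

```python
def parse_column_description(text):
    """Parse cd-file text into {column_index: (column_type, name_or_None)}."""
    columns = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        index, column_type = int(parts[0]), parts[1]
        name = parts[2] if len(parts) > 2 else None
        columns[index] = (column_type, name)
    return columns


# Contents of train.cd, copied inline so the snippet runs on its own.
CD_TEXT = """0 Target
1 Categ profession
2 Categ name
3 Categ season
4 Categ
5 Auxiliary
"""

columns = parse_column_description(CD_TEXT)
categorical = [index for index, (kind, _) in columns.items() if kind == "Categ"]
print(categorical)
```

Column 0 is the target, columns 1 to 4 are categorical features (the last one unnamed), and column 5 is auxiliary data that is not used as a feature.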

cmdline_tutorial/train.csv

+101
@@ -0,0 +1,101 @@
1+
1,2,3,4,5,6
2+
1,lawyer,Milan,winter,senior,344
3+
0,lawyer,Ivan,spring,junior,344354
4+
1,dentist,Valentin,summer,middle,21
5+
0,doctor,Ivan,autumn,senior,21
6+
0,lawyer,Milan,spring,middle,213
7+
1,doctor,Alex,winter,senior,4344
8+
1,programmer,Alex,winter,junior,4344
9+
0,lawyer,Demid,spring,senior,344354
10+
0,lawyer,Milan,summer,senior,344354
11+
0,programmer,Milan,spring,senior,344
12+
0,doctor,Alex,spring,middle,21
13+
1,dentist,Ivan,winter,middle,4344
14+
1,dentist,Demid,winter,middle,344334
15+
0,programmer,Alex,winter,junior,4344
16+
0,programmer,Valentin,winter,junior,344
17+
0,dentist,Alex,autumn,middle,213
18+
1,dentist,Alex,autumn,junior,344
19+
1,dentist,Milan,winter,senior,21
20+
0,dentist,Demid,spring,senior,213
21+
1,dentist,Milan,winter,junior,4344
22+
1,programmer,Milan,winter,middle,344354
23+
0,doctor,Ivan,autumn,senior,344354
24+
1,programmer,Milan,spring,junior,344354
25+
0,doctor,Ivan,autumn,senior,344334
26+
0,dentist,Demid,summer,senior,4344
27+
0,doctor,Milan,spring,junior,344
28+
0,programmer,Milan,summer,senior,213
29+
1,doctor,Milan,winter,senior,344334
30+
1,dentist,Alex,spring,junior,344
31+
1,dentist,Milan,winter,middle,344334
32+
1,doctor,Demid,autumn,middle,21
33+
1,doctor,Milan,spring,middle,344354
34+
0,dentist,Demid,winter,senior,21
35+
0,lawyer,Milan,autumn,junior,21
36+
1,lawyer,Demid,spring,junior,344
37+
1,programmer,Valentin,autumn,junior,4344
38+
0,doctor,Valentin,autumn,middle,344354
39+
0,lawyer,Milan,autumn,middle,4344
40+
1,doctor,Valentin,winter,middle,344
41+
1,lawyer,Demid,spring,middle,213
42+
0,dentist,Ivan,summer,senior,21
43+
0,dentist,Alex,summer,junior,21
44+
1,programmer,Ivan,autumn,junior,344
45+
1,dentist,Ivan,summer,middle,213
46+
0,doctor,Alex,winter,senior,344
47+
0,doctor,Valentin,winter,middle,213
48+
1,doctor,Milan,spring,junior,4344
49+
0,dentist,Valentin,summer,junior,344
50+
1,programmer,Milan,summer,middle,344334
51+
1,dentist,Demid,spring,middle,213
52+
0,dentist,Valentin,summer,junior,344334
53+
0,doctor,Alex,summer,senior,21
54+
1,programmer,Demid,spring,junior,21
55+
1,doctor,Ivan,autumn,middle,213
56+
1,doctor,Valentin,winter,senior,4344
57+
1,programmer,Milan,winter,senior,21
58+
1,programmer,Demid,autumn,middle,344354
59+
0,programmer,Demid,autumn,middle,4344
60+
1,dentist,Demid,winter,senior,21
61+
0,doctor,Valentin,winter,senior,213
62+
1,lawyer,Valentin,autumn,junior,213
63+
1,dentist,Milan,spring,senior,213
64+
0,doctor,Milan,autumn,senior,4344
65+
0,dentist,Milan,summer,junior,21
66+
0,dentist,Demid,summer,senior,344334
67+
0,programmer,Alex,summer,senior,213
68+
1,lawyer,Demid,autumn,middle,344354
69+
0,dentist,Alex,autumn,junior,344
70+
0,dentist,Valentin,spring,senior,344354
71+
0,lawyer,Milan,spring,senior,344334
72+
0,doctor,Ivan,autumn,senior,344354
73+
1,dentist,Alex,summer,junior,213
74+
0,programmer,Valentin,winter,junior,4344
75+
1,doctor,Demid,autumn,senior,344
76+
0,programmer,Alex,winter,junior,213
77+
0,dentist,Ivan,autumn,middle,344
78+
1,dentist,Demid,winter,middle,344354
79+
0,dentist,Alex,spring,junior,21
80+
1,programmer,Alex,winter,middle,344354
81+
1,dentist,Milan,spring,senior,21
82+
0,programmer,Demid,winter,senior,344
83+
1,doctor,Ivan,spring,middle,213
84+
0,lawyer,Valentin,spring,senior,344354
85+
0,lawyer,Alex,autumn,middle,344334
86+
1,programmer,Alex,summer,middle,344
87+
0,dentist,Ivan,autumn,middle,344334
88+
0,doctor,Alex,summer,senior,344334
89+
0,lawyer,Ivan,spring,middle,21
90+
0,programmer,Demid,spring,senior,344334
91+
0,lawyer,Valentin,summer,junior,213
92+
1,doctor,Demid,autumn,senior,344
93+
1,lawyer,Ivan,winter,senior,21
94+
1,lawyer,Milan,summer,junior,21
95+
0,lawyer,Valentin,summer,junior,213
96+
1,lawyer,Alex,summer,middle,344
97+
1,lawyer,Demid,autumn,middle,344334
98+
0,lawyer,Ivan,summer,middle,21
99+
1,lawyer,Alex,winter,middle,344354
100+
0,dentist,Milan,autumn,senior,21
101+
0,lawyer,Milan,autumn,junior,4344

cmdline_tutorial/train.tsv

+100
@@ -0,0 +1,100 @@
1+
0 dentist Alex winter senior 4344
2+
0 dentist Ivan winter middle 21
3+
0 dentist Ivan autumn junior 213
4+
1 dentist Milan winter senior 21
5+
0 dentist Valentin spring junior 21
6+
1 dentist Demid winter junior 344
7+
0 lawyer Milan spring senior 344354
8+
0 lawyer Valentin spring junior 344
9+
1 lawyer Ivan summer middle 344354
10+
0 lawyer Ivan autumn middle 21
11+
0 doctor Milan spring senior 344334
12+
1 programmer Demid spring junior 213
13+
0 doctor Milan winter senior 21
14+
1 dentist Demid winter junior 344354
15+
0 dentist Alex autumn senior 344334
16+
0 dentist Valentin spring senior 344334
17+
1 programmer Alex summer middle 21
18+
0 programmer Demid spring junior 213
19+
1 doctor Ivan spring middle 213
20+
1 doctor Demid spring middle 344334
21+
0 programmer Milan winter middle 213
22+
0 programmer Ivan autumn senior 21
23+
0 lawyer Demid summer middle 344
24+
0 doctor Alex summer senior 213
25+
0 programmer Demid autumn middle 344334
26+
1 programmer Alex spring junior 213
27+
0 lawyer Alex summer junior 21
28+
0 lawyer Alex summer junior 4344
29+
1 lawyer Alex winter junior 4344
30+
1 programmer Ivan summer middle 344334
31+
0 programmer Demid summer senior 4344
32+
0 doctor Ivan spring senior 213
33+
0 programmer Ivan spring senior 213
34+
0 programmer Ivan winter middle 344
35+
0 doctor Ivan winter junior 344354
36+
0 lawyer Alex spring junior 213
37+
1 programmer Milan autumn senior 213
38+
1 doctor Alex spring senior 213
39+
0 programmer Ivan spring middle 344334
40+
0 lawyer Demid autumn senior 21
41+
1 doctor Demid winter middle 21
42+
1 doctor Demid autumn middle 4344
43+
0 dentist Demid autumn middle 213
44+
0 doctor Ivan winter senior 213
45+
1 lawyer Milan autumn middle 213
46+
0 doctor Demid spring middle 344334
47+
1 dentist Valentin winter junior 344
48+
1 lawyer Demid summer middle 21
49+
1 lawyer Ivan autumn junior 344
50+
0 programmer Demid autumn middle 213
51+
1 lawyer Alex spring junior 344354
52+
0 dentist Valentin spring middle 344354
53+
1 lawyer Alex summer middle 213
54+
0 doctor Valentin spring middle 344354
55+
0 dentist Valentin autumn senior 344
56+
1 doctor Valentin winter junior 344
57+
0 lawyer Valentin autumn senior 344334
58+
0 lawyer Alex winter junior 344354
59+
0 lawyer Valentin spring junior 344
60+
1 programmer Ivan autumn middle 213
61+
0 dentist Demid winter junior 21
62+
0 programmer Demid summer middle 21
63+
0 lawyer Alex summer middle 344
64+
1 programmer Valentin spring senior 344
65+
0 dentist Milan winter junior 344
66+
0 doctor Ivan summer middle 213
67+
0 lawyer Ivan winter junior 344354
68+
1 lawyer Ivan summer middle 344
69+
0 dentist Demid summer junior 344334
70+
1 programmer Demid spring senior 344334
71+
1 doctor Milan autumn senior 4344
72+
0 dentist Milan autumn junior 213
73+
0 dentist Alex autumn senior 213
74+
1 doctor Ivan summer junior 213
75+
0 dentist Valentin spring junior 344334
76+
1 programmer Milan summer senior 344354
77+
0 doctor Milan spring senior 213
78+
1 programmer Valentin spring junior 344334
79+
0 lawyer Ivan autumn middle 21
80+
1 lawyer Ivan autumn junior 21
81+
0 dentist Milan summer senior 4344
82+
0 programmer Milan spring middle 344354
83+
0 programmer Ivan autumn junior 344334
84+
1 dentist Alex summer middle 4344
85+
1 programmer Ivan autumn junior 344354
86+
0 dentist Milan winter junior 4344
87+
0 doctor Ivan autumn senior 21
88+
0 dentist Milan autumn middle 21
89+
0 lawyer Ivan winter senior 4344
90+
0 doctor Demid autumn junior 4344
91+
0 lawyer Valentin summer junior 344354
92+
1 programmer Milan autumn middle 344334
93+
0 dentist Ivan autumn junior 21
94+
0 programmer Demid spring middle 21
95+
0 lawyer Milan summer senior 213
96+
0 programmer Alex winter junior 213
97+
0 programmer Alex spring senior 344334
98+
1 lawyer Demid summer junior 21
99+
1 dentist Demid winter senior 213
100+
0 doctor Ivan spring middle 21
