Commit 9e04f78

No public description
PiperOrigin-RevId: 685510840
1 parent 7dc7c86 commit 9e04f78

File tree: 3 files changed, +90 −92 lines

official/projects/waste_identification_ml/README.md (−92 lines)
@@ -49,98 +49,6 @@ Material Form Model V2 | Resnet | saved model | [click here](https://storage.goo
Material Type Model V2| MobileNet | saved model | [click here](https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/mobilenet_material.zip)
Material Form Model V2| MobileNet | saved model | [click here](https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/mobilenet_material_form.zip)

## Training Guide

1. Create a VM instance in Compute Engine of Google Cloud Platform with the desired number of GPUs.
2. Make sure a compatible CUDA version is installed. Check your GPU devices using the `nvidia-smi` command.
3. SSH to your VM instance in Compute Engine and create a conda environment: `conda create -n circularnet-train python=3.11`
4. Activate your conda environment: `conda activate circularnet-train`
5. Install the following libraries: `pip install tensorflow[and-cuda] tf-models-official`
6. Move your data into a GCP bucket or inside the VM instance (here it was moved inside the VM instance). Your data should be in the TFRecord format.
7. Move the configuration file for model training inside the VM as well.
8. Your configuration file contains all the parameters and the paths to your datasets. An example configuration file for GPU training has been uploaded in the same directory with the name `config.yaml`.
9. Create a directory where you want to save the output checkpoints.
10. Run the following command to initiate the training: `python -m official.vision.train --experiment="maskrcnn_resnetfpn_coco" --mode="train_and_eval" --model_dir="output_directory" --config_file="config.yaml"`
11. You can also start a screen session and run the training in the background.

## Config file parameters

- `annotation_file` - path to the validation file in COCO JSON format.
- `init_checkpoint` - path to the checkpoints for transfer learning.
- `init_checkpoint_modules` - whether to load the backbone, the decoder, or both.
- `freeze_backbone` - whether to freeze the backbone while training.
- `input_size` - image size according to which the model is trained.
- `num_classes` - total number of classes + 1 (background).
- `per_category_metrics` - in case you need metrics for each class.
- `global_batch_size` - batch size.
- `input_path` - path to the dataset.
- `parser` - contains the data augmentation operations.
- `steps_per_loop` - number of steps to complete one epoch. It's usually `training data size / batch size`.
- `summary_interval` - how often you want to plot the metrics.
- `train_steps` - total steps for training. It's equal to `steps_per_loop x epochs`.
- `validation_interval` - how often you want to evaluate the validation data.
- `validation_steps` - steps to cover the validation data. It's equal to `validation data size / batch size`.
- `warmup_learning_rate` - a strategy that gradually increases the learning rate from a very low value to the desired initial learning rate over a predefined number of iterations or epochs. It stabilizes training in the early stages by allowing the model to adapt to the data slowly before using a higher learning rate.
- `warmup_steps` - steps for the warmup learning rate.
- `initial_learning_rate` - the value of the learning rate at the very start of the training process.
- `checkpoint_interval` - number of steps to export the model.

A common practice to calculate the parameters is shown below:

```python
total_training_samples = 4389
total_validation_samples = 485

train_batch_size = 512
val_batch_size = 128
num_epochs = 700
warmup_learning_rate = 0.0001
initial_learning_rate = 0.001

steps_per_loop = total_training_samples // train_batch_size
summary_interval = steps_per_loop
train_steps = num_epochs * steps_per_loop
validation_interval = steps_per_loop
validation_steps = total_validation_samples // val_batch_size
warmup_steps = steps_per_loop * 10
checkpoint_interval = steps_per_loop * 5
decay_steps = int(train_steps)

print(f'steps_per_loop: {steps_per_loop}')
print(f'summary_interval: {summary_interval}')
print(f'train_steps: {train_steps}')
print(f'validation_interval: {validation_interval}')
print(f'validation_steps: {validation_steps}')
print(f'warmup_steps: {warmup_steps}')
print(f'warmup_learning_rate: {warmup_learning_rate}')
print(f'initial_learning_rate: {initial_learning_rate}')
print(f'decay_steps: {decay_steps}')
print(f'checkpoint_interval: {checkpoint_interval}')
```

## Authors and Maintainers
- Umair Sabir
- Sujit Sanjeev
@@ -0,0 +1,90 @@
# CircularNet Fine-tuning Guide

## Steps to fine-tune CircularNet on a custom training dataset
1. Create a VM instance in Compute Engine on Google Cloud Platform with the desired number of GPUs.
2. Install a compatible CUDA version and validate the GPU devices using the `nvidia-smi` command (an optional TensorFlow-side check is sketched after this list).
3. SSH into the VM instance in Compute Engine and create a conda environment: `conda create -n circularnet-train python=3.11`
4. Activate the conda environment: `conda activate circularnet-train`
5. Install the following libraries: `pip install tensorflow[and-cuda] tf-models-official`
6. Move the training data, in TFRecord format, to a GCP bucket or into the VM instance (a minimal TFRecord-writing sketch follows this list).
7. Move the configuration file for model training into the VM. The configuration file contains all the parameters and the paths to the datasets. A sample configuration file, `config.yaml`, is provided for GPU training, and a description of its main entries is given below.
8. Create a directory to save the output checkpoints.
9. Run the following command to initiate the training: `python -m official.vision.train --experiment="circularnet_finetuning" --mode="train_and_eval" --model_dir="output_directory" --config_file="config.yaml"`
10. Training can also be run in the background by starting a screen session.
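As a quick, optional complement to `nvidia-smi` in step 2, it can help to confirm that TensorFlow itself sees the GPUs from inside the conda environment. The check below uses only standard TensorFlow APIs and is not part of the original guide:

```python
import tensorflow as tf

# Lists the GPUs visible to TensorFlow. An empty list usually means the CUDA
# driver/toolkit or the tensorflow[and-cuda] installation needs attention.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {len(gpus)}")
for gpu in gpus:
    print(gpu)
```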
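For step 6, the guide assumes the data is already in TFRecord format. The sketch below only illustrates the TFRecord mechanics with `tf.train.Example`; the file name and feature keys are placeholders, and a real conversion should follow the COCO-style schema expected by the Model Garden input pipeline rather than this minimal example:

```python
import tensorflow as tf

def _bytes_feature(value: bytes) -> tf.train.Feature:
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value: int) -> tf.train.Feature:
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Placeholder file and feature names, for illustration only.
with tf.io.TFRecordWriter("train-00000-of-00001.tfrecord") as writer:
    image_bytes = tf.io.read_file("sample_image.jpg").numpy()
    example = tf.train.Example(features=tf.train.Features(feature={
        "image/encoded": _bytes_feature(image_bytes),
        "image/height": _int64_feature(1024),
        "image/width": _int64_feature(1024),
        # The full schema also carries class labels, boxes, and masks.
    }))
    writer.write(example.SerializeToString())
```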

## Config file parameters

- `annotation_file` - path to the validation file in COCO JSON format.
- `init_checkpoint` - path to the checkpoints used for transfer learning; these should be the CircularNet checkpoints.
- `init_checkpoint_modules` - whether to load the backbone, the decoder, or both from the checkpoint.
- `freeze_backbone` - whether to freeze the backbone during training.
- `input_size` - image size according to which the model is trained.
- `num_classes` - total number of classes + 1 (background).
- `per_category_metrics` - whether to report metrics for each class separately.
- `global_batch_size` - batch size.
- `input_path` - path to the input dataset.
- `parser` - contains the data augmentation operations.
- `steps_per_loop` - number of steps to complete one epoch. It's usually `training data size / batch size`.
- `summary_interval` - interval (in steps) at which the metrics are plotted.
- `train_steps` - total steps for training. It's equal to `steps_per_loop x epochs`.
- `validation_interval` - interval (in steps) at which the validation data is evaluated.
- `validation_steps` - steps to cover the validation data. It's equal to `validation data size / batch size`.
- `warmup_learning_rate` - the warm-up phase is an initial stage of training where the learning rate is gradually increased from a very low value to the base learning rate. The `warmup_learning_rate` is typically set to a small fraction of the base learning rate (a short sketch of this schedule follows the list).
- `warmup_steps` - number of steps over which the learning rate warms up.
- `initial_learning_rate` - the value of the learning rate at the very start of the training process.
- `checkpoint_interval` - number of steps between checkpoint exports.
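The three warm-up parameters fit together as follows: for the first `warmup_steps` steps the learning rate ramps from `warmup_learning_rate` up to `initial_learning_rate`, after which the configured decay schedule takes over. Below is a minimal linear-warm-up sketch using the example values from the calculation snippet in the next section; it illustrates the idea rather than the exact schedule implementation in the training library:

```python
def learning_rate_at(step: int,
                     warmup_steps: int = 80,
                     warmup_learning_rate: float = 0.0001,
                     initial_learning_rate: float = 0.001) -> float:
    """Linear warm-up from warmup_learning_rate to initial_learning_rate."""
    if step < warmup_steps:
        fraction = step / warmup_steps
        return warmup_learning_rate + fraction * (initial_learning_rate - warmup_learning_rate)
    # After warm-up, the configured decay schedule would control the rate.
    return initial_learning_rate

for step in (0, 40, 80, 160):
    print(f"step {step}: lr = {learning_rate_at(step):.6f}")
```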

## A common way to calculate these parameters

```python
total_training_samples = 4389
total_validation_samples = 485

train_batch_size = 512
val_batch_size = 128
num_epochs = 700
warmup_learning_rate = 0.0001
initial_learning_rate = 0.001

steps_per_loop = total_training_samples // train_batch_size
summary_interval = steps_per_loop
train_steps = num_epochs * steps_per_loop
validation_interval = steps_per_loop
validation_steps = total_validation_samples // val_batch_size
warmup_steps = steps_per_loop * 10
checkpoint_interval = steps_per_loop * 5
decay_steps = int(train_steps)

print(f'steps_per_loop: {steps_per_loop}')
print(f'summary_interval: {summary_interval}')
print(f'train_steps: {train_steps}')
print(f'validation_interval: {validation_interval}')
print(f'validation_steps: {validation_steps}')
print(f'warmup_steps: {warmup_steps}')
print(f'warmup_learning_rate: {warmup_learning_rate}')
print(f'initial_learning_rate: {initial_learning_rate}')
print(f'decay_steps: {decay_steps}')
print(f'checkpoint_interval: {checkpoint_interval}')
```
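With the sample counts above, this works out to `steps_per_loop = 8`, `summary_interval = 8`, `train_steps = 5600`, `validation_interval = 8`, `validation_steps = 3`, `warmup_steps = 80`, `checkpoint_interval = 40`, and `decay_steps = 5600`; these values, together with the two learning rates, are what get filled into the corresponding fields of `config.yaml`.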
