Material Form Model V2 | Resnet | saved model | [click here](https://storage.goo
Material Type Model V2 | MobileNet | saved model | [click here](https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/mobilenet_material.zip)
Material Form Model V2 | MobileNet | saved model | [click here](https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/mobilenet_material_form.zip)

## Training Guide

1.  Create a VM instance in Compute Engine on Google Cloud Platform with the
    desired number of GPUs.
2.  Make sure a compatible CUDA version is installed. Check your GPU devices
    with the `nvidia-smi` command.
3.  SSH into your VM instance in Compute Engine and create a conda environment:
    `conda create -n circularnet-train python=3.11`
4.  Activate your conda environment:
    `conda activate circularnet-train`
5.  Install the following libraries:
    `pip install tensorflow[and-cuda] tf-models-official`
6.  Move your data into a GCP bucket or onto the VM instance (in this guide it
    was moved onto the VM instance). Your data should be in the TFRecord
    format.
7.  Move the configuration file for model training onto the VM as well.
8.  Your configuration file contains all the parameters and the paths to your
    datasets. An example configuration file for GPU training has been uploaded
    to the same directory with the name `config.yaml`.
9.  Create a directory where you want to save the output checkpoints.
10. Run the following command to initiate the training:
    `python -m official.vision.train --experiment="maskrcnn_resnetfpn_coco" --mode="train_and_eval" --model_dir="output_directory" --config_file="config.yaml"`
11. You can also start a screen session and run the training in the background.

## Config file parameters

-   `annotation_file` - path to the validation file in COCO JSON format.
-   `init_checkpoint` - path to the checkpoints used for transfer learning.
-   `init_checkpoint_modules` - whether to load the backbone, the decoder, or
    both from the checkpoint.
-   `freeze_backbone` - whether or not to freeze the backbone during training.
-   `input_size` - image size on which the model is trained.
-   `num_classes` - total number of classes + 1 (background).
-   `per_category_metrics` - set this in case you need metrics for each class.
-   `global_batch_size` - batch size.
-   `input_path` - path to the dataset.
-   `parser` - contains the data augmentation operations.
-   `steps_per_loop` - number of steps to complete one epoch. It is usually
    `training data size / batch size`.
-   `summary_interval` - how often to log the metrics.
-   `train_steps` - total steps for training. It is equal to
    `steps_per_loop x epochs`.
-   `validation_interval` - how often to evaluate on the validation data.
-   `validation_steps` - steps to cover the validation data. It is equal to
    `validation data size / batch size`.
-   `warmup_learning_rate` - a strategy that gradually increases the learning
    rate from a very low value to the desired initial learning rate over a
    predefined number of iterations or epochs. It stabilizes training in the
    early stages by letting the model adapt to the data slowly before a higher
    learning rate is used.
-   `warmup_steps` - number of steps for the warmup learning rate.
-   `initial_learning_rate` - the value of the learning rate at the very start
    of the training process.
-   `checkpoint_interval` - number of steps between checkpoint exports.
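
These parameters live together in a single YAML experiment config. The fragment below is only an illustrative sketch of how they are commonly nested in a TF Model Garden config; the nesting, paths, and values here are placeholders, so treat the uploaded `config.yaml` as the authoritative reference:

```yaml
# Illustrative fragment only - paths and numbers are placeholders.
task:
  init_checkpoint: 'gs://your-bucket/pretrained_checkpoint/'  # hypothetical path
  init_checkpoint_modules: 'backbone'
  annotation_file: 'instances_val.json'
  freeze_backbone: false
  model:
    num_classes: 10               # your classes + 1 for background
    input_size: [1024, 1024, 3]
  train_data:
    input_path: 'train*.tfrecord'
    global_batch_size: 512
  validation_data:
    input_path: 'val*.tfrecord'
    global_batch_size: 128
trainer:
  train_steps: 5600
  steps_per_loop: 8
  summary_interval: 8
  validation_interval: 8
  validation_steps: 3
  checkpoint_interval: 40
  optimizer_config:
    warmup:
      type: linear
      linear:
        warmup_learning_rate: 0.0001
        warmup_steps: 80
```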

A common practice for calculating these parameters is shown below:

```python
total_training_samples = 4389
total_validation_samples = 485

train_batch_size = 512
val_batch_size = 128
num_epochs = 700
warmup_learning_rate = 0.0001
initial_learning_rate = 0.001

steps_per_loop = total_training_samples // train_batch_size
summary_interval = steps_per_loop
train_steps = num_epochs * steps_per_loop
validation_interval = steps_per_loop
validation_steps = total_validation_samples // val_batch_size
warmup_steps = steps_per_loop * 10
checkpoint_interval = steps_per_loop * 5
decay_steps = int(train_steps)

print(f'steps_per_loop: {steps_per_loop}')
print(f'summary_interval: {summary_interval}')
print(f'train_steps: {train_steps}')
print(f'validation_interval: {validation_interval}')
print(f'validation_steps: {validation_steps}')
print(f'warmup_steps: {warmup_steps}')
print(f'warmup_learning_rate: {warmup_learning_rate}')
print(f'initial_learning_rate: {initial_learning_rate}')
print(f'decay_steps: {decay_steps}')
print(f'checkpoint_interval: {checkpoint_interval}')
```

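As a concrete illustration of the warmup behavior described under `warmup_learning_rate`, the sketch below (a hypothetical helper, not part of the training code) linearly ramps the learning rate from `warmup_learning_rate` to `initial_learning_rate` over `warmup_steps`:

```python
def warmup_lr(step, warmup_steps, warmup_learning_rate, initial_learning_rate):
    """Linearly ramp the learning rate during the warmup phase."""
    if step >= warmup_steps:
        # Warmup is over; the schedule hands off to the initial learning rate.
        return initial_learning_rate
    frac = step / warmup_steps
    return warmup_learning_rate + frac * (initial_learning_rate - warmup_learning_rate)

# Using the example values above: ramp from 0.0001 to 0.001 over 80 steps.
print(warmup_lr(0, 80, 0.0001, 0.001))   # 0.0001 (start of warmup)
print(warmup_lr(40, 80, 0.0001, 0.001))  # ~0.00055 (halfway)
print(warmup_lr(80, 80, 0.0001, 0.001))  # 0.001 (handoff to main schedule)
```

The actual schedule is defined by the trainer's optimizer config; this only shows why a low `warmup_learning_rate` with a sufficient `warmup_steps` keeps early updates small.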
## Authors and Maintainers

- Umair Sabir
- Sujit Sanjeev