This repository was archived by the owner on Aug 11, 2023. It is now read-only.
forked from hunglc007/tensorflow-yolov4-tflite
OOM when loading yolov4 for training on an 8 GB GPU #75
Hello @hhk7734,

When I train YOLOv4 starting from the yolov4.conv.137 weights file, I get an OOM (out-of-memory) error with these settings:
batch_size = 8
input_size = 416
I am using version 2.1.0 of yolov4 from this repo.
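As a rough guide for picking smaller settings: activation memory in a convolutional detector grows roughly linearly with batch_size and with the number of input pixels (input_size squared). The sketch below uses hypothetical helper names and assumes the three YOLOv4 detection heads use strides 8, 16 and 32, so input_size should be a multiple of 32. It ignores weights, optimizer state and cuDNN workspace, so the ratios are estimates, not measurements.

```python
def yolo_grid_sizes(input_size):
    """Grid sizes of the three YOLOv4 heads, assuming strides 8, 16 and 32."""
    if input_size % 32 != 0:
        raise ValueError("input_size must be a multiple of 32")
    return tuple(input_size // stride for stride in (8, 16, 32))


def relative_activation_memory(batch_size, input_size, ref_batch=8, ref_size=416):
    """Approximate activation memory relative to the reference setup.

    Assumes memory scales linearly with batch size and with input_size ** 2;
    weights, optimizer state and workspace buffers are not modeled.
    """
    return (batch_size * input_size ** 2) / (ref_batch * ref_size ** 2)


if __name__ == "__main__":
    print(yolo_grid_sizes(416))                 # (52, 26, 13)
    print(relative_activation_memory(4, 416))   # 0.5
    print(relative_activation_memory(8, 320))   # about 0.59
```

Under this approximation, halving batch_size to 4 roughly halves activation memory, while dropping input_size to 320 at batch_size 8 needs about 59% of the original.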
I have changed

```python
physical_devices = tf.config.experimental.list_physical_devices("GPU")
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
    tf.config.gpu.set_memory_growth(physical_devices[0], True)
```

to

```python
try:
    tf.config.experimental.set_virtual_device_configuration(physical_devices[0], [
        tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024 * 6)]
    )
except:
    pass
```

because set_memory_growth doesn't work.
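A minimal sketch of the two GPU-memory options, assuming the TensorFlow 2.2 `tf.config.experimental` API used above; `configure_gpu_memory` is a hypothetical helper, not part of yolov4. Either option has to run before anything touches the GPU (in the training script, before `yolo.make_model()`), otherwise TensorFlow raises `RuntimeError`. A bare `except: pass` hides that error, so the configuration can fail silently. Note also that `memory_limit` only caps how much TensorFlow may allocate; it does not shrink what training needs, so a model that needs more than the cap still runs out of memory.

```python
def configure_gpu_memory(memory_limit_mb=None):
    """Configure the first GPU; call this before building any model.

    memory_limit_mb=None enables memory growth (allocate on demand).
    memory_limit_mb=N caps TensorFlow at N MB through a virtual device.
    Returns None if no GPU is found, True on success, False on failure.
    """
    # Imported lazily (hypothetical design choice) so this helper can be
    # defined without TensorFlow installed; it is needed only when called.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return None
    try:
        if memory_limit_mb is None:
            tf.config.experimental.set_memory_growth(gpus[0], True)
        else:
            tf.config.experimental.set_virtual_device_configuration(
                gpus[0],
                [tf.config.experimental.VirtualDeviceConfiguration(
                    memory_limit=memory_limit_mb)],
            )
    except (RuntimeError, ValueError) as err:
        # RuntimeError: the GPU was already initialized by an earlier op.
        # ValueError: e.g. memory growth requested on a device that already
        # has a virtual-device configuration.
        print("GPU memory configuration not applied:", err)
        return False
    return True
```

Usage, as the first thing in the training script: `configure_gpu_memory()` for memory growth, or `configure_gpu_memory(1024 * 6)` for the 6 GB cap. If it prints an error, the setting never took effect.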
The relevant dependencies are:
TensorFlow 2.2.0
CUDA 10.2
cuDNN 7.6.1
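When reporting versions like these, a small standard-library check avoids guessing which packages are actually installed. This is a sketch assuming Python 3.8+ (for `importlib.metadata`) and the PyPI distribution names `tensorflow`, `tensorflow-gpu` and `yolov4`; `report_versions` is a hypothetical helper. It cannot detect CUDA or cuDNN versions, which still have to be checked separately.

```python
from importlib import metadata


def report_versions(packages=("tensorflow", "tensorflow-gpu", "yolov4")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```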
Here is the training script:

```python
from tensorflow.keras import callbacks, optimizers
from yolov4.tf import SaveWeightsCallback, YOLOv4
import time

yolo = YOLOv4()
yolo.classes = "/home/user/datasets/YOLO_march3_2lines/data/voc.names"
yolo.input_size = 416
yolo.batch_size = 8
yolo.make_model()
yolo.load_weights(
    "/home/user/Downloads/yolov4.conv.137",
    weights_type="yolo"
)

train_data_set = yolo.load_dataset(
    "/home/user/datasets/YOLO_march3_2lines/VOCdevkit/VOCPAN/ImageSets/Main/train_yolov4_py.txt",
    dataset_type="yolo",
    image_path_prefix="/home/user/datasets/pyyolo4_train_data",
    label_smoothing=0.05
)
val_data_set = yolo.load_dataset(
    "/home/user/datasets/YOLO_march3_2lines/VOCdevkit/VOCPAN/ImageSets/Main/val_yolov4_py.txt",
    dataset_type="yolo",
    image_path_prefix="/home/user/datasets/pyyolo4_train_data",
    training=False
)

epochs = 10
lr = 1e-4
optimizer = optimizers.Adam(learning_rate=lr)
yolo.compile(optimizer=optimizer, loss_iou_type="ciou")

def lr_scheduler(epoch):
    if epoch < int(epochs * 0.5):
        return lr
    if epoch < int(epochs * 0.8):
        return lr * 0.5
    if epoch < int(epochs * 0.9):
        return lr * 0.1
    return lr * 0.01

_callbacks = [
    callbacks.LearningRateScheduler(lr_scheduler),
    callbacks.TerminateOnNaN(),
    callbacks.TensorBoard(
        log_dir="/home/user/yolov4_text_crops/logs",
    ),
    SaveWeightsCallback(
        yolo=yolo, dir_path="/home/user/yolov4_text_crops/weights",
        weights_type="yolo", epoch_per_save=10
    ),
]

yolo.fit(
    train_data_set,
    epochs=epochs,
    callbacks=_callbacks,
    validation_data=val_data_set,
    validation_steps=50,
    validation_freq=5,
    steps_per_epoch=100,
)
```

Am I doing something wrong?
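For reference, the `lr_scheduler` in the script is a step schedule. Keras `LearningRateScheduler` passes 0-based epoch indices, so with `epochs = 10` and `lr = 1e-4` the thresholds are `int(5.0) = 5`, `int(8.0) = 8` and `int(9.0) = 9`. The sketch below reuses the script's logic unchanged inside a hypothetical `lr_plan` helper to list the rate used in each epoch.

```python
def lr_plan(epochs=10, lr=1e-4):
    """Learning rate produced by the script's lr_scheduler for each epoch."""
    def lr_scheduler(epoch):
        # Same thresholds as the training script above.
        if epoch < int(epochs * 0.5):
            return lr
        if epoch < int(epochs * 0.8):
            return lr * 0.5
        if epoch < int(epochs * 0.9):
            return lr * 0.1
        return lr * 0.01

    return [lr_scheduler(epoch) for epoch in range(epochs)]


if __name__ == "__main__":
    for epoch, rate in enumerate(lr_plan()):
        print(epoch, rate)
```

Epochs 0 to 4 train at 1e-4, epochs 5 to 7 at 5e-5, epoch 8 at 1e-5 and epoch 9 at 1e-6.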