
memory utilization increment after every request, worker died, memory issue #974

@n0thing233

Hi, by default MMS prints memory utilization to the log, which is great. The problem is that memory utilization increases after every request to MMS. After several requests it goes up to 100% and the worker dies.
I don't think this is the expected behavior, right?
I tried calling gc.collect() in the _handle function, but it didn't help. (No GPU is available on this machine.)
I wonder if anyone can help me out here.
Here is an example.
When the server has just started, the log shows:
2021-10-31 18:22:25,881 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:5.1|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704545
After one request:
mms_1 | 2021-10-31 18:24:25,742 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:26.2|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704665
After second request:
mms_1 | 2021-10-31 18:26:25,601 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:39.7|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704785
After third request:
mms_1 | 2021-10-31 18:30:25,323 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:58.5|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635705025
After 4th request:
mms_1 | 2021-10-31 18:32:25,187 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:81.6|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635705145
After the 5th request, OOM appears:
mms_1 | 2021-10-31 18:35:41,402 [INFO ] epollEventLoopGroup-4-7 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-96795301 Worker disconnected. WORKER_MODEL_LOADED
mms_1 | 2021-10-31 18:35:41,528 [DEBUG] W-9000-video_segmentation_v1 com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
mms_1 | java.lang.InterruptedException
mms_1 | at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
mms_1 | at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
mms_1 | at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
mms_1 | at com.amazonaws.ml.mms.wlm.WorkerThread.runWorker(WorkerThread.java:148)
mms_1 | at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:211)
mms_1 | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
mms_1 | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
mms_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
mms_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
mms_1 | at java.lang.Thread.run(Thread.java:748)
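
The MemoryUtilization.Percent metric above is reported at host level (#Level:Host), so it does not show which objects in the worker are growing. One possible way to narrow that down is to compare Python heap snapshots between requests with the standard-library tracemalloc module. The sketch below is only an illustration: MemoryProbe and fake_handle are hypothetical names, not part of MMS, and fake_handle merely simulates a handler that keeps a reference to every payload. tracemalloc only sees allocations made through Python's allocator, so memory held by native code in a framework may not appear here.

```python
import gc
import tracemalloc


class MemoryProbe:
    """Hypothetical helper: compares Python heap snapshots between requests."""

    def __init__(self, top_n=5):
        self.top_n = top_n
        self._previous = None
        if not tracemalloc.is_tracing():
            tracemalloc.start()

    def check(self):
        """Return up to top_n source lines whose allocations grew since the last call."""
        gc.collect()  # drop unreachable cycles so only live objects are counted
        snapshot = tracemalloc.take_snapshot()
        growth = []
        if self._previous is not None:
            stats = snapshot.compare_to(self._previous, "lineno")
            growth = [s for s in stats if s.size_diff > 0][: self.top_n]
        self._previous = snapshot
        return growth


def fake_handle(probe, cache, payload_size):
    """Stand-in for a handler's _handle that leaks by keeping every buffer."""
    cache.append(bytearray(payload_size))  # simulated leak: grows on every request
    return probe.check()


if __name__ == "__main__":
    probe = MemoryProbe()
    leaked = []
    for i in range(3):
        for stat in fake_handle(probe, leaked, 1_000_000):
            print(f"request {i}: {stat}")
```

In a real handler, the idea would be to call probe.check() at the end of _handle and log the returned lines. If nothing grows on the Python side while host memory keeps climbing, the growth is probably in native allocations, which gc.collect() cannot release.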
