Skip to content

Conversation

@maheshambule
Copy link
Contributor

@maheshambule maheshambule commented Aug 24, 2020

Issue #, if available:

The orphan processes get created when you fire multiple register and unregister calls on same model one after another.
The orphan worker processes hogs the system memory.

Description of changes:

  1. Send sigterm to Main Worker Thread from frontend by using destroy call instead of destroyForcefully.
  2. Handle sigterm and kill all the child workers and current process.
  3. Add SIGCHLD handler to handle zombie processes. Code taken from here: maaquib@6ed099a#diff-efa92912588641e6a3b20d8900316be2R167

Testing done:

To run CI tests on your changes refer README.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@maheshambule maheshambule changed the title Ensure workers get killed on unregsiter call Ensure workers get killed on unregister call Aug 24, 2020
@dk19y
Copy link
Contributor

dk19y commented Nov 25, 2020

@maheshambule , Can you please do a small write regarding the testing that was done for this PR.

Please add some context / behavior before the fix / behavior after the fix with steps & logs for both the cases.

@dk19y dk19y requested review from dk19y and maaquib November 25, 2020 23:18
Copy link

@ayushsengupta1991 ayushsengupta1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested that the worker processes actually get destroyed on an unregister call.

@kastman
Copy link

kastman commented Dec 1, 2021

I'm relatively sure I'm seeing this behavior as well - memory used on invocation/handle doesn't seem to be garbage collected and I'm quickly hitting 100% even on relatively large 3 x 18xlarge instances in the context of Sagemaker multi-model endpoints. The changes from @maheshambule were approved but never merged - is that because it needed documentation? @ayushsengupta1991 @dhanainme I'm happy to write something if it would get the code accepted upstream

@kastman
Copy link

kastman commented Dec 1, 2021

(The CI failure isn't accessible anymore or I'd go in and comment.) Hoping to get this bumped / helped. Thanks,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants