Refactor HpoService #9

Merged: 2 commits, Apr 8, 2022

@johnaohara johnaohara commented Apr 5, 2022

  • Decouple the REST service from optuna_hpo with a new class,
    HpoService. This allows further APIs to be introduced
  • Managed thread creation in hpo_service
  • Use Thread wait()/notify() to sync threads for providing trial data
    to an Optuna objective function
  • Use Thread wait()/notify() to sync threads when a new experiment is
    started
  • Sync REST and experiment threads to ensure the experiment is ready
    before returning from a REST call
  • Ensure thread-safe access to TrialDetails objects
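
For readers unfamiliar with the wait()/notify() pattern named in the list above, here is a minimal sketch of a single-slot handoff between a REST thread and an experiment thread. It is not the code in this PR: `TrialHandoff`, `submit`, `take`, and the sample result values are hypothetical names and data chosen for illustration, and only the standard-library `threading.Condition` is assumed.

```python
import threading


class TrialHandoff:
    """Hypothetical helper: hands one trial result at a time from a
    REST thread to the thread running the objective function."""

    def __init__(self):
        self._cond = threading.Condition()
        self._result = None
        self._has_result = False

    def submit(self, value):
        # Called by the REST thread when a trial result arrives.
        with self._cond:
            # Wait until the previous result has been consumed.
            while self._has_result:
                self._cond.wait()
            self._result = value
            self._has_result = True
            self._cond.notify_all()

    def take(self):
        # Called by the experiment thread; blocks until a result exists.
        with self._cond:
            while not self._has_result:
                self._cond.wait()
            self._has_result = False
            self._cond.notify_all()
            return self._result


handoff = TrialHandoff()
received = []


def objective_stand_in():
    # Stand-in for the objective function: one result per trial.
    for _ in range(3):
        received.append(handoff.take())


worker = threading.Thread(target=objective_stand_in)
worker.start()
for value in (98.78, 97.5, 96.0):  # illustrative result values
    handoff.submit(value)
worker.join(timeout=5)
```

All shared state is read and written while holding the condition's lock, which is also one way to get the thread-safe access mentioned in the last bullet.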

@johnaohara
Contributor Author

Hi @dinogun, where is the best place to discuss the ideas behind this PR: here or Slack?

@dinogun
Contributor

dinogun commented Apr 5, 2022

@johnaohara Thanks for the PR. @khansaad is working on the thread changes as well, but it looks like you beat him to it! Let me open issues so that we have better co-ordination here.

@dinogun
Contributor

dinogun commented Apr 5, 2022

Fixes #10

@johnaohara
Contributor Author

@dinogun @khansaad that's great. Yeah, opening an issue to discuss would be great. I am not wedded to these changes; if there is a different implementation being worked on, that's great as well.

This PR was really to do the groundwork for this branch: https://github.com/johnaohara/hpo/tree/gRPC

I am integrating HPO into our workflows, so I have created a Python gRPC CLI client (https://github.com/johnaohara/hpo/blob/gRPC/src/gRPC_client.py) and have a Java client as well (I have not pushed it anywhere yet).

@johnaohara
Contributor Author

@dinogun one question I did have: how do you run the test suite? I see it is lifted from the Autotune project, but I cannot see how to run the HPO tests.

@dinogun
Contributor

dinogun commented Apr 5, 2022

@chandrams is updating the tests; she will be submitting a PR shortly.

@dinogun
Contributor

dinogun commented Apr 5, 2022

I think the changes are looking good here, so we can continue to use this PR.

@johnaohara johnaohara force-pushed the hpoServiceRefactor branch 2 times, most recently from de35152 to 540a2ac Compare April 5, 2022 13:05
@johnaohara johnaohara force-pushed the hpoServiceRefactor branch from 540a2ac to d6ad505 Compare April 7, 2022 09:30
@johnaohara
Contributor Author

There is currently no working test suite, so these changes were tested using the following script:

#!/bin/bash

curl -s -H 'Content-Type: application/json' \
http://localhost:8085/experiment_trials \
--data-binary @- << EOF
{
    "operation" : "EXP_TRIAL_GENERATE_NEW", 
    "search_space":
        {
            "experiment_name": "petclinic-sample-2-75884c5549-npvgd",
            "total_trials": 100,
            "parallel_trials": 1,
            "experiment_id": "a123",
            "value_type": "double",
            "hpo_algo_impl": "optuna_tpe",
            "objective_function": "transaction_response_time",
            "tunables": [
                {
                    "value_type": "double",
                    "lower_bound": 150,
                    "name": "memoryRequest",
                    "upper_bound": 300,
                    "step": 1
                },
                {
                    "value_type": "double",
                    "lower_bound": 1.0,
                    "name": "cpuRequest",
                    "upper_bound": 3.0,
                    "step": 0.01
                }
            ],
            "slo_class": "response_time",
            "direction": "minimize"
        }
}
EOF
echo ""
sleep 2

curl -s -H 'Accept: application/json' "http://localhost:8085/experiment_trials?experiment_id=a123&trial_number=0"
echo ""
sleep 2


curl -s -H 'Content-Type: application/json' http://localhost:8085/experiment_trials -d '{"experiment_id" : "a123", "trial_number": 0, "trial_result": "success", "result_value_type": "double", "result_value": 98.78, "operation" : "EXP_TRIAL_RESULT"}'
echo ""
sleep 3

curl -s -H 'Content-Type: application/json' http://localhost:8085/experiment_trials -d '{"experiment_id" : "a123", "operation" : "EXP_TRIAL_GENERATE_SUBSEQUENT"}'
echo ""
sleep 3

curl -s -H 'Accept: application/json' "http://localhost:8085/experiment_trials?experiment_id=a123&trial_number=1"
echo ""
sleep 2

curl -s -H 'Content-Type: application/json' http://localhost:8085/experiment_trials -d '{"experiment_id" : "a123", "trial_number": 1, "trial_result": "success", "result_value_type": "double", "result_value": 98.78, "operation" : "EXP_TRIAL_RESULT"}'
echo ""
sleep 3

curl -s -H 'Content-Type: application/json' http://localhost:8085/experiment_trials -d '{"experiment_id" : "a123", "operation" : "EXP_TRIAL_GENERATE_SUBSEQUENT"}'
echo ""
sleep 3

curl -s -H 'Accept: application/json' "http://localhost:8085/experiment_trials?experiment_id=a123&trial_number=2"
echo ""
sleep 2

curl -s -H 'Content-Type: application/json' http://localhost:8085/experiment_trials -d '{"experiment_id" : "a123", "trial_number": 2, "trial_result": "success", "result_value_type": "double", "result_value": 98.78, "operation" : "EXP_TRIAL_RESULT"}'
echo ""
sleep 3

curl -s -H 'Content-Type: application/json' http://localhost:8085/experiment_trials -d '{"experiment_id" : "a123", "operation" : "EXP_TRIAL_GENERATE_SUBSEQUENT"}'
echo ""

Output from the current main branch:

$ ./runSeq.sh 
0
[{"tunable_name": "memoryRequest", "tunable_value": 196.0}, {"tunable_name": "cpuRequest", "tunable_value": 1.75}]
0
1
[{"tunable_name": "memoryRequest", "tunable_value": 187.0}, {"tunable_name": "cpuRequest", "tunable_value": 2.1100000000000003}]
0
2
[{"tunable_name": "memoryRequest", "tunable_value": 194.0}, {"tunable_name": "cpuRequest", "tunable_value": 2.59}]
0
3

Output from this PR:

$ ./runSeq.sh 
0
[{"tunable_name": "memoryRequest", "tunable_value": 286.0}, {"tunable_name": "cpuRequest", "tunable_value": 1.8}]
0
1
[{"tunable_name": "memoryRequest", "tunable_value": 202.0}, {"tunable_name": "cpuRequest", "tunable_value": 2.64}]
0
2
[{"tunable_name": "memoryRequest", "tunable_value": 161.0}, {"tunable_name": "cpuRequest", "tunable_value": 1.83}]
0
3

@johnaohara
Contributor Author

AFAICS the behaviour is unchanged
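
The tunable values differ between the two runs, presumably because the sampler is not seeded, so "unchanged" here refers to the shape of the responses rather than the exact numbers. A rough sketch of how that could be checked: the bounds and steps are copied from the search space in the script above and the response lines from the two outputs, while `is_valid_trial` is a hypothetical helper and the step tolerance is an assumption.

```python
import json

# Bounds and steps copied from the search space in the test script.
TUNABLES = {
    "memoryRequest": {"lower": 150, "upper": 300, "step": 1},
    "cpuRequest": {"lower": 1.0, "upper": 3.0, "step": 0.01},
}

# Trial configurations copied from the two outputs above.
MAIN_OUTPUT = [
    '[{"tunable_name": "memoryRequest", "tunable_value": 196.0}, {"tunable_name": "cpuRequest", "tunable_value": 1.75}]',
    '[{"tunable_name": "memoryRequest", "tunable_value": 187.0}, {"tunable_name": "cpuRequest", "tunable_value": 2.1100000000000003}]',
    '[{"tunable_name": "memoryRequest", "tunable_value": 194.0}, {"tunable_name": "cpuRequest", "tunable_value": 2.59}]',
]
PR_OUTPUT = [
    '[{"tunable_name": "memoryRequest", "tunable_value": 286.0}, {"tunable_name": "cpuRequest", "tunable_value": 1.8}]',
    '[{"tunable_name": "memoryRequest", "tunable_value": 202.0}, {"tunable_name": "cpuRequest", "tunable_value": 2.64}]',
    '[{"tunable_name": "memoryRequest", "tunable_value": 161.0}, {"tunable_name": "cpuRequest", "tunable_value": 1.83}]',
]


def is_valid_trial(line):
    """Hypothetical check: same tunables, values in bounds and on the step grid."""
    config = json.loads(line)
    if {t["tunable_name"] for t in config} != set(TUNABLES):
        return False
    for tunable in config:
        spec = TUNABLES[tunable["tunable_name"]]
        value = tunable["tunable_value"]
        if not spec["lower"] <= value <= spec["upper"]:
            return False
        # Allow for floating-point noise such as 2.1100000000000003.
        steps = (value - spec["lower"]) / spec["step"]
        if abs(steps - round(steps)) > 1e-6:
            return False
    return True


main_ok = all(is_valid_trial(line) for line in MAIN_OUTPUT)
pr_ok = all(is_valid_trial(line) for line in PR_OUTPUT)
print(main_ok, pr_ok)
```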

@johnaohara johnaohara marked this pull request as ready for review April 7, 2022 10:40

@dinogun dinogun left a comment

Looking good! One question, do we need to handle spurious wakeups with threading.Condition()? I guess we can add that in a separate PR. Also something that @khansaad can take a look at.
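
As a sketch of what handling this could look like (not code from this PR): the usual guard is to re-check the awaited state after every wakeup, which `Condition.wait_for()` does with a predicate. The `state` dictionary and `waiter` function below are hypothetical, and the stray wakeup is simulated with a notify that changes nothing.

```python
import threading

cond = threading.Condition()
state = {"ready": False}
events = []


def waiter():
    with cond:
        # wait_for re-checks the predicate after every wakeup, so a
        # wakeup that arrives before `ready` is set goes back to waiting.
        cond.wait_for(lambda: state["ready"])
        events.append("proceeded")


t = threading.Thread(target=waiter)
t.start()

# Simulate an unwanted wakeup: notify without changing the state.
with cond:
    cond.notify_all()
t.join(timeout=0.2)
still_waiting = t.is_alive()

# Now make the condition true and notify for real.
with cond:
    state["ready"] = True
    cond.notify_all()
t.join(timeout=2)
```

An equivalent form is a `while not ready: cond.wait()` loop; a bare `cond.wait()` without the re-check is the pattern that is exposed to stray wakeups.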

@johnaohara
Contributor Author

@dinogun no, interrupted threads are not handled in this PR. There are a number of areas for improvement:

  • Handling of wait() timeouts: ideally we shouldn't block forever, but should handle some form of timeout
  • Experiment thread teardown, e.g. when the process is terminated, the experiment threads need to be stopped gracefully
  • Possibly replacing threading conditions with coroutines

I am not proficient in Python; the most I have used it for previously is writing some simple scripts, so I would either need to research these concerns or leave them for someone who has more experience.
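
One possible shape for the first two items, offered as a sketch rather than a proposal for this code base: a bounded `wait_for(..., timeout=...)` combined with a stop flag that is checked on every wakeup. `StoppableExperiment`, its methods, and the timeout value are hypothetical.

```python
import threading
import time


class StoppableExperiment:
    """Hypothetical experiment thread wrapper; not the class in this PR."""

    def __init__(self, wait_timeout=0.05):
        self._cond = threading.Condition()
        self._result = None
        self._stop = False
        self._wait_timeout = wait_timeout
        self.timeouts = 0
        self.log = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def submit_result(self, value):
        with self._cond:
            self._result = value
            self._cond.notify_all()

    def stop(self):
        # Ask the thread to exit, then wait for it to finish.
        with self._cond:
            self._stop = True
            self._cond.notify_all()
        self._thread.join(timeout=2)

    def is_alive(self):
        return self._thread.is_alive()

    def _run(self):
        while True:
            with self._cond:
                # Wake on a result, a stop request, or the timeout.
                woke = self._cond.wait_for(
                    lambda: self._result is not None or self._stop,
                    timeout=self._wait_timeout,
                )
                if self._result is not None:
                    self.log.append(("result", self._result))
                    self._result = None
                if self._stop:
                    self.log.append("stopped")
                    return
                if not woke:
                    # Timed out; a real service might fail the trial here.
                    self.timeouts += 1


exp = StoppableExperiment(wait_timeout=0.05)
exp.start()
time.sleep(0.3)  # no result arrives, so the wait times out repeatedly
exp.submit_result(98.78)
exp.stop()
```

In a real service, `stop()` would typically be called from a signal handler or shutdown hook so that experiment threads exit before the process does.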

@dinogun
Contributor

dinogun commented Apr 8, 2022

This is looking great, thanks for your contribution. @khansaad will take up the improvements in a separate PR!

@dinogun dinogun left a comment

LGTM

@dinogun dinogun merged commit 9aa51d4 into kruize:main Apr 8, 2022
@johnaohara johnaohara deleted the hpoServiceRefactor branch June 21, 2022 09:29