[Core][cpp] C++ Ray worker process number keep increasing if calling actor from workers #51711

Open
patrickiamy opened this issue Mar 26, 2025 · 2 comments
Labels: bug (Something that is supposed to be working; but isn't), C++-Worker, core (Issues that should be addressed in Ray Core), P2 (Important issue, but not time-critical)

@patrickiamy

patrickiamy commented Mar 26, 2025

What happened + What you expected to happen

I set up a Ray cluster with only 4 CPU cores, then ran the following program, which submits 10,000 tasks, each requesting 1 CPU. If the tasks don't call any actor method, the program runs successfully and the number of worker processes stays at 4. But if each task calls an actor method, the number of worker processes keeps increasing until the raylet fails with "[2025-03-26 12:02:15,548 C 684054 684054] (raylet) worker_pool.cc:630: Failed to start worker with return value system:11: Resource temporarily unavailable".
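The raylet message ends with the numeric code 11. Assuming that code is an errno value from the failed process start, a small Python sketch can decode it and print the current user's process limit: on Linux, `fork(2)` fails with `EAGAIN` (errno 11, "Resource temporarily unavailable") once the `RLIMIT_NPROC` limit is reached. Whether the raylet hit that particular limit here is an assumption, since system-wide thread or PID limits produce the same error. The helper names `describe_errno` and `process_limit` are hypothetical and not part of Ray.

```python
import errno
import os
import resource  # Unix-only; available on the RHEL host from the report


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value, e.g. the 11 in "system:11"."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def process_limit() -> tuple:
    """Return the (soft, hard) RLIMIT_NPROC limits for the current user.

    On Linux this limit counts processes (more precisely, threads) per real user ID;
    fork() fails with EAGAIN once it is reached. resource.RLIM_INFINITY means unlimited.
    """
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    print(describe_errno(11))  # code 11 is EAGAIN on Linux
    soft, hard = process_limit()
    print(f"RLIMIT_NPROC soft={soft} hard={hard}")
```

Running it as the same user that runs the raylet shows how much headroom is left before new worker processes can no longer be spawned.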

Versions / Dependencies

ray, version 2.10.0
Python 3.8
Red Hat Enterprise Linux release 8.4

Reproduction script

#include <ray/api.h>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

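// Resources for the named actor and for every task / actor-method call.
// Ray's "memory" resource is measured in bytes, so 24.0 * 1024.0 requests 24 KiB.
static const std::unordered_map<std::string, double> ACTOR_RESOURCES{
    {"CPU", 1.0}, {"memory", 24.0 * 1024.0}
};

static const std::unordered_map<std::string, double> TASK_RESOURCES{
    {"CPU", 1.0}, {"memory", 24.0 * 1024.0}
};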

static const char* actor_name = "actor";

class SimpleActor {
public:
    // Both methods simulate about 100 ms of work.
    int Do() {
        std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
        return 0;
    }
    int Do1() {
        std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
        return 0;
    }
    static SimpleActor* FactoryCreate() { return new SimpleActor; }
};
RAY_REMOTE(&SimpleActor::Do, &SimpleActor::Do1, SimpleActor::FactoryCreate);

int SimpleFunc() {
    // The program runs to completion if the following 3 lines are removed:
    // look up the named actor, call one of its methods, and wait for the result.
    auto actor = *ray::GetActor<SimpleActor>(actor_name);
    auto future = actor.Task(&SimpleActor::Do).SetResources(TASK_RESOURCES).Remote();
    ray::Get(future);

    std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
    return 0;
}

RAY_REMOTE(&SimpleFunc);

int main(int argc, char **argv) {
    ray::Init();

    // Create the named actor that every task looks up.
    auto actor = ray::Actor(SimpleActor::FactoryCreate)
                     .SetResources(ACTOR_RESOURCES)
                     .SetName(actor_name)
                     .Remote();

    // Submit 10,000 tasks, each requesting 1 CPU; each task calls the actor.
    std::vector<ray::ObjectRef<int>> futures;
    for (int i = 0; i < 10000; ++i) {
        auto obj = ray::Task(SimpleFunc).SetResources(TASK_RESOURCES).Remote();
        futures.push_back(obj);
    }
    ray::Get(futures);

    auto future = actor.Task(&SimpleActor::Do1).SetResources(TASK_RESOURCES).Remote();
    ray::Get(future);

    ray::Shutdown();
    return 0;
}
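The resource requests in the script bound how many `SimpleFunc` tasks can hold a CPU at once on the 4-CPU node: the named actor keeps 1 CPU, which leaves 3 CPUs for 1-CPU tasks. Adding the actor's own process gives the 4 worker processes observed when the actor call is removed. The hypothetical helper below only does that arithmetic. It assumes each running task holds its CPU for its whole duration, which is a simplification: Ray's documentation notes that a task blocked in `ray.get` releases its CPU, and this bound does not model that.

```python
def expected_worker_processes(node_cpus: float, actor_cpus: float, task_cpus: float) -> int:
    """Rough upper bound on concurrently running worker processes for the reproduction script.

    Assumes one process for the named actor plus one process per task that currently
    holds its CPU, and that tasks never release their CPU while blocked.
    """
    free_for_tasks = node_cpus - actor_cpus        # CPUs left after the actor is placed
    concurrent_tasks = int(free_for_tasks // task_cpus)
    return 1 + concurrent_tasks                    # actor process + task worker processes


if __name__ == "__main__":
    # Values from the report: 4-CPU node; ACTOR_RESOURCES and TASK_RESOURCES both ask for 1 CPU.
    print(expected_worker_processes(node_cpus=4, actor_cpus=1, task_cpus=1))  # 4
```

A worker count that keeps climbing well past this bound, as in the C++ run, means the raylet is starting processes beyond what the CPU requests alone would allow.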

Issue Severity

None

@patrickiamy added the bug and triage (Needs triage, e.g. priority, bug/not-bug, and owning component) labels Mar 26, 2025
@patrickiamy changed the title from "[<Ray component: Core>] Ray worker process number keep increasing if calling actor from workers" to "[<Ray component: Core>,C++] Ray worker process number keep increasing if calling actor from workers" Mar 26, 2025
@jcotant1 added the core label Mar 26, 2025
@patrickiamy
Author

patrickiamy commented Mar 27, 2025

BTW, the equivalent Python program runs to completion as expected (the number of running worker processes equals the total number of CPU cores):

import ray
import time

ray.init()

@ray.remote(num_cpus=1, name="actor")
class Actor:
    def do1(self):
        time.sleep(0.1)
        return 0
    def do2(self):
        time.sleep(0.2)
        return 0

@ray.remote(num_cpus=1)
def my_function():
    time.sleep(0.1)
    actor = ray.get_actor('actor')
    obj = actor.do1.remote()
    ray.get(obj)
    return 0

actor = Actor.remote()
obj_refs = []
for _ in range(10000):
    obj_ref = my_function.remote()
    obj_refs.append(obj_ref)
obj = actor.do2.remote()

ray.get(obj_refs)
ray.get(obj)
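To compare the C++ and Python runs side by side, one option is to sample `/proc` on the node while each driver runs. The sketch below is a hypothetical helper, not part of Ray. It assumes Linux `/proc` and that worker processes can be recognized by a substring of their command line. The pattern is left to the caller: Python workers typically set their process title to a `ray::` prefix, but what the C++ worker processes look like is an assumption and should be checked with `ps` first.

```python
import os


def read_cmdlines(proc_root: str = "/proc") -> list:
    """Return the command line of every process visible under /proc (Linux only)."""
    cmdlines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are process IDs
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:  # the process exited or is not readable
            continue
        # Arguments are NUL-separated; join them with spaces for substring matching.
        cmdlines.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return cmdlines


def count_matching(cmdlines: list, pattern: str) -> int:
    """Count command lines that contain `pattern` (e.g. "ray::" or a worker binary name)."""
    return sum(1 for line in cmdlines if pattern in line)


if __name__ == "__main__":
    # "ray::" matches Python worker process titles; substitute the pattern for C++ workers.
    print("workers:", count_matching(read_cmdlines(), "ray::"))
```

Running it once per second (for example with `watch -n 1`) during each program would show whether the count plateaus at the CPU count or keeps growing until the spawn error appears.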

@patrickiamy changed the title from "[<Ray component: Core>,C++] Ray worker process number keep increasing if calling actor from workers" to "[Core] C++ Ray worker process number keep increasing if calling actor from workers" Mar 31, 2025
@kevin85421 changed the title from "[Core] C++ Ray worker process number keep increasing if calling actor from workers" to "[Core][cpp] C++ Ray worker process number keep increasing if calling actor from workers" Mar 31, 2025
@kevin85421 added the C++-Worker and P2 labels and removed the triage label Mar 31, 2025
@kevin85421
Member

cc @jjyao: do you know who at Ant can answer this question?

Development

No branches or pull requests

3 participants