ETA tasks lost on worker TERM restart #533

Open
safwanebrahim-uc opened this issue Oct 27, 2015 · 13 comments

@safwanebrahim-uc

I have clusters of workers listening to a Redis server. When I restart a worker (using Salt Stack), the running tasks complete execution since it's a TERM signal, but all the scheduled tasks (with an ETA set) are lost.
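
For reference, a minimal sketch of this kind of setup (module name, task name, and broker URL are placeholders, not taken from this report):

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def do_work(payload):
    # Placeholder body; the real tasks do application-specific work.
    return payload

# Schedule a task with an ETA (set here via countdown). The worker prefetches
# the message and keeps it unacknowledged until the ETA, so a warm shutdown
# (TERM) is expected to restore it to the broker for redelivery.
do_work.apply_async(args=[{"example": True}], countdown=2 * 3600)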

@ask
Contributor

ask commented Oct 27, 2015

Do you have any steps to reproduce this?

What does the worker log say at shutdown?

@safwanebrahim-uc
Author

I have around 20 instances running as a cluster, consuming from a Redis server located on a separate instance. I use SaltStack to deploy things on these 20 instances, and worker restarts are done along with the deploys.

This is the Celery log from one instance:
[2015-10-27 11:25:11,889: INFO/MainProcess] mingle: searching for neighbors
[2015-10-27 11:25:12,896: INFO/MainProcess] mingle: all alone

It seems the instances do not sync with each other, since the worker restarts happen at the same point in time.

The problem I found is that when we restart a worker during the execution of a task, all the scheduled tasks are lost from the queue.

I use supervisor to daemonize the worker process.

For example: Test1 and Test2 are two consumer instances of a queue. I added two tasks, T1 (countdown=60) and T2 (countdown=86400).

I just restarted the worker using SaltStack, and now both tasks are on Test2:


worker: Warm shutdown (MainProcess)
[2015-10-27 11:25:09,598: WARNING/MainProcess] Restoring 1 unacknowledged message(s).
[2015-10-27 11:25:11,861: WARNING/MainProcess] /usr/local/lib/python2.7/dist-packages/celery/apps/worker.py:161: CDeprecationWarning:
Starting from version 3.2 Celery will refuse to accept pickle by default.

The pickle serializer is a security concern as it may give attackers
the ability to execute any command. It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.

If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::

CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']

You must only enable the serializers that you will actually use.

warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
[2015-10-27 11:25:11,881: INFO/MainProcess] Connected to redis://10.83.73.169:9875/0
[2015-10-27 11:25:11,889: INFO/MainProcess] mingle: searching for neighbors
[2015-10-27 11:25:12,896: INFO/MainProcess] mingle: all alone
[2015-10-27 11:25:12,913: WARNING/MainProcess] [email protected] ready.
[2015-10-27 11:25:13,127: INFO/MainProcess] Received task: taskXXX.Task_XXX[134feb23-b82f-4cd2-8974-2a6b150a9386] eta:[2016-02-04 12:25:58.399643+00:00]
[2015-10-27 11:25:13,129: INFO/MainProcess] Received task: task_XXX.Task_XXX[80c608c3-b7fa-4318-8df9-73e435df514d] eta:[2015-10-27 11:26:19.881562+00:00]


I restarted the supervisor while task T1 (scheduled for 2015-10-27 11:26:19.881562+00:00) was running.

The running task finished execution (I can see this in my database, since it makes some updates to my tables), but task T2 (scheduled to run at 2016-02-04 12:25:58.399643+00:00) is not found in the queue; it was not restored by either Test1 or Test2.
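
A hedged way to check whether the ETA task is still held by either worker after such a restart (assuming app is the Celery application object; this snippet is illustrative and not taken from the report):

insp = app.control.inspect()   # queries the live workers, e.g. Test1 and Test2
print(insp.scheduled())        # ETA/countdown tasks currently held in worker memory
print(insp.reserved())         # prefetched tasks without an ETA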

@ask
Contributor

ask commented Oct 28, 2015

Please start the worker with -l debug to get more logging.
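
(For example, something like celery -A yourapp worker -l debug, where yourapp is a placeholder for the application module.)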

@ask ask closed this as completed Jun 24, 2016
@scythargon

Same here in 2019

@auvipy auvipy reopened this Dec 23, 2019
@nathanielastudillo

Having this issue with RabbitMQ as well

@auvipy
Member

auvipy commented Jan 28, 2020

Which Celery and Kombu versions?

@mkuchen

mkuchen commented Mar 6, 2020

Also having this issue with a task whose ETA is scheduled to occur after a process restart.

kombu==4.6.4
celery==4.3.0

@auvipy auvipy added this to the 4.7 milestone May 5, 2020
@auvipy auvipy modified the milestones: 5.1.0, 5.3 Sep 12, 2021
@gdvalderrama

@auvipy so just to check, is this solved in some version? Or is it an ongoing issue?

@auvipy
Member

auvipy commented Dec 14, 2022

I don't know! Someone needs to verify it in production using the latest development branch, but most probably not.

@torotil

torotil commented May 15, 2023

I'm observing the same issue in production with a very simple setup: single worker, single task, Redis backend, unlimited auto-retry, acks_late. Versions are:

kombu==5.2.4
celery==5.2.7
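
A minimal sketch of a task configured that way (module name, task name, and the exact retry options are assumptions, not taken from this report):

from celery import Celery

app = Celery("eventbus", broker="redis://localhost:6379/0")

@app.task(
    acks_late=True,                      # ack only after the task has finished
    autoretry_for=(Exception,),          # "unlimited auto-retry"
    retry_kwargs={"max_retries": None},
    default_retry_delay=60,              # each retry is re-queued with an ETA
)
def send_event(payload, destination):
    # Application-specific delivery would go here. Any exception triggers a
    # retry, which is re-sent as an ETA task; that is the kind of message
    # reported above as disappearing across a warm shutdown.
    ...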

kombu said

Restoring xxx unacknowledged message(s)

in the logs. But the Redis DB is empty now.

@torotil

torotil commented May 26, 2023

I can reproduce this very easily in my local dev setup (Vagrant box), which might help to track it down.

In [9]: worker.APP.send_task("logcrm_event_bus.send_event", [{}, "impact-stack>test"], countdown=2*3600)
Out[9]: <AsyncResult: 63e995fe-012f-497d-a523-c02211e48362>

In [10]: i.scheduled()
Out[10]: 
{'celery@impactbox': [{'eta': '2023-05-26T10:35:45.630970+00:00',
   'priority': 6,
   'request': {'id': '63e995fe-012f-497d-a523-c02211e48362',
    'name': 'logcrm_event_bus.send_event',
    'args': [{}, 'impact-stack>test'],
    'kwargs': {},
    'type': 'logcrm_event_bus.send_event',
    'hostname': 'celery@impactbox',
    'time_start': None,
    'acknowledged': False,
    'delivery_info': {'exchange': '',
     'routing_key': 'celery',
     'priority': 0,
     'redelivered': None},
    'worker_pid': None}}]}
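
(In the snippets above, worker.APP is the project's Celery application, and i is presumably an inspect handle, e.g. i = worker.APP.control.inspect(); that is an assumption, since the comment does not show how i was created.)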

In another shell I restart the worker service, and the task is gone:

In [11]: i.scheduled()
Out[11]: {'celery@impactbox': []}

Here is the output of the Celery worker service at debug log level: systemctl-debug.txt

@torotil

torotil commented May 26, 2023

Here is the service file used to launch the worker: logcrm-event-bus.service. Maybe that's where the problem lies.

@torotil

torotil commented May 30, 2023

journal-20230530-rabbitmq.txt

I've tried reproducing this with the same setup but using RabbitMQ, and the problem did not occur. Note that the service shuts down successfully in this case too (with the same service file).
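
(The switch was presumably just a matter of pointing the broker at RabbitMQ, e.g. app.conf.broker_url = "amqp://guest:guest@localhost:5672//" instead of a redis:// URL; the exact URL is a placeholder.)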
