strace It
We found an unexpected behavior during deployment: it takes quite a long time for supervisord to stop a pyres worker. Since we don't want to kill the worker process while a job is still being processed, supervisord is configured to send a QUIT signal to the worker. The pyres signal handler is supposed to set a global exit flag, finish the outstanding job, and then exit. In practice, though, the shutdown takes much longer than just finishing the job.
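For context, the relevant part of the supervisord configuration looks roughly like this (the program name and command line are placeholders, not our actual config):
#!ini
[program:pyres_worker]
; placeholder command line for the pyres worker script
command=pyres_worker high,normal,low
; ask the worker to finish its current job and then exit
stopsignal=QUIT
; how long supervisord waits for a clean exit before falling back to SIGKILL
stopwaitsecs=60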
So I ran strace to instrument both supervisord and the pyres worker:
# supervisor
sudo strace -p 2312 -o /tmp/supervisor.log
# pyres worker
sudo strace -p 7350 -r -o /tmp/pyres.log
Here is a snippet of supervisor.log:
gettimeofday({1392164214, 250457}, NULL) = 0
kill(7350, SIGQUIT) = 0
gettimeofday({1392164214, 250900}, NULL) = 0
gettimeofday({1392164214, 250949}, NULL) = 0
... ....
gettimeofday({1392164227, 22023}, NULL) = 0
select(24, [4 5 6 8 11 14 18 22 23], [], [], {1, 0}) = 1 (in [22], left {0, 956550})
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0xffffffff) = 1
read(22, "", 131072) = 0
gettimeofday({1392164227, 66013}, NULL) = 0
Supervisord sent the SIGQUIT at 4:16:54 PM, and the child exited at 4:17:07 PM, 13 seconds in total. The pyres.log is more interesting:
0.000000 recvfrom(3, 0x3793044, 8192, 0, 0, 0) = ? ERESTARTSYS (To be restarted)
2.983523 --- SIGQUIT (Quit) @ 0 (0) ---
0.000119 rt_sigreturn(0xffffffff) = -1 EINTR (Interrupted system call)
0.000289 recvfrom(3, "*-1\r\n", 8192, 0, NULL, NULL) = 5
12.193469 prctl(PR_SET_NAME, 0x7ffff96d5e82, 0x29, 0x2, 0x7f2ae32b80b2) = 0
0.000793 gettimeofday({1392164734, 82373}, NULL) = 0
0.000420 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
0.000567 write(1, "00:25:34,082 INFO [7469] [pyres"..., 73) = 73
0.000336 sendto(3, "*3\r\n$4\r\nSREM\r\n$14\r\nresque:worker"..., 65, 0, NULL, 0) = 65
0.000162 recvfrom(3, ":1\r\n", 8192, 0, NULL, NULL) = 4
0.000264 sendto(3, "*2\r\n$3\r\nDEL\r\n$45\r\nresque:worker:"..., 65, 0, NULL, 0) = 65
0.000288 recvfrom(3, ":1\r\n", 8192, 0, NULL, NULL) = 4
0.000696 sendto(3, "*2\r\n$3\r\nDEL\r\n$45\r\nresque:stat:pr"..., 65, 0, NULL, 0) = 65
0.000296 recvfrom(3, ":0\r\n", 8192, 0, NULL, NULL) = 4
0.000591 sendto(3, "*2\r\n$3\r\nDEL\r\n$42\r\nresque:stat:fa"..., 62, 0, NULL, 0) = 62
0.000264 recvfrom(3, ":0\r\n", 8192, 0, NULL, NULL) = 4
0.000310 gettimeofday({1392164734, 86646}, NULL) = 0
0.000519 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
0.000501 write(1, "00:25:34,086 INFO [7469] [panel"..., 90) = 90
0.000905 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f2ae6a74cb0}, {0x5699b0, [], SA_RESTORER, 0x7f2ae6a74cb0}, 8) = 0
0.000569 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7f2ae6a74cb0}, {0x5699b0, [], SA_RESTORER, 0x7f2ae6a74cb0}, 8) = 0
0.000517 rt_sigaction(SIGUSR1, {SIG_DFL, [], SA_RESTORER, 0x7f2ae6a74cb0}, {0x5699b0, [], SA_RESTORER, 0x7f2ae6a74cb0}, 8) = 0
0.000457 rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7f2ae6a74cb0}, {0x5699b0, [], SA_RESTORER, 0x7f2ae6a74cb0}, 8) = 0
0.000526 close(3) = 0
0.051131 exit_group(0) = ?
The QUIT is delivered at 2.98s; recvfrom is then restarted after the signal handler returns and keeps blocking until the 5 bytes *-1\r\n finally arrive about 12.19 seconds later. I think this is the smoking gun, but why?
The pyres implementation shows that Worker.reserve is built on the BLPOP command with a timeout parameter. If the timeout expires and all of the watched lists are empty, redis simply returns *-1\r\n (the nil reply). Hmm, it looks like the global exit flag won't take effect until the reserve method times out.
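The timeout behavior is easy to reproduce on its own. Here is a minimal sketch using redis-py against a made-up, empty queue (assuming a local redis on the default port): blpop blocks for the full timeout and then returns None, which is the *-1\r\n reply seen in the trace.
#!python
import time
import redis

r = redis.Redis(host='localhost', port=6379)

start = time.time()
# Block on a (made-up) empty list for up to 5 seconds, like Worker.reserve does.
result = r.blpop('resque:queue:does_not_exist', timeout=5)
print('%r after %.2fs' % (result, time.time() - start))
# -> None after ~5s; on the wire redis answers with the nil reply *-1\r\n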
It could become even worse if the QUIT is signaled just after reserve is invoked and a job is pushed just before the timeout expires: the worker forks off and processes that new job before it ever rechecks the shutdown flag. See the main loop in the Worker.run method:
#!python
while True:
    if self._shutdown:
        logger.info('shutdown scheduled')
        break
    job = self.reserve(interval)
    if job:
        self.fork_worker(job)
    else:
        if interval == 0:
            break
        #procline @paused ? "Paused" : "Waiting for #{@queues.join(',')}"
        self._setproctitle("Waiting")
        #time.sleep(interval)
self.unregister_worker()
The shutdown may have to wait for almost the full interval seconds, and after that it still has to wait for the forked child to finish the job. I think a more intuitive approach would be: in the SIGQUIT handler, if the worker is still waiting for a job, trigger the cleanup and exit immediately.
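A minimal, self-contained sketch of that idea; the WORKING flag and cleanup() are made-up stand-ins for the worker's job-in-flight state and unregister_worker(), not the actual pyres API:
#!python
import os
import signal
import time

WORKING = False      # would be set around fork_worker() in the real worker
shutdown = False

def cleanup():
    # Stand-in for unregister_worker(): remove the resque:worker/stat keys.
    print('cleaning up worker registration', flush=True)

def schedule_shutdown(signum, frame):
    global shutdown
    shutdown = True              # keep the existing "finish current job" behavior
    if not WORKING:
        # No job in flight: clean up and exit right away instead of letting
        # the blocking reserve()/BLPOP run out its timeout.
        cleanup()
        os._exit(0)

signal.signal(signal.SIGQUIT, schedule_shutdown)

while not shutdown:
    time.sleep(5)    # stand-in for reserve(interval), a blocking call

cleanup()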
Also, the docstring is totally misleading:
``interval`` -- Number of seconds the worker will wait until processing the next job. Default is "5".
The interval has no effect on how soon a job is processed: the reserve method returns as soon as a job is pushed onto the queue. To actually honor that commitment, the loop in the Worker.work method would have to use the non-blocking LPOP command together with a sleep.
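For comparison, a loop that matches the docstring would poll with LPOP and sleep between attempts. Here is a sketch with redis-py and a made-up queue name, not the actual pyres code:
#!python
import time
import redis

r = redis.Redis(host='localhost', port=6379)
interval = 5          # now really "seconds the worker will wait until the next job"
shutdown = False      # would be flipped by the SIGQUIT handler

while not shutdown:
    job = r.lpop('resque:queue:example')   # non-blocking; returns None when empty
    if job is not None:
        print('processing', job)
    else:
        time.sleep(interval)               # the only place the interval matters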