Debian Bug report logs - #301511
sysklogd: hangs the whole system

Package: sysklogd; Maintainer for sysklogd is (unknown);

Reported by: Miquel van Smoorenburg <miquels@cistron.nl>

Date: Sat, 26 Mar 2005 13:03:03 UTC

Severity: grave

Tags: patch

Found in version 1.4.1-16

Fixed in version sysklogd/1.4.1-17

Done: Martin Schulze <joey@infodrom.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Martin Schulze <joey@debian.org>:
Bug#301511; Package sysklogd.


Acknowledgement sent to Miquel van Smoorenburg <miquels@cistron.nl>:
New Bug report received and forwarded. Copy sent to Martin Schulze <joey@debian.org>.


Message #5 received at submit@bugs.debian.org:

From: Miquel van Smoorenburg <miquels@cistron.nl>
To: submit@bugs.debian.org
Subject: sysklogd: hangs the whole system
Date: Sat, 26 Mar 2005 13:53:10 +0100
Package: sysklogd
Version: 1.4.1-16
Severity: grave
Justification: breaks the whole system

References:
  http://lkml.org/lkml/2005/3/26/37
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=103392
  http://lkml.org/lkml/2004/12/21/208
  http://lkml.org/lkml/2004/11/2/17

Syslogd can hang if the signal handler domark() is invoked while the
main loop is in the middle of a call to the ctime() libc function.
ctime() is not reentrant, so the nested call makes syslogd hang (with
recent glibc and 2.6 kernels, because glibc uses __libc_lock, which
uses a [fm]utex).

Because AF_UNIX SOCK_DGRAM sockets are blocking by default under
Linux, after a short while everything on the system that calls
syslog() will hang as well. Which means you can't login anymore,
and almost all other network services hang as well.
Also cron will fork every now and again and call syslog() so the
whole process table will fill up. The system is fubared.
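
The blocking behaviour described above can be observed in isolation. The following is a minimal sketch, not code from syslogd: it assumes Linux, uses a socketpair in place of the real /dev/log socket, and the function name count_until_full is made up for illustration. MSG_DONTWAIT makes send() fail with EAGAIN at the point where a default (blocking) socket would put the caller to sleep. The exact limit that is hit may differ between a socketpair and a bound socket such as /dev/log, but the effect on the sender should be the same.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send small datagrams into an AF_UNIX SOCK_DGRAM socket whose peer never
 * reads. Returns the number of datagrams accepted before send() refused to
 * take more, or -1 if the socketpair could not be created. *saved_errno
 * receives the errno of the failing call (0 if the safety bound was hit).
 */
long count_until_full(int *saved_errno)
{
    static const char msg[] = "<13>demo: nobody is reading this";
    const long limit = 1L << 20;  /* safety bound so the loop always ends */
    int sv[2];
    long sent = 0;

    *saved_errno = 0;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
        *saved_errno = errno;
        return -1;
    }

    while (sent < limit) {
        /* Without MSG_DONTWAIT this call would block once the queue is full. */
        if (send(sv[0], msg, sizeof msg, MSG_DONTWAIT) < 0) {
            *saved_errno = errno;
            break;
        }
        sent++;
    }

    close(sv[0]);
    close(sv[1]);
    return sent;
}
```

A caller of syslog() has no such escape hatch: once the daemon stops reading, the sender simply sleeps in the kernel, which matches the pile-up of hung processes reported here.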

The original syslogd protects itself from this by blocking SIGHUP
and SIGALRM in logmsg(), but that is under #ifndef SYSV.

The attached patch fixes this by adding POSIX sigprocmask calls
in logmsg(), as that is the simplest fix.
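
The attached patch itself is not reproduced in this log. As a rough sketch of the technique it describes, assuming a hypothetical helper format_timestamp_blocked() that stands in for the timestamp-formatting part of logmsg(), the signals are blocked with sigprocmask() before ctime() is called and the previous mask is restored afterwards:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical stand-in for the critical part of logmsg(): format a
 * "Mmm dd hh:mm:ss" timestamp while SIGHUP and SIGALRM are blocked, so a
 * handler cannot run in the middle of the non-reentrant ctime() call.
 * Returns 0 on success, -1 on failure.
 */
int format_timestamp_blocked(time_t now, char *buf, size_t len)
{
    sigset_t block, saved;
    const char *s;
    int rc = -1;

    sigemptyset(&block);
    sigaddset(&block, SIGHUP);
    sigaddset(&block, SIGALRM);

    /* Block the two signals and remember the previous mask. */
    if (sigprocmask(SIG_BLOCK, &block, &saved) < 0)
        return -1;

    /* ctime() returns "Www Mmm dd hh:mm:ss yyyy\n"; skip the weekday. */
    s = ctime(&now);
    if (s != NULL && buf != NULL && len >= 16) {
        memcpy(buf, s + 4, 15);
        buf[15] = '\0';
        rc = 0;
    }

    /* Restore the old mask; any signal that arrived meanwhile is delivered now. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return rc;
}
```

Signals that arrive while blocked are held pending rather than lost, so the MARK timer still fires, only slightly later.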

Non-bug related comments:

A better fix would be to just set a flag in domark() and check that
flag every so often in the mainloop and do the MARKing there.
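
A minimal sketch of that flag-based approach, assuming made-up names (domark_flag, install_mark_handler, service_mark) that do not come from the syslogd source: the handler only stores to a volatile sig_atomic_t, and the main loop does the actual work outside signal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, cleared by the main loop. */
static volatile sig_atomic_t mark_pending = 0;

/* Counts how often the main loop acted on the flag (for illustration). */
static int marks_written = 0;

/* Handler: does nothing except record that the timer fired. */
static void domark_flag(int sig)
{
    (void)sig;
    mark_pending = 1;
}

/* Install the handler for SIGALRM. Returns 0 on success, -1 on error. */
int install_mark_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = domark_flag;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART: let a blocking call return EINTR */
    return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Called once per main-loop iteration. If the timer fired since the last
 * call, do the MARK work here, where calling ctime() and friends is safe.
 * Returns 1 if a mark was handled, 0 otherwise.
 */
int service_mark(void)
{
    if (!mark_pending)
        return 0;
    mark_pending = 0;
    marks_written++;  /* a real daemon would write its "-- MARK --" entry here */
    return 1;
}
```

In a daemon, service_mark() would typically be called right after the blocking wait in the main loop returns, including when it returns early with EINTR.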

Someone should take out all the #ifdef/#ifndef SYSV stuff and
replace it with POSIX routines so that this syslogd compiles
under all modern OSes. I think that it doesn't even compile
without #define SYSV anymore, anyway.

Mike.
[syslogd.fix (text/plain, attachment)]

Tags added: patch. Request was from Miquel van Smoorenburg <mikevs@xs4all.net> to control@bugs.debian.org.


Acknowledgement sent to Jeff Bailey <jbailey@ubuntu.com>:
Extra info received and forwarded to list. Copy sent to Martin Schulze <joey@debian.org>.


Message #12 received at 301511@bugs.debian.org:

From: Jeff Bailey <jbailey@ubuntu.com>
To: miquels@cistron.nl
Cc: 301511@bugs.debian.org
Subject: sysklogd: hangs the whole system
Date: Thu, 31 Mar 2005 09:31:39 -0500
Miquel,

I was looking through the sysklogd patch that you provided, and looked
through the glibc code.  I don't see any __libc_lock magic happening.
Since ctime isn't reentrant, no effort is made to protect the coder
against threaded programming.  I also checked the 'time' function in
case you were worried about that.

Which glibc function were you worried about?

I can see this code being included for completeness so that an entry
isn't lost, but I don't think this is what's causing your breakage.

TKs,
Jeff Bailey


Acknowledgement sent to Miquel van Smoorenburg <miquels@cistron.nl>:
Extra info received and forwarded to list. Copy sent to Martin Schulze <joey@debian.org>.


Message #17 received at 301511@bugs.debian.org:

From: Miquel van Smoorenburg <miquels@cistron.nl>
To: Jeff Bailey <jbailey@ubuntu.com>
Cc: 301511@bugs.debian.org
Subject: Re: sysklogd: hangs the whole system
Date: Thu, 31 Mar 2005 17:07:24 +0200
On Thu, 2005-03-31 at 09:31 -0500, Jeff Bailey wrote:
> Miquel,
> 
> I was looking through the sysklogd patch that you provided, and looked
> through the glibc code.  I don't see any __libc_lock magic happening. 

ctime() -> localtime() -> __tz_convert() -> __libc_lock_lock()

>  Since ctime isn't reentrant, no effort is made to protect the coder
> against threaded programming.  I also checked the 'time' function in
> case you were worried about that.
> 
> Which glibc function were you worried about?
> 
> I can see this code being included for completeness so that an entry
> isn't lost, but I don't think this is what's causing your breakage.

It is, I tested it :) Compile the attached C program, run with sarge
glibc on a 2.6 kernel. It's the most clear if you run it like this:

$ strace -e trace=\!time ./a.out

Lockup within a fraction of a second.

Mike.
[ctime-hang.c (text/x-csrc, attachment)]

Acknowledgement sent to Lars Wirzenius <liw@iki.fi>:
Extra info received and forwarded to list. Copy sent to Martin Schulze <joey@debian.org>.


Message #22 received at 301511@bugs.debian.org:

From: Lars Wirzenius <liw@iki.fi>
To: 301511@bugs.debian.org
Cc: Miquel van Smoorenburg <miquels@cistron.nl>
Subject: Re: sysklogd: hangs the whole system
Date: Mon, 18 Apr 2005 16:11:30 +0300
I ran Miquel's ctime-hang.c, on a sarge machine with a 2.6 kernel, but
it kept on running for many minutes (that is, the SIGALRM happened all
the time). This didn't seem like a lockup. Did I do something wrong in
my attempt to reproduce this?




Acknowledgement sent to Christian Hammers <ch@debian.org>:
Extra info received and forwarded to list. Copy sent to Martin Schulze <joey@debian.org>.


Message #27 received at 301511@bugs.debian.org:

From: Christian Hammers <ch@debian.org>
To: 301511@bugs.debian.org
Cc: Miquel van Smoorenburg <miquels@cistron.nl>, Lars Wirzenius <liw@iki.fi>
Subject: Re: sysklogd: hangs the whole system
Date: Sun, 24 Apr 2005 14:05:33 +0200
For what it's worth, I also tried Miquel's ctime-hang.c on both a Sarge i386
and a Sid amd64 machine with 2.6 kernels and can reproduce the hang in
10 of 10 attempts.

bye,

-christian-

$ strace -e trace=\!time ./a.out
...
set_thread_area({entry_number:-1 -> 6, base_addr:0x401522a0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0x40018000, 20908)               = 0
rt_sigaction(SIGALRM, {0x8048464, [], 0}, NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={0, 1000}, it_value={0, 1000}}, NULL) = 0
brk(0)                                  = 0x804a000
brk(0x806b000)                          = 0x806b000
brk(0)                                  = 0x806b000
open("/etc/localtime", O_RDONLY)        = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=837, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40018000
--- SIGALRM (Alarm clock) @ 0 (0) ---
futex(0x4014fcec, FUTEX_WAIT, 2, NULL <unfinished ...>




Acknowledgement sent to Miquel van Smoorenburg <miquels@cistron.nl>:
Extra info received and forwarded to list. Copy sent to Martin Schulze <joey@debian.org>.


Message #32 received at 301511@bugs.debian.org:

From: Miquel van Smoorenburg <miquels@cistron.nl>
To: Christian Hammers <ch@debian.org>
Cc: 301511@bugs.debian.org, Lars Wirzenius <liw@iki.fi>
Subject: Re: sysklogd: hangs the whole system
Date: Wed, 4 May 2005 14:11:38 +0200
Christian Hammers wrote:
> For what it's worth, I also tried Miquel's ctime-hang.c on both a Sarge i386
> and a Sid amd64 machine with 2.6 kernels and can reproduce the hang in 
> 10 of 10 attempts.

I also re-ran the ctime-hang.c test program on i386 uniprocessor
and SMP, and amd64 SMP (all up-to-date sarge and 2.6 kernel) and
ctime-hang.c locked every time right away.

Mike.



Acknowledgement sent to GOTO Masanori <gotom@debian.or.jp>:
Extra info received and forwarded to list. Copy sent to Martin Schulze <joey@debian.org>.


Message #37 received at 301511@bugs.debian.org:

From: GOTO Masanori <gotom@debian.or.jp>
To: Miquel van Smoorenburg <miquels@cistron.nl>, 301511@bugs.debian.org
Cc: Christian Hammers <ch@debian.org>, Lars Wirzenius <liw@iki.fi>, Martin Schulze <joey@debian.org>, Jeff Bailey <jbailey@ubuntu.com>
Subject: Re: Bug#301511: sysklogd: hangs the whole system
Date: Sat, 14 May 2005 15:22:56 +0900
At Wed, 4 May 2005 14:11:38 +0200,
Miquel van Smoorenburg wrote:
> Christian Hammers wrote:
> > For what it's worth, I also tried Miquel's ctime-hang.c on both a Sarge i386
> > and a Sid amd64 machine with 2.6 kernels and can reproduce the hang in 
> > 10 of 10 attempts.
> 
> I also re-ran the ctime-hang.c test program on i386 uniprocessor
> and SMP, and amd64 SMP (all up-to-date sarge and 2.6 kernel) and
> ctime-hang.c locked every time right away.

I confirm that Miquel's ctime-hang.c stops executing on a 2.6 kernel
with the latest glibc.  Recent glibc uses NPTL instead of
LinuxThreads when a 2.6 kernel is in use.  If you set the environment
variable LD_ASSUME_KERNEL=2.4.1 and rerun his program on a 2.6 kernel,
the problem disappears (because LinuxThreads is used).  Note that
NPTL uses a futex for mutex protection, whereas LinuxThreads uses
signals.

SUSv3, aka POSIX, defines the signal handler safe functions as
follows:

  http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html#tag_02_04

Unfortunately, ctime() is not on this list.  So glibc does
not guarantee sane behavior when ctime() is called in a signal
handler.  BTW, I'm surprised that sysklogd calls such functions in a
signal handler.

I have not reproduced the sysklogd breakage yet, but IMHO the problem
can potentially occur, and I agree with Miquel's proposal.  Miquel, did
you confirm this problem using sysklogd?  If this patch fixes the bug, I
think we should do an NMU for sarge.

Regards,
-- gotom





Acknowledgement sent to Miquel van Smoorenburg <miquels@cistron.nl>:
Extra info received and forwarded to list. Copy sent to Martin Schulze <joey@debian.org>.


Message #42 received at 301511@bugs.debian.org:

From: Miquel van Smoorenburg <miquels@cistron.nl>
To: GOTO Masanori <gotom@debian.or.jp>
Cc: 301511@bugs.debian.org, Christian Hammers <ch@debian.org>, Lars Wirzenius <liw@iki.fi>, Martin Schulze <joey@debian.org>, Jeff Bailey <jbailey@ubuntu.com>
Subject: Re: Bug#301511: sysklogd: hangs the whole system
Date: Mon, 16 May 2005 21:41:11 +0000
On Sat, 14 May 2005 08:22:56, GOTO Masanori wrote:
> I have not reproduced the sysklogd breakage yet, but IMHO the problem
> can potentially occur, and I agree with Miquel's proposal.  Miquel, did
> you confirm this problem using sysklogd?  If this patch fixes the bug, I
> think we should do an NMU for sarge.

Yes, the reason I filed this bug was not theoretical - the
servers I run regularly hung completely because lots of processes
on the box blocked (cron calls syslog() ....), and the process
table got full. No way to login, not even on the serial console
(login calls syslog() too!), no way to recover ..

Mike.




Acknowledgement sent to Qingning Huo <qingninghuo@gmail.com>:
Extra info received and forwarded to list. Copy sent to Martin Schulze <joey@debian.org>.


Message #47 received at 301511@bugs.debian.org:

From: Qingning Huo <qingninghuo@gmail.com>
To: 301511@bugs.debian.org
Subject: more information on sysklog
Date: Tue, 24 May 2005 10:29:25 +0100
Hi,

I think this bug might be the problem that has been bugging me for the
last month.  It is a remote box, running sarge, kernel 2.6.

Every seven or eight days, I couldn't log in through ssh, or even
through serial consoles.  Apache2 runs fine though.  A reboot solves
the problem.  But there was nothing in the log files for me to
investigate.  It seems to me syslogd just vanished.

Nice work to find this bug.  Thanks.

Qingning



Reply sent to Martin Schulze <joey@infodrom.org>:
You have taken responsibility.


Notification sent to Miquel van Smoorenburg <miquels@cistron.nl>:
Bug acknowledged by developer.


Message #52 received at 301511-close@bugs.debian.org:

From: Martin Schulze <joey@infodrom.org>
To: 301511-close@bugs.debian.org
Subject: Bug#301511: fixed in sysklogd 1.4.1-17
Date: Wed, 25 May 2005 14:32:02 -0400
Source: sysklogd
Source-Version: 1.4.1-17

We believe that the bug you reported is fixed in the latest version of
sysklogd, which is due to be installed in the Debian FTP archive:

klogd_1.4.1-17_i386.deb
  to pool/main/s/sysklogd/klogd_1.4.1-17_i386.deb
sysklogd_1.4.1-17.diff.gz
  to pool/main/s/sysklogd/sysklogd_1.4.1-17.diff.gz
sysklogd_1.4.1-17.dsc
  to pool/main/s/sysklogd/sysklogd_1.4.1-17.dsc
sysklogd_1.4.1-17_i386.deb
  to pool/main/s/sysklogd/sysklogd_1.4.1-17_i386.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 301511@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Martin Schulze <joey@infodrom.org> (supplier of updated sysklogd package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Wed, 25 May 2005 20:10:31 +0200
Source: sysklogd
Binary: sysklogd klogd
Architecture: source i386
Version: 1.4.1-17
Distribution: unstable
Urgency: high
Maintainer: Martin Schulze <joey@debian.org>
Changed-By: Martin Schulze <joey@infodrom.org>
Description: 
 klogd      - Kernel Logging Daemon
 sysklogd   - System Logging Daemon
Closes: 301511
Changes: 
 sysklogd (1.4.1-17) unstable; urgency=high
 .
   * Use $(getconf LFS_CFLAGS) for large file support
   * Applied adjusted patch by Miquel van Smoorenburg to fix spurious
     hanging syslogd in connection with futex and NPTL introduced in recent
     glibc versions and Linux 2.6 (closes: Bug#301511)
Files: 
 e4d7b5bfb49f5d23b948e01dbdbfb0b6 539 base important sysklogd_1.4.1-17.dsc
 107c62e3bf41626b89050f01c5b347cb 25031 base important sysklogd_1.4.1-17.diff.gz
 ff91acac687d6a3e156af566059c0ac0 56866 base important sysklogd_1.4.1-17_i386.deb
 c1a79db3f09eb620c662c9c81cf4c2d0 38310 base important klogd_1.4.1-17_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFClMJrW5ql+IAeqTIRAveBAKCFnK8+YY5u5fOkwGGr3eUlJHZHwgCfVkep
M/uuXheH1KVXMXMVftXQyDo=
=rrUr
-----END PGP SIGNATURE-----





Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Apr 26 06:26:12 2024; Machine Name: buxtehude
