Title: [8502] fast, big untar to fat32 fails with segmentation fault
Project: kernel
Item Last Modified: Tue, 23 Sep 2008 13:10:56
Details
Reporter:   operator
Created:   Fri, 18 May 2007 12:24:00
Updated:   Tue, 23 Sep 2008 13:10:56
Key:   8502
Versions:   Not provided
Environment:  
Priority:   -1
Status:   Not provided
Resolution:   UNREPRODUCIBLE
Original Link:   http://bugzilla.kernel.org/show_bug.cgi?id=8502
Summary:   fast, big untar to fat32 fails with segmentation fault
Description:
Most recent kernel where this bug did *NOT* occur: 2.6.12
Distribution: self-compiled
Hardware Environment: IBM Thinkpad
Software Environment: Mini-Linux with glibc 2.5
Problem Description:
Unpacking a large tar.gz archive to a fat32 filesystem produces a
"segmentation fault" error.
Unpacking the same archive to ext3 works without problems.
I tried both BusyBox 1.5.0 tar and GNU tar.

Steps to reproduce:
Use a brand-new IBM ThinkPad or faster hardware with 1GB RAM.
Then unpack a tar archive larger than 800MB to a fat32 filesystem on an sd drive.
Comments:
OGAWA Hirofumi Sun, 20 May 2007 02:58:57
Created an attachment (id=11550) [details]
bug fix

Please try the attached patch. This patch may fix this bug.
SCHMIEDER Operator Mon, 21 May 2007 16:20:39
Hello,
First of all, thanks for your very fast answer.
Unfortunately the patch didn't resolve the problem; it still exists.
In the meantime I tried the same procedure with cpio instead of tar, with exactly
the same result: ext3 works, fat32 produces a segmentation fault.
What kind of information do you need to see what happens here?
Problem: neither dmesg nor strace will run after the segfault.
OGAWA Hirofumi Tue, 22 May 2007 12:50:08
I'd like to see why tar crashed. Can you debug tar?
OGAWA Hirofumi Tue, 22 May 2007 12:59:44
Or can you find the version in which the problem begins to happen?
Is 2.6.21 ok?
SCHMIEDER Operator Tue, 22 May 2007 13:30:29
Hi,
I never saw this with kernel 2.6.19, but I am not really sure about this because we
didn't have such notebooks at that time.
Could you please give me a hint on how to debug tar? Is strace ok for you?
OGAWA Hirofumi Wed, 23 May 2007 12:59:46
Created an attachment (id=11578) [details]
debug version of tar

Can you run the attached tar on your system? (It was compiled on my system, so
its libraries may be incompatible with yours.)

If it runs, please reproduce the problem, then send the core.
SCHMIEDER Operator Thu, 24 May 2007 12:16:43
Hi,
Unfortunately the tar gave me no additional output. Should there be a file, or where
can I find the "core"?
But I tried another thing:
I copied my tgz to the FAT32 partition, then I unzipped it to a tar, and after that I
tried an untar, but got the segfault right at the start.
While the untar was running, free memory went down to about 12kB.
Maybe this is a buffer problem?

Info: I am going on vacation for the next 14 days and will give all the info to my
colleague Markus.
Thanks
Holger
OGAWA Hirofumi Fri, 25 May 2007 11:10:20
This bugzilla is crap. The attachment is a tar.bz2 (a bzip2-compressed tar
binary). Please try the following.

# wget -O tar.bz2 'http://bugzilla.kernel.org/attachment.cgi?id=11578&action=view'
# bzip2 -d tar.bz2
# chmod +x tar
# ./tar xzf your-test.tar.gz

Please try to reproduce the problem with the attached "tar".

> I copied my tgz to the FAT32 partition, then I unzipped it to a tar, and after that I
> tried an untar, but got the segfault right at the start.
> While the untar was running, free memory went down to about 12kB.
> Maybe this is a buffer problem?

This is normal. The memory is used as file cache.
SCHMIEDER Operator Mon, 11 Jun 2007 23:39:43
Hi,
I am back from vacation now and saw your last reply.
I tested the following again with your tar:
- copied the tgz archive to the fat32 partition
- ran tar -zxvf archive.tgz

This works without any error. But after that I get a segmentation fault for
every application I try to run. Only a reboot helps then.

This is the last message in dmesg:
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support
DPO or FUA

Maybe this is important to know:
- all filesystems are compiled static into the kernel
- the device drivers are modules:
Module                  Size  Used by    Not tainted
sd_mod                 25088  2
ata_piix               15492  0
ahci                   20740  1
libata                116252  2 ata_piix,ahci
scsi_mod               99980  2 sd_mod,libata
e1000                 113216  0
OGAWA Hirofumi Tue, 12 Jun 2007 00:59:18
(In reply to comment #9)
> I am back from vacation now and saw your last reply.
> I tested the following again with your tar:
> - copied the tgz archive to the fat32 partition
> - ran tar -zxvf archive.tgz
>
> This works without any error. But after that I get a segmentation fault for
> every application I try to run. Only a reboot helps then.

Thanks. I think your tar version is buggy.
Could you change your tar or glibc version?

> This is the last message in dmesg:
> sd 0:0:0:0: [sda] Attached SCSI disk
> sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support
> DPO or FUA

Is this reproducible?
SCHMIEDER Operator Tue, 12 Jun 2007 09:14:58
Hi,
Thanks for the fast answer.
I tried several different tar versions, all with the same result.
The dmesg trace above comes up only once, when the sata driver is loaded.

Anyway, none of this happens on an ext3 volume with the same hardware. So my
question is: could it really be a problem with glibc or tar?

Thanks
Holger
OGAWA Hirofumi Tue, 12 Jun 2007 11:41:08
(In reply to comment #11)
> I tried several different tar versions, all with the same result.
> The dmesg trace above comes up only once, when the sata driver is loaded.

> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support
> DPO or FUA

The above line is normal, please ignore it.

> Anyway, none of this happens on an ext3 volume with the same hardware. So my
> question is: could it really be a problem with glibc or tar?

Ah, sorry. I forgot about ext3. Umm... I'll rethink this over the weekend. Please
wait a bit.
SCHMIEDER Operator Wed, 13 Jun 2007 09:38:45
Hi,
One more piece of info:
running unix2dos on a text file brings up the message "File exists".

Exactly the same thing works very well with kernel 2.6.19.

Maybe this is important.
OGAWA Hirofumi Wed, 13 Jun 2007 11:31:49
(In reply to comment #13)
> One more piece of info:
> running unix2dos on a text file brings up the message "File exists".

This is interesting. Could you give cmdline and strace?

# strace -fF -vvv -s 4096 -o log unix2dos foobar
SCHMIEDER Operator Wed, 13 Jun 2007 12:27:17
Created an attachment (id=11742) [details]
attached the strace log

Hi,
I did the strace, hope this helps.

Holger
OGAWA Hirofumi Wed, 13 Jun 2007 13:46:43
Umm...

18563 open("/mnt/sda1/autoexec.bat", O_RDONLY) = 3
18563 gettimeofday({1181762230, 993358}, NULL) = 0
18563 getpid() = 18563
18563 open("/mnt/sda1/autoexec.batSqico6", O_RDWR|O_CREAT|O_EXCL, 0600) = -1
EEXIST (File exists)

Probably unix2dos wants to create a temporary file, but is this an msdos fs?
Did you use "msdos" as the fstype? If so, can you try vfat instead?
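
A rough sketch of why the open above can fail, assuming the volume really is mounted as msdos: that filesystem type stores only 8.3 names, so the temporary name "autoexec.batSqico6" collapses onto the existing "autoexec.bat" and open() with O_CREAT|O_EXCL returns EEXIST. The short_name helper below is hypothetical and simplified (it ignores case folding and invalid characters); it is not the kernel's code.

```shell
# Hypothetical helper: approximate how a file name is squeezed into 8.3 form
# (base name cut to 8 characters, extension cut to 3, split at the first dot).
short_name() {
    name=$1
    case $name in
        *.*) base=${name%%.*}; ext=${name#*.}; ext=${ext%%.*} ;;
        *)   base=$name; ext= ;;
    esac
    base=$(printf '%s' "$base" | cut -c1-8)
    ext=$(printf '%s' "$ext" | cut -c1-3)
    if [ -n "$ext" ]; then
        printf '%s.%s\n' "$base" "$ext"
    else
        printf '%s\n' "$base"
    fi
}

# Both names end up as the same 8.3 name, so the "new" file already exists.
short_name autoexec.bat
short_name autoexec.batSqico6
```

With vfat, long names are stored as given, so the temporary file gets a name of its own.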
SCHMIEDER Operator Thu, 21 Jun 2007 03:17:55
Created an attachment (id=11844) [details]
strace snippet

Hi,
This is really strange:
- normally vfat should be detected automatically when I have autofs in fstab, but
it is using msdos.
- I now do a mount -t vfat, and on the IBM machine both unix2dos and tar are
fine now
- exactly the same on an HP DC7700 brings the segfault again.
- to slow the tar down I ran it under strace with -s128 -> also bad
- to slow it down even more I ran strace with -s4096 and -vvv, writing to
/dev/zero -> now it works, slowly because of strace, but it works.

So it really seems to be a race condition in the fat32 kernel module, or what is
your opinion?

The strace log is very large (300MB), so I copied the last few lines into the
attached file. Please let me know if the complete strace log could help; I will
then put it on our webserver for downloading.
OGAWA Hirofumi Thu, 21 Jun 2007 09:15:13
(In reply to comment #17)

> This is really strange:
> - normally vfat should be detected automatically when I have autofs in fstab, but
> it is using msdos.

Are you using the following?

xxxx -fstype=auto :/dev/foo

"auto" means the types are tried in /etc/filesystems order, and if there is no
/etc/filesystems, in /proc/filesystems order.

If vfat and msdos are both compiled statically into the kernel, /proc/filesystems
will list them in this order:

msdos
vfat

In short, the FAT volume will be mounted as msdos.
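
The ordering effect can be sketched like this, assuming the "try each listed type in turn" behaviour described above. The sample list is made up for illustration, and probe_order and first_fat_type are hypothetical helper names; on a real system the input would be /etc/filesystems or /proc/filesystems.

```shell
# Print filesystem types in the order an auto-probe would visit them,
# reading /proc/filesystems-style lines on stdin.
# Entries marked "nodev" are virtual filesystems and are skipped.
probe_order() {
    awk '$1 != "nodev" { print $1 }'
}

# Print the first FAT-capable type in that order.
first_fat_type() {
    probe_order | grep -E '^(msdos|vfat)$' | head -n 1
}

# Assumed sample: msdos is registered before vfat, as in the static-kernel case.
sample=$(printf 'nodev\tsysfs\nnodev\tproc\n\text3\n\tmsdos\n\tvfat\n')

printf '%s\n' "$sample" | first_fat_type    # prints "msdos"
```

Giving the type explicitly (mount -t vfat, or -fstype=vfat in the autofs map) avoids depending on this order.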

> - I now do a mount -t vfat, and on the IBM machine both unix2dos and tar are
> fine now
> - exactly the same on an HP DC7700 brings the segfault again.
> - to slow the tar down I ran it under strace with -s128 -> also bad
> - to slow it down even more I ran strace with -s4096 and -vvv, writing to
> /dev/zero -> now it works, slowly because of strace, but it works.
>
> So it really seems to be a race condition in the fat32 kernel module, or what is
> your opinion?

Does that mean it works with strace, but not without strace?
SCHMIEDER Operator Thu, 21 Jun 2007 09:45:08
Hi, thanks for clarifying the auto option, I misunderstood it.

Regarding the problem, let me give you a summary:
THE UNTAR WORKS:
- on slower machines
- on fast machines under strace with a high debug level
- on ext3, always
- for small tgz files

AND DOES NOT WORK:
- on fast machines with fat32
- on fast machines under strace with a low debug level (-v -s128)

I believe it works as long as the machine is not too fast while writing to the
fat32 filesystem.
OGAWA Hirofumi Sun, 24 Jun 2007 01:53:49
Did tar die with a segfault? Is there a coredump?

Um... what do the following commands output?

# ulimit -Ha
# ulimit -Sa

Thanks.
SCHMIEDER Operator Tue, 26 Jun 2007 12:03:51
Hi,
Sorry for the delay. I first had to get hold of a machine with this issue.

I am not sure what produces the segfault, but I believe it is tar. Anyway,
once tar -zxvf has been run on the big archive within fat32, every application
fails with a segfault.

Here is the requested output:
~ # ulimit -Ha
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
coredump(blocks) unlimited
memory(kbytes) unlimited
locked memory(kbytes) 32
process 16252
nofiles 1024
vmemory(kbytes) unlimited
locks unlimited

~ # ulimit -Sa
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) 0
memory(kbytes) unlimited
locked memory(kbytes) 32
process 16252
nofiles 1024
vmemory(kbytes) unlimited
locks unlimited
OGAWA Hirofumi Tue, 26 Jun 2007 12:59:45
Thanks.

coredump(blocks) 0

Ah, this soft limit prevents the core from being dumped. Please do
"ulimit -c unlimited", then we can get a coredump.
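
A minimal sketch of checking and raising the limit, assuming a shell whose ulimit builtin accepts -c together with -S and -H (bash does). It raises the soft limit only as far as the hard limit allows, which in the output above is "unlimited".

```shell
# Show the soft and hard core-file size limits for this shell.
ulimit -S -c
ulimit -H -c

# Raise the soft limit to the hard limit. The setting applies to this shell
# and to programs started from it afterwards, not to already running processes.
ulimit -S -c "$(ulimit -H -c)"

# Confirm the new soft limit.
ulimit -S -c
```

The program to be examined has to be started from the same shell after the limit has been raised.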

Um... but with random segfaults it is hard to find the cause. Can you find the
first broken kernel version?

And could you send your tar binary and the coredump?
SCHMIEDER Operator Tue, 26 Jun 2007 13:49:56
Created an attachment (id=11888) [details]
Coredumps and tar binary

Hi,
I switched the coredump limit to unlimited, then I did a tar -zxvf. After this
finished (without error), every application I tried to start (pico, tput,
strace ...) ended with a segfault.
The only way to copy the coredump is with dd, which still works:
============================================================================
/mnt/sda1/install # pico
Segmentation fault (core dumped)
/mnt/sda1/install # ls -l
-rwxr-xr-x 1 root root 163840 Jun 27 00:28 core
drwxr-xr-x 10 root root 192512 Jun 27 00:22 i386
-rwxr-xr-x 1 root root 536539731 Jun 27 00:21 xpsp2.tgz
/mnt/sda1/install # dd if=core of=/mnt/floppy/core_pico
320+0 records in
320+0 records out
/mnt/sda1/install # tput
Segmentation fault (core dumped)
/mnt/sda1/install # dd if=core of=/mnt/floppy/core_tput
336+0 records in
336+0 records out
============================================================================

Let me explain my system a little bit:

It is for unattended installations and some other network testing.
It boots a Linux kernel and an initrd. All network drivers are within the initrd, so
it works on nearly every PC. After the network connection is established, the
system gets the necessary hd driver from the server. Then it creates a
partition, formats it and copies all files there. Then it boots and installs
the OS on the PC.
So for Windows installations FAT32 is necessary. This has worked exactly this way
for a couple of years, but now there are some new machines which need
newer kernel drivers to support the hardware.
And together with that, I believe because they are faster, I get these
problems.
So, I unpack the tar (this is the WinXP i386 dir), then I have to call some other
commands.
....

And now the strange thing:
- this works consistently on an ext3 fs !!!
- there are no problems on slower machines !!!

I attached the tar binary and 2 coredumps. But please note, I tried other tars
(for example the busybox tar) and cpio too, with the same results.

Hope you have some more ideas.

Holger
OGAWA Hirofumi Tue, 26 Jun 2007 14:10:10
Oops, to get any information out of a coredump I need the binary that produced
it. Could you send the tput and pico binaries from your system?
OGAWA Hirofumi Tue, 26 Jun 2007 14:17:55
BTW, yes, I now think something strange is happening in the kernel or glibc.
So, if you can find the first broken kernel version, it will be much appreciated.
SCHMIEDER Operator Tue, 26 Jun 2007 14:25:09
Created an attachment (id=11889) [details]
tput and pico binary

Hi, thanks for the fast reply. Attached are tput and pico.
Unfortunately it is not possible to find the last working kernel version,
because no older kernel works with the brand-new hardware (ata_piix).

Hope you have another idea.

Thanks,
Holger
OGAWA Hirofumi Tue, 26 Jun 2007 14:33:07
Oh, I see. Does the first kernel version that works on this hardware have this problem?
SCHMIEDER Operator Tue, 26 Jun 2007 22:58:16
I can use all kernel versions beginning with 2.6.19 for those HW types (ata_piix
and ahci were tested). All show the same problem.
Holger
OGAWA Hirofumi Wed, 27 Jun 2007 09:59:15
Does 2.6.19 also have this segfault problem? Um...
SCHMIEDER Operator Tue, 14 Aug 2007 10:26:39
Hi, sorry for not answering for such a long time. Yes, the problem exists for
all kernels, including 2.6.19.
Anyway, it does not happen with ext3, so in my opinion it cannot be a problem of
the driver but of the fat32 kernel module.

Any more ideas?
OGAWA Hirofumi Tue, 14 Aug 2007 11:32:10
My stress test always passes, but I'd like to reproduce this on my
machine...
Can you give steps to reproduce?

Um... can you try the most recent kernel version (2.6.23-rc2)? And please use the
debug options (CONFIG_SLUB=y, CONFIG_SLUB_DEBUG_ON=y).

Thanks.