Trafodion

sort crashes when no space for scratch area

Bug #1389784 reported by khaled Bouaziz on 2014-11-05

Affects		Status	Importance	Assigned to	Milestone
	Trafodion	Fix Committed	High	Prashanth Vasudev	Trafodion r1.1

Bug Description

We have seen this issue a few times now, especially when running the bulk loader. When there is not enough scratch space, sort crashes.
We should probably return an error message instead.

#0 0x00000030b9889abb in memcpy () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64 glibc-2.12-1.132.el6_5.3.x86_64 hadoop-libhdfs-2.4.0.2.1.5.0-695.el6.x86_64 jdk-1.7.0_67-fcs.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.9-33.el6_3.3.x86_64 libcom_err-1.41.12-12.el6.x86_64 libgcc-4.4.7-4.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 nspr-4.10.6-1.el6_5.x86_64 nss-3.16.1-4.el6_5.x86_64 nss-softokn-freebl-3.14.3-10.el6_5.x86_64 nss-util-3.16.1-1.el6_5.x86_64 openldap-2.4.23-26.el6_3.2.x86_64 openssl-1.0.1e-16.el6_5.15.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000030b9889abb in memcpy () from /lib64/libc.so.6
#1 0x00007ff3b3b59177 in SQScratchFile::redriveVectorIO (this=0x7ff3a5a7f398, index=0) at ../sort/scratchfile_sq.cpp:754
#2 0x00007ff3b3b5924e in SQScratchFile::executeVectorIO (this=0x7ff3a5a7f398) at ../sort/scratchfile_sq.cpp:678
#3 0x00007ff3b3b5ced5 in ScratchSpace::writeFile (this=0x7ff3a5a5d120, block=0x7ff39502e200 "\017\t", blockNum=2319, blockLen=524288)
    at ../sort/ScratchSpace.cpp:707
#4 0x00007ff3b3b5d131 in SortScratchSpace::flushRun (this=0x7ff3a5a5d120, endrun=0, waited=0) at ../sort/ScratchSpace.cpp:1470
#5 0x00007ff3b3b5d265 in SortScratchSpace::writeRunData (this=0x7ff3a5a5d120, data=0x7ff39a7c99a8 "\270", reclen=184, run=<value optimized out>,
    waited=0) at ../sort/ScratchSpace.cpp:1410
#6 0x00007ff3b3b58250 in Record::putToScr (this=0x7ff3a14e18e0, run=<value optimized out>, reclen=<value optimized out>,
    scratch=<value optimized out>, waited=<value optimized out>) at ../sort/Record.cpp:151
#7 0x00007ff3b3b5766c in Qsort::generateARun (this=0x7ff3a5a5d090) at ../sort/Qsort.cpp:367
#8 0x00007ff3b3b579e6 in Qsort::sortSend (this=0x7ff3a5a5d090, rec=<value optimized out>, len=184, tupp=<value optimized out>)
    at ../sort/Qsort.cpp:295
#9 0x00007ff3b5a794cc in ExSortTcb::sortSend (this=0x7ff3a9019048, srcEntry=<value optimized out>, srcStatus=<value optimized out>,
    pentry_down=0x7ff3a9019358, upEntry=<value optimized out>, sortFromTop=0, step=@0x7ff3a9019414, matchCount=@0x7ff3a9019418,
    allocatedTuppDesc=@0x7ff3a9019420, noOverflow=@0x7ff3a9019428, workRC=@0x7fff61726aae) at ../executor/ex_sort.cpp:1222
#10 0x00007ff3b5a7b009 in ExSortTcb::workUp (this=0x7ff3a9019048) at ../executor/ex_sort.cpp:665
#11 0x00007ff3b5af8eb3 in ExScheduler::work (this=0x7ff3a8fd4f68, prevWaitTime=<value optimized out>) at ../executor/ExScheduler.cpp:328
#12 0x00007ff3b5a306b2 in ExEspFragInstanceDir::work (this=0x7fff61727090, prevWaitTime=1551921) at ../executor/ex_esp_frag_dir.cpp:757
#13 0x0000000000405cff in runESP (argc=3, argv=0x7fff617274a8, guaReceiveFastStart=0x0) at ../bin/ex_esp_main.cpp:389
#14 0x0000000000406103 in main (argc=3, argv=0x7fff617274a8) at ../bin/ex_esp_main.cpp:244

Anoop Sharma (anoop-sharma) on 2014-11-06

Changed in trafodion:
assignee:	nobody → Prashanth Vasudev (vasudev-prashanth)

Sandhya Sundaresan (sandhya-sundaresan) on 2015-01-07

Changed in trafodion:
milestone:	none → r1.1

Prashanth Vasudev (vasudev-prashanth) on 2015-03-27

Changed in trafodion:
status:	New → In Progress

Prashanth Vasudev (vasudev-prashanth) wrote on 2015-03-27:

After the temp file is created, its initial size is zero bytes. This temp file needs to be fully allocated before it is mmap’ed.

We were resizing the file by doing an lseek() to the end of the file followed by a write() of one character. However, write() does not flush this write to disk, and it does not force allocation of all the blocks.
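To illustrate the problem with the seek-and-write approach, here is a minimal sketch, not the actual Trafodion code. The helper names `createBySeekAndWrite` and `allocatedBytes` are made up for this example. It assumes Linux, where `st_blocks` is counted in 512-byte units.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not from the Trafodion source): create a file and
// "resize" it by seeking to the last byte and writing a single character.
// Returns an open descriptor, or -1 with errno set.
int createBySeekAndWrite(const char *path, off_t size)
{
  assert(path != nullptr && size > 0);

  int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
  if (fd < 0)
    return -1;

  char zero = '\0';
  if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1 ||
      write(fd, &zero, 1) != 1) {
    int saved = errno;   // keep the original error across cleanup
    close(fd);
    unlink(path);
    errno = saved;
    return -1;
  }
  return fd;
}

// Bytes actually backed by disk blocks (assumes 512-byte st_blocks units,
// as on Linux). Returns -1 if fstat() fails.
off_t allocatedBytes(int fd)
{
  struct stat st;
  if (fstat(fd, &st) != 0)
    return -1;
  return static_cast<off_t>(st.st_blocks) * 512;
}
```

On filesystems that support sparse files, `st_size` reports the full requested size after this call while `allocatedBytes()` typically stays far smaller. The missing blocks are only allocated when the mapped pages are touched later, which is where running out of space can surface as a crash rather than as an error code.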

1. So instead of lseek() and write(), we now use posix_fallocate(). Performance is not a concern here since we are in overflow mode, and this happens only once for each new file. Any disk space issue is detected at this point and returned as an ENOSPC (Linux errno 28) error.

2. Added a try/catch block around the memcpy, in case of any other errors.
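As a rough sketch of the first point, and not the committed change itself, preallocation with posix_fallocate() could look like the following. The function name `createPreallocatedFile` and its parameters are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (names are illustrative, not Trafodion's): create a
// scratch file and reserve all of its blocks before anyone maps it.
// Returns 0 and stores the descriptor in *fdOut, or returns an error number
// such as ENOSPC and leaves no file behind.
int createPreallocatedFile(const char *path, off_t size, int *fdOut)
{
  assert(path != nullptr && fdOut != nullptr && size > 0);

  int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
  if (fd < 0)
    return errno;

  // posix_fallocate() reports failure through its return value, not errno.
  int rc;
  do {
    rc = posix_fallocate(fd, 0, size);
  } while (rc == EINTR);   // retry if interrupted by a signal

  if (rc != 0) {
    close(fd);
    unlink(path);
    return rc;
  }

  *fdOut = fd;
  return 0;
}
```

A caller would check the return value before calling mmap() and turn ENOSPC into a regular sort error, so the shortage of scratch space is reported up front instead of showing up later inside memcpy.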

Sandhya Sundaresan (sandhya-sundaresan) wrote on 2015-04-01:

Fixed and merged to mainline https://review.trafodion.org/1404

Changed in trafodion:
status:	In Progress → Fix Committed
