Hi Jay On 29/09/2009, at 4:08 AM, Jay Pipes wrote: >> There are several issues. length can be a larger value then what >> crc32 can take. > > Yes, agreed. > >> Also, crc32 is returns a long, so on a 32bit platform it may be ok >> to receive the value but on 64bit... >> I'd recommend pulling the crc implementation from libmemcached and >> using that (or possibly a different algo). CRC-32 is a uint32, not a long (make sure it never messes with sign bits, just to be safe). > Well, the one in libmemcached also returns a 32-bit unsigned > integer... > > uint32_t hash_crc32(const char *key, size_t key_length) > { > uint32_t x; > uint32_t crc= UINT32_MAX; > > for (x= 0; x < key_length; x++) > crc= (crc >> 8) ^ crc32tab[(crc ^ (uint64_t)key[x]) & 0xff]; > > return ~crc; > } > > But, as you can see, it handles lengths greater than 32 bits, though > in > a way that I am not sure is bitwise correct. It is cast-converting a > char into a uint64_t and XORing that against an unsigned 32-bit > integer, > then ANDing it against the value of 255. The above code looks a tad dodgy, given that the crc is uint32, not uint64. It'll work safely since it's beyond the relevant 8 bits, but just looks confusing and possibly cause extra trickery on 32bit builds? Another error in the above is that the key_length is size_t but the x counter is uint32. That may or may not be the same. When transmitted next to a block, it should a) follow the data and b) be stored/transmitted low byte first. So depending on the arch, endian- swapping is required. > If this is what you want me to use, no problem. Please verify for me, > though, that the above algorithm will indeed solve the bug. I > *think* it > does, but would like you to verify. Thanks! I forget what the exact threshold rule is, but ancient memory recalls that CRC-16 becomes unreliable when the block is a few K long, and thus we moved to CRC-32 at that time. So I'll take the educated guess that a CRC-32 is *definitely* out of its depth when you get to >32bit size data. Some numbers for your enjoyment (see that it's designed for serial transmissions lines, really ;-) CRC-16 (CCITT) ERROR DETECTION: All single and double bit errors, all errors with an odd number of bits, all burst errors of length 16 or less, 99.997% of 17-bit error bursts, and 99.998% of 18-bit and longer. CRC-32 ERROR DETECTION: When correctly applied, undetected errors are reduced by a factor of 10^-5 over the CCITT CRC-16. I presume that when dealing with blobs, you could potentially have large data blocks to deal with - so CRC-64 might be the way to go? I'd be happy to verify/validate a proper implementation. It's not complicated, except to explain. I can just see it when it's right ;-) I haven't got code handy beyond CRC-32, but if you want a proper CRC-64 implementation I'd be happy to provide proper code and a table, and optionally the code to generate the lookup table (so people can verify it's correct and see how it's done, at least). There's also an alternative using 2 16-item tables rather than 1 256-item lookup... slower but much smaller... probably not relevant for Drizzle. For CRC-64 you will have to choose which polynomial, there's ISO 3309 (HDLC) and ECMA-182. I don't know which one has better error detection (people have worked this out for 32-bits and smaller, with Hamming distance on given message sizes - I'd have to search for info on the 64-bit ones). There can be significant difference in error detection performance depending on which polynomial is used. If you want to stay with CRC-32 I can provide you with good code also. There's an awful lot of quick bad hacks and code copying going on with CRC, and just doing a quick google came up with more articles where people use CRC for fingerprints. It's just fundamentally unsuitable for that. It can just have a decent result on *integrity*, but one should never compare two blocks by their CRC. Cheers, Arjen. -- Arjen Lentz, Exec.Director @ Open Query (http://openquery.com) Exceptional Services for MySQL at a fixed budget. Follow our blog at http://openquery.com/blog/ OurDelta: enhanced builds for MySQL @ http://ourdelta.org