Saturday, 22 August 2015

Why you should always check for download file corruption!

It is not uncommon to download a large file only to find that it is corrupt. Sometimes the problem is immediately obvious, but sometimes it can leave you scratching your head for hours, before you finally realise that your source file was corrupt!

It is also not uncommon to find that when a large file is copied to a USB drive, the copy did not work correctly and leaves you with a corrupt file. 'Fake' USB drives can also cause this type of symptom.

Downloading a Zip/7Zip/Rar compressed file is usually safe because when you unzip it, you will be warned if it is corrupt, but if you download a .ISO file you should always double-check it!

For instance, I have found that corruption in a large \sources\install.wim file caused Windows to fail to boot after a fresh install, yet Setup did not complain that the install.wim was corrupt!

USB data-stream corruption does happen!

Many years ago (approx. 2005), I set up a notebook plus a wireless router in the lounge, connected to the telephone line, and used a D-Link wireless USB 2.0 'dongle' connected to my other PC in the office.

Because the signal was quite weak in the office, I ran a 3 metre shielded USB 2.0 cable from my office PC to the USB WiFi dongle and stuck the dongle onto the wall, next to the office door, using a Velcro sticky-pad.

Note: The USB dongle that gave problems was actually an old D-Link WiFi dongle and not the one shown in this picture!

The D-Link WiFi dongle appeared to be working fine. However, I noticed that sometimes when I downloaded a large ZIP file from the internet using the WiFi dongle, 7Zip would report the file as corrupt when I unpacked it. I knew the file on the internet was OK because sometimes it would download without being corrupt, and I could always download it reliably using the notebook in the lounge, which did not use the wireless dongle.

I also noticed that this problem tended to be worse at night, but if I used an Ethernet cable instead of the WiFi dongle, there was no problem.

To cut a long story short and after a lot of experimental downloads, I found that the problem was due to the location of the WiFi dongle. If I moved it away from the wall light dimmer switch, all files downloaded cleanly. As soon as I moved the WiFi dongle near to the dimmer switch, the file corruption problems returned. The problem was a lot worse when the dimmer switch was actually switched on (hence worse at night!).

The only conclusion I could draw from this was that electromagnetic interference from the dimmer switch and mains wiring was the cause, and that the USB 2.0 driver (or the WiFi decoding circuitry/chips?) was not doing a very good job of error detection and error correction! This conclusion rather surprised me, because I would have expected extensive error detection to be employed at all stages.

I have not tested other USB WiFi dongles since, so I cannot say if things have improved since then.

Hash Utilities

When you unzip a .zip file or .rar file, the contents are automatically checked because the CRC32 hash value of each file is stored inside the .zip/.rar file.
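The same stored-CRC check can be run without actually extracting anything. Here is a minimal Python sketch for .zip files only, using the standard library's zipfile module. The function name first_corrupt_member and the example file name are my own inventions for illustration.

```python
import zipfile


def first_corrupt_member(zip_path):
    """Return the name of the first file inside the archive whose data
    fails its stored CRC32 check, or None if every member checks out."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() reads every member and compares it with the CRC32
        # value recorded in the archive.
        return archive.testzip()


# Example use (hypothetical file name):
#   bad = first_corrupt_member("download.zip")
#   print("OK" if bad is None else "Corrupt member: " + bad)
```

Note that if the archive is so badly damaged that it cannot be opened at all, zipfile.ZipFile raises zipfile.BadZipFile instead of returning a name.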

If, however, you download an ISO file, it is difficult to check whether the file is corrupt unless you first know what its hash value should be.

The best way to check that the contents of an ISO (or other) file are correct is to check its hash value against its 'official' published hash value.
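If you prefer a script to a GUI utility, the comparison can be sketched in a few lines of Python using the standard hashlib module. This is only an illustration: the function names, the example file name and the choice of SHA-256 are my assumptions, so use whichever algorithm the publisher actually lists (MD5, SHA-1, SHA-256, etc.).

```python
import hashlib


def file_hash(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks so that
    a multi-gigabyte ISO does not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """Compare the file's hash with the published value,
    ignoring letter case and surrounding whitespace."""
    return file_hash(path, algorithm) == published_hex.strip().lower()


# Example use (hypothetical file name, published value pasted from the web page):
#   print(matches_published("windows.iso", "<published hash here>"))
```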

A good description of what a file hash is, and of the various utilities that can be used to obtain the hash of a file, can be found on the Gizmo's Freeware site.

Personally, I use HashTab which integrates into the 'Properties' dialog of Windows Explorer:

HashTab adds an extra tab to the File Properties dialog box.

Easy2Boot and CRC32

Easy2Boot will display the CRC32 value of any payload file, if you select it from the E2B Menu and hold down the Ctrl key first (or SHIFT+Ctrl on v 1.72+), before pressing the ENTER key to run it. This only works on payload files that are displayed in the menu and not other sorts of menu entries. It is, however, a lot slower than using Windows to calculate the hash values of the file.

After displaying the CRC32 value, E2B will then run the payload file as usual.
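To have a value to compare against, you can calculate the CRC32 of the original file on your PC first. A rough Python sketch using the standard zlib module follows. The function name is made up, and I am assuming here that the value is wanted as 8 upper-case hexadecimal digits, so adjust the formatting if the value you are comparing with is displayed differently.

```python
import zlib


def file_crc32(path, chunk_size=1024 * 1024):
    """Return the CRC32 of a file as 8 upper-case hex digits
    (the output format is an assumption, see above)."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Feed the running CRC back in so that the result covers the whole file.
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08X")


# Example use (hypothetical file name):
#   print(file_crc32("ubuntu.iso"))
```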

You should also bear in mind that E2B may modify the contents of some ISOs. So if you have run an ISO once using E2B, the file hash value may be changed, because E2B may have modified the ISO file on the E2B USB drive. For instance, Windows XP ISOs are modified by E2B to prevent it asking you to 'press a key to boot from the CD'.

If you suspect that a file on the E2B USB drive is corrupt, always check its hash value immediately after you have copied it over to the USB drive. Do not compare hash values once you have used E2B to boot from it.
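A check of the copy against the original could be scripted along these lines. This is just a sketch: the function names and the example paths are hypothetical, and SHA-256 is an arbitrary choice, since any hash will do when both values are calculated by you.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(source_path, copy_path):
    """True if the copy on the USB drive hashes to the same value as the source."""
    return sha256_of(source_path) == sha256_of(copy_path)


# Example use (hypothetical paths), run straight after copying:
#   print(copy_is_intact(r"C:\ISOs\winxp.iso", r"E:\_ISO\WINDOWS\XP\winxp.iso"))
```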