Corrupted Hard Drive? Python to the Rescue! | by Angela Kochoska | Apr, 2022

The story of how I saved a year's worth of work with a simple Python script when all else failed.

Photo by George Milton https://www.pexels.com/@george-milton/

In today's world of clouds and endless advanced backup solutions, storing your most precious projects on a simple external hard drive might seem a little outdated, but as I learned recently (against my will), it still appears to be musicians' preferred choice.

My (very specific) problem

So I was in for an interesting weekend when one of my husband's external hard drives just decided to croak. It also happened to be the one where he was storing all of his latest music projects, sound effects for his streams, and something called "stems" that, I've learned, allows him to copyright his music. Not fun.

So, to spare myself days of trying to comfort him over the loss of a whole year of work, I decided to dive in head first and try to save as much of his data as possible, using conventional methods I researched on the Internet like the good tech-y person that I am. This quickly proved fruitless: the disk seems to be plagued by random unreadable sectors that not only slowed any scanning poor Disk Drill or equivalent apps tried to do beyond reason, but also meant that any attempt at automated or manual backup was eventually interrupted by the disk randomly unmounting and re-mounting whenever a program hit a snag.

An actual estimate of the remaining backup time using Disk Drill. For context, 8,489 hours is 354 days, or a little less than A YEAR.

The one redeeming thing is that the disk is still detectable and (however slowly) readable. With a lot of patience, I found that copying individual files actually works, and if they're not corrupted it happens at relatively normal speeds. So, to save as much data as possible, one would have to go through every single file and try to copy it, then wait to see whether it works or raises an I/O error. HAH! But we're talking 500 GB of data here, across numerous directories and sub-directories and sub-sub-sub-sub directories… Time to give up, right? No, because we can automate this!

Because of the very specific nature of the problem, I had to write a very specific script for the file backup, but parts of this solution may be useful to a fellow victim when all else fails.

Here's the roadmap for my script:

1. Reproduce the directory tree in a backup iCloud directory

  • choose a subset of folders to back up
  • for each found folder, make a sibling in the backup location
  • keep a list of individual files with full paths from each found folder (to be copied in Step 2)
  • save the file list for future retrieval

And the code for this step:
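What follows is a minimal sketch of that step rather than the exact original: the paths for the failing drive (src_root), the local iCloud backup folder (backup_root), and the folder names are all hypothetical stand-ins.

```python
import os

# Hypothetical paths: the failing drive and the local iCloud backup folder
src_root = "/Volumes/MusicDrive"
backup_root = os.path.expanduser(
    "~/Library/Mobile Documents/com~apple~CloudDocs/backup")

# The subset of top-level folders worth saving (hypothetical names)
folders = ["Projects", "Stems", "StreamSFX"]

file_list = []
for folder in folders:
    for dirpath, dirnames, filenames in os.walk(os.path.join(src_root, folder)):
        # Recreate this directory as a sibling under the backup root
        rel = os.path.relpath(dirpath, src_root)
        os.makedirs(os.path.join(backup_root, rel), exist_ok=True)
        # Remember every file's full path, to be copied in Step 2
        for name in filenames:
            file_list.append(os.path.join(dirpath, name))

print(f"Found {len(file_list)} files to copy.")
```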

Depending on the number of directories and sub-directories, this may take a while on its own, but at the end of it we have a list of all the individual files with their full paths on the hard drive.

I chose this option, instead of copying each file while walking through the directories, simply because of how long the backup takes and how often it can be interrupted by unforeseen errors. This way, I only compile the list once and can refer back to it in the future.

Don't forget to save the list to a file in case you suspect the script will crash at some point! I'm a sucker for numpy, so this is how I did it:
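Something along these lines, reusing file_list from the sketch above:

```python
import numpy as np

# Save the list of full paths so a later crash doesn't force
# another multi-hour walk of the dying disk
np.save("file_list.npy", np.array(file_list))

# ...and to get it back on the next run:
file_list = np.load("file_list.npy").tolist()
```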

A better option would probably be to write each path to the file within the loop itself, so that's homework for anyone unfortunate enough to have to reproduce this.

2. Copy each individual file to its sibling location on iCloud

  • load the file list
  • initialize a flags array to keep track of progress. This helps if you still don't know what errors you're trying to catch and uncaught ones break the loop. It's also great for future analysis of how much of the disk was saved vs. lost.
  • try/except to copy each file, with updates to the flags array based on success or caught exceptions

And then, the actual backup loop:
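Here's a condensed sketch of that loop, reusing the hypothetical src_root, backup_root, and saved file list from above. The 5-minute limit is implemented here with signal.alarm (one way to do it on macOS/Linux), which raises the custom TimeOutException discussed below:

```python
import os
import shutil
import signal
import time
import numpy as np

# A per-file time limit via SIGALRM (Unix/macOS only)
class TimeOutException(Exception):
    pass

def alarm_handler(signum, frame):
    raise TimeOutException

signal.signal(signal.SIGALRM, alarm_handler)

src_root = "/Volumes/MusicDrive"  # hypothetical, as in Step 1
backup_root = os.path.expanduser(
    "~/Library/Mobile Documents/com~apple~CloudDocs/backup")

file_list = np.load("file_list.npy").tolist()
try:
    flags = np.load("flags.npy")        # resume a previous run
except FileNotFoundError:
    flags = np.zeros(len(file_list))    # 0 = not attempted yet

for i, path in enumerate(file_list):
    if flags[i] != 0:
        continue                        # already handled earlier
    dest = os.path.join(backup_root, os.path.relpath(path, src_root))
    j = 0                               # attempts for this file
    while j < 2:
        try:
            signal.alarm(300)           # at most 5 minutes per file
            shutil.copyfile(path, dest)
            flags[i] = 1                # success!
            break
        except FileNotFoundError:
            flags[i] = -1               # disk unmounted, or file gone
            time.sleep(120)             # give the disk time to re-mount
            j += 1                      # two strikes and we skip it
        except IOError:
            flags[i] = -2               # corrupted and lost
            break
        except TimeOutException:
            flags[i] = -3               # too slow; maybe revisit later
            break
        except Exception:
            flags[i] = -999             # something new; inspect later
            break
        finally:
            signal.alarm(0)             # cancel any pending alarm
    np.save("flags.npy", flags)         # checkpoint after every file
```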

You'll notice that I keep track of several different errors, and here's why:

  • FileNotFoundError: raised when the disk randomly unmounts and re-mounts, or when a file simply can't be found. To cover both scenarios, I add time.sleep(120), which allows time for the disk to re-mount, as well as a counter (j) for the number of times a file has been tried. If it fails twice in a row, I skip it and flag it as not found (flag -1).
  • IOError: nothing much to do here except raise a flag to indicate that the file is corrupted and lost (flag -2).
  • TimeOutException: I give each file a maximum of 5 minutes to copy. If this time limit is exceeded, a timeout exception is raised and flagged (-3). Some of these files can then be revisited if they're just too big or too important to skip.
  • All other exceptions are flagged with -999 to indicate they're none of the above (and can be revisited if we care to examine them in more detail).
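Since the flags array is checkpointed to disk after every file, tallying up the outcome afterwards is straightforward; a small sketch, reusing the flags.npy file from the loop above:

```python
import numpy as np

flags = np.load("flags.npy")
total = len(flags)
print(f"saved:     {np.sum(flags == 1) / total:6.1%}")
print(f"not found: {np.sum(flags == -1) / total:6.1%}")
print(f"corrupted: {np.sum(flags == -2) / total:6.1%}")
print(f"timed out: {np.sum(flags == -3) / total:6.1%}")
print(f"other:     {np.sum(flags == -999) / total:6.1%}")
```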

And that’s it!

This simple Python-based approach managed to save 78% of the important files on this corrupted external drive, when fancy apps didn't even recover a single byte. Time-wise, writing up the script took less than half an hour, troubleshooting it another hour or so, and the backup itself took about two days (if this sounds too long, remember the 8,000+ hours Disk Drill estimated!). Depending on the size of the backup and your patience, this could even be further optimized with multiprocessing.

And, the best part: it's free!

A visualization of the parts recovered vs. lost across the selected "important" directories.

I'll now go offer a sacrifice to the Python gods.
