Styler66 - Jun 14, 2005 - 4:04 am
Hi Guys
Here's my dilemma. I have a PowerBook with about 10,000 photos and other stuff, and a Maxtor 300 GB external drive that also holds some of the data that is on the PowerBook...
My problem is that the photos get moved around and renamed on the PowerBook as customers make their selections. Now I want to back up to the external drive but avoid DUPLICATES, since these files also exist there, in different folders and under different names...
Nightmare, eh? I tried Retrospect Express 6.0, but the wording in that program escapes me; I just don't get it. I now have SuperDuper and DataBackUp. ANY HELP would be very much appreciated.
I also have 3 or 4 iTunes libraries from various installations, some including the same songs or albums. How do I put them all into one library on the external drive without duplicates, and then select only the albums I want to re-import into the PowerBook's library? Cheers.
DevilRocks - Jun 14, 2005 - 2:26 pm
About backing up your photos: if they are named differently, it is going to be very hard not to end up with duplicates. I don't know how to help you there, or whether anyone can. What I would do is delete the originals on the Maxtor and copy over the ones from the PowerBook, if that's possible.
About iTunes: if these libraries are all on different computers, you need to copy all your music files to one computer, delete the old libraries and build a new library. On Tiger, open your hard drive and go to Music in the left-hand sidebar; all of your files should be stored under iTunes/iTunes Music/. Copy them all to one place, open them all on one computer, share it wirelessly and use that computer for streaming music.
That is what I would recommend.
Styler66 - Jun 21, 2005 - 3:00 am
Thanks Thomas
Sorry for the delay in replying, but I was travelling around Europe.
Your reply was helpful to a degree. I guess I have special needs here with sooooo many pictures; it's extreme.
Anyway cheers for the quick response.
macbri - Jun 24, 2005 - 3:08 am
Just to add my 2 cents' worth to Thomas' response: iTunes has an option to show duplicates, so you could have it find them and then remove the extra copies (open iTunes, go to Edit -> Show Duplicates). But his idea is probably the way to go for iTunes, I think.
As for images, the problem can be reduced with a shell script (commands to run in a terminal). Luckily I've done this kind of thing before... Even if two copies of the same image have different file names, as long as the *content* hasn't been altered the files will have an identical checksum (a kind of code derived from the file content). The command to check this is "cksum". Let's say, for example, you back up *all* your images into a folder (with subfolders if necessary), not worrying about duplicates. Of course, the possibility exists that two *different* images might have the same name (in different folders), so be sure you don't overwrite anything!
Anyway, at this stage we would build a list of that magic checksum for every file and then scan that list for duplicates, so you could then examine just those images. Two different images having the same checksum is very unlikely (cksum is only a 32-bit check, so it can happen in a big collection), but better to trust your eyes than a machine, right?
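To see the checksum idea in action, here is a minimal sketch. It assumes a throwaway scratch folder made with mktemp and made-up file names (none of these are your real photos). It writes the same bytes into two differently named files and shows that cksum gives both the same checksum and size, while a file with different content comes out different.

```shell
#!/bin/sh
# Demo only: the scratch folder and file names below are hypothetical.
demo_dir=$(mktemp -d)
printf 'same pixels'  > "$demo_dir/IMG_0001.jpg"
cp "$demo_dir/IMG_0001.jpg" "$demo_dir/client-pick.jpg"   # renamed copy
printf 'other pixels' > "$demo_dir/IMG_0002.jpg"          # a different "image"

# Reading from standard input makes cksum print just "checksum size",
# with no file name, so the results can be compared directly.
sum_a=$(cksum < "$demo_dir/IMG_0001.jpg")
sum_b=$(cksum < "$demo_dir/client-pick.jpg")
sum_c=$(cksum < "$demo_dir/IMG_0002.jpg")

if [ "$sum_a" = "$sum_b" ]; then
    echo "Renamed copy has the same checksum as the original"
fi
if [ "$sum_a" != "$sum_c" ]; then
    echo "Different content gives a different result"
fi

rm -rf "$demo_dir"   # clean up the scratch folder
```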
There's no denying a bit of work would be involved, but here's the start: In a terminal:
find /FolderNameWhereImagesAre -type f -exec cksum {} \; | sort > list.txt
Of course you'll substitute your images directory. When finished (and it may take a while) we can search the results for duplicates!
Last of all (for this message) try this command to get an idea of how many duplicates you potentially have (all on one line)
find /FolderNameWhereImagesAre -type f -exec cksum {} \; | sort -n | awk '{ print $3 " " $1}' | uniq -f 1 -d
Scary commands, but useful! Let me know if you want to pursue this!
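One caveat with the one-liner above: awk's $3 is only the first word of a file name, so names with spaces get cut short. Here is a hedged sketch of a variant that keeps the whole name and also requires the file size to match. The function name list_duplicates is made up for illustration, and it assumes cksum prints "checksum size name" separated by single spaces and that no file name contains a line break.

```shell
#!/bin/sh
# list_duplicates DIR (hypothetical helper): print every file under DIR whose
# checksum AND size match at least one other file, grouped together.
list_duplicates() {
    find "$1" -type f -exec cksum {} \; |
    sort -k1,1n -k2,2n |
    awk '{
        key = $1 " " $2                      # checksum plus size in bytes
        name = $0
        sub(/^[0-9]+ [0-9]+ /, "", name)     # the rest of the line is the name
        if (key == prev_key) {
            if (!shown) print prev_name      # first file of the group
            print name
            shown = 1
        } else {
            shown = 0
        }
        prev_key = key
        prev_name = name
    }'
}

# Run it on a folder given as the first argument, if there is one.
if [ $# -gt 0 ]; then
    list_duplicates "$1"
fi
```

It only prints names; nothing is moved or deleted.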
--------
Brian -- MacOSX.com Technical Support
Styler66 - Jun 24, 2005 - 12:54 pm
Wow... amazing answer. My abilities with Terminal are VERY limited.
I don't want to screw things up, but I will give it a try ASAP. Thanks for the input.
Are you available via iChat AV??? Just to know.
I am in Switzerland GMT +1 hr.
thanks
Styler
macbri - Jun 24, 2005 - 9:32 pm
Hey Rob -
Let me know how it goes with that command. No, I'm not on iChatAV I'm afraid. I do have YahooIM and MSN though, maybe they'd be of use? Anyway, I'll be online here on macosx.com regularly over the weekend.
- Brian
Styler66 - Jun 25, 2005 - 4:03 am
hi Brian.
All my pictures are located under one folder in my home directory, called "Work".
Work has several other folders for my clients, and then I have another folder called, for example, SPRING/SUMMER 2005.
In here is a huge mess... duplicates galore. It also has many sub-folders where my photos are sorted by category. Some have been moved around a lot too,
so copies may have occurred.
I'm a total newbie to Terminal... can you make the script idiot-proof for an idiot like me??? I'm the creative kind, not a logical thinker ;(
Cheers Brian
Rob
macbri - Jun 25, 2005 - 5:44 am
Hey Rob -
I've pasted a Perl script into the reply. I've also made it available for download for a short time at the following link:
http://www.bgstech.com/download.php?find_duplicates.pl
If you cut and paste the script below, save it in a text file and be sure to OMIT the lines labelled "CUT HERE". In other words, the first line of your saved file should be "#!/usr/bin/perl".
Whether you cut and paste it or save it from my website, move it to your /Work directory. Then, in a terminal, make sure it can be executed with the command:
cd /Work && chmod +x ./find_duplicates.pl
Now, to make the script do its thing, just type:
./find_duplicates.pl | tee duplicates.txt
And sit back and wait for the results. One major thing to note: I tested this on my iPhoto library and noticed that some thumbnails were saved twice. So I added a flag in the script, "$examine_thumbnails", which I set to 0. Change this to 1 if you want a full accounting of EVERY file.
Let me know how it goes. Once finished, you'll have the results on your terminal screen and also in a file, /Work/duplicates.txt. From that you can check the files yourself, or if you're really confident we can write another short script to prune the results automatically. Please note that at this stage I *strongly* encourage you to check some of the results yourself to verify that things reported as duplicates really are. This script will report duplicates but will NOT delete or remove ANY of your files in any way.
#--------------- CUT HERE ------------------
#!/usr/bin/perl
#
# This Perl script locates duplicate files starting in the current
# directory and working downwards. Any two files with the same checksum will
# be reported, even if they have different names and/or are located in
# different directories.
#
# Usage: ./find_duplicates.pl
#
# Brian S.
# MacOSX.com, Jun 25th 2005
#
$examine_thumbnails = 0; # Set this to 1 to process thumbnail files
$verbose_output = 1; # Set this to 0 for quiet operation
# Run 'find' to list every regular file starting in the current directory
# and run a checksum on each one. Sort the results numerically by checksum.
open (LIST, "find ./ -type f -exec cksum {} \\; | sort -n |") or die "Can't execute command\n";

# Initialize some counters
$total = 0;
$count = 0;
$cksum = 0;
$file = "";

# Process file list
while (<LIST>) {
    chomp; # Strip newline from end of line

    # Split into checksum, size and filename. Limiting the split to three
    # fields keeps a filename or directory with spaces in it in one piece.
    @result = split(' ', $_, 3);
    $newfile = $result[2];

    # If the previous checksum matches this one, report it
    if ($cksum == $result[0]) {
        if ($examine_thumbnails) {
            print "$file MATCHES $newfile\n";
        } else {
            print "$file MATCHES $newfile\n" unless ("$newfile" =~ /Thumbs/ or
                "$file" =~ /Thumbs/);
        }
    }
    $cksum = $result[0];
    $file = $newfile;
    $count++;

    # Report every 1000 files processed, to let the user know we're working
    if ($verbose_output) {
        if ($count % 1000 == 0) {
            $total += $count;
            printf("Processed %7d files...\n", $total);
            $count = 0;
        }
    }
}

# Done, clean up.
close LIST;
exit 0;
#------------------- CUT HERE -------------------
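For checking the reported pairs, a byte-for-byte comparison with "cmp" is a stricter test than the checksum. Below is a rough sketch, not part of the script above: the function name verify_matches is made up, and it assumes the results file contains lines of the form "A MATCHES B" and that you run it from the same folder where the script was run (the reported paths start with "./"). It only reads files.

```shell
#!/bin/sh
# verify_matches FILE (hypothetical helper): re-check every "A MATCHES B"
# line in FILE by comparing the two files byte for byte.
verify_matches() {
    while IFS= read -r line; do
        case $line in
            *" MATCHES "*) ;;        # a reported pair, handled below
            *) continue ;;           # progress lines and anything else
        esac
        first=${line%% MATCHES *}    # text before " MATCHES "
        second=${line#* MATCHES }    # text after " MATCHES "
        first=${first% }             # drop one trailing space, if present
        second=${second% }
        if cmp -s "$first" "$second"; then
            echo "IDENTICAL: $first == $second"
        else
            echo "DIFFERENT: $first != $second"
        fi
    done < "$1"
}

# Check the results file given as the first argument, if there is one.
if [ $# -gt 0 ]; then
    verify_matches "$1"
fi
```

Anything reported as DIFFERENT would be a pair of files that merely share a checksum, or a path that could not be read.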
Styler66 - Jun 27, 2005 - 5:12 am
Thanks Brian, but Work is NOT a directory, just a folder in my user account.
I made a copy as described, put it into my Work folder, and got this:
Last login: Mon Jun 27 11:05:16 on ttyp1
Welcome to Darwin!
robert-knights-powerbook-g4-15:~ rwsknight$ cd /Work && chmod +x ./find_duplicates.pl
-bash: cd: /Work: No such file or directory
robert-knights-powerbook-g4-15:~ rwsknight$ ./find_duplicates.pl | tee duplicates.txt
-bash: ./find_duplicates.pl: No such file or directory
robert-knights-powerbook-g4-15:~ rwsknight$
I think I should search Users for MY account then, the Home folder, then my folder called Work.
Please advise... amazing script, I have to say. Compliments.
rob
macbri - Jun 27, 2005 - 6:37 am
Hey Rob -
Well, whatever your folder name is, that's where the script should go. So first, drag and drop the find_duplicates.pl file into that folder. Then, in a terminal, type:
cd
but don't hit Enter. Press the spacebar, then drag your work folder (whatever it may be called) into the terminal window and it will be magically converted into the full directory name! Then hit Enter. OK, so now if you type "ls" and hit Enter, you'll see the find_duplicates.pl file among your stuff. Now all you need to do is:
chmod +x ./find_duplicates.pl
./find_duplicates.pl
And sit back and enjoy a cup of coffee
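For the record, the earlier "No such file or directory" came from the path: /Work would be a folder at the very top of the hard drive, whereas a folder inside your home is ~/Work. If you'd rather type than drag, here is a small sketch. It assumes the folder really is called Work and sits directly in your home folder, and that find_duplicates.pl is already saved there; the function name enter_folder is made up.

```shell
#!/bin/sh
# enter_folder DIR (hypothetical helper): move into DIR, make sure the script
# is there, and make it executable. Nothing else is touched.
enter_folder() {
    cd "$1" || { echo "No such folder: $1" >&2; return 1; }
    if [ ! -f ./find_duplicates.pl ]; then
        echo "find_duplicates.pl is not in $(pwd)" >&2
        return 1
    fi
    chmod +x ./find_duplicates.pl
    echo "Ready in $(pwd)"
}
```

With that defined, enter_folder "$HOME/Work" && ./find_duplicates.pl | tee duplicates.txt does the same as the steps above. The quotes matter as soon as a folder name has a space in it.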
Styler66 - Jun 27, 2005 - 8:03 am
Brian Hi....
Script running now... thanks for the clear advice.
will post results. Big Thanx.
Rob