Formatting Windows directory listing output
In trying to catalog the many thousands of files on the various hard disks, and especially on CDs and DVDs, I (J.) have run into a problem: there is no easy way to format a directory listing so that it can be exported to Excel or another format.
There are programs out there that do this, but they usually cost money and do not seem to do the main thing I want, which is to find duplicate files across the disks. So this is a project to see whether I can write software that will eventually print a report of which files are duplicated on which disks, and maybe even a script that will help me put together DVDs of whatever content I want in the future.
The first step is to run the “dir /s” command in a command prompt window and redirect its output to a text file (for example, dir /s D:\ > c:\040613_1655.txt). This is a sample of the output:
    Volume in drive D is 040613_1655
    Volume Serial Number is F4CB-D8C4

    Directory of D:\

    01/04/2004 12:48 AM 58,826,634 17664 Complete Warrior.pdf
    12/11/2003 12:47 AM 49,258,067 3.5E - Core - Dungeon Masters Guide.pdf
    12/07/2003 04:16 PM 57,531,969 3E - Core - Book of Exalted Deeds.pdf
    01/04/2004 03:14 PM 56,267,706 88574 City of the Spider Queen.pdf
    01/03/2004 01:55 PM 50,569,463 88581 Underdark.pdf
So we have the file name and the size available, which is all that is needed at a basic level. Unfortunately, a file name may or may not contain spaces, so when the listing is split on spaces (as Excel does when importing space-delimited text), a name like “17664 Complete Warrior.pdf” ends up spread across several columns, and there’s no easy way to tell Excel to join those pieces back into one name. But I can, and the way I’m doing it is with an awk script. Awk is a language commonly used in the Unix world for pattern matching and text-file manipulation; awk, along with sed, should be in every administrator’s toolbox. For my purposes, the script should print the name of the file, then its size, and finally which disc actually holds the file. Below is a sample of the output:
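To make the splitting problem concrete, here is a small Python sketch (Python rather than awk only so the idea is easy to test; the names parse_dir_line and DirEntry are hypothetical, and the layout of date, time, AM/PM, size, name is assumed from the sample above). Splitting on whitespace at most four times peels off the first four fields and leaves the rest of the line, spaces and all, as a single file name.

```python
from typing import NamedTuple, Optional


class DirEntry(NamedTuple):
    date: str
    time: str
    size: int
    name: str


def parse_dir_line(line: str) -> Optional[DirEntry]:
    """Parse one file line of a US-English 'dir' listing, or return None.

    Assumes the layout seen in the sample: date, time, AM/PM, size, name.
    """
    # maxsplit=4 splits off date, time, AM/PM and size, and keeps the
    # remainder of the line (the file name, inner spaces intact) whole.
    parts = line.split(maxsplit=4)
    if len(parts) < 5 or parts[3] == "<DIR>":
        return None  # short lines and subdirectory entries
    date, time, meridiem, size, name = parts
    if meridiem not in ("AM", "PM"):
        return None  # "Directory of ...", "bytes free" and similar lines
    try:
        size_bytes = int(size.replace(",", ""))  # "58,826,634" -> 58826634
    except ValueError:
        return None
    return DirEntry(date, f"{time} {meridiem}", size_bytes, name.rstrip())
```

For example, parse_dir_line("01/04/2004 12:48 AM 58,826,634 17664 Complete Warrior.pdf") gives an entry whose name is "17664 Complete Warrior.pdf" and whose size is 58826634, so the spaces in the name survive instead of being dropped.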
    17664CompleteWarrior.pdf 58,826,634 on disc 040613_1655
    3.5E-Core-DungeonMastersGuide.pdf 49,258,067 on disc 040613_1655
    3E-Core-BookofExaltedDeeds.pdf 57,531,969 on disc 040613_1655
    88574CityoftheSpiderQueen.pdf 56,267,706 on disc 040613_1655
    88581Underdark.pdf 50,569,463 on disc 040613_1655
The awk script that produces the output shown is below:
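    # test.awk: turn a "dir /s" listing into "name size on disc <volname>" lines.
    # On a file line, $1 is the date, $2 the time, $3 AM/PM, $4 the size,
    # and $5 through $NF are the pieces of the file name.
    #
    # Only lines that start with a date (mm/dd/yyyy, as in the sample) are
    # files or folders; this skips the "Volume ..." header, "Directory of ..."
    # lines (even when the path contains spaces) and the summary lines.
    $1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]$/ && NF >= 5 && $4 !~ /DIR/ {
        name = ""
        # Glue the pieces of the name back together, dropping the spaces.
        for (i = 5; i <= NF; i++)
            name = name $i
        print name " " $4 " on disc " volname
    }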
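For checking the logic outside awk, here is a rough Python equivalent of the filter. It is a sketch, not the script itself: the function name format_listing is hypothetical, and it assumes the same US-style layout as the sample, recognizing a file line by the mm/dd/yyyy date in its first field, skipping <DIR> entries, and gluing the name pieces together without spaces just as the awk loop does.

```python
import re
from typing import List

# Assumption taken from the sample: every file line starts with mm/dd/yyyy.
DATE = re.compile(r"^\d{2}/\d{2}/\d{4}$")


def format_listing(listing: str, volname: str) -> List[str]:
    """Mimic the awk filter: one 'name size on disc volname' line per file."""
    out = []
    for line in listing.splitlines():
        fields = line.split()  # same whitespace splitting as awk's default
        if len(fields) < 5 or not DATE.match(fields[0]):
            continue  # header, "Directory of", and summary lines
        if "DIR" in fields[3]:
            continue  # subdirectory entries
        # Like the awk loop, join the name pieces with no separator.
        name = "".join(fields[4:])
        out.append(f"{name} {fields[3]} on disc {volname}")
    return out
```

Fed the sample listing with volname set to 040613_1655, this should yield the same five lines as the sample output shown earlier.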
The script is called with the following line on the command prompt:
    gawk -v volname=040613_1655 -f test.awk c:\040613_1655.txt
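The volume label is typed by hand in that command, yet the listing already records it in its first line. A hypothetical helper could read it from there instead; below is a Python sketch that assumes the English header text seen in the sample (“Volume in drive D is …”). A disc without a label prints a “has no label” line instead, and for that case the sketch returns None.

```python
import re
from typing import Optional

# Assumed header format, as in the sample: " Volume in drive D is 040613_1655".
LABEL = re.compile(r"^\s*Volume in drive \w is (.+?)\s*$")


def volume_label(listing: str) -> Optional[str]:
    """Return the volume label from a dir listing's header, if present."""
    for line in listing.splitlines():
        m = LABEL.match(line)
        if m:
            return m.group(1)  # everything after "is", trimmed
    return None  # e.g. "Volume in drive D has no label."
```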
The next step is to take the awk script’s output and do something useful with it; that will be the focus of the next installment.
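One possible shape for the duplicate report, assuming the awk output from several discs has been collected into one list of lines, is to group entries by name and size and keep those that appear on more than one disc. This Python sketch uses a hypothetical function, find_duplicates, and treats “same name and same size” as a cheap stand-in for “same file”: it will miss renamed copies and could pair up different files that happen to share both.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_duplicates(lines: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group 'name size on disc vol' lines; keep files seen on 2+ discs.

    Assumption: two files are "the same" when name and size both match.
    """
    discs = defaultdict(list)
    for line in lines:
        left, sep, vol = line.rpartition(" on disc ")
        if not sep:
            continue  # not a line produced by the awk script
        name, _, size = left.rpartition(" ")  # size is the last token
        discs[(name, size)].append(vol.strip())
    # Copies on the same disc (in different folders) do not count here.
    return {key: vols for key, vols in discs.items() if len(set(vols)) > 1}
```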
J.