SNORTYVILLE ... How-tos, Reviews, & Pop Culture Fun!

Copyright © Snortyville. All rights reserved.
Unauthorized copy of content without written permission is strictly prohibited

Formatting the directory output listing from Windows

In trying to catalog the many thousands of files on my various hard disks, and especially on CDs and DVDs, I (J.) have run into a problem: there is no easy way to format the output of a Windows directory listing so that it can be imported into Excel or another program.

There are programs that do this, but they usually cost money and do not seem to do the main thing I want, which is to find duplicate files across the disks. So this is a project to see whether I can write some software that will eventually print a report of which files are duplicated on which disks, and maybe even a script that will help me build DVDs of selected content in the future.

The first step is to run the "dir /s" command in a command prompt window and save its output to a text file. This is a sample of the output:

 Volume in drive D is 040613_1655
 Volume Serial Number is F4CB-D8C4

 Directory of D:\

01/04/2004 12:48 AM       58,826,634 17664 Complete Warrior.pdf
12/11/2003 12:47 AM       49,258,067 3.5E - Core - Dungeon Masters Guide.pdf
12/07/2003 04:16 PM       57,531,969 3E - Core - Book of Exalted Deeds.pdf
01/04/2004 03:14 PM       56,267,706 88574 City of the Spider Queen.pdf
01/03/2004 01:55 PM       50,569,463 88581 Underdark.pdf

So the file name and the size are both available, which is all that is needed at a basic level. Unfortunately, a file name may or may not contain spaces, so the name can spread across a varying number of space-separated fields, and there's no way to tell Excel to join those pieces back into one name. But I can, and the way I'm doing it is with an awk script. Awk is a commonly used language in the Unix world for pattern matching and text file manipulation. Awk, along with sed, should be in every administrator's toolbox. For my purposes, the script should print the name of the file, then the size of the file, and finally which disc has the file. Below is a sample of the output:

17664CompleteWarrior.pdf 58,826,634 on disc 040613_1655
3.5E-Core-DungeonMastersGuide.pdf 49,258,067 on disc 040613_1655
3E-Core-BookofExaltedDeeds.pdf 57,531,969 on disc 040613_1655
88574CityoftheSpiderQueen.pdf 56,267,706 on disc 040613_1655
88581Underdark.pdf 50,569,463 on disc 040613_1655

The awk script that produces the output shown is below:

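# Fields on a file line: $1 date, $2 time, $3 AM/PM, $4 size, $5..NF name.
# Requiring a date in $1 skips the "Volume ..." header lines.
$1 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ && NF >= 5 && $4 !~ /DIR/ && $5 != "free" {
  # Print the pieces of the name back to back, dropping the spaces.
  for (i = 5; i <= NF; i++)
    printf("%s", $i)
  # volname is passed in with -v on the command line.
  print " " $4 " on disc " volname
}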
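
For comparison, here is a minimal sketch of the same parsing in Python rather than awk, aimed at the Excel export mentioned at the start. It is not part of the original script. It assumes the date, time, and AM/PM layout shown in the sample listing, and the names FILE_LINE, parse_dir_line, and listing_to_csv are made up for this illustration. Unlike the awk script, it keeps the spaces in the file name and strips the commas from the size so a spreadsheet can treat it as a number.

```python
import csv
import io
import re

# Matches a file line of "dir /s" output in the layout of the sample listing:
# date, time, AM/PM, size with thousands separators, then the file name.
# (Assumption: other regional settings would need a different pattern.)
FILE_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4})\s+(\d{2}:\d{2})\s+(AM|PM)\s+([\d,]+)\s+(.+)$"
)


def parse_dir_line(line):
    """Return (name, size_in_bytes) for a file line, or None for any other line."""
    match = FILE_LINE.match(line.strip())
    if match is None:
        # Header lines, <DIR> entries, and summary lines end up here.
        return None
    size = int(match.group(4).replace(",", ""))
    return match.group(5), size


def listing_to_csv(lines, volname):
    """Build CSV text with one row per file: name, size, disc."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "size", "disc"])
    for line in lines:
        parsed = parse_dir_line(line)
        if parsed is not None:
            writer.writerow([parsed[0], parsed[1], volname])
    return out.getvalue()
```

Calling listing_to_csv with the lines of the saved listing and the disc name returns text that can be written to a .csv file and opened in Excel, with the name, size, and disc in separate columns.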


The script is called with the following line on the command prompt:

gawk -v volname=040613_1655 -f test.awk c:\040613_1655.txt
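
The disc name passed with -v also appears in the first line of the listing itself ("Volume in drive D is 040613_1655"), so it could be read from the file instead of typed by hand. The following is a small Python sketch of that idea, not something the original script does. VOLUME_LINE and find_volume_label are hypothetical names, and the pattern assumes the header wording shown in the sample listing.

```python
import re

# Header line written at the top of the listing,
# e.g. " Volume in drive D is 040613_1655".
VOLUME_LINE = re.compile(r"^\s*Volume in drive (\w) is (.+?)\s*$")


def find_volume_label(lines):
    """Return the volume label from the listing header, or None if not found."""
    for line in lines:
        match = VOLUME_LINE.match(line)
        if match:
            return match.group(2)
    return None
```

If the function returns None, the listing has no header line in the expected form, and the disc name still has to be supplied by hand.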

The next step is to take the output of the awk script and do something useful with it, which will be the focus of the next installment.
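
As one possible direction for that next step, here is a hedged Python sketch that reads report lines in the format shown earlier ("name size on disc volname") and lists the files that appear on more than one disc. It is only an illustration, not the approach the next installment takes. The function name find_duplicates is hypothetical, the sketch assumes disc names contain no spaces, and treating "same name and same size" as a duplicate is a heuristic that does not compare file contents.

```python
from collections import defaultdict


def find_duplicates(report_lines):
    """Group report lines by (name, size); return those seen on 2+ discs."""
    discs_by_file = defaultdict(set)
    for line in report_lines:
        parts = line.split()
        # Expected shape: <name> <size> on disc <volname>
        if len(parts) != 5 or parts[2:4] != ["on", "disc"]:
            continue  # skip anything that is not a report line
        name, size, volname = parts[0], parts[1], parts[4]
        discs_by_file[(name, size)].add(volname)
    # Keep only the files recorded on more than one disc.
    return {
        key: sorted(discs)
        for key, discs in discs_by_file.items()
        if len(discs) > 1
    }
```

The input would be the combined output of the awk script run once per disc. Each key of the result is a (name, size) pair and each value is the sorted list of discs holding that file.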