1 File into many

Crispy · Post by **Crispy** » Thu Jul 27, 2006 3:06 pm

I have a file that has two columns and holds HR data, the first contains a reference for the department, the second holds a reference for the employee ie,

Dept,Employee
P0073,00064137
P0073,00101104
P0073,00003956
P0074,00100343
P0074,00046509

The file holds approx 9500 records for approx 25 departments.

What I'd like to do is have a single column list file for each department. The file would have one column holding the employee reference and be called P0073.txt, P0074.txt etc.

All Employee numbers are unique and 8 characters, all department codes are P and 4 numerals.

Any ideas as how to achieve this easily?
Many thanks.

marbx · Post by **marbx** » Thu Jul 27, 2006 7:26 pm

I don't think there's any kind of search & replace or macro that'll work over multiple files. With Textpad the quickest and easiest I can think of is to copy the original file 25 times. It's probably quicker to use the command line to do this; run:

Code: Select all

xcopy original.txt p0073.txt

Then up arrow, backspace and replace the 3 with a 4, and so on. Far easier than dragging the file with the mouse 25 to copy, targeting each file with the mouse, clicking to rename, typing... yeah. Anyway, you'd just run two search & replaces in each file: in P0073.txt you'd replace ^P0073, with "" (nothing) to give you your single column data, and replace ^.*,.*\n with "" (nothing) to get rid of all the lines you don't need - they're the only ones that still have commas in them. Then replace ^P0074 in the next file, and so on.

If you have Perl, or some other kind of scripting language you're familiar with, you can write something quick in it to do all this for you, for example:

Code: Select all

perl -ne "if(/^(P\d+),(\d+)/){open O,qq(>> $1.txt) or die;print O qq($2\n);close O}" original.txt

But of course you'd have to have Perl installed in order to use that.

Crispy · Post by **Crispy** » Thu Jul 27, 2006 10:37 pm

Thanks for the response Marbx, unfortunately I don't use Perl, but cheers for the RegEx.

Post by **MudGuard** » Fri Jul 28, 2006 6:19 am

First, sort the file, so that all lines for each department are consecutive (attention, this might change the order of the lines for departments).

Then, select the lines of the first department.
Then, go to File - Save As, Choose the appropriate file name, make sure "Selection only" is checked, then save.
Then, delete selection.

Start from the beginning ;-)

Or, use a script in whatever language you prefer to do the job.

bveldkamp · Post by **bveldkamp** » Fri Jul 28, 2006 6:51 am

If you have access to a Linux system, enter this line at the command prompt (replace dept.txt with your own filename) :

Code: Select all

for f in `cut -f1 -d, dept.txt | sort -u` ; do grep $f dept.txt | cut -f2 -d, > $f.txt ; done