VMS variable length record file format?
Posted: Thu Sep 04, 2003 4:06 am
Any chance for more file formats? TextPad seems to do a great job on various stream text files, be they CR, LF or CRLF terminated records.
However, I recently had to look at a file that came in from a VMS system, where the default file format for text files is something called 'variable length record format'. TextPad gave me only the hex format to play with.
It's a pretty simple format. Each record starts with a 16-bit little-endian count word that indicates the length of that record. The data bytes for the record follow that. After that comes the 16-bit aligned length word for the next record. Repeat until length = -1 or offset = EOF. Minor side rules:
- records always start with the length word on an even file offset
- lengths are less than 32768.
- negative lengths indicate a deleted record or the end of data in a 512-byte block or in the file (VMS no-span attribute).
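To make the layout concrete, here is a minimal Perl sketch that packs a few strings into this format, which can be handy for producing a small test file. The sub name encode_records is made up for illustration, the pad byte is assumed to be zero, and only the simple case is covered (no deleted records, no 512-byte block handling).

```perl
use strict;
use warnings;

# Hypothetical helper: pack strings as variable length records.
sub encode_records {
    my @records = @_;
    my $out = '';
    for my $rec (@records) {
        my $len = length($rec);
        die "record too long\n" if $len > 32767;
        # 16-bit little-endian length word, low byte first
        $out .= chr($len % 256) . chr(int($len / 256));
        $out .= $rec;
        # assumption: a zero pad byte keeps the next length word on an even offset
        $out .= "\0" if $len & 1;
    }
    return $out;
}

# "AB" needs no padding, "ABC" gets one pad byte
my $bytes = encode_records("AB", "ABC");
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes)), "\n";
# prints: 02 00 41 42 03 00 41 42 43 00
```

The hex dump shows each length word followed by its data, with the extra 00 after "ABC" restoring even alignment.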
Optional format recognition/verification is easy: just see if there is a 'reasonable' next length word at current offset + current length (counting the 2-byte length word and any pad byte) for 3 iterations. Reasonable being: less than 32K, but really more likely less than 255, which leaves a 0 byte in the odd position (the high byte of the length word).
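The recognition check could be sketched in Perl roughly as follows. The sub name looks_like_vms_varlen and the default of 3 checks are my own choices for illustration. This version only tests the 'less than 32K' bound, treats any negative length as not reasonable, and accepts a file that ends cleanly on a record boundary before 3 records have been seen.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the leading records of $data chain
# together plausibly as variable length records, 0 otherwise.
sub looks_like_vms_varlen {
    my ($data, $checks) = @_;
    $checks = 3 unless defined $checks;
    my $offset = 0;
    for my $seen (0 .. $checks - 1) {
        # clean end of file on a record boundary, after at least one record
        return 1 if $seen > 0 && $offset == length($data);
        # need two bytes for the length word
        return 0 if $offset + 2 > length($data);
        my ($low, $high) = unpack("CC", substr($data, $offset, 2));
        my $length = $low + $high * 256;
        # a set high bit means a negative length: not 'reasonable' here
        return 0 if $length > 32767;
        # skip the length word, the data, and the pad byte if any
        $offset += 2 + $length + ($length & 1);
        return 0 if $offset > length($data);
    }
    return 1;
}

my $sample = "\x02\x00AB\x03\x00ABC\x00\x01\x00Z\x00";
print looks_like_vms_varlen($sample) ? "plausible\n" : "not plausible\n";
```

An ordinary stream text file usually fails on the first step, because its first two characters read as a length word far larger than the file.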
Any chance to work on these files in other than binary mode?
Feel free to contact me for details if interested: Hein at hp dot com.
Below is a Perl script that 'converts' such a VMS file to an NT or Unix file.
Regards,
Hein.
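use strict;
use warnings;

binmode STDIN;    # the input is binary: no line ending translation
my ($length_word, $line, $pad);
while ((read(STDIN, $length_word, 2) || 0) == 2) {
    # avoid using "S" or "V". just do the math.
    my ($low, $high) = unpack("CC", $length_word);
    my $length = $low + $high * 256;
    # high bit set = negative length: deleted record or end of data
    last if $length > 32767;
    my $read = read(STDIN, $line, $length);
    last unless defined $read && $read == $length;    # truncated file
    print "$line\n";
    # skip the pad byte that keeps the next length word on an even offset
    read(STDIN, $pad, 1) if $length & 1;
}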