Large text file truncating on save

crater
Posts: 15
Joined: Sun Mar 07, 2004 9:42 am
Location: Rotherham, UK

Post by crater »

I have a large text file, 40.8 MB (42,811,392 bytes), which is a 7-bit ASCII file created by a software utility I'm developing.

To test part of its capability, I am embedding whitespace characters (newlines and tabs) into it within TextPad (4.7.2). However, when I save the file, it is truncated to about 9 MB. This happens consistently.

When I use MS Word to perform the same task, it saves the file OK.

My PC is running Windows XP Professional, and has 512 MB of memory, a 2.53 GHz processor, and about 23 GB of free disk space.
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

TextPad has an option to remove trailing whitespace (whitespace at the end of a line) when saving.
Is this option activated?

The option is in Configure - Preferences - Document Classes - Your document class
crater
Posts: 15
Joined: Sun Mar 07, 2004 9:42 am
Location: Rotherham, UK

Post by crater »

How is this helpful? I'm trying to ADD whitespace, not REMOVE it.
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

If that option is set, the space you add will be removed. MudGuard is advising you to make sure there is NO checkmark next to "Strip trailing spaces from lines when saving".
Hope this was helpful... good luck,
Bob
crater
Posts: 15
Joined: Sun Mar 07, 2004 9:42 am
Location: Rotherham, UK

Post by crater »

I appreciate you are trying to help, but you all seem to be missing the point.

WHY is the file being truncated?????

Whether I add spaces or remove them, or TextPad removes them, how does the addition of 3 or 4 whitespace characters cause the file to be truncated from 42 MB to 9 MB????

It has to do with the size of the file, as the smaller files I've tried, up to 31 MB, don't seem to be affected.
s_reynisson
Posts: 939
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

I have a 41 MB text file and I'm unable to reproduce your results.
Note that I'm not using 7-bit ASCII. Perhaps you can strip any confidential
data from it, make sure you still get the same results, zip it and send it to some of us?
(PMs are OK on this forum, and my 41 MB text file compresses about 95% in RAR format :shock:)
Using WinXP SP1 and TP 4.7.2.
Then I open up and see
the person fumbling here is me
a different way to be
crater
Posts: 15
Joined: Sun Mar 07, 2004 9:42 am
Location: Rotherham, UK

Post by crater »

The 7-bit ASCII file is encoded from a binary file, a TIFF file in fact.

Consequently it only compresses down to 25 MB with RAR compression.
s_reynisson
Posts: 939
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

TIFF files I can get, but what utility do you use to convert them to 7-bit ASCII?
I found a utility called 7bit.exe from 1999 and converted a small TIFF file, but TP
reports null chars and I cannot modify the file without deleting them, so my test would
hardly be the same as yours. Guess I'm stumped.
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

[old man voice]Bah, kids these days... Using this here internet without knowing their history. [/old man voice]

I was able to uuencode a TIFF file a-ok and then edit it without having TP truncate the results (granted, the file is garbage now, but it didn't truncate). I suspect that s_reynisson's observation that the file has a null character in it will hold true.
I choose to fight with a sack of angry cats.
crater
Posts: 15
Joined: Sun Mar 07, 2004 9:42 am
Location: Rotherham, UK

Post by crater »

There are no null characters, unless TextPad is inserting it/them. If there were, my decoder would abort at that point. As I said in a previous post, Word does the job that TextPad doesn't seem to be able to handle.
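
The null-character question can be checked independently of any editor by scanning the raw bytes. The following is a minimal sketch in Python (3.7 or later, for bytes.isascii), not something posted in the thread; the function name scan_bytes and any file name passed to it are assumptions for illustration. It reports the offset of the first null byte and of the first byte outside the 7-bit range, or None where there is no such byte.

```python
def scan_bytes(path, chunk_size=1 << 20):
    """Return (first_null_offset, first_non_7bit_offset); None if not found."""
    first_null = None
    first_high = None
    offset = 0  # file offset of the start of the current chunk
    with open(path, "rb") as f:
        # Stop early once both kinds of byte have been located.
        while first_null is None or first_high is None:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if first_null is None:
                i = chunk.find(b"\x00")
                if i != -1:
                    first_null = offset + i
            # isascii() is a fast check; only walk the chunk when it fails.
            if first_high is None and not chunk.isascii():
                first_high = offset + next(
                    i for i, b in enumerate(chunk) if b >= 0x80
                )
            offset += len(chunk)
    return first_null, first_high
```

A result of (None, None) on the original file would support the claim that it is clean 7-bit ASCII with no nulls; running the same check on the file saved by the editor would show whether anything was inserted.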
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

Well then, post it up here or PM it to me and I'll host it and let's see if someone can verify what's going on.
crater
Posts: 15
Joined: Sun Mar 07, 2004 9:42 am
Location: Rotherham, UK

Post by crater »

I've put the ASCII file onto my web site, if you want to download it to try it out.

The URL is:
http://homepages.nildram.co.uk/~iccarte ... footac.zip

When you've downloaded it, unzip it, open it with TextPad, and save it again under a different name. The saved file will be about 9 MB.
s_reynisson
Posts: 939
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

On my system TP only reads 9,255,663 bytes, or 8.82 MB.
The next four chars in the original file are hex 21 34 2A 28.
1. TP reports 9,255,663 bytes read.
2. After saving from TP, Explorer reports 9,255,665 bytes.
3. On reopening the saved file, TP again reports 9,255,663 bytes.
Edit 1: When I open the file with the file format set to binary (File -> Open dialog), it reports
42,810,095 bytes read.
Edit 2: When I open the file using EmEdit 4.03, it reports 42,810,095 bytes read
in one line and dies (CPU goes to 100% etc., had to kill it in Task Manager).
Edit 3: jEdit 4.2b9 reports a Java error, NetBeans 3.6b dies, and Crimson Editor 3.6 reads
the whole file, takes a while, and reports 10455 lines of 4095 bytes each, then
dies when I try to save the file (Task Manager reports 1.7 gigabytes of memory in use).
Anyone with UE installed for the acid test? ;)
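
For anyone who wants to look at the bytes around the point where an editor stops reading, without trusting any editor, here is a minimal sketch in Python. The function name peek_bytes and the file name in the usage note are assumptions for illustration, not part of the original post.

```python
def peek_bytes(path, offset, count=4):
    """Return `count` bytes starting at `offset`, formatted as spaced hex."""
    with open(path, "rb") as f:
        f.seek(offset)          # offsets are 0-based byte positions
        data = f.read(count)    # may be shorter than `count` near end of file
    return " ".join(f"{b:02X}" for b in data)
```

Calling peek_bytes("bigfile.txt", 9255663) on a local copy (hypothetical name) shows the four bytes that follow the last byte read, which can then be compared with the hex values quoted above.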
Last edited by s_reynisson on Mon Mar 08, 2004 7:42 pm, edited 3 times in total.
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

Well, my initial observation is that when the file is opened in TextPad, it reads everything into a single line. It appears the upper bound on the length of a line it will read is 9,255,644 characters. Word breaks the file at the hyphens, so maybe that's why it can handle it? I'll poke around more at home, where my machine has some power, unlike this dog.
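
The single-line observation can be confirmed outside the editor by counting lines and measuring the longest one. This is a minimal sketch in Python under the assumption that lines end in "\n"; the function name line_stats is made up for illustration. It streams the file in chunks, so one 40 MB line is never held in memory as a separate string.

```python
def line_stats(path, chunk_size=1 << 20):
    """Return (line_count, longest_line_in_bytes) for a file.

    Lengths exclude the "\n" terminator; a "\r" before it, if any, is counted.
    """
    lines = 0
    longest = 0
    current = 0  # bytes seen so far in the line being read
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                nl = chunk.find(b"\n", start)
                if nl == -1:
                    # No more newlines in this chunk; carry the remainder over.
                    current += len(chunk) - start
                    break
                current += nl - start
                longest = max(longest, current)
                lines += 1
                current = 0
                start = nl + 1
    if current > 0:
        # Final line without a trailing newline.
        lines += 1
        longest = max(longest, current)
    return lines, longest
```

A result of one line whose length equals the file size would match what is described above, and would point at a per-line limit in the editor rather than a limit on file size.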

[edit]Bah! Beat to the punch![/edit]
crater
Posts: 15
Joined: Sun Mar 07, 2004 9:42 am
Location: Rotherham, UK

Post by crater »

Thanks. I appreciate the help.