Finding non-code page characters

melvers
Posts: 26
Joined: Tue Aug 05, 2003 1:52 pm

Finding non-code page characters

Post by melvers »

I have a couple of files for which I get the standard warning that characters not in code page 1252 ... will be converted to the system characters.

I am fine with that. However, I would like to find the characters that were replaced and put some other character(s) there that can be displayed.

Does anyone know how to find the characters that got replaced?
RLRANDALLX
Posts: 3
Joined: Sun Jun 24, 2007 6:03 am

Finding non-code page characters

Post by RLRANDALLX »

I have the very same problem. I would like to change '?' to a less common character, such as '@', '~', '_' or '|'. '?' is way too common!
-rlrandallx
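
One way to get a less common placeholder is to do the substitution outside TextPad, before the file is opened. The following is only a minimal Python sketch, not a TextPad feature: the function name `replace_unmappable` and the default placeholder '~' are made up for illustration, and it assumes Python's `cp1252` codec is a close enough stand-in for what TextPad treats as code page 1252.

```python
def replace_unmappable(text, placeholder="~", codec="cp1252"):
    """Return text with every character the codec cannot encode
    replaced by placeholder (hypothetical helper, not part of TextPad)."""
    out = []
    for ch in text:
        try:
            ch.encode(codec)          # raises if ch has no slot in the code page
            out.append(ch)
        except UnicodeEncodeError:
            out.append(placeholder)
    return "".join(out)


# U+2192 (rightwards arrow) is not in code page 1252, so it is replaced.
print(replace_unmappable("a \u2192 b"))   # prints: a ~ b
```

Characters that do exist in code page 1252, such as 'é' or '€', are left alone, so only the ones that would otherwise turn into '?' get the placeholder.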
RLRANDALLX
Posts: 3
Joined: Sun Jun 24, 2007 6:03 am

Finding 'Extended ASCII' chars. before they change to '?'

Post by RLRANDALLX »

Here's a small Perl program that reads a file and reports where the "bad" characters are, i.e. the ones TextPad says it converted to '?'.
For each one it gives the line number, the offset within the line, and the code value of the character, and tries to print the character plus the one that follows it. Have fun! - rlrandallx

Code:

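use strict;
use warnings;

# Assumes file.txt is UTF-8; change the layer if the file uses another encoding.
open(my $fh, "<:encoding(UTF-8)", "file.txt") or die "Cannot open file.txt: $!";
binmode(STDOUT, ":encoding(UTF-8)");
while (my $line = <$fh>)
  {
  chomp $line;                        # keep the newline out of the printed context
  my $l = length($line);
  for (my $j = 0; $j < $l; $j++)
     {
      my $c = substr($line, $j, 1);
      my $v = ord($c);
      if ( $v > 127 )     # not plain ASCII: extended ASCII set or Unicode
         {
         my $ec = substr($line, $j, 2);   # the character plus the one after it
         print "l:$. c:$j v:$v $ec\n";    # $. is the 1-based input line number
         }
     }
  }
close($fh);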
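
The check above flags every character above 127, but many of those (for example 'é' or '€') do exist in code page 1252 and are not replaced. A narrower test is to try encoding each character to the code page and report only the failures. Here is a rough Python sketch of that idea; the function name `find_unmappable` is invented for this example, and it assumes Python's `cp1252` codec matches TextPad's notion of code page 1252 (Python leaves a handful of byte values in that code page undefined, so the two may differ at the edges).

```python
def find_unmappable(lines, codec="cp1252"):
    """Yield (line_number, offset, code_point, char) for each character
    that the codec cannot encode. Line numbers start at 1, offsets at 0."""
    for line_no, line in enumerate(lines, start=1):
        for offset, ch in enumerate(line):
            try:
                ch.encode(codec)
            except UnicodeEncodeError:
                yield line_no, offset, ord(ch), ch


# In-memory sample; an open text file can be passed the same way.
sample = ["plain ascii", "caf\u00e9 costs 3\u20ac", "arrow \u2192 here"]
for line_no, offset, code, ch in find_unmappable(sample):
    print(f"l:{line_no} c:{offset} v:{code} U+{code:04X}")
# prints: l:3 c:6 v:8594 U+2192
```

To scan a real file, something like `find_unmappable(open("file.txt", encoding="utf-8"))` should work, provided the file really is UTF-8.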