list of unique words in a text file

mehl
Posts: 8
Joined: Wed Oct 01, 2003 7:39 pm

Post by mehl »

Hello --

Is there a macro or technique to get a list of each unique word in a text file, with the count of occurrences?

Thanks for any help.

Larry Mehl
slmehl@earthlink.net
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

Install Python and you're good to go.

Copy this code and paste it into TextPad.
Save it to something like C:\uniqueWordAndCount.py
Update the path for fileName
C:\>python uniqueWordAndCount.py

Code:

#create an empty dictionary
d = {}
fileName = r"c:\bfellows\someFile.txt"
#iterate through each line
for currentLine in open(fileName, 'r'):
   #break each line into a list of tokens on any run of whitespace
   #(split() with no argument also drops the newline and empty tokens)
   l = currentLine.split()
   for token in l:
      #see if we've already encountered this token
      if token in d:
         #update the value
         d[token] = d[token] + 1
      else:
         #create new entry
         d[token] = 1


#order by token name ascending
l = list(d.keys())
l.sort()

for key in l:
   #parentheses make this line work as a statement or a function call
   print(key + ' occurs ' + str(d[key]) + ' times')
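
For anyone reading this on a newer Python, roughly the same idea can be sketched with collections.Counter from the standard library. This is only a sketch under assumptions: it assumes Python 3, the names count_words and report and the sample lines are made up for illustration, and it still splits on whitespace only, so "cat" and "cat." count as different words.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated tokens across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # split() with no argument splits on any run of whitespace
        # and never yields empty tokens
        counts.update(line.split())
    return counts


def report(counts):
    """Return 'word occurs N times' strings, ordered by word ascending."""
    return ["%s occurs %d times" % (word, counts[word]) for word in sorted(counts)]


# made-up sample input; for a real file, pass an open file object instead
sample = ["the cat sat", "on the mat", ""]
for entry in report(count_words(sample)):
    print(entry)
```

To run it on a real file, something like count_words(open(fileName)) should work, since a file object yields its lines one at a time.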
I choose to fight with a sack of angry cats.
mehl
Posts: 8
Joined: Wed Oct 01, 2003 7:39 pm

Post by mehl »

Thank you.

I am new at using TextPad for anything other than simple text editing.

What do I do to

"Update the path for fileName
C:\>python uniqueWordAndCount.py"

Larry
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

Sorry, I assume most people use TP for editing programs. No worries!

You need to change this line to match where your file is stored:
fileName = r"c:\bfellows\someFile.txt"

For example, say the file is in your My Documents folder and is named sample.txt (assuming you are on Windows 2000 and your user ID is slmehl):
fileName = r"C:\Documents and Settings\slmehl\My Documents\sample.txt"
If it's on the desktop, it'd be
fileName = r"C:\Documents and Settings\slmehl\Desktop\sample.txt"

If you are unsure of the path to the file, open it up in TP and go to File, Rename... and copy the text in the dialog box. Paste that into fileName, save the file and you should be good to go.

Please don't be offended if you feel I'm talking down to you; I just wanted to make sure I fully explained everything. Let me know if I failed.

And in case anyone is trying to decipher Python (which is not hard), the r before the double quote marks a raw string, in which backslashes are taken literally instead of starting escape sequences. Otherwise each backslash would have to be doubled, which looks really confusing: fileName = "C:\\Documents and Settings\\slmehl\\Desktop\\sample.txt". If you use Unicode characters, preface your string with a u and plug away.
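
A quick way to convince yourself of the raw-string point is to compare the two spellings directly. This is a small sketch that reuses the example Desktop path from above; the c:\temp string is made up to show what goes wrong without the r.

```python
# the same path written as a raw string and with doubled backslashes
raw_path = r"C:\Documents and Settings\slmehl\Desktop\sample.txt"
escaped_path = "C:\\Documents and Settings\\slmehl\\Desktop\\sample.txt"

# both literals produce exactly the same string
print(raw_path == escaped_path)

# without the r prefix, \t is read as a single tab character,
# so the second string is one character shorter than the first
print(len(r"c:\temp"), len("c:\temp"))
```

One catch worth knowing: a raw string cannot end in a single backslash, so a folder path with a trailing backslash still needs the doubled form for that last character.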
I choose to fight with a sack of angry cats.
s_reynisson
Posts: 939
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

talleyrand, is there a way to set my locale in Python? I've been
googling around to get the l.sort() to work for my character set.
I found this code and cannot get it to work:

Code:

import locale
locale.setlocale(locale.LC_ALL, '')
Is there a way to pass the two values printed by this code to setlocale?

Code:

for x in locale.getdefaultlocale():print x
Thanks for your most excellent code!
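
A note on why setlocale alone may appear to do nothing: a plain l.sort() compares strings by character code whatever the locale is, so the sort itself has to be told to use the locale's collation. The following is a sketch under assumptions, not a confirmed fix for this case: it assumes Python 3, the function name sort_for_locale is made up, and whether a given locale name is accepted depends on the operating system.

```python
import locale


def sort_for_locale(words, locale_name=""):
    """Return words sorted by the collation rules of locale_name.

    An empty locale_name asks for the user's default locale.
    """
    # remember the current collation setting so it can be restored
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares in locale order
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)
```

With the word-count script, the sorted key list would come from something like sort_for_locale(d.keys()). On Python 2 the comparable spelling was l.sort(locale.strcoll) after calling setlocale.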
Then I open up and see
the person fumbling here is me
a different way to be
talleyrand
Posts: 624
Joined: Mon Jul 21, 2003 6:56 pm
Location: Kansas City, MO, USA

Post by talleyrand »

I haven't forgotten about your question, s_reynisson. To be quite honest, I've never touched locale. The lines you asked about worked fine for me but I believe you were wanting to change your locale to allow the sort to work for your language encoding, no?

Code:

F:\exposed>python
ActivePython 2.2.2 Build 224 (ActiveState Corp.) based on
Python 2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> for x in locale.getdefaultlocale():print x
...
en_US
cp1252
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> for x in locale.getdefaultlocale():print x
...
en_US
cp1252
>>>
I did a fair bit of googling and maybe what I found was outdated or I didn't fully understand it (which is quite possible), but it sounded like there might be an issue with switching locales without rebooting. I seem unable to switch to any locale but my default one. I tried en_GB, de_DE, GB, DE, de and some other codes I could find, but couldn't get anything but en or US to take (they're the same thing). Hmmm, my guess is the fix is to just be an American, which is where the language was written and paid for. ;)
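
One way to narrow down which names a given machine accepts is to try a list of candidates and see which one sticks, since setlocale raises locale.Error for a name it does not know. The sketch below assumes Python 3; the function name first_accepted_locale and the candidate spellings are guesses for illustration. Windows has historically used its own spellings (along the lines of German_Germany.1252) rather than POSIX-style de_DE, which may be why the codes above were rejected, but that is an assumption and not something confirmed here.

```python
import locale


def first_accepted_locale(candidates, category=locale.LC_COLLATE):
    """Return the first name in candidates that setlocale accepts, else None."""
    # remember the current setting so probing has no lasting effect
    saved = locale.setlocale(category)
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
                return name
            except locale.Error:
                # this platform does not know the name; try the next one
                continue
        return None
    finally:
        locale.setlocale(category, saved)


# candidate spellings are guesses; which one works depends on the OS
print(first_accepted_locale(["de_DE.UTF-8", "de_DE", "German_Germany.1252", "C"]))
```

Whatever name comes back can then be passed to setlocale for real before sorting.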

Seriously though, if you can give me some more info on what I need to be looking for, I might be able to search better or I'll just ask the geeks at python.org. They know everything!
s_reynisson
Posts: 939
Joined: Tue May 06, 2003 1:59 pm

Post by s_reynisson »

Given how much there is about this on Google, I'm expecting a
FAQ on it in the near future :wink:
The closest I came to a solution on this was "wait for the next version" :roll:
Then I open up and see
the person fumbling here is me
a different way to be