Hello --
Is there a macro or technique to get a list of each unique word in a text file, with the count of occurrences?
Thanks for any help.
Larry Mehl
slmehl@earthlink.net
list of unique words in a text file
- talleyrand
- Posts: 624
- Joined: Mon Jul 21, 2003 6:56 pm
- Location: Kansas City, MO, USA
Install Python and you're good to go.
Copy this code and paste it into TextPad.
Save it as something like C:\uniqueWordAndCount.py
Update the path in fileName so it points at your text file.
Then run it from a command prompt:
C:\>python uniqueWordAndCount.py
Code: Select all
#create an empty dictionary
d = {}
fileName = r"c:\bfellows\someFile.txt"
#iterate through each line
for currentLine in open(fileName, 'r').readlines():
    #break each line into a list of tokens based on spaces
    l = currentLine.split(' ')
    for token in l:
        #trim trailing whitespace (not sure if needed but what the heck)
        token = token.rstrip()
        #see if we've already encountered this token
        if d.has_key(token):
            #update the value
            d[token] = d[token] + 1
        else:
            #create new entry
            d[token] = 1
#order by token name ascending
l = d.keys()
l.sort()
for key in l:
    print key + ' occurs ' + str(d[key]) + ' times '
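For anyone reading this with Python 3 installed, where has_key and the print statement no longer exist, the same idea can be sketched with collections.Counter. This is only a sketch and not the script posted above: the name count_words is made up for illustration, and it splits on any run of whitespace rather than on single spaces, so empty tokens are not counted.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated tokens in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # split() with no argument splits on any run of whitespace
        # and never yields empty strings
        counts.update(line.split())
    return counts


# small in-memory sample standing in for the lines of a text file
sample = ["the cat sat", "the dog sat down"]
counts = count_words(sample)

# order by token name ascending, as in the script above
for word in sorted(counts):
    print(word + " occurs " + str(counts[word]) + " times")
```

An open file can be passed in place of the sample list, since iterating over a file object yields its lines one at a time.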
I choose to fight with a sack of angry cats.
- talleyrand
- Posts: 624
- Joined: Mon Jul 21, 2003 6:56 pm
- Location: Kansas City, MO, USA
Sorry, I assume most people use TP for editing programs. No worries!
In this line, you need to change it to fit your file structure.
fileName = r"c:\bfellows\someFile.txt"
For example, say the file is in your My Documents folder and it is named sample.txt (and also assuming you are on Windows 2000 and your user id is slmehl)
fileName = r"C:\Documents and Settings\slmehl\My Documents\sample.txt"
If it's on the desktop, it'd be
fileName = r"C:\Documents and Settings\slmehl\Desktop\sample.txt"
If you are unsure of the path to the file, open it up in TP and go to File, Rename... and copy the text in the dialog box. Paste that into fileName, save the file and you should be good to go.
Please don't be offended if you feel I'm talking down to you; I just wanted to make sure I fully explained everything. Let me know if I failed.
And in case anyone is trying to decipher Python (which is not hard), the r before the double quote marks a raw string, in which backslashes are taken literally. Otherwise, it'd look really confusing, like fileName = "C:\\Documents and Settings\\slmehl\\Desktop\\sample.txt". If you use Unicode characters, prefix your string with a u and plug away.
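A quick way to check the raw-string point is to compare the two spellings directly. The snippet below is just an illustration written for Python 3 (the print calls would need adjusting for the 2.2 interpreter used in this thread), and the path is the same made-up example as above.

```python
# the same path written as a raw string and with doubled backslashes
raw_path = r"C:\Documents and Settings\slmehl\Desktop\sample.txt"
escaped_path = "C:\\Documents and Settings\\slmehl\\Desktop\\sample.txt"

# both spellings produce the identical string
print(raw_path == escaped_path)

# without the r prefix, a sequence such as \t is read as a tab character,
# so the second string is one character shorter than the first
print(len(r"c:\temp"), len("c:\temp"))
```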
I choose to fight with a sack of angry cats.
- s_reynisson
- Posts: 939
- Joined: Tue May 06, 2003 1:59 pm
talleyrand, is there a way to set my locale in Python? I've been googling around to get the l.sort() to work for my character set.
I found this code and cannot get it to work:
Code: Select all
import locale
locale.setlocale(locale.LC_ALL, '')
Is there a way to pass the two lines from this code to setlocale?
Code: Select all
for x in locale.getdefaultlocale():print x
Thanks for your most excellent code!
Then I open up and see
the person fumbling here is me
a different way to be
- talleyrand
- Posts: 624
- Joined: Mon Jul 21, 2003 6:56 pm
- Location: Kansas City, MO, USA
I haven't forgotten about your question, s_reynisson. To be quite honest, I've never touched locale. The lines you asked about worked fine for me but I believe you were wanting to change your locale to allow the sort to work for your language encoding, no?
I did a fair bit of googling and maybe what I found was outdated or I didn't fully understand it (which is quite possible), but it sounded like there might be an issue with switching locales without rebooting. I seem unable to switch to any locale but my default one. I tried en_GB, de_DE, GB, DE, de and some other codes I could find, but couldn't get anything but en or US to take (they're the same thing). Hmmm, my guess is to just be an American, which is where the language was written and paid for.
Seriously though, if you can give me some more info on what I need to be looking for, I might be able to search better or I'll just ask the geeks at python.org. They know everything!
Code: Select all
F:\exposed>python
ActivePython 2.2.2 Build 224 (ActiveState Corp.) based on
Python 2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> for x in locale.getdefaultlocale():print x
...
en_US
cp1252
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> for x in locale.getdefaultlocale():print x
...
en_US
cp1252
>>>
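On the sorting question itself, one approach that may help is to compare strings with locale.strcoll once a locale has been set, so the sort follows that locale's collation rules instead of raw character codes. The sketch below is written for Python 3 and has not been tried on the setup in this thread; locale_sorted is a made-up helper name, and which locale names are accepted depends on the operating system. My understanding is that older Windows runtimes expect names shaped like the 'English_United States.1252' string returned above rather than en_GB or de_DE, but treat that as an assumption.

```python
import functools
import locale


def locale_sorted(words, locale_name=""):
    """Sort words using the collation rules of the given locale.

    locale_name="" asks for the user's default locale. If the operating
    system rejects the name, the locale already in effect is kept.
    """
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # unsupported locale name: fall back to whatever is already set
        pass
    # strcoll compares two strings according to the LC_COLLATE setting
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))


print(locale_sorted(["pear", "apple", "banana"]))
```

In the Python 2 script earlier in the thread, the rough equivalent would be to replace l.sort() with l.sort(locale.strcoll) after the setlocale call.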
- s_reynisson
- Posts: 939
- Joined: Tue May 06, 2003 1:59 pm