Thursday, November 4, 2010

Popularity tag cloud program

Popularity tag clouds are very common on blogs, forums and websites these days; they show off the topics a site talks about most frequently. The code snippet I am going to present here is a simple Python program that counts how often each word appears in a text file.

# Save a text file named workfile.txt in the same directory as your .py file.
with open('workfile.txt', 'r') as infile:
    text = infile.read()

# split() with no argument splits on any whitespace, including newlines
words = text.split()
# words to be excluded
excl_words = ['and', 'is', 'in', 'the', 'an', 'which', 'both', 'of', 'off']

freq_data = {}
for word in words:
    if word in excl_words:
        continue
    # Add one to the running count instead of calling words.count() each time
    freq_data[word] = freq_data.get(word, 0) + 1
print(freq_data)

As I've already mentioned above, this program does nothing except count the number of times each word appears in the text file. This could be a base for building a full-fledged popularity cloud generator, which I'll be working on. I'll keep you updated on that.

PS. Some possible improvements to the program:
  1. Change all words to lower case before counting.
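The lower-casing idea could look something like the sketch below. Treat it as one possible approach, not a finished version: the function name count_words is made up for illustration, it uses collections.Counter from the standard library instead of a plain dictionary, and it also trims punctuation from the ends of each word, which is an extra assumption of mine and not something listed above.

```python
import string
from collections import Counter

# Same exclusion list as in the program above, stored as a set for fast lookups.
EXCL_WORDS = {'and', 'is', 'in', 'the', 'an', 'which', 'both', 'of', 'off'}

def count_words(text, excl_words=EXCL_WORDS):
    """Return a Counter of lower-cased words, skipping excluded ones.

    count_words is a hypothetical helper name used only for this sketch.
    """
    counts = Counter()
    for raw in text.split():
        # Lower-case first, then trim punctuation from both ends
        # (the trimming is an added assumption, not part of the list above).
        word = raw.lower().strip(string.punctuation)
        if word and word not in excl_words:
            counts[word] += 1
    return counts

if __name__ == '__main__':
    sample = "The cloud and the Cloud: tags, Tags, TAGS in a cloud."
    # most_common() lists (word, count) pairs from most to least frequent
    print(count_words(sample).most_common())
```

With this, "Cloud", "cloud" and "cloud." all end up under the same key, and most_common() gives the words ordered by frequency, which should be handy when deciding which tags to show.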

Do send in your suggestions and ideas!!