
Grouping words


Cantfigureitout


Hi everyone,

I'm developing a little tool that analyzes content from a wide variety of sources; in other words, a lot of data.

I need to find the most common word groups: pairs, groups of three, and longer.

For example:


"Johny went to the store after which Johny went to buy gas and at the end of the day Johny went home!"

What I'm trying to achieve is to scan for groups of 2 words, groups of 3 words, etc., so in this case it would result in:


Most common groups of 2 words:

#1: "Johny went" (found 3 times)

#2: "went to" (found 2 times)

etc.

And the same for groups of 3 words, and possibly 4, depending on how intensive it is.
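Counting every group of n consecutive words (usually called an n-gram) takes one pass over the word list per value of n, so the work grows roughly linearly with the amount of text. A minimal Python sketch, assuming a simple tokenizer that lowercases the text and keeps only ASCII letters, digits and apostrophes; `tokenize` and `ngram_counts` are hypothetical names:

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters, digits and apostrophes (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(words, n):
    """Count every run of n consecutive words (an n-gram) in a word list."""
    # zip over n staggered views: words[0:], words[1:], ..., words[n-1:]
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


text = ("Johny went to the store after which Johny went to buy gas "
        "and at the end of the day Johny went home!")
words = tokenize(text)
for n in (2, 3, 4):
    print(n, ngram_counts(words, n).most_common(2))
```

For the sample sentence the top two pairs come out as `johny went` (3) and `went to` (2), matching the counts in the post; lowercasing means "Johny" and "johny" are counted together.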


I can either dump ALL the data into one huge variable containing all the text to be analyzed (around 100,000 words on average at the moment)

or

split the content whenever a . , ; ? or ! occurs and store the pieces in an array (this might also give better groupings, and is probably faster hehe).
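Splitting at punctuation first means no group can straddle a clause or sentence break; without it, the sample below would also count `home home` and `quiet so`. The total work stays roughly proportional to the text length either way, so the main gain is in the quality of the groups. A sketch of that variant, assuming a simple lowercase ASCII tokenizer; `split_segments` and `segment_ngram_counts` are hypothetical names:

```python
import re
from collections import Counter


def split_segments(text):
    """Split text at runs of . , ; ? or ! and drop empty pieces."""
    return [seg for seg in re.split(r"[.,;?!]+", text) if seg.strip()]


def segment_ngram_counts(text, n):
    """Count n-grams within each segment only, so no group crosses a punctuation break."""
    counts = Counter()
    for seg in split_segments(text):
        words = re.findall(r"[a-z0-9']+", seg.lower())  # assumed ASCII tokenizer
        # range() is empty when a segment has fewer than n words
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


sample = "Johny went home. Home was quiet, so Johny went to bed!"
print(segment_ngram_counts(sample, 2).most_common(1))
```

One caveat of splitting on "." is that abbreviations such as "e.g." and decimals such as "3.5" are split too.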

Anyway, what do you guys think is the best way to then analyze the contents and count the word groupings?


By the way, my backup plan is to take the 500 most common SINGLE words, run them through the whole 100,000 words of content, grab the word before and after each occurrence, and build the word groups that way. But that's only a backup in case the approach above turns out to be too intensive or impossible (then again, nothing is impossible).
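The backup plan amounts to keeping only the adjacent pairs that contain at least one of the top words. Walking each adjacent pair once (rather than looking before and after each occurrence separately) avoids counting a pair twice when both of its words are in the top list. A sketch, assuming a simple lowercase ASCII tokenizer; `neighbour_pairs` and `top_k` are hypothetical names:

```python
import re
from collections import Counter


def neighbour_pairs(text, top_k=500):
    """Count adjacent word pairs that involve at least one of the top_k most common words."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # assumed ASCII tokenizer
    top = {w for w, _ in Counter(words).most_common(top_k)}
    pairs = Counter()
    # Walk each adjacent pair once, so a pair of two top words is not counted twice
    for a, b in zip(words, words[1:]):
        if a in top or b in top:
            pairs[f"{a} {b}"] += 1
    return pairs


print(neighbour_pairs("a b a c d e", top_k=1))
```

With `top_k=1` on the toy text `a b a c d e`, only the pairs touching `a` survive. Because this still visits every adjacent pair, it is not cheaper than the full pair count, which suggests counting all groups directly should be feasible at around 100,000 words.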
