
Need to compare a set of text against a word list


bostongio


I have a 4,500-word list in which each word or word fragment belongs to between 1 and 10 different categories (represented by uppercase letters below), like:

 

year B G

abandon A K P

 

Then I have a text snippet like:

 

"I had a great time last year, but not so much this year."

 

I need to analyze the text snippet, which can be up to 140,000 characters in length, for occurrences of words from the 4,500-word list and, for each occurrence found, increment each of that word's category counters by one.

 

So in the above example, the word "year" corresponds to two categories -- B and G. Since it appears twice in the above sentence, these two categories would now have values of 2.
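To make the expected result concrete, here is a minimal sketch of that counting in Python, used purely for illustration. It assumes case-insensitive, whole-word matching only (word fragments are not handled here), and the names `WORD_CATEGORIES` and `count_categories` are made up for this example.

```python
import re
from collections import Counter

# Hypothetical in-memory word list: word -> categories it belongs to.
WORD_CATEGORIES = {
    "year": ["B", "G"],
    "abandon": ["A", "K", "P"],
}

def count_categories(text, word_categories):
    """Tally category hits using a single pass over the words in the text."""
    # Tokenize once and count each distinct word
    # (assumption: case-insensitive, whole words only).
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))
    category_counts = Counter()
    # One dictionary lookup per distinct word, not one scan per list entry.
    for word, occurrences in word_counts.items():
        for category in word_categories.get(word, ()):
            category_counts[category] += occurrences
    return category_counts

snippet = "I had a great time last year, but not so much this year."
result = count_categories(snippet, WORD_CATEGORIES)
print(dict(result))  # {'B': 2, 'G': 2}
```

Because the text is split into words once and each word is looked up in a hash table, the work grows with the length of the text rather than with text length times list size.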

 

I've explored a number of possible solutions, but they don't seem to scale well when the text to be analyzed could be 140,000 characters in length, and it needs to look for each of the 4,500 words.

 

The word list could be placed into an array or a MySQL table, whichever works best for performance.
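As a rough illustration of the in-memory array option, the sketch below (in Python, for illustration only) loads lines in the "word CAT CAT ..." format into a dictionary and matches fragments anywhere in the text with one compiled pattern. All names here are hypothetical. It assumes case-insensitive matching and counts non-overlapping matches only. With thousands of entries, a dedicated multi-pattern matcher such as an Aho-Corasick automaton may scale better than a regex alternation.

```python
import re
from collections import Counter

# Hypothetical word-list lines in the "word CAT CAT ..." format from the post.
WORD_LIST_LINES = ["year B G", "abandon A K P"]

def load_word_list(lines):
    """Build a word -> categories table from whitespace-separated lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if parts:
            table[parts[0].lower()] = parts[1:]
    return table

def build_matcher(table):
    """Compile one pattern that matches any listed word or fragment."""
    # Longest entries first, so a longer entry wins over one that is its prefix.
    alternatives = sorted(table, key=len, reverse=True)
    return re.compile("|".join(re.escape(word) for word in alternatives))

def count_fragment_categories(text, table, matcher):
    """Scan the text once; each non-overlapping match bumps its categories."""
    counts = Counter()
    for match in matcher.finditer(text.lower()):
        for category in table[match.group(0)]:
            counts[category] += 1
    return counts

table = load_word_list(WORD_LIST_LINES)
matcher = build_matcher(table)
fragment_counts = count_fragment_categories(
    "They abandoned it last year; years pass.", table, matcher
)
print(dict(fragment_counts))
```

Here "abandon" is found inside "abandoned" and "year" inside both "year" and "years", which is the fragment behavior assumed above. If only whole words should count, splitting the text into words and doing plain dictionary lookups would be simpler.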

 

Any suggestions or thoughts on how to make this work without bogging down the server? Thanks!!

 

Gio

 

 

 
