Breaking long texts into 1000 word segments


GeeDeezy

I have long plain text files that I would like to break into segments of, say, 1000 words, then insert each segment into a table in a DB, using PHP.

 

I can handle the DB part, but I am at a loss as to how to break a long body of text into shorter segments.

 

In my head, the logic would look like this:

 

Open text file for reading, start at beginning

Loop

Get next 1000 words, put them in a variable.

Create new DB record and insert them into the record in a specific field.

end Loop

Close text file

 

Any help available?

Edited by GeeDeezy

There are some easy solutions, but the problem is deciding what logic to use to determine what a "word" is. The easiest solution would be to explode() the string on spaces, use array_chunk() to group the resulting array into chunks of 1,000 elements each, and then implode() each chunk back into a string with spaces.

 

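// Split on single spaces only; line breaks and tabs are not treated as separators here
$words = explode(' ', $originalString);
$wordChunks = array_chunk($words, 1000);
foreach ($wordChunks as &$chunk)
{
   $chunk = implode(' ', $chunk);
}
unset($chunk); // break the reference left behind by the foreach

// $wordChunks is now an array of strings of up to 1000 words each (the last one may be shorter)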

 

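But, as stated above, it depends on what you consider a word. Splitting on single spaces treats words separated only by a line break as one word and counts an empty "word" between consecutive spaces, so the segments could differ from what you expect.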
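
To make that difference concrete, here is a small sketch comparing a split on single spaces with a split on runs of whitespace. The sample string and variable names are made up for illustration only.

```php
<?php
// Illustrative input: a double space and two line breaks between words.
$text = "one  two\nthree\nfour";

// explode() on a single space: the double space produces an empty element,
// and "two\nthree\nfour" stays glued together as a single "word".
$bySpace = explode(' ', $text);

// preg_split() on runs of any whitespace, discarding empty pieces.
$byWhitespace = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

echo count($bySpace), "\n";      // prints 3
echo count($byWhitespace), "\n"; // prints 4
```

On text with many line breaks, the space-only count can end up well below the number of words a reader would count.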

Edited by Psycho

Thanks. I would consider a word to be any set of 1 or more characters that ends with a {space} or {carriage return}.

 

Then that will be a bit more work to implement. Splitting on spaces alone would get very close. If your intent is just to break the content into pieces of roughly 1,000 words, I would think that would suffice. But if you want something that splits into exactly 1,000 words according to specific rules, you will have more work to do. I guess preg_split() with a split expression that matches a space or a line break would be a good option.
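
A minimal sketch of the preg_split() idea, assuming any run of whitespace (spaces, tabs, carriage returns, line feeds) ends a word. The function name splitIntoWordChunks() and its parameters are hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
/**
 * Split $text into strings of at most $size words each.
 * Hypothetical helper for illustration; not part of PHP itself.
 */
function splitIntoWordChunks(string $text, int $size = 1000): array
{
    // \s+ matches runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces
    // that would otherwise appear at the start or end of the text.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $chunks = [];
    foreach (array_chunk($words, $size) as $chunk) {
        // Words are rejoined with single spaces, so original line breaks are lost.
        $chunks[] = implode(' ', $chunk);
    }
    return $chunks;
}

// Small demonstration using 3-word segments instead of 1000.
$segments = splitIntoWordChunks("a b\r\nc d\ne  f g", 3);
print_r($segments); // "a b c", "d e f", "g"
```

Each element of the returned array could then be inserted as one DB record inside a loop. Note that the last segment will usually be shorter than the requested size, and that this approach reads the whole text into memory, which may matter for very large files.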
