GeeDeezy Posted February 7, 2013 (edited)

I have long plain-text files that I would like to break into segments of, say, 1000 words, then insert each segment into a table in a DB, using PHP. I can handle the DB part, but I am at a loss as to how to break a long body of text into shorter segments. In my head, the logic would look like this:

Open text file for reading, start at beginning
Loop
    Get next 1000 words, put them in a variable.
    Create new DB record and insert them into the record in a specific field.
End Loop
Close text file

Any help available?
Psycho Posted February 7, 2013 (edited)

There are some easy solutions, but the problem is deciding what logic you would use to determine what a "word" is. The easiest solution would be to explode() the string on spaces, use array_chunk() to group the words into arrays of 1,000 elements each, and then implode() each group back together with spaces.

    $words = explode(' ', $originalString);
    $wordChunks = array_chunk($words, 1000);
    // Join each group of words back into a single string
    foreach ($wordChunks as &$chunk) {
        $chunk = implode(' ', $chunk);
    }
    unset($chunk); // drop the reference left over from the loop
    // $wordChunks is now an array of strings of up to 1,000 words each

But, as stated above, it depends on what you consider a word, so the result could differ from what you expect.
GeeDeezy Posted February 8, 2013

Thanks. I would consider a word to be any set of 1 or more characters that ends with a {space} or {carriage return}.
Psycho Posted February 8, 2013

Then that will be more difficult to implement. Using just spaces to determine words would be very close. If your intent is just to get the content broken out into pieces of roughly 1,000 words, I would think that would suffice. But if you want something that will split into exactly 1,000 words based upon specific requirements, you will have more work to do. I guess preg_split() using a space or line break as the split expression would be a good option.
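As a rough sketch of the preg_split() idea, assuming any run of whitespace (spaces, tabs, carriage returns, line feeds) counts as a word separator. The function name chunkWords() and its parameters are made up for illustration:

```php
<?php
// Hypothetical helper: split $text into strings of at most $wordsPerChunk words.
function chunkWords(string $text, int $wordsPerChunk = 1000): array
{
    // \s+ matches one or more whitespace characters, so "\r\n" or double
    // spaces count as a single separator. PREG_SPLIT_NO_EMPTY drops the
    // empty strings that leading or trailing whitespace would produce.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $chunks = [];
    foreach (array_chunk($words, $wordsPerChunk) as $group) {
        // Words are rejoined with single spaces, so the original
        // line breaks are not preserved.
        $chunks[] = implode(' ', $group);
    }
    return $chunks;
}
```

Note that this normalizes all whitespace to single spaces, which may or may not matter for the stored text.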
GeeDeezy Posted February 11, 2013

I don't need it to be exactly 1000 words. I think "the next breaking point after 1000 words" would accurately describe my need. The next space or a carriage-return/line-feed would do.
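One possible way to get "the next breaking point after 1000 words" while keeping the original line breaks inside each segment is to match each word together with the whitespace that follows it, then group the matches. This is only a sketch under that assumption; splitAtWordBoundary() and the file name in the usage note are hypothetical:

```php
<?php
// Hypothetical helper: cut $text after every $wordsPerSegment-th word,
// at the whitespace that follows it.
function splitAtWordBoundary(string $text, int $wordsPerSegment = 1000): array
{
    // Each match is one word (\S+) plus any whitespace after it (\s*),
    // so joining the matches reproduces the original spacing.
    preg_match_all('/\S+\s*/', $text, $matches);

    $segments = [];
    foreach (array_chunk($matches[0], $wordsPerSegment) as $tokens) {
        // rtrim() removes the separator left at the end of the segment.
        $segments[] = rtrim(implode('', $tokens));
    }
    return $segments;
}
```

Usage might look like `$segments = splitAtWordBoundary(file_get_contents('book.txt'));`, with each element of `$segments` then inserted as one DB record. This reads the whole file into memory, which should be fine for ordinary text files but may need rethinking for very large ones.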