GeeDeezy Posted February 7, 2013

I have long plain text files that I would like to break into segments of, say, 1000 words, then insert each segment into a table in a DB, using PHP. I can handle the DB part, but I am at a loss as to how to break a long body of text into shorter segments. In my head, the logic would look like this:

Open text file for reading, start at beginning
Loop
    Get next 1000 words, put them in a variable.
    Create new DB record and insert them into the record in a specific field.
End Loop
Close text file

Any help available?

Link to comment https://forums.phpfreaks.com/topic/274170-breaking-long-texts-into-1000-word-segments/
Psycho Posted February 7, 2013

There are some easy solutions, but the problem is what logic you would use to determine what a "word" is. The easiest solution would be to explode() the string on spaces, then use array_chunk() to create sub-arrays of 1,000 words each and implode() those back together with spaces.

    $words = explode(' ', $originalString);
    $wordChunks = array_chunk($words, 1000);
    foreach ($wordChunks as &$chunk) {
        $chunk = implode(' ', $chunk);
    }
    unset($chunk); // break the reference left over from the loop
    // $wordChunks is now an array of strings of up to 1,000 words each

But, as stated above, it depends on what you consider a word. Splitting on single spaces alone treats words separated only by a line break as one word, and consecutive spaces produce empty "words", so the counts may differ from what you expect.
GeeDeezy Posted February 8, 2013

Thanks. I would consider a word to be any set of 1 or more characters that ends with a {space} or {carriage return}.
Psycho Posted February 8, 2013

GeeDeezy said: "Thanks. I would consider a word to be any set of 1 or more characters that ends with a {space} or {carriage return}."

Then that will be more difficult to implement. Using just spaces to determine words would be very close. If your intent is just to get the content broken out into pieces of roughly 1,000 words, I would think that would suffice. But if you want something that will split into exactly 1,000 words based upon specific requirements, you will have more work to do. I guess preg_split() using a space or line break as the split expression would be a good option.
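The preg_split() idea could be sketched roughly as follows. This is only an illustration under assumptions, not code from the thread: the function name chunkWords() and its parameters are made up for the example, and "word" is taken to mean any run of non-whitespace characters. Note that the original line breaks are not preserved; every word separator in the output becomes a single space.

```php
<?php
// Hypothetical helper (not from the thread): splits on any run of whitespace
// (spaces, tabs, CR, LF) and regroups the words into fixed-size chunks.
function chunkWords(string $text, int $wordsPerChunk = 1000): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or trailing
    // whitespace would otherwise produce.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $chunks = [];
    foreach (array_chunk($words, $wordsPerChunk) as $group) {
        $chunks[] = implode(' ', $group);
    }
    return $chunks;
}

// Usage with a small chunk size so the result is easy to inspect.
$chunks = chunkWords("one two\r\nthree  four\nfive", 2);
// $chunks is ['one two', 'three four', 'five']
```

Compared with explode(' ', ...), the \s+ pattern treats a carriage return or line feed as a word boundary and ignores repeated spaces, which matches the definition of a word given above more closely.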
GeeDeezy Posted February 11, 2013

I don't need it to be exactly 1000 words. I think "the next breaking point after 1000 words" would accurately describe my need. The next space or a carriage-return/line-feed would do.
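For very long files, the loop described in the first post (open the file, take the next 1000 words, store them, repeat) might look something like the sketch below. It is an assumption-laden illustration rather than an answer given in the thread: readWordChunks() is a hypothetical name, the file is read line by line so it never has to fit in memory at once, and each segment is cut at the whitespace that follows its last word.

```php
<?php
// Hypothetical generator (not from the thread): streams a text file and
// yields segments of $wordsPerChunk whitespace-delimited words.
function readWordChunks(string $path, int $wordsPerChunk = 1000): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $buffer = []; // words collected for the segment being built
    try {
        while (($line = fgets($handle)) !== false) {
            $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($words as $word) {
                $buffer[] = $word;
                if (count($buffer) === $wordsPerChunk) {
                    yield implode(' ', $buffer);
                    $buffer = [];
                }
            }
        }
        if ($buffer !== []) {
            yield implode(' ', $buffer); // final, shorter segment
        }
    } finally {
        fclose($handle); // runs even if the caller stops iterating early
    }
}

// Usage (file name is a placeholder; the DB insert is left to the caller):
// foreach (readWordChunks('book.txt') as $segment) {
//     // insert $segment into the table here
// }
```

Every segment except possibly the last holds exactly $wordsPerChunk words, and words within a segment are rejoined with single spaces, so the original line breaks are lost. If they matter, the buffer would need to keep the separators as well.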