
preg_split to enter line break into array


breakaway


I am using a version of nrg_alpha's code from this thread:

http://www.phpfreaks.com/forums/index.php/topic,268001.0.html

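$article = $_POST['article'];
$sentences = array(); // initialize so the [] appends start from a defined array

// Split into lines; PREG_SPLIT_NO_EMPTY drops the empty pieces between line breaks
$chunks = preg_split('#[\r\n]#', $article, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chunks as $chunk) {
    // Grab each sentence ending in . ? or ! (single-letter abbreviations such as " e.g." are skipped over)
    preg_match_all('#(?:\s[a-z]\.(?:[a-z]\.)?|.)+?[.?!]+#i', $chunk, $paragraph);
    foreach ($paragraph[0] as $sentence) {
        $sentences[] = ltrim($sentence);
    }
}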

$article is defined on the previous page by a written multi-paragraph article entered into the text box.
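For reference, here is a minimal sketch of what the sentence pattern in the code above does to a single paragraph (the sample text is made up for illustration). As far as I can tell, the first alternative in the pattern is what keeps a single-letter initial such as "U." or "U.S." from being read as the end of a sentence.

```php
<?php
// Sketch: the sentence pattern from the code above, applied to one paragraph.
// The first alternative, \s[a-z]\.(?:[a-z]\.)?, swallows a single-letter
// initial such as " U." or " U.S." so it is not treated as a sentence end.
$pattern   = '#(?:\s[a-z]\.(?:[a-z]\.)?|.)+?[.?!]+#i';
$paragraph = 'She moved to the U.S. last year. Did it work? Yes!';

preg_match_all($pattern, $paragraph, $matches);

// Every match after the first starts with a space; ltrim() removes it.
$sentences = array_map('ltrim', $matches[0]);

print_r($sentences);
```

Note that only single letters are covered: a two-letter abbreviation such as "Mr." would still end a sentence.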

 

When the array is printed on the next page, it does accurately separate the sentences (thanks so much, nrg_alpha), but it doesn't put any of the paragraph breaks into the array, and I would like it to. An empty element is fine; I just need something that shows where a new paragraph starts when I look through the array sentence by sentence.

 

Any suggestions?
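The reason the breaks disappear appears to be the PREG_SPLIT_NO_EMPTY flag. A minimal sketch with a hard-coded string, assuming Windows-style \r\n line breaks (which is how browsers normally submit textarea input):

```php
<?php
// Sketch: why no paragraph marker survives. Splitting on '#[\r\n]#' yields
// an empty string between consecutive line-break characters, and
// PREG_SPLIT_NO_EMPTY then throws those empty pieces away.
$article = "First paragraph.\r\n\r\nSecond paragraph.";

$withFlag    = preg_split('#[\r\n]#', $article, -1, PREG_SPLIT_NO_EMPTY);
$withoutFlag = preg_split('#[\r\n]#', $article);

print_r($withFlag);    // only the two paragraphs
print_r($withoutFlag); // the two paragraphs plus three empty strings
```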


If I understand correctly, you would like the empty elements within the array to signify a new sentence (from the standpoint of a carriage return / new line)?

If so, then perhaps something like this?

 

Example:

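// Splits on \r only, so this assumes Windows-style \r\n line breaks (what a
// browser normally submits from a textarea); with \n-only breaks nothing is split.
$article = 'This is the first sentence.
This is the second one.
And this is the third one.';

$chunks = preg_split('#(\r)#', $article, -1, PREG_SPLIT_DELIM_CAPTURE);
$chunks = array_map('ltrim', $chunks);
echo '<pre>' . print_r($chunks, true) . '</pre>';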

 

Output:

Array
(
    [0] => This is the first sentence.
    [1] => 
    [2] => This is the second one.
    [3] => 
    [4] => And this is the third one.
)

 

By capturing the carriage return, the PREG_SPLIT_DELIM_CAPTURE flag keeps it in the result as its own element. In conjunction with an ltrim callback applied to the whole resulting array (which turns each captured \r into an empty string and strips the \n left at the start of the next element), we get a nice single empty entry between lines. Granted, this separates the text at line breaks (\r\n), not at every sentence.
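To see exactly what the flag and the ltrim callback each contribute, here is a minimal sketch using an explicit "\r\n" string, so the result doesn't depend on how the source file saves its own line breaks:

```php
<?php
// Sketch: PREG_SPLIT_DELIM_CAPTURE keeps the captured delimiter (\r) as its
// own array element; ltrim() then turns it into '' and also strips the "\n"
// left at the start of the following piece.
$article = "First sentence.\r\nSecond one.";

$raw     = preg_split('#(\r)#', $article, -1, PREG_SPLIT_DELIM_CAPTURE);
$trimmed = array_map('ltrim', $raw);

var_dump($raw);     // "First sentence.", "\r", "\nSecond one."
var_dump($trimmed); // "First sentence.", "", "Second one."
```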


The sentences within a paragraph should have no space between them, but when there is a paragraph break, I'd like the single spaced empty space to show up between paragraphs.

 

Well, my point was that the pattern splits whenever it runs into a carriage return. (In the above example, I simplified things by using single sentences between breaks instead of complete paragraphs.) I tweaked the pattern a bit and used two small paragraphs as a test:

 

$article = 'Phasellus molestie rhoncus odio, vitae vehicula nisl varius eget. Nullam eget aliquam nibh. Quisque turpis diam, adipiscing non consequat a, hendrerit sit amet lacus. Donec commodo egestas ipsum id placerat. Nulla aliquet posuere neque, eget ultricies nulla feugiat et.

Nunc et nunc molestie nibh viverra pretium at eu felis. Cras nec quam eros. Nunc in velit ac mauris consequat tempor. Etiam pretium eros non erat molestie dapibus.';

// \R matches any newline sequence; \R+ matches a run of one or more of them
$chunks = preg_split('#(\R+)#', $article, -1, PREG_SPLIT_DELIM_CAPTURE);
$chunks = array_map('ltrim', $chunks);
echo '<pre>' . print_r($chunks, true) . '</pre>';

 

\R is a shorthand that matches any newline sequence (\r\n, \n or \r, plus a few rarer vertical breaks such as a form feed), so \R+ basically says: whenever someone hits the Enter key one or more consecutive times, split the text there. Granted, if someone types a sentence and hits Enter with the intention of keeping the next sentence in the same paragraph, the pattern will still split it. If that is going to be a problem, then perhaps change \R+ to \R{2,}, which only splits where there are at least two consecutive line breaks (that is, a blank line between paragraphs).
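A minimal sketch comparing the two patterns on made-up text that has a single line break inside the first paragraph and a blank line before the second one:

```php
<?php
// Sketch: \R+ splits on any run of line breaks, \R{2,} only on two or more
// in a row (a blank line). Sample text is made up for illustration.
$article = "Line one of paragraph one.\nLine two of paragraph one.\n\nParagraph two.";

$anyBreak   = preg_split('#(\R+)#', $article, -1, PREG_SPLIT_DELIM_CAPTURE);
$blankLines = preg_split('#(\R{2,})#', $article, -1, PREG_SPLIT_DELIM_CAPTURE);

// ltrim() turns each captured run of line breaks into an empty entry.
$anyBreak   = array_map('ltrim', $anyBreak);
$blankLines = array_map('ltrim', $blankLines);

print_r($anyBreak);   // the wrapped line is split off as well
print_r($blankLines); // only the blank line produces an empty entry
```

With \R{2,}, the single \n stays inside the first element, so the two lines of paragraph one are kept together.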

 

EDIT - If this is still not what you are looking for, please provide a small sample set of paragraphs, and show what the end result array should look like.

 


Sorry... I need to work on putting what I have in my head into words, so I greatly apologize...

 

 

For the following test paragraphs:

 

 

Phasellus molestie rhoncus odio, vitae vehicula nisl varius eget. Nullam eget aliquam nibh. Quisque turpis diam, adipiscing non consequat a, hendrerit sit amet lacus. Donec commodo egestas ipsum id placerat. Nulla aliquet posuere neque, eget ultricies nulla feugiat et.

 

Nunc et nunc molestie nibh viverra pretium at eu felis. Cras nec quam eros. Nunc in velit ac mauris consequat tempor. Etiam pretium eros non erat molestie dapibus.

 

I'd want it to look like...

 

Array
(
    [0] => Phasellus molestie rhoncus odio, vitae vehicula nisl varius eget.
    [1] => Nullam eget aliquam nibh.
    [2] => Quisque turpis diam, adipiscing non consequat a, hendrerit sit amet lacus.
    [3] => Donec commodo egestas ipsum id placerat.
    [4] => Nulla aliquet posuere neque, eget ultricies nulla feugiat et.
    [5] => 
    [6] => Nunc et nunc molestie nibh viverra pretium at eu felis.
    [7] => Cras nec quam eros.
    [8] => Nunc in velit ac mauris consequat tempor.
    [9] => Etiam pretium eros non erat molestie dapibus.
)

 

I definitely appreciate all of the help you've given me....

 

Essentially, I'd want a mix of the original code, and the code you just posted.

 

I still want it to separate the sentences, but I also want it to recognize the paragraph breaks.


Ah, I see now what the end result should look like. Yeah, cags' pattern will work (if the goal is only to split at periods). If there is a sentence ending with, say, an exclamation mark or a question mark, it won't work. In that case, you would have to change the positive lookbehind assertion to something like: (?<=[.!?])
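cags' actual pattern isn't shown in this thread, so the following is only a hypothetical illustration of a lookbehind-based split: it breaks a paragraph at the whitespace that follows a sentence-ending character, and compares a periods-only lookbehind with (?<=[.!?]).

```php
<?php
// Hypothetical illustration (not cags' pattern, which is not shown here):
// split on whitespace preceded by sentence-ending punctuation. The
// lookbehind only checks the character, so the punctuation stays attached.
$paragraph = 'Cras nec quam eros. Is this a question? It is!';

$periodsOnly = preg_split('#(?<=\.)\s+#', $paragraph, -1, PREG_SPLIT_NO_EMPTY);
$sentences   = preg_split('#(?<=[.!?])\s+#', $paragraph, -1, PREG_SPLIT_NO_EMPTY);

print_r($periodsOnly); // the question and the exclamation stay glued together
print_r($sentences);   // three separate sentences
```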


So, I don't know if I am communicating it well enough or not...

 

I LOVE the way the code in the OP breaks the sentences down (it handles more than just periods, exclamation points and question marks), and I have already implemented it in a program.

 

I basically just want to modify the existing code shown in the OP so that it adds a blank entry to the array whenever there is a paragraph break.

 

 


The following worked....

 

<?php
$article = 'Phasellus molestie rhoncus odio, vitae vehicula nisl varius eget. Nullam eget aliquam nibh. Donec commodo egestas ipsum id placerat.

Nulla aliquet posuere neque, eget ultricies nulla feugiat et. Nunc et nunc molestie nibh viverra pretium at eu felis. Cras nec quam eros.

Nunc in velit ac mauris consequat tempor. Etiam pretium eros non erat molestie dapibus. Quisque turpis diam, adipiscing non consequat a, hendrerit sit amet lacus.
';

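$sentences = array(); // initialize so array_pop() never receives an undefined variable

// One chunk per non-empty line; PREG_SPLIT_NO_EMPTY drops the blank lines
$chunks = preg_split('#[\r\n]#', $article, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chunks as $chunk) {
    preg_match_all('#(?:\s[a-z]\.(?:[a-z]\.)?|.)+?[.?!]+#i', $chunk, $paragraph);
    foreach ($paragraph[0] as $sentence) {
        $sentences[] = ltrim($sentence);
    }
    // Empty entry marks the end of a paragraph
    $sentences[] = "";
}
// Remove the marker added after the last paragraph
array_pop($sentences);

echo '<pre>' . print_r($sentences, true) . '</pre>';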
?>
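If single line breaks inside a paragraph could occur (the situation the \R{2,} suggestion earlier in the thread was about), one possible variant is sketched below. This is a hypothetical adaptation, not code from the thread: it assumes paragraphs are separated by a blank line (matched here with \R\s*\R), treats single line breaks as spaces, and wraps everything in a function whose name, split_sentences_with_breaks(), is made up.

```php
<?php
// Hypothetical variant (not from the thread): paragraphs are separated by a
// blank line, and single line breaks inside a paragraph are treated as
// spaces rather than paragraph breaks.
function split_sentences_with_breaks($article)
{
    $sentences  = array();
    // \R\s*\R: a line break, optional whitespace, then another line break
    $paragraphs = preg_split('#\R\s*\R#', trim($article), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($paragraphs as $i => $paragraph) {
        if ($i > 0) {
            $sentences[] = ''; // empty entry marks a paragraph break
        }
        // Join wrapped lines so a sentence can continue across a single break
        $paragraph = preg_replace('#\R#', ' ', $paragraph);
        preg_match_all('#(?:\s[a-z]\.(?:[a-z]\.)?|.)+?[.?!]+#i', $paragraph, $matches);
        foreach ($matches[0] as $sentence) {
            $sentences[] = ltrim($sentence);
        }
    }
    return $sentences;
}

$result = split_sentences_with_breaks("One. Two\r\ncontinues.\r\n\r\nThree?");
print_r($result);
```

Because the marker is only added before the second and later paragraphs, no trailing entry has to be popped, and an empty article simply returns an empty array.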
