I Hate Regular Expressions! Can anyone help?


LordLanky

Hi all, I am trying to write a bit of PHP that will split a document written in HTML into chapters.

 

An example doc is:

 

<h1>The Work of an Idiot</h2>

<p>Edited by A Total Moron</p>

<h2>Chapter 1</h2>

<p>Here is some random text</p>

<h2>Chapter 2 - The Wrath of Khan's Mum</h2>

<p>Here is some more random text</p>

<h2>Chapter 3</h2>

<p>Again.. i can ramble for ages</p>

 

What I need is to split it into an array, or a number of variables, with each chunk being a chapter. So for example, an array called strChapters() being:

 

strChapters[0][text] => "<h1>The Work of an Idiot</h2><p>Edited by A Total Moron</p>"

strChapters[0][title] => ""

strChapters[1][text] => "<h2>Chapter 1</h2><p>Here is some random text</p>"

strChapters[1][title] => "Chapter 1"

strChapters[2][text] => "<h2>Chapter 2 - The Wrath of Khan's Mum</h2><p>Here is some more random text</p>"

strChapters[2][title] => "Chapter 2 - The Wrath of Khan's Mum"

strChapters[3][text] => "<h2>Chapter 3</h2><p>Again.. i can ramble for ages</p>"

strChapters[3][title] => "Chapter 3"

 

My guess is I need a robust regular expression that takes into account the fact that a chapter heading can contain a number and a title. I also need the title on its own.

 

I'm fairly good at PHP now, but this is beyond my experience. I was thinking of exploding on the word "chapter", but I don't want it to split if it's just a word in a sentence, e.g. "as mentioned in chapter 2, Khan's not going to get any pocket money this month". Any help is really appreciated!

 

You have not closed your H1 tag correctly (it opens with <h1> but ends with </h2>)!

Try this helpful function.

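<?php
// Return every substring of $string that starts with $openTag and ends with
// the nearest following $closeTag. With $excluding = true, return only the
// text between the two tags.
function parseArray($string, $openTag, $closeTag, $excluding = false) {
	// preg_quote() makes the tags match literally. Modifiers: s lets . span
	// newlines, i ignores case, U makes .* stop at the first $closeTag.
	$pattern = '~' . preg_quote($openTag, '~') . '(.*)' . preg_quote($closeTag, '~') . '~siU';
	preg_match_all($pattern, $string, $matches);
	if ($excluding) {
		return $matches[1];
	}
	return $matches[0];
}

$string = "<h1>The Work of an Idiot</h1>
<p>Edited by A Total Moron</p>
<h2>Chapter 1</h2>
<p>Here is some random text</p>
<h2>Chapter 2 - The Wrath of Khan's Mum</h2>
<p>Here is some more random text</p>
<h2>Chapter 3</h2>
<p>Again.. i can ramble for ages</p>";

// Each match ends at the first </p>, so this assumes one paragraph per chapter.
$array = parseArray($string, "<h1>", "</p>");
$array = array_merge($array, parseArray($string, "<h2>", "</p>"));

// <xmp> is obsolete; <pre> plus htmlspecialchars() shows the tags as text.
print "<pre>";
print htmlspecialchars(print_r($array, true));
print "</pre>";
?>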
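
If you also want the title on its own, as in the strChapters layout in the question, here is a rough sketch of one way to do it. It assumes every chapter starts with an <h2> heading and that anything before the first <h2> is front matter with an empty title. The name splitChapters is made up for this example, it is not a built-in. Because it splits on the heading tag rather than on the word "chapter", a sentence like "as mentioned in chapter 2" is left alone.

```php
<?php
// Hypothetical helper (not part of the reply above): split an HTML string
// into chapters at each <h2> heading and pull out each chapter's title.
function splitChapters(string $html): array
{
    // Zero-width lookahead: split just before every <h2 ...>, so the
    // heading stays attached to the chapter it introduces.
    $chunks = preg_split('~(?=<h2\b)~i', $html, -1, PREG_SPLIT_NO_EMPTY);

    $chapters = [];
    foreach ($chunks as $chunk) {
        $title = '';
        // Only the contents of an <h2> count as a title; the front matter
        // (the <h1> block) has no <h2>, so its title stays empty.
        if (preg_match('~<h2\b[^>]*>(.*?)</h2>~is', $chunk, $m)) {
            $title = trim(strip_tags($m[1]));
        }
        $chapters[] = ['text' => trim($chunk), 'title' => $title];
    }
    return $chapters;
}

$html = "<h1>The Work of an Idiot</h1>
<p>Edited by A Total Moron</p>
<h2>Chapter 1</h2>
<p>Here is some random text</p>
<h2>Chapter 2 - The Wrath of Khan's Mum</h2>
<p>Here is some more random text</p>
<h2>Chapter 3</h2>
<p>Again.. i can ramble for ages</p>";

$strChapters = splitChapters($html);
print_r($strChapters);
```

Unlike the one-paragraph-per-chapter approach, each chunk runs up to the next <h2>, so chapters with several paragraphs stay in one piece. If the document starts directly with an <h2>, index 0 will be the first chapter rather than the front matter. For messy real-world HTML a proper parser such as DOMDocument is likely to be more reliable than a regular expression.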
