Hi all, I am trying to write a bit of PHP that will split a document written in HTML into chapters.

 

An example doc is:

 

<h1>The Work of an Idiot</h2>

<p>Edited by A Total Moron</p>

<h2>Chapter 1</h2>

<p>Here is some random text</p>

<h2>Chapter 2 - The Wrath of Khan's Mum</h2>

<p>Here is some more random text</p>

<h2>Chapter 3</h2>

<p>Again.. i can ramble for ages</p>

 

What I need is to split it into an array (or a number of variables), with each chunk being a chapter. For example, an array called strChapters containing:

 

strChapters[0][text] => "<h1>The Work of an Idiot</h2><p>Edited by A Total Moron</p>"

strChapters[0][title] => ""

strChapters[1][text] => "<h2>Chapter 1</h2><p>Here is some random text</p>"

strChapters[1][title] => "Chapter 1"

strChapters[2][text] => "<h2>Chapter 2 - The Wrath of Khan's Mum</h2><p>Here is some more random text</p>"

strChapters[2][title] => "Chapter 2 - The Wrath of Khan's Mum"

strChapters[3][text] => "<h2>Chapter 3</h2><p>Again.. i can ramble for ages</p>"

strChapters[3][title] => "Chapter 3"

 

My guess is that I need a robust regular expression that takes into account that a chapter heading can contain both a number and a title. I also need the title on its own.

 

I'm fairly good at PHP now, but this is beyond my experience. I was thinking of exploding on the word "chapter", but I don't want it to split when it's just a word in a sentence, e.g. "as mentioned in chapter 2, Khan's not going to get any pocket money this month". Any help is really appreciated!

 

You have not closed your H1 tag correctly (it opens with <h1> but closes with </h2>)!

Try this helpful function.

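<?php
// Return every substring that starts at $openTag and ends at the next $closeTag.
// With $excluding = true, the tags themselves are left out of the results.
function parseArray($string, $openTag, $closeTag, $excluding = false) {
	// preg_quote() stops characters in the tags from being read as regex syntax.
	$pattern = '~' . preg_quote($openTag, '~') . '(.*?)' . preg_quote($closeTag, '~') . '~si';
	preg_match_all($pattern, $string, $matches);
	return $excluding ? $matches[1] : $matches[0];
}

$string = "<h1>The Work of an Idiot</h1>
<p>Edited by A Total Moron</p>
<h2>Chapter 1</h2>
<p>Here is some random text</p>
<h2>Chapter 2 - The Wrath of Khan's Mum</h2>
<p>Here is some more random text</p>
<h2>Chapter 3</h2>
<p>Again.. i can ramble for ages</p>";

// Note: each match stops at the first </p> after its heading.
$array = parseArray($string, "<h1>", "</p>");
$array = array_merge($array, parseArray($string, "<h2>", "</p>"));

// Escape the output so the browser shows the tags instead of rendering them.
print "<pre>";
print htmlspecialchars(print_r($array, true));
print "</pre>";
?>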
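
If you want the [text]/[title] layout from the original question rather than a flat list, one option is to split just before every <h2> and then pull the heading text out of each chunk. Here's a minimal sketch, not a definitive implementation: splitChapters() is a made-up name, and it assumes every chapter starts with an <h2> and that anything before the first <h2> is front matter with an empty title.

```php
<?php
// Hypothetical helper (not from the thread): one array entry per chapter,
// each with the chapter's HTML in 'text' and its heading in 'title'.
function splitChapters(string $html): array
{
    // Zero-width lookahead: cut just before each <h2 so the heading
    // stays attached to the chapter it introduces.
    $chunks = preg_split('~(?=<h2\b)~i', $html, -1, PREG_SPLIT_NO_EMPTY);

    $chapters = [];
    foreach ($chunks as $chunk) {
        $title = '';
        // The title is the content of the chunk's <h2>...</h2>, if it has one.
        if (preg_match('~<h2\b[^>]*>(.*?)</h2>~is', $chunk, $m)) {
            $title = trim(strip_tags($m[1]));
        }
        $chapters[] = ['text' => trim($chunk), 'title' => $title];
    }
    return $chapters;
}

// Usage with the sample document from the question.
$html = implode("\n", [
    "<h1>The Work of an Idiot</h1>",
    "<p>Edited by A Total Moron</p>",
    "<h2>Chapter 1</h2>",
    "<p>Here is some random text</p>",
    "<h2>Chapter 2 - The Wrath of Khan's Mum</h2>",
    "<p>Here is some more random text</p>",
    "<h2>Chapter 3</h2>",
    "<p>Again.. i can ramble for ages</p>",
]);

$chapters = splitChapters($html);
print_r(array_column($chapters, 'title'));
```

Because the split keys on the <h2> tag and not on the word "chapter", a sentence like "as mentioned in chapter 2" inside a paragraph should not start a new chunk, and a chapter can hold more than one paragraph. For messier real-world HTML, a proper parser such as PHP's DOMDocument is probably a safer bet than regular expressions.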
