Parsing Text - Capture text between two tags

Rottingham · March 10, 2007

This is probably a very basic question, but here I go.

I want to have text in files, such as

{body}

{/body}

and then be able to open that file and grab all of the data in between those tags. I could open the file and go through it until I see the {body} tag and then store data until I reach {/body}, but that is a bad way to do it.

Any suggestions? Thanks!

Glyde · March 10, 2007

preg_match("@\{body\}(.+?)\{/body\}@is", $input, $matches);
$bodyText = isset($matches[1]) ? $matches[1] : "";

Just make sure {/body} doesn't actually appear anywhere in the body text, because the lazy (.+?) stops at the first {/body} it finds.
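For comparison, here is a minimal sketch of the "scan until you see the tag" approach done with strpos() and substr() instead of a manual loop. The helper name text_between() is hypothetical, not a PHP built-in. Like the regex, it stops at the first closing marker after the opening one, so the same caveat about {/body} applies.

```php
<?php
// Hypothetical helper: returns the text between the first $open marker and
// the next $close marker after it, or "" if either marker is missing.
function text_between(string $input, string $open, string $close): string
{
    $start = strpos($input, $open);
    if ($start === false) {
        return "";
    }
    $start += strlen($open);               // skip past the opening marker
    $end = strpos($input, $close, $start); // first closing marker after it
    if ($end === false) {
        return "";
    }
    return substr($input, $start, $end - $start);
}

$input = "header {body}Hello, world{/body} footer";
echo text_between($input, "{body}", "{/body}"), "\n"; // prints "Hello, world"
```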

Rottingham · March 10, 2007

Thank you! That was very helpful and works perfectly!

Now I want to take it a step further. I'm not very good at regular expressions yet, so thanks for helping.

Let's say my file includes...

<html>
<head><title>Test</title></head>
<body>
Some content
</body>
</html>

{row}
<tr>
</tr>
{/row}

Now I have two functions...

function get_template($file) {}

which simply loads the template file, and

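function get_template_tag($input, $tag)
{
      // $input (the template text) must be passed in: a function
      // cannot see variables defined outside its own scope
      $tag = preg_quote($tag, '@');
      preg_match('@\{' . $tag . '\}(.+?)\{/' . $tag . '\}@is', $input, $matches);
      return isset($matches[1]) ? $matches[1] : "";
}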
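Building on get_template_tag(), a template with several blocks could be split in one pass with preg_match_all() and a backreference (\1), so each closing tag must repeat the name of its opening tag. This is a sketch: get_template_tags() is a hypothetical name, and it assumes tag names are word characters, blocks are not nested, and each name appears only once (a repeated name overwrites the earlier one).

```php
<?php
// Hypothetical helper: collects every {name}...{/name} block in $input into
// an array keyed by tag name. \1 makes the closing tag repeat the captured name.
function get_template_tags(string $input): array
{
    $tags = [];
    preg_match_all('@\{(\w+)\}(.*?)\{/\1\}@s', $input, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $tags[$m[1]] = $m[2]; // later blocks with the same name overwrite earlier ones
    }
    return $tags;
}

$template = "<html><body>Some content</body></html>\n{row}<tr></tr>{/row}";
print_r(get_template_tags($template));
```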

get_template_tag() works great, but I want get_template() to read in only the part of the file before the first tag it finds. In other words, I would match EVERYTHING up to the first {. Can you point me in the right direction for this as well?

Glyde · March 11, 2007

Depends on what you want.

This will get everything up to the first tag:

preg_match("@(.*?)\{[^\}]+\}@is", $input, $matches);
$bodyText = isset($matches[1]) ? $matches[1] : "";
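If a regular expression is not required, strstr() with its third argument set to true returns everything before the first occurrence of a string (available since PHP 5.3). A minimal sketch with a hypothetical helper name; like the regex above, it stops at any "{", so a literal brace earlier in the page (for example in inline CSS or JavaScript) would cut the result short.

```php
<?php
// Hypothetical helper: returns everything before the first "{" in $input.
// strstr($haystack, $needle, true) returns the part before the needle,
// or false when the needle does not occur.
function text_before_first_tag(string $input): string
{
    $before = strstr($input, '{', true);
    // Design choice (assumption): with no tag at all, return the whole input.
    return $before === false ? $input : $before;
}

$template = "<html><body>Some content</body></html>\n{row}<tr></tr>{/row}";
echo text_before_first_tag($template);
```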

cjohnweb · July 24, 2010

It sounds to me as though you need PHP XPath!

I don't know how familiar you are with XML, but I'll try to cover the basics for you:

XML is a general-purpose markup format rather than a fixed set of tags. For instance, instant messaging programs such as Google Talk use the Jabber (XMPP) protocol, which exchanges its messages as XML. Because tags are nested inside each other, there is also a parent/child relationship between the entries/tags. An XHTML document, for instance, is an XML document that uses the HTML tag set (ordinary HTML is not always well-formed XML). Apart from HTML, though, an XML document can use whatever tags you want, for example:

<?xml version="1.0"?>

<whateveryouwant>

<tag1>
<tag2>Some info</tag2>
<another-tag attribute="xxx">Other information</another-tag>
</tag1>

<tag name="tag2">
<tag2>Some info</tag2>
<another-tag attribute="xxx">Other information</another-tag>
</tag>

</whateveryouwant>

You can see how an XML document can be used to transfer nearly any data in an organized manner. HTML written to these rules (XHTML) is itself an XML document, just one that follows the HTML specifications.

Now let's get to the heart of the matter:

XPath uses path expressions to select specific parts of an XML document (elements, attributes, text) by their position in the tree. For pulling structured data out of a document, an XPath expression is often easier to read and adapt to your needs than an equivalent regular expression.
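To show a few XPath expressions in action, this sketch loads the sample document above (indentation added) with SimpleXMLElement and queries it with xpath(). //tag2 finds every tag2 element at any depth, a predicate such as [@name="tag2"] filters by attribute value, and /@attribute selects attribute values.

```php
<?php
$doc = <<<XML
<?xml version="1.0"?>
<whateveryouwant>
  <tag1>
    <tag2>Some info</tag2>
    <another-tag attribute="xxx">Other information</another-tag>
  </tag1>
  <tag name="tag2">
    <tag2>Some info</tag2>
    <another-tag attribute="xxx">Other information</another-tag>
  </tag>
</whateveryouwant>
XML;

$xml = new SimpleXMLElement($doc);

// Every <tag2> element anywhere in the document
$all = $xml->xpath('//tag2');
echo count($all), "\n"; // 2

// The <another-tag> child of the <tag> whose name attribute is "tag2"
$hit = $xml->xpath('/whateveryouwant/tag[@name="tag2"]/another-tag');
echo (string) $hit[0], "\n"; // Other information

// The value of every "attribute" attribute
foreach ($xml->xpath('//another-tag/@attribute') as $attr) {
    echo (string) $attr, "\n"; // xxx (printed twice)
}
```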

So here is what I suggest:

If you can, store your text files in XML format (well-formed XHTML also works). Use PHP to open the file and load its contents into a variable.

Here is my working example, which I have tested:

xpath-test.php

<?php

$filename = "txt-file.txt"; // XML-formatted text file

// Load the file's contents into $string
$string = file_get_contents($filename);
if ($string === false) {
    die("Can't open file");
}

// Parse the XML so it can be queried with XPath
$xml = new SimpleXMLElement($string);

// Specify your XPath query / expression
$result = $xml->xpath('/body');

// Loop through each node XPath has returned
foreach ($result as $node) {
    echo '/body: ', $node, "\n";
}

?>

txt-file.txt

<?xml version="1.0"?>

<body>
This line of information will be pulled because it is in between the body tags!
</body>

The script should output the following:

/body: 
This line of information will be pulled because it is in between the body tags!
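One caveat: SimpleXMLElement only accepts well-formed XML, and many HTML pages (for example ones with unclosed <br> tags) are not. A sketch of an alternative, assuming the DOM extension is enabled: DOMDocument::loadHTML() parses HTML leniently, and DOMXPath runs the same kind of query on the result.

```php
<?php
// <br> is left unclosed: valid HTML, but not well-formed XML
$html = "<html><head><title>Test</title></head><body>Some content<br></body></html>";

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Query the parsed tree with XPath, as in the SimpleXML example
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//body');
echo trim($nodes->item(0)->textContent), "\n"; // Some content
```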

Good luck!

Find more information on XPath syntax:

My original blog post on this:

http://iluvjohn.com/knowledge-database/computers/general-cross-platform/web-internet/php-mysql-curl/xpath/php-xpath-xml-formatting-722/

The W3Schools XPath tutorial:

http://www.w3schools.com/xpath/default.asp

Wikipedia:

http://en.wikipedia.org/wiki/XPath

The W3C XPath specification:

http://www.w3.org/TR/xpath/
