
Parsing Text - Capture text between two tags


Rottingham


This is probably a very basic question, but here goes.

 

I want to have text in files, such as

 

{body}

{/body}

 

and then be able to open that file and grab all of the data in between those tags. I could open the file and read through it until I see the {body} tag, then store data until I reach {/body}, but that seems like a clumsy way to do it.

 

Any suggestions? Thanks!
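One way to avoid walking the file line by line is to read the whole file into a string and search it. The sketch below uses strpos and substr rather than a regular expression. The helper name text_between is hypothetical, and it assumes each tag appears once, in order, and that the file fits comfortably in memory.

```php
<?php
// Hypothetical helper: return the text between the first $open tag and the next $close tag.
// Returns "" if either tag is missing. Assumes each tag appears once, in order.
function text_between($input, $open, $close)
{
    $start = strpos($input, $open);
    if ($start === false) {
        return "";
    }
    $start += strlen($open); // skip past the opening tag itself
    $end = strpos($input, $close, $start);
    if ($end === false) {
        return "";
    }
    return substr($input, $start, $end - $start);
}

// In practice $input would come from file_get_contents("template.txt") (file name hypothetical)
$input = "header\n{body}\nHello world\n{/body}\nfooter\n";
echo text_between($input, "{body}", "{/body}");
```

For one tag pair this is enough; a regular expression becomes more convenient once the tag name varies or you want case-insensitive matching.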

Thank you! That was very helpful and fully functional!

 

Now I want to advance that. I'm not very good at the regular expression stuff yet, so thanks for helping.

 

Let's say my file includes...

<html>
<head><title>Test</title></head>
<body>
Some content
</body>
</html>

{row}
<tr>
</tr>
{/row}

 

Now I have two functions...

 

function get_template($file) {}

 

Simply loads the template file, and

 

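function get_template_tag($input, $tag)
{
      // $input is the template text (it was undefined inside the original function)
      // s: "." also matches newlines; i: case-insensitive; +? stops at the first closing tag
      $tag = preg_quote($tag, '@');
      if (preg_match('@\{' . $tag . '\}(.+?)\{/' . $tag . '\}@is', $input, $matches)) {
            return $matches[1];
      }
      return "";
}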

 

get_template_tag works great, but I want the get_template function to read in only the part of the file before the first tag. In other words, I would capture EVERYTHING up to the first { character. Can you point me in the right direction for this as well?
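A minimal sketch for that second part, assuming the first { in the file always starts the first tag (inline CSS or JavaScript in the HTML part would contain braces and break this). The helper name get_template_head is hypothetical; the same logic could sit inside get_template after reading the file with file_get_contents(). Both a plain-string version and a regex version are shown.

```php
<?php
// Hypothetical helper: return everything before the first "{" in the template.
// Assumes the HTML part contains no literal "{" (inline CSS/JS would break this).
function get_template_head($input)
{
    $pos = strpos($input, '{');
    // No tags at all: the whole template is returned
    return $pos === false ? $input : substr($input, 0, $pos);
}

// Regex version: [^{]* means "any run of characters that are not {"
// (newlines included), anchored with ^ to the start of the string.
// It always matches, possibly with an empty string.
function get_template_head_regex($input)
{
    preg_match('/^[^{]*/', $input, $m);
    return $m[0];
}

$template = "<html>\n<body>\nSome content\n</body>\n</html>\n\n{row}\n<tr>\n</tr>\n{/row}\n";
echo get_template_head($template);
```

The strpos version is cheaper and easier to read; the regex version is handy if you later want to stop at something more specific than a single {, such as a { followed by a word character.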

3 years later...

It sounds to me as though you need XPath, which PHP supports through SimpleXML and DOMXPath!

 

I don't know how familiar you are with XML, but I'll try to cover the basics for you:

 

XML is more of a format than anything else. For instance, instant messenger programs such as Google Talk use the Jabber (XMPP) protocol, which exchanges its messages as XML. Because tags are nested inside each other, there is also a parent/child relationship between each entry/tag. HTML looks very similar (XHTML is HTML written to XML rules), but setting HTML aside, an XML document can look like anything you want, for example:

 

<?xml version="1.0"?>

<whateveryouwant>

<tag1>
<tag2>Some info</tag2>
<another-tag attribute="xxx">Other information</another-tag>
</tag1>

<tag name="tag2">
<tag2>Some info</tag2>
<another-tag attribute="xxx">Other information</another-tag>
</tag>

</whateveryouwant>

 

You can see how an XML document can be used to transfer nearly any data in an organized manner. Note that ordinary HTML is not strictly XML (it allows unclosed tags such as <br>), but XHTML is HTML written to XML's stricter rules, so XML tools can read it.

 

Now let's get to the heart of the matter:

 

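XPath uses path expressions to identify specific pieces of information within an XML document. For picking out nested elements it is usually much simpler than a regular expression, because you describe where the element sits in the tree instead of matching raw text, so you should be able to adapt it to your needs easily!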
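To give a feel for the syntax, here is a small sketch that runs a few XPath expressions against the example document above using PHP's SimpleXML extension. The particular expressions are illustrative only; the XML declaration is optional and is left out of the embedded copy.

```php
<?php
// Sketch: a few XPath expressions against the example XML document.
$xml = new SimpleXMLElement(<<<'XML'
<whateveryouwant>
<tag1>
<tag2>Some info</tag2>
<another-tag attribute="xxx">Other information</another-tag>
</tag1>
<tag name="tag2">
<tag2>Some info</tag2>
<another-tag attribute="xxx">Other information</another-tag>
</tag>
</whateveryouwant>
XML
);

// Absolute path: the <tag2> inside <tag1>
$first = $xml->xpath('/whateveryouwant/tag1/tag2');
echo 'tag1/tag2: ', $first[0], "\n";

// Filter by attribute: the <tag> whose name attribute is "tag2", then its child
$named = $xml->xpath('//tag[@name="tag2"]/another-tag');
echo 'tag[@name="tag2"]/another-tag: ', $named[0], "\n";

// Descendant search: every <another-tag> with attribute="xxx", anywhere in the tree
foreach ($xml->xpath('//another-tag[@attribute="xxx"]') as $node) {
    echo 'another-tag: ', $node, "\n";
}
```

xpath() always returns an array of matches (or false on an invalid expression), which is why the single-result queries index [0].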

 

 

So here is what I suggest:

If you can, put your text files in a well-formed XML format (XHTML also qualifies). Use PHP to open the file and load the content into a variable.

 

Here is my working example; I tested it too:

 

xpath-test.php

<?php

$filename = "txt-file.txt"; // XML-formatted text file

// Read the whole file into $string
$string = file_get_contents($filename);
if ($string === false) {
    die("Can't open file");
}

// Parse the string so it can be queried with XPath
$xml = new SimpleXMLElement($string);

// Specify your XPath query / expression: the root <body> element
$result = $xml->xpath('/body');

// Loop through each node XPath has returned
// (foreach replaces the old while/list/each idiom; each() was removed in PHP 8)
foreach ($result as $node) {
    echo '/body: ', $node, "\n";
}

?>

 

 

txt-file.txt

<?xml version="1.0"?>

<body>
This line of information will be pulled because it is in between the body tags!
</body>

 

The script should output the following:

/body: 
This line of information will be pulled because it is in between the body tags!
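The example above needs the file to be well-formed XML; SimpleXMLElement throws an exception on ordinary HTML that, for instance, leaves a <br> unclosed. If your templates are plain HTML, a sketch using PHP's DOM extension instead (DOMDocument::loadHTML plus DOMXPath, a swap from the SimpleXML approach above) might look like this. The sample markup is made up for illustration.

```php
<?php
// Sketch: run an XPath query over ordinary HTML with PHP's DOM extension.
// Unlike SimpleXMLElement, DOMDocument::loadHTML accepts HTML that is not well-formed XML.
$html = "<html><head><title>Test</title></head><body><p>Some content<br></p></body></html>";

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//body') as $node) {
    // textContent is the concatenated text of the node and all its descendants
    echo '/body: ', $node->textContent, "\n";
}
```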

 

Good luck!

 

Find more information on XPath formatting:

 

My original blog post on this:

http://iluvjohn.com/knowledge-database/computers/general-cross-platform/web-internet/php-mysql-curl/xpath/php-xpath-xml-formatting-722/

 

The W3Schools XPath tutorial:

http://www.w3schools.com/xpath/default.asp

 

Wikipedia:

http://en.wikipedia.org/wiki/XPath

 

The W3C XPath specification:

http://www.w3.org/TR/xpath/

 
