Archived

This topic is now archived and is closed to further replies.

nick1

Retrieve data that's between tags

Greetings,

To start, let's say you have a script that returns a variable containing a string that is never
the same length and never contains the same information.  It could look something like this:

[code]$results = '<td class="name1">some data1</td><td class="name1">more data</td><td class="name1">12345</td><td class="name1">4/9/2006</td><td class="name1">You get the picture</td>';[/code]

My question is this:
How do I retrieve only what is between the <td></td> tags?

For example:
[code]<td class="name1">some data1</td>
$a = some data1[/code]

I only want the information that is between the <td></td> tags, nothing else.
I would probably want to place each piece of data between the <td></td> tags into an array
so that I could later take each key => value pair and write the value to a database, into its proper column.

Many thanks in advance,

*Nick*

This is something I have been fighting with for a few months: data harvesting. I have a friend who creates bots for a living, and he got me into it. I am trying to figure out how to get certain tags that have specific ids or attributes, or just standard tags. He said the only way is to learn regular expressions. Once you learn how to find and retrieve data with regular expressions, all of that will come naturally. You have to learn how to find data and how to replace data with regular expressions, and then you could write up something to check for that on a page.

I am no specialist on this, but I think you need a regular expression to accomplish this. There must be some specialists watching this forum.

Ronald   8)

Yes, you will need regular expressions.

[url=http://www.regular-expressions.info/]http://www.regular-expressions.info/[/url]

Why couldn't you just use something like this...

$pieces = explode("</td>", $results);

[code]
<?php
// Return the substring between the first $start and the next $end,
// or false if either marker is missing.
function substring_between($haystack, $start, $end) {
    $start_position = strpos($haystack, $start);
    if ($start_position === false) {
        return false;
    }
    $start_position += strlen($start);
    // Look for $end only after $start, so an earlier occurrence is ignored.
    $end_position = strpos($haystack, $end, $start_position);
    if ($end_position === false) {
        return false;
    }
    return substr($haystack, $start_position, $end_position - $start_position);
}

// Example use: get the title of an HTML document.
$filename = 'page.html'; // hypothetical path, not from the original post
$contents = file_get_contents($filename);

// Match against the raw markup: htmlspecialchars() would turn "<title>"
// into "&lt;title&gt;" and the search would fail.
$title = substring_between($contents, '<title>', '</title>');

?>
[/code]

Now of course that code would need to be altered a little bit, but that is the basic gist of what you are trying to accomplish.
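
For what it's worth, here is a minimal sketch of the explode() idea applied to the $results string from the first post. It assumes the input is a flat run of <td> cells, and it uses strip_tags() to drop the opening <td ...> left on each piece. The variable name $cells is made up for illustration.

[code]
<?php
// Sample input copied from the first post.
$results = '<td class="name1">some data1</td><td class="name1">more data</td>'
         . '<td class="name1">12345</td><td class="name1">4/9/2006</td>'
         . '<td class="name1">You get the picture</td>';

$cells = array();
foreach (explode('</td>', $results) as $piece) {
    // Each piece still starts with its opening <td ...> tag; strip it.
    $text = trim(strip_tags($piece));
    if ($text !== '') { // skip the empty piece after the final </td>
        $cells[] = $text;
    }
}

print_r($cells);
?>
[/code]

Note that this also skips cells that are genuinely empty, which may not be what you want if the values have to line up with database columns.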

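Just found a snippet on regex.com for you:
[code]$m=array();
// No explicit delimiters are given, so PHP treats the outer < and > as the
// pattern delimiters. The pattern that actually runs therefore matches tag
// names and bare words, not whole tags.
$pattern = "</?(\w+)(\s+\w+=(\w+|\"[^\"]*\"|\'[^\']*\'))*>";
$text = "<td xxxxx>ABCD</td>";
preg_match_all($pattern, $text, $m);
echo '<pre>'; print_r($m);[/code]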

Prints:
[code]Array
(
    [0] => Array
        (
            [0] => td
            [1] => xxxxx
            [2] => ABCD
            [3] => /td
        )
[/code]

Ronald  :cool:

[code]
<pre>
<?php
$results = <<<STR
<td class="name1">some data1</td><td class="name1">more data</td><td class="name1">12345</td><td class="name1">4/9/2006</td><td class="name1">You get the picture</td>
STR;

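// Capture the text between each <td ...> and </td>; the lazy quantifiers
// stop at the first closing tag instead of running to the last one.
preg_match_all('%<td.*?>(.+?)</td>%', $results, $matches);
// Drop $matches[0] (the full matches) so only the captured cell contents remain.
array_shift($matches);
print_r($matches);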
?>
</pre>
[/code]
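
To get from the matches to something that can be written to a database, one option is to pair the captured values with column names. This is only a sketch: the column names below are hypothetical placeholders, and the pattern is a slightly stricter variant of the one above (<td\b[^>]*> instead of <td.*?>) that also allows empty cells.

[code]
<?php
// Sample input copied from the first post.
$results = '<td class="name1">some data1</td><td class="name1">more data</td>'
         . '<td class="name1">12345</td><td class="name1">4/9/2006</td>'
         . '<td class="name1">You get the picture</td>';

// Hypothetical column names; replace them with the real ones from your table.
$columns = array('label', 'detail', 'number', 'date', 'note');

preg_match_all('%<td\b[^>]*>(.*?)</td>%s', $results, $matches);
$values = $matches[1]; // index 1 holds the contents of the first capture group

// Only combine when the counts agree, so a malformed row is not silently shifted.
$row = (count($values) === count($columns))
    ? array_combine($columns, $values)
    : array();

print_r($row);
?>
[/code]

Each key in $row is then a column name and each value is the text from the matching cell, ready to be passed to whatever insert routine you use.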

The snippet I posted was also meant to be used for tags other than the <td></td> ones.

Ronald  8)
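
If the goal is to handle tags other than <td>, a small helper that takes the tag name as a parameter might look like the sketch below. The function name text_between_tags is hypothetical, and the approach assumes a tag is never nested inside another tag of the same name. For messy real-world HTML, a proper parser is probably safer than a regular expression.

[code]
<?php
// Hypothetical helper: returns the text between every <tag ...> and </tag>.
function text_between_tags($html, $tag) {
    // Escape the tag name, including the % used as the pattern delimiter.
    $name = preg_quote($tag, '%');
    // i = case-insensitive tag names, s = let . match newlines inside a cell.
    $pattern = '%<' . $name . '\b[^>]*>(.*?)</' . $name . '\s*>%is';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

print_r(text_between_tags('<td xxxxx>ABCD</td>', 'td'));
print_r(text_between_tags('<title>My page</title>', 'title'));
?>
[/code]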
