Retrieve data that's between tags


7 replies to this topic

#1 nick1

nick1
  • Members
  • Advanced Member
  • 41 posts

Posted 05 September 2006 - 09:27 PM

Greetings,

To start, let's say you have a script that returns a variable containing a string that is never
the same length and never holds the same information.  It could look something like this:

$results = <td class="name1">some data1</td><td class="name1">more data</td><td class="name1">12345</td><td class="name1">4/9/2006</td><td class="name1">You get the picture</td>

My question is this:
How do I retrieve only what is between the <td></td> tags?

For example:
<td class="name1">some data1</td>
$a = some data1

I only want the information that is between the <td></td> tags, nothing else.
I would probably want to place each piece of data between the <td></td> tags into an array
so that I could later take each key => value pair and write the value to a database, into its proper column.

Many thanks in advance,

*Nick*

#2 Ninjakreborn

Ninjakreborn
  • Members
  • Information Technology Specialist
  • 3,922 posts
  • Age:33

Posted 05 September 2006 - 09:30 PM

This is something I have been fighting with for a few months: data harvesting. I have a friend who creates bots for a living, and he got me into it. I am trying to figure out how to get certain tags that have specific ids or attributes, or just standard tags. The only way, he said, is to learn regular expressions. Once you learn how to find and replace data with regular expressions, all of that will come naturally, and you could write up something to check for that on a page.

------

Business Website: http://www.infotechnologist.biz

Personal Website: http://www.joyelpuryear.com

Blog Site: http://www.realmofwriting.com
Services: Web development, application development, mobile development, and custom development. All services listed on my website.


#3 ronverdonk

ronverdonk
  • Members
  • Advanced Member
  • 277 posts
  • LocationNetherlands

Posted 05 September 2006 - 09:30 PM

I am no specialist on this, but I think you need a regular expression to accomplish this. There must be some specialists watching this forum.

Ronald   8)
RTFM is an almost extinct art form, it should be subsidized.

#4 roopurt18

roopurt18
  • Staff Alumni
  • Advanced Member
  • 3,749 posts
  • LocationCalifornia, southern

Posted 05 September 2006 - 09:33 PM

Yes, you will need regular expressions.

http://www.regular-expressions.info/
PHP Forms : Part I | Part II

JavaScript: Singleton

http://www.rbredlau.com

#5 radar

radar
  • Members
  • Advanced Member
  • 645 posts
  • LocationSLC

Posted 05 September 2006 - 09:39 PM

Why couldn't you just use something like this...

$pieces = explode("</td>", $results);

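<?php
// Function to get the substring between two other substrings.
// Returns false if either $start or $end cannot be found.

function substring_between($haystack, $start, $end) {
   $start_position = strpos($haystack, $start);
   if ($start_position === false) {
       return false;
   }
   $start_position += strlen($start);
   // Search for $end only after $start, so an earlier occurrence of $end is ignored
   $end_position = strpos($haystack, $end, $start_position);
   if ($end_position === false) {
       return false;
   }
   return substr($haystack, $start_position, $end_position - $start_position);
}

// Use of this function to get the title of an HTML document.
// $filename must already hold the path to the HTML file.

$handle = fopen($filename, 'r');
$contents = fread($handle, filesize($filename));
fclose($handle);

// Search the raw markup: running htmlspecialchars() first would turn
// '<title>' into '&lt;title&gt;' and the search would never match.
$title = substring_between($contents, '<title>', '</title>');

?>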

Now of course that code would need to be altered a little bit, but that is the basic gist of what you are trying to accomplish.
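
For the <td> case from the original question, the explode() idea above could be sketched roughly like this. It is only a sketch: the helper name cells_by_explode is made up for illustration, and it assumes the cells are not nested and that the only markup left in each piece is the opening <td ...> tag.

```php
<?php
// Sample input copied from the original question.
$results = '<td class="name1">some data1</td><td class="name1">more data</td>'
         . '<td class="name1">12345</td><td class="name1">4/9/2006</td>'
         . '<td class="name1">You get the picture</td>';

// Hypothetical helper: split on the closing tag, then strip the opening tag
// that is left at the front of each piece.
function cells_by_explode($html) {
    $cells = array();
    foreach (explode('</td>', $html) as $piece) {
        // strip_tags() removes the leftover '<td class="name1">'
        $text = trim(strip_tags($piece));
        if ($text !== '') {   // the piece after the last '</td>' is empty
            $cells[] = $text;
        }
    }
    return $cells;
}

$cells = cells_by_explode($results);
print_r($cells);
```

Each array element is then one cell's text, in the order the cells appeared.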

#6 ronverdonk

ronverdonk
  • Members
  • Advanced Member
  • 277 posts
  • LocationNetherlands

Posted 05 September 2006 - 09:46 PM

Just found a snippet on regex.com for you:
$m=array();
$pattern = "</?(\w+)(\s+\w+=(\w+|\"[^\"]*\"|\'[^\']*\'))*>";
$text = "<td xxxxx>ABCD</td>";
preg_match_all($pattern, $text, $m);
echo '<pre>'; print_r($m);

Prints:
Array
(
    [0] => Array
        (
            [0] => td
            [1] => xxxxx
            [2] => ABCD
            [3] => /td
        )

Ronald  :cool:
RTFM is an almost extinct art form, it should be subsidized.

#7 effigy

effigy
  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • LocationIL

Posted 05 September 2006 - 09:52 PM

<pre>
<?php
	$results = <<<STR
	<td class="name1">some data1</td><td class="name1">more data</td><td class="name1">12345</td><td class="name1">4/9/2006</td><td class="name1">You get the picture</td>
STR;

	preg_match_all('%<td.*?>(.+?)</td>%', $results, $matches);
	array_shift($matches);
	print_r($matches);
?>
</pre>
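
A possible follow-up to the pattern above, as a sketch only: with the default flags, preg_match_all() puts the whole matches in $matches[0] and the text captured by the first group in $matches[1], so the cell contents can be read from $matches[1] directly and paired with column names for the database write. The column names below are invented for illustration (the question never names them), and the pattern uses [^>]* instead of .*? for the attributes, which is a small variation rather than the code posted above.

```php
<?php
// Sample input copied from the original question.
$results = '<td class="name1">some data1</td><td class="name1">more data</td>'
         . '<td class="name1">12345</td><td class="name1">4/9/2006</td>'
         . '<td class="name1">You get the picture</td>';

// $matches[0] = whole matches, $matches[1] = text captured by (.+?)
preg_match_all('%<td\b[^>]*>(.+?)</td>%s', $results, $matches);
$values = $matches[1];

// Hypothetical column names, one per cell.
$columns = array('label', 'note', 'number', 'date', 'comment');

// Only combine when the counts line up, so a short or long row is not mislabeled.
if (count($columns) === count($values)) {
    $row = array_combine($columns, $values);
    print_r($row);
}
```

$row is then a column => value array, ready to be handed to whatever insert routine you use.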

Regexp | Unicode Article | Letter Database
/\A(e)?((1)?ff(?:(?:ig)?y)?|f(?:ig)?)\z/

#8 ronverdonk

ronverdonk
  • Members
  • Advanced Member
  • 277 posts
  • LocationNetherlands

Posted 05 September 2006 - 10:48 PM

The snippet I posted was also meant to be used for tags other than the <td></td> ones.
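
If the goal is one routine that works for any tag name, something along these lines might do. This is a rough sketch, not the regex.com snippet: text_between_tags is a made-up helper name, and a regular expression like this does not cope with a tag nested inside another tag of the same name.

```php
<?php
// Hypothetical helper: collect the text between every <$tag ...> and </$tag> pair.
function text_between_tags($html, $tag) {
    // Escape the tag name, including the '%' delimiter, before building the pattern
    $name = preg_quote($tag, '%');
    // \b keeps 'td' from matching a longer tag name; 'i' ignores case, 's' lets . span newlines
    $pattern = '%<' . $name . '\b[^>]*>(.*?)</' . $name . '>%is';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$html = '<tr><td class="name1">some data1</td><th>Header</th><td>12345</td></tr>';
print_r(text_between_tags($html, 'td'));
print_r(text_between_tags($html, 'th'));
```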

Ronald  8)
RTFM is an almost extinct art form, it should be subsidized.



