
Help extracting something.


russelburgraymond


I have a page whose source code looks similar to this:

 

<div class="middle">				
	<div id="displayimage">
		<a href="http://aserver.com/347398r"><img src="http://aserver.com/images/no_pic.gif" alt="" /></a>
	</div>

 

Of course, this sits inside a page that is actually around 110 KB and is crammed with image links, JavaScript, etc. What I want to do is load the page remotely and extract that image link.

 

I tried fopen and several other functions, but if the file does not exist they throw an error. I then tried preg_match_quote to extract this info, but that did not work. While the div is always the same, the image will change on every page.
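On the "throws an error if the file does not exist" part, here is a minimal sketch of one way to load a page without the warning reaching the output. The helper name fetch_page() is made up for illustration, and the demo uses local paths so it runs without network access. For a real URL, allow_url_fopen would also need to be enabled.

```php
<?php
// Hypothetical helper: returns the page body, or null if it could not be loaded.
function fetch_page(string $url): ?string
{
    // The @ operator suppresses the warning file_get_contents() raises when
    // the target is missing; the false return value is checked instead.
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// A path that does not exist yields null instead of a visible warning.
var_dump(fetch_page(__DIR__ . '/no_such_page_for_demo.html')); // NULL

// A readable target yields its contents.
$tmp = tempnam(sys_get_temp_dir(), 'pg');
file_put_contents($tmp, '<div id="displayimage"></div>');
echo fetch_page($tmp), "\n";
unlink($tmp);
```

The caller can then test for null and show a placeholder image rather than an error.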

 

This is for a script where someone can add their MySpace ID, and it will fetch their profile image and show it on my page for them. Any help would be greatly appreciated.


Basically, I want to echo a variable and have

http://aserver.com/images/no_pic.gif

display. Of course, that image will be different.

 

If I can just get this

<a href="http://aserver.com/347398r"><img src="http://aserver.com/images/no_pic.gif" alt="" /></a>

 

to display I will be happy.

Regular expressions:

 

<?php
//ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; da; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10');
//$url = 'http://example.com/';
//$html = file_get_contents($url);
$html = '   <div class="middle">            
      <div id="displayimage">
         <a href="http://aserver.com/347398r"><img src="http://aserver.com/images/no_pic.gif" alt="" /></a>
      </div>';
preg_match('~<div id="displayimage">\s*<a[^>]+><img src="([^"]+)~i', $html, $matches);
$link = $matches[1];
?>

Just uncomment the first three lines, insert the real URL and remove the other $html. Then you should be good.

 

Didn't work. This is what I got.

 

function add($a)
{
ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; da; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10');
$html = file_get_contents($a);
preg_match('~<div id="displayimage">\s*<a[^>]+><img src="([^"]+)~i', $html, $matches);
$link = $matches[1];
echo "$link";
die();
}

All it returns is an empty page.
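An empty page here most likely means one of two things: the fetch failed, or the pattern did not match the real markup. In both cases $matches[1] is never set, so echo prints nothing. A rough sketch that separates the two cases follows. The function name find_image_src() and the $variant markup are invented for illustration and are not taken from the real page.

```php
<?php
// Hypothetical helper: reports why no link came back instead of printing nothing.
function find_image_src($html): string
{
    if ($html === false) {
        return 'fetch failed: file_get_contents() returned false';
    }
    $found = preg_match('~<div id="displayimage">\s*<a[^>]+><img src="([^"]+)~i', $html, $matches);
    if ($found !== 1) {
        return 'no match: the fetched markup differs from the sample';
    }
    return $matches[1];
}

// Same structure as the sample, but with an extra attribute before src.
$variant = '<div id="displayimage"><a href="http://aserver.com/347398r">'
         . '<img class="pic" src="http://aserver.com/images/no_pic.gif" alt="" /></a></div>';

echo find_image_src($variant), "\n"; // no match: the fetched markup differs from the sample
echo find_image_src(false), "\n";    // fetch failed: file_get_contents() returned false
```

If it turns out to be the "no match" case, dumping a slice of $html around "displayimage" should show how the live markup differs from the sample.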

preg_match('~<div id="displayimage">.*?<img src="([^"]*)~is',$html,$matches);

 

If that doesn't work, try:

 

preg_match('~<div[^>]*id\s?=\s?["\']displayimage["\'][^>]*>.*?<img[^>]*src\s?=\s?["\']([^"\']*)~is',$html,$matches);

 

The second is not as efficient, but it leaves some breathing room for variations in the markup.
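To illustrate what that breathing room buys, here is a small comparison of the two patterns on a made-up variation of the markup (single-quoted attributes, an extra class on the div, and alt placed before src). The $html string is an assumption for demonstration, not the real page.

```php
<?php
$strict  = '~<div id="displayimage">.*?<img src="([^"]*)~is';
$lenient = '~<div[^>]*id\s?=\s?["\']displayimage["\'][^>]*>.*?<img[^>]*src\s?=\s?["\']([^"\']*)~is';

// Hypothetical variation of the sample markup.
$html = "<div class='box' id='displayimage'>\n"
      . "<a href='http://aserver.com/347398r'><img alt='' src='http://aserver.com/images/no_pic.gif' /></a>\n"
      . "</div>";

var_dump(preg_match($strict, $html, $m1));  // int(0)
var_dump(preg_match($lenient, $html, $m2)); // int(1)
echo $m2[1], "\n"; // http://aserver.com/images/no_pic.gif
```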

  • 2 months later...


preg_match('~<div[^>]*id\s?=\s?["\']displayimage["\'][^>]*>.*?<img[^>]*src\s?=\s?["\']([^"\']*)~is',$html,$matches);

 

Can you guys point me to where I can find a breakdown of this? For instance, what does [^'] mean? I can't seem to find anything about it in the PHP manual.

[pre]

 

~<div[^>]*id\s?=\s?["\']displayimage["\'][^>]*>.*?<img[^>]*src\s?=\s?["\']([^"\']*)~is

 

~              start of pattern delimiter

<div          literal match

[^>]*          match 0 or more of anything that is not a >

id            literal match

\s?            match 0 or 1 whitespace character (space, tab, newline, etc.)

=              literal match

\s?            match 0 or 1 whitespace character (space, tab, newline, etc.)

["\']          match a single or double quote (single quote escaped since it is used to wrap the pattern)

displayimage  literal match

["\']          match a single or double quote (single quote escaped since it is used to wrap the pattern)

[^>]*          match 0 or more of anything that is not a >

>              literal match

.*?            non-greedy match of 0 or more of anything

<img          literal match

[^>]*          match 0 or more of anything that is not a >

src            literal match

\s?            match 0 or 1 whitespace character (space, tab, newline, etc.)

=              literal match

\s?            match 0 or 1 whitespace character (space, tab, newline, etc.)

["\']          match a single or double quote (single quote escaped since it is used to wrap the pattern)

(              start of a group/match capture

[^"\']*        match 0 or more of anything that is not a single or double quote

)              end of a group/match capture

~              end of pattern delimiter

i              modifier to make matching case-insensitive

s              modifier to make the dot (.) also match newline characters

[/pre]
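Two of the pieces above are easiest to see by running them. This is only a small sketch on a made-up two-line string: it shows what the s modifier changes for the dot, and what the negated class [^"\'] captures.

```php
<?php
$html = "<div id=\"displayimage\">\n<img src=\"a.gif\">";

// Without s, the dot stops at the newline, so .*? cannot reach the <img> tag.
var_dump(preg_match('~displayimage">.*?<img~', $html));  // int(0)

// With s, the dot also matches newlines.
var_dump(preg_match('~displayimage">.*?<img~s', $html)); // int(1)

// [^"\'] is a negated character class: any one character except " or '.
// Repeated with *, it captures everything up to the closing quote.
preg_match('~src=["\']([^"\']*)~', $html, $m);
echo $m[1], "\n"; // a.gif
```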

