tmhai Posted October 29, 2008

Hello. I'm trying to extract the image src value. The sample data I'm trying to extract the link from is:

<a name="poster" href="/rg/action-box-title/primary-photo/media/rm118396416/tt0811080" title="Speed Racer"><img border="0" alt="Speed Racer" title="Speed Racer" src="http://ia.media-imdb.com/images/M/MV5BMTA5MjgxMDE4OTVeQTJeQWpwZ15BbWU3MDgyNjc4NjE@._V1._SX94_SY140_.jpg" /></a>

The expected output is everything between the quotation marks of the src attribute. The regex needs to search for the <a name="poster" text so that it extracts the correct image link.

This is the code I have so far, which extracts other data from an IMDb page. I'm trying to extract the poster image link as well:

<?php
//url
$imdbcode = $_GET['code'];
$url = 'http://www.imdb.com/title/'.$imdbcode.'/';

//get the page content
$imdb_content = get_data($url);

//parse for product name
$name = get_match('/<title>(.*)<\/title>/isU',$imdb_content);
$director = strip_tags(get_match('/<h5[^>]*>Director:<\/h5>(.*)<\/div>/isU',$imdb_content));
$plot = get_match('/<h5[^>]*>Plot:<\/h5>(.*)<\/div>/isU',$imdb_content);
$release_date = get_match('/<h5[^>]*>Release Date:<\/h5>(.*)<\/div>/isU',$imdb_content);
$mpaa = get_match('/<a href="\/mpaa">MPAA<\/a>:<\/h5>(.*)<\/div>/isU',$imdb_content);
$run_time = get_match('/Runtime:<\/h5>(.*)<\/div>/isU',$imdb_content);

//build content
$content.= '<h2>Film</h2><p>'.$name.'</p>';
$content.= '<h2>Director</h2><p>'.$director.'</p>';
$content.= '<h2>Plot</h2><p>'.substr($plot,0,strpos($plot,'<a')).'</p>';
$content.= '<h2>Release Date</h2><p>'.substr($release_date,0,strpos($release_date,'<a')).'</p>';
$content.= '<h2>MPAA</h2><p>'.$mpaa.'</p>';
$content.= '<h2>Run Time</h2><p>'.$run_time.'</p>';
$content.= '<h2>Full Details</h2><p><a href="'.$url.'" rel="nofollow">'.$url.'</a></p>';
echo $content;

//gets the match content
function get_match($regex,$content) {
    preg_match($regex,$content,$matches);
    return $matches[1];
}

//gets the data from a URL
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
?>

Cheers.

As a side note, I took a stab at it and came up with the following line, which I placed under the "//parse for product name" comment:

$poster = get_match('/<a name="\/poster" [^>]*><img [^> src="(.*)" /></a>/isU',$imdb_content);

I also added this under the first instance of $content:

$content.= '<h2>Poster</h2><p>'.$poster.'</p>';

That yielded the following error:

Warning: preg_match() [function.preg-match]: Unknown modifier '>' in /home/jurud/public_html/imdb.php on line 34

I'm guessing my regex isn't up to scratch.

Link to comment: https://forums.phpfreaks.com/topic/130538-extracting-an-image-src-using-a-hyperlink-as-an-anchor/
DarkWater Posted October 29, 2008

Since you used / as your delimiter, you need to escape every / inside the regex, or use a different delimiter.
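A minimal sketch of the two options DarkWater describes, using only the <img> part of the sample markup from the first post. In tmhai's first attempt the pattern ends at the unescaped / in ` /></a>`, so PHP reads the following > as a modifier, which is what the warning reports. The variable names ($html, $escaped, $hashed) are just for illustration.

```php
<?php
// Sample tag taken from the first post (shortened to the <img> element).
$html = '<img border="0" alt="Speed Racer" title="Speed Racer" src="http://ia.media-imdb.com/images/M/MV5BMTA5MjgxMDE4OTVeQTJeQWpwZ15BbWU3MDgyNjc4NjE@._V1._SX94_SY140_.jpg" /></a>';

// Option 1: keep / as the delimiter and escape each literal / as \/.
preg_match('/src="(.*)" \/><\/a>/isU', $html, $escaped);

// Option 2: switch the delimiter to #, so / needs no escaping at all.
preg_match('#src="(.*)" /></a>#isU', $html, $hashed);

// Both patterns capture the same src value.
echo $escaped[1], "\n";
echo $hashed[1], "\n";
```

Either form works; a non-slash delimiter tends to be easier to read when the pattern contains closing tags.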
ghostdog74 Posted October 29, 2008

Quote (tmhai): "Hello. I'm trying to extract the image src value."

When parsing XML/HTML, it's better to use dedicated classes/methods (if you have them) than to construct a regex from scratch.

$string = '<a name="poster" href="/rg/action-box-title/primary-photo/media/rm118396416/tt0811080" title="Speed Racer"><img border="0" alt="Speed Racer" title="Speed Racer" src="http://ia.media-imdb.com/images/M/MV5BMTA5MjgxMDE4OTVeQTJeQWpwZ15BbWU3MDgyNjc4NjE@._V1._SX94_SY140_.jpg" /></a>';
if ( ($start = strpos($string,'<a name="poster"' ) ) !==FALSE ) {
    $xml = new SimpleXMLElement($string);
    echo $xml->img['src'];
}
tmhai (Author) Posted October 29, 2008

Thank you both for your replies.

@DarkWater: I didn't write this code, and I have absolutely no experience with regular expressions, but I figured out what you mean by escaping all the slashes, so I came up with this:

$poster = get_match('/<a name="\/poster" [^>]*><img [^>]* src="(.*)" \/><\/a>/isU',$imdb_content);

That no longer shows any error messages, but it doesn't return anything either. So now I just need help figuring out the correct regular expression to identify what I'm looking for. If it helps, the page I'm parsing is: http://www.imdb.com/title/tt0811080/

@ghostdog74: I'm not sure how to implement your solution with the code I already have. However, thank you for your help.
nrg_alpha Posted October 29, 2008

Well, here's how I would fetch anything within quotes (double or single at that) in an src:

$str = '<a name="poster" href="/rg/action-box-title/primary-photo/media/rm118396416/tt0811080" title="Speed Racer"><img border="0" alt="Speed Racer" title="Speed Racer" src="http://ia.media-imdb.com/images/M/MV5BMTA5MjgxMDE4OTVeQTJeQWpwZ15BbWU3MDgyNjc4NjE@._V1._SX94_SY140_.jpg" /></a>';
preg_match('#src=["\']([^"\']+)["\']#', $str, $match);
echo $match[1];

Output:

http://ia.media-imdb.com/images/M/MV5BMTA5MjgxMDE4OTVeQTJeQWpwZ15BbWU3MDgyNjc4NjE@._V1._SX94_SY140_.jpg
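On a full page, a bare src pattern returns the first src it finds, which may not be the poster. tmhai's escaped attempt anchors on the link but looks for name="\/poster", i.e. a literal name="/poster", which never occurs in the sample, so it matches nothing. The sketch below combines the two ideas: anchor on <a name="poster" and capture the src in either quote style. The helper name get_poster_src and the logo image in $page are hypothetical, added only to show that the anchor skips earlier images; it assumes the <img> directly follows the opening <a> tag, as in the sample.

```php
<?php
// Hypothetical helper (not from the thread): returns the poster URL or null.
function get_poster_src($html)
{
    // 1. Anchor on the opening <a name="poster" ...> tag.
    // 2. Allow optional whitespace, then the <img ...> tag.
    // 3. Capture the src value; \1 requires the same quote character to close it.
    $pattern = '#<a\s+name="poster"[^>]*>\s*<img\s[^>]*?src=(["\'])(.*?)\1#is';

    return preg_match($pattern, $html, $m) ? $m[2] : null;
}

// A made-up image comes first so the anchor has something to skip over.
$page = '<img src="/images/logo.gif" alt="logo" />'
      . '<a name="poster" href="/rg/action-box-title/primary-photo/media/rm118396416/tt0811080" title="Speed Racer"><img border="0" alt="Speed Racer" title="Speed Racer" src="http://ia.media-imdb.com/images/M/MV5BMTA5MjgxMDE4OTVeQTJeQWpwZ15BbWU3MDgyNjc4NjE@._V1._SX94_SY140_.jpg" /></a>';

echo get_poster_src($page), "\n";
```

In tmhai's script this would presumably be called as $poster = get_poster_src($imdb_content); it returns null rather than raising a notice when the anchor is missing.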
jojo2a2a Posted February 8, 2013

Is it possible to update this for the following markup?

<td rowspan="2" id="img_primary">
<div class="image">
<a href="/media/rm2761862144/tt0978762?ref_=tt_ov_i" >
<img height="317" width="214" alt="Mary et Max. (2009) Poster" title="Mary et Max. (2009)" src="http://ia.media-imdb.com/images/M/MV5BMTQ1NDIyNTA1Nl5BMl5BanBnXkFtZTcwMjc2Njk3OA@@._V1_SY317_CR4,0,214,317_.jpg" itemprop="image" />
</a>
</div>
</td>

Currently I have this preg_match:

preg_match('#<td rowspan="2" id="img_primary">[^"]+<div class="image"><a.*" ><img * src="(.*)" .*><\\/a><\\/div><\\/td>#isU', $text, $photo );

It does not work. Many thanks.
Zane Posted February 8, 2013

You are MUCH MUCH better off using PHP's DOMDocument class. Scraping information like that is sooo much easier, and it is also easier to fix whenever that site changes something in their code. Google DOMDocument and you should receive everything you need.
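A minimal sketch of what Zane's suggestion could look like for the markup jojo2a2a posted, using DOMDocument together with DOMXPath. The fragment is wrapped in a bare <table><tr> here (an assumption, so the <td> is valid on its own); on the real page you would pass the whole downloaded HTML to loadHTML() instead. Variable names are illustrative.

```php
<?php
// Markup copied from jojo2a2a's post, wrapped in a minimal table (assumption).
$html = '<table><tr><td rowspan="2" id="img_primary">
<div class="image">
<a href="/media/rm2761862144/tt0978762?ref_=tt_ov_i" >
<img height="317" width="214" alt="Mary et Max. (2009) Poster" title="Mary et Max. (2009)" src="http://ia.media-imdb.com/images/M/MV5BMTQ1NDIyNTA1Nl5BMl5BanBnXkFtZTcwMjc2Njk3OA@@._V1_SY317_CR4,0,214,317_.jpg" itemprop="image" />
</a>
</div>
</td></tr></table>';

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Find every <img> nested anywhere inside the element with id="img_primary".
$xpath  = new DOMXPath($doc);
$images = $xpath->query('//*[@id="img_primary"]//img');

// Take the first match, or null when the element is not on the page.
$poster = $images->length > 0 ? $images->item(0)->getAttribute('src') : null;
echo $poster, "\n";
```

Because the query keys on the id rather than on the exact whitespace and attribute order, it should keep working when the surrounding tags or attributes are rearranged, which is where the regex above breaks.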
jojo2a2a Posted February 14, 2013

Quote (Zane): "You are MUCH MUCH better off using PHP's DOMDocument class. Scraping information like that is sooo much easier, and it is also easier to fix whenever that site changes something in their code. Google DOMDocument and you should receive everything you need."

Yes, I will recode the old script later. For now, is there no solution using preg_match?
Christian F. Posted February 14, 2013

One line might be doable with regular expressions, but with the number of attributes and tags you're trying to match, regexps are not going to be adequate. So, no: this is not an issue for which you want to use preg_match(). HTML is a markup language, not a regular language, after all.