Hello.
I'm trying to extract the image src value. The sample data I'm trying to extract the link from is:
The expected output is whatever appears between the quotation marks of the src attribute. The regex needs to look for the <a name="poster" text in order to pick up the correct image link.
This is the code I have so far, which extracts other data from an IMDB page. I'm trying to extract the Poster Image link as well:
<?php
//url
$imdbcode = $_GET['code'];
$url = 'http://www.imdb.com/title/'.$imdbcode.'/';
//get the page content
$imdb_content = get_data($url);
//parse for product name
$name = get_match('/<title>(.*)<\/title>/isU',$imdb_content);
$director = strip_tags(get_match('/<h5[^>]*>Director:<\/h5>(.*)<\/div>/isU',$imdb_content));
$plot = get_match('/<h5[^>]*>Plot:<\/h5>(.*)<\/div>/isU',$imdb_content);
$release_date = get_match('/<h5[^>]*>Release Date:<\/h5>(.*)<\/div>/isU',$imdb_content);
$mpaa = get_match('/<a href="\/mpaa">MPAA<\/a>:<\/h5>(.*)<\/div>/isU',$imdb_content);
$run_time = get_match('/Runtime:<\/h5>(.*)<\/div>/isU',$imdb_content);
//build content
$content = '<h2>Film</h2><p>'.$name.'</p>';
$content.= '<h2>Director</h2><p>'.$director.'</p>';
$content.= '<h2>Plot</h2><p>'.substr($plot,0,strpos($plot,'<a')).'</p>';
$content.= '<h2>Release Date</h2><p>'.substr($release_date,0,strpos($release_date,'<a')).'</p>';
$content.= '<h2>MPAA</h2><p>'.$mpaa.'</p>';
$content.= '<h2>Run Time</h2><p>'.$run_time.'</p>';
$content.= '<h2>Full Details</h2><p><a href="'.$url.'" rel="nofollow">'.$url.'</a></p>';
echo $content;
//gets the match content
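function get_match($regex,$content)
{
    //return the first captured group, or an empty string when nothing matches
    if (preg_match($regex,$content,$matches) && isset($matches[1])) {
        return $matches[1];
    }
    return '';
}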
//gets the data from a URL
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
?>
Cheers.
As a side note, I took a stab at it and came up with the following line, which I placed under the "//parse for product name" comment:
$poster = get_match('/<a name="\/poster" [^>]*><img [^> src="(.*)" /></a>/isU',$imdb_content);
I also added the following under the first instance of $content:
$content.= '<h2>Poster</h2><p>'.$poster.'</p>';
That yielded the following error:
I'm guessing my regex isn't up to scratch.
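For reference, the attempted pattern has a few visible problems. It uses / as the delimiter but contains unescaped slashes in " />" and "</a>". The "[^>" character class is never closed. And "\/poster" looks for a literal slash before "poster". Below is a minimal sketch of one way the pattern could be written, not a definitive fix. It assumes the markup is roughly <a name="poster" ...><img ... src="..." /></a>. The $sample string, the example.com URL and the get_poster_match helper are all hypothetical, made up for illustration, and the real IMDb markup may differ.

```php
<?php
// Hypothetical sample markup (an assumption; the real page may differ).
$sample = '<div><a name="poster" href="/rg/photos" title="Example">'
        . '<img border="0" alt="Example" src="http://example.com/poster.jpg" height="140" width="99" />'
        . '</a></div>';

// '#' is used as the delimiter so slashes in the pattern need no escaping.
// [^>]* stays inside a single tag; ([^"]*) captures everything up to the closing quote.
$poster_regex = '#<a\s+name="poster"[^>]*>\s*<img\s[^>]*src="([^"]*)"#i';

// Hypothetical helper: like get_match, but returns null when nothing matches.
function get_poster_match($regex, $content)
{
    return preg_match($regex, $content, $matches) === 1 ? $matches[1] : null;
}

$poster = get_poster_match($poster_regex, $sample);
echo $poster, "\n";
```

In the script above, the same pattern could be passed to the existing helper as get_match($poster_regex, $imdb_content). To show the picture rather than the raw link, the value would need to go inside an img tag, for example '<img src="'.htmlspecialchars($poster).'" />'.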