tmhai


  1. Thank you both for your replies. @DarkWater: I didn't write this code, and I have absolutely no experience with regular expressions, but I figured out what you mean by escaping all the slashes, so I came up with this: $poster = get_match('/<a name="\/poster" [^>]*><img [^>]* src="(.*)" \/><\/a>/isU',$imdb_content); That no longer shows any error messages, but it doesn't return anything either. So now I just need help working out the correct regular expression to identify what I'm looking for. If it helps, the page I'm parsing is: http://www.imdb.com/title/tt0811080/ @ghostdog74: I'm not sure how to implement your solution with the code I already have. However, thank you for your help.
  2. Hello. I'm trying to extract the image src value. The sample data I'm trying to extract the link from is: The expected output should be ANYTHING in between the quotation marks of the src field. The regular expression will need to search for the <a name="poster" text to extract the correct image link value. This is the code I have so far, which extracts other data from an IMDb page. I'm trying to extract the poster image link as well:

     <?php
     //url
     $imdbcode = $_GET['code'];
     $url = 'http://www.imdb.com/title/'.$imdbcode.'/';

     //get the page content
     $imdb_content = get_data($url);

     //parse for product name
     $name = get_match('/<title>(.*)<\/title>/isU',$imdb_content);
     $director = strip_tags(get_match('/<h5[^>]*>Director:<\/h5>(.*)<\/div>/isU',$imdb_content));
     $plot = get_match('/<h5[^>]*>Plot:<\/h5>(.*)<\/div>/isU',$imdb_content);
     $release_date = get_match('/<h5[^>]*>Release Date:<\/h5>(.*)<\/div>/isU',$imdb_content);
     $mpaa = get_match('/<a href="\/mpaa">MPAA<\/a>:<\/h5>(.*)<\/div>/isU',$imdb_content);
     $run_time = get_match('/Runtime:<\/h5>(.*)<\/div>/isU',$imdb_content);

     //build content
     $content.= '<h2>Film</h2><p>'.$name.'</p>';
     $content.= '<h2>Director</h2><p>'.$director.'</p>';
     $content.= '<h2>Plot</h2><p>'.substr($plot,0,strpos($plot,'<a')).'</p>';
     $content.= '<h2>Release Date</h2><p>'.substr($release_date,0,strpos($release_date,'<a')).'</p>';
     $content.= '<h2>MPAA</h2><p>'.$mpaa.'</p>';
     $content.= '<h2>Run Time</h2><p>'.$run_time.'</p>';
     $content.= '<h2>Full Details</h2><p><a href="'.$url.'" rel="nofollow">'.$url.'</a></p>';

     echo $content;

     //gets the match content
     function get_match($regex,$content)
     {
         preg_match($regex,$content,$matches);
         return $matches[1];
     }

     //gets the data from a URL
     function get_data($url)
     {
         $ch = curl_init();
         $timeout = 5;
         curl_setopt($ch,CURLOPT_URL,$url);
         curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
         curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
         $data = curl_exec($ch);
         curl_close($ch);
         return $data;
     }
     ?>

     Cheers.

     As a side note, I took a stab at it and came up with the following line, which I placed under the "//parse for product name" comment: $poster = get_match('/<a name="\/poster" [^>]*><img [^> src="(.*)" /></a>/isU',$imdb_content); I also added this under the first instance of $content: $content.= '<h2>Poster</h2><p>'.$poster.'</p>'; That yielded the following error: I'm guessing my regular expression isn't up to scratch.
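
A few things in the poster attempts above look like probable causes, though this is only a sketch and not a confirmed fix. The character class `[^>` is never closed with `]`, the `/` characters in `/></a>` are not escaped even though `/` is the pattern delimiter, and `name="\/poster"` looks for a literal `/poster`, whereas the anchor described in the post is `<a name="poster"`. The example below uses `#` as the delimiter so that no slashes need escaping. The sample markup, the `example.com` URL, and the helper name `get_match_or_null` are assumptions made for illustration; the real IMDb markup may differ.

```php
<?php
// Hypothetical sample markup (an assumption, not taken from the real page).
$sample_html = '<a name="poster" href="/media/example" title="Example">'
             . '<img border="0" alt="Example" src="http://example.com/poster.jpg" /></a>';

// Same shape as get_match() in the post, but returns null instead of
// reading $matches[1] when the pattern does not match.
function get_match_or_null($regex, $content)
{
    if (preg_match($regex, $content, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// '#' is the delimiter, so '/' needs no escaping inside the pattern.
// ([^"]*) captures everything between the quotation marks of src.
$poster_regex = '#<a\s+name="poster"[^>]*>\s*<img\s[^>]*src="([^"]*)"#i';

$poster = get_match_or_null($poster_regex, $sample_html);
echo $poster, "\n"; // prints http://example.com/poster.jpg
```

Capturing with `[^"]*` instead of `(.*)` plus the `U` modifier keeps the match inside the src attribute without relying on ungreedy mode. If the live page places other attributes or whitespace differently, the pattern would need adjusting.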