
Scraping images from javascript object


dadamssg


I'm trying to scrape images from the markup of certain webpages. These pages all have a slideshow, and the image sources are stored in JavaScript objects on the page. I'm thinking I need to call file_get_contents("http://www.example.com/page/1"); and then use preg_match_all() with a phrase I pass in (e.g. '"LargeUrl": ' or '"Description":') to get whatever is in the quotes directly after each occurrence.

 

var photos = {}; 
photos['photo-391094'] = {"LargeUrl": "http://www.example.org/images/1.png","Description":"blah blah balh"};
photos['photo-391095'] = {"LargeUrl": "http://www.example.org/images/2.png","Description":"blah blah balh"};
photos['photo-391096'] = {"LargeUrl": "http://www.example.org/images/3.png","Description":"blah blah balh"};

 

I have this code, but it returns the entire line after the input phrase. How can I modify it to return whatever is in quotes directly after the input keyword? Or am I doing it all wrong and there's an easier way?

 

$page = file_get_contents("http://www.example.org/page/1");
$word = "\"LargeUrl\":";

if(preg_match_all("/(?<=$word)\S+/i", $page, $matches))
{
echo "<pre>";
print_r($matches);
echo "</pre>";
} 

https://forums.phpfreaks.com/topic/265867-scraping-images-from-javascript-object/

"LargeUrl": "([^"]+)

 

Match the characters <"LargeUrl": "> literally <"LargeUrl": ">

Match the regular expression below and capture its match into backreference number 1 <([^"]+)>

  Match any character that is NOT a <"> <[^"]+>

      Between one and unlimited times, as many times as possible, giving back as needed (greedy) <+>
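For what it's worth, here is a rough sketch of how that pattern might be plugged into preg_match_all() so the keyword stays configurable. The helper name extract_values() is made up for illustration, the sample markup is copied from the question in place of the file_get_contents() call, and the \s* around the colon is my own loosening of the pattern above so it also matches "Description":"..." with no space. It assumes the values never contain escaped quotes.

```php
<?php
// Sample markup standing in for the fetched page (copied from the question).
$page = <<<'HTML'
var photos = {};
photos['photo-391094'] = {"LargeUrl": "http://www.example.org/images/1.png","Description":"blah blah balh"};
photos['photo-391095'] = {"LargeUrl": "http://www.example.org/images/2.png","Description":"blah blah balh"};
HTML;

// Hypothetical helper (not from the thread): returns every quoted value
// that directly follows "key": in the given text.
function extract_values(string $page, string $key): array
{
    // preg_quote() escapes any regex metacharacters in the key.
    // \s* tolerates optional whitespace on either side of the colon.
    // ([^"]+) captures everything up to the next double quote.
    $pattern = '/"' . preg_quote($key, '/') . '"\s*:\s*"([^"]+)"/';
    preg_match_all($pattern, $page, $matches);

    // $matches[1] holds only the captured group, not the whole match.
    return $matches[1];
}

$urls = extract_values($page, 'LargeUrl');
$descriptions = extract_values($page, 'Description');

echo "<pre>";
print_r($urls);
echo "</pre>";
```

The key difference from the original attempt is reading $matches[1] (the capture group) instead of relying on a lookbehind followed by \S+, which has no way to stop at the closing quote.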

