Get product information from html source - regex


hamidjoukar

When I read the HTML source of the link below

http://www.dresslink.com/women-candy-color-handbag-leather-cross-body-shoulder-bag-bucket-bag-p-10908.html

I can find the following data about the product:

<script type="text/javascript">
DL.item.stock['ss42356']=[];
DL.item.stock['ss42356']['qty']=56;
DL.item.stock['ss42356']['sku']='SV000837_B';
DL.item.stock['ss42356']['inexistence']=0;
DL.item.stock['ss42356']['down_shelf']=0;
DL.item.stock['ss42356']['procurement_cycle']='8';
DL.item.stock['ss42356']['paid_set']=[];
DL.item.stock['ss42356']['paid_set'].push(35630);
DL.item.color_image['35630']='of7ea7';
DL.item.stock['ss42357']=[];
DL.item.stock['ss42357']['qty']=29;
DL.item.stock['ss42357']['sku']='SV000837_G';
DL.item.stock['ss42357']['inexistence']=0;
DL.item.stock['ss42357']['down_shelf']=0;
DL.item.stock['ss42357']['procurement_cycle']='6';
DL.item.stock['ss42357']['paid_set']=[];
DL.item.stock['ss42357']['paid_set'].push(35631);
DL.item.color_image['35631']='of710e';
DL.item.stock['ss42358']=[];
DL.item.stock['ss42358']['qty']=14;
DL.item.stock['ss42358']['sku']='SV000837_BR';
DL.item.stock['ss42358']['inexistence']=0;
DL.item.stock['ss42358']['down_shelf']=0;
DL.item.stock['ss42358']['procurement_cycle']='17';
DL.item.stock['ss42358']['paid_set']=[];
DL.item.stock['ss42358']['paid_set'].push(35632);
DL.item.color_image['35632']='of77c1';
DL.item.stock['ss42359']=[];
DL.item.stock['ss42359']['qty']=36;
DL.item.stock['ss42359']['sku']='SV000837_O';
DL.item.stock['ss42359']['inexistence']=0;
DL.item.stock['ss42359']['down_shelf']=0;
DL.item.stock['ss42359']['procurement_cycle']='7';
DL.item.stock['ss42359']['paid_set']=[];
DL.item.stock['ss42359']['paid_set'].push(35633);
DL.item.color_image['35633']='of7136';
</script>

I need to know the quantity for each SKU, so I need to produce a simple array that maps each SKU name to its quantity, like this:

$a = array(
    'SV000837_B'  => '56',
    'SV000837_G'  => '29',
    'SV000837_BR' => '14',
    'SV000837_O'  => '36',
);
Please help me write PHP code, using regex or any other method, that produces the above array.
 

Try

<?php

// webpage you are scraping the javascript code from
$page_url = 'http://www.dresslink.com/women-candy-color-handbag-leather-cross-body-shoulder-bag-bucket-bag-p-10908.html';

// load the webpage into DOMDocument
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($page_url);

// use XPath to return the second <script> element inside the <div class="dd1"> element
// this is where the javascript code containing the stock array is in the webpage
$xpath = new DOMXPath($doc);
$result = $xpath->query('//div[@class="dd1"]/script[2]');

// retrieve the text content of the matched <script> node
$JS_stock_array_code = $result->item(0)->nodeValue;

// use regex to find the qty and sku values
// the "s" modifier lets "." span line breaks; ".+?" stops at the nearest sku with the same key
preg_match_all("~\[('\w+')\]\['qty'\]=(\d+);.+?\[\\1\]\['sku'\]='(\w+)'~s", $JS_stock_array_code, $matches);

// loop through the results and build the sku array
// the sku is used as the array key
// the quantity is assigned to the sku
$skus = array();
foreach($matches[3] as $key => $sku)
{
    $qty = $matches[2][$key];
    $skus[$sku] = $qty;
}

// output $skus array
printf('<pre>%s</pre>', print_r($skus, 1));

Output for me is

Array
(
    [SV000837_B] => 49
    [SV000837_G] => 26
    [SV000837_BR] => 11
    [SV000837_O] => 35
)
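
If you want to check the matching without requesting the live page, here is a minimal sketch that runs against a string copied from the snippet in the question (shortened to two SKUs). The helper name extract_sku_quantities and the sample variable $js are made up for illustration, and the pattern assumes the assignments keep the exact stock['key']['qty'] / stock['key']['sku'] shape shown above. Instead of one regex that pairs qty and sku, it captures each assignment separately and joins them by stock key, so the order of the lines and the line breaks between them should not matter.

```php
<?php
// Sample input copied from the question (only two SKUs, for brevity).
$js = <<<'JS'
DL.item.stock['ss42356']=[];
DL.item.stock['ss42356']['qty']=56;
DL.item.stock['ss42356']['sku']='SV000837_B';
DL.item.stock['ss42356']['inexistence']=0;
DL.item.stock['ss42357']=[];
DL.item.stock['ss42357']['qty']=29;
DL.item.stock['ss42357']['sku']='SV000837_G';
DL.item.stock['ss42357']['inexistence']=0;
JS;

// Hypothetical helper: returns array(sku => qty) built from the stock assignments.
function extract_sku_quantities($js)
{
    // Capture every qty or sku assignment: stock key, field name, value
    // (the value is either a bare number or a quoted word).
    preg_match_all(
        "~stock\['(\w+)'\]\['(qty|sku)'\]\s*=\s*'?(\w+)'?\s*;~",
        $js,
        $rows,
        PREG_SET_ORDER
    );

    // Group the captured values by stock key, e.g. $byKey['ss42356']['qty'].
    $byKey = array();
    foreach ($rows as $row) {
        $byKey[$row[1]][$row[2]] = $row[3];
    }

    // Keep only entries that have both fields; the sku becomes the array key.
    $skus = array();
    foreach ($byKey as $entry) {
        if (isset($entry['sku'], $entry['qty'])) {
            $skus[$entry['sku']] = $entry['qty'];
        }
    }
    return $skus;
}

print_r(extract_sku_quantities($js));
```

The quantities come back as strings, which matches the array in the question; cast them with (int) if you need numbers.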