hamidjoukar

Get product information from html source - regex

When I read the HTML source of the link below

http://www.dresslink.com/women-candy-color-handbag-leather-cross-body-shoulder-bag-bucket-bag-p-10908.html

I can find the following data about the product:

<script type="text/javascript">
DL.item.stock['ss42356']=[];
DL.item.stock['ss42356']['qty']=56;
DL.item.stock['ss42356']['sku']='SV000837_B';
DL.item.stock['ss42356']['inexistence']=0;
DL.item.stock['ss42356']['down_shelf']=0;
DL.item.stock['ss42356']['procurement_cycle']='8';
DL.item.stock['ss42356']['paid_set']=[];
DL.item.stock['ss42356']['paid_set'].push(35630);
DL.item.color_image['35630']='of7ea7';
DL.item.stock['ss42357']=[];
DL.item.stock['ss42357']['qty']=29;
DL.item.stock['ss42357']['sku']='SV000837_G';
DL.item.stock['ss42357']['inexistence']=0;
DL.item.stock['ss42357']['down_shelf']=0;
DL.item.stock['ss42357']['procurement_cycle']='6';
DL.item.stock['ss42357']['paid_set']=[];
DL.item.stock['ss42357']['paid_set'].push(35631);
DL.item.color_image['35631']='of710e';
DL.item.stock['ss42358']=[];
DL.item.stock['ss42358']['qty']=14;
DL.item.stock['ss42358']['sku']='SV000837_BR';
DL.item.stock['ss42358']['inexistence']=0;
DL.item.stock['ss42358']['down_shelf']=0;
DL.item.stock['ss42358']['procurement_cycle']='17';
DL.item.stock['ss42358']['paid_set']=[];
DL.item.stock['ss42358']['paid_set'].push(35632);
DL.item.color_image['35632']='of77c1';
DL.item.stock['ss42359']=[];
DL.item.stock['ss42359']['qty']=36;
DL.item.stock['ss42359']['sku']='SV000837_O';
DL.item.stock['ss42359']['inexistence']=0;
DL.item.stock['ss42359']['down_shelf']=0;
DL.item.stock['ss42359']['procurement_cycle']='7';
DL.item.stock['ss42359']['paid_set']=[];
DL.item.stock['ss42359']['paid_set'].push(35633);
DL.item.color_image['35633']='of7136';
</script>

I need to know the quantity for each SKU, so I need to produce a simple array mapping each SKU name to its quantity, like this:

$a = array( 'SV000837_B'  => '56',
            'SV000837_G'  => '29',
            'SV000837_BR' => '14',
            'SV000837_O'  => '36',
          );
Please help me write PHP code, using regex or any other method, that produces the above array.
 


Try

<?php

// webpage you are scraping the javascript code from
$page_url = 'http://www.dresslink.com/women-candy-color-handbag-leather-cross-body-shoulder-bag-bucket-bag-p-10908.html';

// load the webpage into DOMDocument
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($page_url);

// use XPath to return the second <script> element inside the <div class="dd1"> element
// this is where the javascript code containing the stock array is in the webpage
$xpath = new DOMXPath($doc);
$result = $xpath->query('//div[@class="dd1"]/script[2]');

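// retrieve the node element value (stop early if the page layout has changed)
if ($result === false || $result->length === 0) {
    exit('Could not find the <script> element containing the stock array');
}
$JS_stock_array_code = $result->item(0)->nodeValue;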

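// use regex to find the qty and sku values
// the s modifier lets . match newlines, so the qty and sku statements may sit on separate lines
// .+? is lazy so each qty is paired with the nearest sku that has the same stock key
preg_match_all("~\[('\w+')\]\['qty'\]=(\d+);.+?\[\\1\]\['sku'\]='(\w+)'~s", $JS_stock_array_code, $matches);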

// loop through the results and define sku array
// the sku is used as the array key
// the quantity is then assigned to the sku
$skus = array();
foreach($matches[3] as $key => $sku)
{
    $qty = $matches[2][$key];
    $skus[$sku] = $qty;
}

// output $skus array
printf('<pre>%s</pre>', print_r($skus, true));

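The output for me is below. The quantities differ from those in your post, presumably because the stock on the live page has changed since then.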

Array
(
    [SV000837_B] => 49
    [SV000837_G] => 26
    [SV000837_BR] => 11
    [SV000837_O] => 35
)
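
If you would rather skip the DOM lookup, here is a rough sketch of a regex-only variant that works directly on the JavaScript text. It is only an illustration under some assumptions: `extract_sku_quantities()` is a hypothetical helper name, the `$js` sample is copied from your post rather than fetched from the site, and it assumes every assignment has the `stock['id']['field']=value;` shape shown there. Instead of pairing qty and sku in one pattern, it collects both fields per stock id and then builds the array, so the order of the two statements does not matter.

```php
<?php
// Sample input copied from the question. On the live site this string would
// come from the page source instead (for example via file_get_contents()).
$js = <<<'JS'
DL.item.stock['ss42356']=[];
DL.item.stock['ss42356']['qty']=56;
DL.item.stock['ss42356']['sku']='SV000837_B';
DL.item.stock['ss42357']=[];
DL.item.stock['ss42357']['qty']=29;
DL.item.stock['ss42357']['sku']='SV000837_G';
JS;

// Hypothetical helper: returns an array of sku => quantity.
function extract_sku_quantities($js)
{
    // Capture the stock id, the field name (qty or sku) and its value.
    // The value may or may not be wrapped in single quotes.
    $pattern = "~stock\['(\w+)'\]\['(qty|sku)'\]\s*=\s*'?(\w+)'?\s*;~";
    preg_match_all($pattern, $js, $matches, PREG_SET_ORDER);

    // Group the captured fields by stock id.
    $stock = array();
    foreach ($matches as $m) {
        $stock[$m[1]][$m[2]] = $m[3];
    }

    // Keep only the entries that have both fields, keyed by sku.
    $skus = array();
    foreach ($stock as $fields) {
        if (isset($fields['sku'], $fields['qty'])) {
            $skus[$fields['sku']] = $fields['qty'];
        }
    }
    return $skus;
}

print_r(extract_sku_quantities($js));
```

For the two sample entries this gives SV000837_B => 56 and SV000837_G => 29, with the quantities kept as strings like in your example array. Cast them with (int) if you need numbers.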
