Data Screen Scrape


ainoy31

I am trying to get the value of the hidden __VIEWSTATE field from an ASPX form, but the returned array is empty.

 

Here is my code:

<?php
$url = "http://simpleinternetsite.com";

$channel = curl_init();
curl_setopt($channel, CURLOPT_URL, $url);
curl_setopt($channel, CURLOPT_FAILONERROR, 1);
curl_setopt($channel, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($channel, CURLOPT_RETURNTRANSFER, 1);

$data = curl_exec($channel);

if ($data)
{
    preg_match('/<input id="__VIEWSTATE" name="__VIEWSTATE" type="hidden" value="([^"]*?)">/', $data, $matches);

    print_r($matches);
}

?>

 

Any help on this is much appreciated. AM


Assuming the input tag has those attributes in that particular order...

 

Example:

$html = <<<END
<div>
<input type="hidden" name="ctl00_ScriptManager1_HiddenField" id="ctl00_ScriptManager1_HiddenField" value="" />
<input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTM4NjcyMDM5N2QYAgUyY3RsMDAkTWFpbkNvbnRlbnQkY3RsMDAkY3RsMDAkRmVhdHVyZWRQcm9kdWN0c1ZpZXcPD2RmZAUkY3RsMDAkRGlzY291bnRTaG9wcGVyQmFubmVyJG12QmFubmVyDw9kAgFkYW+MRR7yYW2BPd+5NsA+6H9x/D8=" />
</div>
END;

preg_match('#<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="([^"]+)" />#i', $html, $matches);
echo $matches[1];

 

Output:

/wEPDwUKMTM4NjcyMDM5N2QYAgUyY3RsMDAkTWFpbkNvbnRlbnQkY3RsMDAkY3RsMDAkRmVhdHVyZWRQcm9kdWN0c1ZpZXcPD2RmZAUkY3RsMDAkRGlzY291bnRTaG9wcGVyQmFubmVyJG12QmFubmVyDw9kAgFkYW+MRR7yYW2BPd+5NsA+6H9x/D8=

 

In your code, you didn't have to make the star a lazy quantifier, and you forgot the [space]/ part after the closing quote at the end of the tag (I'm going off the code in the URL you mentioned).
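
If the attribute order on the real page is not guaranteed, one alternative to matching the whole tag is to let PHP's DOM extension parse the markup and look the field up by name or id. This is only a minimal sketch, assuming the dom extension is enabled; the helper name extractViewState and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): returns the __VIEWSTATE value,
// or null when the field is missing.
function extractViewState(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Find the input by name or id, whatever the attribute order or quoting.
    $nodes = $xpath->query('//input[@name="__VIEWSTATE" or @id="__VIEWSTATE"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->getAttribute('value');
}

// Made-up sample: id first, single-quoted value, no closing slash.
$html = '<div><input id="__VIEWSTATE" value=\'abc123==\' type="hidden" name="__VIEWSTATE"></div>';
var_dump(extractViewState($html));
```

In the original script, $data from curl_exec() would be passed in place of the sample string.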


 

RAWR!

 

preg_match('~<input[^>]*((?:id|name)\s?=\s?["\']__VIEWSTATE["\'])?[^>]*value\s?=\s?["\']([^"\']*)["\'](?(1)|[^>]*(?:id|name)\s?=\s?["\']__VIEWSTATE["\'])[^>]*>~i', $html, $matches);

 

Your value will be found in $matches[2]

 

Yessiree, unlike your last prom date, this chica won't say no to you!

 

So this model offers top-of-the-line fault tolerance for your content stealing scraping needs. If your input tag has an id or name attribute set to __VIEWSTATE, it will get that value.

 

- Only id? MATCHED.

- Only name? MATCHED.

- Both? MATCHED. 

- Before the value? MATCHED.

- After the value? MATCHED. 

- A space on either side of the equal signs? MATCHED.

- Single quotes used? MATCHED. 

- Double quotes used? MATCHED. 

- Case-insensitive? MATCHED. 
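
The cases in the list can be checked with a small script. The pattern is the one posted above, unchanged; the sample tags and their labels are made up for illustration, and the last one is a different hidden field that the pattern is expected to skip.

```php
<?php
// The pattern from this post, unchanged.
$pattern = '~<input[^>]*((?:id|name)\s?=\s?["\']__VIEWSTATE["\'])?[^>]*value\s?=\s?["\']([^"\']*)["\'](?(1)|[^>]*(?:id|name)\s?=\s?["\']__VIEWSTATE["\'])[^>]*>~i';

// Made-up sample tags, roughly one per case in the list.
$samples = [
    'only id'       => '<input type="hidden" id="__VIEWSTATE" value="abc" />',
    'only name'     => '<input type="hidden" name="__VIEWSTATE" value="abc" />',
    'both'          => '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />',
    'value first'   => '<input type="hidden" value="abc" name="__VIEWSTATE" />',
    'spaced equals' => '<input name = "__VIEWSTATE" value = "abc">',
    'single quotes' => "<input type='hidden' name='__VIEWSTATE' value='abc'>",
    'upper case'    => '<INPUT NAME="__VIEWSTATE" VALUE="abc">',
    'other field'   => '<input type="hidden" name="__EVENTTARGET" value="xyz" />',
];

foreach ($samples as $label => $tag) {
    // Group 2 holds the value; group 1 only records whether id/name came first.
    $found = preg_match($pattern, $tag, $matches) === 1 ? $matches[2] : '(no match)';
    printf("%-14s %s\n", $label, $found);
}
```

Every sample except the last should print abc, and the last should print (no match). Note that \s? allows at most one whitespace character on each side of an equal sign, so wider spacing would need \s* instead.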

 

 
