
Parsing with SIMPLEHTMLPARSER


natasha_thomas


Folks,

I am using SIMPLEHTMLPARSER, but I am not able to parse this HTML; nothing shows up when I do:

var_dump($html->find('div[id=Teaser_Item] img[src]', 0));

What I actually want to extract is the IMG SRC, which is:

http://wap.ebay.com/Pages/RbHttpHandler.ashx?width=313&height=592&fsize=999000&format=jpg&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C!jEE2n%28iTLozBNwBPG0bUg~~0_1.JPG%3Fset_id%3D8800005007

Here is the page's HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html><head><META http-equiv="Content-Type" content="text/html; charset=UTF-8"><META HTTP-EQUIV="expires" CONTENT="0"><META HTTP-EQUIV="cache-control" CONTENT="no-cache"><META HTTP-EQUIV="pragma" CONTENT="no-cache"><META name="google-site-verification" content="2AT13qxpDWUTCw-6xXa-Hme6iQ7ds3rYZ5cH5-_K13Y"><META http-equiv="Content-Style-Type" content="text/css"><title>YSLBlack Suede Platform Pumps Size 7 (39) - eBay Mobile (item 160586179890 5/14/2011 8:39:22 AM)</title><link rel="stylesheet" type="text/css" href="/nbinternal/global.css"><style>div.body {margin-left:5px !important;margin-right:5px !important;width:1253px;} div, p, td, span, li {color:#000000;} div.body > div, div.body > table {color:#000000;} hr {color:#000000;} a {color:#0000CC;} td.tabbed-active, a.tabbed-active {border-bottom-color:#FFFFFF;} div, td, form, li, input, select, textarea {font-size:12px;} .medium, .medium *, .medium td * {font-size:12px !important;} .headline *, .medium .headline * {font-size:14px !important;} .large .headline *, .headline .large * {font-size:16px !important;} .large, .large *, .large td * {font-size:14px !important;} .small .headline * {font-size:12px !important;} .small, .small *, .small td * {font-size:10px !important;} </style></head><body style="width:1253px;background-color:#FFFFFF;"><div class="body" style="width:1253px;background-color:#FFFFFF;"> <div style="margin-bottom: 4px;background-color: #ffffff;" id="CommonHeader" class="pageheader mode1"><table class="pageheader" cellspacing="0" cellpadding="0"><tr><td class="logo" style="background-color: #ffffff;"><a href="/Default.aspx?emvAD=1263x592&aid=160586179890&emvcc=0"><img src="RbHttpHandler.ashx?width=1253&height=592&fsize=999000&format=gif&url=%7E%2FImages%2FeBayLogos%2Funscaled___ebay_logo_large.gif" alt="eBay mobile"></a></td></tr></table></div><div id="ebayLine1" class="separator mode1"> <img 
src="RbHttpHandler.ashx?width=1253&height=592&fsize=999000&format=gif&url=%2Fimages%2FeBayLines%2Funscaled___630.gif" alt="" class="separator "> </div><div id="Status" class="default"> <div style="margin-left: 5px;margin-right: 5px;padding-top: 4px;padding-bottom: 4px;border:none;" id="Teaser_Item" class="teaser mode11"><table cellpadding="0" cellspacing="0" style="width:100%;"><tr><td style="vertical-align:top;padding-right:2px;width:317px;" valign="top"><img src="RbHttpHandler.ashx?width=313&height=592&fsize=999000&format=jpg&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG%3Fset_id%3D8800005007" alt=""></td><td class="ttext" style="vertical-align:top;" valign="top"><strong>YSLBlack Suede Platform Pumps Size 7 (39)</strong></td></tr></table></div> <div style="padding-top: 0px;padding-bottom: 0px;border-color: #fae273;border-style: solid;border-width: 1px;border-top:none;border-left:none;border-right:none;background-color: #ffd869;background-image: url(RbHttpHandler.ashx?url=/images/BlockHeader/unscaled___630.gif);background-repeat: no-repeat;background-position: top-left;" id="BgHeader" class="text mode1 small"> <div> </div></div> <div style="padding-top: 4px;padding-bottom: 4px;vertical-align: middle;border-color: #bababa;border-style: solid;border-width: 1px;border-top:none;border-bottom:none;background-color: #f0eff7;text-align: center;" class="buttonmenu mode2"> <span id="ButtonRefresh" class="button-image" style="margin:0px;background-color: #7a7a7a;margin-right: 3px;"><a href="/Pages/ViewItem.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" class="button-inactive" style="color: #ffffff !important;border-color: #606060;border-style: solid;border-width: 1px;border-style: solid;"><strong>Refresh</strong></a></span> </div></div><div id="Content" class="default"> <div style="padding-top: 0px;padding-bottom: 0px;text-align: center;line-height: 1.5em;background-image: 
url(RbHttpHandler.ashx?url=~/Images/TabbedMenu_BgGradient.jpg);background-repeat: repeat-x;background-position: top-left;" id="MenuA" class="tabbedmenu mode1 small"> <table cellspacing="0" cellpadding="0"> <tr> <td style="color: #000000 !important;background-color: #ffffff;border-color: #bababa;border-style: solid;border-bottom-color: #ffffff;border-style: solid;text-align: center;line-height: 1.5em;" class="tabbed-active"> <a href="/Pages/ViewItem.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" style="color: #000000 !important;border:none;" id="ButtonMenuItem1" class="tabbed-active"> Summary</a> </td> <td style="color: #00008b !important;border-color: #bababa;border-style: solid;border-style: solid;text-align: center;line-height: 1.5em;" class="tabbed-inactive"> <a href="/Pages/ViewItemPic.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" style="color: #00008b !important;border:none;" id="ButtonMenuItem2" class="tabbed-inactive"> Picture</a> </td> <td style="color: #00008b !important;border-color: #bababa;border-style: solid;border-style: solid;text-align: center;line-height: 1.5em;" class="tabbed-inactive tabbed-last"> <a href="/Pages/ViewItemDesc.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" style="color: #00008b !important;border:none;" id="ButtonMenuItem3" class="tabbed-inactive"> Description</a> </td> </tr> </table> </div> <div style="vertical-align: top;border-color: #bababa;border-style: solid;border-width: 1px;border-top:none;border-bottom:none;" class="table mode1"><table cellpadding="0" cellspacing="0" style="vertical-align: top;"><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;font-weight:bold;">Item number:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top"><span style="color:#000000 !important;font-weight:bold;">160586179890</span></td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span 
style="color:#999999 !important;font-weight:bold;">Last Bid:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top"><span style="color:#000000 !important;font-weight:bold;">US $99.00</span></td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;font-weight:bold;">Ended:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top"><strong>5/14/2011 8:39:22 AM</strong></td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">Bid count:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top"><span style="color:#000000 !important;">0</span></td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">High bidder:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top"><span style="color:#000000 !important;">-</span></td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">Quantity:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top"><span style="color:#000000 !important;">1</span></td></tr></table></div> <div style="padding-top: 0px;padding-bottom: 0px;border-color: #bababa;border-style: dotted;border-width: 1px;border-bottom:none;border-left:none;border-right:none;" id="SeparatorLine1" class="text mode1"> <div></div></div> <div style="vertical-align: top;border-color: #bababa;border-style: solid;border-width: 1px;border-top:none;border-bottom:none;" class="table mode1"><table cellpadding="0" cellspacing="0" style="vertical-align: top;"><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">Seller:</span></td> <td style="padding: 
4px;vertical-align: top;width: 55%;" valign="top">namtalae (64)</td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">Feedback:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top">97.3% Positive</td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">Location:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top">Istanbul<br>TR</td></tr></table></div> <div style="padding-top: 0px;padding-bottom: 0px;border-color: #bababa;border-style: dotted;border-width: 1px;border-bottom:none;border-left:none;border-right:none;" class="text mode1"> <div></div></div> <div style="padding-top: 4px;vertical-align: top;border-color: #bababa;border-style: solid;border-width: 1px;border-top:none;border-bottom:none;" class="table mode1"><table cellpadding="0" cellspacing="0" style="vertical-align: top;"><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">Ships to:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top">Worldwide</td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">Postal costs:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top">US $25.50<br><a href="/Pages/ShippingCosts.aspx?emvAD=1263x592&aid=160586179890&emvcc=0">Additional</a></td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 !important;">Insurance:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top">Optional</td></tr><tr> <td style="padding: 4px;vertical-align: top;text-align: right;width: 45%;" valign="top"><span style="color:#999999 
!important;">Payment<br>methods:</span></td> <td style="padding: 4px;vertical-align: top;width: 55%;" valign="top">PayPal</td></tr></table></div> <div style="padding-top: 0px;padding-bottom: 0px;border-color: #bababa;border-style: dotted;border-width: 1px;border-bottom:none;border-left:none;border-right:none;" id="SeparatorLine2" class="text mode1"> <div></div></div> <div style="padding-top: 3px;padding-bottom: 3px;border-color: #bababa;border-style: solid;border-width: 1px;border-bottom:none;background-color: #eaedf7;" id="PayPalInfo" class="text mode1"> <div></div></div> <div style="padding-top: 4px;padding-bottom: 4px;vertical-align: middle;border-color: #bababa;border-style: solid;border-width: 1px;background-color: #f0eff7;text-align: center;" class="buttonmenu mode2"> <span id="ButtonRefresh" class="button-image" style="margin:0px;background-color: #7a7a7a;margin-right: 3px;"><a href="/Pages/ViewItem.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" class="button-inactive" style="color: #ffffff !important;border-color: #606060;border-style: solid;border-width: 1px;border-style: solid;"><strong>Refresh</strong></a></span> </div> <div style="color: #808080 !important;" id="NotFullInfo" class="text mode1"> <div style="color: #808080 !important;"><strong>Note:</strong> To view the full item listing, visit www.ebay.com using a computer before you bid or buy.</div></div> <div style="padding-top: 0px;padding-bottom: 0px;border-color: #bababa;border-style: solid;border-width: 1px;border-bottom:none;border-left:none;border-right:none;" id="SeparatorLineBottom" class="text mode1"> <div></div></div> <div id="TextBreadcrump" class="text mode1"> <div><a href="/Pages/SearchResults.aspx?emvcc=0"><span style="color:#7A7A7A !important;font-weight:bold;">&#x3c;</span> Results</a></div></div></div><div style="margin-top: 4px;" id="EBayLine2" class="separator mode1"> <img 
src="RbHttpHandler.ashx?width=1253&height=592&fsize=999000&format=gif&url=%2Fimages%2FeBayLines%2Funscaled___630.gif" alt="" class="separator "> </div> <div style="padding-top: 0px;padding-bottom: 0px;" id="MainMenu" class="buttonmenu mode1"> <table width="1253" cellspacing="0" cellpadding="0"> <tr> <td><a href="/Pages/Search.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" style="border:none;"><img src="RbHttpHandler.ashx?width=417&height=592&fsize=999000&format=gif&url=%2Fimages%2FButtonMenu%2Fen%2Fgif%2F630%2Funscaled___bmenu_highlight_left.gif" alt="Search"></a></td> <td><a href="/Member/MyEbay.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" style="border:none;"><img src="RbHttpHandler.ashx?width=417&height=592&fsize=999000&format=gif&url=%2Fimages%2FButtonMenu%2Fen%2Fgif%2F630%2Funscaled___bmenu_normal_mid.gif" alt="My eBay"></a></td> <td><a href="/Default.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" style="border:none;"><img src="RbHttpHandler.ashx?width=417&height=592&fsize=999000&format=gif&url=%2Fimages%2FButtonMenu%2Fen%2Fgif%2F630%2Funscaled___bmenu_normal_right.gif" alt="Home"></a></td></tr></table></div> <div style="padding-left: 7px;" id="FooterMenu" class="pipedmenu mode1 small"> <a href="/Pages/About/US.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" id="BAbout" class="piped-inactive">About eBay</a> <span>|</span> <a href="/Pages/UserAgreement/US.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" id="BUA" class="piped-inactive">User Agreement</a> <span>|</span> <a href="/Pages/Help.aspx?emvAD=1263x592&aid=160586179890&emvcc=0" id="BHelp" class="piped-inactive">Help</a> </div> <div id="Text1" class="text mode1 small"> <div>view ebay in<br>Mobile | <a href="http://www.ebay.com/?redirect=mobile"> Classic </a></div></div></div></body></html>

Can someone help me debug this, please?

Cheers

Natasha Thomas
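A minimal sketch of one way to pull that attribute out, assuming the page HTML is already in a string. It swaps in PHP's built-in DOM extension (DOMDocument plus DOMXPath) instead of SIMPLEHTMLPARSER, and `teaser_image_src` is a hypothetical helper name. With the simple HTML DOM parser itself, it may be worth echoing the attribute of the returned element (`->src`) rather than var_dump-ing the whole node object. The HTML string below is a trimmed excerpt of the page above.

```php
<?php
// Sketch using PHP's DOM extension (not SIMPLEHTMLPARSER): return the src of
// the first <img> inside the element with id="Teaser_Item", or null.

function teaser_image_src(string $html): ?string
{
    $doc = new DOMDocument();
    // The markup is not well-formed (bare "&" in query strings), so collect
    // libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@id="Teaser_Item"]//img[@src]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no matching image
    }
    return $nodes->item(0)->getAttribute('src');
}

// Trimmed excerpt of the page posted above.
$html = '<div id="Teaser_Item" class="teaser mode11"><table><tr><td>'
      . '<img src="RbHttpHandler.ashx?width=313&height=592&fsize=999000&format=jpg'
      . '&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG%3Fset_id%3D8800005007" alt="">'
      . '</td></tr></table></div>';

echo teaser_image_src($html), "\n";
```

Note that the src comes back exactly as written in the markup, i.e. still relative to the page it was scraped from.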


I can't get that image with my parser either, even after adding many exceptions and rules to catch any type of image.

Here is the link, and I'll discuss the issues.

<img src="RbHttpHandler.ashx?width=313&height=592&fsize=999000&format=jpg&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG%3Fset_id%3D8800005007" alt="">

Problem one:

This is a relative link: there is no ./, ../ or host before the script name. I'm sure you could use some kind of pattern that starts from the script name, RbHttpHandler.ashx, but the simple parser isn't going to do that for you.
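A rough sketch of resolving the relative src against the address of the page it came from. The page URL used here, http://wap.ebay.com/Pages/ViewItem.aspx?aid=160586179890, is an assumption pieced together from the /Pages/ViewItem.aspx links in the page and the wap.ebay.com/Pages/ URL in the question; `absolutize` is a hypothetical helper that only covers absolute, root-relative and plain path-relative references.

```php
<?php
// Rough sketch: resolve an <img src> against the URL of the page it came from.
// Assumption: only absolute, root-relative ("/...") and plain path-relative
// references are handled; "./", "../" and "//host" forms are not normalised.

function absolutize(string $base, string $ref): string
{
    // Already absolute (has a scheme such as "http"): nothing to do.
    if (parse_url($ref, PHP_URL_SCHEME) !== null) {
        return $ref;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Root-relative: append directly to scheme + host.
    if (substr($ref, 0, 1) === '/') {
        return $origin . $ref;
    }

    // Path-relative: keep the base path up to and including its last "/".
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $ref;
}

// Hypothetical page URL (an assumption, see above).
$page = 'http://wap.ebay.com/Pages/ViewItem.aspx?aid=160586179890';
echo absolutize($page, 'RbHttpHandler.ashx?width=313&format=jpg'), "\n";
// prints http://wap.ebay.com/Pages/RbHttpHandler.ashx?width=313&format=jpg
```

For anything beyond these three cases, the full reference-resolution rules in RFC 3986 would be the thing to follow.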

Problem two:

There is no image file extension, which is what the simple parser looks for.
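A small illustration of this point using PHP's parse_url, parse_str and pathinfo: the path of the src ends in .ashx, and the image type only shows up in the `format` query parameter. `image_type_hint` is a hypothetical helper name.

```php
<?php
// Illustration: an extension check on this src sees "ashx", not an image type.
// The image format only appears in the "format" query parameter.

function image_type_hint(string $src): array
{
    $path = parse_url($src, PHP_URL_PATH) ?? '';
    parse_str(parse_url($src, PHP_URL_QUERY) ?? '', $query);

    return [
        'extension' => strtolower(pathinfo($path, PATHINFO_EXTENSION)),
        'format'    => $query['format'] ?? null,
    ];
}

$src = 'RbHttpHandler.ashx?width=313&height=592&fsize=999000&format=jpg'
     . '&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG%3Fset_id%3D8800005007';

print_r(image_type_hint($src));
```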

Problem three:

If you visit these links, you'll see that the image is produced by a server-side script.

http://cgi.ebay.com/RbHttpHandler.ashx?width=313&height=592&fsize=999000&format=jpg&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24(KGrHqN%2C!jEE2n(iTLozBNwBPG0bUg~~0_1.JPG
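A sketch of reading the handler's `url` query parameter with parse_url and parse_str (which percent-decodes values), on the assumption that this parameter carries the original image address the script works from. `original_image_url` is a hypothetical helper name, and whether i.ebayimg.com will serve that address directly is not something this code can tell you.

```php
<?php
// Sketch: the handler appears to take the original image address in its
// "url" query parameter (an assumption). parse_str() percent-decodes it.

function original_image_url(string $src): ?string
{
    $query = parse_url($src, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string at all
    }
    parse_str($query, $params);
    return isset($params['url']) && is_string($params['url']) ? $params['url'] : null;
}

$src = 'RbHttpHandler.ashx?width=313&height=592&fsize=999000&format=jpg'
     . '&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG%3Fset_id%3D8800005007';

echo original_image_url($src), "\n";
// prints http://i.ebayimg.com/00/$(KGrHqN,!jEE2n(iTLozBNwBPG0bUg~~0_1.JPG?set_id=8800005007
```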

And I can't even connect to either of these two links to see the image:

http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG%3Fset_id%3D8800005007


http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG
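One possible reason these two strings won't open: they are still percent-encoded (they begin with http%3A%2F%2F rather than http://), so a browser does not treat them as http URLs. A quick check with PHP's rawurldecode shows the decoded form; it says nothing about whether eBay's image server will actually serve those addresses.

```php
<?php
// Decode the two percent-encoded strings quoted above back into plain URLs.
$encoded = [
    'http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG%3Fset_id%3D8800005007',
    'http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C%21jEE2n%28iTLozBNwBPG0bUg%7E%7E0_1.JPG',
];

$decoded = array_map('rawurldecode', $encoded);

foreach ($decoded as $url) {
    echo $url, "\n";
}
```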

I guess eBay is making every effort to stop people from scraping their data and to push them toward one of their APIs.
