Perad Posted March 5, 2009

Please... someone... there has got to be a quick way to do this. I just spent 20 minutes on this by hand and made absolutely no progress whatsoever, so a programming solution is required. I have attached the HTML page.

Look at this:

<td colspan="1" rowspan="1" width="50%" align="" valign="top"><br><b><font size="4">Angola 1972 100 Escudos UNC Bank Note Money<br><br>Price= US$9.79</font></b><br> </td><td colspan="1" rowspan="2" width="50%" align="center" valign="top"><table summary="image table" cellpadding="0" cellspacing="0" border="0" width="174" style=""><tr> <td align="center" valign="top"> <table cellpadding="0" cellspacing="0" border="0" width="100%"> <tr> <td align="center"><img width="149" src="images/149_Angola_4.jpg" alt="" style="" border="0" height="140"></td> </tr> <tr> <td align="center"></td> </tr> </table> </td> </tr></table></td> </tr> <tr align="" valign=""> <td colspan="1" rowspan="1" width="50%" align="" valign="top"><FORM action=https://www.paypal.com/cgi-bin/webscr method=post target=paypal><INPUT type=image alt="Make payments with PayPal - it's fast, free and secure!" src="https://www.paypal.com//en_US/i/btn/sc-but-01.gif" border=0 name=submit> <IMG height=1 alt="" src="https://www.paypal.com/en_US/i/scr/pixel.gif" width=1 border=0> <INPUT type=hidden value=1 name=add> <INPUT type=hidden value=_cart name=cmd> <INPUT type=hidden [email protected] name=business> <INPUT type=hidden value="Angola 1972 100 Escudos UNC Bank Note Money" name=item_name P55c? UNC 0?> <INPUT type=hidden value=9.79 name=amount> <INPUT type=hidden value=2 name=no_shipping> <INPUT type=hidden value=1 name=no_note> <INPUT type=hidden value=USD name=currency_code> <INPUT type=hidden value=PP-ShopCartBF name=bn> </FORM><br> </td>

I need to extract the following four pieces of data.

The item name:
Angola 1972 100 Escudos UNC Bank Note Money

The price:
9.79

The whole PayPal form:
<FORM action=https://www.paypal.com/cgi-bin/webscr method=post target=paypal><INPUT type=image alt="Make payments with PayPal - it's fast, free and secure!" src="https://www.paypal.com//en_US/i/btn/sc-but-01.gif" border=0 name=submit> <IMG height=1 alt="" src="https://www.paypal.com/en_US/i/scr/pixel.gif" width=1 border=0> <INPUT type=hidden value=1 name=add> <INPUT type=hidden value=_cart name=cmd> <INPUT type=hidden [email protected] name=business> <INPUT type=hidden value="Angola 1972 100 Escudos UNC Bank Note Money" name=item_name P55c? UNC 0?> <INPUT type=hidden value=9.79 name=amount> <INPUT type=hidden value=2 name=no_shipping> <INPUT type=hidden value=1 name=no_note> <INPUT type=hidden value=USD name=currency_code> <INPUT type=hidden value=PP-ShopCartBF name=bn> </FORM>

The image file name:
149_Angola_4.jpg

As the markup varies somewhat from item to item, it will be tricky to find all of this. Ideally I would like the following to happen:

1) Use regex or some programming voodoo to pull out the above 4 bits of data.
2) Place them in an array.
3) If all 4 cannot be found, add whatever can be found to a fail array for manual insertion.

Honestly, I do not think this is as hard as it looks. The hardest bit seems to be the price, especially later on in the page where there is more than one dollar sign. In that case I think the item would have to be added to the fail array for checking.

I would greatly appreciate it if someone could show me how to go about doing this. I have to do this for several sites, each with different criteria. I was roped into doing this for a friend of a friend. I am not even being paid.

[attachment deleted by admin]

Link to comment https://forums.phpfreaks.com/topic/148105-tear-info-from-html-page/
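One way to approach the request above, sketched in Python rather than PHP (the patterns should carry over to PHP's preg_match largely unchanged). It assumes the page has already been split into one chunk of HTML per product, which the thread does not show how to do. Rather than hunting for a dollar sign, it reads the name and price from the hidden item_name and amount fields of the PayPal form, which sidesteps the multiple-dollar-sign problem whenever the form is present. The names hidden_value, extract_item and sort_items are made up for illustration.

```python
import re

# The whole PayPal form, opening tag to closing tag.
FORM_RE = re.compile(r"<form\b.*?</form>", re.IGNORECASE | re.DOTALL)
# Any single <input ...> tag.
INPUT_RE = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
# The src of an <img>, quoted or unquoted.
IMG_SRC_RE = re.compile(r"<img\b[^>]*?\bsrc\s*=\s*[\"']?([^\"'\s>]+)", re.IGNORECASE)
# A value attribute: double-quoted, single-quoted, or bare.
VALUE_RE = re.compile(r"\bvalue\s*=\s*(?:\"([^\"]*)\"|'([^']*)'|([^\s>]+))", re.IGNORECASE)


def hidden_value(form_html, field):
    """Return the value of the <input> whose name is `field`, or None."""
    name_re = re.compile(
        r"\bname\s*=\s*[\"']?" + re.escape(field) + r"(?![\w-])", re.IGNORECASE
    )
    for tag in INPUT_RE.findall(form_html):
        if name_re.search(tag):
            m = VALUE_RE.search(tag)
            if m:
                return next(g for g in m.groups() if g is not None)
    return None


def extract_item(chunk):
    """Pull name, price, form and image file name out of one product chunk."""
    form_match = FORM_RE.search(chunk)
    form_html = form_match.group(0) if form_match else None
    # Look for the product image outside the form, so the PayPal
    # button and tracking pixel are not picked up by mistake.
    outside = FORM_RE.sub("", chunk)
    img_match = IMG_SRC_RE.search(outside)
    image = img_match.group(1).rsplit("/", 1)[-1] if img_match else None
    return {
        "name": hidden_value(form_html, "item_name") if form_html else None,
        "price": hidden_value(form_html, "amount") if form_html else None,
        "form": form_html,
        "image": image,
    }


def sort_items(chunks):
    """Split chunks into fully extracted items and ones needing manual work."""
    found, failed = [], []
    for chunk in chunks:
        item = extract_item(chunk)
        (found if all(item.values()) else failed).append(item)
    return found, failed
```

Anything that lands in the failed list still carries whatever fields were found, so it can be completed by hand.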
Stephen68 Posted March 5, 2009

Can you change the HTML page? If so, just add tokens around each item and then grab all the information between them. This is how I used to do this kind of stuff back in my Perl days.
Perad Posted March 5, 2009

Thanks for the reply. I would need to do this over a thousand times, which seems very time consuming.

Is there a way to strip out all HTML elements barring the following?

<form></form> and anything in between
<img>

If there was a way to do this I may have a better chance of making this work.
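Stripping everything except the form and image tags can be sketched like this, in Python as an illustration. It is a regex filter, not a real HTML parser, so it assumes no attribute value contains a '>' character; input is kept as well because it is what sits inside the form. The names KEEP and strip_except are hypothetical. In PHP, the built-in strip_tags() with an allowed-tags argument such as '<form><input><img>' does roughly the same job.

```python
import re

# Tags to leave in place; every other tag is removed (its text content stays).
KEEP = ("form", "input", "img")

# Matches any opening or closing tag whose name is NOT in KEEP.
TAG_RE = re.compile(
    r"<(?!/?(?:" + "|".join(KEEP) + r")\b)/?[a-zA-Z][^>]*>",
    re.IGNORECASE,
)


def strip_except(html):
    """Remove all tags except those in KEEP and collapse leftover whitespace."""
    # Replace each removed tag with a space so neighbouring text does not merge.
    stripped = TAG_RE.sub(" ", html)
    return re.sub(r"\s+", " ", stripped).strip()
```

What is left is the visible text plus the form and image markup, which is a much smaller haystack to run the extraction patterns over.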