adamjblakey Posted March 5, 2008

Hi,

How would I go about doing the following? I have a URL like this, e.g. www.website.co.uk/orgs-details.asp?OrgsID=

On this page there is Company Name, Contact:, Tel:, E-mail: and Web site:. I want to extract these details and add them to a table. How would I do this?

This is how the data is shown, if this helps:

<table width="385" border="0" cellspacing="0" cellpadding="0">
  <tr>
    <td valign="top"><h1>Company Name</h1>
      <table width="100%" border="0" cellspacing="0" cellpadding="3">
        <tr>
          <th width="30%"><strong>Contact:</strong></th>
          <td width="70%"><strong>Persons Name </strong> </td>
        </tr>
        <tr>
          <th><strong>Tel:</strong></th>
          <td><strong>000 000 000</strong></td>
        </tr>
        <tr>
          <td> </td>
          <td><span class="small">Information Here</span>.</td>
        </tr>
        <tr>
          <th><strong>E-mail:</strong></th>
          <td><strong><a href="info@website.com">info@website.com</a></strong></td>
        </tr>
        <tr>
          <th><strong>Web site:</strong></th>
          <td><strong><a href="http://www.website.com" target="_blank" id="451" onClick="return trackclick(this.id);" title="Visit Site">www.website.com</a></strong></td>
        </tr>
      </table>

Could something like this, which I have used in the past to extract email addresses, be adapted to work?

for($i=1;$i<$max_val;$i++) {
    $content = file_get_contents('http://www.website.com/slist.php?item='.$i);
    preg_match_all($email_match_regex, $content, $matches);
    if(count($matches[0])) {
        foreach($matches[1] as $index => $value) {
            $insert_id = mysql_query('INSERT INTO.....');
        }
    }
}

Cheers,
Adam

Link to comment: https://forums.phpfreaks.com/topic/94568-extracting-data/
cooldude832 Posted March 5, 2008

Regex or selective tag stripping is the best way to do it.
adamjblakey Posted March 5, 2008 (Author)

Thank you. Have you got any sites or code I can look at to get an idea? I have never used these before.
cooldude832 Posted March 5, 2008

From the looks of it, try stripping all the tags but the <tr> tag, then explode at the <tr> tag and see if you can find a structure in the array.
adamjblakey Posted March 5, 2008 (Author)

How would I go about taking in only that block of data? The data I supplied is part of a full page.
cooldude832 Posted March 5, 2008

The page is probably dynamic, so you can't do it perfectly, but this is the general idea:

<?php
$file = "http://www.website.co.uk/orgs-details.asp?OrgsID=";
// Fetch the raw HTML of the page
$data = file_get_contents($file);
// Remove every tag except <tr> (and its closing tag)
$data = strip_tags($data, "<tr>");
// Split into one array element per table row
$data = explode("<tr", $data);
print_r($data);
?>
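One way to take in only the block of interest before stripping tags is to cut the page down to the text between two marker strings. The following is a minimal sketch, not something posted in the thread: `extract_between()` is a hypothetical helper, the markers assume the block starts at `<table width="385"` as in the sample, and `$page` is a made-up stand-in for the downloaded page. With the nested tables in the real sample, the first `</table>` found is the inner table's closing tag, which still comes after all the data rows.

```php
<?php
// Hypothetical helper: return the part of $html from $start up to and
// including the first $end that follows it, or null if either is missing.
function extract_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from + strlen($end));
}

// Made-up page: the block we want is surrounded by unrelated markup.
$page = '<html><body><div>menu</div>'
      . '<table width="385"><tr><td><h1>Company Name</h1></td></tr></table>'
      . '<div>footer</div></body></html>';

$block = extract_between($page, '<table width="385"', '</table>');

// Same steps as in the post above, applied to the isolated block only.
$rows = explode('<tr', strip_tags($block, '<tr>'));
print_r($rows);
```

Because the surrounding menu and footer markup never reaches `explode()`, the array indexes depend only on the rows inside the block.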
adamjblakey Posted March 5, 2008 (Author)

Thank you for that. I will have a go at extracting the data, and if I have any problems I will post back. Thanks for your help so far.
sasa Posted March 5, 2008

try

<?php
$a = '<table width="385" border="0" cellspacing="0" cellpadding="0">
  <tr>
    <td valign="top"><h1>Company Name</h1>
      <table width="100%" border="0" cellspacing="0" cellpadding="3">
        <tr>
          <th width="30%"><strong>Contact:</strong></th>
          <td width="70%"><strong>Persons Name </strong> </td>
        </tr>
        <tr>
          <th><strong>Tel:</strong></th>
          <td><strong>000 000 000</strong></td>
        </tr>
        <tr>
          <td> </td>
          <td><span class="small">Information Here</span>.</td>
        </tr>
        <tr>
          <th><strong>E-mail:</strong></th>
          <td><strong><a href="info@website.com">info@website.com</a></strong></td>
        </tr>
        <tr>
          <th><strong>Web site:</strong></th>
          <td><strong><a href="http://www.website.com" target="_blank" id="451" onClick="return trackclick(this.id);" title="Visit Site">www.website.com</a></strong></td>
        </tr>
      </table>';
// Capture the quoted address that follows the "E-mail" label
preg_match_all('/E-mail[^"]+"([^@]+@[^"]+)"/',$a,$b);
print_r($b);
?>
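The same regex idea can be extended from the e-mail address to every labelled row. This is only a sketch under assumptions: `extract_fields()` is a hypothetical function, it assumes each label sits in a `<th>` cell directly followed by its value in a `<td>` cell as in the sample markup, and the `Company Name` key is an illustrative choice.

```php
<?php
// Hypothetical helper: map each <th> label to the text of the <td> after it.
function extract_fields(string $html): array
{
    $fields = [];

    // The company name is the only <h1> in the sample block.
    if (preg_match('~<h1[^>]*>(.*?)</h1>~is', $html, $m)) {
        $fields['Company Name'] = trim(strip_tags($m[1]));
    }

    // Rows without a <th> (such as the "Information Here" row) do not match.
    preg_match_all(
        '~<th[^>]*>(.*?)</th>\s*<td[^>]*>(.*?)</td>~is',
        $html,
        $rows,
        PREG_SET_ORDER
    );
    foreach ($rows as $row) {
        $label = rtrim(trim(strip_tags($row[1])), ':'); // "Tel:" becomes "Tel"
        $fields[$label] = trim(strip_tags($row[2]));
    }
    return $fields;
}

// Shortened copy of the sample markup from the first post.
$html = '<h1>Company Name</h1>
<table>
<tr> <th width="30%"><strong>Contact:</strong></th> <td width="70%"><strong>Persons Name </strong> </td> </tr>
<tr> <th><strong>Tel:</strong></th> <td><strong>000 000 000</strong></td> </tr>
<tr> <td> </td> <td><span class="small">Information Here</span>.</td> </tr>
<tr> <th><strong>E-mail:</strong></th> <td><strong><a href="info@website.com">info@website.com</a></strong></td> </tr>
</table>';

$fields = extract_fields($html);
print_r($fields);
```

Keying the result by label rather than by position means a page with an extra or missing row does not shift the other values.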
adamjblakey Posted March 6, 2008 (Author)

Thanks for that, but the only problem is that this is just one table from the page, so I would have to strip everything down to leave just that data.

I have come up with the following to do what I need. Can anyone see anything I need to change?

<?php
for($i=1;$i<3000;$i++) {
    $data = file_get_contents('http://www.website.com?id='.$i);
    $data = strip_tags($data,"<tr>");
    $data = explode("<tr>",$data);
    foreach($data as $index => $value) {
        mysql_query("INSERT INTO `table` (entry, entry2, entry3, entry4, entry5) VALUES ('" . $data[12] . "', '" . $data[13] . "', '" . $data[14] . "', '" . $data[15] . "', '" . $data[16] . "')");
    }
}
?>
cooldude832 Posted March 6, 2008

Well, your foreach loop makes no sense, but otherwise it looks okay.
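The foreach in the code above never uses `$index` or `$value`, so it would run the identical INSERT once for every element of `$data`. A sketch of the loop with one record per page might look like the following; `fetch_page()` is a hypothetical offline stand-in for `file_get_contents()`, the indexes 1 to 3 are illustrative, and the records are collected in an array rather than written to a database.

```php
<?php
// Hypothetical stand-in for file_get_contents() so the sketch runs offline.
function fetch_page(int $id): string
{
    return "<table><tr><td>Company $id</td></tr>"
         . "<tr><td>Contact $id</td></tr>"
         . "<tr><td>Tel $id</td></tr></table>";
}

$records = [];
for ($i = 1; $i < 4; $i++) {
    $data = strip_tags(fetch_page($i), '<tr>');
    $data = explode('<tr>', $data);

    // One record per page: read the wanted elements directly by index.
    // No inner foreach is needed, because nothing varies per element.
    $records[] = [
        trim(strip_tags($data[1])),
        trim(strip_tags($data[2])),
        trim(strip_tags($data[3])),
    ];
}

print_r($records);
```

Each entry of `$records` corresponds to exactly one INSERT. The values would still need escaping, or a prepared statement, before being placed in a query.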
adamjblakey Posted March 6, 2008 (Author)

Would foreach($data as $index) work?
cooldude832 Posted March 6, 2008

Did you print_r($data) after exploding it to see how it looks? Paste it in here so I can take a look at it, or send me a link to it.
adamjblakey Posted March 6, 2008 (Author)

Please see here for an example: www(dot)wishscripts(dot)com/test/test.php
cooldude832 Posted March 6, 2008

Which parts of the array do you want?
sasa Posted March 6, 2008

try

<?php
$a = file_get_contents('http://www.uk-disco.co.uk/org-details.asp?OrgID=503');
// Capture everything between "mailto:" and the "?" that starts the query string
preg_match_all('/mailto:([^\?]+)\?/',$a,$b);
foreach ($b[1] as $email){
    // The address may be written with HTML entities, so decode it
    echo $email=html_entity_decode($email);
}
?>
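The pattern above only matches a mailto link that is followed by a `?` query string. A slightly relaxed variant, shown here as an assumption rather than as what the target page needs, stops at either `?` or the closing quote. The `$html` string is made-up test data with one entity-encoded address.

```php
<?php
// Made-up links: one entity-encoded with a query string, one plain.
$html = '<a href="mailto:info&#64;website.com?subject=Enquiry">E-mail us</a> '
      . '<a href="mailto:sales@website.com">Sales</a>';

// Capture up to the first "?" or closing quote, whichever comes first.
preg_match_all('/mailto:([^?"]+)/', $html, $found);

$emails = [];
foreach ($found[1] as $email) {
    // &#64; is the numeric entity for "@"
    $emails[] = html_entity_decode($email);
}

print_r($emails);
```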
adamjblakey Posted March 6, 2008 (Author)

The bits I need are: 12, 13, 14, 16 and 17
adamjblakey Posted March 7, 2008 (Author)

Any ideas on this one?
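Picking the elements listed above (12, 13, 14, 16 and 17) out of the exploded array could be sketched as follows. `pick_fields()` is a hypothetical helper, and the `$rows` array is generated test data standing in for the real `print_r` output, which is not reproduced in the thread.

```php
<?php
// Hypothetical helper: return the cleaned values at the given indexes,
// or null when the page has fewer rows than expected.
function pick_fields(array $rows, array $indexes): ?array
{
    $picked = [];
    foreach ($indexes as $i) {
        if (!isset($rows[$i])) {
            return null; // layout differs: skip instead of storing misaligned data
        }
        $picked[] = trim(html_entity_decode(strip_tags($rows[$i])));
    }
    return $picked;
}

// Generated stand-in for explode("<tr>", ...): each element ends in </tr>.
$rows = array_map(function ($n) {
    return "row $n</tr>";
}, range(0, 20));

$wanted = [12, 13, 14, 16, 17];
$values = pick_fields($rows, $wanted);
print_r($values);
```

Returning null for short pages lets the calling loop skip IDs that do not exist instead of inserting empty or shifted values.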