benjaminbeazy Posted March 22, 2007

okay, i am trying to data mine a website for information for a dental directory. i need to extract the info from the following piece of code and i'm not quite sure what the best method is.

in a perfect world, per this example, i'd like to be able to extract the info as

first name: Frank H.
last name: Alley
credentials: DDS, FAGD
practice name: Shoreline Family Dental Group
address: 1121 Ottawa Beach Rd Ste 100
city: Holland
state: MI
zip: 49424-2528
phone: (616) 399-9520

<tr>
  <td style="width:25%;vertical-align:top;text-align:center;padding-top:25px"> </td>
  <td style="text-align:left;padding-top:25px">
    <a href="SFdetail93230.html" style="font-weight:bold;text-decoration:underline;">Jacqueline Anderson, DDS, FAGD</a><br />Shoreline Family Dental Group<br />1121 Ottawa Beach Rd Ste 100<br />Holland, MI 49424-2528
  </td>
  <td style="text-align:left;padding-top:25px"> <p>(616) 399-9520</p> </td>
</tr>

not sure whether i'm gonna need ereg or if i can preg_match this, or what i need.

any ideas or suggestions? all help is much appreciated.. thanks guys

Link to comment https://forums.phpfreaks.com/topic/43891-solved-data-mining1-im-stuck-ideas-helpful/
per1os Posted March 22, 2007

<?php
$code = '<tr>
<td style="width:25%;vertical-align:top;text-align:center;padding-top:25px"> </td>
<td style="text-align:left;padding-top:25px">
<a href="SFdetail93230.html" style="font-weight:bold;text-decoration:underline;">Jacqueline Anderson, DDS, FAGD</a><br />Shoreline Family Dental Group<br />1121 Ottawa Beach Rd Ste 100<br />Holland, MI 49424-2528 </td>
<td style="text-align:left;padding-top:25px"> <p>(616) 399-9520</p> </td>
</tr>';

// keep everything after the link's opening tag, up to the end of that cell
list(,$after) = split('style="font-weight:bold;text-decoration:underline;">', $code);
list($before) = split('</td>', $after);

// the cell holds name, group, address and city/state/zip separated by <br />
list($name, $group, $address, $citystate) = split('<br />', $before);
$name = str_replace('</a>', "", $name);

// name and credentials are comma separated
list($name, $cred, $cred2) = split(",", $name);
$cred = $cred . ", " . $cred2;

list($fname, $lname) = split(" ", $name);
list($city, $state, $zip) = split(" ", $citystate);

print $fname . " " . $lname . " " . $cred . " " . $city . " " . $state . " " . $zip . " " . $address . " " . $group;
?>

Should work.
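A note for anyone running this on a current PHP: split() was deprecated in PHP 5.3 and removed in PHP 7, so the code above only runs on older versions. The first step (pull out the name cell and the phone number) can be sketched with preg_match and preg_split, which are still available. This is only a sketch under assumptions: the shortened $row string (inline styles removed) and the variable names $cell, $tel and $parts are made up for illustration and are not from the original posts.

```php
<?php
// Assumption: the sample row from the thread, with the inline styles trimmed.
$row = '<tr><td> </td><td><a href="SFdetail93230.html">Jacqueline Anderson, DDS, FAGD</a>'
     . '<br />Shoreline Family Dental Group<br />1121 Ottawa Beach Rd Ste 100'
     . '<br />Holland, MI 49424-2528 </td><td><p>(616) 399-9520</p></td></tr>';

// Capture everything between the SFdetail link's opening tag and the cell's </td>.
preg_match('#<a href="SFdetail[^"]*"[^>]*>(.*?)</td>#s', $row, $cell);

// Capture the phone number inside <p>...</p>.
preg_match('#<p>(.*?)</p>#s', $row, $tel);

// Treat "</a><br />" and "<br />" as separators, then trim each piece.
$parts = array_map('trim', preg_split('#(?:</a>)?<br\s*/?>#', $cell[1]));
$phone = trim($tel[1]);

print_r($parts);   // name + credentials, group, address, city/state/zip
echo $phone, "\n";
```

The "#" delimiter is used so the "/" characters in the HTML do not need escaping.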
benjaminbeazy Posted March 22, 2007

sorry, i gave the wrong info, i had a different example in mind. need to extract as...

first name: Jacqueline A. <= not actually in the code, but the middle initial sometimes shows up, so i need that too
last name: Anderson
credentials: DDS, FAGD
practice name: Shoreline Family Dental Group
address: 1121 Ottawa Beach Rd Ste 100
city: Holland
state: MI
zip: 49424-2528
phone: (616) 399-9520
per1os Posted March 22, 2007

I am sure you can manipulate my code to adjust. It is pretty straightforward.
benjaminbeazy Posted March 22, 2007

thanks a lot, i'll do my best and let you know what i come up with...
benjaminbeazy Posted March 22, 2007

okay, one more question, i'm drawing a blank. if i want to grab an entire page with multiple records, how do i separate each record to run my extraction on?

i know what i want it to look for... find "SFdetail" and grab until the next occurrence of "</tr>", then for each of these, run the extraction.

thanks
benjaminbeazy Posted March 22, 2007

there are 25 records per page. can i do another list, or is there something easier, like a loop i can run?
per1os Posted March 22, 2007

$sfDetailArr = split("SFdetail", $page); // puts it all into an array.

Then foreach through it and run your parsing.
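The "find SFdetail and grab until the next </tr>" idea can also be written directly as a pattern. A rough sketch, assuming a current PHP where split() is no longer available; the two-record $page string below is a made-up stand-in for the downloaded page, and $found and $records are illustrative names:

```php
<?php
// Hypothetical two-record page standing in for the real download.
$page = '<table>'
      . '<tr><td><a href="SFdetail1.html">A One, DDS</a><br />1 Main St<br />Holland, MI 49424</td>'
      . '<td><p>(616) 555-0101</p></td></tr>'
      . '<tr><td><a href="SFdetail2.html">B Two, DDS</a><br />2 Main St<br />Ithaca, MI 48847</td>'
      . '<td><p>(989) 555-0102</p></td></tr>'
      . '</table>';

// Each match starts at "SFdetail" and runs lazily to the nearest "</tr>".
// The "s" modifier lets "." cross line breaks in the real page.
preg_match_all('#SFdetail.*?</tr>#s', $page, $found);
$records = $found[0];

foreach ($records as $i => $record) {
    // The per-record extraction would go here; this sketch only reports the size.
    echo $i, ': ', strlen($record), " bytes\n";
}
```

One difference from splitting on "SFdetail": the split puts everything before the first record (page header, menus) into the first array element, while the pattern returns only the records themselves.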
benjaminbeazy Posted March 22, 2007

still not there yet. i'm using preg_match to see whether 2 or 3 breaks occur in $before, which tells me whether or not a group name is present

preg_match("/<br /", $before, $matches);
echo "matches = count($matches)<br><br>"; <= this outputs: matches = count(Array)
if(count($matches) == 2){
    list($name, $address, $citystate) = split('<br />', $before);
}elseif(count($matches) == 3){
    list($name, $group, $address, $citystate) = split('<br />', $before);
}

already tried

preg_match('<br />', $before, $matches);
preg_match("<br />", $before, $matches);
preg_match("/<br />/", $before, $matches);
$pattern = "<br />"; preg_match($pattern, $before, $matches);

with various syntactical errors
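Three separate things seem to be going on in that snippet. preg_match stops at the first match, so $matches never holds one entry per break. In "/<br />/" the second "/" ends the pattern, leaving ">/" to be read as modifiers. And function calls such as count() are not expanded inside a double-quoted string. A minimal sketch of counting the breaks, using the $before value shown in a later post; $breaks, $breaksRegex and $m are illustrative names:

```php
<?php
// $before as printed later in the thread (a record without a practice name).
$before = 'Michele Allen, DDS</a><br />3012 Niles Rd<br />Saint Joseph, MI 49085-8608';

// Plain substring count: no regular expression is needed for a fixed string.
$breaks = substr_count($before, '<br />');

// Regex alternative: "#" as the delimiter, so the "/" in "<br />" needs no escaping.
// preg_match_all returns the number of full matches.
$breaksRegex = preg_match_all('#<br />#', $before, $m);

// Concatenate instead of calling count() inside the quotes.
echo 'matches = ' . $breaks . "\n";

if ($breaks == 2) {
    echo "no practice name\n";
} elseif ($breaks == 3) {
    echo "practice name present\n";
}
```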
benjaminbeazy Posted March 22, 2007

here is the whole code as of now

<?php
// GET THE FILE FROM URL
$page = file_get_contents('URL'); // <= this is working

// NEXT WE HAVE TO SEPARATE EACH OF THE ENTRIES
// SOMETHING LIKE FIND 'SFdetail' THEN GO TO 2ND '</tr>'
$sfDetailArr = split("SFdetail", $page); // puts it all into an array

foreach($sfDetailArr as $key => $code){
    // FOR EACH OF THESE OCCURRENCES RUN FOLLOWING
    // EXTRACT THE 2 PIECES OF CODE TO MINE
    list(,$after) = split('style="font-weight:bold;text-decoration:underline;">', $code);
    list($before, $phone) = split('</td>', $after);
    $before = htmlspecialchars($before);
    $phone = htmlspecialchars($phone);
    echo "before = $before<br><br>";
    echo "phone1 = $phone<br><br>";
    list(,$after1) = split('<p>', $phone);
    list($before1) = split('</p>', $after1);
    $phone = str_replace(' ', " ", $before1);
    echo "phone = $phone<br><br>";

    // CHECK HOW MANY BREAKS THERE ARE TO DETERMINE IF PRACTICE NAME IS PRESENT
    $pattern = "<br />";
    preg_match("/<br /", $before, $matches);
    echo "matches = count($matches)<br><br>";
    if(count($matches) == 2){
        list($name, $address, $citystate) = split('<br />', $before);
    }elseif(count($matches) == 3){
        list($name, $group, $address, $citystate) = split('<br />', $before);
    }
    $name = str_replace('</a>', "", $name);

    // CHECK HOW MANY , THERE ARE TO DETERMINE HOW MANY CREDS ARE PRESENT
    $pattern = ',';
    preg_match("/,/", $name, $matches);
    if(count($matches) == 1){
        list($name, $cred) = split(",", $name);
    }elseif(count($matches) == 2){
        list($name, $cred, $cred2) = split(",", $name);
        $cred = $cred . ", " . $cred2;
    }

    // NAME SPLIT, CHECK IF MI EXISTS AND DO SPLIT BASED ON THAT
    $pattern = " ";
    preg_match("/ /", $name, $matches);
    if(count($matches) == 2){
        list($fname, $lname) = split(" ", $name);
    }elseif(count($matches) == 3){
        list($fname, $lname) = split(".", $name);
    }
    list($city, $state, $zip) = split(" ", $citystate);
    echo "$fname<br>$lname<br>$cred<br>$group<br>$address<br>$city<br>$state<br>$zip<br>$phone<br><br>";
    echo "<br><br><hr>";
}
?>
benjaminbeazy Posted March 22, 2007

which is giving me this:

before = Michele Allen, DDS</a><br />3012 Niles Rd<br />Saint Joseph, MI 49085-8608
phone1 = <td style="text-align:left;padding-top:25px"> <p>(269) 429-2555</p>
phone =
matches = count(Array)

with different info for each, of course
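The empty "phone =" line looks like a side effect of the order of operations in the code above: htmlspecialchars() runs on $phone before the search for "<p>", and after escaping there is no literal "<p>" left to find. A small sketch of the difference, assuming the fragment shown in the output; $phoneCell, $escaped and $m are illustrative names:

```php
<?php
// Fragment as it looks before any escaping (taken from the output above).
$phoneCell = '<td style="text-align:left;padding-top:25px"> <p>(269) 429-2555</p>';

// Escaping first turns "<p>" into "&lt;p&gt;", so a later search for "<p>" fails.
$escaped = htmlspecialchars($phoneCell);
var_dump(strpos($escaped, '<p>'));   // bool(false)

// Parse the raw markup, then escape only the value that is being displayed.
preg_match('#<p>(.*?)</p>#s', $phoneCell, $m);
$phone = trim($m[1]);
echo 'phone = ' . htmlspecialchars($phone) . "\n";
```

Keeping the raw markup for parsing and escaping only at echo time still lets the debug output show the tags in the browser.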
per1os Posted March 22, 2007

I never liked using preg_match, as I was never good with regular expressions, especially Perl's. That, and doing it all with split lets me go through it step by step.

If you post examples of each scenario I can provide you code for each without too much extra work.
benjaminbeazy Posted March 22, 2007

any record can be a combination of these scenarios, hence my preg_match check

scenario 1: no group name

<tr>
  <td style="width:25%;vertical-align:top;text-align:center;padding-top:25px"> </td>
  <td style="text-align:left;padding-top:25px">
    <a href="SFdetail92203.html" style="font-weight:bold;text-decoration:underline;">James Anderson, DDS</a><br />921 N Pine River St<br />Ithaca, MI 48847-1119
  </td>
  <td style="text-align:left;padding-top:25px"> <p>(989) 875-4721</p> </td>
</tr>

scenario 2: group name

<tr>
  <td style="width:25%;vertical-align:top;text-align:center;padding-top:25px"> </td>
  <td style="text-align:left;padding-top:25px">
    <a href="SFdetail93230.html" style="font-weight:bold;text-decoration:underline;">Jacqueline Anderson, DDS</a><br />Shoreline Family Dental Group<br />1121 Ottawa Beach Rd Ste 100<br />Holland, MI 49424-2528
  </td>
  <td style="text-align:left;padding-top:25px"> <p>(616) 399-9520</p> </td>
</tr>

scenario 3: M.I. present (group name also present)

<tr>
  <td style="width:25%;vertical-align:top;text-align:center;padding-top:25px"> </td>
  <td style="text-align:left;padding-top:25px">
    <a href="SFdetail93230.html" style="font-weight:bold;text-decoration:underline;">Jacqueline A. Anderson, DDS</a><br />Shoreline Family Dental Group<br />1121 Ottawa Beach Rd Ste 100<br />Holland, MI 49424-2528
  </td>
  <td style="text-align:left;padding-top:25px"> <p>(616) 399-9520</p> </td>
</tr>

scenario 4: 2 credentials (also group name and M.I.)

<tr>
  <td style="width:25%;vertical-align:top;text-align:center;padding-top:25px"> </td>
  <td style="text-align:left;padding-top:25px">
    <a href="SFdetail93230.html" style="font-weight:bold;text-decoration:underline;">Jacqueline A. Anderson, DDS, ABC</a><br />Shoreline Family Dental Group<br />1121 Ottawa Beach Rd Ste 100<br />Holland, MI 49424-2528
  </td>
  <td style="text-align:left;padding-top:25px"> <p>(616) 399-9520</p> </td>
</tr>

i hope that's what you're looking for. also, i really want to thank you for your help thus far
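All four scenarios can probably be handled by one routine if the pieces are taken from the ends inward instead of by fixed position. The sketch below is one possible approach, not the method used in the thread: parseRecord() and its array keys are hypothetical names, it assumes PHP 7 or later, and it assumes the input is the text between the link's ">" and the closing "</td>" (what the thread calls $before).

```php
<?php
// Sketch: turn the text between the link's ">" and "</td>" into named fields.
// parseRecord() and its array keys are hypothetical names, not from the thread.
function parseRecord(string $before): array
{
    // "</a><br />" and "<br />" both act as separators.
    $lines = array_map('trim', preg_split('#(?:</a>)?<br\s*/?>#', $before));

    // Work from the ends: last piece is city/state/zip, the one before is the
    // address, the first is the name. Anything left over is the practice name.
    $cityStateZip = array_pop($lines);
    $address      = array_pop($lines);
    $nameAndCreds = array_shift($lines);
    $group        = $lines ? $lines[0] : '';

    // Everything after the first comma is credentials, however many there are.
    $bits  = array_map('trim', explode(',', $nameAndCreds));
    $name  = array_shift($bits);
    $creds = implode(', ', $bits);

    // Last word is the surname; first name plus any middle initial stay together.
    $words = preg_split('/\s+/', $name);
    $last  = array_pop($words);
    $first = implode(' ', $words);

    // "Holland, MI 49424-2528" -> city, state, zip. The city may contain spaces.
    preg_match('/^(.*),\s*([A-Z]{2})\s+(\S+)$/', $cityStateZip, $m);

    return [
        'first' => $first, 'last' => $last, 'credentials' => $creds,
        'group' => $group, 'address' => $address,
        'city' => $m[1] ?? '', 'state' => $m[2] ?? '', 'zip' => $m[3] ?? '',
    ];
}

// Scenario 4 from the post above: middle initial, two credentials, group name.
$r = parseRecord('Jacqueline A. Anderson, DDS, ABC</a><br />Shoreline Family Dental Group'
    . '<br />1121 Ottawa Beach Rd Ste 100<br />Holland, MI 49424-2528 ');
print_r($r);
```

Matching the city up to the last comma, instead of splitting on spaces, is meant to keep two-word cities such as Saint Joseph in one piece.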
benjaminbeazy Posted March 23, 2007

got it

had to use preg_split instead of preg_match to get the array the way i wanted it, and had to change my counting scheme a lil, plus some other mild mods. now i just have to weed out some of the junk..

anyway, here's the completed code in case anyone is interested or has problems like this in the future

thanks a lot Frost for your help, 'tis much appreciated

<?php
// GET THE FILE FROM URL
$page = file_get_contents('URL');

// NEXT WE HAVE TO SEPARATE EACH OF THE ENTRIES
// SOMETHING LIKE FIND 'SFdetail' THEN GO TO 2ND '</tr>'
$sfDetailArr = split("SFdetail", $page); // puts it all into an array

foreach($sfDetailArr as $key => $code){
    // FOR EACH OF THESE OCCURRENCES RUN FOLLOWING
    // EXTRACT THE 2 PIECES OF CODE TO MINE
    list(,$after) = split('style="font-weight:bold;text-decoration:underline;">', $code);
    list($before, $phone) = split('</td>', $after);
    list(,$after1) = split('<p>', $phone);
    list($before1) = split('</p>', $after1);
    $phone = str_replace(' ', " ", $before1);

    // CHECK HOW MANY BREAKS THERE ARE TO DETERMINE IF PRACTICE NAME IS PRESENT
    $matches = preg_split('<br />', $before);
    if(count($matches) == 3){
        list($name, $address, $citystate) = split('<br />', $before);
    }elseif(count($matches) == 4){
        list($name, $group, $address, $citystate) = split('<br />', $before);
    }
    $name = str_replace('</a>', "", $name);

    // CHECK HOW MANY , THERE ARE TO DETERMINE HOW MANY CREDS ARE PRESENT
    $matches = preg_split("/,/", $name);
    if(count($matches) == 2){
        list($name, $cred) = split(",", $name);
    }elseif(count($matches) == 3){
        list($name, $cred, $cred2) = split(",", $name);
        $cred = $cred . ", " . $cred2;
    }

    // NAME SPLIT, CHECK IF MI EXISTS AND DO SPLIT BASED ON THAT
    $matches = preg_split('/ /', $name);
    if(count($matches) == 2){
        list($fname, $lname) = split(' ', $name);
    }elseif(count($matches) == 3){
        list($fname, $mi, $lname) = split(' ', $name);
    }
    list($city, $state, $zip) = split(" ", $citystate);
    $city = substr($city, 0, -1);

    echo "first = $fname<br>";
    echo "middle = $mi<br>";
    echo "last = $lname<br>";
    echo "cred = $cred<br>";
    echo "group = $group<br>";
    echo "address = $address<br>";
    echo "city = $city<br>";
    echo "state = $state<br>";
    echo "zip = $zip<br>";
    echo "phone = $phone<br>";
    echo "<hr>";
}
?>
per1os Posted March 23, 2007

Glad I could get you rolling. Seems like you got it; let us know if you need anything else.
This topic is now archived and is closed to further replies.