CodeMama Posted April 6, 2009

I am trying to run a restaurant inspection query on our local inspection db and then store the results in my own db but I am having problems with syntax with the breaking up of data I get. My script does grab the page content and displays the separate paragraphs, each paragraph is one inspection, but now I need to be able to store Restaurant name, date of inspection, and inspection results in separate fields in my own db (is this making sense?). I have been reading over cURL and different methods but I keep running into syntax problems I guess. Here is my script so far:

<?php
$TESTING = TRUE;
$target_url = "http://www.springfieldmo.gov/health/database/foodinspections/index.jsp?st_pfx=none&current_name=&start_day=1&end_year=2009&start_month=1&st_nmbr=&end_month=4&end_day=6&Submit=Search&st_name=&start_year=2009&str_loc=none&offset=0";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 100);
$html = curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number:" .curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
echo $html;
$graphs = split("<p", $html);
// Start at 6 to clear out junk at top. Use $i+1 since last paragraph
// is footnote that is not needed.
for ($i = 6; $i+1 < count($graphs); $i++) {
    if($TESTING) echo "$i: $graphs[$i]<br />";
    //split the paragraphs into lines
    $graphs->getAttribute('graphs');
    $lines = split("<br", $graphs);
    //for ($i = 1; $i+1 < count($lines); $i++) {
    // Grab restaurant name
    if($TESTING) echo "$i: $lines[$i]<br />"; <-----------------------this isn't really doing anything I don't think....
}
// Grab address
// Grab city
// Grab date and visit type
// Grab rest of text and store it. Grab numbers of violations?
}
/**/
// grab all the paragraphs on the page
$xpath = new DOMXPath($dom);
//$graphs = $xpath->evaluate("/html/body//p");
$graphs=$dom->getElementsByTagName("p");
// Set $i =5 because first 5 paragraphs are not inspections
for ($i = 5; $i+1 < $graphs->length; $i++) {
    $paragraph = $graphs->item($i);
    $text = $dom->saveXML($paragraph);
    $text = trim($text);
    if($TESTING) echo "<br />$i Graph: " . $text . "<br />";
}
?>

thanks in advance for any solution help.

Link to comment: https://forums.phpfreaks.com/topic/152840-help-with-formatting-data-after-it-is-scraped/
Axeia Posted April 6, 2009

If you're grabbing the source alright, wouldn't the easiest way be to manipulate it to extract the information from it? For example, create a DOMDocument object out of it and then you can use functions similar to those of JavaScript to reach the node (paragraph) you're after. http://php.net/dom
Maq Posted April 6, 2009

"For example create a domdocument resource out of it and then you can use functions similar to those of javascript to reach the node (paragraph) you're after. http://php.net/dom"

Can't you see she's already using DOM? And I don't know what you mean by similar to javascript...?

"but now I need to be able to store Restaurant name, date of inspection, and inspection results in separate fields in my own db (is this making sense)"

What part are you having trouble with? Inserting into the database or grabbing the specific information? Try and get the appropriate information displaying first, you may have to read about XPath. I don't see where you even try to insert into the database, I know you know how to do it, I've helped you with MySQL questions before.
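To make the XPath suggestion concrete, here is a minimal sketch of selecting only the inspection paragraphs with `DOMXPath` instead of skipping a fixed number of `<p>` elements. The sample markup, the restaurant names, and the assumption that each inspection paragraph wraps its text in a `<font>` tag are all hypothetical; the real page would need to be checked before relying on any of it.

```php
<?php
// Hypothetical sample markup; the structure of the real page is an assumption.
$html = <<<HTML
<html><body>
<p>Page header, not an inspection</p>
<p><font>Sample Diner<br>123 Main St<br>04/01/2009 Routine inspection<br>No violations noted.</font></p>
<p><font>Example Cafe<br>456 Oak Ave<br>04/02/2009 Reinspection<br>Two violations corrected.</font></p>
<p>Footnote, not an inspection</p>
</body></html>
HTML;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select only the paragraphs that contain a <font> child (assumed marker
// for an inspection entry), so header and footnote paragraphs drop out.
$paragraphs = $xpath->query('//p[font]');

$names = array();
foreach ($paragraphs as $p) {
    // The first text node inside <font> is assumed to be the restaurant name.
    $names[] = trim($xpath->evaluate('string(font/text()[1])', $p));
}

print_r($names);
```

Filtering on structure rather than on a hard-coded offset should keep working if the number of junk paragraphs at the top of the page changes, though that depends on the `<font>` assumption holding.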
Axeia Posted April 6, 2009

"Can't you see she's already using DOM? And I don't know what you mean by similar to javascript...?"

My bad, new to these boards and failed to notice the scrollbar in the code field. Assumed wrongly by the way the question was asked that she tried to get part of the source via cURL directly. (Comparison with JavaScript was made as I assume most web developers are familiar with it, and manipulating the DOM with it makes you feel at home in the PHP DOM quite fast.. or at least I did.)

If you're having problems with extracting the information out of it, the way I'd probably attempt it would be to get the text node of the font tag inside the paragraphs and do an $arrRestaurantInfo = explode( '<br/>', $obtainedResult );. That should give you all of the text in it as a nice array. Once that's done it's simply a matter of doing a print_r( $arrRestaurantInfo ); so you know which part ended up where. If I'm not mistaken it should be like this:

<?php
$arrRestaurantInfo[0]; //Restaurant name
$arrRestaurantInfo[1]; //Address
$arrRestaurantInfo[2]; //Date + string "inspection"
//Extract the date by getting the part of the string before the first space.
$date = substr( $arrRestaurantInfo[2], 0, strpos( $arrRestaurantInfo[2], ' ' ) );
//Same as above, but requires a very recent PHP version (5.3.0).
//The third argument must be true to get the part before the space.
//$date = strstr( $arrRestaurantInfo[2], ' ', true );
?>

Didn't test anything, so copy pasting might not be a good idea.
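A hedged sketch of the splitting idea above: a DOM text node never contains the `<br>` tags themselves, so rather than calling `explode()` on markup, this variant walks the child nodes of the `<font>` element and collects each text node as one line. The sample paragraph and the line order (name, address, date plus visit type) are assumptions taken from the post, not from the real page.

```php
<?php
// Hypothetical paragraph; the real markup and line order are assumptions.
$html = '<p><font>Sample Diner<br>123 Main St<br>'
      . '04/01/2009 Routine inspection<br>No violations noted.</font></p>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$font = $dom->getElementsByTagName('font')->item(0);

// Each text node between <br> elements becomes one array entry.
$arrRestaurantInfo = array();
foreach ($font->childNodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE) {
        $line = trim($node->nodeValue);
        if ($line !== '') {
            $arrRestaurantInfo[] = $line;
        }
    }
}

// The date is assumed to be everything before the first space on line 3.
$dateLine = $arrRestaurantInfo[2];
$spacePos = strpos($dateLine, ' ');
$date = ($spacePos === false) ? $dateLine : substr($dateLine, 0, $spacePos);

print_r($arrRestaurantInfo);
echo $date, "\n";
```

Running `print_r()` on the array first, as suggested in the post, is still the quickest way to confirm which line ended up at which index before storing anything.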
Maq Posted April 6, 2009

"My bad, new to these boards and failed to notice the scrollbar in the code field. Assumed wrongly by the way the question was asked that she tried to get part of the source via CURL directly. (Comparison with javascript was made as I assume most webdevelopers are familiar with it, and manipulating the DOM with it makes you feel at home in the PHP DOM quite fast.. or at least I did.)"

No biggie, I wasn't trying to be rude, sorry if that's what you thought. Welcome to the boards! As far as JS, I assume you're referring to the OOP and the similar class methods it provides. In which case yes, you could say that if you've used the DOM in JS then it would be a fairly easy transition to the PHP DOM.