rofl90 (Author), posted June 11, 2008

Still the error:

    ~<div id="ATTRACTION_REVIEW" class="listing">([.\n]+)</div><!--/ ATTRACTION_REVIEW\.listing-->~i

I've tested everything in here; it is definitely the regex.

DarkWater, posted June 11, 2008

Try:

    ~<div id="ATTRACTION_REVIEW" class="listing">(.+)</div><!--/ ATTRACTION_REVIEW\.listing-->~is

Added the 's' flag, which makes the . match newlines as well.

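Why the first pattern failed and the second one works can be checked in a few lines. Inside a character class, `.` is a literal dot, so `[.\n]+` only matches runs of dots and newlines; outside a class, `.` skips newlines unless the `s` (PCRE_DOTALL) modifier is set. The sample HTML below is made up for illustration.

```php
<?php
// Made-up sample: the listing body spans several lines.
$html = "<div id=\"ATTRACTION_REVIEW\" class=\"listing\">\nLine one.\nLine two.\n</div><!--/ ATTRACTION_REVIEW.listing-->";

$open  = '<div id="ATTRACTION_REVIEW" class="listing">';
$close = '</div><!--/ ATTRACTION_REVIEW\.listing-->';

// Original attempt: the class only allows literal dots and newlines.
$charClass = preg_match('~' . $open . '([.\n]+)' . $close . '~i', $html);

// Plain dot without the s modifier: cannot cross a newline.
$noDotAll = preg_match('~' . $open . '(.+)' . $close . '~i', $html);

// With the s modifier the dot also matches newlines.
$dotAll = preg_match('~' . $open . '(.+)' . $close . '~is', $html, $match);

var_dump($charClass, $noDotAll, $dotAll);
```

Note that `(.+)` is greedy: if a page ever contained the closing marker twice, the capture would run to the last one. The later post in this thread switches to the lazy `(.*?)`, which stops at the first.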
rofl90 (Author), posted June 11, 2008

It works! Yay, thank you so much!

DarkWater, posted June 11, 2008

Any time. Please click Solved (bottom-left) when you get the chance. =)

rofl90 (Author), posted June 11, 2008

Wait, now it doesn't work. It did it for that one, but well... I'll explain: there are about 5,000,000 pages, and about 100,000 of them are attractions, which are the ones I want. So it needs to be able to differentiate between them and, when it finds an attraction, send it to MySQL. It just won't load now.

Code:

    mysql_connect($CONF["HOST"], $CONF["USERNAME"], $CONF["PASSWORD"]);
    mysql_select_db($CONF["DATABASE"]);
    $first = 100000;
    while($first < 1500000) {
        $fileName = "http://www.x.com/x-g--d" . $first . ".html";
        $file = file_get_contents($fileName);
        $match = array();
        if(preg_match('~<div id="ATTRACTION_REVIEW" class="listing">(.+)</div><!--/ ATTRACTION_REVIEW\.listing-->~is', $file, $match)) {
            $result = mysql_query("INSERT INTO pages (linkName, theText) VALUES('$fileName', '$match[1]')") or die(mysql_error()); //important
            if($result) {
                echo "Node " . $first . ": Success";
                $first++;
            } else {
                echo "Failure\n\n";
            }
        } else {
            echo "Node" . $first . ": Failed";
        }
    }
    ?>

DarkWater, posted June 11, 2008

Add set_time_limit(1000); as your first line. PHP times out at 30 seconds by default, and this could be a REALLY long script. You may need to extend it further, and/or change the Apache Timeout directive as well.

rofl90 (Author), posted June 11, 2008

Hmm, OK, I'll see if I can run it across a few servers. It does work with just one, so thanks, bye!

rofl90 (Author), posted June 12, 2008

Hmm, I ran this all night and it didn't do a single one. Here's the code:

    <?php
    set_time_limit(1000000000000);
    $CONF = array();
    $CONF["DATABASE"] = 'x';
    $CONF["USERNAME"] = 'x';
    $CONF["PASSWORD"] = 'x';
    $CONF["HOST"] = 'x';
    mysql_connect($CONF["HOST"], $CONF["USERNAME"], $CONF["PASSWORD"]);
    mysql_select_db($CONF["DATABASE"]);
    $first = 5000;
    while($first < 1500000) {
        $fileName = "http://www.x.com/x-g--d" . $first . ".html";
        $file = file_get_contents($fileName);
        $match = array();
        if(preg_match('~<div id="ATTRACTION_REVIEW" class="listing">(.+)</div><!--/ ATTRACTION_REVIEW\.listing-->~is', $file, $match)) {
            $result = mysql_query("INSERT INTO pages (linkName, theText) VALUES('$fileName', '$match[1]')") or die(mysql_error()); //important
            if($result) {
                echo "Node " . $first . ": Success";
                ?>
                <script type="text/javascript">
                d = document.getElementById("d");
                d.innerHTML = <?php echo $first; ?>;
                </script>
                <?php
                $first++;
            } else {
                echo "Failure\n\n";
            }
        } else {
            echo "Node" . $first . ": Failed";
        }
    }
    ?>

Although it does work on a single page.

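One thing worth noting about the loop above: `$first++` only runs in the success branch, so the first page that is not an attraction is requested again and again, and the script never moves on. A minimal sketch of a loop whose counter advances on every path follows. The names `extractListing` and `crawl`, and the stubbed `$fetch` callback, are hypothetical and only for illustration; they are not part of the thread's code.

```php
<?php
// Hypothetical helper: return the listing body, or null if the page has none.
function extractListing(string $html): ?string
{
    $pattern = '~<div id="ATTRACTION_REVIEW" class="listing">(.+?)</div><!--/ ATTRACTION_REVIEW\.listing-->~is';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Hypothetical driver: $fetch takes a node id and returns HTML or false.
function crawl(int $from, int $to, callable $fetch): array
{
    $results = [];
    for ($id = $from; $id < $to; $id++) {   // the counter advances on every path
        $html = $fetch($id);
        if ($html === false) {
            $results[$id] = 'unreachable';
            continue;
        }
        $results[$id] = extractListing($html) === null ? 'not an attraction' : 'attraction';
    }
    return $results;
}

// Stub standing in for file_get_contents() so the sketch runs offline.
$pages = [
    1 => '<div id="ATTRACTION_REVIEW" class="listing">Museum</div><!--/ ATTRACTION_REVIEW.listing-->',
    2 => '<html>something else</html>',
];
$fetch = function (int $id) use ($pages) {
    return $pages[$id] ?? false;
};

$results = crawl(1, 4, $fetch);
print_r($results);
```

In a real run, `$fetch` would wrap `file_get_contents()` on the page URL, and the database insert would go where the 'attraction' status is recorded.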
rofl90 (Author), posted June 12, 2008

Running it on a different page from the one that worked, it gives me this SQL error:

    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 's permanent collection spans the period from about 1250 to 1900 and consists of ' at line 1

Buddski, posted June 12, 2008

Having a stab in the dark here, but you are getting that error because the text inside your DIV has unescaped quote marks, which is, I think, killing the query. Try this:

    $div_data = mysql_real_escape_string($match[1]);
    $result = mysql_query("INSERT INTO pages (linkName, theText) VALUES('$fileName', '$div_data')") or die(mysql_error());

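For readers on current PHP: the `mysql_*` functions used in this thread were removed in PHP 7.0. The same quoting problem can be avoided with a prepared statement, where the text never becomes part of the SQL string. The sketch below swaps in PDO with an in-memory SQLite database so it runs without a server; that is an assumption for illustration (it needs the pdo_sqlite extension), and the sample sentence is made up to resemble the text in the error message.

```php
<?php
// Assumption: pdo_sqlite is available; SQLite stands in for the MySQL server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE pages (linkName TEXT, theText TEXT)');

$fileName = 'http://www.x.com/x-g--d5000.html';
// Made-up text containing an apostrophe, like the one that broke the query.
$text = "The museum's permanent collection spans the period from about 1250 to 1900";

// Placeholders keep the data separate from the SQL, so no manual escaping is needed.
$stmt = $pdo->prepare('INSERT INTO pages (linkName, theText) VALUES (?, ?)');
$stmt->execute([$fileName, $text]);

$stored = $pdo->query('SELECT theText FROM pages')->fetchColumn();
echo $stored, "\n";
```

Against MySQL only the DSN passed to `new PDO(...)` would change; the prepare and execute calls stay the same.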
rofl90 (Author), posted June 12, 2008

Heh, you were right. Now this script is still taking forever; how long should it be taking?

rofl90 (Author), posted June 12, 2008

I mean, I know this will take a while, but are we talking days... or years?

Buddski, posted June 12, 2008

Well... You're using remote files, I'm assuming, which means your server has to open every one of the HTML files in the while loop and fetch its contents, so it all depends on the number of pages. If you want a good estimate of the time, limit it to 10 or so pages and see how long that takes. If that takes 10 seconds, you could ALMOST assume that 300 records are going to take 5 minutes.

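The extrapolation described above is simple arithmetic and can be written down as a tiny helper. `estimateSeconds` is a hypothetical name, and the 10-pages-in-10-seconds figure is just the example from the post, not a measurement. The page count of 1,495,000 comes from the loop bounds in the earlier script (5000 up to 1500000).

```php
<?php
// Extrapolate total run time from a timed sample, assuming every page costs
// about the same (network latency makes this only a rough guide).
function estimateSeconds(int $samplePages, float $sampleSeconds, int $totalPages): float
{
    return $sampleSeconds / $samplePages * $totalPages;
}

// The example from the post: 10 pages in 10 seconds, scaled to 300 pages.
$fiveMinutes = estimateSeconds(10, 10.0, 300);

// The same rate applied to the full range of the script, converted to days.
$fullRunDays = estimateSeconds(10, 10.0, 1500000 - 5000) / 86400;

printf("300 pages: %.0f s, full run: %.1f days\n", $fiveMinutes, $fullRunDays);
```

At one second per page the full range would take a little over 17 days, so under that assumption the answer to "days or years" is days, though quite a few of them.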
rofl90 (Author), posted June 12, 2008

OK, I've got it working. Is there any way to stop it from completing the entire script before outputting, so that I can see in real time whether each node is a success, a failure, or not an attraction?

Buddski, posted June 12, 2008

Not really... You could, I suppose, write the result of each 'node' to a text file.

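A minimal sketch of the log-file idea: append one line per node as soon as its result is known, so the file can be checked while the script is still running. `logNode` is a hypothetical helper, and the temporary file path is only there to keep the example self-contained.

```php
<?php
// Hypothetical helper: append one status line per node and fail loudly
// if the write does not succeed.
function logNode(string $logPath, int $node, string $status): void
{
    $line = sprintf("Node %d: %s\n", $node, $status);
    // FILE_APPEND adds to the end of the file; LOCK_EX guards against
    // interleaved writes if several copies of the script run at once.
    if (file_put_contents($logPath, $line, FILE_APPEND | LOCK_EX) === false) {
        throw new RuntimeException("Unable to write to $logPath");
    }
}

// A temporary file stands in for tracking_info.txt in this sketch.
$logPath = tempnam(sys_get_temp_dir(), 'track');
logNode($logPath, 5000, 'Success');
logNode($logPath, 5001, 'Not an attraction');

echo file_get_contents($logPath);
```

Because each call opens, writes, and closes the file, every line is on disk immediately rather than waiting for the script to finish.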
rofl90 (Author), posted June 12, 2008

I've done this but it won't update. :S

Buddski, posted June 12, 2008

The text file? Show us your code for writing to the text file.

rofl90 (Author), posted June 12, 2008

From $handle downwards:

    $handle = fopen("c:\my_prog\tracking_info.txt", "a+") or die("Unable to open file");
    while($first < 999999) {
        $fileName = "http://www.x.com/x-g--d" . $first . ".html";
        if($file = file_get_contents($fileName)) {
            $match = array();
            if(preg_match('~<div id="ATTRACTION_REVIEW" class="listing">(.*?)</div><!--/ ATTRACTION_REVIEW\.listing-->~is', $file, $match)) {
                $match[1] = mysql_real_escape_string($match[1]);
                $result = mysql_query("INSERT INTO pages (linkName, theText) VALUES('$fileName', '$match[1]')") or die(mysql_error()); //important
                if($result) {
                    fwrite($handle, "Node " . $first . ": Success\n\n");
                    $first++;
                } else {
                    fwrite($handle, "Node" . $first . ": Failed\n\n>");
                }
            } else {
                fwrite($handle, "Node" . $first . ": Not an attraction\n\n");
                $first++;
            }
        } else {
            $first++;
            fwrite($handle, "Could not open Node" . $first . "\n\n");
        }
    }
    ?>

rofl90 (Author), posted June 12, 2008

It gives me the "Unable to open file" error.

Buddski, posted June 12, 2008

    $handle = fopen("c:\\my_prog\\tracking_info.txt", "a+") or die("Unable to open file");

From PHP.net: On the Windows platform, be careful to escape any backslashes used in the path to the file, or use forward slashes.

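What goes wrong with the unescaped path can be seen directly: in a double-quoted PHP string `\t` is the tab escape, so the `\t` in `\tracking_info.txt` silently turns into a tab character, while `\m` is not a recognised escape and is left alone. The comparison below is a small illustration using the path from the thread.

```php
<?php
$broken  = "c:\my_prog\tracking_info.txt";   // "\t" becomes a tab character
$escaped = "c:\\my_prog\\tracking_info.txt"; // each "\\" is one literal backslash
$single  = 'c:\my_prog\tracking_info.txt';   // single quotes do not interpret \t
$forward = 'c:/my_prog/tracking_info.txt';   // forward slashes also work on Windows

$hasTab = strpos($broken, "\t") !== false;

var_dump($hasTab);                            // the broken path contains a tab
var_dump($escaped === $single);               // both spell the intended path
var_dump(strlen($broken), strlen($escaped));  // the broken one is a character short
```

Single-quoted strings or forward slashes are usually the least error-prone way to write Windows paths in PHP.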
rofl90 (Author), posted June 12, 2008

I'm using a relative path, so say it's running on example.com/find_all.php, the file is example.com/tracking_info.txt. Here's the line:

    $handle = fopen("tracking_info.txt", "a+") or die("Unable to open file");

Buddski, posted June 12, 2008

Do you have the correct permissions on that file, so that you are allowed to open and write to it?

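A quick way to check the permission question from inside PHP is sketched below. `diagnoseLogFile` is a hypothetical helper, not something from the thread. It also builds the path from `__DIR__`, on the assumption that the log file is meant to sit next to the script; a bare relative name such as "tracking_info.txt" is resolved against the current working directory, which is not guaranteed to be the script's directory.

```php
<?php
// Hypothetical diagnostic: report why opening $path for appending might fail.
function diagnoseLogFile(string $path): string
{
    if (file_exists($path)) {
        return is_writable($path) ? 'ok' : 'file exists but is not writable';
    }
    // fopen() in "a+" mode creates a missing file, which needs a writable directory.
    return is_writable(dirname($path)) ? 'ok' : 'directory is not writable';
}

// Assumption: the log file lives in the same directory as this script.
$status = diagnoseLogFile(__DIR__ . '/tracking_info.txt');
echo "tracking_info.txt: $status\n";
```

If the result is not 'ok', the account the web server runs under needs write access to the file or to its directory.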