hellonoko

Members
  • Posts: 213

Everything posted by hellonoko

  1. Alright, I see how that is supposed to work, but it doesn't, because nothing gets evaluated at all: mysql_num_rows() isn't returning anything. Period.
  2. I have the simple query below that compares a URL to a list of URLs in a DB. If the input is found, $rows = 1 and the code evaluates correctly. However, if the input is not found, mysql_num_rows() returns nothing: not 0 rows, not NULL, and my 'else' branch never runs. So I can only seem to evaluate to TRUE, not to FALSE, but I need both. How do I do this properly so I can evaluate both ways?

     $query = mysql_query("SELECT * FROM `secondarylinks` WHERE `link` = '$last_url' && `scraped` = '0' LIMIT 1") or die(mysql_error());
     $rows = mysql_num_rows($query) or die(mysql_error());
     if ( $rows === 1 ) {
         echo 'found row';
     } else {
         echo 'not found';
     }
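One likely explanation, sketched without a database: `or` binds more loosely than `=`, so `$rows = mysql_num_rows($query) or die(mysql_error());` runs `die()` whenever the count is 0, and because there is no MySQL error at that point the message is empty. `fake_num_rows()` and `fake_die()` below are hypothetical stand-ins used only for illustration; they are not real MySQL functions.

```php
<?php
// Hypothetical stand-in for mysql_num_rows(): returns how many rows matched.
function fake_num_rows(array $matches): int
{
    return count($matches);
}

// Hypothetical stand-in for die(): records the call instead of ending the script.
function fake_die(string $message, array &$calls): bool
{
    $calls[] = $message;
    return true;
}

$calls = [];

// Parsed as ($rowsNone = 0) or fake_die(...): 0 is falsy, so the right side runs.
$rowsNone = fake_num_rows([]) or fake_die('', $calls);

// Parsed as ($rowsOne = 1) or fake_die(...): 1 is truthy, so the right side is skipped.
$rowsOne = fake_num_rows(['some row']) or fake_die('', $calls);

// Without "or die()", a plain comparison handles both outcomes.
$rows = fake_num_rows([]);
$message = ($rows === 1) ? 'found row' : 'not found';

echo $message, "\n";
echo 'die() would have run ', count($calls), " time(s)\n";
```

If this is indeed the cause, dropping `or die(mysql_error())` from the `mysql_num_rows()` line (and keeping it only on `mysql_query()`) should let the `else` branch run.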
  3. Thanks. I dunno what the problem ended up being but I rewrote the page and got it going.
  4. Might have been. I rewrote it and got it working... sorta.
  5. I see, but don't I want to be a bit more exact, since URLs could be very similar but different? Still not sure what is happening, as the code below prints neither 'link in DB!' nor 'no such link'.

     //echo $last_cralwed_link = "this.bigstereo.net/wp-content/uploads/2009/03/tomorrow-wow-remix.mp3";
     echo $last_crawled_link = "thisisnotinthedb";
     $query = mysql_query("SELECT * FROM `primarylinks` WHERE `link` LIKE '%$last_crawled_link%' LIMIT 1") or die(mysql_error());
     $rows = mysql_num_rows($query) or die(mysql_error());
     //
     if ( $rows == 0) {
         echo ' no such link';
     } else {
         echo ' link in DB!';
     }
  6. Well... That returns 'link in DB!', and that's good. But above, where I echo $last_crawled_link for the hell of it, it is blank. So I dunno how it's working if the variable is empty... What does % do around a variable in a SQL statement?
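On the `%` question: in a `LIKE` pattern, `%` matches any run of characters (including none) and `_` matches exactly one character, so `'%$last_crawled_link%'` with an empty variable becomes `'%%'`, which matches every row. `like_match()` below is a hypothetical pure-PHP imitation for illustration only; the real matching is done by MySQL and also depends on the column's collation.

```php
<?php
// Hypothetical helper that imitates SQL LIKE: % = any run of characters, _ = one character.
function like_match(string $pattern, string $value): bool
{
    $regex = '';
    foreach (str_split($pattern) as $char) {
        if ($char === '%') {
            $regex .= '.*';
        } elseif ($char === '_') {
            $regex .= '.';
        } else {
            // Everything else must match literally.
            $regex .= preg_quote($char, '/');
        }
    }
    // The s flag lets "." match newlines; the anchors force a whole-string match.
    return preg_match('/^' . $regex . '$/s', $value) === 1;
}

$stored = 'this.bigstereo.net/wp-content/uploads/2009/03/tomorrow-wow-remix.mp3';
$empty = '';

var_dump(like_match('%' . $empty . '%', $stored));      // empty variable: pattern is '%%'
var_dump(like_match('%thisisnotinthedb%', $stored));    // substring that is not present
var_dump(like_match('%tomorrow-wow-remix%', $stored));  // substring that is present
var_dump(like_match($stored, $stored));                 // no wildcards: exact match only
```

This would explain why an empty variable still reports a hit, and why a plain `=` comparison is the stricter choice when the full URL is known.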
  7. Well, it's crawling music blogs. The links will eventually be used to copy mp3s. I worked through my code a little deeper and found that at this line: echo $last_crawled_link = mysql_real_escape_string($last_crawled_link) or die(mysql_error()); the variable just goes away. That is, mysql_real_escape_string turns it from a URL into an empty string. Any ideas?
  8. My code below compares a link to a DB full of links. If the link is already in the DB, it displays the appropriate response. As it stands, both links (first two lines) are in the DB. If "test" is used, my code works fine. However, when I use the real URL I get no output. No errors. Is there something I need to be doing when I am handling URLs in and outside of my DB? Any ideas?

     //$last_cralwed_link = "this.bigstereo.net/wp-content/uploads/2009/03/tomorrow-wow-remix.mp3";
     $last_crawled_link = "test";
     $query = mysql_query("SELECT * FROM `primarylinks` WHERE `link` = '$last_crawled_link' LIMIT 1") or die(mysql_error());
     $rows = mysql_num_rows($query) or die(mysql_error());
     //
     if ( $rows == 0) {
         echo ' no such link';
     } else {
         echo ' link in DB!';
     }
  9. Errors: Notice: Undefined variable: list_links in /home2/sharingi/public_html/scrape/url_scraperV2.php on line 21 Once Notice: Undefined variable: list_links in /home2/sharingi/public_html/scrape/url_scraperV2.php on line 58 Many times. MySQL server has gone away At end. Full code: <?php ini_set ("display_errors", "1"); error_reporting(E_ALL); mysql_connect("localhost","sharingi_ian","***")or die ("Could not connect to database"); mysql_select_db("sharingi_scrape") or die ("Could not select database"); //$target_url = "http://empreintes-digitales.fr"; //$target_url = 'http://redthreat.wordpress.com/'; //$target_url= 'http://www.kissatlanta.com/blog/'; //$target_url= 'http://www.empreintes-digitales.fr/'; //$target_url = 'http://electrorash.com/'; $target_url = 'http://this.bigstereo.net/'; $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)'; // crawl first page $clean_links = crawl_page( $target_url, $userAgent, $list_links); // seperates links into links that are direct mp3 links and other links. 
// foreach($clean_links as $key => $value) { if( strpos( $value, ".mp3") !== FALSE) { $mp3_links[] = $value; } else { $other_links[] = $value; } } $mp3_links = array_values($mp3_links); $other_links = array_values($other_links); foreach ($mp3_links as $link) { echo $link.'<br>'; } echo '<br>'; foreach ($other_links as $link) { echo $link.'<br>'; } /////// crawls second layer of links foreach ($other_links as $link) { $clean_links = crawl_page( $link , $userAgent, $list_links); foreach($clean_links as $key => $value) { if( strpos( $value, ".mp3") !== FALSE) { $mp3_links[] = $value; } else { $other_links[] = $value; } } $mp3_links = array_values($mp3_links); $other_links = array_values($other_links); } foreach ($mp3_links as $link) { echo $link.'<br>'; if ($link != NULL) { $exists = mysql_query("SELECT * FROM `links` WHERE link = '".mysql_real_escape_string($link)."' LIMIT 1") or die(mysql_error()); $rows = mysql_num_rows($exists); if ( $rows == 0) { $type = "mp3"; $query = "INSERT INTO links (`link`, `type`) VALUES ('".mysql_real_escape_string($link)."' ,'".mysql_real_escape_string($type)."' )"; if ($result = mysql_query($query)) { $link_count = $link_count + 1; //echo "<b>link added to db</b>"; //echo "<br>"; } } } } echo '<br>'; foreach ($other_links as $link) { $type = "link"; echo $link.'<br>'; if (mysql_num_rows(mysql_query("SELECT * FROM `links` WHERE link = '$link' LIMIT 1")) == 0) { $query = "INSERT INTO links ( `link` , `type` ) VALUES ('$link' , '$type' )"; if ($result = mysql_query($query)) { $link_count = $link_count + 1; //echo "<b>link added to db</b>"; //echo "<br>"; } } } echo $links_count; function crawl_page( $target_url, $userAgent, $links) { $ch = curl_init(); curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); curl_setopt($ch, CURLOPT_URL,$target_url); curl_setopt($ch, CURLOPT_FAILONERROR, false); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_AUTOREFERER, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER,true); 
curl_setopt($ch, CURLOPT_TIMEOUT, 100); $html = curl_exec($ch); if (!$html) { echo "<br />cURL error number:" .curl_errno($ch); echo "<br />cURL error:" . curl_error($ch); exit; } // // load scrapped data into the DOM // $dom = new DOMDocument(); @$dom->loadHTML($html); // // get only LINKS from the DOM with XPath // $xpath = new DOMXPath($dom); $hrefs = $xpath->evaluate("/html/body//a"); // // go through all the links and store to db or whatever // for ($i = 0; $i < $hrefs->length; $i++) { $href = $hrefs->item($i); $url = $href->getAttribute('href'); //if the $url does not contain the web site base address: http://www.thesite.com/ then add it onto the front $clean_link = checkURL( $url, $target_url); $clean_link = str_replace( "http://" , "" , $clean_link); $clean_link = str_replace( "//" , "/" , $clean_link); $links[] = $clean_link; //removes empty array values foreach($links as $key => $value) { if($value == "") { unset($links[$key]); } } $links = array_values($links); } return $links; } function checkURL($url, $target_url) { if ( strpos($url, ".mp3") !== FALSE ) { if ( strpos($url , "http") === FALSE ) { //echo 'FIXED: '; $url = $target_url."/".$url; //echo '<br><br>'; return $url; } return $url; } $pos = strpos($url , $target_url); if ( $pos === FALSE ) { if ( strpos($url , "http") === FALSE ) { //echo 'FIXED: '; $url = $target_url."/".$url; //echo '<br><br>'; return $url; } } else { //echo 'COMPLETE: '.$url; //echo '<br><br>'; return $url; } } ?>
  10. I am receiving errors when I try to put an array of scraped URLs into a DB. Errors:

      this.bigstereo.net/wp-content/uploads/2009/03/tomorrow-wow-remix.mp3
      Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home2/sharingi/public_html/scrape/url_scraperV2.php on line 82
      this.bigstereo.net/wp-content/uploads/2009/03/01 Counterpoint 1.mp3
      Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home2/sharingi/public_html/scrape/url_scraperV2.php on line 82
      this.bigstereo.net/wp-content/uploads/2009/03/Lips (Spruce Lee Inner Jungle Mix).mp3
      Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home2/sharingi/public_html/scrape/url_scraperV2.php on line 82

      I was having similar problems with links that contained ' or ", but I cleaned up my query with mysql_real_escape_string() and it was working perfectly gathering links from another site. I can't see what the problem is with this. Any suggestions? Line 82 is:

      $rows = mysql_num_rows($exists);

      Thanks

      foreach ($mp3_links as $link) {
          echo $link.'<br>';
          if ($link != NULL) {
              $exists = mysql_query("SELECT * FROM `links` WHERE link = '".mysql_real_escape_string($link)."' LIMIT 1");
              $rows = mysql_num_rows($exists);
              if ( $rows == 0) {
                  $type = "mp3";
                  $query = "INSERT INTO links (`link`, `type`) VALUES ('".mysql_real_escape_string($link)."' ,'".mysql_real_escape_string($type)."' )";
                  if ($result = mysql_query($query)) {
                      $link_count = $link_count + 1;
                      //echo "<b>link added to db</b>";
                      //echo "<br>";
                  }
              }
          }
      }
  11. Hell yeah! Thanks. I will read up on this mysql_real_escape_string thing.
  12. Here:

      if ($link != NULL) {
          $exists = mysql_query("SELECT * FROM `links` WHERE link = '$link' LIMIT 1");
          $rows = mysql_num_rows($exists);
          if ( $rows == 0) {
              $type = "mp3";
              $query = "INSERT INTO links (`link`, `type`) VALUES ('$link' ,'$type' )";
              if ($result = mysql_query($query)) {
                  $link_count = $link_count + 1;
                  //echo "<b>link added to db</b>";
                  //echo "<br>";
              }
          }
      }

      It only errors on links that contain ' and possibly ".
  13. Well, there are two instances of it, but yes, it is mysql_num_rows() that is giving the error. I was able to make it mostly work by cleaning up my query using ` `. But now I can see that it still errors on links when they have ' or " in the names. Examples:

      rednicko.com/080923/Klaxons-Gravity'sRainbow(Guns'N'BombsFreakoutRemix).mp3
      Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home2/sharingi/public_html/scrape/url_scraperV2.php on line 76
      rednicko.com/080923/GhostfaceKiller-CharlieBrown(Guns'N'BombsRemix).mp3
      Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home2/sharingi/public_html/scrape/url_scraperV2.php on line 76
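The apostrophes are probably what break the query: inside a single-quoted SQL string a raw `'` ends the string early, `mysql_query()` then returns `false`, and passing that `false` to `mysql_num_rows()` raises the warning. The sketch below only builds the SQL text to make the difference visible. `quote_for_demo()` is a hypothetical helper that doubles single quotes (the standard SQL way to embed one in a literal); it is not a substitute for `mysql_real_escape_string()` or for prepared statements.

```php
<?php
// Hypothetical demo helper: doubles single quotes so they stay inside the SQL string literal.
function quote_for_demo(string $value): string
{
    return str_replace("'", "''", $value);
}

$link = "rednicko.com/080923/Klaxons-Gravity'sRainbow(Guns'N'BombsFreakoutRemix).mp3";

// Raw interpolation: the apostrophes in $link end the literal early.
$rawSql = "SELECT * FROM `links` WHERE link = '$link' LIMIT 1";

// Escaped value: every apostrophe in $link is doubled and stays part of the literal.
$safeSql = "SELECT * FROM `links` WHERE link = '" . quote_for_demo($link) . "' LIMIT 1";

echo $rawSql, "\n";
echo $safeSql, "\n";

// An odd number of single quotes means some string literal is unbalanced.
$rawIsBalanced = substr_count($rawSql, "'") % 2 === 0;
$safeIsBalanced = substr_count($safeSql, "'") % 2 === 0;
var_dump($rawIsBalanced, $safeIsBalanced);
```

Checking whether the query result is `false` before calling `mysql_num_rows()` would also surface the real SQL error instead of the warning.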
  14. My below code crawls through a blog and the inserts the found links into my database. However I am receiving the following error for each time I try to insert a link: Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home2/sharingi/public_html/scrape/url_scraperV2.php on line 74 Line 74 compares the link to be inserted with existing rows to avoid duplicates. foreach ($mp3_links as $link) { echo $link.'<br>'; $query = mysql_query("SELECT * FROM links WHERE link=$link LIMIT 1"); $rows = mysql_num_rows($query); if ( $rows == 0) { $query = "INSERT INTO links (link) VALUES ('$link')"; if ($result = mysql_query($query)) { $link_count = $link_count + 1; //echo "<b>link added to db</b>"; //echo "<br>"; } } } I am also noticing that even with this error about 1200 rows are inserted when it should be just about 600. This code worked fine in another version of the page any idea what I am doing wrong? Thanks <?php mysql_connect("localhost","sharingi_ian","*****")or die ("Could not connect to database"); mysql_select_db("sharingi_scrape") or die ("Could not select database"); //$target_url = "http://empreintes-digitales.fr"; $target_url = 'http://redthreat.wordpress.com/'; //$target_url= 'http://www.kissatlanta.com/blog/'; //$target_url= 'http://www.empreintes-digitales.fr/'; $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)'; // crawl first page $clean_links = crawl_page( $target_url, $userAgent, $list_links); // seperates links into links that are direct mp3 links and other links. 
// foreach($clean_links as $key => $value) { if( strpos( $value, ".mp3") !== FALSE) { $mp3_links[] = $value; } else { $other_links[] = $value; } } $mp3_links = array_values($mp3_links); $other_links = array_values($other_links); foreach ($mp3_links as $link) { echo $link.'<br>'; } echo '<br>'; foreach ($other_links as $link) { echo $link.'<br>'; } /////// crawls second layer of links foreach ($other_links as $link) { $clean_links = crawl_page( $link , $userAgent, $list_links); foreach($clean_links as $key => $value) { if( strpos( $value, ".mp3") !== FALSE) { $mp3_links[] = $value; } else { $other_links[] = $value; } } $mp3_links = array_values($mp3_links); $other_links = array_values($other_links); } foreach ($mp3_links as $link) { echo $link.'<br>'; $query = mysql_query("SELECT * FROM links WHERE link=$link LIMIT 1"); $rows = mysql_num_rows($query); if ( $rows == 0) { $query = "INSERT INTO links (link) VALUES ('$link')"; if ($result = mysql_query($query)) { $link_count = $link_count + 1; //echo "<b>link added to db</b>"; //echo "<br>"; } } } echo '<br>'; foreach ($other_links as $link) { echo $link.'<br>'; if (mysql_num_rows(mysql_query("SELECT * FROM links WHERE link=$link LIMIT 1")) == 0) { $query = "INSERT INTO links (link) VALUES ('$link')"; if ($result = mysql_query($query)) { $link_count = $link_count + 1; } } } echo $links_count; function crawl_page( $target_url, $userAgent, $links) { $ch = curl_init(); curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); curl_setopt($ch, CURLOPT_URL,$target_url); curl_setopt($ch, CURLOPT_FAILONERROR, false); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_AUTOREFERER, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER,true); curl_setopt($ch, CURLOPT_TIMEOUT, 100); $html = curl_exec($ch); if (!$html) { echo "<br />cURL error number:" .curl_errno($ch); echo "<br />cURL error:" . 
curl_error($ch); exit; } // // load scrapped data into the DOM // $dom = new DOMDocument(); @$dom->loadHTML($html); // // get only LINKS from the DOM with XPath // $xpath = new DOMXPath($dom); $hrefs = $xpath->evaluate("/html/body//a"); // // go through all the links and store to db or whatever // for ($i = 0; $i < $hrefs->length; $i++) { $href = $hrefs->item($i); $url = $href->getAttribute('href'); //if the $url does not contain the web site base address: http://www.thesite.com/ then add it onto the front $clean_link = checkURL( $url, $target_url); $clean_link = str_replace( "http://" , "" , $clean_link); $clean_link = str_replace( "//" , "/" , $clean_link); $links[] = $clean_link; //removes empty array values foreach($links as $key => $value) { if($value == "") { unset($links[$key]); } } $links = array_values($links); } return $links; } function checkURL($url, $target_url) { if ( strpos($url, ".mp3") !== FALSE ) { if ( strpos($url , "http") === FALSE ) { //echo 'FIXED: '; $url = $target_url."/".$url; //echo '<br><br>'; return $url; } return $url; } $pos = strpos($url , $target_url); if ( $pos === FALSE ) { if ( strpos($url , "http") === FALSE ) { //echo 'FIXED: '; $url = $target_url."/".$url; //echo '<br><br>'; return $url; } } else { //echo 'COMPLETE: '.$url; //echo '<br><br>'; return $url; } } ?>
  15. Thanks, but that only returns the $target_url, so all the links are exactly the same. $target_url is the base URL of the links I am going through, but maybe I can figure it out from here.
  16. My code below loops retrieved URLs into an array while displaying them. The URLs display correctly, so I know my function is working. However, I must be storing them to the array or trying to display them wrong, because that part of my code does not work. What am I doing wrong here?

      for ($i = 0; $i < $hrefs->length; $i++) {
          $href = $hrefs->item($i);
          $url = $href->getAttribute('href');
          echo '<b>'.$links = $clean_link = checkURL( $url, $target_url).'<b>';
          echo '<br>';
      }
      echo count($links);
      foreach ($links as $link) {
          echo $link;
          echo '<br>';
      }
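A likely cause: `$links = ...` overwrites `$links` with a single string on every pass, so after the loop it is not an array at all. Appending with `$links[] = ...` keeps every URL. In this sketch `$hrefs` is a plain array of strings rather than a DOM node list, and `checkURL()` is a simplified stand-in for the original function; both are assumptions made to keep the example self-contained.

```php
<?php
// Simplified stand-in for the poster's checkURL(): prefixes relative URLs with the base URL.
function checkURL(string $url, string $target_url): string
{
    if (strpos($url, 'http') === false) {
        return rtrim($target_url, '/') . '/' . ltrim($url, '/');
    }
    return $url;
}

$target_url = 'http://this.bigstereo.net/';
$hrefs = ['about.html', 'http://this.bigstereo.net/page/2', '/uploads/song.mp3'];

$links = [];                              // start with an empty array
foreach ($hrefs as $url) {
    $clean_link = checkURL($url, $target_url);
    $links[] = $clean_link;               // append instead of overwriting
    echo '<b>' . $clean_link . '</b><br>' . "\n";
}

echo count($links), "\n";
foreach ($links as $link) {
    echo $link, '<br>', "\n";
}
```

Keeping the assignment on its own line also avoids mixing it into the `echo` concatenation, where the trailing `'<b>'` ends up stored as part of the value.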
  17. Should not be any odd permission things going on. Server glitch?
  18. Arhmmm... Thanks. I might need to stop looking at this for a day or something I think I got it working how I want now thanks for your help.
  19. I finally got it to work, but not how it should. If I use strpos( $url , "http" ); it works. However, if I use strpos ($url, $target_url); it's always false, like you said, because it's not comparing correctly. Any ideas on that one?
  20. To be more specific:

      function checkURL($url, $target_url) {
          echo $url.'<br>';
          echo $target_url.'<br>';
          echo gettype($url).'<br>';
          echo gettype($target_url).'<br>';
          echo '<b>';
          echo $pos = strpos($url , $target_url);
          echo '</b>';
      }

      Returns:

      http://empreintes-digitales.fr/board/register.php
      http://www.empreintes-digitales.fr
      string
      string
      http://empreintes-digitales.fr/board/login.php?action=forget
      http://www.empreintes-digitales.fr
      string
      string
      #
      http://www.empreintes-digitales.fr
      string
      string
      http://66.102.9.104/translate_c?hl=fr&sl=fr&tl=en&u=www.empreintes-digitales.fr/index.php
      http://www.empreintes-digitales.fr
      string
      string

      And on and on. Nothing from $pos = strpos()
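Two things seem to combine here: `strpos()` returns `false` when the needle is not found, and `echo false` prints an empty string, so nothing shows up. In the sample output the needle is `http://www.empreintes-digitales.fr`, but the first URLs have no `www.`, so they cannot match. One possible workaround, sketched with the hypothetical helpers `bare_host()` and `same_site()`, compares host names with `parse_url()` after dropping a leading `www.`.

```php
<?php
// Hypothetical helper: lower-cased host name without a leading "www.", or '' if there is no host.
function bare_host(string $url): string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return '';
    }
    return preg_replace('/^www\./', '', strtolower($host));
}

// Hypothetical helper: true when both URLs point at the same host, ignoring "www.".
function same_site(string $url, string $target_url): bool
{
    $host = bare_host($url);
    return $host !== '' && $host === bare_host($target_url);
}

$target_url = 'http://www.empreintes-digitales.fr';
$url = 'http://empreintes-digitales.fr/board/register.php';

$pos = strpos($url, $target_url);
var_dump($pos);                           // shows the strict false that echo hides

var_dump(same_site($url, $target_url));   // same host apart from "www."
var_dump(same_site('#', $target_url));    // a bare fragment has no host
var_dump(same_site('http://66.102.9.104/translate_c?hl=fr', $target_url));
```

Using `var_dump()` instead of `echo` while debugging makes `false` visible, and `=== false` remains the right test for a missing needle because a match at position 0 is also falsy.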
  21. I understand that, but no matter how I change it around it echoes nothing. But if I take that section of code and put it in its own page, it works fine. Just not inside the function.
  22. I have tried that, but it's not so much that as... On line 74, where I echo $pos = strpos();, it echoes nothing. The function returns empty?
  23. If you look in the source you can see it actually uses a JavaScript call when the Buy Now button is clicked to do the adding to the cart.

      <table cellspacing="0" cellpadding="0" onclick="javascript: document.orderform_72_1220992491.submit();" class="ButtonTable">
      <tr><td><img src="/store/skin1/images/but1.gif" class="ButtonSide" alt="" /></td><td class="Button"><font class="Button">Buy Now</font></td><td><img src="/store/skin1/images/but2.gif" class="ButtonSide" alt="" /></td></tr>
      </table>

      I think...