Vladinator Posted December 12, 2006

Hello there, PHP Freaks community! This is my first post here and my first request for help on the board. I plan to use this forum as a place to both ask for and offer help, so it will be nice working with you. ;)

Now, for my issue. I am trying to make a .php script that reads a site, gets data from it (the data is laid out in a repeating, algorithm-like pattern) and shows it. For example, take a look at the source of this site: http://www.proxy4free.com/page1.html - At line 203 there is a pattern that repeats itself several times.

[code]
<?php
// Get the content
$lines = file('http://www.proxy4free.com/page1.html');

// Test 1
/*
$i='203';
$n=($i+2);
while($i < $n){
    $n_B = ($n-1);
    $str_1 = str_replace("<td>", "", $lines[$i]);
    $str_2 = str_replace("</td>", "", $str_1);
    $str_3 = str_replace("\n", ":", $str_2);
    $str_4 = preg_replace("([:])","X$i",$str_3);
    $str_5 = str_replace("X$n_B", "", $str_4);
    $str_6 = str_replace("X$i", ":", $str_5);
    echo $str_6;
    $i++;
}
*/

// Test 2
$i='210';
$n=($i+2);
while($i < $n){
    $n_B = ($n-1);
    $str_1 = str_replace("<td>", "", $lines[$i]);
    $str_2 = str_replace("</td>", "", $str_1);
    $str_3 = str_replace("\n", ":", $str_2);
    $str_4 = preg_replace("([:])","X$i",$str_3);
    $str_5 = str_replace("X$n_B", "", $str_4);
    $str_6 = str_replace("X$i", ":", $str_5);
    echo $str_6;
    $i++;
}
?>
[/code]

This is what I have made. I am totally new at handling stuff like this: I have NEVER used a script to access a site and pull data from it, and I am not sure how to make it work as it should. I tried, but I could only make it read from a specific line, then get the IP and port number and list them nicely with a : between them. It took me ages, and I would like to ask for tips, hints and help from more experienced people. I hope to hear from you in the near future. ;)

NB: Just uncomment the first block and comment out the second one, then you will see what happens. :P Rather newbie way if you ask me.
:P
c4onastick Posted December 13, 2006

First, let me be the first to welcome you to this community. I've found it very helpful, probably the best one I've ever been a part of.

The idea you have with the code above will work great. It's a little too much work (in my opinion, and when it comes to scripting... I'm lazy). With preg_match_all() you can pull out all the data you want in one (foul|fowl) swoop. (Haha! ... a little regex joke that sounded... a lot better... in my head...)

[code]
// file() returns an array of lines, but preg_match_all() needs one string
$html = implode('', $lines);

preg_match_all(
    '/<td>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})<\/td>\s+<td>(\d{1,5})<\/td>\s+<td>(transparent|anonymous|high anonymity)<\/td>\s+<td>([A-Za-z ]*)<\/td>\s+<td>(\d{4}-\d\d-\d\d)<\/td>/i',
    $html,
    $matches,
    PREG_SET_ORDER
);
[/code]

It's pretty ugly, I know, but this will fill $matches with an array that looks like this:

[code]
Array (
    Array (
        'Whole first match, everything in the little <td> chunk',
        'IP',
        'Port',
        'transparent, anonymous, etc.',
        'Country',
        'date'
    ),
    Array (
        'Whole second match',
        'IP',
        'Port',
        'transparent, anonymous, etc.',
        'Country',
        'date'
    ),
    ...
)
[/code]

(Once you have the IP in the array, you can reconstruct the 'Whois' link; that's why I didn't bother capturing it.)

I use methods like this all the time for scraping data off websites. You may also (depending on your server) want to look into using the curl library for getting webpage data.
Generally it's more robust for this type of thing than file() or file_get_contents(): you get control over timeouts and error handling, and it doesn't depend on allow_url_fopen being enabled.

With curl, your script would look like this:

[code]
$ch = curl_init();
// Optionally set a timeout
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_URL, 'http://www.proxy4free.com/page1.html');
// Lets you assign the result of the URL call to a variable, instead of dumping it to the screen
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
// curl_exec() returns false on failure, so stop before running the regex
if ($output === false) {
    die('curl error: ' . curl_error($ch));
}
curl_close($ch);

// Pull out the data we want
preg_match_all(
    '/<td>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})<\/td>\s+<td>(\d{1,5})<\/td>\s+<td>(transparent|anonymous|high anonymity)<\/td>\s+<td>([A-Za-z ]*)<\/td>\s+<td>(\d{4}-\d\d-\d\d)<\/td>/i',
    $output,
    $matches,
    PREG_SET_ORDER
);

foreach($matches as $match){
    echo $match[1]."\t"; // The IP address
    echo $match[2]."\t"; // The port
    echo $match[3]."\t"; // Transparent... etc.
    echo $match[4]."\t"; // Country
    echo $match[5]."\n"; // Date
}
[/code]

Give that a shot and see how you like it.

A good regex tester: http://regexlib.com/RETester.aspx
Curl in PHP: http://us3.php.net/manual/en/ref.curl.php

Welcome to the community!

Kudos on the Smiley backhanding IE.
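For anyone who wants to try the pattern without hitting the site, here is a minimal sketch that runs it against a hard-coded fragment and prints IP:port pairs, which is the output the original question was after. The sample rows, the $html and $pattern variables, and the extract_proxies() helper are all made up for illustration. The pattern is a slightly condensed form of the one above, and it assumes every <td> sits on its own line; the real page layout may differ.

```php
<?php
// Hypothetical sample markup, shaped like the rows the regex in this thread expects.
$html = <<<'HTML'
<tr>
<td>10.0.0.1</td>
<td>8080</td>
<td>transparent</td>
<td>Norway</td>
<td>2006-12-12</td>
</tr>
<tr>
<td>192.168.1.20</td>
<td>3128</td>
<td>high anonymity</td>
<td>United States</td>
<td>2006-12-11</td>
</tr>
HTML;

// Same five capture groups as in the thread: IP, port, type, country, date.
$pattern = '/<td>(\d{1,3}(?:\.\d{1,3}){3})<\/td>\s+'
         . '<td>(\d{1,5})<\/td>\s+'
         . '<td>(transparent|anonymous|high anonymity)<\/td>\s+'
         . '<td>([A-Za-z ]*)<\/td>\s+'
         . '<td>(\d{4}-\d\d-\d\d)<\/td>/i';

// extract_proxies() is a hypothetical helper name, not a PHP built-in.
// It returns a list of "IP:port" strings, one per matched table row.
function extract_proxies(string $html, string $pattern): array
{
    $proxies = [];
    // PREG_SET_ORDER groups the results row by row:
    // $match[0] is the whole row, $match[1] the IP, $match[2] the port.
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $proxies[] = $match[1] . ':' . $match[2];
        }
    }
    return $proxies;
}

$proxies = extract_proxies($html, $pattern);
echo implode("\n", $proxies), "\n";
```

To run it on live data, pass the string returned by curl_exec() to the helper instead of the sample fragment.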
Vladinator Posted December 15, 2006 (Author)

Thank you so much pal, it helped me a lot! Now I also learned about a new function in PHP! :D