[SOLVED] Taking numbers from a site and listing them


Vladinator

Hello there, PHP Freaks community! This is my first post here and my first request for help on the board. I plan to use this forum as a place to ask for help from the nice people around, so it will be nice working with you.  ;)

Now, for my issue. I am trying to write a .php script that reads a site, pulls out data (the data follows a repeating pattern) and shows it. For example, take a look at the source of this page: http://www.proxy4free.com/page1.html. At line 203 there is a pattern that repeats several times.

[code]<?php

//Get the content

$lines = file('http://www.proxy4free.com/page1.html');

//Test 1
/*
$i='203';
$n=($i+2);
while($i < $n){
$n_B = ($n-1);
$str_1 = str_replace("<td>", "", $lines[$i]);
$str_2 = str_replace("</td>", "", $str_1);
$str_3 = str_replace("\n", ":", $str_2);
$str_4 = preg_replace("([:])","X$i",$str_3);
$str_5 = str_replace("X$n_B", "", $str_4);
$str_6 = str_replace("X$i", ":", $str_5);
echo $str_6;
$i++;
}
*/

//Test 2

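$i = 210;      // index of the first line to read
$n = $i + 2;   // stop after two lines: the IP and the port
while ($i < $n) {
  $n_B = $n - 1;                                   // index of the second (port) line
  $str_1 = str_replace("<td>", "", $lines[$i]);    // strip the opening tag
  $str_2 = str_replace("</td>", "", $str_1);       // strip the closing tag
  $str_3 = str_replace("\n", ":", $str_2);         // turn the newline into ":"
  $str_4 = preg_replace("([:])", "X$i", $str_3);   // mark the ":" with the line index
  $str_5 = str_replace("X$n_B", "", $str_4);       // drop the marker after the port
  $str_6 = str_replace("X$i", ":", $str_5);        // restore the ":" after the IP
  echo $str_6;
  $i++;
}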

?>[/code]

This is what I have made. I am totally new at handling stuff like this. I have never used a script to access a site and get data from it, and I am not sure how to make it work as it should. I tried, but I could only make it read from a specific line, then take the IP and port number and list them nicely with a : between them. It took me ages, and I would like to ask for tips, hints and help from more experienced people.

I am hoping to hear from you in the near future. ;)

NB: Just uncomment the first block and comment out the second, then you will see what happens. :P A rather newbie way of doing it, if you ask me.  :P

First, let me be the first to welcome you to this community. I've found it very helpful, probably the best one I've ever been a part of. The idea you have with the code above will work, but it's a little too much work (in my opinion, and when it comes to scripting... I'm lazy). With preg_match_all() you can pull out all the data you want in one (foul|fowl) swoop. (Haha! ... a little regex joke that sounded... a lot better... in my head...)
[code]// file() returns an array of lines; preg_match_all() needs a single string
preg_match_all( '/<td>(\d{1,3}(?:\.\d{1,3}){3})<\/td>\s+<td>(\d{1,5})<\/td>\s+<td>(transparent|anonymous|high anonymity)<\/td>\s+<td>([A-Za-z ]*)<\/td>\s+<td>(\d{4}-\d\d-\d\d)<\/td>/i', implode('', $lines), $matches, PREG_SET_ORDER );[/code]

It's pretty ugly, I know, but this will fill $matches with an array that looks like this:
[code]Array (
  Array ( 'Whole first match, everything in the little <td> chunk', 'IP', 'Port', 'transparent, anonymous, etc.', 'Country', 'date' ),
  Array ( 'Whole second match', 'IP', 'Port', 'transparent, anonymous, etc.', 'Country', 'date' ),
  ...
)[/code]
(Once you have the IP in the array, you can reconstruct the 'Whois' link; that's why I didn't bother capturing it.)
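
To see the shape of that array for yourself, here is a minimal sketch that runs the same kind of pattern against two made-up table rows. The sample markup and its values are assumptions for illustration only; the real page's HTML may be laid out differently.

[code]<?php
// Hypothetical sample markup, invented for this example.
$html = "<tr>\n<td>10.0.0.1</td>\n<td>8080</td>\n<td>anonymous</td>\n<td>United States</td>\n<td>2007-01-15</td>\n</tr>\n"
      . "<tr>\n<td>192.168.1.20</td>\n<td>3128</td>\n<td>transparent</td>\n<td>Germany</td>\n<td>2007-01-14</td>\n</tr>\n";

// Five capture groups: IP, port, type, country, date.
$pattern = '/<td>(\d{1,3}(?:\.\d{1,3}){3})<\/td>\s+<td>(\d{1,5})<\/td>\s+'
         . '<td>(transparent|anonymous|high anonymity)<\/td>\s+'
         . '<td>([A-Za-z ]*)<\/td>\s+<td>(\d{4}-\d\d-\d\d)<\/td>/i';

// preg_match_all() returns the number of full matches and fills $matches.
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each element of $matches is one table row:
// index 0 is the whole matched text, indexes 1 to 5 are the capture groups.
print_r($matches[0]);
[/code]

The first row prints with the full matched text at index 0 and the IP, port, type, country and date at indexes 1 through 5.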

I use methods like this all the time for scraping data off websites. You may also (depending on your server) want to look into using the curl library for getting webpage data. Generally it's more robust for this type of thing than file() or file_get_contents(), since it lets you set timeouts and check for errors.

With curl, your script would look like this:
[code]$ch = curl_init();

// Optionally set a timeout
curl_setopt($ch, CURLOPT_TIMEOUT, 30);

curl_setopt($ch, CURLOPT_URL, 'http://www.proxy4free.com/page1.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Lets you assign the result of the URL call to a variable, instead of dumping it to the screen

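$output = curl_exec($ch);
if ($output === false) {
  // curl_exec() returns false on failure; curl_error() says what went wrong
  die('Request failed: ' . curl_error($ch));
}
curl_close($ch);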

// Pull out the data we want
preg_match_all( '/<td>(\d{1,3}(?:\.\d{1,3}){3})<\/td>\s+<td>(\d{1,5})<\/td>\s+<td>(transparent|anonymous|high anonymity)<\/td>\s+<td>([A-Za-z ]*)<\/td>\s+<td>(\d{4}-\d\d-\d\d)<\/td>/i', $output, $matches, PREG_SET_ORDER );

foreach($matches as $match)
{
  echo $match[1]."\t"; // The ip address
  echo $match[2]."\t"; // The port
  echo $match[3]."\t"; // Transparent... etc.
  echo $match[4]."\t"; // Country
  echo $match[5]."\n"; // Date
}
[/code]

Give that a shot and see how you like it.
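
If all you want is the ip:port list from your original attempt, something along these lines might do it. This is only a sketch: proxy_pairs() is a helper name I made up, and the rows below are invented data in the shape preg_match_all() gives you with PREG_SET_ORDER.

[code]<?php
// Hypothetical helper: turns PREG_SET_ORDER matches into "ip:port" strings.
function proxy_pairs(array $matches)
{
  $pairs = array();
  foreach ($matches as $match) {
    // $match[1] is the IP address, $match[2] is the port.
    $pairs[] = $match[1] . ':' . $match[2];
  }
  return $pairs;
}

// Made-up rows standing in for real preg_match_all() results.
$matches = array(
  array('(whole match)', '10.0.0.1', '8080', 'anonymous', 'United States', '2007-01-15'),
  array('(whole match)', '192.168.1.20', '3128', 'transparent', 'Germany', '2007-01-14'),
);

// One "ip:port" pair per line.
echo implode("\n", proxy_pairs($matches)), "\n";
[/code]

In the real script you would pass in the $matches array filled by preg_match_all() instead of the hand-written rows.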

A good regex tester:
http://regexlib.com/RETester.aspx

Curl in PHP:
http://us3.php.net/manual/en/ref.curl.php

Welcome to the community!

Kudos on the Smiley backhanding IE.