Checking over 100000 pages for code


XeroXer

Hi there.

I would like some help.
I am a member of a gaming site.
Every member there has their own profile page.
The profile page addresses look like this:
http://www.example.com/member/profile/145031

Members have a special status depending on their gaming behavior.
What I would like is a count of how many members have a certain status.

The status is displayed as a certain image. Let's call it:
http://www.example.com/images/powermember.gif

So I need a PHP script that searches the HTML source of every page from:
http://www.example.com/member/profile/1
to:
http://www.example.com/member/profile/500000
for:
http://www.example.com/images/powermember.gif
and then displays how many it found.

Can anyone help me with this?
I have done some PHP before, so with a few guidelines I might be able to do it myself.

Thankful for any help...

You would need to look into these two functions:

ereg()
file_get_contents()


Assuming you know how many users there are, you'd create a loop from 1 to that number using a counter variable,
build each profile URL from that counter,
and have it ereg each page for that gif.

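Something like this:
[code]
$parseUrl = "http://www.example.com/member/profile/";
$numUsers = 0;
// Profile ids start at 1; adjust the upper bound to the real member count
for ($i = 1; $i <= 5000; $i++) {
    // Build the profile URL with the concatenation operator
    $parsePage = file_get_contents($parseUrl . $i);
    // Count the page if it references the status image
    if (ereg("http://www.example.com/images/powermember.gif", $parsePage)) {
        $numUsers++;
    }
}

echo "There are " . $numUsers . " power users";[/code]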
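For anyone reading this on a current PHP version, here is a minimal sketch of the same loop without ereg(), which was deprecated in PHP 5.3 and removed in PHP 7.0. The image URL is a literal string rather than a pattern, so strpos() is enough. The helper name countMatchingPages() and the $fetch callback are hypothetical; they are only there so the counting logic can be tried without hitting the real site.

```php
<?php
// Hypothetical helper: counts how many pages with ids $first..$last contain $needle.
// $fetch is any callable that takes a URL and returns the page body,
// or false when the page could not be loaded.
function countMatchingPages($baseUrl, $first, $last, $needle, $fetch)
{
    $count = 0;
    for ($i = $first; $i <= $last; $i++) {
        $page = $fetch($baseUrl . $i);
        if ($page === false) {
            continue; // skip profiles that failed to load
        }
        // strpos() returns false when the needle is absent; position 0 is a
        // valid hit, so the comparison has to be !== false.
        if (strpos($page, $needle) !== false) {
            $count++;
        }
    }
    return $count;
}
```

Assumed usage against the example site would be to pass 'file_get_contents' as the fetcher: countMatchingPages("http://www.example.com/member/profile/", 1, 5000, "http://www.example.com/images/powermember.gif", 'file_get_contents').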


EDIT

But using something like that would take FOREVER if there are a lot of pages
[quote]QUOTE(ober @ Mar 30 2006, 02:44 PM)
Please tell me that this is all generated from the same page and you just want to look at the status of a field in a database of all your users.
[/quote]
I think, no, I'm pretty sure that he's trying to fish data out of a database he doesn't have access to.
[quote]QUOTE(ober @ Mar 30 2006, 02:44 PM)
Otherwise, that's gotta be the worst design of a website I've ever heard of.
[/quote]
These forums are laid out in such a way... for instance, the 'newbies', 'lurkers', and 'gurus' all have a significant field value on their profile. It wouldn't be too hard to just do a loop, but it would take a very long time.

Well, I really don't know how the page is built, but I hope it comes from a database.

I tried your code like this:
[code]<?php
$parseUrl = "http://www.example.com/member/view/";
$numUsers = 3;
for($i=3; $i<143089; $i++)
{
    $parsePage = file_get_contents($parseUrl$i);
    if(ereg("http://www.example.com/pictures/supermember.gif", $parsePage))
    $numUsers++;
}
echo "There are " . $numUsers . " super members. ";
?>[/code]

The first member has the number 3 and the last 143089.
This code results in nothing.
The source becomes this:
[code]<html><body></body></html>[/code]
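A likely reason for the empty output, assuming error display is switched off on the host: `file_get_contents($parseUrl$i)` is a parse error, because two variables written next to each other need either the concatenation operator or double quotes around them. With a parse error PHP runs none of the script, which would match the empty document shown above. The snippet below is only a sketch of the two valid spellings. It also starts the counter at 0, since `$numUsers = 3` would count three members that were never checked.

```php
<?php
$parseUrl = "http://www.example.com/member/view/";
$i = 3; // id of the first member, as stated in the post

// Two valid ways to build the URL for one member:
$urlA = $parseUrl . $i;   // concatenation operator
$urlB = "$parseUrl$i";    // variable interpolation inside double quotes

// The counter is independent of the first member id and starts at 0.
$numUsers = 0;

echo $urlA, "\n";
```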
Gamers.nu (http://www.gamers.nu)
That's the page.

I got the script working, from one point of view.
It tries to read the pages correctly, but I get a LOT of errors.
Well, I get 150000 errors :-)

It's the file_get_contents() call that doesn't work.
Probably because they generate the page from the database,
so there is no .php or .html file to read from.
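The warnings are probably not caused by how the pages are generated, since a page built from a database looks the same to a client as a static file. Two more common causes, offered here as assumptions and not as a diagnosis: the host has `allow_url_fopen` disabled, or some ids in the range do not exist and the server answers with an error status. In both cases file_get_contents() raises a warning and returns false. A minimal sketch for checking this follows; fetchOrFalse() and urlFopenEnabled() are hypothetical helper names.

```php
<?php
// Hypothetical helper: returns the page body, or false if it could not be read.
// The @ operator suppresses the warning file_get_contents() raises on failure.
function fetchOrFalse($url)
{
    return @file_get_contents($url);
}

// Reports whether PHP may open URLs with its file functions at all.
function urlFopenEnabled()
{
    $value = strtolower((string) ini_get('allow_url_fopen'));
    return in_array($value, array('1', 'on', 'true', 'yes'), true);
}

if (!urlFopenEnabled()) {
    echo "allow_url_fopen is off, so file_get_contents() cannot read http:// URLs\n";
}
```

In the loop, a false return can then be skipped with `continue` so that one missing profile does not produce a warning per page.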

This is the code that "worked":
[code]<?php
$siteurl = "http://www.gamers.nu/profile/show/";
$imgurl = "http://www.gamers.nu/_tpl/site/default/_img/flags/gold.gif";
$numusers = 0;

for($i = 0; $i < 150000; $i++)
{
    $sitepage = file_get_contents("$siteurl$i");
    if (ereg($imgurl, $sitepage)) {
        $numusers++;
    }
}
?>
<html>
<head>
<title>Gnu members...</title>
</head>
<body>
<?php
echo "There are " . $numusers . " gold members on gamers.nu. ";
?>
</body>
</html>[/code]

You can see the page here:
Gnu - XeroXer.com (http://www.xeroxer.com/gnu.php)
[quote]QUOTE(footballkid4 @ Mar 31 2006, 12:48 AM)
A page isn't just "generated" from the database. file_get_contents may or may not work if the host has mod_rewrite in use... you might want to try sockets or curl.
[/quote]

How do I get those working?
My PHP installation and .ini file are all at my web host. I can't really edit anything.

I can set the PHP version to PHP4 or PHP5.
I can turn PHP error messages on or off.
I can turn register globals on or off.

Please help... :-)
Well, it seems that curl needs me to install something extra while sockets do not.
Which means sockets would be the better thing to try. :-)

I have never used sockets before.
Could anyone help me with how to use them?
Specifically, how to use them in the above script to get it working. :-)
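Since the thread ends without an answer, here is a rough sketch of how one page could be fetched over a plain socket with fsockopen(), which is part of core PHP and needs no extension. The function names buildGetRequest(), parseResponse() and fetchViaSocket() are hypothetical, and the sketch assumes the site answers plain HTTP on port 80 without redirects.

```php
<?php
// Builds a minimal HTTP/1.0 GET request. HTTP/1.0 is used so the server
// closes the connection after the response and does not use chunked encoding.
function buildGetRequest($host, $path)
{
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

// Splits a raw HTTP response into array(status code, body).
function parseResponse($raw)
{
    $parts = explode("\r\n\r\n", $raw, 2);
    $body = isset($parts[1]) ? $parts[1] : '';
    // The status line looks like "HTTP/1.1 200 OK".
    $status = preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $parts[0], $m) ? (int) $m[1] : 0;
    return array($status, $body);
}

// Fetches one page over a socket; returns the body, or false on any failure.
function fetchViaSocket($host, $path, $timeout = 10)
{
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if (!$fp) {
        return false;
    }
    stream_set_timeout($fp, $timeout);
    fwrite($fp, buildGetRequest($host, $path));
    $raw = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false || $chunk === '') {
            break; // read error or timeout: stop instead of looping forever
        }
        $raw .= $chunk;
    }
    fclose($fp);
    list($status, $body) = parseResponse($raw);
    return $status === 200 ? $body : false;
}
```

In the script above, the file_get_contents() line would then become something like `$sitepage = fetchViaSocket("www.gamers.nu", "/profile/show/" . $i);`, followed by a check that skips the iteration when the result is false.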

Archived

This topic is now archived and is closed to further replies.
