
I think I have a $page_contents problem, please help


CraftHell


Okay, I believe I have a $page_contents problem because I'm calling two functions that grab data from the same site, at two different locations, at the same time. The script works just like I want it to, but when there are too many items to grab, the page won't load. Is there anything I can do?

It will load thousands of items down a list using just one $page_contents, but when I try both I'm lucky if I can see a list of 5 items.

One function grabs a list of specified items (page.php?load=), and the next function grabs the price for each item from its own page at the same time.

So I guess if I want to load 10 items and have the script grab their prices from their individual pages, that's 11 pages loaded in the background?

Here's the full code, including the HTML:

<?php

//////////////////////////////////////////////////////////////////////////////
// 
//  This function will grab item name, id#, and start a table.
//  Usage: (thispage.php?load=http://ffxiv.yg.com/items?blahblahblah)
// 
function grabdata($site){
$page_contents = file_get_contents($site);
$pattern = '/\/item\/([a-z,-]+[^gil])\?id\=([0-9,]+)/';
preg_match_all($pattern, $page_contents, $matches);
$arrRetList = array();
if(!empty($matches)){
	foreach($matches as $item) {
		foreach($item as $pos => $row){
			$arrRetList[$pos][] = $row;	
		}
	}
	foreach($arrRetList as $row){
		echo "<tr><td align='left'valign='center'><p class='items'><a href='http://ffxiv.yg.com" . $row[0] . "'>Unknown Item</a></p></td>";
		grabprice($row[0]);
		echo "</tr>";
	}
}
}

//////////////////////////////////////////////////////////////////////////////
// 
//  This function will use the data grabbed in function grabdata to go to
//  individual item pages and get their resell value at normal, +1, +2, +3
//  and finish the table.
// 
function grabprice($price){
$pricesite = "http://ffxiv.yg.com" . $price;
$pattern = '/(Normal|\+1|\+2|\+3)<\/td><td align=\"right\">([0-9,]+)<\/td><\/tr>/';
$page_contents = file_get_contents($pricesite);
$matches = array();
preg_match_all($pattern, $page_contents, $matches);
echo "<td align='center'valign='center'><p class='items'>";

//////////////////////////
// Array Key
//////////////////////////
// [1][0] = Normal
// [1][1] = +1
// [1][2] = +2
// [1][3] = +3
// [2][0] = NQ Price
// [2][1] = +1 Price
// [2][2] = +2 Price
// [2][3] = +3 Price

//////////////////////////////////////////////////////////////////////////////
//
//  Print the Normal, +1, +2 and +3 prices in turn. If a price isn't found
//  for a quality it will print a 0.
//
for ($i = 0; $i < 4; $i++) {
	if ($i > 0) {
		echo "<td align='center'valign='center'><p class='items'>";
	}
	echo empty($matches[2][$i]) ? "0" : $matches[2][$i];
	echo "g</p></td>";
}
}

echo "<table align='center' border='0' cellpadding='0' cellspacing='0' width='100%'>
<tr>
<td align='left' valign='center'>
<p class='items'></p>
</td>
<td align='center'valign='center'>
<p class='items'><b>Normal</b></p>
</td>
<td align='center'valign='center'>
<p class='items'><b>+1</b></p>
</td>
<td align='center'valign='center'>
<p class='items'><b>+2</b></p>
</td>
<td align='center'valign='center'>
<p class='items'><b>+3</b></p>
</td>
</tr>";
grabdata($_GET['load']);
echo "</table>";

?>


It will load thousands of items down a list using just one $page_contents [...] So I guess if I want to load 10 items and have the script grab their prices from their individual pages, that's 11 pages loaded in the background?

 

You would be right, and that's exactly your problem. HTTP requests are relatively slow, and making thousands of them at a time (which is what it sounds like you're trying to do) puts a ridiculous load on both your server and theirs. The usual way around this is to "scrape" their website page by page with a courteous delay in between, index the data in your own database, and then query that database for the details later. Think search engines.

 

This is of course assuming you're not violating their terms of use, and they may still block you. I'm surprised they haven't already if you're firing off even 10 requests at once - probably just a matter of time till they notice to be honest.
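
The scrape-then-store idea described above could look roughly like the sketch below. It is only an illustration under assumptions: the `items` table layout, the function names, the 2-second delay, and the use of SQLite through PDO are all made up for the example, and the price pattern is a tightened variant of the one in the original post, so it depends on the site's markup staying as the original script expects. The fetcher is passed in as a callable so the crawl can be tried without touching the remote site.

```php
<?php
// Sketch only: the table layout, function names and the 2-second delay are
// assumptions for illustration, not part of the original script.

/** Open the database and create the hypothetical `items` table if needed. */
function openItemStore(string $dsn): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec(
        'CREATE TABLE IF NOT EXISTS items (
            path TEXT PRIMARY KEY,
            price_nq INTEGER, price_1 INTEGER, price_2 INTEGER, price_3 INTEGER
        )'
    );
    return $db;
}

/** Pull the Normal, +1, +2 and +3 prices out of one item page; 0 if missing. */
function parsePrices(string $html): array
{
    $pattern = '/(Normal|\+1|\+2|\+3)<\/td><td align="right">([0-9,]+)<\/td><\/tr>/';
    preg_match_all($pattern, $html, $m);
    $prices = [];
    for ($q = 0; $q < 4; $q++) {
        // Prices are taken by position, as in the original script.
        $prices[] = isset($m[2][$q]) ? (int) str_replace(',', '', $m[2][$q]) : 0;
    }
    return $prices;
}

/**
 * Fetch the item pages one at a time, pausing between requests, and store
 * the prices. $fetch is any callable that takes a URL and returns the page
 * HTML or false on failure. Returns the number of items stored.
 */
function crawlPrices(PDO $db, array $paths, callable $fetch, int $delaySeconds = 2): int
{
    // INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE INTO.
    $save = $db->prepare(
        'INSERT OR REPLACE INTO items (path, price_nq, price_1, price_2, price_3)
         VALUES (?, ?, ?, ?, ?)'
    );
    $stored = 0;
    foreach (array_values($paths) as $i => $path) {
        if ($i > 0 && $delaySeconds > 0) {
            sleep($delaySeconds); // courteous pause between requests
        }
        $html = $fetch('http://ffxiv.yg.com' . $path);
        if (!is_string($html)) {
            continue; // request failed; skip this item
        }
        $save->execute(array_merge([$path], parsePrices($html)));
        $stored++;
    }
    return $stored;
}
```

A run against the real site might then be `crawlPrices(openItemStore('sqlite:items.db'), $paths, 'file_get_contents');`, where `$paths` holds the `/item/...?id=...` strings collected from the list page.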


What application are you using? Or did you write this code?

I coded this piece by piece.

 

This is of course assuming you're not violating their terms of use, and they may still block you. I'm surprised they haven't already if you're firing off even 10 requests at once - probably just a matter of time till they notice to be honest.

Yeah, they actually shut down my server right after I posted this topic:

Account ###.####.com exceeded allowed 70% CPU quota limit for more than 100 times. This is considered abnormal as it causes a high server load and overall slowdown. Website must be secured and optimized or removed form the server. :wtf:

 

Is there any way I can get the info I need without crashing servers?  :confused:


I could make one part grab the item info and store it in a database, and a second part grab the prices and add them to the items in the newly created database, running the scripts separately so they don't crash the server :D

 

Then I'd read the info back from the MySQL database for display on my end only :D
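
A rough sketch of that display step, assuming a hypothetical `items` table with a `path` column and four price columns (`price_nq`, `price_1`, `price_2`, `price_3`); none of these names come from the original script. The point is only that the page is built from rows already stored locally, so no remote requests happen while it loads.

```php
<?php
// Sketch only: the `items` table and its column names are assumptions.

/** Read every stored item from the hypothetical `items` table. */
function loadItems(PDO $db): array
{
    $sql = 'SELECT path, price_nq, price_1, price_2, price_3 FROM items ORDER BY path';
    return $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

/** Build the table rows from stored data; no remote requests are made. */
function renderRows(array $rows): string
{
    $html = '';
    foreach ($rows as $row) {
        // Escape the stored path before putting it into an attribute.
        $url = htmlspecialchars('http://ffxiv.yg.com' . $row['path'], ENT_QUOTES);
        $html .= "<tr><td align='left'><p class='items'><a href='$url'>Unknown Item</a></p></td>";
        foreach (['price_nq', 'price_1', 'price_2', 'price_3'] as $col) {
            // number_format() puts the thousands separators back in.
            $html .= "<td align='center'><p class='items'>" . number_format((int) $row[$col]) . "g</p></td>";
        }
        $html .= "</tr>\n";
    }
    return $html;
}
```

The page itself would then echo the header row as before, followed by something like `echo renderRows(loadItems($db));`, with `$db` being a PDO connection to wherever the crawl stored its results.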

 

But this would mean I would have to rewrite my scripts, send an apology letter to my host, and try to get my service back lmfao..


You would be best off running it from the command line, as then you won't run into time-out issues or need to leave a browser open while it runs. I'd check your host's terms of use, though, because they might not want you running spiders on their servers.
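
As a rough illustration of the command-line approach, here is a sketch of a hypothetical `crawl.php` that reads item paths from a text file and processes them one by one. The file name, the function names, and the one-path-per-line input format are assumptions made for the example. PHP's CLI already runs without an execution time limit by default, so the `set_time_limit(0)` call is only there to make that explicit.

```php
<?php
// Sketch of a hypothetical crawl.php, meant to be started as:
//   php crawl.php paths.txt
// The names and the input format are assumptions for illustration.

/** Read one item path per line from a text file, skipping blank lines. */
function readPaths(string $file): array
{
    if (!is_readable($file)) {
        return [];
    }
    $lines = file($file, FILE_IGNORE_NEW_LINES);
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

/**
 * Process every path from the file named in $argv[1], pausing between items.
 * $handlePath is any callable that does the real work for one path.
 * Returns a process exit code (0 on success, 1 on misuse).
 */
function runCrawl(array $argv, callable $handlePath, int $delaySeconds = 2): int
{
    if (PHP_SAPI !== 'cli') {
        echo "This script is meant to be run from the command line.\n";
        return 1;
    }
    if (count($argv) < 2 || !is_readable($argv[1])) {
        echo "Usage: php crawl.php paths.txt\n";
        return 1;
    }
    set_time_limit(0); // no time-out, so a long crawl is not cut off
    foreach (readPaths($argv[1]) as $i => $path) {
        if ($i > 0 && $delaySeconds > 0) {
            sleep($delaySeconds); // courteous pause between items
        }
        $handlePath($path);
        echo "done: $path\n";
    }
    return 0;
}
```

The last line of the real script would be something like `exit(runCrawl($argv, 'grabAndStore'));`, where `grabAndStore` stands for whatever function fetches one item page and saves its prices.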
