
PHP Script Causing Apache Max Connections Issue


joallen
Solved by requinix


Over the past few days my server has been crashing due to Apache max connections issues. I am running my site on a hosted Cloud VPS with 200GB of storage, 8192MB of memory, 8TB of transfer, Apache, MySQL, PHP5, and CentOS.

 

I am afraid the issue doesn't necessarily lie in the configuration of Apache, but in the way I have scripted the PHP on my site, which is why I am reaching out here. My site isn't your average website; it is more of a web-based customer management program. There are currently only 2 pages you can actually access via the URL bar (signin.php and index.php). All other content is loaded via AJAX and jQuery (.load and $.getScript).

 

All AJAX requests are pointed toward a single file called functions.php, where a $_POST parameter contains the function name and any additional $_POST data required by the function.

 

FOR EXAMPLE:

 

AJAX Call:

$.post('functions/functions.php', {func: 'myFunctionName', ops: 'whatever', a: 'whatever', b: 'whatever'}, function(data){
    // do whatever I want with the return data here
}, "json");

PHP (functions.php)

require 'dbcon.php';
include 'main_class.php';
include 'f_customerdetail.php';
include 'f_listoptions.php';
include 'f_route.php';
include 'f_useractions.php';

if (isset($_POST['func'])) {
    $userfunc = $_POST['func'];
    $funcops  = isset($_POST['ops']) ? $_POST['ops'] : ''; // avoid an undefined-index notice when 'ops' is omitted

    // call the requested function, passing the options along if any were sent
    if ($funcops != '') {
        $userfunc($funcops);
    } else {
        $userfunc();
    }
}

 

The functions.php file includes all of my other PHP files containing all of the functions. Each of the other files (f_customerdata.php, f_route.php, f_payroll.php, etc.) contains a number of functions which handle that specific area of the site; this was more of an organizational method I used to keep track of things.

 

Now that you have a little background, I want to know: is that a toxic way to do things? If I currently have 100 people using the site, and every time they navigate it requests data from the functions.php file, then there are going to be a ton of requests pointing to that single file, thus causing Apache to crash; correct?

 

There are multiple functions which use cURL to scrape data from another website as well. Therefore, a connection to the functions.php file may last upwards of 5 minutes depending upon the function.

 

A large issue is that all of the content on the site is completely dynamic; it is entirely driven by getting data from the database and displaying it. Am I going about this correctly by having a single file handle all of the functions? Or do I need to re-approach it by pointing the AJAX requests directly to the file containing the functions for that particular situation?

 

I know this is a large question. I am completely self-taught, with 4 years of experience, and have developed a massive project over the last 6 months. I just want to be sure I am going about this the correct way.

 

Thank you for your input,

 

Josh


  • Solution

Now that you have a little background, I want to know: is that a toxic way to do things?

Toxic? No. Unorthodox, sure, but not necessarily bad.

 

If I currently have 100 people using the site, and every time they navigate it requests data from the functions.php file, then there are going to be a ton of requests pointing to that single file, thus causing Apache to crash; correct?

No. Apache doesn't care what it's executing.

 

There are multiple functions which use cURL to scrape data from another website as well. Therefore, a connection to the functions.php file may last upwards of 5 minutes depending upon the function.

That is the problem. If you get one hit per second triggering that behavior, then after five minutes, when the very first connection is finally closing (and another is starting), there will be 300 open connections.

 

 

You could raise your connection limit as a short-term fix, but you need to deal with the "upwards of 5 minutes" problem. What is the nature of this scraping? Is it something you can do separately from user interaction, like on the server in the background? Can you cache anything to spare you from scraping so much?


Unfortunately there isn't much I can do to avoid the scraping, as the information I am requesting from the other site is also dynamic. It's a little difficult to explain, but what the scraping actually does is allow me to serve the HTML from the other site while handling all of the functions on my site.

 

For instance, a user opens their route on my site and clicks the link to refresh their route. What it actually does is process a cURL request to the site which actually gives us the work, and that returns the HTML. I parse the HTML, swap out any hrefs with my own javascript: function, and then serve the page to the user. So now the user will see the content of the other site, but my site can "listen in" to what they are actually doing as they navigate. The user thinks the links are normal links, but what actually happens is a function is called which sends a cURL request for the HTML.
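
For readers following along, here is a minimal sketch of that fetch-and-rewrite step. It is only an illustration of the approach described above, not the actual code from the site; the URL handling, the timeout, and the loadRemotePage() client-side function are all placeholders.

function fetch_and_rewrite($url) {
    // Pull the remote page over cURL.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the HTML instead of echoing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 300);          // some of these requests run for minutes
    $html = curl_exec($ch);
    curl_close($ch);

    if ($html === false) {
        return false;
    }

    // Swap every href for a javascript: call so the site can "listen in" on navigation.
    // loadRemotePage() is a hypothetical client-side function, not part of the original post.
    return preg_replace(
        '/href="([^"]*)"/i',
        'href="javascript:loadRemotePage(\'$1\');"',
        $html
    );
}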

 

Although that sounds like some fishy business, it actually isn't, and the users of my site are well aware of what is happening. It basically enables them to have access to not only my system, but it also integrates the other system so our data can match perfectly.

 

Yes, there are cURL requests which can take up to 5 minutes, but those are the requests which are handled when the admin users pull all of their work (generally in the morning). While the employees are working throughout the day, the cURL requests generally take anywhere from 5-10 seconds, but I could imagine there may be hundreds of cURL requests happening simultaneously, all processed through the functions.php file.

 

You mentioned creating a background process on the server to handle the long requests; can you elaborate on that a little more? I thought about this in the past and didn't really have any idea where to start. Can a background process be started on user interaction? Is there a way I can advise the user the process is complete?

 

Is there a better method that you can think of to accomplish what I am trying to do here? It is working perfectly, but as the number of users increases, so will the number of calls to the functions.php file. Basically, as the users navigate the other site I am serving through my own, each link they click results in a cURL request.

 

I really appreciate your help!


I'm marking this as solved since requinix did answer the main question.

 

For those who may be wondering what my takeaway is from this, I will sum it up:

 

I cannot necessarily prevent these cURL requests from occurring, nor do I have the ability to speed them up, so my only option at this point was to have the Apache connection limit increased on the server (I had my host provider handle this). I also had them switch the PHP handler from SuPHP to DSO per their suggestion, though I am not sure what the benefit would be in my situation.

 

The key takeaway from requinix is that Apache does not care which file on the server the connections are hitting, just that the connection limit is not reached.

 

Thanks for your help requinix!

 

Josh


You mentioned creating a background process on the server to handle the long requests; can you elaborate on that a little more? I thought about this in the past and didn't really have any idea where to start. Can a background process be started on user interaction? Is there a way I can advise the user the process is complete?

Keeping in mind that this is more suited to situations where there is no particular urgency to the request (besides "as soon as possible", of course), or where the user isn't on the edge of their seat waiting for it to complete:

 

The best way to manage this is to have a script running in the background (like, really running on the server) which looks to the database for tasks that it should run. For example, if you needed something that would save a copy of a webpage, in the database you would have a table that lists out the URLs to hit as well as whether the job has been run yet. The script grabs a URL, marks the task as in progress, does the work, and finally marks the task as complete. Giving it more work to do is a matter of inserting another URL into the database.
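
A bare-bones sketch of that kind of worker script follows; the tasks table, its columns, and the PDO connection details are all assumptions for illustration, not a prescribed schema.

// Hypothetical worker script, run on the server itself (from cron or a shell), never through Apache.
// Assumes a "tasks" table with id, url, status ('pending'/'running'/'done') and result columns.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

while (true) {
    // Grab the oldest pending task, if there is one.
    $stmt = $pdo->query("SELECT id, url FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1");
    $task = $stmt->fetch(PDO::FETCH_ASSOC);

    if (!$task) {
        sleep(30);   // nothing to do; check again shortly
        continue;
    }

    // Mark the task as in progress before starting the slow work.
    $upd = $pdo->prepare("UPDATE tasks SET status = 'running' WHERE id = ?");
    $upd->execute(array($task['id']));

    $result = file_get_contents($task['url']);   // placeholder for the real cURL scraping

    // Store the result and mark the task complete so the site can pick it up.
    $fin = $pdo->prepare("UPDATE tasks SET status = 'done', result = ? WHERE id = ?");
    $fin->execute(array($result, $task['id']));
}

Run from cron instead of as a persistent process, the while(true)/sleep loop would simply become a single pass over all pending rows, which is also how several queued jobs (say, four managers pulling work at once) get handled: they are just four pending rows worked through in turn.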

 

There's a bit more technical information that is important but that's the brief description.

 

To let the user know: somewhere, such as the session, you hold on to the "task" you created for them, and every once in a while you check the progress of the task. How often "every once in a while" is depends on the nature of the task, of course; every 30-60 seconds for the thing about admins' work sounds reasonable. (I'd leave the 5-10s stuff where it is - no need to background something that only takes a few seconds.)
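
As a rough sketch of that progress check, using the same hypothetical tasks table as above and assuming the task id was saved in the session when the job was created:

// Hypothetical task_status.php, polled by the page via AJAX every 30-60 seconds.
session_start();
require 'dbcon.php';   // assumed here to provide the $pdo connection used by the worker sketch

$stmt = $pdo->prepare("SELECT status, result FROM tasks WHERE id = ?");
$stmt->execute(array($_SESSION['task_id']));
$task = $stmt->fetch(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode(array(
    'done'   => ($task && $task['status'] === 'done'),
    'result' => ($task && $task['status'] === 'done') ? $task['result'] : null,
));

On the page, a $.post against that file on a setInterval covers the polling; once 'done' comes back true, display the result and stop the timer.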

 

 

Is there a better method that you can think of to accomplish what I am trying to do here? It is working perfectly, but as the number of users increases, so will the number of calls to the functions.php file. Basically, as the users navigate the other site I am serving through my own, each link they click results in a cURL request.

- You wouldn't happen to have control over these other sites, would you?

- Caching is a common solution if you're potentially grabbing the same content repeatedly

- Background the work if it takes too long and/or isn't time-sensitive


You wouldn't happen to have control over these other sites, would you?

I have no control over the other site, unfortunately. I wish they would create an API for us, but that is a shot in the dark. The closest I can get is to use cURL; therefore I have no control over the speed of the requests. The speed of the requests varies dramatically as well. It may take 2 seconds to pull a single order, and then you click the same order and it takes 1 minute to pull; I take it this is completely out of my control, right?

 

Caching is a common solution if you're potentially grabbing the same content repeatedly

 

It isn't so much the visual content I am collecting, it is the data. Actually, the data is just displayed in a table with a bunch of links. Since it is different for each customer account, caching wouldn't help me here, but that is a great suggestion for other implementations of cURL.

 

Thank you for explaining the background process. I am sure I can set up a cron job in cPanel and use your technique with the database to process the large requests. When a user clicks the button to request the work, it will simply add the "job" to the database, where a script running every 30 seconds will look for new requests. Will this be able to handle multiple requests? Let's say 4 different managers were pulling their work in the morning and all 4 requests are in the data table.

 

Thank you for your help!


Thank you for explaining the background process. I am sure I can set up a cron job in cPanel and use your technique with the database to process the large requests. When a user clicks the button to request the work, it will simply add the "job" to the database, where a script running every 30 seconds will look for new requests. Will this be able to handle multiple requests?

A more fluid option than a periodic cron job would be to use something like Gearman to perform the work. You would set up a server script which accepts job requests, does your scraping, and stores the result. You'd then just need to retrieve that result for display on your page. You could either pass the result around in the database or use something like Memcached to store it temporarily. Here is an example of using Memcached and Gearman to run tasks and get their results.

 

When a user submits a request that would require scraping, you would simply submit the scraping request to the Gearman job server, then use an AJAX poll to check for the results. If you want to get fancy, you could set it up so that PHP will wait a few seconds for a result, so that fast scraping jobs are handled in one shot; but if the job is taking a long time (i.e. after 10 seconds or so), go ahead and output a waiting page to the user with an AJAX poll for the status.
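
To make that concrete, here is a rough sketch using the PECL Gearman and Memcached extensions; the 'scrape' job name, server addresses, and cache-key scheme are placeholders, and the real cURL scraping would go where the comment indicates.

// worker.php - hypothetical Gearman worker, kept running in the background on the server.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('scrape', function (GearmanJob $job) use ($cache) {
    $url  = $job->workload();                        // the URL the client asked to scrape
    $html = file_get_contents($url);                 // placeholder for the real cURL scraping
    $cache->set('scrape_' . md5($url), $html, 600);  // stash the result for ten minutes
});

while ($worker->work());

// In functions.php - submit the job without waiting for it to finish:
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('scrape', $url);

// An AJAX poll can then keep calling $cache->get('scrape_' . md5($url))
// until it returns something other than false.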

