Slow LAMP application

littlebigman · January 7, 2008

Hello

A friend of mine is running a LAMP application that's on its knees, and he would like to know what to do about it: either make changes to the PHP or MySQL code, or order a faster server.

Here's what top says:

top - 21:20:49 up  3:35,  1 user,  load average: 61.88, 51.08, 74.30

Tasks: 476 total,   7 running, 467 sleeping,   0 stopped,   2 zombie
Cpu(s): 78.8% us, 18.7% sy,  0.0% ni,  0.7% id,  0.3% wa,  0.5% hi,  1.0% si
Mem:   1015484k total,   993344k used,    22140k free,    76920k buffers 
Swap:   514040k total,   101032k used,   413008k free,   208496k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
10685 ze-card 19 19 984 736 724 R N 93.3 0.0 5671m webalizer
32072 nobody 9 0 16932 15M 12156 S 6.0 0.3 0:01 httpd
32196 nobody 9 0 16224 14M 12144 S 3.5 0.3 0:01 httpd
1868 nobody 9 0 16284 14M 12620 S 3.1 0.3 0:00 httpd
2136 nobody 9 0 16080 14M 12164 S 2.5 0.3 0:00 httpd
32205 nobody 9 0 16300 14M 12136 S 2.3 0.3 0:00 httpd
32231 nobody 9 0 16316 14M 12172 S 2.3 0.3 0:00 httpd
32124 nobody 9 0 16620 14M 12184 S 1.9 0.3 0:01 httpd

Here's what vmstat says:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
9  0      0  55588  87548 329824    0    0   166   292  369   896 69 17  5  9 
8  0      0  49116  87572 330204    0    0   102   192  681  1686 71 25  4  0
5  0      0  45632  87584 330240    0    0    16  1298  765  1910 77 21  1  0
28  0      0  43072  87592 330296    0    0     8    86  737  2098 80 19  1  0 
49  0      0  40676  87600 330384    0    0    32  1302  808  2139 76 22  3  1
52  0      0  40320  87628 330504    0    0    56  1078  855  2163 80 19  1  0
51  3      0  39904  87636 330560    0    0    32   244  822  1795 80 21  0  0 
12  1      0  37816  87668 331280    0    0   342  1424 1121  2606 75 22  0  2
36  0      0  36708  87704 331648    0    0   150   186  798  2353 72 22  3  2
36  0      0  35660  87712 331784    0    0    38    24  896  1789 80 19  0  0

Can you tell what's wrong?

Thank you for any tip.

stuffradio · January 7, 2008

One thing that would help is to disable any unused servers/services. This will also improve the server's security!

btherl · January 7, 2008

Dual cpu system?

The first thing I notice is that webalizer is running on one of the cpus. You may be able to move that to another server.

The second thing I notice is that the system is not swapping, but it does have significant I/O and system time. Optimizing MySQL queries may help reduce those by reducing the need to access the disk.

As for what to do to reduce the load, the first step is finding out where the time is being spent. The best way to do that is profiling. How to do that depends on the application.

If you have a PHP coder there, you can add some wrappers to time the MySQL queries and find any slow ones. You can also add some timing code around major parts of the script, to see which parts take the longest. Once you've found those, you can look closely to see what's going wrong. Often it can be as simple as adding an index to a table that's grown very large.
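A minimal sketch of the query-timing wrapper and section timer described above. It is written in Python rather than PHP, and it uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs on its own. The names `timed_query`, `timed_section`, `slowest` and the sample `hits` table are hypothetical; in the real PHP application the same idea would wrap each call into the database layer (PHP's `microtime(true)` supplies the timestamps there).

```python
import sqlite3
import time
from contextlib import contextmanager

# Hypothetical collection of (label, seconds) pairs for this sketch.
timings = []

def timed_query(conn, sql, params=()):
    """Run one query, record how long it took, and return all rows."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    timings.append(("query: " + sql, time.perf_counter() - start))
    return rows

@contextmanager
def timed_section(label):
    """Time an arbitrary block of code, e.g. 'build report' or 'render page'."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.append(("section: " + label, time.perf_counter() - start))

def slowest(n=5):
    """Return the n slowest recorded entries, slowest first."""
    return sorted(timings, key=lambda t: t[1], reverse=True)[:n]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, page TEXT)")
    conn.executemany("INSERT INTO hits (page) VALUES (?)",
                     [("/page/%d" % (i % 100),) for i in range(50000)])
    with timed_section("report"):
        timed_query(conn, "SELECT COUNT(*) FROM hits WHERE page = ?", ("/page/7",))
        timed_query(conn, "SELECT page, COUNT(*) FROM hits GROUP BY page")
    for label, secs in slowest():
        print("%8.4f s  %s" % (secs, label))
```

Sorting by elapsed time points directly at the queries or sections that are worth a closer look.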

littlebigman · January 7, 2008

One thing that would help is to disable any unused servers/services.

I'll see what we can do. I forgot to mention that this site is running on a shared virtual server, which might explain the 476 processes (I'm not sure whether that is the total number of processes on the host or just those running inside this virtual server).

Dual cpu system?

No, single CPU.

The first thing I notice is that webalizer is running on one of the cpus. You may be able to move that to another server.

That's not possible: we only have one virtual server. But I'll see if we can kill and disable webalizer for a while and check whether performance improves.

The second thing I notice is that the system is not swapping, but it does have significant I/O and system time. Optimizing MySQL queries may help reduce those by reducing the need to access the disk.

OK. My friend tells me that he has optimized the MySQL part and that performance is fine in that area, but I'll double-check that he has proof of this. I've heard of running EXPLAIN on the queries and checking the slow query log to verify this.
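MySQL's EXPLAIN shows the plan chosen for a query; in its output, `type: ALL` means a full table scan and the `key` column names the index actually used. The sketch below illustrates the same before/after check with SQLite's `EXPLAIN QUERY PLAN`, purely because SQLite ships with Python. The `orders` table and `idx_orders_customer` index are hypothetical, and the exact plan text differs between SQLite versions and from MySQL's format.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each row.
    return [row[-1] for row in rows]

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("cust%d" % (i % 1000), i * 0.5) for i in range(20000)],
    )
    sql = "SELECT * FROM orders WHERE customer = ?"
    before = query_plan(conn, sql, ("cust42",))   # no index yet: full scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
    after = query_plan(conn, sql, ("cust42",))    # should now use the index
    return before, after

if __name__ == "__main__":
    before, after = demo()
    print("before index:", before)
    print("after index: ", after)
```

Before the index is created, SQLite reports a scan of `orders`; afterwards it reports a search using `idx_orders_customer`, which is the kind of difference to look for in MySQL's EXPLAIN output as well.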

If you have a PHP coder there, you can add some wrappers to time the MySQL queries and find any slow ones. You can also add some timing code around major parts of the script, to see which parts take the longest.

He's already done this with microtime(), so I guess that's how he concluded that MySQL isn't the problem.

How can I check whether the problem is I/O-bound, either on the hard disk or on the network?

Thanks.

btherl · January 7, 2008

There's something really odd about that top output. At the top, it says 0% nice time. But below, it says webalizer, which is niced to 19, is using 93% cpu. And the %CPU values add up to more than 100% in total.

If a niced process is using 93% cpu, then it should say 93% nice at the top.

That kind of weirdness can happen with virtual servers, and it makes it difficult to see what's really happening.

littlebigman · January 7, 2008

Yup, it's kinda weird. I'll tell him to ask the support guy about this.

And I'll tell him to install the sysstat package and run sar, mpstat, and iostat, since it looks like it's the hard disk (possibly due to MySQL) that keeps so many processes in Sleep mode.
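iostat gets its disk numbers from `/proc/diskstats`, so sampling that file twice gives a rough answer to the I/O question even before sysstat is installed. A minimal sketch, assuming a Linux 2.6-or-later kernel with the documented field layout (field 1 reads completed, field 5 writes completed, field 10 milliseconds spent doing I/O); on a virtual server the file may be missing or may reflect the host rather than the guest. The helper names are hypothetical.

```python
import os
import time

def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed, ms spent doing I/O)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            # Older 2.6 kernels print short lines for partitions; skip them.
            continue
        stats[f[2]] = (int(f[3]), int(f[7]), int(f[12]))
    return stats

def utilisation(before, after, interval_s):
    """Per-device rates plus the percent of the interval the device was busy."""
    result = {}
    for name, (r0, w0, t0) in before.items():
        if name not in after:
            continue
        r1, w1, t1 = after[name]
        result[name] = {
            "reads_per_s": (r1 - r0) / interval_s,
            "writes_per_s": (w1 - w0) / interval_s,
            # Same idea as iostat's %util: busy milliseconds / elapsed milliseconds.
            "util_pct": 100.0 * (t1 - t0) / (interval_s * 1000.0),
        }
    return result

def sample(interval_s=1.0, path="/proc/diskstats"):
    """Read the file twice, interval_s apart, and compare."""
    with open(path) as fh:
        first = parse_diskstats(fh.read())
    time.sleep(interval_s)
    with open(path) as fh:
        second = parse_diskstats(fh.read())
    return utilisation(first, second, interval_s)

if __name__ == "__main__":
    if os.path.exists("/proc/diskstats"):
        for dev, s in sorted(sample().items()):
            print("%-8s r/s %7.1f  w/s %7.1f  util %5.1f%%"
                  % (dev, s["reads_per_s"], s["writes_per_s"], s["util_pct"]))
```

A disk whose `util_pct` stays near 100 while the CPUs have spare time is a good sign the box is I/O-bound; a low figure points back at the CPU.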

This is interesting. Thanks for your help, guys.

littlebigman · January 7, 2008

More info:

# cat /proc/stat
cpu  4648775 31 1100114 4544900 967595 14518 47729 0
cpu0 2413956 25 576313 2111222 501308 14518 44492 0
cpu1 2234818 5 523800 2433678 466287 0 3236 0
intr 29410811 252 2 0 0 0 0 3 0 0 0 0 0 0 0 2496073 0 26914481 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
ctxt 90025117
btime 1199679073
processes 1107769
procs_running 1
procs_blocked 70

Is 14518 a lot for I/O wait? What about 70 for procs_blocked?
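According to the proc(5) man page, the numbers on each `cpu` line of /proc/stat are cumulative ticks since boot, in the order user, nice, system, idle, iowait, irq, softirq, steal (later kernels append guest fields, ignored here). So 14518 on the `cpu` line above is the irq counter; iowait is the fifth value, 967595. `procs_blocked` is the number of processes currently blocked waiting for I/O to complete. A small sketch that names the fields and turns them into percentages, either since boot (one sample) or over an interval (two samples, which is what top displays); the function names are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into a dict of named tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels print fewer columns; treat the missing ones as 0.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))

def percentages(before, after=None):
    """Share of CPU time per field.

    One sample: share since boot. Two samples: share over the interval
    between them, which is what top shows.
    """
    if after is None:
        delta = dict(before)
    else:
        delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    return {k: 100.0 * v / total for k, v in delta.items()} if total else {}

if __name__ == "__main__":
    line = "cpu  4648775 31 1100114 4544900 967595 14518 47729 0"
    for name, pct in percentages(parse_cpu_line(line)).items():
        print("%-8s %6.2f%%" % (name, pct))
```

Applied to the `cpu` line posted above, iowait comes to about 8.5% of all CPU time since boot, and the nice counter is only 31 ticks, which fits the 0.0% ni that top reports.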

btherl · January 8, 2008

Here are some figures from an I/O-bound (but still adequately performing) database server, for comparison.

cpu  329108141 1009 30290050 2692689569 956383451 2347601 4304656
cpu0 116845067 763 15330751 64612385 800524411 2347601 4120176
cpu1 9783738 74 2960630 975569679 15455266 0 11718
cpu2 170294175 154 7208641 701713338 124409031 0 155766
cpu3 32185160 16 4790026 950794166 15994742 0 16995
intr 6486707408 1451184649 33 0 0 0 0 0 0 0 0 3 0 902 0 14 0 3051575304 0 0 0 0 0 0 0 0 0 0 0 1983946503 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 19666922524
btime 1189710960
processes 4272352
procs_running 1
procs_blocked 14

14 blocked processes pretty much matches the number of queries running, each one I/O bound. This is a server for importing of data, so being I/O bound is the normal mode of operation.

Tasks: 113 total,   1 running, 112 sleeping,   0 stopped,   0 zombie
Cpu(s):  6.0% us,  0.5% sy,  0.0% ni, 71.5% id, 21.9% wa,  0.0% hi,  0.1% si
Mem:   8226460k total,  8213332k used,    13128k free,     1832k buffers
Swap:   979956k total,     3816k used,   976140k free,  7735852k cached

This is what you would expect on an I/O-bound system with little need for cpu (a combined webserver + database is different, because the PHP code requires more cpu than most SQL queries). The wait percentage is this low because the other processors are idle. Pressing "1" shows the breakdown:

Tasks: 115 total,   1 running, 114 sleeping,   0 stopped,   0 zombie
Cpu0 : 27.0% us,  4.0% sy,  0.0% ni,  0.0% id, 69.0% wa,  0.0% hi,  0.0% si
Cpu1 :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2 :  0.3% us,  0.0% sy,  0.0% ni, 38.4% id, 61.3% wa,  0.0% hi,  0.0% si
Cpu3 :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   8226460k total,  8159128k used,    67332k free,     1824k buffers

Here you can see that some cpus are totally idle, which is why the average wait percentage shows up as only 20%. The active cpus are doing more waiting than anything else.

Load is around 6-8 on this machine. Once load reaches 11 or 12, queries begin to slow down. Load is a very crude measure though, and the threshold for performance problems depends very much on what the machine is doing.

Btw, your /proc/stat output shows 2 cpus, which would explain why the percentages add up to more than 100%. To see "top" output for all cpus individually, press "1" while it's running.

DyslexicDog · January 8, 2008

load average: 61.88, 51.08, 74.30

That's insane. Basically, that statement says you would need between 62 and 75 processors of the same speed to handle your load. You need to figure out why webalizer is causing such a headache. I've had a similar problem with it in the past: it turned out that its reverse DNS lookups were killing the machine it ran on. If you have a lot of traffic, your site could be like this for hours or days.

btherl · January 8, 2008

High load isn't always bad. And it doesn't translate into a number of processors that simply. Moving from 2 cpus to 4 could easily drop a load of 60 to something saner, like 5, if the machine is cpu-bound.

Let's say each cpu can handle 30 processes, and 120 want to run at any one time. With 2 cpus, they will be maxed out with a load of at least 60, because 60 processes are waiting to run. But with 4 cpus, the load will drop much lower, maybe even near zero, with all 4 cpus maxed out (assuming things scale linearly, which is never the case).

I'm also interested to see what happens if webalizer is stopped or paused.

neylitalo · January 8, 2008

they will be maxed out with a load of at least 60, because 60 processes are waiting to run.

Load isn't associated with the number of processes, at least not quite like that. A load average of 60 means that if the system had been 60 times as fast, it would have been able to handle all of the processes without making any of them wait. Here is the description from Wikipedia:

For example, a load average of "3.73 7.98 0.50" on a single-CPU system can be interpreted as:

during the last minute, the CPU was overloaded by 273% (1 CPU with 3.73 runnable processes, so that 2.73 processes were waiting for their turn)

the CPU was only busy half of the time over the last fifteen minutes

This means that this CPU could have handled all of the work scheduled for the last minute if it were 3.73 times as fast, or if there were 4 (3.73 rounded up) times as many CPUs, but that over the last fifteen minutes it was twice as fast as necessary to prevent runnable processes from waiting their turn.
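Part of the disagreement is about what Linux actually counts. The kernel updates the load average roughly every 5 seconds as an exponentially damped moving average of the number of "active" tasks, meaning runnable tasks plus tasks in uninterruptible sleep (state D, typically waiting on disk). A minimal floating-point sketch of that recurrence follows; the real kernel uses fixed-point arithmetic, and the function names and the 2-running/60-blocked scenario are hypothetical. It suggests why a machine reporting `procs_blocked 70` could show a load around 60 without 60 processes actually waiting for the CPU.

```python
import math

SAMPLE_S = 5.0  # Linux recomputes the load average roughly every 5 seconds
WINDOWS_S = {"1min": 60.0, "5min": 300.0, "15min": 900.0}

def update_load(loads, running, uninterruptible):
    """Apply one 5-second tick of the load-average recurrence.

    Linux counts runnable tasks AND tasks in uninterruptible sleep,
    so processes blocked on disk I/O raise the load even if CPUs are idle.
    """
    active = running + uninterruptible
    new = {}
    for name, window in WINDOWS_S.items():
        decay = math.exp(-SAMPLE_S / window)
        new[name] = loads[name] * decay + active * (1.0 - decay)
    return new

def simulate(ticks, running, uninterruptible, start=None):
    """Hold the task counts constant for `ticks` updates, starting from `start` (default 0)."""
    loads = dict(start) if start else {k: 0.0 for k in WINDOWS_S}
    for _ in range(ticks):
        loads = update_load(loads, running, uninterruptible)
    return loads

if __name__ == "__main__":
    # Hypothetical: 2 runnable tasks plus 60 stuck in D state for 10 minutes.
    print(simulate(120, running=2, uninterruptible=60))
```

After 10 minutes the 1-minute figure has converged on 62 while the 15-minute figure is still catching up, which is also why the three numbers in a load average can disagree so much.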

btherl · January 8, 2008

I realize my example is over-simplified, but so is the explanation given in Wikipedia. If you look at a simple example of a cpu capable of handling 1 process per minute but being given 2 per minute, it'll go like this:

1 process runs, 2 added, 1 waiting

1 process runs, 2 added, 2 waiting

1 process runs, 2 added, 3 waiting

and so on: the run queue keeps growing. Yet a single additional cpu will enable the machine to process the entire run queue. In terms of a snapshot at a single moment (say, when 10 jobs are in the queue), then yes, you would need 10 additional processors to clear the queue within 1 minute. But that doesn't mean you need 10 additional processors to run the same processes with a constant load of 1.
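btherl's queue argument can be checked with a toy model. The sketch assumes identical jobs, a fixed number of jobs each CPU finishes per minute, and perfect linear scaling, none of which hold on a real server; the function name is hypothetical.

```python
def simulate_queue(minutes, arrivals_per_min, cpus, jobs_per_cpu_per_min=1):
    """Return the backlog after each minute.

    Each minute, new jobs arrive, then the CPUs together finish up to
    cpus * jobs_per_cpu_per_min of whatever is queued.
    """
    queue = 0
    capacity = cpus * jobs_per_cpu_per_min
    history = []
    for _ in range(minutes):
        queue += arrivals_per_min
        queue -= min(queue, capacity)
        history.append(queue)
    return history

if __name__ == "__main__":
    print(simulate_queue(5, arrivals_per_min=2, cpus=1))  # backlog keeps growing
    print(simulate_queue(5, arrivals_per_min=2, cpus=2))  # backlog stays empty
```

With one CPU and 2 arrivals per minute the backlog grows 1, 2, 3, and so on, while a second CPU keeps it at 0. The earlier 30-jobs-per-CPU, 120-arrivals example behaves the same way: 2 CPUs fall 60 jobs further behind each minute, while 4 keep up.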

steviewdr · January 8, 2008

Who is ze-card? Check his crontab and disable webalizer (e.g. chmod the Apache logs to 600) for a day or so.

Also look into installing munin to provide monitoring graphs of your server's usage.

http://wiki.kartbuilding.net/index.php/Munin_Statistics

-steve
