As web applications go, the one I am developing is not lightning fast. It's not bad, and we have improved it, but you wouldn't want to scale it to tens of thousands of users as it stands. A small server is capable of handling maybe ten requests a second or so, which is fine for the kinds of client we have. However, we have hit a significant stumbling block at one particular client, who is getting much worse performance, and for no obvious reason.
First of all, things are a lot better than they were. When I joined the team, the "application" consisted entirely of Perl 5.005-style code, with no object orientation and no web framework. It was basically a big, bad, and bloated CGI script. It was typically deployed as an ActiveState compiled executable, so Perl was being started for each request. Believe me, if you think it is bad now, you should have seen it then.
Now we use Catalyst. The old CGI parts are wrapped into a controller that handles some parts of the interface, buying us time while we migrate over to a proper framework. The old CGI layer wasn't completely bad - it had factored data access into a separate module. Unfortunately, it uses global variables everywhere, so there is very little opportunity to do caching or other performance enhancements. I'll come back to this later.
Catalyst is on the heavy side, but it is architecturally very nice. It gives us all the tools we need to contain the problem. With DBIx::Class we get an ORM which allows us to make the new code simpler and clearer. We use Template::Toolkit for most of the front end, which is very much nicer than generating HTML through the old CGI function calls. That code, the old formatting code for CGI, is truly awful. You have no idea how bad it actually is. Everything I was ever taught about how to write code is done wrongly there.
Anyway, back to the performance. We use the FastCGI engine, as most clients use Windows and IIS. Because we now have persistent worker processes, each worker process can typically deliver around 10 requests a second on a good-ish server, so with (say) 10 worker processes, you can easily handle a decent-sized department doing intensive stuff. Since the application is for browsing, searching, and analysing source code, this is pretty good.
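As a back-of-envelope check on those figures, here is a tiny sketch in Python (used here only for brevity; the application itself is Perl). It assumes workers scale linearly, which is optimistic since they share CPU and the database, and the function name is made up for illustration.

```python
def estimated_throughput(workers, requests_per_worker_per_sec):
    """Rough upper bound on requests/second, assuming linear scaling."""
    return workers * requests_per_worker_per_sec


# The rough figures quoted above: 10 workers at about 10 requests/second each.
total = estimated_throughput(workers=10, requests_per_worker_per_sec=10)
print(total)  # 100
```

In practice the real ceiling will be lower than this, but it gives a sense of the headroom we are talking about.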
So all was going well until we encountered one client. They wanted to run everything virtualized, so, being cautious, we tested everything under VMware Server. We lost a little performance, say 15-20%, but that was not a showstopper. We were not prepared for our application running consistently 3-4 times slower on their server than on our VMware Server setup.
The problem is a resilient one. We started by using basic web tests, which my colleagues like (I don't). They give a reasonable statistical picture, but hide the pattern, as they are all run through IIS, which makes profiling more or less impossible. We knew a few things were a problem: McAfee was bad -- it was checking everything, even the temporary files created by the database we use. But McAfee is a giveaway: if you see "McShield" showing any CPU usage, you know it will be affecting performance. A few related tunings like this boosted performance by, say, 25-30%. Useful, but nowhere near enough.
We also checked the system: they're running VMware ESX 3.5 with AMD chips. We had heard that AMD 64-bit systems can be an issue with ESX, but even using a 32-bit system makes no difference. Nor does adding memory or processing cores.
So what about the SAN? Well, MySQL has a query cache, and even when the cache is fully populated, the system is slow. Disk is not being waited on, at least not as far as we can tell, so the SAN doesn't seem to be an issue.
Networks? Nope, not them either. MySQL runs a little faster with a shared memory connection, maybe another 20-25%. Named pipes are also quicker than TCP/IP, but we're still slower. And, of course, our VMware Server baseline is also benefiting from all these improvements.
Perl's amazing Devel::NYTProf showed more detail. The big hit was in the database. DB queries were taking 3-4 times as long on their server as on ours and, oddly enough, this held even when the query cache was serving all requests and the disk wasn't being hit. Now the MySQL query cache is conceptually trivial: if the string of the query matches, return the result. How can this be 3-4 times slower?
At this stage, if our application's old CGI data layer were better structured, and at least used parameters rather than globals, we could work around the problem with Memoize or local Perl caching. That would avoid the round trip to the DB server that even a query cache hit requires. Of course, we should probably do this anyway, but that is hardly the point.
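To show why the globals get in the way, here is a minimal sketch in Python (again for brevity; in Perl the equivalent tool would be Memoize). Everything in it is hypothetical: `fetch_from_db`, `lookup`, and the project/symbol arguments are stand-ins I made up, not our real data layer. The point is only that a memoizing cache keys on the arguments it can see, so a function that reads hidden global state can return stale answers, while one that takes everything as parameters caches safely.

```python
from functools import lru_cache

stats = {"round_trips": 0}  # counts simulated trips to the DB server


def fetch_from_db(project_id, symbol):
    """Hypothetical stand-in for one query sent to the database."""
    stats["round_trips"] += 1
    return f"{symbol}@project{project_id}"


# Globals style: the result depends on state the cache cannot see.
current_project = 1


@lru_cache(maxsize=None)
def lookup_with_global(symbol):
    return fetch_from_db(current_project, symbol)


# Parameter style: the arguments form the complete cache key.
@lru_cache(maxsize=1024)
def lookup(project_id, symbol):
    return fetch_from_db(project_id, symbol)


first = lookup_with_global("main")
current_project = 2
stale = lookup_with_global("main")  # cache hit: still the project 1 answer

a = lookup(1, "main")
b = lookup(1, "main")  # cache hit, no round trip
c = lookup(2, "main")  # new key, so one more round trip
```

Here `stale` is still the project 1 result even though the global changed, which is exactly the kind of bug that stops us from bolting a cache onto the old layer as it stands.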
I'm usually pretty good at debugging, and I have rarely been completely at a loss. This time, we do not have access to the ESX system, so we are about at the end of what we can do. There are reports that lack of memory affinity can be a problem, especially for in-memory databases (which is what the query-cached MySQL is, in part). Both MySQL and our systems (being Perl) are memory intensive.
Having said that, most reports on the Internet are pretty dumb in their analysis, even frustratingly so. They usually say it is a disk latency problem - well, we know it isn't: if I write a stored procedure and a Perl script doing the same set of queries, the stored procedure runs 10 times faster than the script. The queries are the same, so it must be the communication between Perl and the database. And, of course, the difference is much smaller, maybe a factor of 4 or 5, on our VMware Server benchmark. Other explanations imply that two cores are bad because CPU scheduling is so hard -- sorry, I don't buy it; it is not so hard that it takes this much of your server.
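The reasoning can be put as a toy cost model, sketched here in Python. A stored procedure pays for one client/server round trip in total, while a script pays for one per query. The numbers below are invented purely for illustration, chosen only so the ratios land near the ones quoted above. They are not measurements from either system, and `total_time_us` is a made-up helper.

```python
def total_time_us(n_queries, exec_us, round_trip_us, round_trips):
    """Total time: query execution plus client/server round trips."""
    return n_queries * exec_us + round_trips * round_trip_us


N = 1000       # queries in the batch (illustrative)
EXEC_US = 100  # assumed execution time per query, identical in both cases


def script_vs_procedure(round_trip_us):
    """How many times slower the script is than the stored procedure."""
    procedure = total_time_us(N, EXEC_US, round_trip_us, round_trips=1)
    script = total_time_us(N, EXEC_US, round_trip_us, round_trips=N)
    return script / procedure


ratio_cheap_trips = script_vs_procedure(round_trip_us=400)
ratio_costly_trips = script_vs_procedure(round_trip_us=900)
print(round(ratio_cheap_trips, 1), round(ratio_costly_trips, 1))
```

With identical query execution time, roughly doubling the assumed cost of a round trip is enough to move the ratio from about 5 to about 10. That is consistent with the per-call communication overhead being what differs between the two environments, although the model cannot say why it differs.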
I am sure we haven't seen the end of this problem yet, but in the process we have probably sped up our system by a factor of two, and maybe prioritizing architectural changes to add caching to Perl will double it again. It's all a worthwhile, if rather frustrating, exercise.