My first ever real job was to develop code which handled dates and times. Today, more than 25 years later -- and that scares even me -- I'm back to handling dates and times again.
My experience is that dates and times are disproportionately risky for code. I have found more bugs in date handling code, even my own, than I care to admit. A colleague found a classic example of code that worked on Wednesdays (See: http://web.media.mit.edu/~lieber/Lieberary/Softviz/CACM-Debugging/Hairie...) and when we worked on Meet-O-Matic, we found at one stage that our code worked all months of the year except January. Obviously good testing is essential -- the point of these cases is that good testing can be very hard, as this code tends to be sensitive to environmental factors. For example, if you write code on a Wednesday, and it works on a Wednesday, you won't notice it's broken until it suddenly fails to behave on Thursday or Friday. Since no code has changed, this can be somewhat inexplicable.
Today I found another example. Some code that had worked for years suddenly started to hang. It was written by a former employee, probably about five years ago, and landed in my inbox. I had upgraded a whole bunch of CPAN modules, so it did seem likely that one of them was to blame, but there was nothing obvious.
Fortunately, it took very little time to find the culprit - some code iterated through days, using Date::Manip to add a day, until it matched the last day of a range. This code now started hanging on one day, specifically the 4th April. Yes, you may have guessed, it was the daylight saving transition. Adding a day as a fixed 24 hours across that transition only moves the clock on by 23 hours, so you land at 11pm on the same day. If you then remove the time (which the code did) you are back at the start of the same day, and there's the loop condition.
I don't really like Date::Manip. I use it for parsing crontabs, but for most purposes I use DateTime, which is slightly clearer on the object-orientedness. DateTime handles durations differently. A day is a day -- and does not need to be 24 hours long. If you add a day to midnight on the 4th April 2010, you get midnight on the 5th April 2010. This is different from Date::Manip, which gives you 11pm on the 4th April. (Date::Manip has "business day" logic which is similar, but also skips weekends, and we didn't want that.)
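To make the difference concrete, here is a minimal sketch in Python rather than Perl, so it illustrates the idea and not either module's actual behaviour. The post does not name the time zone; Australia/Sydney is an assumption, chosen because its clocks went back on 4 April 2010, and the function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: the post does not name the time zone. Australia/Sydney is used
# because its clocks went back on 4 April 2010, making that day 25 hours long.
ZONE = ZoneInfo("Australia/Sydney")


def add_calendar_day(local):
    """Move to the same wall-clock time on the next calendar date."""
    # Python's arithmetic on an aware datetime is wall-clock based.
    return local + timedelta(days=1)


def add_24_hours(local):
    """Move forward by 24 elapsed hours, doing the arithmetic in UTC."""
    elapsed = local.astimezone(timezone.utc) + timedelta(hours=24)
    return elapsed.astimezone(local.tzinfo)


def strip_time(local):
    """Drop the time of day, as the looping code did after each step."""
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


midnight = datetime(2010, 4, 4, tzinfo=ZONE)
print(add_calendar_day(midnight))  # 2010-04-05 00:00:00+10:00
print(add_24_hours(midnight))      # 2010-04-04 23:00:00+10:00
# True: after stripping the time, the loop is back where it started.
print(strip_time(add_24_hours(midnight)) == midnight)
```

The last line is the hang in miniature: a loop that steps by 24 elapsed hours and then strips the time never gets past 4 April.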
I like the intuitiveness of DateTime's handling of durations, subtleties and all. Adding a day (or a month, or a year) does not correspond to a fixed multiple of seconds, but the results are (usually) intuitive. What DateTime has done is bundle up all that highly complex intuitive logic, with all the nasty complexity of timezones and transitions, into a clear module which works well.
We still need to port the rest of our module to use DateTime, but using it for the straight math of adding a day broke out of the loop just fine. However, I'm feeling a Test::DateTime module would be helpful. One which would hack into the time handling in a way which would allow testing with different time environments, including a range of timezones, a range of years, and all months of the year/days of the week, and the usual start-and-ends-of-months. This would have caught this bug much sooner, as one of the reasons we'd missed it was that in testing the DST had corresponded with a weekend, and we were skipping weekends internally. Just trying a few different years we'd have caught the problem sooner.
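As a rough idea of what such a test helper might do, here is a sketch in Python (not a real Test::DateTime, which as far as this post is concerned does not exist yet). It runs a day-stepping function over every date in several years and time zones and reports the dates where the step fails to reach the next day. The zone names, the years, and all function names are illustrative assumptions.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def next_day_elapsed(local):
    """Step the way the old loop did: add 24 elapsed hours, then drop the time."""
    later = local.astimezone(timezone.utc) + timedelta(hours=24)
    later = later.astimezone(local.tzinfo)
    return later.replace(hour=0, minute=0, second=0, microsecond=0)


def next_day_calendar(local):
    """Step by calendar date, however long the day really is."""
    return local + timedelta(days=1)


def stuck_days(step, zone_name, years):
    """Return the dates in `years` on which `step` fails to reach the next date."""
    zone = ZoneInfo(zone_name)
    failures = []
    for year in years:
        day = date(year, 1, 1)
        while day.year == year:
            start = datetime(day.year, day.month, day.day, tzinfo=zone)
            if step(start).date() != day + timedelta(days=1):
                failures.append(day)
            day += timedelta(days=1)
    return failures


# Zone names and years are illustrative choices, not taken from the post.
for zone_name in ("Australia/Sydney", "America/Toronto", "UTC"):
    print(zone_name, stuck_days(next_day_elapsed, zone_name, range(2008, 2012)))
```

Sweeping whole years like this does not depend on which day of the week a transition falls on, which is the kind of coincidence that hid the bug in our own testing.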
And for the record, I'm not the only one that noticed this. It's covered at Date::Manip::Problems. I stand by the view that adding a day is not always the same as 24 hours. Personally, I'd make "+1d" and "+24h" behave differently, as conceptually, they are different.
I've just about calmed down enough to blog about MSSQL.
Today, I had a call from a client who was getting error messages from our system, relating to different collations being applied in a join comparison. I'm used to this kind of problem, so I thought this would be easy. It's not.
Our system does both case-sensitive and case-insensitive comparisons at various stages, so we need to be a little cunning. This is because we handle both case-sensitive languages (like C and Java) and case-insensitive languages (like VB and COBOL). We use collations and the occasional cast to make this work. It's an issue even in MySQL, our default DBMS, but we handle it OK.
Our client has created a database on one server, backed it up, and moved it onto a second server. The default collation is attached to a database, so that was brought over correctly. We'd set the database default collation correctly. The surprise was that MSSQL uses (by default) a different default collation for temporary tables, defined by the server. If you want to use the database collation, you need to add "COLLATE database_default" to every text column in a temporary table definition.
I was on a roll at this point, happily making all these changes, and testing them. All looked good, but the load crashed immediately after this, and in a new way: suddenly SQL identifiers started to become case-sensitive. WTF? I was thinking. After a weak attempt to just live with it, I came across a case where the data was derived from Excel spreadsheets, and the case-sensitive identifiers were not going to work. So, I needed to find the problem.
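For illustration, this is roughly what the temporary table change looks like. It is a sketch in Python that only builds the SQL text; the helper name, the table, and the columns are all hypothetical, and our real code does not look like this.

```python
# Column types that carry a collation in SQL Server.
TEXT_TYPES = ("char", "varchar", "nchar", "nvarchar", "text", "ntext")


def temp_table_ddl(table, columns):
    """Build CREATE TABLE for a temp table, pinning text columns to the database collation."""
    parts = []
    for name, sql_type in columns:
        base_type = sql_type.split("(")[0].strip().lower()
        collate = " COLLATE database_default" if base_type in TEXT_TYPES else ""
        parts.append(f"    {name} {sql_type}{collate}")
    return f"CREATE TABLE #{table} (\n" + ",\n".join(parts) + "\n)"


# Hypothetical table and column names, for illustration only.
print(temp_table_ddl("symbols", [("id", "INT"), ("name", "NVARCHAR(255)")]))
```

The generated statement leaves the INT column alone and adds COLLATE database_default to the NVARCHAR one, so a join against a permanent table compares like with like.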
Turns out MSSQL switches to case-sensitive SQL when the database has a default collation that is case-sensitive. Yes, if you need case-sensitive data matching (or want to avoid specifying it for every single column) you have to live with case-sensitive SQL. Here is what the documentation said:
In earlier versions of SQL Server, system object and system type names are matched against the collation of the master database. In SQL Server 2005, system object names and system type names are automatically cast to correspond to the collation of the current database. If references to these objects in your script or applications do not match how they appear in the catalog and the current database has a case-sensitive collation, the script or application may fail. For example, the statement EXEC SP_heLP will fail if the current database has a case-sensitive collation.
So, why does MSSQL use the server collation for temporary tables, and the database collation for its syntax? What were the "designers" thinking?? I would probably not have noticed this if I hadn't had to deal with the whole backup-and-move thing, but this is unbelievable!
My solution is simple: I'm adopting the convention of making all SQL identifiers lowercase. I do this in my code anyway, but we have a lot of legacy code that is inconsistent in its use of case, and now I need to go and change virtually every single SQL statement in that legacy code. All because of that truly stupid design decision made by MSSQL.
I am not especially anti complementary medicine. I'm skeptical. Come up with the evidence, and I'll buy it. And I am happy for people to come up with the evidence.
If you doubt me, you can read a book chapter I jointly wrote (in http://books.google.ca/books?hl=en&lr=&id=un9KoRWYTxIC&oi=fnd&pg=PR8&ots...), where we discussed homeopathy, and in particular the way that Jacques Benveniste was treated by the editors of Nature. Irrespective of the actual science, his work was described as "delusion". Using the language of mental health concerns reduced the scientific critique to a personal one, and was (I still believe) inappropriate - especially from the editors of a respectable scientific journal. Benveniste might have been wrong, but still deserved to be treated with respect, and to have his findings discussed scientifically.
The core of our analysis was simple: people assess theories in ways that fit their existing beliefs. The editors of Nature specialized in the physical sciences, so homeopathy's basis in extreme dilution crashed against those core beliefs.
The same effect works the other way around. If someone's core beliefs are that pharmaceutical companies are money-grubbing bastards (and in some cases this is not unreasonable) this can be used to distort the beneficial effect that some of their products actually have. Some products of the pharmaceutical companies do save lives.
Today, a very strong-willed guy survived a pretty nasty attack from a health group, because he criticized people for making medical recommendations to others, without being a medical practitioner, of a treatment that is essentially bleach. I'm filled with admiration for the way he handled himself, and as to the story - @rhysmorgan tells it better than I ever could. The story has become known as 'Bleachgate'. For the video version, see http://www.twitvid.com/Z7TOH, or read it at: http://thewelshboyo.wordpress.com/2010/08/10/bleachgate/.
Despite my open-mindedness for complementary medicine, there is a line which should not be crossed - that of reckless endangerment, and Bleachgate crossed that line. Big time.
Examples of the kind of endangerment I mean include:
People die because of these claims.
I will be honest, the Bleachgate treatment, which is essentially related to hypochlorite, has some plausibility. Hypochlorite is used by the immune system - although in a targeted way. The original tests for this began as a "cure for malaria" (http://miraclemineral.org/aboutauthor.php). There was even a trial, of sorts, although it is pretty clear it was not very well controlled.
Here are a few excerpts from the description available at http://miraclemineral.org/part1.php.
The assistant medical technician … arranged for us to do the clinical trials. We slipped him a few dollars at several different times and he was quite cooperative. He was actually quite cooperative even before we slipped him a few dollars, but he was such a nice man that we thought it would be nice to help him out a bit. (p92)
Bribery is not really considered good practice when running a trial. They also paid the salary of the technician conducting the blood tests. These two gathered the data, so at the very least there was a conflict of interest. The clinical trial was also run in a prison. These are not conditions which would normally be conducive to solid, impartial evidence. For example, if I went in offering plain water - with a little bad flavouring to make it seem like medicine of a kind - and if I paid everyone's salary, would I get an effect? There is a reason why it is sometimes a good idea to run trials in a double-blinded way.
It is my belief that the reason the Pharmaceutical Medicines and Poisons Board so readily accepted our MMS as a mineral supplement, rather than a drug, was that so many officials drank it without hesitation when we told them that it was not a drug. (p93)
Well, this is just a lie. It's a chemical. It has an effect. Of course it's a drug! (I know what you meant: they would not have accepted a treatment offered by a large pharmaceutical company. However, it is still a drug!)
You may not believe it, but for years the U.S. FDA has been suppressing all real cancer cures, as well as information concerning how vitamins prevent heart attacks, and all other information regarding products that may in any way reduce the income of the large pharmaceutical companies (Big PHARMA). Please don’t take my word for it; become informed. Read the information available on the Internet. Just go to any search engine and search on "FDA Suppression." (p117)
Now we head into the lands of conspiracy theories. Unless it's true, of course. Why the US FDA would support large pharmaceutical companies from outside the US (as many are) is unclear. But to be honest - all the information seems to be badly suppressed. I can buy books on any of them from thousands of people around the Internet, often top-ranked on search engines. Also, all the theories that are being suppressed are very different. Sometimes it is vitamins, sometimes a "miracle" cure, sometimes it is simply prescription information available elsewhere in the world. What, really, is being suppressed here?
Given a choice between believing that all large pharmaceutical companies are in cahoots with large governments to suppress information about viable treatments, or believing that those treatments don't currently exist, I want to believe they exist.
However, I personally feel that ingesting bleach is generally a Bad Thing. I would do it if a doctor advised me, and there was evidence that it would help me. I probably wouldn't enjoy it, but I'd do it.
What is simply unacceptable is lambasting somebody for questioning the evidence, and for resisting aggressive postings to at least ensure that important communications are visible to everyone, so they can make an informed decision. Anyone who claims that drinking bleach can cure these serious diseases had better have some damn good evidence, and if not, they'd better have jail time for serious fraud leading to people dying. The evidence available now looks like an unethical and mismanaged trial.
As to the group, their handling of this case is a case of sour grapes. In the original meaning, from Aesop's Fables:
Driven by hunger, a fox tried to reach some grapes hanging high on the vine but was unable to, although she leaped with all her strength. As she went away, the fox remarked, 'Oh, you aren't even ripe yet! I don't need any sour grapes.' People who speak disparagingly of things that they cannot attain would do well to apply this story to themselves.
(from http://en.wikipedia.org/wiki/The_Fox_and_the_Grapes).
We want the cure, but we can't have it. It seems the criticism of large pharmaceutical companies is more due to the fact that the "cure" isn't within reach. Claims of suppression by big pharma and the US FDA are simply souring the grapes.
One of the most annoying barriers to Perl portability is fork(). Let's face it, fork() is UNIX through and through. Windows doesn't do fork, you have to use some awful Win32::Process::CreateProcess incantation, and pass it the name of the executable you want.
So after last night's excellent Toronto Perl Mongers meet, we talked a bit about Plack. Plack is cool, and Plack would do just what I needed if I could install the darned thing. Unfortunately, because of fork(), I can't.
I don't use ActiveState's Perl, or Strawberry Perl. I made my own, using MinGW, and I turned off threading, as I want it to run a bit faster. It does run faster, but it doesn't have fork() emulation, so all those modules that assume fork for testing (and it usually is testing) typically fail. Not many fall victim to this (WWW::Mechanize and its friends are most of them). Unfortunately, Plack is another one.
Part of me feels that a Perl API could actually allow something more like CreateProcess, which is pretty easy to emulate using fork() and exec(). The reverse is not true. Maybe I should just contribute a module which could replace Test::TCP which didn't need to fork to do its tests, maybe just passing a script file to a newly created child process.
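To show how little is needed in that direction, here is a minimal sketch of a CreateProcess-style call built from fork() and exec(). It is written in Python rather than Perl, runs on UNIX only, and the names spawn and wait_for are invented for the example rather than taken from any existing module.

```python
import os
import sys


def spawn(argv):
    """Start argv as a new process and return its pid (UNIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; leave without running parent cleanup.
            os._exit(127)
    return pid


def wait_for(pid):
    """Block until the child exits and return its exit code."""
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


child = spawn([sys.executable, "-c", "raise SystemExit(3)"])
print(wait_for(child))  # 3
```

A test helper with this shape could be backed by fork() and exec() on UNIX and by a native process-creation call on Windows, which is the portability argument being made here.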
In the meantime, please, if you are writing a module, don't assume everyone will have fork().
One of the things that really frustrates me about Internet communications is getting the level of communication right. I'm a member of a number of mailing lists, and occasionally post messages elsewhere. In these discussions, one of the following invariably seems to happen.
The first one is frustrating, but I can live with it. What I do find hard to deal with is the second. Obviously, this happens much less in real life as people can see you, so they can estimate your age, and to some extent even your level of experience. As it is, I hold a PhD, and until fairly recently, taught Masters and PhD students in computing.
Now to some extent there is a legitimate point. Now I've worked in industry I realize that much of what is common practice in industry is not commonly known to a generation of academics. Patterns are a good example. Many of my former colleagues knew little about them, but us artificial intelligence people used patterns extensively; see Tansley and Hayball's book, for example, which dated back to 1993. I was using object-oriented programming with multiple inheritance in 1984. So yes, I have been doing true OOP (with "Flavors" on a Symbolics Lisp machine if you want to know) since before many people on internet mailing lists were born. I have built several implementations myself, and I do not need to have inheritance/OOP explained to me.
Harry Collins discusses these issues in his work on the sociology of science and knowledge - he stresses the importance of the 'repair work' in social interaction, the tacit knowledge and bridging inferences people use to understand one another. The brevity of many mailing-list type conversations limits the amount of repair, and the lack of a shared context (in some cases) makes it even harder to make the required repairs. These can easily result in the kinds of misunderstanding hinted at above. One example that cropped up recently was a posting on the use of shared memory in Perl threads running web server worker processes. A fair number of responses went into explaining the basics of worker processes (which was not the question) and missed the subtlety of shared memory in Perl threads (which is both subtle and slightly strange, even for Perl).
My plea? When you are discussing something with someone, and you feel yourself making assumptions about them, maybe try to stop and think about those assumptions. You could always ask them for clarification. If you are in doubt about the other person's level of knowledge, just ask.
Let's be honest, Perl's threading model is not that good. In fact, it is for me the weakest area of the language. Generally, one of the main reasons I build and use my own Perl on Windows is so that I can remove it. And I remove it on UNIX too, but there I don't even miss it because at least I still have fork().
The issue cropped up recently on the Catalyst mailing list. Catalyst is often used with Apache, and can prefork a number of worker processes, using the threading system. In theory this is great: the code is parsed and loaded once, and then the worker processes are forked from it. The biggest drawback of not using threads on Windows is that each worker process needs to start from a clean interpreter, and this can take a good few seconds to get the application started. Doing this ten times over (if you have ten worker processes) is, to put it bluntly, crazy.
The fact that Windows benefits from fork() emulation is one thing: the issue is, should it be the same thing as threads? Threads are normally light, yet in Perl, using them essentially clones the entire interpreter state, not necessarily using the elegant copy-on-write semantics of a modern UNIX fork(). Threads are normally a good way of sharing stuff between concurrent tasks, and even that is somewhat clunky in Perl.
Perl's use of 'multiplicity' is a great base - it allows multiple interpreter contexts, which really is the important bit. It would be nice to have that and a kind of object-based threading system that dispenses with the whole fork() emulation style threading crap. It would also be good for threads to have some OS basis, so that really they tapped into the benefits of whatever OS support you have -- that could be harder to deliver, but worth a try.
Sure, this means a little more programming and a little more care. Possibly even a little less portability. Probably better performance for the most part, but I'm more concerned that this runs counter to the design approaches that Perl usually follows.
The thing about Perl is: it really taps into the underlying system -- that's the nature of the language. Threading seems to me a big false step -- it fakes a part of the underlying system to make systems appear more uniform than they actually are. That would be a Lisp- or Java-like approach, not a Perl-like approach.
As web applications go, the one I am developing is not lightning fast. It's not bad, and we have improved it, but you wouldn't want to scale it to tens of thousands of users as it stands. A small server is capable of handling maybe ten requests a second or so, which is fine for the kinds of client we have. However, we have hit a significant stumbling block at one particular client, who is getting much worse performance, and for no obvious reason.
First of all, things are a lot better than they were. When I joined the team, the "application" consisted entirely of Perl 5.005 style code, with no object-orientation, no web framework. It was basically a big, bad, and bloated CGI script. It was typically deployed as an ActiveState compiled executable, so Perl was being started for each request. Believe me, if you think it is bad now, you should have seen it then.
Now we use Catalyst. The old CGI parts are wrapped into a controller that handles some parts of the interface, buying us time while we migrate over to a proper framework. The old CGI layer wasn't completely bad - it had factored data access into a separate module. Unfortunately, it uses global variables everywhere, so there is very little opportunity to do caching or other performance enhancements. I'll come back to this later.
Catalyst is on the heavy side, but it is architecturally very nice. It gives us all the tools we need to contain the problem. With DBIx::Class we get an ORM which allows us to make the new code simpler and clearer. We use Template::Toolkit for most of the front end, which is very much nicer than generating HTML through the old CGI function calls. That code, the old formatting code for CGI, is truly awful. You have no idea how bad it actually is. Everything I was ever taught about how to write code is done wrongly there.
Anyway, back to the performance. We use the FastCGI engine, as most clients use Windows and IIS. Because we now have persistent worker processes, each worker process can typically deliver around 10 requests a second on a good-ish server, so with (say) 10 worker processes, you can easily handle a decent-sized department doing intensive stuff. Since the application is for browsing, searching, and analysing source code, this is pretty good.
So all was going well until we encountered one client. They wanted to run everything virtualized, so being cautious, we tested everything under VMware Server, and we lost a little performance, say 15-20%, but not a showstopper. We were not prepared for our application running consistently 3-4 times more slowly on their server compared to our VMware version.
The problem is a resilient one. We started by using basic web tests, which my colleagues like (I don't). They give a reasonable statistical picture, but hide the pattern, as they are all run through IIS, which makes profiling more or less impossible. We know a few things were a problem: McAfee was bad -- it was checking everything, even the temporary files created by the database we use. But McAfee is a giveaway: if you see "McShield" showing any CPU usage, you know it will be affecting performance. A few related tunings like this boosted performance by, say, 25-30%. Useful, but nowhere near enough.
We also checked the system: they're running VMware ESX 3.5 with AMD chips, so we thought: AMD 64-bit systems can be an issue with ESX, but even using a 32-bit system makes no difference. Nor does adding memory, or processing cores.
So what about the SAN? Well, MySQL has a query cache, and even when the caches are full, the system is slow. Disk is not being waited on, at least not as far as we can tell, so the SAN doesn't seem to be an issue.
Networks? Nope, not them either. MySQL runs a little faster with a shared memory connection, maybe another 20-25%. Named pipes are also quicker than TCP/IP, but we're still slower. And, of course, our VMware Server baseline is also benefiting from all these improvements.
Perl's amazing Devel::NYTProf showed more detail. The big hit was in the database. DB queries were taking 3-4 times as long on their server compared to ours, oddly enough, even when the query cache is serving all requests and the disk isn't being hit. Now the MySQL query cache is trivial: if the string of the query matches, return the result. How can this be 3-4 times slower??
At this stage, if our application's old CGI data layer was better structured, and at least used parameters rather than globals, we could work around the problem with Memoize or local Perl caching. Even the query cache uses a round-trip to the DB server. Of course, we should probably do this anyway, but that is hardly the point.
I'm usually pretty good at debugging, and I have rarely been completely at a loss. This time, we do not have access to the ESX system, so we are about at the end of what we can do. There are reports that lack of memory affinity can be a problem, especially for in-memory databases (which is what the query cached MySQL is, in part). Both MySQL and our systems (being Perl) are memory intensive.
Having said that, most reports on the Internet are pretty dumb in their analysis, even frustratingly so. They usually say it is a disk latency problem - well, we know it isn't, as if I write a stored procedure and a Perl script doing the same set of queries, one runs 10 times faster than the other. The queries are the same, so it must be the communication between Perl and the database. And, of course, the difference is much smaller, maybe a factor of 4 or 5, on our VMware Server benchmark. Other explanations imply that two cores are bad because CPU scheduling is so hard -- sorry, I don't buy it, it is not that hard that it takes this much of your server.
I am sure we haven't seen the end of this problem yet, but in the process we have probably sped up our system by a factor of two, and maybe prioritizing architectural changes to add caching to Perl will double it again. It's all a worthwhile, if rather frustrating, exercise.