I've just about calmed down enough to blog about MSSQL.
Today, I had a call from a client who was getting error messages from our system, relating to different collations being applied in a join comparison. I'm used to this kind of problem, so I thought this would be easy. It's not.
Our system does both case-sensitive and case-insensitive comparisons at various stages, so we need to be a little cunning. This is because we handle both case-sensitive languages (like C and Java) and case-insensitive languages (like VB and COBOL). We use collations and the occasional cast to make this work. It's an issue even in MySQL, our default DBMS, but we handle it OK.
Our client has created a database on one server, backed it up, and moved onto a second server. The default collation is attached to a database, so that was brought over correctly. We'd set the database default collation correctly. The surprise was that MSSQL uses (by default) a different default collation for temporary tables, defined by the server. If you want to use the database collation, you need to add "COLLATE database_default" to every text column in a temporary table definition.
I was on a roll at this point, happily making all these changes, and testing them. All looked good but the load crashed immediately after this, in a new way, suddenly SQL identifiers started to become case-sensitive. WTF? I was thinking. After a weak attempt to just live with it, I came across a case where the data was derived from Excel spreadsheets, and the case-sensitive identifiers were not going to work. So, I needed to find the problem.
Turns out MSSQL switches to case-sensitive SQL when the database has a default collation that is case-sensitive. Yes, if you need case-sensitive data matching (or want to avoid specifying it for every single column) you have to live with case-sensitive SQL. Here is what the documentation said:
In earlier versions of SQL Server, system object and system type names are matched against the collation of the master database. In SQL Server 2005, system object names and system type names are automatically cast to correspond to the collation of the current database. If references to these objects in your script or applications do not match how they appear in the catalog and the current database has a case-sensitive collation, the script or application may fail. For example, the statement EXEC SP_heLP will fail if the current database has a case-sensitive collation.
So, why does MSSQL use the server collation for temporary tables, and the database collation for its syntax? What were the "designers" thinking?? I would probably not have noticed this if I hadn't had to deal with the whole backup-and-move thing, but this is unbelievable!
My solution is simple: I'm adopting the convention of making all SQL identifiers lowercase. I do this in my code anyway, but we have a lot of legacy code that is inconsistent in its use of case, and now I need to go and change virtually every single SQL statement in that legacy code. All because of that truly stupid design decision made by MSSQL.