TPC-H tests

After reading about Nick Haslam’s tests using TPC-H data I thought I’d see how quick the new server I was building was (before it went into production!).

The server is an IBM x3650 M3 with eight disks installed: four small SSDs and four bigger SATA and SAS drives. Each of the disks is a RAID1 mirror of identical pairs.

Results

Type DB Build Data Load Total
Data and log separate SSDs 0:02 02:24 02:26
Data and log on single SAS 0:07 02:28 02:35

 

The initial test was fast but not as fast as Nick Haslam’s load to multiple files on a single C (presumably SATA) drive. I had expected to be faster but of course a bulk data load requires mainly write operations, and SSDs are notoriously slower on write operations. In addition these are running as a RAID1 array so each write requires two write operations, and the controller is set for write-through (ensure the data is written before complete).

As a comparison I reran the load with all the database files loading on the same SAS drive (a 10k rpm Savvio RAID 1 pair). As this is on the same controller it would provide a reasonable comparison of SSD vs SAS write performance.

So not as fast as writing to unmirrored disks but not massively slow either.

Advertisements

Querying XML fields in LINQ to SQL

A while back I wrote about the limitations of LINQ to SQL when querying data in XML columns. It was frustrating but I hadn’t figured out an elegant solution at the time.

Then at then end of 2010 I attended a DDD8a session which re-opened my eyes to SQL CLR functions and how much better these were than SQL Scalar functions.

And today I linked the two problems in my head – the solution is right there! Write some SQLCLR functions which allow you to query an XML column at the server, rather than having to retrieve the full XML value and search it on the client.

Here is the VB.NET code I wrote as a first test to get the first element (or child element) of a given name and return the value.

    'Get value of an Xml column element (including child elements)
    <Microsoft.SqlServer.Server.SqlFunction()> _
    Public Shared Function XmlGetElementValue(ByVal column As SqlXml,
                                              ByVal elementName As SqlString) As SqlString
        'Check for nulls
        If column.IsNull Then Return SqlString.Null
        If elementName.IsNull Then Return SqlString.Null

        'Create an Xelement object from the XML
        Dim e As XElement = XElement.Parse(column.Value)
        'Try to find an element of given name
        Dim found = e.Descendants(elementName.Value).FirstOrDefault
        If found IsNot Nothing Then
            'return value
            Return New SqlString(found.Value)
        Else
            Return SqlString.Null
        End If
    End Function

Spread the Word! SQL User-Defined Scalar Functions are Evil!

While at DDD8a ‘Modern.NET’ recently, I attended the session "Things you should know about SQL as a developer” session hosted by Simon Sabin.

While I’m not DBA-level when it comes to SQL Server, I thought I knew most of the performance issues in SQL and how to avoid them. Turns out (happily) that I was wrong, and it’s given me an almost ‘free’ performance benefit.

The culprit is SQL User Defined Scalar functions. I use only a few of these, but they are very useful. For example, when working with dates in data, I often want to group rows by a month rather than the date and time value.

Indeed it’s really weird to my mind that this fairly common pattern isn’t implemented in T-SQL – or indeed even in the .NET framework! It’s not hard to create your own of course, you can use the year and month value to return a date for the first of the month. In .NET I have a DateExtensions library that has System.DateTime extension methods such as .MonthOnly, .LastMonth and .NextMonth – which is so heavily used that I now include the namespace at project level.

So for five years I’ve had a SQL scalar function dbo.MonthOnly() which takes a SQL DateTime and returns the first day of the date’s month. This is really useful with LINQ to SQL as this function can be integrated into LINQ queries, so that the grouping takes place on the server and not the client, reducing the amount of data passing from the server to the client.

Realisation Dawns

So listening to Simon talking about these useful functions and saying they are a performance nightmare was a bit of a surprise. Turns out these functions cannot be parallelised, so any query using them is automatically throttled. Secondly they are not compiled in the same way as stored procedures, so there is an interpretation hit. You can read more about it on his post here.

The solution says Simon is to use .NET CLR functions and having tried out the performance difference I can testify that the difference is amazing.

Example:

I have a database table of phone call records, with a SQL datetime column that stores and full date and time value. If I wanted to summarise by day (i.e. remove the time component of each call date) I would use my scalar DateOnly function. This table has just over 900,000 rows of real data in it.

Test

TSQL scalar function

CLR function

Group by DateOnly()

3 seconds

sub-1 second

Group by MonthOnly()

10 seconds

sub-1 second

 

I have to say that writing the SQL CLR functions was terribly easy, and (of course) I also created some unit tests for these so I had 100% confidence in the results I was getting.

Many thanks Simon!

SQL server linked server to itself

I had a problem – I wanted to put archive data into separate databases on the same SQL Server 2005 system. When I did this though, I had to map all the accounts I had previously created on the main database into each of my annual archive databases. The meant that a lot of synchronising of accounts and logins across each of the archives, and creating new ones whenever a new archive was created.

I knew that the SQL Server 2005 linked servers system allowed one server to log into a different server using a single login, but I could not see a way to do this when the server was local. Then I realised that I could create a linked server called “localhost” thus:

— add linked server using alias to self!
sp_addlinkedserver @server=’localhost’, @srvproduct=’SQL Server’;
— create a login map to a single account
sp_addlinkedsrvlogin
@rmtsrvname=’localhost’,
@useself=’FALSE’,
@rmtuser = ‘MyArchiveUser’,
@rmtpassword = ‘myarchivepassword’;

SQL Server thinks this that ‘localhost’ is a “remote” server and allows me to use it as if it were. I could then make my archives available as views in the live database as follows:

CREATE VIEW ArchiveTable AS
SELECT * FROM localhost.ArchiveDatabase.dbo.ArchiveTable;

Really pretty simple once I’d figured out how to fool SQL!

SQL varchar ‘bug’

I was having a hard time figuring out why my T-SQL query wasn’t working.

DECLARE @billid int, @cli varchar;
SET @billid =12345;
SET @cli = ‘02077778888’;

SELECT COUNT(*) FROM Records WHERE BillID = @billid AND CLI = @cli;

This query consistently returned a result of zero records, despite my knowing that records existed for this combination, e.g.:

SELECT COUNT(*) FROM Records WHERE BillID = 12345
AND CLI = ‘02077778888’;

would return the correct number.

Solution:

The length of the varchar had not been specified. Although it parses correctly it doesn’t actually work without a length being specified. This simple change fixed things:

DECLARE @billid int, @cli varchar(14);