Introduction
One of the most repetitive, boring and thus error-prone tasks when accessing databases from C# (or other languages) is writing SQL commands. And when there is explicit SQL, no matter whether it's in stored procedures or inside your code, it has to be maintained. For instance, if a column is added to a database table, that column must be added to all four commands (INSERT, SELECT, UPDATE, DELETE - a.k.a. CRUD), and there is often more than one SELECT, each with different filters. There are certainly times when you must write some SQL, but it would be better if the machine could create as much of it as possible.
This article is about writing some code that can do that job. It was made for SQL Server and the ADO.NET SqlClient provider.
Update 15/1/2005: I've changed it to support OLE DB too. Wherever SqlCommandBuilder is referenced in the text, it also applies to OleDbCommandBuilder. You shouldn't have too much trouble customizing it for other databases and data providers.
What's already included in ADO.NET
ADO.NET comes with one solution for this problem: the SqlCommandBuilder class. You still have to provide the SELECT command, but INSERT, UPDATE and DELETE are generated for you. Behind the scenes, CommandBuilder queries the database to fetch the required metadata. CommandBuilder has lots of issues, discussed in William Vaughn's article "Weaning Developers from the CommandBuilder". In my opinion, the greatest problem is that if, for some reason, the code it generates doesn't fit your needs, there's no way to customize or fix it - you have to abandon it completely. For example, if you want to find out the value of an autoincrement column in SQL Server after an INSERT, you're out of luck. Another serious problem is that it requires a roundtrip to the database to fetch metadata, so even Microsoft doesn't recommend using it in production code.
Another possible solution is to use the Data Adapter Configuration Wizard (DACW), as suggested in Vaughn's article. It works similarly to SqlCommandBuilder, but at design time, so you can see the generated code and tweak it to your needs. The problem is that you have to make the same tweaks whenever you regenerate the SQL (e.g., to add retrieval of an autogenerated column to INSERT commands). Once you have tweaked it, you are in manual maintenance mode - the computer can't add new columns to these commands for you.
A word or two about Strongly Typed DataSets
When working with plain, untyped DataSets, there are lots of casts and lookups of columns by name. Neither of these can be checked at compile time. If you like it when the compiler finds as many bugs in your code as possible, you should be using Strongly Typed DataSets (typed DS). They are one of my favorite features of .NET. If something changes in the DB, you just update the typed DS, compile, and let the compiler find where your code doesn't work with the new version of the DB. Also, when you use a typed DS, you have IntelliSense (which is hard to live without). Although it affects performance, the benefits usually compensate for the difference. If your application can afford to use typed datasets, you'll see here how they can reduce the amount of SQL that has to be written manually.
A typed DS is created from the database in a way similar to how CommandBuilder works - the database is queried for metadata. The main difference is that this happens at design time instead of at run time. This means that at run time there is already enough metadata inside the typed DS to construct commands - you can find out the names of tables, the list of columns, their types, which columns are primary keys, etc. If SqlCommandBuilder knew about them, it wouldn't have to make a roundtrip to the database. Also, SqlCommandBuilder has no way of knowing which columns you really need: if a typed DS represents only a subset of a database table, only the columns represented in that DS should be included in the commands.
CustomCommandBuilder
The code presented here works similarly to SqlCommandBuilder, but under our control, so we can make custom changes. For the SELECT command, we can add filtering or ordering and still have auto-generation of the column list. For INSERT, we can add support for autoincrement fields. For the UPDATE command, SqlCommandBuilder uses "optimistic locking", i.e., only rows that have not changed since we fetched them are written to the DB. We can choose other kinds of concurrency control, e.g., using timestamps, or no concurrency control at all.
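As a sketch of what that choice means for the generated SQL, the UPDATE's WHERE clause might be built like this (the method and parameter naming are illustrative, not the article's actual implementation):

```csharp
using System;
using System.Data;
using System.Text;

// Illustrative sketch: build the WHERE clause of a generated UPDATE.
// With optimistic locking, every non-key column is compared to its
// original value; without it, only the primary key is matched
// ("last writer wins").
static string BuildWhereClause( DataTable table, bool optimisticLocking )
{
    StringBuilder where = new StringBuilder();
    foreach( DataColumn key in table.PrimaryKey )
    {
        if( where.Length > 0 ) where.Append( " AND " );
        where.AppendFormat( "[{0}]=@Original_{0}", key.ColumnName );
    }
    if( optimisticLocking )
    {
        foreach( DataColumn column in table.Columns )
        {
            if( Array.IndexOf( table.PrimaryKey, column ) >= 0 ) continue;
            where.AppendFormat( " AND [{0}]=@Original_{0}", column.ColumnName );
        }
    }
    return where.ToString();
}
```

A real implementation also has to handle NULL comparisons, which is one reason generated optimistic-locking SQL gets verbose; a timestamp strategy sidesteps that by comparing a single rowversion column instead.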
All relevant classes are in the Cogin.Data and Cogin.Data.DataAccess namespaces. The class CustomCommandBuilder is the one that generates SQL. It's used by the SqlDataAccess class, which contains the low-level ADO.NET code. These two classes are not application specific and can be reused in different projects. There are two similar classes for OLE DB: OleDbCustomCommandBuilder and OleDbDataAccess. The DataAccess classes are not used directly, but through the IDataAccess interface, so that application-specific classes don't have to be changed when we switch from the SqlClient to the OleDb data provider. They just have to use DataAccessFactory to get the appropriate version; the factory checks the config file to see which data provider should be used.
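Such a factory can be as simple as the following sketch (the "dataProvider" appSettings key and the constructor signatures are assumptions, not necessarily what the article's code uses):

```csharp
using System.Configuration;
using System.Data;

// Sketch of a provider-selecting factory. The appSettings key name
// and the DataAccess constructor signatures are assumed for illustration.
public class DataAccessFactory
{
    public static IDataAccess Create( DataSet dataSet )
    {
        string provider = ConfigurationSettings.AppSettings[ "dataProvider" ];
        if( provider == "OleDb" )
            return new OleDbDataAccess( dataSet );
        return new SqlDataAccess( dataSet );
    }
}
```

Because callers only see IDataAccess, switching providers is a config change rather than a code change.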
Examples of data access classes are CustomersDataAccess and OrdersViewDataAccess. These classes handle all CRUD operations for the sample application, and yet there is no SQL in any of them! That was our goal. The methods you have to write are as simple as:
public void FillOrdersForCustomer( string customerId )
{
    dataAccess.SelectWithFilter( dataSet.Orders, "CustomerID=@customerId",
        new GenericSqlParameter( "@customerId", customerId ) );
}

public void UpdateCustomers()
{
    dataAccess.FullUpdate( dataSet.Customers );
}
A sample WinForms application is included. It uses the Northwind database, which you should already have on SQL Server. It displays customers and their orders (filtered and loaded only when needed) and allows updates, inserts and deletes. Unfortunately, Northwind doesn't have cascading updates/deletes, so you can only successfully play with rows you created yourself. Since that wasn't the point of this article, I didn't bother writing code to cascade changes to all affected tables, e.g., to Order Details.
CustomCommandBuilder can be used even if you want to use stored procedures instead of plain SQL. It can create calls to stored procedures, and generate the generic procedures themselves as well. Creating these autogenerated stored procs in the DB can be done manually or automatically on application start.
Performance considerations
When working with a DB, the bottleneck is usually the database itself. However, I was still interested to find out how much of a performance hit CustomCommandBuilder introduces. In order to minimize DB overhead, the test database was on the same machine as the test program (no network latency). The fastest way is to use stored procedures, so the alternatives are compared to that approach. All test cases use a typed DS. I didn't try to measure a typed DS versus a DataReader or some other approach. I tried to answer the following questions with this test:
- Is it slow if SQL commands are generated every time? That's what CustomCommandBuilder does, and if command generation is too slow, some caching mechanism should be used.
- Is it slow to recreate the data access and typed DS classes for each request?
- Does it matter whether filters are made using SqlParameters or plain text?
The last question is important because you can make filters by concatenating strings, e.g.:
SelectWithFilter( dataSet.Orders, "CustomerId='" + customerId + "'" );
This is slow not because of the string concatenation itself, but because SQL Server always receives a different command text and must compile it (parse it and determine an execution plan) every time. A better way to make parameterized filters is:
SelectWithFilter( dataSet.Orders, "CustomerId=@customerId",
    new GenericSqlParameter( "@customerId", customerId ) );
SQL Server can cache this kind of command, because the parameters are transferred separately from the command string. As you'll see from the benchmark results, this is almost as fast as using stored procedures, but much more flexible. Using SqlParameter is not only faster, but safer too, because you don't have to worry whether a parameter contains some illegal character, like the ' sign. GenericSqlParameter here is just a Facade that allows the same code to work with both SqlClient and OleDb.
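Such a facade only needs to hold the name and value and produce the provider-specific parameter on demand; a minimal sketch (the real GenericSqlParameter class may have a richer interface than this):

```csharp
using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

// Minimal sketch of a provider-neutral parameter facade.
// The actual class in the article's download may differ.
public class GenericSqlParameter
{
    private readonly string name;
    private readonly object value;

    public GenericSqlParameter( string name, object value )
    {
        this.name = name;
        this.value = value;
    }

    // Each DataAccess implementation asks for its own flavor.
    public IDataParameter ToSqlParameter()   { return new SqlParameter( name, value ); }
    public IDataParameter ToOleDbParameter() { return new OleDbParameter( name, value ); }
}
```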
I created a small benchmark, also using the Northwind database, that fetches a random row from the Orders table 1000 times. The stored procedure CustOrdersOrders (which already exists in Northwind) is compared to selecting with a generated SQL command, regenerated for each request. Variants with and without reusing the data access and typed DS classes are measured. Test configuration: Athlon XP 1800+, 512 MB RAM, 7200 rpm HDD, Win XP SP1, SQL Server 2000. The database runs on the same machine. All tests were performed three times. Results:

This example is very simplified and limited, but it gives some hints. You should perform tests on your own setup with a typical real-life load to get relevant data for your case.
As you can see, the differences are rather small; the worst case is only 25% slower than the best case. I consider 25% a small difference because in the real world the database is on a separate server and queries are much more disk-intensive (during this benchmark, disk activity was practically nonexistent because the entire Orders table was soon cached in system memory), so the differences should be even smaller. Notice that the "plain text filters" variant has the largest deviation. That's because it is very dependent on the "randomness" of the requested CustomerID, i.e., whenever the same filter condition is repeated, the previously compiled SQL command is reused. This also means that this kind of filtering thrashes SQL Server's cache of compiled commands.
If this performance difference is significant in your case, or you cannot afford the additional memory overhead of a typed DS, you can go the DataReader/stored proc way, but that requires much more work and is much harder to maintain. The best advice is to measure before optimizing anything.
If performance is an issue, I would rather focus on the way typed datasets are usually used - as a set of several data tables that map directly to database tables, i.e., without using JOINs in queries. Each DataTable object is filled/updated with separate SQL commands, and each one means a roundtrip to the server. This can be improved by batching several commands together - multiple resultsets will then be received in a single roundtrip.
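With SqlClient, batching the fills is just a matter of sending several statements in one command text and mapping the returned resultsets to tables. A fragment (connection and dataSet are assumed to exist in the surrounding code):

```csharp
using System.Data;
using System.Data.SqlClient;

// Two SELECTs in one command text: both resultsets arrive
// in a single roundtrip to the server.
SqlDataAdapter adapter = new SqlDataAdapter(
    "SELECT * FROM Customers; SELECT * FROM Orders", connection );

// Fill names the incoming resultsets "Table", "Table1", ...;
// TableMappings redirects them to the tables of the typed DS.
adapter.TableMappings.Add( "Table",  "Customers" );
adapter.TableMappings.Add( "Table1", "Orders" );
adapter.Fill( dataSet );
```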
Conclusion
CustomCommandBuilder frees you from writing generic SQL code, yet still lets you choose how that code is generated. The performance hit is usually minimal in real life. The only requirement is that you use typed datasets, which I would recommend anyway if you can trade a small performance hit for increased productivity. If you're using SQL Server, everything is already provided; and contrary to ADO.NET's SqlCommandBuilder, you can adapt it for other servers.