Don’t use stored procedures yet? Must be suffering from NIHS (Not Invented Here Syndrome)

At just about every talk I give I always try to make several consistent statements. One of which is: ‘Whenever possible use stored procedures to access your data’.


The following day I usually receive some email from people that attended the talk – it’s always great to hear from people and how they build their applications. Every once in a while I’ll get a comment such as: ‘I don’t use stored procedures for several reasons, one of which is that it ties me to a particular database’.  I just finished answering several such emails after our last couple of roadshows events and thought – great blog topic!


As soon as you put data in a database, whether its SQL Server, Oracle, or DB2, you just became tied to that particular database — surprised??? Sure, it’s just your data that is in there and you can use ANSI SQL for ‘generic/independent’ access to the data from your application. But claiming that ad-hoc SQL (SQL script embedded directly in your code) is the only way to accomplish database independence is simply incorrect. In fact there are many ways to efficiently solve this problem without creating such a brittle link between the application and the database.


One of the ways we solve this problem in ASP.NET “Whidbey” and the ASP.NET Forums (and now .Text and DotNetNuke too) is through the provider model pattern: a pluggable data access layer component that can easily be swapped out for other data access layer components. The difference is we do everything possible to take advantage of the database we’re running on. This means ensuring that all data access is through stored procedures and that the stored procedures return data to the business logic layer with no wasted cycles and no wasted data, i.e. sort the data in the database and only return records which will be used, no-more no-less.


There are several reasons for using stored procedures to access data, below I’ve attempted to capture the 3 that I always reference:


1. Ad-hoc SQL Script is brittle

You would never put business logic code in your presentation tier, right? Why put ad-hoc SQL in your business logic layer? Embedded SQL script is very brittle; small changes to the database can have severe impacts on the application. Whereas stored procedures afford abstraction between the data and the business logic layer. The data model can be dramatically changed, but the stored procedures can still return identical data.


2. Security

What permissions are required for working with your data? If embedded SQL is used it likely means that the application can execute any INSERT, UPDATE, DELETE, SELECT script it desires. You wouldn’t – hopefully you don’t – run as the administrator (sa) account on your server, so why allow full access to the database itself?


Stored procedures allow for better data protection by controlling how the data is accessed. By granting a database login EXECUTE permissions on stored procedures you can specify limited actions to the application. Additionally, stored procedures are a counter-measure to dangerous SQL Script injection attacks, a susceptibility that applications using embedded SQL are more vulnerable to.


3. Performance

A counter argument for performing data manipulation in a stored procedure, vs. simply retrieving the desired records in a SELECT statement, is it slows SQL Server down. That’s not exactly correct. In many cases you can get better performance by looping and filtering data in SQL Server than you could performing the same loops and filters in the Data Access Layer – databases are intrinsically designed to do this, while you and I have to write our own code (which do you think is going to be faster?). It is, however, important to understand how SQL Server uses indexes and clustered indexes. I’d recommend SQL Server 2000 Performance and Optimization Tuning to learn more about this and other SQL Server performance guidelines.


Beyond sorting and filtering data in the stored procedures you can also batch common work together or retrieve multiple sets of data. For example, retrieve some data, update a datetime stamp, and then insert a new record. If you were to execute these 3 tasks once a second as ad-hoc SQL this would result in 259,200/day independent database request vs. 86,400/day if all were encapsulated in a stored procedure. That’s 172,800 database connections and network IO usages that you no longer require! Consolidating work through stored procedures makes more effective use of a connection (and your system).


There are also internal performance benefits to SQL Server for using stored procedures vs. ad-hoc SQL script. When stored procedures are used SQL Server can cache the ‘execution plan’ that it uses to execute the SQL vs. having to recalculate the execution plan on each request – would you recompile your business logic class on each request? …probably not. While small details such as this appear to be trivial, when you start considering total requests/second that your SQL Server is serving, small performance improvements to repetitive code paths add up very quickly and make your server more efficient and scalable.


It’s also worth mentioning that the easiest way to get performance out of your database is to do everything you can to take advantage of the platform you are running on. For example, if I were to write an application that used a DB2 database, I would do everything I could to get the maximum performance out of that system vs. trying to write generic data access code. Why? Well the SQL Server team here at Microsoft has some of the best minds in the industry making sure SQL Server has phenomenal performance and scalability – it would be a shame not to take advantage of all that effort, right? We can assume that other database companies hire smart people for their products too — don’t get striken by N.I.H.S. (Not Invented Here S< /STRONG>yndrome) a dreaded&n
bsp;software developer disease where nothing is good unless you wrote every line of code!


Side-note: if you jumped to the conclusion that the point of all this is to advocate all business logic residing in stored procedures, you missed the point. There needs to be balance. My general rule of thumb is: return only the necessary data, no-more no less. The more efficient you can make your data access code by rapidly opening and closing the connection to the database the better; stored procedures provide you with an effective tool for accomplishing this. Performing business logic is frowned upon, i.e. transforming the data or performing calculations on the data. Use the power of the database to thresh the wheat from the chaff. Use your business logic to turn the wheat into dough 😉


To follow this discussion in whole:

Read: this, this, and this.


[listening to: John Mayer, Comfortable]


Edit: 11/18

A couple of people have very correctly pointed out two things:

1. I stated that SQL can ‘pre-compile’ the execution plan. I was using ‘pre-compile’ interchangably with ‘cache’ (the original statement said ‘pre-compile or cache’. SQL Server does not have the ability to pre-compile a stored procedure – I edited the article to remove the work ‘pre-compile’ as it can be misleading.

2. Stored procedures are not the only defense against SQL Script injection attacks. Parameterized ad-hoc SQL works just as well, although stated in the original post.


Edit 11/19

Added details about entire conversation