Hugo Kornelis' SQLBlog article "T-SQL User-Defined Functions: the good, the bad, and the ugly (part 1)" makes an excellent case for avoiding scalar User-Defined Functions (UDF).This article builds on his work to show how inline-table UDF's not only do not have the same penalties as scalar UDF's, they provide significant performance benefits.

There are a lot of factors that make the difference between a SQL process that takes an hour to run and one that takes minutes, or a report that returns instantly and one seemingly runs forever. Speed comes from a combination of well-planned and maintained indexes, file distributions that ameliorate the physical limitations of disk speed, and code that works with the engine, the indexes and the file structure instead of against it.

Kornelis wrote an excellent series of posts on why scalar functions should almost always be avoided. Even though scalar functions offer advantages in code readability and reusability, the performance penalties are too severe to justify their expense. I'm going to build on his work , and demonstrate why inline-table valued functions are not only still okay to use, but they're a great way to both make your code more readable, more reusable, and faster, all at once.

Let's start with Kornelis's table and function:

CREATE FUNCTION dbo.Triple(@Input INT)
RETURNS INT
AS
BEGIN
;
DECLARE @Result INT;
SET @Result = @Input * 3;
RETURN @Result;
END;
GO

CREATE TABLE dbo.LargeTable
(KeyVal INT NOT NULL PRIMARY KEY,
DataVal INT NOT NULL CHECK (DataVal BETWEEN 1 AND 10)
);


WITH Digits
AS (SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS d(d))
INSERT INTO dbo.LargeTable (KeyVal, DataVal)
SELECT 1000000 * sm.d
+ 100000 * ht.d + 10000 * tt.d + 1000 * st.d
+ 100 * h.d + 10 * t.d + s.d + 1,
10 * RAND(CHECKSUM(NEWID())) + 1
FROM Digits AS s, Digits AS t, Digits AS h,
Digits AS st, Digits AS tt, Digits AS ht,
Digits AS sm;
GO

CREATE INDEX NCL_LargeTable_DataVal ON dbo.LargeTable (DataVal);
GO

SET STATISTICS TIME ON;
SELECT MAX(dbo.Triple(DataVal)) AS MaxTriple
FROM dbo.LargeTable AS d;

SELECT MAX(3 * DataVal) AS MaxTriple
FROM dbo.LargeTable AS d;
SET STATISTICS TIME OFF;

This is pretty straightforward – we create a scalar function that triples a value. This is a more simplistic calculation than what normally prompts most developers to encapsulate their code in a function, but it's good for showing the kinds of performance differences between scalar functions and other ways of performing calculations.

Next, we create a table to hold values, in this case, an integer key and a second field that has values between 1 and 10. Then we select the maximum triple value from the table, once using the Triple function and once using an inline calculation. After experimentation with the FROM clause, we find if we first query for the distinct values in LargeTable, and then perform the function or calculation on only the ten rows that are returned, the execution times for function vs. calculation are indistinguishable. Kornelis ‘s article explains the estimated and actual query plans are extremely misleading in assessing the comparative performance of these statements and shows why it's important, when diagnosing performance issues, to not rely solely on the plan, but to also look at the execution statistics. In five test executions, comparing using the function vs. not using the function, and making the calculation over a subquery of distinct values vs. over the entire LargeTable dataset, my results were clear on what offered the best performance:

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET NOCOUNT ON;
SET STATISTICS TIME ON;
SELECT MAX(dbo.Triple(DataVal)) AS MaxTriple
FROM dbo.LargeTable AS d;

SELECT MAX(dbo.Triple(DataVal)) AS MaxTriple
FROM (SELECT DISTINCT DataVal FROM dbo.LargeTable) AS d;

SELECT MAX(3 * DataVal) AS MaxTriple
FROM (SELECT DISTINCT DataVal FROM dbo.LargeTable) AS d;

SELECT MAX(3 * DataVal) AS MaxTriple
FROM dbo.LargeTable AS d;
SET STATISTICS TIME OFF;

T-SQL Syntax

Avg CPU Time in ms

Avg Elapsed Time in ms

Function, no distinct subquery 12925.0 14247.8
Function, subquery 853.0 853.8
Inline calculation, subquery 853.2 850.4
Inline calculation, no distinct subquery 0.0 0.0

We can see that running that scalar function on the whole dataset is going to be a very expensive choice, and when we see the way that the numbers on Inline vs. Function are almost identical when we perform them on a single value, we get a pretty good idea why: SQL handles a scalar function like a RBAR, processing each row individually. Numbers like this are why, when I'm asked to take a look at how to speed something up, one of the first things I check for – and replace, if I find them – are scalar functions. What I often replace them with most often though are not inline calculations, they are inline table-valued functions.

CREATE FUNCTION dbo.Triple_tbl (@DataVal INT)
RETURNS TABLE
AS
RETURN
SELECT
@DataVal * 3 Triple
GO

This looks very much like the scalar function we created in step one, but the way SQL calls the function is fundamentally different. With scalar functions, SQL is calling the function once for each row in the result set, before aggregating. With a table function, we don't have this problem. Taking my fastest choices for the scalar udf, and inline calculations respectively, then comparing them to the table-value function, we can see that the table-value function provides us with encapsulation and reusability without sacrificing performance:

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET NOCOUNT ON;
SET STATISTICS TIME ON;

SELECT MAX(dbo.Triple(DataVal)) AS MaxTriple
FROM (SELECT DISTINCT DataVal FROM dbo.LargeTable) AS d;

SELECT MAX(3 * DataVal) AS MaxTriple
FROM dbo.LargeTable AS d;

SELECT MAX(t.Triple) AS MaxTriple
FROM dbo.LargeTable l
CROSS APPLY dbo.Triple_tbl(l.DataVal) t

SET STATISTICS TIME OFF;
GO

In my environment, the table-valued function consistently returns 0ms CPU time, 0ms elapsed time, just like I get for the inline calculation. Not surprisingly, when I check the query plan, I get a good idea of why:

Even though the syntax in the code for the scalar function looks very similar to the syntax for an inline calculation, under the hood, they process very differently. The syntax for the table-valued function reads very differently from the inline calculation, but under the hood, SQL is applying the algorithms encapsulated in the function to the data set in the same way as it would if the calculation was encoded as an inline calculation.

So don't be afraid to take advantage of SQL Table-valued UDFs. They're a great way to get the consistency, readability and reusability that made you reach for a function, but without the performance cost of scalar functions.