Introduction
The
Regular Expressions feature is available in MS
SQL Server 2005/2008. You can use all .NET Framework
Regular Expression stuff via MS
SQL Server CLR integration.
This article describes how to create and use extensions for the
LIKE
(Transact-
SQL) clause that supports
Regular Expressions. Just for the demo, we also have created a text parser that extracts tokens from a text by a given
Regular Expression pattern. Also, there is an overview of the namespaces and libraries required to compile database objects using Microsoft
SQL Server integration with the aid of the .NET Framework Common Language Runtime (CLR).
Background
The stated material could be helpful if you know T-
SQL and C#. In that case, you can make use of the extensive library functionality. If you have mastered T-
SQL only (without C#), you may just use the
RegExpLike
function instead of the standard
LIKE
clause in places where
Regular Expressions functionality is needed.
Performance
Refer to the MSDN site articles to be clear on the performance side:
Developers should view the CLR as an efficient alternative for logic
that cannot be expressed declaratively in the query language. Regular Expressions, for instance.
Microsoft recommends to use CLR Integration in scenarios where
CLR-based programming can complement the expressive power of the T-SQL
query language. Such as a need for embedding procedural logic inside a
query that can be called as a function (LIKE-function). This includes
situations such as:
- Performing complex calculations (that have to be expressed using
procedural logic) on a per-row basis over values stored in database
tables. This can involve sending the results of these calculations to
the client, or using the calculations to filter the set of rows that are
sent to the client.
- Using procedural logic to evaluate tabular results that are then queried in the
FROM
clause of a SELECT
or DML statement.
SQL Server 2000 introduced T-SQL functions (both scalar and table-valued) that enable these scenarios. With SQL
Server 2005, these functions can be more easily written using CLR
languages, since developers can take advantage of the much more
extensive libraries in the .NET Framework. In addition, CLR programming
languages provide rich data structures (such as arrays, lists, etc.)
that are lacking in T-SQL, and can perform significantly better due to the different execution models of the CLR and T-SQL.
Functions are, in general, good candidates to be written using the
CLR, since there is seldom a need to access the database from within a
function: values from the database are usually passed as arguments. This
then plays to the strength of the CLR, which is better at computational tasks than T-SQL.
Using the Code
Part 1. Extension of LIKE clause
At the beginning, you have to allow MS
SQL
Server to use CLR Integration, i.e., to make possible the usage of .NET
assemblies and methods from them (by default, this possibility is
disabled). To do this, use the following script:
Collapse |
sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO
If you want to revert to the default state, run this script:
Collapse |
sp_configure 'clr enabled', 0
GO
RECONFIGURE
GO
From here, we create an assembly that is a wrapper for the
Regular Expression .NET classes. To create the user defined function for MS
SQL Server in C#/.NET, you just create a library project, you create a class, and you add
public static
methods that will be
SQL functions in future. And,
SqlFunctionAttribute
must forestall each from these methods. It is used to mark the method definition of a user-defined aggregate as a function in
SQL Server. For our
RegularExpressionLike
method, we have got a method as shown:
Collapse |
public partial class SqlRegularExpressions
{
[SqlFunction]
public static bool Like(string text, string pattern)
{
Match match = Regex.Match(text, pattern);
return (match.Value != String.Empty);
}
}
The next step is assembly building. From now, you have to deploy a given assembly to MS
SQL Server. To do this, run the next script (but you have to indicate the path to the assembly, for your machine):
Collapse |
CREATE ASSEMBLY
SqlRegularExpressions
from 'd:\Projects\SqlRegularExpressions\
SqlRegularExpressions\bin\Release\SqlRegularExpressions.dll'
WITH PERMISSION_SET = SAFE
Bingo! Your assembly is registered, and from now on, we may use its functionality. That is exactly what we plan do.
By the way, to revert this action, you can run the script as follows:
Collapse |
drop assembly
SqlRegularExpressions
To bind the assembly method with a
SQL function, you have to run the script as shown:
Collapse |
CREATE FUNCTION RegExpLike(@Text nvarchar(max), @Pattern nvarchar(255)) RETURNS BIT
AS EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.[Like]
And that is all. Now, you may use the
RegExpLike
function to check for the string matching pattern with the
Regular Expression:
Collapse |
select * from titles
where 1 = dbo.RegExpLike(title, '\b(A\S+)')
The result of the above script running is the following:
This is the function removing the script:
Collapse |
DROP FUNCTION RegExpLike
Part 2. Text Parsing
The next point is extraction of strings from a given text by using a
Regular Expression pattern.
This is the task of getting a table, i.e., the result of function
execution is a table with some data. Let it be the start index in the
text, the length, and the value of the result string. Altogether, there
are three columns in the result table. The CLR represents a streaming
model for table-valued functions which ensures that results can be
consumed immediately after the first row is available, instead of
waiting for the entire table to be populated. It gives a lot of
advantages for the end user and for performance, and it, of course,
causes additional complexity in the implementation. User-defined
table-valued functions require the implementation of two
public static
methods: one is the master method that is called by MS
SQL Server and returns an objects enumeration (
IEnumerable
),
and the other is the secondary method that is called by the first to
fill the table rows. All will be clear if we look at the code below:
Collapse |
public partial class SqlRegularExpressions
{
[SqlFunction(FillRowMethodName="FillMatch")]
public static IEnumerable GetMatches(string text, string pattern)
{
return Regex.Matches(text, pattern);
}
public static void FillMatch(object obj, out int index,
out int length, out SqlChars value)
{
Match match = (Match)obj;
index = match.Index;
length = match.Length;
value = new SqlChars(match.Value);
}
}
GetMatches
returns all the matched elements as a
MatchCollection
object, and
FillMatch
is called for each object (
Match
)
obtained before to determine the table row fields based on the data
from it. We can see it from the list of parameters: first is the object
reference and the rest is the variables marked by the
out
attribute. This 'rest' determines the possible columns' nature.
Be accurate! You have to indicate the Fill method name in the
SqlFunction.FillRowMethodName
property; on the other hand, it allows some flexibility.
Now, we build the assembly, register it in MS
SQL Server again because its strong name was changed by the building, and create a target table-valued function:
Collapse |
CREATE FUNCTION
RegExpMatches(@text nvarchar(max), @pattern nvarchar(255))
RETURNS TABLE
([Index] int, [Length] int, [Value] nvarchar(255))
AS
EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.GetMatches
GO
And now, we can extract the strings from some text by a certain
pattern. For instance, let's get all the words from the text that starts
with a lower-case 'a':
Collapse |
DECLARE @Text nvarchar(max);
DECLARE @Pattern nvarchar(255);
SET @Text =
'This is comprehensive compendium provides a broad and thorough investigation of all '
+ 'aspects of programming with ASP.Net. Entirely revised and updated for the 2.0 '
+ 'Release of .Net, this book will give you the information you need to master ASP.Net '
+ 'and build a dynamic, successful, enterprise Web application.';
SET @Pattern = '\b(a\S+)';
select * from dbo.RegExpMatches(@Text, @Pattern)
GO
and you get the result below:
Please give attention to that fact that the script does
not
return any 'ASP.NET' token because they start with upper-case 'A'. If
you want to ignore case when matching occurs, you have to have a new
function, or just add one more parameter to the
Regex.Matches()
method as shown:
Collapse |
[SqlFunction(FillRowMethodName="FillMatch")]
public static IEnumerable GetMatches(string text, string pattern)
{
return Regex.Matches(text, pattern, RegexOptions.IgnoreCase);
}
And now,
RegExpMatches
returns all 'a'-starting words including 'ASP.NET'.
This removes the script for the
RegExpMatches
function:
Collapse |
DROP FUNCTION RegExpMatches
SQL Regular Expressions, Version 2
The SqlServerProject download
consists of a '
SQL
Server Project' type solution instead of a previously created 'Class
Library' type solution. It is a more natural type for developing objects
for a database. There are a couple of files in the project. We use them
because it was boring to do monotonous actions to drop/create
assembly/function during the development in the 'Class Library' type
project. The '
SQL Server Project'
type assumes all these actions without any human participation. Also,
its IDE consists of special menu items to perform database objects
tasks. But, you can enjoy it if you only have VS Pro/Team.
Easy deploying. We have to do three steps to deploy
Regular Expressions to our
SQL Server:
- Setup the project to reference the target database.
- Build the project.
- Deploy the assembly that supports Regular Expressions in SQL Server to our database engine.
And now, step by step:
To setup a project to reference a database where it would be good to use
Regular Expressions power, we do this (see image below):
- We right click on the project in the Solution Explorer window.
- We choose the 'Properties' menu item. The assembly properties window will be expanded.
- On the left side, we choose a bookmark with a 'Database' title.
- We click on the 'Browse...' button. The 'Add database reference' window will appear.
- Next, we choose an existing database reference from the list or add
new. To do this, we click on the 'Add New Reference...' button. The 'New
Database Reference' dialog will be shown.
- We choose a server name and a database name from the respective dropdown lists.
- Then, we click 'OK' - and the 'Add New Reference' dialog will be closed.
- We click 'OK' again, the 'Add Database Reference' dialog will be closed and a new target for the assembly deploying is set.
Now it is time for the final actions:
- Build the project by selecting 'Build SqlRegularExpressions' from the 'Build' menu (we may skip this step if changes are absent, because the assembly is already compiled).
- Select 'Deploy SqlRegularExpressions' from the same 'Build' menu.
This is the result we get:
Collapse |
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========
In case you have not got Visual Studio IDE, you may install the
Regular Expressions functionality as described in the 'Use of Code' section for the
RegExpLike
function. The assembly DLL is located in the '
Bin' folder of the
SqlServerProject package. One more thing, this is the script for the
RegexMatch
function:
Collapse |
CREATE FUNCTION RegexMatch(@Text nvarchar(max),
@Pattern nvarchar(255), @Options INT) RETURNS BIT
AS EXTERNAL NAME SqlRegularExpressions.UserDefinedFunctions.RegexMatch
Documentation for Assembly Content
RegexMatch, RegexStrMatch
Searches the specified input string for the first occurrence of the
Regular Expression specified in the Regex constructor, and returns true on success (
RegexMatch
), or the matched string (
RegexStrMatch
).
Syntax:
Collapse |
RegexMatch(<input>, <pattern>, <options>) RETURNS BIT
RegexStrMatch(<input>, <pattern>, <options>)
RETURNS nvarchar(max)
Parameters
- input The string to be tested for a match.
- pattern The Regular Expression pattern to match.
- options A bitwise OR combination of the enumeration values. See certain values for this parameter in the table below.
Return value
RegexMatch
: True if inputted string matches the pattern, false otherwise.
RegexStrMatch
: A Regular Expression matched string, or null
if one does not exist.
Remarks
Values for the
options
parameter:
Options |
C# RegexOptions enum member analogue |
Description |
0 -or- null |
None |
Specifies that no options are set. |
1 |
IgnoreCase |
Specifies case-insensitive matching. |
2 |
Multiline |
Multiline mode. Changes the meaning of ^ and $ so they match at the
beginning and end, respectively, of any line, and not just the beginning
and end of the entire string. |
4 |
ExplicitCapture |
Specifies that the only valid captures are explicitly named or
numbered groups of the form (?<name>...). This allows unnamed
parentheses to act as non-capturing groups without the syntactic
clumsiness of the expression (?:...). |
8 |
Compiled |
Specifies that the Regular Expression
is compiled to an assembly. This yields faster execution, but increases
the startup time. This value should not be assigned to the Options property when calling the CompileToAssembly method. |
16 |
Singleline |
Specifies the single-line mode. Changes the meaning of the dot (.)
so it matches every character (instead of every character except \n). |
32 |
IgnorePatternWhitespace |
Eliminates unescaped white space from the pattern, and enables comments marked with #. However, the IgnorePatternWhitespace value does not affect or eliminate white space in character classes. |
64 |
RightToLeft |
Specifies that the search will be from right to left instead of from left to right. |
256 |
ECMAScript |
Enables ECMAScript-compliant behavior for the expression. This value can be used only in conjunction with the IgnoreCase , Multiline , and Compiled values. The use of this value with any other value results in an exception. |
512 |
CultureInvariant |
Specifies that cultural differences in language are ignored. |
For more information, see the MSDN documentation for the
Regex.Match
method.
Examples
A. Following code gets all titles where the title consists of a word that starts with 'A' or 'a'.
Collapse |
select * from titles
where dbo.RegexMatch(title, '\b(A\S+)', 1) = 1
B. Sometimes it happens that you only need a certain part from a string. You can do this extraction using Transact-
SQL, but you can use
Regular Expressions
also. The following code demonstrates how to order author names ('pubs'
database) by last word in the surname, in case the surname consists of
more than one word, for instance, 'del Castillo', 'de Balzak', 'al
Mahdi', etc.
Collapse |
select au_id, (au_fname + ' ' + au_lname) au_name
from authors
order by
dbo.RegexStrMatch(au_lname, '(\w+)', 64)
The result is a sorted list of authors where 'Innes del Castillo' precedes 'Michel DeFrance'.
Source code
Collapse |
using System; using System.Data.SqlTypes; using System.Text.RegularExpressions; using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
[SqlFunction(Name="RegexMatch", IsDeterministic=true, IsPrecise=true)]
public static SqlBoolean RegexMatch(SqlString input,
SqlString pattern, SqlInt32 options)
{
if (input.IsNull)
{
return SqlBoolean.Null;
}
if (pattern.IsNull)
{
pattern = String.Empty;
}
if (options.IsNull)
{
options = 0; }
try
{
Match match = Regex.Match((string)input,
(string)pattern, (RegexOptions)(int)options);
if (match.Value != String.Empty)
{
return SqlBoolean.True;
}
}
catch
{
throw;
}
return SqlBoolean.False;
}
};
Could you please notify me if I have missed some functionality which you wish? Thank you!
Points of Interest
This article is copied from here to spread the knowledge