Introduction
The Regular Expressions feature is available in MS SQL Server 2005/2008. You can use all .NET Framework Regular Expression stuff via MS SQL Server CLR integration.This article describes how to create and use extensions for the
LIKE
(Transact-SQL) clause that supports Regular Expressions. Just for the demo, we also have created a text parser that extracts tokens from a text by a given Regular Expression pattern. Also, there is an overview of the namespaces and libraries required to compile database objects using Microsoft SQL Server integration with the aid of the .NET Framework Common Language Runtime (CLR).Background
The stated material could be helpful if you know T-SQL and C#. In that case, you can make use of the extensive library functionality. If you have mastered T-SQL only (without C#), you may just use theRegExpLike
function instead of the standard LIKE
clause in places where Regular Expressions functionality is needed.Performance
Refer to the MSDN site articles to be clear on the performance side:Developers should view the CLR as an efficient alternative for logic that cannot be expressed declaratively in the query language. Regular Expressions, for instance.
Microsoft recommends to use CLR Integration in scenarios where CLR-based programming can complement the expressive power of the T-SQL query language. Such as a need for embedding procedural logic inside a query that can be called as a function (LIKE-function). This includes situations such as:
SQL Server 2000 introduced T-SQL functions (both scalar and table-valued) that enable these scenarios. With SQL Server 2005, these functions can be more easily written using CLR languages, since developers can take advantage of the much more extensive libraries in the .NET Framework. In addition, CLR programming languages provide rich data structures (such as arrays, lists, etc.) that are lacking in T-SQL, and can perform significantly better due to the different execution models of the CLR and T-SQL.
- Performing complex calculations (that have to be expressed using procedural logic) on a per-row basis over values stored in database tables. This can involve sending the results of these calculations to the client, or using the calculations to filter the set of rows that are sent to the client.
- Using procedural logic to evaluate tabular results that are then queried in the
FROM
clause of aSELECT
or DML statement.
Functions are, in general, good candidates to be written using the CLR, since there is seldom a need to access the database from within a function: values from the database are usually passed as arguments. This then plays to the strength of the CLR, which is better at computational tasks than T-SQL.
Using the Code
Part 1. Extension of LIKE clause
At the beginning, you have to allow MS SQL Server to use CLR Integration, i.e., to make possible the usage of .NET assemblies and methods from them (by default, this possibility is disabled). To do this, use the following script:sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO
If you want to revert to the default state, run this script:sp_configure 'clr enabled', 0
GO
RECONFIGURE
GO
From here, we create an assembly that is a wrapper for the Regular Expression .NET classes. To create the user defined function for MS SQL Server in C#/.NET, you just create a library project, you create a class, and you add public static
methods that will be SQL functions in future. And, SqlFunctionAttribute
must forestall each from these methods. It is used to mark the method definition of a user-defined aggregate as a function in SQL Server. For our RegularExpressionLike
method, we have got a method as shown:/// <summary>
/// Class that allows to support regular expressions
/// in MS SQL Server 2005/2008
/// </summary>
public partial class SqlRegularExpressions
{
/// <summary>
/// Checks string on match to regular expression
/// </summary>
/// <param name="text">string to check</param>
/// <param name="pattern">regular expression</param>
/// <returns>true - text consists match one at least,
/// false - no matches</returns>
[SqlFunction]
public static bool Like(string text, string pattern)
{
Match match = Regex.Match(text, pattern);
return (match.Value != String.Empty);
}
//...
}
The next step is assembly building. From now, you have to deploy a given assembly to MS SQL Server. To do this, run the next script (but you have to indicate the path to the assembly, for your machine):
Collapse |
CREATE ASSEMBLY
--assembly name for references from SQL script
SqlRegularExpressions
-- assembly name and full path to assembly dll,
-- SqlRegularExpressions in this case
from 'd:\Projects\SqlRegularExpressions\
SqlRegularExpressions\bin\Release\SqlRegularExpressions.dll'
WITH PERMISSION_SET = SAFE
Bingo! Your assembly is registered, and from now on, we may use its functionality. That is exactly what we plan do.By the way, to revert this action, you can run the script as follows:
Collapse |
drop assembly
--assembly name for references from SQL script
SqlRegularExpressions
To bind the assembly method with a SQL function, you have to run the script as shown:
Collapse |
--function signature
CREATE FUNCTION RegExpLike(@Text nvarchar(max), @Pattern nvarchar(255)) RETURNS BIT
--function external name
AS EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.[Like]
And that is all. Now, you may use the RegExpLike
function to check for the string matching pattern with the Regular Expression:
Collapse |
-- get all titles where title consists word that starts by 'A'
select * from titles
where 1 = dbo.RegExpLike(title, '\b(A\S+)')
The result of the above script running is the following:This is the function removing the script:
Collapse |
DROP FUNCTION RegExpLike
Part 2. Text Parsing
The next point is extraction of strings from a given text by using a Regular Expression pattern.This is the task of getting a table, i.e., the result of function execution is a table with some data. Let it be the start index in the text, the length, and the value of the result string. Altogether, there are three columns in the result table. The CLR represents a streaming model for table-valued functions which ensures that results can be consumed immediately after the first row is available, instead of waiting for the entire table to be populated. It gives a lot of advantages for the end user and for performance, and it, of course, causes additional complexity in the implementation. User-defined table-valued functions require the implementation of two
public static
methods: one is the master method that is called by MS SQL Server and returns an objects enumeration (IEnumerable
),
and the other is the secondary method that is called by the first to
fill the table rows. All will be clear if we look at the code below:
Collapse |
/// <summary>
/// Class that allows to support regular expressions
/// in MS SQL Server 2005/2008
/// </summary>
public partial class SqlRegularExpressions
{
// this is place of Like() method
/// <summary>
/// Gets matches from text using pattern
/// </summary>
/// <param name="text">text to parse</param>
/// <param name="pattern">regular expression pattern</param>
/// <returns>MatchCollection</returns>
[SqlFunction(FillRowMethodName="FillMatch")]
public static IEnumerable GetMatches(string text, string pattern)
{
return Regex.Matches(text, pattern);
}
/// <summary>
/// Parses match-object and returns its parameters
/// </summary>
/// <param name="obj">Match-object</param>
/// <param name="index">TThe zero-based starting
/// position in the original string where the captured
/// substring was found</param>
/// <param name="length">The length of the captured substring.</param>
/// <param name="value">The actual substring
/// that was captured by the match.</param>
public static void FillMatch(object obj, out int index,
out int length, out SqlChars value)
{
Match match = (Match)obj;
index = match.Index;
length = match.Length;
value = new SqlChars(match.Value);
}
}
GetMatches
returns all the matched elements as a MatchCollection
object, and FillMatch
is called for each object (Match
)
obtained before to determine the table row fields based on the data
from it. We can see it from the list of parameters: first is the object
reference and the rest is the variables marked by the out
attribute. This 'rest' determines the possible columns' nature.Be accurate! You have to indicate the Fill method name in the
SqlFunction.FillRowMethodName
property; on the other hand, it allows some flexibility.Now, we build the assembly, register it in MS SQL Server again because its strong name was changed by the building, and create a target table-valued function:
Collapse |
CREATE FUNCTION
--function signature
RegExpMatches(@text nvarchar(max), @pattern nvarchar(255))
RETURNS TABLE
([Index] int, [Length] int, [Value] nvarchar(255))
AS
--external name
EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.GetMatches
GO
And now, we can extract the strings from some text by a certain
pattern. For instance, let's get all the words from the text that starts
with a lower-case 'a':
Collapse |
-- RegExpMatches sample
DECLARE @Text nvarchar(max);
DECLARE @Pattern nvarchar(255);
SET @Text =
'This is comprehensive compendium provides a broad and thorough investigation of all '
+ 'aspects of programming with ASP.Net. Entirely revised and updated for the 2.0 '
+ 'Release of .Net, this book will give you the information you need to master ASP.Net '
+ 'and build a dynamic, successful, enterprise Web application.';
SET @Pattern = '\b(a\S+)'; --get all words that start from 'a'
select * from dbo.RegExpMatches(@Text, @Pattern)
GO
and you get the result below:Please give attention to that fact that the script does not return any 'ASP.NET' token because they start with upper-case 'A'. If you want to ignore case when matching occurs, you have to have a new function, or just add one more parameter to the
Regex.Matches()
method as shown:
Collapse |
[SqlFunction(FillRowMethodName="FillMatch")]
public static IEnumerable GetMatches(string text, string pattern)
{
return Regex.Matches(text, pattern, RegexOptions.IgnoreCase);
}
And now, RegExpMatches
returns all 'a'-starting words including 'ASP.NET'.This removes the script for the
RegExpMatches
function:
Collapse |
DROP FUNCTION RegExpMatches
SQL Regular Expressions, Version 2
The SqlServerProject download consists of a 'SQL Server Project' type solution instead of a previously created 'Class Library' type solution. It is a more natural type for developing objects for a database. There are a couple of files in the project. We use them because it was boring to do monotonous actions to drop/create assembly/function during the development in the 'Class Library' type project. The 'SQL Server Project' type assumes all these actions without any human participation. Also, its IDE consists of special menu items to perform database objects tasks. But, you can enjoy it if you only have VS Pro/Team.Easy deploying. We have to do three steps to deploy Regular Expressions to our SQL Server:
- Setup the project to reference the target database.
- Build the project.
- Deploy the assembly that supports Regular Expressions in SQL Server to our database engine.
To setup a project to reference a database where it would be good to use Regular Expressions power, we do this (see image below):
- We right click on the project in the Solution Explorer window.
- We choose the 'Properties' menu item. The assembly properties window will be expanded.
- On the left side, we choose a bookmark with a 'Database' title.
- We click on the 'Browse...' button. The 'Add database reference' window will appear.
- Next, we choose an existing database reference from the list or add new. To do this, we click on the 'Add New Reference...' button. The 'New Database Reference' dialog will be shown.
- We choose a server name and a database name from the respective dropdown lists.
- Then, we click 'OK' - and the 'Add New Reference' dialog will be closed.
- We click 'OK' again, the 'Add Database Reference' dialog will be closed and a new target for the assembly deploying is set.
- Build the project by selecting 'Build SqlRegularExpressions' from the 'Build' menu (we may skip this step if changes are absent, because the assembly is already compiled).
- Select 'Deploy SqlRegularExpressions' from the same 'Build' menu.
Collapse |
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========
In case you have not got Visual Studio IDE, you may install the Regular Expressions functionality as described in the 'Use of Code' section for the RegExpLike
function. The assembly DLL is located in the 'Bin' folder of the SqlServerProject package. One more thing, this is the script for the RegexMatch
function:
Collapse |
--function signature
CREATE FUNCTION RegexMatch(@Text nvarchar(max),
@Pattern nvarchar(255), @Options INT) RETURNS BIT
--function external name
AS EXTERNAL NAME SqlRegularExpressions.UserDefinedFunctions.RegexMatch
Documentation for Assembly Content
RegexMatch, RegexStrMatch
Searches the specified input string for the first occurrence of the Regular Expression specified in the Regex constructor, and returns true on success (RegexMatch
), or the matched string (RegexStrMatch
).Syntax:
Collapse |
RegexMatch(<input>, <pattern>, <options>) RETURNS BIT
RegexStrMatch(<input>, <pattern>, <options>)
RETURNS nvarchar(max)
Parameters
- input The string to be tested for a match.
- pattern The Regular Expression pattern to match.
- options A bitwise OR combination of the enumeration values. See certain values for this parameter in the table below.
Return value
RegexMatch
: True if inputted string matches the pattern, false otherwise.RegexStrMatch
: A Regular Expression matched string, ornull
if one does not exist.
Remarks
Values for theoptions
parameter:Options | C# RegexOptions enum member analogue | Description |
---|---|---|
0 -or- null |
None |
Specifies that no options are set. |
1 | IgnoreCase |
Specifies case-insensitive matching. |
2 | Multiline |
Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string. |
4 | ExplicitCapture |
Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...). This allows unnamed parentheses to act as non-capturing groups without the syntactic clumsiness of the expression (?:...). |
8 | Compiled |
Specifies that the Regular Expression
is compiled to an assembly. This yields faster execution, but increases
the startup time. This value should not be assigned to the Options property when calling the CompileToAssembly method. |
16 | Singleline |
Specifies the single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). |
32 | IgnorePatternWhitespace |
Eliminates unescaped white space from the pattern, and enables comments marked with #. However, the IgnorePatternWhitespace value does not affect or eliminate white space in character classes. |
64 | RightToLeft |
Specifies that the search will be from right to left instead of from left to right. |
256 | ECMAScript |
Enables ECMAScript-compliant behavior for the expression. This value can be used only in conjunction with the IgnoreCase , Multiline , and Compiled values. The use of this value with any other value results in an exception. |
512 | CultureInvariant |
Specifies that cultural differences in language are ignored. |
Regex.Match
method.Examples
A. Following code gets all titles where the title consists of a word that starts with 'A' or 'a'.
Collapse |
select * from titles
where dbo.RegexMatch(title, '\b(A\S+)', 1) = 1
B. Sometimes it happens that you only need a certain part from a string. You can do this extraction using Transact-SQL, but you can use Regular Expressions
also. The following code demonstrates how to order author names ('pubs'
database) by last word in the surname, in case the surname consists of
more than one word, for instance, 'del Castillo', 'de Balzak', 'al
Mahdi', etc.
Collapse |
select au_id, (au_fname + ' ' + au_lname) au_name
from authors
order by
dbo.RegexStrMatch(au_lname, '(\w+)', 64)
The result is a sorted list of authors where 'Innes del Castillo' precedes 'Michel DeFrance'.Source code
Collapse |
using System; //String
using System.Data.SqlTypes; //SqlString, SqlInt32, SqlBoolean
using System.Text.RegularExpressions; //Match, Regex
using Microsoft.SqlServer.Server; //SqlFunctionAttribute
public partial class UserDefinedFunctions
{
/// <summary>
/// Searches the input string for an occurrence
/// of the regular expression supplied
/// in a pattern parameter with matching options
/// supplied in an options parameter.
/// </summary>
/// <param name="input">The string to be tested for a match.</param>
/// <param name="pattern">The regular expression pattern to match.</param>
/// <param name="options">A bitwise OR combination
/// of RegexOption enumeration values.</param>
/// <returns>true - if inputted string matches
/// to pattern, else - false</returns>
/// <exception cref="System.ArgumentException">
/// Regular expression parsing error.</exception>
[SqlFunction(Name="RegexMatch", IsDeterministic=true, IsPrecise=true)]
public static SqlBoolean RegexMatch(SqlString input,
SqlString pattern, SqlInt32 options)
{
if (input.IsNull)
{
return SqlBoolean.Null;
//if input is NULL, return NULL
}
if (pattern.IsNull)
{
pattern = String.Empty;
//""
}
if (options.IsNull)
{
options = 0; //RegexOptions.None
}
try
{
Match match = Regex.Match((string)input,
(string)pattern, (RegexOptions)(int)options);
if (match.Value != String.Empty)
{
return SqlBoolean.True;
}
}
catch
{
throw;
}
return SqlBoolean.False;
}
};
Could you please notify me if I have missed some functionality which you wish? Thank you!Points of Interest
This article is copied from here to spread the knowledge
No comments:
Post a Comment