Sunday, September 30, 2007

Weighted Average: An Example of Enhancing SQL Server Functionality

This entry discusses how to enhance SQL Server functionality by adding in a simple aggregation function, the weighted average. Ideally, we would want to use the function as follows:

SELECT AVG(purban), WAVG(purban, population)
FROM zipcensus zc

(The zipcensus table contains summaries of census information by zip code and is available at the companion page for Data Analysis Using SQL and Excel.)

This contains two unfortunatelies. First, SQL does not contain a weighted average function, so we have to define one ourselves. Second, aggregation functions cannot take two arguments, so we have to work around this. The call looks like this in practice:
SELECT AVG(purban), ud.dbo.WAVG(wval)
FROM (SELECT zc.*,
ud.dbo.CreateWeightedValue(purban, population) as wval
FROM zipcensus zc) zc

This posting references two files:
The second is the code for the new objects. The first loads these into the database.

How Wavg is Implemented
The Wavg() aggregation function is implemented as a user-defined aggregate in blog1.dll. As we'll see in the next posting, an aggregation is really a class in C# (or other .NET language) that has certain specific characteristics.

The weighted average requires two values -- the value and the weight -- to do the calculation. Unfortunately, SQL Server does not allow aggregations to take more than one argument, even user defined aggregations. Fortunately, we can define a new type, WeightedValue, that contains these two elements. This type define the following methods (attributes and functions):
  • Null (an attribute) is the NULL value for the type.
  • Value (an attribute) returns the value component.
  • Weight (an attribute) returns the weight component.
  • ToString() (a function) converts it to a string.
  • Parse() (a function) converts the string back into the type.
Note that once the type is defined, it can be used as a colum in a table or as a T-SQL variable.

The type itself does not define a function that creates the type. For this, we need a user defined function, CreateWeightedValue(). This function takes two arguments, which are both FLOATs -- one for the value and one for the weight. SQL Server converts any numeric argument to a FLOAT, so this function can take integers, floats, numerics, and so on as arguments.


Assemblies, Functions, Types and All That Jazz
SQL Server loads DLLs (dynamic link libraries) into an object called an assembly. The code for the DLL is actually placed in the database. Fortunately, the .NET architecture makes it possible to load the same code, even if the code is written on a different machine from the one where it is being run. The DLL code is not actually machine code, but a low-level language that is quickly recompiled as it is run.

An assembly can contain objects such as the following:
  • User defined functions (such as CreateWeightedValue());
  • User defined types (such as WeightedValue());
  • User defined aggregates (such as Wavg());
  • User defined procedures; and,
  • User defined table functions.
User defined types and aggregates are defined as classes in C#. User defined functions and procedures are defined as methods within a single class. For convenience, all user defined functions are defined in a class called UserDefinedFunctions.

Loading an Assembly The First Time

Although an assembly contains definitions for the objects in it, creating an assembly does not load the objects. The following are the steps for loading an assembly:

1) Create the assembly using the CREATE ASSEMBLY statement:

CREATE ASSEMBLY ud
FROM 'c:\\gordon\blog\blog1.dll'
WITH PERMISSION_SET = SAFE;

This loads the DLL stored in the location "c:\\gordon\blog\blog1.dll" into the database.

The clause "WITH PERMISSION_SET = SAFE" simply says that the DLL is very well-behaved and should not introduce any problems. This is good, because you need special admistrator priviledges to load unsafe code into the database. And, you cannot fool SQL Server (at least easily). It validates the code to see whether any unsafe features (such as unions or pointers or file i/o) are being used.

(2) Define the objects in the DLL:

CREATE TYPE WeightedValue
EXTERNAL NAME ud.WeightedValue;

GO

CREATE FUNCTION CreateWeightedValue (@val float, @wgt float)
RETURNS WeightedValue as
EXTERNAL NAME ud.UserDefinedFunctions.CreateWeightedValue
GO

CREATE AGGREGATE WAvg (@val WeightedValue)
RETURNS float
EXTERNAL NAME ud.WAvg
GO

All of these use the EXTERNAL NAME option for the CREATE command. This "links" the user defined function in SQL to the code that defines it. The external name is of the form . (for types and aggregates) or .. (for functions).


Loading an Assembly Again
Unfortunately, once an assembly is loaded and the objects defined, the following command returns an error:

DROP ASSEMBLY ud

The error is:

DROP ASSEMBLY failed because 'ud' is referenced by object 'CreateWeightedValue'.

Oops. SQL Server is smart enough to know that objects in the database rely on the assembly. Alas, it is not smart enough to drop them. For that, we need code such as the following:

IF OBJECT_ID('Wavg') IS NOT NULL
DROP AGGREGATE Wavg
GO

IF OBJECT_ID('CreateWeightedValue') IS NOT NULL
DROP FUNCTION CreateWeightedValue
GO

IF (SELECT COUNT(*) FROM sys.types WHERE UPPER(name) = 'WEIGHTEDVALUE') > 0
DROP TYPE WeightedValue
GO

IF (SELECT COUNT(*) FROM sys.assemblies WHERE UPPER(name) = 'UD') > 0
DROP ASSEMBLY ud
GO

This code checks to see if each object exists before deleting the object. This prevents errors from occuring. Notice that the TYPE is dropped after the AGGREGATE and FUNCTION. This is because the TYPE is used by them, and cannot be dropped while they reference it. All this code is in blog1-load.sql.


Using Wavg()

As mentioned at the beginning of the post, the right way to use this function is as follows:


SELECT AVG(purban), ud.dbo.WAVG(wval)
FROM (SELECT zc.*,
ud.dbo.CreateWeightedValue(purban, population) as wval
FROM zipcensus zc) zc

The innermost subquery reads from the table zipcensus, returning all the values in the row, plus a new column containing the weighted value. This is passed to the outermost query and used as the argument to the WAVG function.

By the way, the function can be used on any numeric type (integer, float, numeric, and so on), because SQL Server will convert all numeric values to FLOAT (the type of the arguments to CreateWeightedValue()).

We can verify that weighted value works, using code such as:

SELECT wval.Value, wval.Weight, wval.ToString()
FROM (SELECT ud.dbo.CreateWeightedValue(3, 4.5) as wval) a

This returns the values placed into the function.

Although this example does not require the functionality, user defined types can be placed in tables and used as variables in T-SQL code.


8 Comments:

Blogger MrFrijole said...

How do you make this work with null values? Meaning how do you declare a null WeightedValue type? I need this functionality, and you're the only one that I've found has a solution.

December 16, 2008 at 2:38 PM  
Blogger Iggy said...

if I had to do a subquery for weighted average, wouldn't it be easier to just sum(weight*value)/sum(weight)?
I've been searching on the net for a way to simplify that query into wavg(weight,value). Alas, thanks for the post, it was informative.

October 20, 2009 at 4:15 PM  
Blogger Steve Roger said...

The term "low-code app development" didn't exist until a few years ago but the concept isn't a new one. business users who see an opportunity to optimize a process and take it upon themselves to create their own apps. Rather than spend the time and manual effort to code an app from scratch that is made up of common features and components, low code development platforms let the developers work from existing templates and drag prebuilt elements, forms, and objects together to get a particular department or team the simple working app they need with a lot less hassle.

March 6, 2020 at 7:42 AM  
Blogger Haider Jamal Abbasi (iAMHJA) said...

Steam Names
Discord Names
Highest Paying AdSense Niche

October 20, 2020 at 12:53 PM  
Blogger Collabloud Limited said...

Great Post By a Great Author. Through the post, I really get some important knowledge about SQL server which is really mind-blowing. Thanks for posting it. For best Database Replication Tools Online, you can get at Repstance.

October 30, 2020 at 11:22 AM  
Blogger Olivia bush said...

It's a nice article, Which you have shared here about the Logic Functions In Excel. Your article is very informative and I really liked the way you expressed your views in this post. Thank you.Logic Functions In Excel Online

February 9, 2021 at 12:15 PM  
Blogger Olivia bush said...

Thanks for sharing the best information and suggestions, it is very nice and very useful to us. I appreciate the work that you have shared in this post. Keep sharing these types of articles here.How To Use Search Function In Excel

February 9, 2021 at 12:47 PM  
Blogger sri said...

Thanks for sharing this.,
Leanpitch provides online training in Scrum Master during this lockdown period everyone can use it wisely.
Join Leanpitch 2 Days CSM Certification Workshop in different cities.
Scrum master certification
csm certification

March 18, 2021 at 3:43 AM  

Post a Comment

Subscribe to Post Comments [Atom]

<< Home