Wednesday, September 26, 2007

Extending SQL Server to Support Some Statistical and Data Mining Functionality


My most recent book, Data Mining Using SQL and Excel, is about combining the power of databases and Excel for data analysis. Working on that book, I have come to feel that SQL and data mining are natural allies, since both are about making sense of large amounts of data.

A surprising observation (at least to me) is that SQL operations are analogous to data mining operations. In many ways, aggregating data (summarizing it along dimensions) is similar to building models, since both capture underlying structure in the data. And in some cases, joining tables is similar to scoring models, since a join takes each row of one table and "adds in" new information from the matching rows of another.
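To make the analogy concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server. The tables (history, prospects, region_model) and their data are hypothetical. The GROUP BY plays the role of model building, producing a small table of response rates by region, and the JOIN plays the role of scoring, attaching the expected rate to each new prospect.

```python
import sqlite3

# Hypothetical data: past customers with a region and a 0/1 response flag,
# plus new prospects that need a score.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (region TEXT, responded INTEGER);
INSERT INTO history VALUES
  ('east', 1), ('east', 0), ('east', 1), ('east', 1),
  ('west', 0), ('west', 0), ('west', 1), ('west', 0);
CREATE TABLE prospects (id INTEGER, region TEXT);
INSERT INTO prospects VALUES (1, 'east'), (2, 'west'), (3, 'east');
""")

# "Building the model" = aggregating: summarize the response rate along
# the region dimension and store the result as a table.
conn.execute("""
CREATE TABLE region_model AS
SELECT region, AVG(responded) AS score, COUNT(*) AS n
FROM history
GROUP BY region
""")

# "Scoring the model" = joining: each prospect row picks up the score
# computed for its region.
scored = conn.execute("""
SELECT p.id, p.region, m.score
FROM prospects p
JOIN region_model m ON p.region = m.region
ORDER BY p.id
""").fetchall()

for row in scored:
    print(row)
```

The inner join silently drops any prospect whose region never appeared in the history; a LEFT JOIN would keep such rows with a NULL score, which is closer to how a model has to handle values it was never trained on.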

This idea has intrigued me since I finished the final draft, so I have decided to embark on an adventure: extending SQL to include various types of models. My goal is to make data mining functionality a natural part of using SQL. Okay, that is a bit ambitious, because any SQL extension tends to look "grafted" onto the basic language. However, it is possible to add the concept of a "statistical model" to SQL and see where that goes.

The purpose of this blog is to capture the interesting ideas I learn along the way and put them in one place. I have already learned a lot about SQL, statistics, C#, and .NET programming by starting this endeavor. I also want to make the code available to other people who might find it useful.

For various reasons that I discuss in my first technical post, I have decided to implement this scenario using .NET (that is, C# and Microsoft SQL Server). By the way, this is not because of any great love for Microsoft development environments; I have painful memories of trying to use buggy release versions of Microsoft Visual C++ in the late 1980s. I am learning this environment "as I go," since I had never programmed in C# before April of this year.

I already have some ideas for upcoming posts:
  • Introduction to .NET for Extending SQL Server
  • Adding A Useful Function: Weighted Averages
  • Two More Useful Functions: MinOF and MaxOF
  • What is a Marginal Value Model?
  • Implementing A Basic Marginal Value Model
  • What is a Linear Regression Model?
  • Implementing A Linear Regression Model
  • Model Management and the Marginal Value Model
  • What is a Naive Bayesian Model?
  • Implementing a Naive Bayesian Model
  • What is a Survival Model?
  • Implementing a Survival Model
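As an illustration of the first planned function, a weighted average fits the shape of a user-defined aggregate: SQL Server CLR aggregates implement Init, Accumulate, Merge, and Terminate methods. The Python class below only mimics that shape to show the logic. The class name, the lowercase method names, and the NULL handling are assumptions for this sketch, not the author's C# implementation.

```python
from typing import Optional


class WeightedAverage:
    """Hypothetical Python stand-in for a weighted-average aggregate.
    Method names mirror the Init/Accumulate/Merge/Terminate pattern."""

    def __init__(self) -> None:
        self.init()

    def init(self) -> None:
        # Running totals: sum of value * weight, and sum of weights.
        self.weighted_sum = 0.0
        self.total_weight = 0.0

    def accumulate(self, value: Optional[float], weight: Optional[float]) -> None:
        # Assumption: skip rows where either input is NULL (None),
        # the way built-in SQL aggregates ignore NULLs.
        if value is None or weight is None:
            return
        self.weighted_sum += value * weight
        self.total_weight += weight

    def merge(self, other: "WeightedAverage") -> None:
        # Combine partial results, e.g. from rows processed separately.
        self.weighted_sum += other.weighted_sum
        self.total_weight += other.total_weight

    def terminate(self) -> Optional[float]:
        # Return NULL (None) when no weight was accumulated.
        if self.total_weight == 0:
            return None
        return self.weighted_sum / self.total_weight


# Usage: two partial aggregates combined with merge().
part1 = WeightedAverage()
part1.accumulate(10.0, 1.0)
part1.accumulate(20.0, 3.0)
part2 = WeightedAverage()
part2.accumulate(40.0, 4.0)
part2.accumulate(None, 2.0)  # ignored because the value is NULL
part1.merge(part2)
result = part1.terminate()
print(result)
```

Keeping only the two running sums, rather than the individual rows, is what makes merge() cheap: partial results can be combined in any order and still produce the same answer.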
I do not have a schedule in mind, but this is an adventure and I'm very curious to see where it will lead.
