Tuesday, March 20, 2012

Plug-in algorithm in data mining using SQL Server 2005: modification for association, classifi

managed plug-in framework that's available for download here: http://www.microsoft.com/downloads/details.aspx?familyid=DF0BA5AA-B4BD-4705-AA0A-B477BA72A9CB&displaylang=en#DMAPI.

This package includes the source code for a sample plug-in algorithm written in C#.

In this source code, all the .cs files are written for a clustering algorithm.

If my plug-in algorithm is of the association or classification type, what modifications are required in the source code?

For both classification and association, you would have to remove the clustering-specific methods from the algorithm object (ClusterMembership and CaseLikelihood).

If you want to implement an associations algorithm that behaves (in terms of modeling and querying) like Microsoft_Association_Rules, make sure that your plug-in supports:

- nested tables, both as inputs and as outputs (in Metadata.GetSupInputcontentTypes and GetSupPredictContentTypes)

- PredictAssociation (start by enabling this in Algorithmmetadata.GetSupportedStandardFunctions)

For details on how to actually handle nested tables, check the "Attributes and modeling", "The Case processor" and "The Mining Case Object" articles in the CHM file that comes with the sample, as well as the documentation for the Predict method of the AlgorithmBase object (same CHM file) and the comments in the sample implementation of this method.

Hope this helps

bogdan

|||

Is it possible to configure the whole code for any type of algorithm?

|||

You can have any type of algorithm that works with attribute-value pairs. E.g. an attribute "Gender" can have the discrete values "Male", "Female", etc., and an attribute "Age" could have a continuous value such as 45. The majority of data mining algorithms fit this pattern - e.g. association, classification, etc.
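To make the attribute-value pattern concrete, here is a minimal, language-neutral sketch in Python. It is not the plug-in API: the `AttributeValue` class and its `is_continuous` property are hypothetical names invented for this illustration. It only shows one training case expressed as a list of attribute-value pairs, with discrete and continuous attributes told apart by the type of the value.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AttributeValue:
    """One attribute-value pair (hypothetical type, not part of the plug-in API)."""
    attribute: str
    value: Union[str, int, float]

    @property
    def is_continuous(self) -> bool:
        # Numbers are treated as continuous values, strings as discrete states.
        return isinstance(self.value, (int, float)) and not isinstance(self.value, bool)


# A single case described purely by attribute-value pairs.
case = [AttributeValue("Gender", "Female"), AttributeValue("Age", 45)]

discrete = [pair.attribute for pair in case if not pair.is_continuous]
continuous = [pair.attribute for pair in case if pair.is_continuous]
```

Anything that can be written down this way (a set of named attributes, each with a discrete state or a numeric value) fits the pattern described above.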

If you need direct access to strings, e.g. BLAST, this approach does not work and is not currently supported by our plug-in APIs (there are still BLAST implementations for SQL Server, however). Note that most string-based problems can be solved by first preprocessing the strings into attributes and then using a traditional algorithm approach. There is some flexibility, in that you could have your algorithm examine input strings during the training phase, but this approach is not generally recommended, and exact strings that were not encountered during the training phase will not be available during the prediction phase in this version.
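As a rough sketch of what "preprocessing the strings into attributes" could look like, the Python below turns a sequence into counts of fixed-length substrings (k-mers). This is only one possible preprocessing scheme, chosen for illustration; the function names, the `kmer_` attribute prefix and the value `k=3` are all assumptions and have nothing to do with the plug-in API itself. The vocabulary is fixed from the training strings, so substrings first seen at prediction time are simply dropped.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Return every overlapping substring of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocabulary(training_strings, k=3):
    """Collect the k-mers seen during training; this fixes the attribute set."""
    vocabulary = set()
    for s in training_strings:
        vocabulary.update(kmers(s, k))
    return sorted(vocabulary)


def to_attributes(sequence, vocabulary, k=3):
    """Map a string to one numeric attribute per known k-mer."""
    counts = Counter(kmers(sequence, k))
    # k-mers outside the training vocabulary get no attribute at all.
    return {f"kmer_{m}": counts.get(m, 0) for m in vocabulary}


training = ["ACGTAC", "CGTACG"]
vocabulary = build_vocabulary(training)

# "GTT" and "TTT" never appeared in training, so they produce no attributes.
features = to_attributes("ACGTTT", vocabulary)
```

The resulting dictionary is an ordinary set of attribute-value pairs, which a traditional algorithm can then consume.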

|||

Why is there a requirement to make changes in the source code?

I want everything in a DLL.

That is, the algorithm should be provided in a DLL.

The application will configure itself and add the new algorithm.

Hence I want to generalize everything.

|||

You only need to create the source code for your algorithm. I'm very confused by the direction of the question. The framework allows you to create a DLL or assembly that encapsulates your algorithm and you can plug it into the server directly. The framework is generalized to handle any algorithm that works with attributes and values, e.g. association, classification, estimation, etc.

|||

What modifications are required for a "time series" algorithm?

Are there any modifications we need to make in the framework to add a "time series" plug-in algorithm?

|||

If you want to use the same Time Series function as Microsoft Time Series (i.e. PredictTimeSeries), then:

- your class implementing the IDMAlgorithm should also implement IDMTimeSeriesAlgorithm (both interfaces are defined in the same file, dmalgo.h/idl)

- your metadata class should add DMSF_PredictTimeSeries to the list of supported functions

You could also choose not to use our infrastructure. In this case, the requirements mentioned above do not apply: you can write your own algorithm function to perform forecasting.
