Monday, February 20, 2012

PLEASE HELP, I am stuck

Hi everybody, and happy new year to all of you.

I am asking for your help because I am really not getting anywhere with my project. So you guys are my last hope.

I am a newbie to data mining and I am doing my best to learn the basics so that I can get started with my project.

I guess my problem is the fact that I am new to data mining and I have no clue how to get started. The approach I have been following is trying to read books about data mining and machine learning so that I can understand and compare all the "numerous" algorithms out there and then try to find the one(s) that would apply to my case.

The first problem with this approach is that I did not find any good resources (either the subjects are treated only on the surface or they are made overly complex and hard to follow). The second problem is that it is incredibly time consuming. So I am wondering if I should continue on this path or if I should try to proceed differently.

I am sure that a lot of you guys have been in the same position and some of you have struggled with this problem just like me. So, I am hoping that you guys can suggest a method that would help me get started.

The project I am working on is related to the field of agriculture, and its objective is to find the best values of all the parameters that affect the outcome (the amount of meat produced) of an animal production (could be dairy, poultry, pork, etc.).

So as I said, the approach is to run one or more algorithms on historical data for a certain type of production (poultry, for example) and try to find the best values for the operating conditions that would maximize the growth of the animals (weight) while minimizing the production costs. A few examples of the questions that this project is trying to answer are as follows: When should the barns be lit, and for how long? When and how much food should we give the animals? What is the best operating temperature set point? When and how much cooling/heating should be done? Etc.

As you noticed, all these questions are concerned with the optimization of the operating conditions but, most importantly, with the reduction of operating costs. Huge amounts (tens of GB) of historical data for these operating conditions are to be used for this purpose.

I hope that you guys would be kind enough to help me work my way through this. I would appreciate your help and advice, and I thank in advance all of you who took the time to read this lengthy post.

Cheers.

This is kind of a complex problem to be answered simply in a forum. First of all, you won't be giving 10's of gigs of data to the algorithms - not that they can't take it, but rather because it's likely not a valid solution, or they don't need it.

What you need to do is to clarify the problem you are trying to solve - e.g. maximize animal weight - determine the variables you want to check - e.g. the operating conditions - and run models on those. You should be able to generate a representative sample of data that is small enough for the models to run in a reasonable time.
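As a rough sketch of how a small random sample could be pulled from a file that is too big to load at once, here is reservoir sampling in plain Python. The column names (barn_id, temp_c, weight_kg) and the generated rows are made up for illustration and are not from the original poster's data. A uniform random sample is only one notion of "representative"; you may need to stratify by barn or season instead.

```python
import csv
import io
import random


def reservoir_sample(rows, k, seed=0):
    """Return up to k rows chosen uniformly at random from an iterable.

    The iterable is read once and only k rows are held in memory,
    so the total number of rows does not need to be known up front.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Hypothetical stand-in for a large export; in practice pass an open file.
fake_csv = io.StringIO(
    "barn_id,temp_c,weight_kg\n"
    + "".join(
        f"{i % 5},{20 + i % 7},{1.5 + (i % 11) / 10}\n" for i in range(1000)
    )
)
sample = reservoir_sample(csv.DictReader(fake_csv), k=50, seed=42)
print(len(sample), "rows sampled")
```

The sample can then be loaded into whatever modeling tool you use, and re-drawn with a different seed to check that the patterns you find are stable.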

There are many approaches you could take - for instance you could run a clustering model against all variables and see whether weight is evenly distributed across the clusters or varies with the other variables - it would be good to limit the number of variables for such an approach. You can use trees, Naive Bayes, neural nets, and even association rules to see the patterns of how variables are related. Most importantly, you need to be specific about your problem and your inputs.
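To make the tree idea a little more concrete, here is a minimal sketch of the single step a regression tree repeats: find the cut point on one input that best separates the target into two groups. This is a hand-rolled illustration in plain Python, not the algorithm any particular data mining product uses, and the temperature and weight numbers are invented.

```python
def best_split(xs, ys):
    """Find the threshold on x that splits y into the two tightest groups.

    Returns (threshold, total_squared_error). The threshold is None when
    no split improves on leaving the data in a single group.
    """
    def sse(values):
        # Sum of squared deviations from the group mean.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best_threshold, best_error = None, sse(list(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Invented example: barn temperature (C) vs. final bird weight (kg).
temps = [18, 19, 20, 21, 26, 27, 28, 29]
weights = [2.1, 2.2, 2.0, 2.1, 1.6, 1.5, 1.7, 1.6]
threshold, error = best_split(temps, weights)
print("best cut on temperature:", threshold)
```

A real tree repeats this search over every input and then again inside each resulting group, which is why limiting the inputs up front keeps things manageable.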

If it makes a difference, you may want to create new variables based on existing inputs. For example, maybe the difference between the daily high and low temps is more predictive than the actual temperatures themselves? There seem to be many hypotheses to test and it sounds like fun!
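A quick way to test a hypothesis like the temperature one is to build the new variable and compare how strongly each candidate correlates with the outcome. The sketch below does that in plain Python with a hand-written Pearson correlation. The field names (high_c, low_c, gain_g) and all the numbers are made up purely to show the mechanics, so the result says nothing about real barns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented daily records: high/low temperature (C) and weight gain (g).
days = [
    {"high_c": 30, "low_c": 18, "gain_g": 52},
    {"high_c": 27, "low_c": 22, "gain_g": 61},
    {"high_c": 32, "low_c": 17, "gain_g": 48},
    {"high_c": 26, "low_c": 20, "gain_g": 60},
    {"high_c": 29, "low_c": 21, "gain_g": 58},
    {"high_c": 31, "low_c": 19, "gain_g": 51},
]

# Derived variable: the daily temperature swing.
for day in days:
    day["range_c"] = day["high_c"] - day["low_c"]

gain = [d["gain_g"] for d in days]
for name in ("high_c", "low_c", "range_c"):
    r = pearson([d[name] for d in days], gain)
    print(f"{name}: r = {r:.3f}")
```

Correlation only picks up straight-line relationships, so treat it as a first screen before handing the new variable to a proper model.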

Good luck and feel free to ask any additional questions in this forum or the newsgroup. Tips and tricks and other articles can be found at www.sqlserverdatamining.com
