RE: DM: Datamining tools ... black boxes...
From: Marten den Uyl
Date: Tue, 22 Feb 2000 12:22:40 +0100

Can't resist joining in when one of my favorite debates, 'neural networks are black boxes!!', pops up again.

Remember that the black box got its bad reputation back in the fifties and sixties, when early cognitive psychologists criticised the behaviorists' stubborn refusal to postulate any kind of mental machinery or information-processing mechanisms in their theories. Behaviorists stuck to modelling only input/output, stimulus-response contingencies, because they felt the machinery of the metaphorical 'black box' (i.e. the human mind) could not be uncovered by scientifically valid (testable) procedures. The original debate is about shallow (statistical contingencies) vs. deep (causal mechanisms) modelling of behaviour.

Now to the current issue, which was really hot in the early nineties: are 'a trained Neural Network or the Memory Based Reasoning algorithm' (ANN and kNN) black boxes? Black box to whom? is the first question to ask, as Clay observes. People have different perspectives on algorithms, tools and their outcomes. Do you want to build a DM tool, or do you want to select prospects for your next marketing campaign? (Design vs. functional stance, Dennett; this roughly coincides with experts vs. laymen, developers vs. users.)

Warren is absolutely right, from a designer -or quality auditor- point of view: the black-boxishness of tools should only be rated by the -lack of- tests and documentation on their workings, i.e. the algorithms applied. What flavor the algorithms are -fuzzy, genetic, neural, rules, neighbours- doesn't matter much in principle.

From the user point of view it is an entirely different story. First of all, users generally don't care about black boxes -unless you make it sound like a dangerous disease that could plague your business: black pox?
People happily use computers, VCRs, mobiles, stereos, information appliances, etc., all mostly black boxes whose workings are not fully understood by many. Sergei understates the issue in his contribution: in fact not just some 'Data Mining tools are presented like black boxes'; all current Data Mining tools really are black boxes to almost all their intended users. Data mining tools automatically perform shallow data analysis; they may find interesting patterns and dependencies in the data, but they rarely propose theories, causal models or testable hypotheses. And very few people really know what the tool is doing inside. Lastly, most users don't care about DM algorithms; they want to understand their customers.

Ronny and Sergei -and many others in the field- are dead wrong in believing that some algorithms -e.g. rules and trees- lead to inherently less black-boxish tools and applications than others, e.g. neural nets and neighbours. From a designer point of view there certainly are differences in what can -easily- be done in an application with different flavors of algorithms, and personally, I really appreciate it when people can get partisan about the taste of abstract algorithms. It is relatively easy to explain the mechanics of rules and decision trees -and of nearest neighbours, by the way- and relatively difficult to explain why neural networks, genetic algorithms or rough sets are good function approximators. But a sensible user couldn't care less.

Getting good explanations is what users really want from their ideal DM tool. Generating good explanations automatically is, however, quite difficult no matter what your algorithmic preferences are, since it involves lots of knowledge and common sense -and a bit of psychology. Rules and trees are off to a bad start for generating good explanations to users, for two reasons: first, while short rules are simple, long rules are incomprehensible to human beings.
It is, however, not easy to capture real life in short rules. Secondly, so-called 'induced rules' are in fact only statistical descriptors (partitionings of a given data set). However, since 'rules' and 'trees' bear a superficial similarity to common models of natural explanation (hierarchical specification, prototype and deviation), they tend to be enthusiastically misinterpreted as explanations. (Statistics I: inductive vs. deductive statistics.) If an awkward statistical descriptor such as 'sex=male ^ age=25-35 ^ in-basket=diapers: in-basket=beer +x%' is sold to users as an explanation, they are getting short-changed. But then again, if that is the best there is around?

Make your algorithms not black or white boxes; make them transparent: hide the mechanics from the user and picture the world -customers, markets, behaviours- as it is, or appears to be as a best estimate on current data.

> -----Original Message-----
> From: Ronny Kohavi [mailto:ronnyk@bluemartini.com]
> Sent: Wednesday, February 16, 2000 10:39 AM
> To: Warren Sarle
> Cc: Nautilus Data Mining List
> Subject: Re: DM: Datamining tools ... black boxes...
>
> Warren Sarle wrote:
>
> > > "Some Data Mining tools are presented like black boxes" simply
> > > because they ARE black boxes. It is not possible to make much sense
> > > of, say, a trained Neural Network or the Memory Based Reasoning
> > > algorithm results. You have to believe the underlying math in order
> > > to trust their predictions.
> >
> > That is not what "black box" means. "Black box" means you know what
> > is supposed to go into it and you know what is supposed to come out
> > of it, but you don't know how the computation is done inside the
> > "box". It has nothing to do with interpretability of results. With
> > respect to commercial software, "black box" means that the algorithms
> > used inside the software are undocumented.
>
> I beg to differ here, Warren; I agree with Sergei.
> A black box means that you know the characteristics of the construct/box,
> but it's "black" because the internals are unspecified or not understood
> by the person looking at the box.
>
> For example, a black box that makes product recommendations at a web site
> can have clear inputs/outputs:
>   input  = shopping basket and prior purchases,
>   output = top 3 products to recommend
> You can study its inputs/outputs to try and glean insight, but that's
> going to be very hard unless you can open that box and find something
> you can understand.
>
> If the recommendation engine uses a neural network or
> memory-based/nearest-neighbor approaches, good luck explaining its
> behavior to a business user (it will remain a black box).
> If, however, the box makes recommendations based on decision rules or
> trees, a business user might understand what's inside, turning it from
> black-box to white-box.
>
> -- Ronny
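Den Uyl's point that an 'induced rule' is only a statistical descriptor of a given data set can be made concrete. Below is a minimal Python sketch, using an entirely made-up five-basket data set (all names and values are hypothetical): the diapers-beer 'rule' quoted above reduces to support, confidence and lift computed over the records at hand, which describes this data set but explains nothing.

```python
# Hypothetical illustration: the "rule"
#   sex=male ^ age=25-35 ^ in-basket=diapers : in-basket=beer +x%
# is just counting over a fixed data set, not a causal explanation.

baskets = [
    {"sex": "male",   "age": 30, "items": {"diapers", "beer"}},
    {"sex": "male",   "age": 27, "items": {"diapers", "beer", "chips"}},
    {"sex": "male",   "age": 33, "items": {"diapers", "milk"}},
    {"sex": "female", "age": 29, "items": {"diapers", "wine"}},
    {"sex": "male",   "age": 45, "items": {"beer"}},
]

def matches_antecedent(b):
    # The left-hand side of the "rule": male, 25-35, diapers in basket.
    return b["sex"] == "male" and 25 <= b["age"] <= 35 and "diapers" in b["items"]

antecedent = [b for b in baskets if matches_antecedent(b)]
consequent = [b for b in antecedent if "beer" in b["items"]]

support    = len(antecedent) / len(baskets)     # how often the pattern occurs at all
confidence = len(consequent) / len(antecedent)  # P(beer | male, 25-35, diapers)
base_rate  = sum("beer" in b["items"] for b in baskets) / len(baskets)
lift       = confidence / base_rate             # the "+x%" over the base rate

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# -> support=0.60 confidence=0.67 lift=1.11
```

The three numbers are a partitioning of these five records and nothing more; collect a different sample and they change, which is exactly why presenting such a descriptor as an explanation short-changes the user.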