
RE: DM: Datamining tools ... black boxes...


From: Marten den Uyl
Date: Tue, 22 Feb 2000 12:22:40 +0100

Can't resist joining in when one of my favorite debates, 'neural networks
are black boxes!', pops up again. Remember that the black box got its bad
reputation back in the fifties and sixties, when early cognitive psychologists
criticised behaviourists' stubborn refusal to postulate any kind of mental
machinery or information-processing mechanism in their theories. Behaviourists
stuck to modelling only input/output, stimulus-response contingencies,
because they felt the machinery of the metaphorical 'black box' (i.e. the
human mind) could not be uncovered by scientifically valid (testable)
procedures. The original debate is about shallow (statistical contingencies)
vs. deep (causal mechanisms) modelling of behaviour.

Now to the current issue, which was really hot in the early nineties: are 'a
trained Neural Network or the Memory Based Reasoning algorithm' (ANN and
kNN) black boxes? Black box to whom? is the first question to ask, as Clay
observes. People have different perspectives on algorithms, tools and their
outcomes. Do you want to build a DM tool, or do you want to select prospects
for your next marketing campaign? (Design vs. functional stance, Dennett;
this roughly coincides with experts vs. laymen, developers vs. users.)

Warren is absolutely right from a designer -or quality auditor- point of
view: the black-boxishness of tools should be rated only by the -lack of-
tests and documentation on their workings, i.e. the algorithms applied. What
flavor the algorithms are -fuzzy, genetic, neural, rules, neighbours- doesn't
matter much in principle.

From the user point of view it is an entirely different story. First of all,
users generally don't care about black boxes -unless you make it sound like
a dangerous disease that could plague your business: black pox? People
happily use computers, VCRs, mobiles, stereos, information appliances,
etc., all mostly black boxes whose workings are not fully understood by many.

Sergei understates the issue in his contribution: it is not just that some
'Data Mining tools are presented like black boxes'; in fact all current Data
Mining tools really are black boxes to almost all their intended users. Data
mining tools automatically perform shallow data analysis; they may find
interesting patterns and dependencies in the data, but they rarely propose
theories, causal models or testable hypotheses. And very few people really
know what the tool is doing inside.
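
To illustrate what 'shallow' means here, a toy sketch (data and threshold
invented, purely illustrative): the miner below reports every strong
co-occurrence it finds, and that is all it does -no causal model, no
testable hypothesis.

# Toy 'miner' (invented baskets and threshold, purely illustrative):
# it reports every strong pairwise co-occurrence in the data, and
# that is all it does -no causal model, no testable hypothesis.
from itertools import combinations

baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "wipes", "beer"},
    {"milk", "chips"},
]

items = sorted(set().union(*baskets))
for a, b in combinations(items, 2):
    freq = sum(1 for t in baskets if a in t and b in t) / len(baskets)
    if freq >= 0.4:  # arbitrary 'interestingness' threshold
        print(f"pattern found: {a} & {b} in {freq:.0%} of baskets")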

Lastly, most users don't care about DM algorithms; they want to understand
their customers. Ronny and Sergei -and many others in the field- are dead
wrong in believing that some algorithms -e.g. rules and trees- lead to
inherently less black-boxish tools and applications than others, e.g.
neural nets and neighbours. From a designer point of view there certainly
are differences in what can -easily- be done in an application with
different flavors of algorithms, and personally, I really appreciate it
when people can get partisan about the taste of abstract algorithms. It is
relatively easy to explain the mechanics of rules and decision trees -and of
nearest neighbours, by the way- and relatively difficult to explain why
neural networks, genetic algorithms or rough sets are good function
approximators. But a sensible user couldn't care less.
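
To make the contrast concrete, a minimal sketch (scikit-learn and the toy
data are my own assumptions here, not anything from the tools under
discussion): the tree's mechanics print as readable if/then splits, while
the trained net is just arrays of weights.

# Mechanics contrast (scikit-learn, toy data: both illustrative
# assumptions). The tree prints as if/then rules; the net does not.
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.neural_network import MLPClassifier

X = [[25, 20], [35, 60], [45, 80], [22, 25], [50, 90], [30, 35]]
y = [0, 1, 1, 0, 1, 0]  # e.g. 1 = responded to a campaign

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))
# -> readable splits you can walk a user through

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X, y)
print([w.shape for w in net.coefs_])
# -> [(2, 8), (8, 1)]: well-defined mechanics, but nothing
#    rule-like to show a user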

Getting good explanations is what users really want from their ideal DM
tool. Generating good explanations automatically is, however, quite difficult
no matter what your algorithmic preferences are, since it involves lots of
knowledge and common sense -and a bit of psychology.

Rules and trees are off to a bad start for generating good explanations to
users, for two reasons. First, while short rules are simple, long rules are
incomprehensible to human beings, and it is not easy to capture real life
in short rules.
Secondly, so-called 'induced rules' are in fact only statistical descriptors
(partitionings of a given data set). However, since 'rules' and 'trees' bear
a superficial similarity to common models of natural explanation
(hierarchical specification, prototype and deviation), they tend to be
enthusiastically misinterpreted as explanations. (Statistics I: inductive
vs. deductive statistics.)

If users accept an awkward statistical descriptor such as
'sex=males^age= 25 - 35^in-basket=diapers: in-basket=beer +x%' being sold
to them as an explanation, they are being sold short.
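
To see why that is a descriptor and not an explanation: computationally,
the 'rule' is nothing but a conditional frequency over one partition of
the records. A sketch with invented records (the elided +x% stays elided;
the numbers below are placeholders):

# The 'rule' is only a conditional frequency on one partition of the
# data set (records invented for illustration).
records = [
    {"sex": "male", "age": 28, "basket": {"diapers", "beer"}},
    {"sex": "male", "age": 31, "basket": {"diapers", "beer", "chips"}},
    {"sex": "male", "age": 27, "basket": {"diapers"}},
    {"sex": "female", "age": 30, "basket": {"diapers", "wine"}},
    {"sex": "male", "age": 45, "basket": {"beer"}},
]

segment = [r for r in records
           if r["sex"] == "male" and 25 <= r["age"] <= 35
           and "diapers" in r["basket"]]
confidence = sum("beer" in r["basket"] for r in segment) / len(segment)
baseline = sum("beer" in r["basket"] for r in records) / len(records)
print(f"P(beer | segment) = {confidence:.0%} vs baseline {baseline:.0%}")
# The printout describes this sample; it says nothing about *why*.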

But then again, what if that is the best there is around?

Make your algorithms not black or white boxes; make them transparent: hide
the mechanics from the user and picture the world -customers, markets,
behaviours- as it is, or as it appears to be as a best estimate on current
data.
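
Ronny's recommendation engine (quoted below) makes a nice test case: from
the user's side the contract is identical whatever sits inside the box. A
sketch, with all names hypothetical:

# Sketch of Ronny's recommender as an interface (names hypothetical):
# the user-visible contract stays the same whether a neural net,
# nearest neighbours or induced rules sit behind it.
from abc import ABC, abstractmethod

class Recommender(ABC):
    @abstractmethod
    def recommend(self, basket: set, history: set) -> list:
        """input = shopping basket and prior purchases;
        output = top 3 products to recommend"""

class RuleRecommender(Recommender):
    RULES = {frozenset({"diapers"}): ["beer", "wipes", "chips"]}  # toy

    def recommend(self, basket, history):
        for lhs, rhs in self.RULES.items():
            if lhs <= basket:  # antecedent satisfied
                return rhs[:3]
        return []

print(RuleRecommender().recommend({"diapers"}, set()))
# -> ['beer', 'wipes', 'chips']; whether this counts as a 'white
#    box' depends entirely on who is looking.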



 > -----Original Message-----
 > From: Ronny Kohavi [mailto:ronnyk@bluemartini.com]
 > Sent: Wednesday, February 16, 2000 10:39 AM
 > To: Warren Sarle
 > Cc: Nautilus Data Mining List
 > Subject: Re: DM: Datamining tools ... black boxes...
 >
 > Warren Sarle wrote:
 >
 > > > "Some Data Mining tools are presented like black boxes" simply
 > > > because they ARE black boxes. It is not possible to make much sense
 > > > of, say, a trained Neural Network or the Memory Based Reasoning
 > > > algorithm's results. You have to believe the underlying math in
 > > > order to trust their predictions.
 > >
 > > That is not what "black box" means. "Black box" means you know what
 > > is supposed to go into it and you know what is supposed to come out
 > > of it, but you don't know how the computation is done inside the
 > > "box". It has nothing to do with interpretability of results. With
 > > respect to commercial software, "black box" means that the
 > > algorithms used inside the software are undocumented.
 >
 > I beg to differ here, Warren; I agree with Sergei.
 > A black box means that you know the characteristics of the
 > construct/box, but it's "black" because the internals are unspecified
 > or not understood by the person looking at the box.
 >
 > For example, a black box that makes product recommendations at a web
 > site can have clear inputs/outputs:
 >     input = shopping basket and prior purchases,
 >     output = top 3 products to recommend
 > You can study its inputs/outputs to try and glean insight, but that's
 > going to be very hard unless you can open that box and find something
 > you can understand.
 >
 > If the recommendation engine uses a neural network or memory-based/
 > nearest-neighbor approaches, good luck explaining its behavior to a
 > business user (it will remain a black box).
 > If, however, the box makes recommendations based on decision rules or
 > trees, a business user might understand what's inside, turning it
 > from black-box to white-box.
 >
 >     -- Ronny
 >



