
DM: RE: information retrieval


From: Ben Houston
Date: Sun, 28 May 2000 16:36:00 -0500

I believe there is quite a bit of work in this area.  As Harald Klien
already mentioned, this area is called "web content mining".

A paper that I read this morning that has a good introduction to this field,
as well as other aspects of data mining on the web, is "Web Mining:
Information and Pattern Discovery on the World Wide Web" by Cooley,
Mobasher and Srivastava.

Classic clustering is done on search engine results in order to
provide more structured answers.  For example, check http://i.am/grouper or
(an old idea of mine) http://www.exocortex.org/~ben/trendanalysis2.html or a
hierarchical clustering result generator called HyPursuit.
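
To make the result-clustering idea concrete, here is a minimal sketch in Python. It is not how Grouper or HyPursuit work; it assumes a plain bag-of-words representation, cosine similarity, and a greedy single-pass grouping. The function names, the 0.3 threshold, and the sample snippets are all made up for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a result snippet into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_snippets(snippets, threshold=0.3):
    """Greedy single pass: join the most similar cluster, else start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for index, snippet in enumerate(snippets):
        vec = vectorize(snippet)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [index]})
        else:
            best["centroid"].update(vec)  # fold the snippet into the centroid
            best["members"].append(index)
    return [cluster["members"] for cluster in clusters]


# Hypothetical search-result snippets for the ambiguous query "jaguar".
snippets = [
    "jaguar car dealer prices",
    "jaguar cat habitat rainforest",
    "used jaguar car prices",
    "rainforest cat species jaguar",
]
groups = cluster_snippets(snippets)
print(groups)
```

The grouping depends heavily on the threshold and on the order of the results; a real system would likely weight terms (for example with TF-IDF) and drop stop words first.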

There is also classification that can be done on pages using the Yahoo
ontology as a training set and then trying to classify other pages into the
Yahoo categories.  I am sure there is a paper somewhere on this topic.
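
As a rough illustration of classifying pages against a category training set, here is a small multinomial naive Bayes sketch in Python. The choice of naive Bayes is an assumption on my part, not something taken from a particular paper, and the category names and training texts below are invented stand-ins for pages already filed under a directory.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> number of pages
        self.vocab = set()

    def train(self, category, text):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_cat, best_score = None, float("-inf")
        for cat in self.doc_counts:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.doc_counts[cat] / total_docs)
            total_words = sum(self.word_counts[cat].values())
            denom = total_words + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[cat][token] + 1) / denom)
            if score > best_score:
                best_cat, best_score = cat, score
        return best_cat


# Hypothetical training pages, one per made-up category.
clf = NaiveBayes()
clf.train("Recreation/Sports", "football league scores team match")
clf.train("Computers/Internet", "web browser software download server")

print(clf.classify("team scores from the football match"))
print(clf.classify("download new browser software"))
```

With a real directory you would train on many pages per category, and the category hierarchy itself could be used to classify top-down.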

There is also a lot of work on learning user interests, either by
mining the types of content people prefer or by matching similarity between
users.  Examples include WebWatcher, Syskill & Webert, Fish &
Shark, and GroupLens.
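
The "matching similarity between users" part can be sketched as follows. This Python example uses Pearson correlation over the items two users have both rated, which is one common choice for collaborative filtering; it is not meant as the exact method of any system named above. The user names, page ids, and ratings are invented.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to say anything
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / math.sqrt(var_x * var_y)


def most_similar(user, all_ratings):
    """Return the other user whose ratings correlate best with `user`."""
    scored = [
        (pearson(all_ratings[user], ratings), name)
        for name, ratings in all_ratings.items()
        if name != user
    ]
    return max(scored)[1]


# Hypothetical page ratings on a 1-5 scale.
ratings = {
    "alice": {"page1": 5, "page2": 1, "page3": 4},
    "bob":   {"page1": 4, "page2": 2, "page3": 5},
    "carol": {"page1": 1, "page2": 5, "page3": 2},
}
print(most_similar("alice", ratings))
```

Pages liked by the most similar users, and not yet seen by the target user, are then natural candidates to recommend.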

Also a somewhat new system that exists is the realtime associative
recommendation system called Kenjin (www.kenjin.com).

There are clustering algorithms created for automatically organizing your
bookmarks based on page content (e.g. Maarek & Ben Shaul).

This is ignoring the aspect of data mining the link structure, which engines
such as Google have had success with.
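
For a feel of what mining the link structure can look like, here is a short Python sketch of a PageRank-style power iteration. The damping factor of 0.85, the iteration count, and the tiny link graph are assumptions for illustration only, and nothing here claims to match what any production engine actually runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score pages by the scores of the pages linking to them."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # every page gets a small baseline share
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # a page with no outgoing links spreads its score evenly
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy graph page "c" comes out on top because three pages point to it, and "a" follows because the highly ranked "c" points to it.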

Also there is a whole field of search engines referred to as Learning
Bayesian Networks.  None are yet in full service but you might find the
research papers interesting.

Also, if you go further, there are word sense disambiguation techniques from
Statistical Natural Language Processing which are starting to become practical
for use in large scale systems.

And there is automatically suggested query elaboration or refinement based on
global document set cluster analysis and result set intersection.

Hope that gets you going.
-ben houston
---------------------------------------------------------
www:    http://www.exocortex.org/~ben
email:  ben@exocortex.org
Phone:  1(416)889-8249
---------------------------------------------------------




-----Original Message-----
From: owner-datamine-l@nautilus-sys.com
[mailto:owner-datamine-l@nautilus-sys.com]On Behalf Of Grace
Sent: Saturday, May 27, 2000 2:28 AM
To: Datamining list
Subject: DM: information retrieval



Hi, dear all,

My research work is about search engine.
I'm wondering to know whether some papers or researchers apply data mining
techniques on information retrieval/extraction from text documents.
Anyway, thanks in advances!!

Regards,
Grace Hwang




