
Re: DM: information retrieval


From: Harald Klein
Date: Sat, 27 May 2000 19:30:14 +0200
Grace wrote:



 > My research work is about search engine.
 > I'm wondering to know whether some papers or researchers apply data
 > mining techniques
 > on information retrieval/extraction from text documents.

Oh yes, there are some, but these people often do content analyses of
text. Nowadays, content analysis of Internet material is called web content
mining. Search engines help you find appropriate web sites, but many
of them process only the meta tags in the head of a document and
leave the rest untouched. What you really must do is download the
sites you are interested in. You can use an offline reader for that;
IE 5.x also supports this, as far as I know.
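
To illustrate the difference between the meta tags and the rest of a document, here is a minimal sketch in Python using only the standard library. The class and function names are hypothetical and chosen for illustration; this is not how any particular search engine works. It only shows that the meta tags in the head and the body text are separate pieces of information.

```python
from html.parser import HTMLParser


class MetaAndBodyParser(HTMLParser):
    """Collects <meta name=... content=...> pairs and body text separately."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # meta tag name -> content
        self.body_text = []   # text fragments found inside <body>
        self._in_body = False
        self._skip = 0        # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text inside <body>, outside scripts and styles.
        if self._in_body and not self._skip and data.strip():
            self.body_text.append(data.strip())


def split_meta_and_body(html):
    """Return (meta tag dict, body text) for one HTML document."""
    parser = MetaAndBodyParser()
    parser.feed(html)
    parser.close()
    return parser.meta, " ".join(parser.body_text)
```

An indexer that looks only at the first return value never sees the second, which is why downloading the full pages matters for content analysis.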

I am currently working in this field. I wrote a special offline reader
named TextGrab that downloads the texts and prepares them for
analysis with my text analysis program TextQuest. That means it automatically
segments the text into units (one unit per file), assigns values to
external variables (e.g. file name, date, etc.), and writes everything to
one single output file. This is required by most text analysis software,
because ordinary offline readers just copy the data structure with all
directories/folders and their files. For more information, look at
http://www.intext.de
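
The preparation step described above can be sketched roughly as follows in Python. This is not TextGrab's code and not TextQuest's input format: the function name, the `#UNIT` header line, and the choice of the file's modification time as the "date" variable are all assumptions made for illustration. The sketch walks a downloaded directory tree, treats each file as one text unit, attaches the file name and date as external variables, and writes everything to one output file.

```python
import os
import time


def merge_downloaded_texts(root_dir, output_path):
    """Write every file under root_dir into one output file, one unit per file.

    Returns the number of units written. The '#UNIT' header is a
    hypothetical format, not that of any real text analysis program.
    """
    count = 0
    out_abs = os.path.abspath(output_path)
    with open(output_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root_dir):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.abspath(path) == out_abs:
                    continue  # never read the output file itself
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
                # External variables: relative file name and modification date.
                rel = os.path.relpath(path, root_dir)
                date = time.strftime(
                    "%Y-%m-%d", time.localtime(os.path.getmtime(path))
                )
                out.write(f"#UNIT file={rel} date={date}\n")
                out.write(text.strip() + "\n\n")
                count += 1
    return count
```

A call such as `merge_downloaded_texts("site_copy", "all_units.txt")` (both paths hypothetical) flattens the directory structure that an ordinary offline reader leaves behind into a single file.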

Regards

Harald Klein
============================
Dr. Harald Klein
Social Science Consulting
Brückengasse 12
07407 Rudolstadt
Germany
Tel/Fax: +49 3672 488494
www.intext.de

 > Anyway, thanks in advances!!
 >
 > Regards,
 > Grace Hwang





Copyright © 1999 Nautilus Systems, Inc. All Rights Reserved.
Email: firschng@nautilus-systems.com
Mail converted by MHonArc 2.2.0