Re: DM: information retrieval

From: Harald Klein
Date: Sat, 27 May 2000 19:30:14 +0200

Grace wrote:

 > My research work is about search engine.
 > I'm wondering to know whether some papers or researchers apply data
 > mining techniques on information retrieval/extraction from text
 > documents.

Oh yes, there are some, but these people often do content analyses of
text. Nowadays content analysis of internet material is called web
content mining. With search engines one can find appropriate web sites,
but many search engines only process the meta tags in the head of a
document and leave the rest untouched. What you really must do is
download the sites you are interested in. You can use an offline reader
for that; IE 5.x also supports this, as far as I know.
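
To illustrate the difference between the meta tags in the head and the
rest of a document, here is a minimal Python sketch that separates the
two for one downloaded page. It is only an illustration under my own
assumptions: the names HeadBodySplitter and split_head_and_body are made
up for this example, and it does not describe how any particular search
engine or offline reader works.

```python
from html.parser import HTMLParser


class HeadBodySplitter(HTMLParser):
    """Collect <meta name=... content=...> pairs and body text separately.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.meta = {}          # meta name -> content
        self.body_parts = []    # text fragments found inside <body>
        self._in_body = False
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits in the body and is not code.
        if self._in_body and not self._skip_depth and data.strip():
            self.body_parts.append(data.strip())


def split_head_and_body(html):
    """Return (meta_dict, body_text) for one HTML document."""
    parser = HeadBodySplitter()
    parser.feed(html)
    parser.close()
    return parser.meta, " ".join(parser.body_parts)
```

An indexer that looks only at the first return value sees what the page
author declared about the page; a content analysis needs the second one.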

I am currently working in this field. I wrote a special offline reader
named TextGrab that downloads the texts and prepares them for analysis
with my text analysis program TextQuest. That means it automatically
segments the text into units (each single file is one unit), assigns
values to external variables (e.g. file name, date, etc.), and writes
everything into one single output file. This is required by most text
analysis software, because ordinary offline readers just copy the data
structure with all directories/folders and their files. For more
information look at http://www.intext.de
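
As a rough illustration of that preparation step (one unit per file,
external variables attached, everything in one output file), here is a
short Python sketch. It is not TextGrab's code and the record layout is
not TextQuest's real input format: the function name
merge_site_into_one_file and the "#UNIT file=... date=..." header line
are invented for this example, and the date used is simply the file's
modification time.

```python
import os
import time


def merge_site_into_one_file(site_dir, out_path):
    """Write every file under site_dir as one unit into out_path.

    Each unit starts with a header line carrying two external variables
    (relative file name and modification date), followed by the text.
    The header layout is a made-up illustration, not a real format.
    Returns the number of units written.
    """
    units = 0
    out_abs = os.path.abspath(out_path)
    with open(out_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(site_dir):
            dirs.sort()  # visit folders in a stable order
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.abspath(path) == out_abs:
                    continue  # never read the output file itself
                date = time.strftime(
                    "%Y-%m-%d", time.localtime(os.path.getmtime(path))
                )
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
                rel = os.path.relpath(path, site_dir)
                out.write(f"#UNIT file={rel} date={date}\n")
                out.write(text.rstrip("\n") + "\n")
                units += 1
    return units
```

The folder structure that an ordinary offline reader leaves on disk is
flattened here: it survives only as the relative file name in each
unit's header.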

Regards

Harald Klein
============================
Dr. Harald Klein
Social Science Consulting
Brückengasse 12
07407 Rudolstadt
Germany
Tel/Fax: +49 3672 488494
www.intext.de

 > Anyway, thanks in advance!!
 >
 > Regards,
 > Grace Hwang
