Re: DM: information retrieval

From: Harald Klein
Date: Sat, 27 May 2000 19:30:14 +0200

Grace wrote:

> My research work is about search engine.
> I'm wondering to know whether some papers or researchers apply data
> mining techniques
> on information retrieval/extraction from text documents.

Oh yes, there are some, but these people often do content analyses of texts. Nowadays, content analysis of internet material is called web content mining. With search engines one can find appropriate web sites, but many search engines process only the meta tags in the head of a document and leave the rest untouched.

What you really must do is download the sites you are interested in. You can use an offline reader for that; IE 5.x also supports this, as far as I know.

I am currently working in this field. I wrote a special offline reader named TextGrab that downloads the texts and prepares them for analysis with my text analysis program TextQuest. That means it automatically segments the text into units (one unit per file), assigns values to external variables (e.g. file name, date), and writes everything to a single output file. Most text analysis software requires this, because an ordinary offline reader just copies the site's structure, with all its directories/folders and their files.

For more information, look at http://www.intext.de

Regards
Harald Klein

============================
Dr. Harald Klein
Social Science Consulting
Brückengasse 12
07407 Rudolstadt
Germany
Tel/Fax: +49 3672 488494
www.intext.de

> Anyway, thanks in advances!!
>
> Regards,
> Grace Hwang
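
The preparation step described in the message (one unit per downloaded file, external variables such as file name and date, everything written to a single output file) could be sketched roughly as follows. This is only an illustration under stated assumptions, not how TextGrab or TextQuest actually work: the function names `html_to_text` and `merge_site`, the `#unit` header line, and the use of the file's modification time as its date are all hypothetical choices made for the example.

```python
import os
from datetime import datetime, timezone
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup and collapse runs of whitespace into single spaces."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def merge_site(root_dir, out_path):
    """Write each .htm/.html file under root_dir as one unit in out_path.

    Every unit starts with a header line carrying the external variables
    (hypothetical format): unit number, relative file name, file date.
    Returns the number of units written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for folder, dirs, files in os.walk(root_dir):
            dirs.sort()  # visit subfolders in a stable order
            for name in sorted(files):
                if not name.lower().endswith((".htm", ".html")):
                    continue
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = html_to_text(f.read())
                # Assumption: the file's modification time stands in for its date.
                stamp = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
                count += 1
                rel = os.path.relpath(path, root_dir)
                out.write(f"#unit {count} file={rel} date={stamp:%Y-%m-%d}\n")
                out.write(text + "\n\n")
    return count
```

Calling `merge_site("downloaded_site", "corpus.txt")` on a folder saved by an offline reader would then produce one flat text file in place of the copied directory tree.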