DM: RE: information retrieval
From: Ben Houston
Date: Sun, 28 May 2000 16:36:00 -0500

I believe there is quite a bit of work in this area. As Harald Klien already mentioned, this area is called "web content mining". A paper I read this morning that has a good introduction to this field, as well as to other aspects of data mining on the web, is "Web Mining: Information and Pattern Discovery on the World Wide Web" by Cooley, Mobasher and Srivastava.

There is classic clustering done on search engine results in order to provide more structured answers. For example, check http://i.am/grouper or (an old idea of mine) http://www.exocortex.org/~ben/trendanalysis2.html or a hierarchical clustering result generator called HyPursuit.

There is also classification that can be done on pages, using the Yahoo ontology as a training set and then trying to classify other pages into the Yahoo categories. I am sure there is a paper somewhere on this topic.

There is also a lot of work on learning user interests, either by mining the types of content people prefer or by matching similarity between users. For example, there are WebWatcher, Syskill & Webert, Fish & Shark, and GroupLens. A somewhat newer system is the real-time associative recommendation system called Kenjin (www.kenjin.com).

There are clustering algorithms created for automatically organizing your bookmarks based on page content (e.g. Maarek & Ben Shaul).

This is ignoring the aspect of data mining the link structure, which engines such as Google have had success with.

Also, there is a whole field of search engines referred to as Learning Bayesian Networks. None are yet in full service, but you might find the research papers interesting.

If you go further, there are word sense disambiguation techniques from Statistical Natural Language Processing which are starting to become practical for use in large-scale systems.
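To make the search-result clustering idea above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions and is not the method used by Grouper, HyPursuit, or any other system named here: it represents each result snippet as a bag of words, compares snippets by cosine similarity, and does a greedy single-pass grouping. The function names and the 0.3 threshold are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_snippets(snippets, threshold=0.3):
    # Greedy single-pass clustering: each snippet joins the most similar
    # existing cluster if the similarity reaches the (hypothetical) threshold,
    # otherwise it starts a new cluster. Returns lists of snippet indices.
    clusters = []
    for i, snippet in enumerate(snippets):
        vec = Counter(tokenize(snippet))
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # summed term counts act as the centroid
            best["members"].append(i)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    results = [
        "jaguar car dealer prices",
        "jaguar big cat habitat rainforest",
        "used jaguar car prices and dealer reviews",
        "rainforest habitat of the jaguar cat",
    ]
    for group in cluster_snippets(results):
        print([results[i] for i in group])
```

For an ambiguous query such as "jaguar", this kind of grouping would separate the car results from the animal results. A real system would need term weighting, stop-word handling, and a less order-dependent clustering method.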
There is also automatically suggested query elaboration or refinement, based on global document-set cluster analysis and result-set intersection.

Hope that gets you going.

-ben houston
---------------------------------------------------------
www: http://www.exocortex.org/~ben
email: ben@exocortex.org
Phone: 1(416)889-8249
---------------------------------------------------------

-----Original Message-----
From: owner-datamine-l@nautilus-sys.com [mailto:owner-datamine-l@nautilus-sys.com] On Behalf Of Grace
Sent: Saturday, May 27, 2000 2:28 AM
To: Datamining list
Subject: DM: information retrieval

Hi, dear all,

My research work is about search engines. I would like to know whether there are papers or researchers that apply data mining techniques to information retrieval/extraction from text documents.

Anyway, thanks in advance!!

Regards,
Grace Hwang