DM: Call for Participation: KDD-CUP-98

From: Ismail Parsa
Date: Wed, 1 Jul 1998 18:25:47 -0400 (EDT)

+--------------------------------------------------------------------+
|                       CALL FOR PARTICIPATION                       |
|                                                                    |
|                             KDD-CUP-98                             |
|                                                                    |
|          The Second International Knowledge Discovery and          |
|                   Data Mining Tools Competition                    |
|                                                                    |
|                  Held in Conjunction with KDD-98                   |
|                                                                    |
|          The Fourth International Conference on Knowledge          |
|                     Discovery and Data Mining                      |
|                       [www.kdnuggets.com] or                       |
|                  [www-aig.jpl.nasa.gov/kdd98] or                   |
|                [www.aaai.org/Conferences/KDD/1998]                 |
|                                                                    |
|                          Sponsored by the                          |
|                                                                    |
|      American Association for Artificial Intelligence (AAAI)       |
|                   Epsilon Data Mining Laboratory                   |
|                Paralyzed Veterans of America (PVA)                 |
+--------------------------------------------------------------------+

KDD-CUP is a knowledge discovery and data mining (KDDM) tools
competition held in conjunction with the International Conference on
Knowledge Discovery and Data Mining. Last year, the CUP enjoyed
worldwide participation of 45 data mining tools. The Gold Miner award
was shared by UCSD's BNB (Boosted Naive Bayes Classifier) software and
Urban Science's GainSmarts software. SGI's MineSet was the runner-up
and earned the Bronze Miner award. For more information on KDD-CUP-97,
please refer to the URL: www.epsilon.com/new.

Some of the highlights from last year's competition are as follows:

  o The success of the Naive Bayes algorithm (used by 2 of the top 3
    contestants)

  o No clear evidence backing the hypothesis that there are "real"
    returns to incremental data preprocessing activity.

KDD-CUP-98 will follow on the success of last year's competition. The
CUP is again open to all KDDM tool vendors, academics with research
prototypes and corporations with significant applications. Attendance
at the KDD-98 conference is not required to participate in the CUP.

+--------------------------------------------------------------------+
|                KDD-CUP Process and Important Dates                 |
+--------------------------------------------------------------------+

  o Registration and signing of the NDA (Non-Disclosure Agreement)
        July 1-15, 1998

  o Release of the datasets (learning and validation), related
    documentation and the KDD-CUP questionnaire
        July 16, 1998

  o Return of the results and the KDD-CUP questionnaire
        August 14, 1998

  o KDD-CUP Committee evaluation of the results
        August 15-25

  o Individual performance evaluations sent to the participants
        August 25, 1998

  o Public announcement of the winners and awards presentation
    during KDD-98 in New York City
        August 29, 1998

+--------------------------------------------------------------------+
|                          KDD-CUP Data Set                          |
+--------------------------------------------------------------------+

The data set for this year's Cup has been generously provided by the
Paralyzed Veterans of America (PVA). PVA is a not-for-profit
organization that provides programs and services for US veterans with
spinal cord injuries or disease. With an in-house database of over 13
million donors, PVA is also one of the largest direct mail fund
raisers in the country.

Participants in the CUP will demonstrate the performance of their tool
by analyzing the results of one of PVA's recent fund raising appeals.
This mailing was dropped in June 1997 to a total of 3.5 million PVA
donors. It included a gift "premium" of personalized name & address
labels plus an assortment of 10 note cards and envelopes. All of the
donors who received this mailing were acquired by PVA through
premium-oriented appeals like this.

The analysis data set will include:

  o A subset of the 3.5 million donors sent this appeal

  o A flag to indicate respondents to the appeal and the dollar
    amount of their donation

  o PVA promotion and giving history

  o Overlay demographics, including a mix of household and area
    level data.

Unlike last year, all available information about the fields will be
provided in the project documentation.

The objective of the analysis will be to identify response to this
mailing -- a classification or discrimination problem.

+--------------------------------------------------------------------+
|                  Performance Evaluation Criteria                   |
+--------------------------------------------------------------------+

The CUP is aimed at recognizing the most accurate, innovative,
efficient and methodologically advanced data mining tools in the
marketplace.

The participants will again be evaluated based on the performance of
their algorithm on the validation or hold-out data set. The KDD-CUP
program committee will consider the following metrics in their
evaluations:

  o Lift curve or gains table analysis listing the cumulative percent
    of targets recovered in the top quantiles of the file

  o Receiver operating characteristic (ROC) curve analysis and the
    area under the ROC curve

  o Several statistical tests to ensure the robustness of the
    results.

Last year, the performance in the top 10 percent of the file was
considered as a measure of precision while the performance in the top
40 percent of the file was considered as a measure of stability and
marketing coverage. The average performance up to the 40th percentile
was also looked at as a measure of overall performance.
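
As a rough illustration of the gains-table and ROC metrics listed
above, the following Python sketch computes the cumulative percent of
targets recovered in the top 10 and 40 percent of a scored file, and
the area under the ROC curve. It is not the committee's evaluation
code. The function names (cumulative_gains, roc_auc), the rounding of
the cut-off to a whole number of records and the handling of tied
scores are all assumptions made for the example.

```python
from typing import Dict, Sequence


def cumulative_gains(scores: Sequence[float], labels: Sequence[int],
                     fractions: Sequence[float] = (0.1, 0.4)) -> Dict[float, float]:
    """Percent of all targets found in the top fraction of the file.

    The file is ranked by descending score. Ties keep their input order
    (an assumption; a real evaluation may break ties differently).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_targets = sum(labels)
    if total_targets == 0:
        raise ValueError("no targets (label 1) in the file")
    gains = {}
    for fraction in fractions:
        # Assumed cut-off rule: nearest whole record count, at least one.
        depth = max(1, round(fraction * len(ranked)))
        found = sum(label for _, label in ranked[:depth])
        gains[fraction] = 100.0 * found / total_targets
    return gains


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each its average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one target and one non-target")
    rank_sum = sum(rank for rank, label in zip(ranks, labels) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Made-up scores and response flags, for illustration only.
    demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    demo_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    for fraction, percent in cumulative_gains(demo_scores, demo_labels).items():
        print(f"top {fraction:.0%} of file: {percent:.1f}% of targets")
    print(f"area under ROC curve: {roc_auc(demo_scores, demo_labels):.3f}")
```

A model that ranks at random would recover about 10 percent of the
targets in the top 10 percent of the file and have an area under the
ROC curve near 0.5, so both numbers are read against those baselines.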

+--------------------------------------------------------------------+
|                    KDD-CUP-97 Program Committee                    |
+--------------------------------------------------------------------+

  o Vasant Dhar, New York University, New York, NY
  o Tom Fawcett, Bell Atlantic, New York, NY
  o Georges Grinstein, University of Massachusetts, Lowell, MA
  o Ismail Parsa, Epsilon, Burlington, MA
  o Gregory Piatetsky-Shapiro, Knowledge Stream Partners, Boston, MA
  o Foster Provost, Bell Atlantic, New York, NY
  o Kyuseok Shim, Bell Laboratories, Murray Hill, NJ

+--------------------------------------------------------------------+
|                       REGISTRATION BROCHURE                        |
+--------------------------------------------------------------------+

All participants are required to complete the application form below
and send it in plain ASCII format to (e-mail preferred):

        +-----------------------------+
        | Ismail Parsa                |
        |                             |
        | Epsilon                     |
        | 50 Cambridge Street         |
        | Burlington MA 01803 USA     |
        |                             |
        | E-MAIL: iparsa@epsilon.com  |
        | V-MAIL: (781) 273-0250*6734 |
        | FAX:    (781) 272-8604      |
        +-----------------------------+

The participants will receive the NDA (non-disclosure agreement)
before the July 15, 1998 deadline. Please contact Ismail Parsa if you
have not received the NDA by July 15.

Last year, the KDD-CUP program committee publicly announced the names
of only the top 3 performing tools. The names of the 45 participants
were not released. This year, although we will again only announce the
names of the top 3 performing tools, we will make the list of
participants publicly available UNLESS THE PARTICIPANTS INDICATE THAT
THEY WISH TO PRESERVE THEIR ANONYMITY BY CHECKING THE APPROPRIATE BOX
IN THE REGISTRATION BROCHURE. We think it's fair for everyone to know
who they are competing with.

-------------------------------- cut ---------------------------------

KNOWLEDGE DISCOVERY CUP (KDD-CUP-98)
Registration Brochure

Name of software/product/tool/research prototype:_____________________

Name of vendor/institution:___________________________________________

KDD-CUP program committee will only announce the names of the top 3
performing tools. However, we intend to make the list of participants
publicly available based on the box checked below. Please check the
appropriate box:

  (_) List my tool's name as a participant
  (_) Do not list my tool's name as a participant. I wish to stay
      anonymous.

Status of software/product/tool/research prototype:

  (_) Alpha
  (_) Beta
  (_) Production

Release date of software/product/tool/research prototype (in YYMM or
year/month format):___________________________________________________

Platform availability (check all that apply):

  (_) PC
  (_) UNIX
  (_) Mainframe
  (_) Parallel (SMP/MPP)
  (_) Other

Systems architecture (check all that apply):

  (_) Client/Server
  (_) PC client only
  (_) UNIX Client only
  (_) PC/UNIX server only

Built-in knowledge discovery and data mining methodology/technology
(check all that apply):

  (_) Graphical User Interface (GUI)
  (_) Data Access to RDBMSs
  (_) Data Management (data processing, SQL, merge, summarize,
      aggregate, sorting, ranking, etc.)
  (_) Data Selection (random sampling, Nth selection, etc.)
  (_) Data Preprocessing (missing value/outlier treatment, symbol
      mapping, binning/discretization, normalization, etc.)
  (_) Exploratory Data Analysis (descriptive statistics, data/
      knowledge visualization, etc.)
  (_) Collinearity Screening/Redundancy Elimination
  (_) Variable Subset Selection
  (_) Link Analysis (Associations, Sequences, etc.)
  (_) Clustering or Segmentation (K-means, Kohonen clustering, etc.)
  (_) Time Series Analysis
  (_) Classification or Discrimination (for categorical/symbolic
      targets)
  (_) Prediction or Regression (for continuous/numeric targets)
  (_) Multiple Learned or Combined Models (boosting, arching,
      bagging, etc.)
  (_) Data Postprocessing (model deployment/scoring, modeling
      project manager, model performance tracking, link to business
      process etc.)
  (_) Other, specify:______________________________________________

Data mining algorithms (check all that apply and specify the
algorithm(s) in the space provided):

  (_) Supervised Neural Networks (MLP, RBF, etc.):_________________
  (_) Statistical Methods (Logistic, OLS, MARS, PPR, GAM, Nearest
      Neighbors, etc.):____________________________________________
  (_) Decision Trees and Rules (ID3, C4.5, CHAID, CART®,
      etc.):_______________________________________________________
  (_) Hybrid Systems (Neuro-fuzzy systems, GA optimized neural/
      decision tree systems, etc.):________________________________
  (_) Case-Based Reasoning
  (_) Other Supervised Methods (Bayesian methods, decision tables,
      etc.):_______________________________________________________
  (_) Unsupervised Algorithms (Kohonen networks, K-means
      clustering, SOM, etc.):______________________________________
  (_) Associations and Sequence Discovery:_________________________
  (_) Other, specify: _____________________________________________

Note: The numbers requested below will only be used to compute
participant summary statistics and do not serve any other purpose.

Is your software/product/tool/research prototype:

  Freeware:                             (_) Yes  (_) No
  Commercially available for purchase:  (_) Yes  (_) No

  If 'yes' to above, Price (in US$):___________________________

Number of sites installed:_______________________________________

Other relevant information:___________________________________________

PRIMARY CONTACT:

  Name..........................:
  E-mail Address................:
  Phone Number..................:
  FAX Number....................:
  Title.........................:
  Name of Company/Institution...:
  Mailing Address...............:

SECONDARY CONTACT:

  Name..........................:
  E-mail Address................:
  Phone Number..................:
  Title.........................:
  Name of Company/Institution...:
  Mailing Address...............:

---------------------------------- cut ---------------------------------