SMP: Mining Low-support Discriminative Patterns from Dense and High-dimensional Data

Submitted on Dec 15 2014
By: Gang Fang, Gaurav Pandey, Wen Wang, Manish Gupta, Michael Steinbach and Vipin Kumar
From: Data Mining for Biomedical Informatics Group, University of Minnesota
Resource Type:
Data Format:


SupMaxK is a family of anti-monotonic measures of discriminative power that can be used for the efficient discovery of discriminative patterns from dense, high-dimensional biological data (e.g., gene expression data). It is aimed especially at patterns with relatively low support but high discriminative power (as measured by, e.g., odds ratio, information gain, or p-value), and thus complements existing discriminative pattern mining algorithms. Several experiments on a cancer gene expression dataset demonstrate that there are low-support patterns that can be discovered using SMP (SupMaxPair), but not by existing approaches, and that these patterns are statistically significant and biologically relevant.
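
The description above does not define the measure itself, so the following is only a minimal sketch under a stated assumption: that SupMaxK of an itemset is its relative support in one class minus the largest relative support, in the other class, of any of its size-K subsets (with SupMaxPair as the K = 2 case). That reading comes from our understanding of the associated paper and should be checked against it. The function names, parameters, and toy data are hypothetical and are not part of the SMP software.

```python
from itertools import combinations


def rel_support(itemset, transactions):
    """Fraction of transactions (each a set of items) that contain every item in itemset."""
    if not transactions:
        return 0.0
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def sup_max_k(itemset, class_a, class_b, k=2):
    """Assumed SupMaxK: support of the itemset in class A minus the maximum
    support in class B over all size-k subsets of the itemset.

    Assumption for illustration: if the itemset has fewer than k items,
    the whole itemset is used as its only subset.
    """
    items = sorted(set(itemset))
    k = min(k, len(items))
    max_b = max(rel_support(sub, class_b) for sub in combinations(items, k))
    return rel_support(items, class_a) - max_b


# Hypothetical toy data: each transaction is the set of "expressed" genes in one sample.
class_a = [{"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g1"}, {"g2"}]
class_b = [{"g1"}, {"g2"}, {"g3"}, {"g1", "g3"}]

pattern = {"g1", "g2", "g3"}
score_pair = sup_max_k(pattern, class_a, class_b, k=2)     # SupMaxPair-style score
score_sub = sup_max_k({"g1", "g2"}, class_a, class_b, k=2)  # score of a sub-pattern
```

Under this assumed definition, adding items to a pattern can only lower its class-A support and can only raise the maximum over its size-K subsets in class B, so the score never increases as a pattern grows. That is the anti-monotonic property that would allow Apriori-style pruning of the search space.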