We have developed a new general formulation of the relative expression algorithm geared towards classification using small numbers of measured features (e.g. microRNAs). Called the top-scoring ‘N’ (TSN) algorithm, it uses a flexible classifier size to expand the space of permutations and combinations available for classification. TSN relies on an unusual number system known as factoradics (the factorial number system) to translate easily between permutations of features and decimal numbers. The TSN algorithm has been tested on a number of microarray cancer datasets, demonstrating that the size of the classifier can yield statistically significant differences in cross-validation accuracy. In addition, the TSN algorithm has been tested on the Microarray Quality Control II datasets, demonstrating comparable accuracy and very low overfitting when compared to support vector machines, Bayesian methods, logistic regression and many other state-of-the-art classification schemes.
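The factoradic translation mentioned above can be illustrated with a minimal sketch. It is not taken from the TSN code: the function names `index_to_permutation` and `permutation_to_index` and the feature labels are hypothetical, and the sketch assumes permutations are numbered from 0 in lexicographic order. Each factoradic digit of the index selects which of the remaining features comes next in the ordering.

```python
from math import factorial


def index_to_permutation(index, items):
    """Return the permutation of `items` numbered `index` (0-based, lexicographic).

    The index is written in the factorial number system; each digit picks
    one element out of the elements that have not been used yet.
    """
    pool = list(items)
    n = len(pool)
    if not 0 <= index < factorial(n):
        raise ValueError("index must lie in [0, n!)")
    perm = []
    for position in range(n - 1, -1, -1):
        # Digit for this place value (position!), remainder carried forward.
        digit, index = divmod(index, factorial(position))
        perm.append(pool.pop(digit))
    return perm


def permutation_to_index(perm, items):
    """Inverse mapping: recover the decimal index of `perm` over `items`."""
    pool = list(items)
    index = 0
    for position, element in zip(range(len(pool) - 1, -1, -1), perm):
        digit = pool.index(element)  # rank among the unused elements
        index += digit * factorial(position)
        pool.pop(digit)
    return index


if __name__ == "__main__":
    # Hypothetical feature labels, for illustration only.
    features = ["miR-A", "miR-B", "miR-C"]
    ordering = index_to_permutation(3, features)
    print(ordering)  # ['miR-B', 'miR-C', 'miR-A']
    print(permutation_to_index(ordering, features))  # 3
```

Under these assumptions, every ordering of N features corresponds to exactly one integer in the range 0 to N! - 1, so a search over orderings can be driven by a simple integer counter.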
Data and Software File(s):
Contains old code and data
TSN code (w/update)