Perturbations of the p53 pathway are associated with more aggressive and therapeutically refractory tumours. The expression data were preprocessed using Robust Multichip Analysis (RMA). The dataset has been truncated to the 1000 most informative genes (as selected by Wilcoxon test statistics) to simplify computation. Each gene has been standardized to have zero mean and unit variance (i.e. z-scored).
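
As a rough illustration of the gene selection and standardization described above, the following sketch applies the same two steps to simulated data. The objects expr, status, pvals, keep and z, the matrix sizes, and the choice to rank genes by Wilcoxon p-value are all assumptions made for this example; the exact criterion and code used to prepare breastcancer are not stated here.

```r
# Simulated stand-in for an expression matrix (samples in rows, genes in columns).
# All names and sizes below are hypothetical and not taken from the dataset.
set.seed(1)
n <- 40
p <- 20
expr <- matrix(rnorm(n * p), nrow = n,
               dimnames = list(NULL, paste0("gene", seq_len(p))))
status <- factor(rep(c("case", "control"), each = n / 2))
is_case <- status == "case"

# Shift the first three genes in the cases so that they carry signal
expr[is_case, 1:3] <- expr[is_case, 1:3] + 2

# Wilcoxon rank-sum test per gene, comparing cases with controls
pvals <- apply(expr, 2, function(g) {
  wilcox.test(g[is_case], g[!is_case])$p.value
})

# Keep the five genes with the smallest p-values (assumed ranking rule)
keep <- order(pvals)[seq_len(5)]

# z-score each retained gene: subtract the column mean, divide by the column sd
z <- scale(expr[, keep])

round(colMeans(z), 10)   # column means are zero up to rounding
apply(z, 2, sd)          # column standard deviations are one
```

For fixed group sizes, ranking by p-value orders genes by how extreme their Wilcoxon statistic is, which is one plausible reading of "selected by Wilcoxon test statistics".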

breastcancer

Format

A data frame with 250 observations on 1001 variables. The first 1000 columns are numerical variables; the last column (named code) is a factor with levels case and control.

Source

Chris Holmes, c.holmes@stats.ox.ac.uk

Details

The factor code defines whether there was a mutation in the p53 sequence (code=case) or not (code=control).

References

Miller et al. (2005), PubMed ID: 16141321

Examples


# Load the data and plot the first five genes, coloured by p53 status
data(breastcancer)
bc <- breastcancer
pairs(bc[, 1:5], col = bc$code)


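# Draw a random training set of 50 samples (no seed is set, so the counts vary between runs)
train <- sample(seq_len(nrow(bc)), 50)
table(bc$code[train])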
#> 
#>    case control 
#>      13      37 
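if (FALSE) {
# Fit linear discriminant analysis on the training samples with equal class priors,
# then compare predicted and observed p53 status on the held-out samples
library(MASS)
z <- lda(code ~ ., data = bc, prior = c(1, 1)/2, subset = train)
pc <- predict(z, bc[-train, ])$class
pc
bc[-train, "code"]
table(pc, bc[-train, "code"])
}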
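
The cross-tabulation of predicted against observed status at the end of the example can be reduced to a few summary rates. The sketch below uses two small hand-written factors, pred and truth, as hypothetical stand-ins for pc and bc[-train, "code"]; the values are invented for illustration and are not results from the dataset.

```r
# Hypothetical predicted and observed labels (not output from the lda fit)
lev   <- c("case", "control")
truth <- factor(c("case", "control", "control", "case", "control", "control"),
                levels = lev)
pred  <- factor(c("case", "control", "case", "case", "control", "control"),
                levels = lev)

# Confusion table: rows are predictions, columns are observed status
conf <- table(predicted = pred, actual = truth)

# Overall proportion of correct predictions
accuracy <- sum(diag(conf)) / sum(conf)

# Proportion of true cases predicted as case
sensitivity <- conf["case", "case"] / sum(conf[, "case"])

# Proportion of true controls predicted as control
specificity <- conf["control", "control"] / sum(conf[, "control"])

conf
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

Because the two classes are unbalanced in the training sample shown above, sensitivity and specificity are usually more informative than overall accuracy alone.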