Grouped variable selection in high dimensional partially. We derive the distribution of the minimal depth and use it for high-dimensional variable selection using random survival forests. Survival data screening in high-dimensional data: feature selection of ultrahigh-dimensional covariates with survival outcomes. Selected papers from the 46th Statistical Computing Conference, 2023 July 20, Reisenberg, Germany, pp. Variable selection methods for right-censored time-to. Bayesian variable selection in semiparametric proportional. We propose a method to tackle these challenges simultaneously and to robustly detect significant genes related to survival outcomes, based on the accelerated failure time (AFT) model. An overview on variable selection for survival analysis. Sterken, and in accordance with the decision by the College of Deans. Bayesian variable selection in high dimensional survival. For high-dimensional data applications, however, computing these measures as resubstitution statistics on the same data used for model development results in highly biased estimates. First, a Cox elastic net (EN-Cox) approach is outlined that is based on the Cox proportional hazards model and utilizes modifications of the algorithms proposed by Tibshirani (1997) and Gui and Li (2005b).
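The Cox elastic net idea mentioned above can be illustrated by writing down the objective it minimizes: the negative Cox log partial likelihood plus an elastic net penalty. The following is a minimal sketch, not the EN-Cox algorithm itself, whose details are not given here. The function names, the parameters `lam` and `alpha`, and the Breslow-style handling of tied event times are assumptions made for illustration.

```python
import math

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of a Cox model (Breslow form for ties)."""
    # Linear predictor x_i' beta for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if d_i != 1:
            continue  # censored subjects enter only through the risk sets
        # Risk set at t_i: everyone whose observed time is at least t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        nll -= eta[i] - math.log(risk_sum)
    return nll

def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

def en_cox_objective(beta, X, time, event, lam=0.1, alpha=0.5):
    """Penalized objective; lam and alpha are illustrative tuning parameters."""
    return (cox_neg_log_partial_likelihood(beta, X, time, event)
            + elastic_net_penalty(beta, lam, alpha))
```

Setting `alpha=1` gives a lasso-type penalty and `alpha=0` a ridge-type penalty; in practice both tuning parameters would be chosen on data not used for the final assessment.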
Two example applications on prostate and breast cancer confirmed these results. Automatic model selection for high-dimensional survival analysis, special issue. Comparison of variable selection methods for high-dimensional survival data with competing events. High-dimensional variable selection for survival data. Using a real case study and four simulated data sets, we have discussed the main advantages and limits of each approach. In this paper, we propose a Bayesian variable selection scheme for a Bayesian semiparametric survival model for right. High-dimensional variable selection for ordinal outcomes. In this paper, two elastic net based variable selection methods for high-dimensional, low sample size time-to-event data are presented. Bin Nan. In this dissertation, we aim to solve important high-dimensional variable selection problems with either structured multivariate or discrete survival outcomes, with. Wall, Columbia University. We propose a multiple imputation random lasso (MIRL) method to select important variables and to predict the outcome for an epidemiological study of eating and activity in teens in the presence of. Marginal screening for high-dimensional survival data. Analytic for data-driven decision-making in complex high.
Nonparametric independence screening and structure identification for ultrahigh dimensional longitudinal data. Cheng, Ming-Yen, Honda, Toshio, Li, Jialiang, and Peng, Heng, Annals of Statistics, 2014. High-dimensional variable selection for multivariate and survival data with applications to brain imaging and genetic association studies, by Yanming Li. Chair. The challenges, however, in ultrahigh dimensional space are not only to reduce the dimensionality of the data, but. Variable selection for high dimensional data has recently received a great deal of attention. Our method gives consistent variable selection under certain conditions. We present an approach that addresses both aspects in high-dimensional survival models. We formulate the problem as a partially linear additive Cox model with high-dimensional data.
We compare the performance of these approaches with six other variable selection techniques: three are generally used for censored data and the other three are correlation-based greedy methods used for high-dimensional data. Building prognostic models of clinical outcomes is an increasingly important research task and will remain a vital area in genomic medicine. In this thesis work, we investigate the impact of a penalized Cox regression procedure on regularization, parameter estimation, variable group selection, and nonparametric modeling of nonlinear effects with a time-to-event outcome. High-dimensional variable selection for multivariate and. This paper presents a multipurpose analytic model and practical nonparametric methods to analyze right-censored time-to-event data with high-dimensional covariates. The two frameworks share a general idea of augmenting the data with artificial null variables to serve as benchmarks for the purpose of variable selection. Quantile adaptive model-free variable screening for high. In order to reduce redundant information and to facilitate practical interpretation, variable inefficiency in failure time is determined for the specific field of application. Robust model selection and estimation for censored.
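The idea of augmenting the data with artificial null variables, mentioned above, can be sketched with permuted copies of the covariates: a real variable is kept only if its importance exceeds that of every null copy. The two frameworks referred to in the text are not named here, so this permutation variant is an assumption rather than a reconstruction of either one. The scoring function below (absolute correlation with an uncensored response) is also only a stand-in; for censored outcomes a censoring-aware score would have to be substituted.

```python
import math
import random

def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def select_with_null_benchmarks(columns, y, score=abs_correlation, seed=0):
    """Keep variables whose score beats every permuted (null) copy.

    `columns` maps a variable name to its list of values (hypothetical layout).
    """
    rng = random.Random(seed)
    null_scores = []
    for col in columns.values():
        shuffled = col[:]       # permuting breaks any link with y
        rng.shuffle(shuffled)
        null_scores.append(score(shuffled, y))
    threshold = max(null_scores)  # the strongest artificial null is the benchmark
    return [name for name, col in columns.items() if score(col, y) > threshold]
```

Using the maximum null score as the threshold is a deliberately conservative choice for this sketch; a quantile of the null scores would give a less strict rule.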
We propose new approaches to variable selection for censored data, based on AFT models optimized using regularized weighted least squares. Variable selection for survival data analysis poses many challenges because of the complicated data structure, and therefore receives much attention. Variable selection in high-dimensional space is a challenge in many contemporary statistical problems across scientific disciplines. Additive risk models for survival data with high-dimensional covariates. Use of microarray technology often leads to high-dimensional and low-sample size (HDLSS) data settings. Many authors have proposed various variable selection criteria and procedures for linear regression models (Miller 2002). UPS delivers optimal phase diagram in high-dimensional variable selection. Ji, Pengsheng and Jin, Jiashun, Annals of Statistics, 2012.
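To make the phrase "AFT models optimized using regularized weighted least squares" more concrete, the sketch below uses Kaplan-Meier (Stute) weights, one common way to account for censoring in a least squares fit on the log time scale. The text does not say which weights or which penalty the authors use, so both the weighting scheme and the lasso penalty here are assumptions, as are all function and parameter names.

```python
def kaplan_meier_weights(time, event):
    """Jumps of the Kaplan-Meier estimator, returned in the original order.

    Censored observations (event == 0) receive weight 0.
    """
    n = len(time)
    # Sort by time; at tied times, events are placed before censored cases.
    order = sorted(range(n), key=lambda i: (time[i], -event[i]))
    weights = [0.0] * n
    surv_factor = 1.0  # running product over earlier events
    for rank, idx in enumerate(order, start=1):
        weights[idx] = surv_factor * event[idx] / (n - rank + 1)
        if event[idx] == 1:
            surv_factor *= (n - rank) / (n - rank + 1)
    return weights

def penalized_weighted_ls(beta, X, log_time, weights, lam):
    """0.5 * weighted residual sum of squares + lam * ||beta||_1."""
    rss = 0.0
    for w, y, row in zip(weights, log_time, X):
        fitted = sum(b * x for b, x in zip(beta, row))
        rss += w * (y - fitted) ** 2
    return 0.5 * rss + lam * sum(abs(b) for b in beta)
```

Without censoring every observation gets weight 1/n, so the objective reduces to an ordinary penalized least squares criterion.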
Variable selection is fundamental in high-dimensional statistical modeling. In this article, we propose a multi-study AFT model, which accounts for heterogeneity among studies. The minimal depth of a maximal subtree is a dimensionless order statistic measuring the predictiveness of a variable in a survival tree. Variable selection for survival data with a class of. Ranking-based variable selection for high-dimensional data. The variable selection problem is discussed in the context of high-dimensional failure time data arising from the accelerated failure time model. To address these challenges, we propose a new high-dimensional mediation analysis procedure for survival models by incorporating sure independence screening and minimax concave penalty techniques for variable selection, with the Sobel and the joint method for significance tests of the indirect effect. In big p and small n problems, where p is the dimension and n is the sample size, the distribution of the minimal depth reveals.
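The minimal depth statistic described above can be illustrated on a toy tree: for a given variable, it is the depth of the shallowest node that splits on that variable, with the root at depth 0. The nested-dictionary tree layout and the function names below are assumptions made for this sketch and do not correspond to any particular software. Averaging over only those trees in which the variable splits is also a simplification chosen here.

```python
def minimal_depth(tree, variable, depth=0):
    """Depth of the shallowest node splitting on `variable`, or None.

    A node is a dict with keys "split", "left" and "right";
    a terminal node is a dict without a "split" key (hypothetical layout).
    """
    if tree is None or "split" not in tree:
        return None  # terminal node: no split here
    if tree["split"] == variable:
        return depth  # root of a maximal subtree for this variable
    found = []
    for side in ("left", "right"):
        d = minimal_depth(tree.get(side), variable, depth + 1)
        if d is not None:
            found.append(d)
    return min(found) if found else None

def forest_minimal_depth(forest, variable):
    """Average minimal depth over the trees in which the variable splits."""
    depths = [minimal_depth(t, variable) for t in forest]
    depths = [d for d in depths if d is not None]
    return sum(depths) / len(depths) if depths else None
```

Smaller values indicate variables that split close to the root and are therefore more predictive in the sense described above.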
Among standard variable selection methods shown both to have good predictive accuracy and to be. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. High-dimensional data analysis, The Methodology Center. High-dimensional variable selection in meta-analysis for.
There are at least two different goals when using these methods. In research applications involving high-dimensional genetic data, the use of CARS scores for marker selection is a favorable alternative to Cox scores even when correlations between covariates are low. Variable selection in a log-linear Birnbaum-Saunders. Several methods have been developed lately for high-dimensional linear regression, such as the lasso (Tibshirani 1996) and LARS (Efron et al.). Computational biostatistics and survival analysis, Shariq.
High-dimensional variable selection for Cox's proportional. Recent technological advances have made it possible to collect a huge amount of covariate information, such as microarray, proteomic and SNP data, via bioimaging technology while observing survival information on patients in clinical studies.
Prognostic models of clinical outcomes are usually built and validated utilizing variable selection methods and machine learning tools. High-dimensional variable selection for Cox's proportional hazards model, by Jianqing Fan. A survival risk model M_1 is developed using only the data in T_1 for variable selection. In this article, we propose a simultaneous parameter estimation and variable selection procedure in a log-linear Birnbaum-Saunders (BS) regression model for high-dimensional survival data. The Birnbaum-Saunders (BS) distribution is broadly used to model failure times in reliability and survival analysis. In addition, variable selection and classification procedures are an integral part of data analysis where the information revolution brings larger datasets with more variables, and it has become more difficult to process streaming high-dimensional time-to-event data with traditional approaches, specifically in the occurrence of. Introduction. Survival analysis is a commonly used method for the analysis of failure times such as biological death, mechanical failure, or credit default. Statistical methods for high-dimensional variable selection based on right-censored survival. Variable selection and estimation procedures for high.
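Developing a risk model on one part of the data and assessing it on another, as described above, can be sketched with a random split and a held-out performance measure. The text does not name the measure, so Harrell's concordance index is used here as an assumption. The function names and the default split fraction are likewise illustrative.

```python
import random

def split_indices(n, train_fraction=0.5, seed=0):
    """Randomly split indices 0..n-1 into a training and a test part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]

def harrell_c_index(time, event, risk):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i has an observed event
    and time[i] < time[j]; ties in risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

The risk scores passed to `harrell_c_index` would come from a model fitted on the training indices only and evaluated on the test indices, which avoids the optimism of resubstitution estimates.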
We are developing broadly applicable techniques for high-dimensional variable selection. Power and sample size calculations in survival data. Combined performance of screening and variable selection. High-dimensional variable selection for survival data, Hemant Ishwaran, Udaya B., Journal of the American Statistical Association 105(489). To our knowledge, there are no methods currently available for formally combining data from multiple studies in conducting fast high-dimensional variable selection for survival outcomes. The main conclusion for this work is that RSF and boosting. CARS scores are implemented in the R package carSurv. Thus, it is crucial that models are not overfitted and give accurate results with new data. High-dimensional mediation analysis in survival models. A variety of approaches have been proposed for variable selection in this context.
However, only a small number of these have been adapted for time-to-event data where censoring is present. When relating genomic data to survival outcomes, there are three main challenges: the censored survival outcomes, the high dimensionality of the genomic data, and the non-normality of the data. High-dimensional variable selection for GLMs and survival models: PhD thesis to obtain the degree of PhD at the University of Groningen on the authority of the Rector Magnificus Prof. This thesis will be defended in public on Monday 10 July 2017 at 09. Variable selection and prediction with incomplete high-dimensional data, by Ying Liu, Yuanjia Wang, Yang Feng, and Melanie M. Variable selection in high dimensional cancer genomic studies has become very popular in the past decade, due to the interest in discovering significant genes pertinent to a specific cancer type. The accelerated failure time (AFT) models have proved useful in many contexts, though heavy censoring (as, for example, in cancer survival) and high dimensionality (as, for example, in microarray data) cause difficulties for model fitting and model selection. Censored survival data is the main data structure in such studies, and performing variable selection for this data type requires specific methodology.
The approaches are evaluated on microarray data and by simulation. Regularization and variable selection via the elastic net. Variable selection for high-dimensional genomic data with. We also developed PROC SCAD, a pair of SAS procedures using the SCAD penalty for high-dimensional variable selection. A data augmentation approach is employed in order to deal with censored survival times and to facilitate prior-posterior conjugacy. Ranking-based variable selection for high-dimensional data. Rafal Baranowski, Yining Chen, and Piotr Fryzlewicz, Department of Statistics, Columbia House, London School of Economics, Houghton Street, London, WC2A 2AE, UK.