Year of publication: 1385 (Solar Hijri)
Published in: 8th Iranian Statistics Conference
Number of pages: 14
K. Khorshidian – Stat. Dept., Shiraz Univ., Shiraz, Iran.
In this article we review QSAR as a sound approach to statistical data analysis in chemistry. We address the practical problems encountered when applying statistical methods to real data in chemical research. We try to resolve these problems by categorizing the statistical methods and arranging them into a well-defined sequence of procedures that can be considered and applied as an algorithm. Quantitative structure-activity relationships (QSARs), one of the most important areas in chemometrics, provide information that is useful for drug design and medicinal chemistry. The objective in constructing a QSAR model is to find one or more molecular descriptors that represent variation in the structural properties of the molecules. The main problem in applying the usual statistical techniques to chemical data is that in almost all cases the sample size is too small relative to the number of variables (descriptors) in the model. The sample size is usually below one hundred, whereas the descriptors number in the hundreds and in many cases exceed a thousand, the reverse of what is desirable. Multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression and singular value decomposition are the most commonly used modeling techniques in QSAR, with various types of cross-validation (CV) and bootstrapping procedures applied to them iteratively in order to converge to the optimal model. Applying these techniques requires careful variable selection to build well-fitted models. In addition, the genetic algorithm (GA) is now well known as an attractive and increasingly widely used variable selection method. A GA is a stochastic method for solving optimization problems defined by a fitness criterion; it applies Darwin's hypothesis of evolution through genetic operators, namely cross-over and mutation.
A real drug-chemistry study using QSAR modeling is reviewed as an example, in which the sample size is 735 and the variable pool contains 1355 variables.
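As a rough illustration of the GA-based variable selection mentioned in the abstract, the following Python sketch evolves binary descriptor masks and scores each one by the cross-validated error of an MLR fit. It is a minimal sketch under stated assumptions, not the procedure used in the article: the function names `cv_rmse` and `ga_select`, the population settings, the truncation selection with elitism, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def cv_rmse(X, y, mask, n_folds=5):
    """Fitness criterion (assumed): cross-validated RMSE of an MLR fit
    on the descriptors selected by the boolean mask."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf  # an empty model is never preferred
    folds = np.arange(len(y)) % n_folds  # deterministic fold labels
    sq_err = 0.0
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Design matrices with an intercept column.
        A_train = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
        A_test = np.column_stack([np.ones(test.sum()), X[test][:, cols]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        sq_err += np.sum((y[test] - A_test @ coef) ** 2)
    return np.sqrt(sq_err / len(y))

def ga_select(X, y, pop_size=30, n_gen=40, p_init=0.1, p_mut=0.02, seed=0):
    """Return (best_mask, best_fitness) found by a simple GA."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Each individual is a boolean mask over the descriptor pool.
    pop = rng.random((pop_size, p)) < p_init
    best_mask, best_fit = pop[0].copy(), np.inf
    for _ in range(n_gen):
        fit = np.array([cv_rmse(X, y, m) for m in pop])
        order = np.argsort(fit)
        if fit[order[0]] < best_fit:
            best_fit, best_mask = fit[order[0]], pop[order[0]].copy()
        # Truncation selection: the better half become parents.
        parents = pop[order[: pop_size // 2]]
        children = []
        while len(children) < pop_size - 1:
            i, j = rng.choice(len(parents), size=2, replace=False)
            cut = rng.integers(1, p)  # single-point cross-over
            child = np.concatenate([parents[i][:cut], parents[j][cut:]])
            child ^= rng.random(p) < p_mut  # mutation: flip random bits
            children.append(child)
        # Elitism: carry the best mask found so far into the next generation.
        pop = np.vstack([best_mask] + children)
    return best_mask, best_fit

# Synthetic data for illustration only (not the data set from the article):
# 60 samples, 40 descriptors, response driven by descriptors 3 and 17.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 40))
y_demo = 2.0 * X_demo[:, 3] - 1.5 * X_demo[:, 17] + 0.1 * rng.normal(size=60)

mask, score = ga_select(X_demo, y_demo, pop_size=20, n_gen=15, seed=1)
print("selected descriptors:", np.flatnonzero(mask))
print("cross-validated RMSE:", score)
```

Scoring masks by cross-validated error rather than by training error is one way to discourage the GA from simply adding descriptors when the pool is large relative to the sample size. PLS or PCR could be substituted for the least-squares fit inside `cv_rmse` without changing the rest of the sketch.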