Publication year: 1387 (Solar Hijri)

Published in: 2nd Iran Data Mining Conference

Number of pages: 9

Author(s):

Vahid Nassiri – Department of Statistics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic)
Mina Aminghafari –

Abstract:

Multiple regression is a useful statistical method with many applications in data mining, such as prediction, missing value imputation, and pattern recognition. Ordinary least squares regression has shortcomings when (1) the number of available observations is smaller than the number of variables, (2) there is significant experimental or nonexperimental noise in the raw data, (3) there are linear relations between explanatory variables, known as the multicollinearity problem, and (4) there are latent variables of which only a mixture is observed. Many methods have been developed to overcome these shortcomings. Two well-known models are principal component regression (PCR) and partial least squares regression (PLSR). Both try to find an uncorrelated representation of the observed sample, but this is not adequate in many cases. Recently, some authors have used a powerful multivariate statistical tool called independent component analysis (ICA), which finds independent components of a dependent observed sample.
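To make the PCR idea concrete, here is a minimal NumPy sketch of regressing on the leading principal components when there are fewer observations than variables. It is an illustration only, not the authors' code: the function names `pcr_fit` and `pcr_predict`, the simulated data, and the choice of three components are all assumptions.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Regress y on the leading principal components of X (illustrative helper)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                 # shape (p, k)
    scores = Xc @ V                         # shape (n, k), mutually uncorrelated
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                        # coefficients in the original variable space
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def pcr_predict(X, beta, intercept):
    return X @ beta + intercept

# Simulated data (assumed): 20 observations, 50 collinear variables
# driven by 3 latent variables, so ordinary least squares is ill-posed.
rng = np.random.default_rng(0)
n, p, k = 20, 50, 3
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, p))
X = latent @ mixing + 0.01 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=n)

beta, intercept = pcr_fit(X, y, n_components=k)
y_hat = pcr_predict(X, beta, intercept)
print("in-sample MSE:", np.mean((y - y_hat) ** 2))
```

The principal component scores are uncorrelated but not necessarily independent, which is the limitation the abstract points to when motivating ICA.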
Wavelets are powerful mathematical tools that originated in harmonic analysis and were soon applied to many real-world problems. One of their best-known applications is denoising. In this paper we introduce a novel regression method that overcomes all of the shortcomings noted above. This wavelet-denoising regression method using ICA (WICR) can model noisy and dependent data better than other methods. Its denoising property makes it very useful for real-world data, since some kind of noise is always present. Our method can also discover latent variables more accurately, which play an important role in data mining tasks.
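The abstract does not specify which wavelet or thresholding rule WICR uses, so the following is only a sketch of the general wavelet-denoising step. It assumes a Haar wavelet, soft thresholding, and the universal threshold with a median-based noise estimate. All function names and the test signal are hypothetical.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the orthonormal Haar transform (input length must be even)."""
    pairs = signal.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero by `threshold`."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def haar_denoise(signal, levels=3):
    """Denoise a signal whose length is divisible by 2**levels."""
    approx = np.asarray(signal, dtype=float)
    n = approx.size
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    # Noise scale estimated from the finest detail coefficients (assumed rule).
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(n))
    # Rebuild from the coarsest level back to the finest.
    for detail in reversed(details):
        approx = haar_idwt(approx, soft_threshold(detail, threshold))
    return approx

# Hypothetical piecewise-constant signal with additive Gaussian noise.
rng = np.random.default_rng(1)
clean = np.repeat([0.0, 2.0, -1.0, 3.0], 64)
noisy = clean + 0.3 * rng.normal(size=clean.size)
denoised = haar_denoise(noisy, levels=3)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print("MSE before:", mse_noisy, "after:", mse_denoised)
```

In a pipeline of the kind the abstract describes, each observed variable would presumably be denoised in this way before independent components are extracted and used as regressors. That ordering is an assumption here, not something stated in the abstract.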