Odewale et al. Greener Journal of Agricultural Sciences Vol. 3 (2), pp. 164-175, February 2013
ISSN: 2276-7770
Research Paper Manuscript Number: 122112341
DOI: http://dx.doi.org/10.15580/GJAS.2013.2.122112341

Study of Some Fruit and Seed Traits Relationship and Assessment of Multicollinearity in Date Palm (Phoenix dactylifera L.) Accessions of Nigeria by Correlation and Principal Component Analysis

Odewale J.O., Agho Collins, Ataga C.D., Odiowaya G., Hamza A., Uwadiae E.O. and Ahanon M.J.

Plant Breeding Division, Nigerian Institute for Oil Palm Research (NIFOR), P.M.B. 1030, Benin City, Edo State, Nigeria.

Corresponding Author's Email: collinsagho@yahoo.com

Abstract: Collinearity (or multicollinearity) is the undesirable situation in which the correlations among the independent variables are strong. Multicollinearity inflates the standard errors of the regression coefficients and can lead to incorrect conclusions about the relationships between the dependent and predictor variables. As a result, some variables may appear statistically non-significant when they are in fact significant. It is like two or more people singing loudly at the same time: one cannot tell which voice is which, because they mask each other. Explanatory variables in the biological sciences tend to be correlated, and this can lead to incorrect identification of the most important predictors. As in other crops, seed characters in date palm are often used as predictors of fruit quality and quantity, such as fruit weight and size, which are important pricing indices for date fruits. It is therefore important to develop a model that explains the relationship between these two sets of traits without collinearity. Changing environments, accompanied by floods and high temperatures, together with population growth in Nigeria and across Africa, are a concern to agriculturists. Famine is fast approaching, and crop modelling is one of the solutions.
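The inflation of standard errors described above is commonly quantified with the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing predictor j on all the other predictors; the tolerance is its reciprocal, TOL_j = 1 - R_j^2. The following is a minimal NumPy sketch of that calculation, not the authors' code. The trait names (seed_length, seed_weight, groove_width) and all values are simulated and hypothetical; they are not the study's measurements.

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (VIF, TOL) arrays, one entry per column of predictor matrix X.

    For predictor j, X[:, j] is regressed on the remaining columns (plus an
    intercept) by least squares; TOL_j = 1 - R_j^2 and VIF_j = 1 / TOL_j.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    tol = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        tol[j] = 1.0 - r2
        vif[j] = 1.0 / tol[j]
    return vif, tol


# Simulated (hypothetical) seed traits: seed_weight is constructed to be
# strongly correlated with seed_length, while groove_width is independent.
rng = np.random.default_rng(0)
seed_length = rng.normal(20.0, 2.0, size=200)
seed_weight = 0.05 * seed_length + rng.normal(0.0, 0.02, size=200)
groove_width = rng.normal(1.5, 0.3, size=200)
X = np.column_stack([seed_length, seed_weight, groove_width])

vif, tol = vif_and_tolerance(X)
for name, v, t in zip(["seed_length", "seed_weight", "groove_width"], vif, tol):
    print(f"{name:13s} VIF={v:8.2f} TOL={t:.3f}")
```

With these simulated data the two correlated traits receive large VIFs (small tolerances), while the independent trait stays close to VIF = TOL = 1, which is the situation the abstract reports after principal component regression.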
The main purpose of this study was to show how multivariate analysis based on principal component scores can be used to build a model explaining the relationship between the fruit and seed traits of date palm, to compare its ability to reduce multicollinearity with that of ordinary least squares (OLS) regression, and to study the variability that exists among the germplasm collections for genetic improvement. The descriptive statistics, in particular the coefficients of variation of the different fruit and seed characters of date palm, revealed scope for genetic improvement of these characters. Principal component analysis (PCA) was applied to the predictor variables to address the problem of multicollinearity. The PCA indicated that the first two components, each with an eigenvalue greater than unity, together accounted for 86.5% of the total variation, which is well above half and justifies the use of PCA for data reduction. Principal component regression (PCR) was sufficient to eliminate multicollinearity, giving a variance inflation factor (VIF) and tolerance value (TOL) of unity with P < 0.05. A high percentage of the total variance could be explained with a reduced number of principal components: two principal components (PRIN 1 and PRIN 2) accounted for most of the variability in seed characters observed among the date palm germplasm collections from different locations. There was a high level of variation in some of the seed traits studied, which could serve as the basis for genetic improvement of date palm fruit. Both the multiple regression analysis and the principal component analysis showed a direct positive relationship between groove width and the fruit traits, indicating the importance of groove width in the genetic improvement of date palm fruits.
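The workflow summarized above (coefficients of variation, PCA on the predictors, retention of components with eigenvalue greater than unity, and regression of the response on the component scores) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the data are simulated, the five seed traits and the fruit-weight response are hypothetical, and the predictors are standardized so that PCA is carried out on the correlation matrix. The sketch also shows why PCR yields VIF and TOL of unity: the component scores are mutually uncorrelated, so the inverse of their correlation matrix is the identity.

```python
import numpy as np


def coefficient_of_variation(X):
    """CV (%) of each column: 100 * sample standard deviation / mean."""
    return 100.0 * X.std(axis=0, ddof=1) / X.mean(axis=0)


def vif_from_corr(Z):
    """VIF_j equals the j-th diagonal element of the inverse correlation matrix."""
    R = np.corrcoef(Z, rowvar=False)
    return np.diag(np.linalg.inv(R))


def principal_component_regression(X, y):
    """PCR sketch: PCA on standardized predictors, keep components with
    eigenvalue > 1 (Kaiser criterion), then OLS of y on the PC scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                  # reorder descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                               # Kaiser criterion
    explained = eigvals[keep].sum() / eigvals.sum()    # share of total variance
    scores = Z @ eigvecs[:, keep]                      # PC scores (PRIN 1, PRIN 2, ...)
    A = np.column_stack([np.ones(len(y)), scores])     # intercept + scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return eigvals, explained, scores, coef


# Hypothetical simulated data: two latent factors ("size" and "groove")
# drive five correlated seed traits and one fruit-weight response.
rng = np.random.default_rng(1)
n = 150
size = rng.normal(size=n)
groove = rng.normal(size=n)
X = np.column_stack([
    20.0 + 2.0 * size + rng.normal(0, 0.5, n),     # seed_length (hypothetical)
    8.0 + 0.8 * size + rng.normal(0, 0.2, n),      # seed_width (hypothetical)
    1.0 + 0.15 * size + rng.normal(0, 0.04, n),    # seed_weight (hypothetical)
    1.5 + 0.3 * groove + rng.normal(0, 0.08, n),   # groove_width (hypothetical)
    4.0 + 0.5 * groove + rng.normal(0, 0.15, n),   # groove_depth (hypothetical)
])
y = 9.0 + 1.2 * size + 0.6 * groove + rng.normal(0, 0.3, n)  # fruit_weight

cv = coefficient_of_variation(X)
eigvals, explained, scores, coef = principal_component_regression(X, y)
print("CV (%):", np.round(cv, 1))
print("eigenvalues:", np.round(eigvals, 3))
print(f"retained {scores.shape[1]} components, {100 * explained:.1f}% of variance")
print("VIF of raw predictors:", np.round(vif_from_corr(X), 2))
print("VIF of PC scores:", np.round(vif_from_corr(scores), 2))
print("PCR coefficients (intercept first):", np.round(coef, 3))
```

Because the signs of eigenvectors are arbitrary, the signs of the PC loadings and PCR coefficients may differ between software packages; the fitted values and the VIF of the scores do not depend on that choice.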
Key words: date fruit, tolerance value, variance inflation factor, multicollinearity, principal component regression and principal component analysis.