Odewale et al.

Journal of Agricultural Sciences

Vol. 3 (2), pp. 164-175, February 2013

ISSN: 2276-7770




DOI: http://dx.doi.org/10.15580/GJAS.2013.2.122112341


Study of Some Fruit and Seed Traits Relationship and
Assessment of Multicollinearity in Date Palm (Phoenix dactylifera L.)
Accessions of Nigeria by Correlation and Principal Component



Odewale J.O., Agho Collins, Ataga C.D., Odiowaya G.,
Uwadiae E.O. and Ahanon M.J.


Plant Breeding Division, Nigerian Institute for Oil Palm
Research (NIFOR), P.M.B. 1030, Benin City, Edo State.


Corresponding Author's Email: collinsagho@yahoo.com


Collinearity (or multicollinearity) is the undesirable
situation in which the independent variables are strongly
correlated with one another. Multicollinearity inflates the
standard errors of the regression coefficients and leads to
incorrect conclusions about the relationships between the
dependent and predictor variables. As a result, some
variables appear statistically insignificant when they would
otherwise be significant. It is like two or more people
singing loudly at the same time: one cannot discern which
voice is which, and they offset each other. Most explanatory
variables in the biological sciences tend to be correlated,
and this leads to incorrect identification of the most
important predictors.
As in other crops, seed characters in date palm are often
used as predictors of the quality and quantity of the
fruits, such as fruit weight and size, which are important
pricing indices for date palm fruits. It is therefore
important to develop a model that explains the relationships
between these two data sets without collinearity. Changing
environments, accompanied by floods and high temperatures
and coupled with population growth in Nigeria and Africa as
a whole, are a concern to agriculturists. Famine is fast
approaching, and crop modelling is one of the solutions. The
main purpose of this study is to show how multivariate
analysis based on principal component scores can be used to
establish a model that explains the relationship between the
fruit and seed traits of date palm, and to compare its
ability to reduce multicollinearity with that of ordinary
least squares, while also studying the variability that
exists among the germplasm collections for genetic
improvement. The descriptive statistics, in particular the
coefficients of variation for the different fruit and seed
characters of date palm, reveal the possibility of genetic
improvement of these characters.
Principal component analysis (PCA) was applied to the
predictor variables to address the problem of
multicollinearity. The first two components, each with an
eigenvalue greater than unity, accounted for 86.5% of the
total variation, a large share that supports the use of PCA
for data reduction. The results showed that principal
component regression (PCR) was sufficient to eliminate
multicollinearity, with a variance inflation factor (VIF)
and tolerance value (TOL) of unity and P < 0.05. It was
possible to explain a high percentage of the total variance
with a reduced number of principal components, as two
principal components (PRIN 1 and PRIN 2) accounted for most
of the variability in seed characters observed among the
date palm germplasm collections from different locations.
There was a high level of variation in some of the seed
traits studied, which could serve as the basis for genetic
improvement of date palm fruit. There was a direct positive
relationship between groove width and the fruit traits in
both the multiple regression analysis and the principal
component analysis, indicating the importance of groove
width in the genetic improvement of date palm fruits.
Key words: date fruit, tolerance value, variance
inflation factor, multicollinearity, principal component
regression, principal component analysis.
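
The abstract's central claim, that regressing on principal component scores brings VIF and TOL to unity, can be illustrated with a minimal sketch. This is not the authors' analysis or data: the traits below are synthetic, the names `vif`, `pc_scores`, `seed` and `fruit_weight` are hypothetical, and the code assumes the usual definitions, namely that VIF is the diagonal of the inverse correlation matrix of the predictors, that TOL = 1/VIF, and that components with an eigenvalue greater than unity are the ones retained.

```python
import numpy as np

def vif(X):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def pc_scores(X):
    """Eigenvalues (descending) and PC scores of the standardized columns of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], Z @ eigvecs[:, order]

# Synthetic data only (assumption): two latent factors drive five "seed traits".
rng = np.random.default_rng(0)
n = 60
size = rng.normal(size=n)     # hypothetical latent "seed size" factor
groove = rng.normal(size=n)   # hypothetical latent "groove" factor
seed = np.column_stack([
    size + 0.1 * rng.normal(size=n),    # e.g. seed length
    size + 0.1 * rng.normal(size=n),    # e.g. seed breadth
    size + 0.1 * rng.normal(size=n),    # e.g. seed weight
    groove + 0.1 * rng.normal(size=n),  # e.g. groove width
    groove + 0.1 * rng.normal(size=n),  # e.g. groove depth
])
fruit_weight = 2.0 * size + 1.0 * groove + 0.2 * rng.normal(size=n)

# Ordinary least squares on the raw traits would face heavily inflated VIFs.
print("VIF, raw traits:", np.round(vif(seed), 1))

# Keep components whose eigenvalue exceeds unity, then regress on their scores.
eigvals, scores = pc_scores(seed)
keep = eigvals > 1.0
print("share of variance retained:", eigvals[keep].sum() / eigvals.sum())

vif_pc = vif(scores[:, keep])
print("VIF, PC scores:", np.round(vif_pc, 3))
print("TOL, PC scores:", np.round(1.0 / vif_pc, 3))

# Principal component regression: least squares on the retained scores.
design = np.column_stack([np.ones(n), scores[:, keep]])
coef, *_ = np.linalg.lstsq(design, fruit_weight, rcond=None)
fitted = design @ coef
ss_res = np.sum((fruit_weight - fitted) ** 2)
ss_tot = np.sum((fruit_weight - fruit_weight.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print("R^2 of PCR:", round(r2, 3))
```

Because the scores are projections onto orthogonal eigenvectors of the correlation matrix, they are uncorrelated in the sample by construction, which is why their VIF and TOL equal one regardless of how collinear the original traits are.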


