Cross-eyed PhD Tip 8: Mashing the data for an exploratory factor analysis!

“Piled Higher and Deeper” by Jorge Cham,


One of the biggest challenges in handling data is extracting inferences from it. Factor analysis is a multivariate statistical technique in which there is no distinction between dependent and independent variables. All the variables under observation are analysed together to extract the underlying factors.

In my PhD, I generated tons of data; the problem was extracting meaning from it. Academic rigor demanded that every stage, from designing the questionnaire and collecting the data to interpreting and documenting the results, follow a defensible method. Factor analysis was the fundamental tool for extracting the factors behind the purchase intent of my targeted consumer profile: it reduces a large number of variables to a few manageable factors. A factor is a linear combination of variables. It is a construct that is not directly observable and has to be interpreted from the input variables.

As discussed in tip 5, one of the key steps in research is to develop a concise multiple-item questionnaire: essentially, a reliable scale for measuring the constructs we are researching. Exploratory factor analysis is key to developing such a scale because it helps remove redundant items from your initial item pool.

Attitude measurement starts with the researcher generating a large number of items (statements) relating to the constructs being measured. These items are drawn from the literature survey and exploratory research, but they still need to be tested to establish a reliable, valid scale in the immediate research context. Factor analysis can reduce the set of statements to a concise instrument while ensuring that the retained statements adequately represent the critical aspects of the constructs being measured. In my research, I started with a 130-item questionnaire and the exploratory factor analysis helped reduce it to a 25+ item scale!

This method reduces multiple input variables into grouped factors. If a set of items measures the price sensitivity of a purchase intent, those items can be grouped together into a price-value factor. Alternatively, if another set of items relates to the buyer’s propensity to be among the first users of the product, that factor can shape a psychographic profile. Factor analysis can also be used to identify market segments by identifying the different attributes of a product or brand that influence the buyer’s purchase decision.

Factor analysis requires the use of metric data. In a survey, a 5- or 7-point Likert scale is typically used to measure the items, and the responses are treated as metric data. As a rule of thumb, the number of respondents in the sample should be at least 5 times the number of items (statements). The data must also show sufficiently high correlations among the items that belong to a factor. This is checked with two statistics: Bartlett’s test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. When Bartlett’s test is significant and the KMO value is acceptably high (0.6 or above is a commonly quoted threshold), we can proceed with factor analysis of the sampled data.
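As a rough illustration of these pre-checks, here is a minimal Python sketch using NumPy. The data is made up (200 hypothetical respondents, 6 items driven by 2 invented latent factors), and the function names are my own, not taken from any statistics package. The sketch returns only the Bartlett chi-square statistic and its degrees of freedom; turning that into a p-value needs a chi-square distribution function, for example from SciPy.

```python
import numpy as np

def bartlett_sphericity(data):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # log-determinant of the correlation matrix (0 when items are uncorrelated)
    _, log_det = np.linalg.slogdet(corr)
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * log_det
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo_overall(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)         # squared simple correlations
    a2 = np.sum(partial[off_diag] ** 2)      # squared partial correlations
    return r2 / (r2 + a2)

# Made-up data: 200 respondents, 6 items, 2 hypothetical latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
pattern = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
responses = latent @ pattern.T + 0.5 * rng.normal(size=(200, 6))

n_respondents, n_items = responses.shape
enough_respondents = n_respondents >= 5 * n_items   # the 5-to-1 rule of thumb
chi_square, dof = bartlett_sphericity(responses)
kmo = kmo_overall(responses)
print(enough_respondents, round(chi_square, 1), dof, round(kmo, 2))
```

A large chi-square relative to its degrees of freedom says the correlation matrix is far from an identity matrix, which is what we want before factoring. The KMO value compares simple correlations with partial correlations, so it approaches 1 when the items share common variance.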

The steps in a typical factor analysis are

  1. Extraction of Factors and
  2. Rotation of Factors

Factors can be extracted by a number of methods, the most popular being principal component analysis. Each factor is a linear combination of variables that are expected to be highly correlated, and it takes the mathematical form:

$$F_i = W_{i1} X_1 + W_{i2} X_2 + \cdots + W_{ir} X_r$$

where

$X_j$ = jth variable

$F_i$ = estimate of the ith factor

$W_{ij}$ = weight (factor score coefficient) of the jth variable on the ith factor, and $r$ = number of variables
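To make the linear combination concrete, here is a small Python sketch that computes one factor estimate for one respondent as a weighted sum of item values. The weights and responses are invented for illustration; in a real analysis the weights come out of the extraction step.

```python
def factor_score(weights, values):
    """Weighted sum of item values: W_1*X_1 + W_2*X_2 + ... + W_r*X_r."""
    if len(weights) != len(values):
        raise ValueError("need exactly one weight per variable")
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical weights for a "price-value" factor over r = 4 standardized items.
price_value_weights = [0.5, 0.25, 0.25, 0.0]
# One respondent's (made-up) standardized answers to those 4 items.
respondent = [1.0, -1.0, 2.0, 0.5]

score = factor_score(price_value_weights, respondent)
print(score)  # 0.5*1.0 + 0.25*(-1.0) + 0.25*2.0 + 0.0*0.5 = 0.75
```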

The principal component method searches for the values of W such that the first factor explains the largest possible portion of the total variance. This is called the first principal factor. It then extracts the second principal factor from the residual matrix, and so on. If we start with a 20-item scale, each standardized item contributes a variance of 1, so the total variance is 20. After extraction, each factor has an eigenvalue, which is the amount of that total variance it explains, and the eigenvalues differ from factor to factor. The operating rule is to retain the factors that have an eigenvalue greater than or equal to 1, that is, factors that explain at least as much variance as a single item.
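The extraction step and the eigenvalue rule can be sketched in a few lines of Python with NumPy. This is only an illustration on made-up data (6 items driven by 2 invented latent factors), not the procedure of any particular statistics package; `extract_factors` and its `threshold` parameter are names I have chosen for the example.

```python
import numpy as np

def extract_factors(data, threshold=1.0):
    """Principal component extraction from the item correlation matrix.

    Returns all eigenvalues (largest first) and the loadings of the
    factors whose eigenvalue is at least `threshold`.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # ascending order
    order = np.argsort(eigenvalues)[::-1]              # re-sort descending
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    keep = eigenvalues >= threshold
    # loading = eigenvector scaled by the square root of its eigenvalue
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues, loadings

# Made-up data: 200 respondents, 6 items, 2 hypothetical latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
pattern = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
responses = latent @ pattern.T + 0.5 * rng.normal(size=(200, 6))

eigenvalues, loadings = extract_factors(responses)
print(np.round(eigenvalues, 2))   # the eigenvalues sum to the number of items
print(loadings.shape)             # (items, retained factors)
```

Because the analysis runs on the correlation matrix, the eigenvalues always add up to the number of items, which is the "total variance" the rule refers to.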

The next step in the factor analysis is the rotation of the initial factor solution, because the initial factors may be difficult to interpret. The initial solution is therefore ‘rotated’ to yield a solution that can be interpreted easily. Varimax rotation is one of the most popular rotation methods; it maximises the variance of the squared loadings within each factor. As a result, the items are sorted across factors so that each item has a high loading on one specific factor and low loadings on the others. Ideally, an item that loads on one factor does not load appreciably on any other; items that load heavily on more than one factor are often candidates for removal.
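For the curious, here is a minimal Python sketch of a varimax rotation using NumPy. It uses the classic pairwise approach: for each pair of factors it computes, in closed form, the rotation angle that maximises the varimax criterion in that plane, and it repeats until the angles become negligible. It is the "raw" version without Kaiser normalization, so a statistics package may give slightly different numbers. The loadings are invented: a clean two-factor pattern deliberately rotated by 30 degrees so that it looks muddled.

```python
import numpy as np

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax via pairwise planar rotations (no Kaiser normalization)."""
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                x, y = L[:, a].copy(), L[:, b].copy()
                u = x**2 - y**2
                v = 2 * x * y
                A, B = u.sum(), v.sum()
                C = np.sum(u**2 - v**2)
                D = 2 * np.sum(u * v)
                # angle that maximises the criterion for this pair of factors
                phi = 0.25 * np.arctan2(D - 2 * A * B / p,
                                        C - (A**2 - B**2) / p)
                L[:, a] = x * np.cos(phi) + y * np.sin(phi)
                L[:, b] = -x * np.sin(phi) + y * np.cos(phi)
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:   # nothing left to rotate
            break
    return L

# Made-up "simple structure": items 1-2 on factor 1, items 3-4 on factor 2.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix          # every item now loads on both factors

rotated = varimax(unrotated)
print(np.round(unrotated, 3))
print(np.round(np.abs(rotated), 3))
```

The rotation does not change how much of each item’s variance is explained (the row sums of squared loadings stay the same); it only redistributes the loadings so that each item lines up with one factor.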

At the end of this exercise, one has achieved a basic factor analysis and can even interpret the extracted factors by studying the variables assigned to each factor.

While this exploratory factor analysis may be a practical place to conclude a piece of field research, the PhD process demands a lot more academic rigor. Each of the extracted factors is tested rigorously in a confirmatory factor analysis / Structural Equation Modeling to ensure that the final model is robust and statistically acceptable. We will review this in tip 10, shared by the guest editor and contributor Sanjay Malla.