Cross-eyed PhD: Tip 7, Analyzing the Data: Inferential Statistics

“Piled Higher and Deeper” by Jorge Cham

Automation helps! You may well have saved days of onerous data capture by using my trick with the OMR forms from the previous tips!
But – no, you are not obsolete! The real PhD work is in the analysis of data.

When data does not make any sense…we will have to resort to statistics! 


Of course, you start by making sure that you have the right data: that there is no bias and that the data captured is correct and reliable (see Tip 6). Once you have good data, you have a way to check your hypothesis.


In my research on the purchase intent of residential solar PV buyers in India, the literature survey turned up conflicting findings. One set of researchers maintained that buyer intent was a function of buyer income; another maintained that it was not! Who was correct? Ask most people in our industry and they would say that buyer income is a key determinant of purchase intent. Surprisingly, once we ran the inferential statistics, we found no statistically significant link between buyer income and purchase intent!


Moreover, this was consistently borne out by the statistical model that we tested. Confirmatory factor analysis found that financial self-efficacy was not the key factor driving the purchase intent of the residential rooftop solar buyer!


It is important to realize that real-life results show a distribution of data. Sometimes we form a perception based on an outlier; at other times we rely on gut feel. However, if we want to make sense of data, we really need to understand the statistical fundamentals. Inferential statistics are used to draw inferences about a population from a given sample. How confident are we that the results are statistically valid? In our research studies we normally look for a confidence level of 95% or higher with a margin of error of ±5% or less.
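To make the 95% confidence and ±5% error target concrete, here is a minimal Python sketch based on the standard normal approximation for a proportion estimated from a simple random sample. The function names, the worst-case assumption p = 0.5 and the large-population assumption are mine, for illustration only; they are not taken from the study.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(n, confidence=0.95, p=0.5):
    """Half-width of the normal-approximation interval for a proportion.

    p=0.5 is the worst case (widest interval) and is an assumption here.
    """
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)


def required_sample_size(error, confidence=0.95, p=0.5):
    """Smallest n whose margin of error does not exceed `error`.

    Assumes a large population (no finite-population correction).
    """
    return math.ceil(z_value(confidence) ** 2 * p * (1 - p) / error ** 2)


print(required_sample_size(0.05))      # 385
print(round(margin_of_error(385), 4))  # 0.0499
```

Under these assumptions, roughly 385 valid responses are enough for 95% confidence with a ±5% margin; a tighter margin or a higher confidence level pushes the number up quickly.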


Inferential statistics is subdivided into parametric and non-parametric tests. Parametric tests are applied if the data is interval or ratio data, the sample is randomly drawn from the population, and that population is normally distributed. Social science research often uses Likert scales; treating them parametrically assumes, to some extent, that the granularity of the Likert scale approximates an interval scale. P-P/Q-Q plots or distribution parameters such as skewness and kurtosis are used to check the normality of the data. (You may find that Likert scale data fails the Shapiro-Wilk test in SPSS on many occasions!)
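As a rough illustration of the skewness and kurtosis check, the sketch below computes both from the central moments using only the Python standard library. The responses are made-up 5-point Likert answers, not data from the study, and the formulas are the simple moment-based ones; statistical packages may report bias-corrected variants, so their numbers can differ slightly.

```python
from statistics import fmean


def central_moment(data, k):
    """k-th central moment: the mean of (x - mean)**k."""
    mean = fmean(data)
    return fmean([(x - mean) ** k for x in data])


def skewness(data):
    """Moment-based skewness; 0 for a perfectly symmetric sample."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Hypothetical responses on a 5-point Likert scale (illustration only).
responses = [5, 4, 4, 5, 3, 4, 5, 5, 2, 4, 5, 3, 4, 5, 4]

print("skewness:", round(skewness(responses), 3))
print("excess kurtosis:", round(excess_kurtosis(responses), 3))
```

Values close to zero suggest the distribution is not far from normal in shape; how far from zero is still acceptable is a judgment call that depends on the guideline your field follows.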


Hypothesis testing is one of the main applications of inferential statistics. You start with a hypothesis and test whether the sample data statistically supports or contradicts it. You reject the null hypothesis if the significance value p <= 0.05; otherwise you cannot reject it. (Do note that the wording is important: you do not say “accept the null hypothesis”; the convention among researchers is to say “cannot reject the null hypothesis”!)
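The decision rule can be sketched with a Pearson chi-square test of independence on a 2x2 table, written here with only the Python standard library. The choice of test, the income and intent groupings, and the counts are all hypothetical; this is not necessarily the test used in the study, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def decide(p_value, alpha=0.05):
    """Apply the p <= alpha rule using the conventional wording."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "cannot reject the null hypothesis"


# Hypothetical counts: rows = lower/higher income, columns = low/high intent.
observed = [[30, 20], [28, 22]]
chi2, p = chi_square_2x2(observed)
print(f"chi2={chi2:.3f}, p={p:.3f}: {decide(p)}")
```

Here the null hypothesis is that income group and purchase intent are independent. With these made-up counts the p-value is well above 0.05, so the sketch reports that the null hypothesis cannot be rejected.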


This is the power of measurement: done properly, with appropriate practices and valid, reliable data, it can dispel perceptions and present factual results. As Sir Arthur Conan Doyle says (through Sherlock Holmes): after we have eliminated the impossible, whatever remains, however improbable, must be the truth!