I'm sure this post isn't going to be revolutionary to anyone else but me. As a complete statistics novice, I've been reading 'Statistics without tears' by Derek Rowntree (great book for non-mathematicians like me) on the short train journey to/from work since starting H809, and I thought I'd just capture my learning so far through the blog to help make it stick - a little boring and not very reflective but hey, here goes ...
Well, it seems that we all do statistics every day (even me!). We surmise, generalise, and predict all of the time. By observing people, things, and events all around us we notice differences and similarities, and we make judgements on what may happen from our previous experiences. So, already I am a statistician, phew!
Ok, so what have I really learnt so far? Well, if we make many observations of an individual, or of many individuals, then we soon have a collection of observations, or data. Then we can start to spot connections and patterns by noticing similarities and/or differences.
However, in statistics we have to recognise that not everything is 100% certain - probability is the key term here.
So what is statistics? It can mean the methods used to collect, process or interpret quantitative data, or more broadly a set of methods of inquiry.
Ahh, now I come across the terms used in the H809 text - descriptive statistics and inferential statistics. So, what are they? Well, descriptive statistics are methods used to summarise or 'describe' our observations, and inferential statistics are methods that use those observations as a basis for making predictions or 'inferences'. Right? Ok, that brings me on to 'samples' and 'populations', as it is these that make the distinction between the two. For example, when researching the learning ability of all white mice we can't possibly study the entire population of white mice - we take a sample. So, descriptive statistics is about summarising or describing a sample, and inferential statistics is about generalising from a sample, to make predictions or inferences about the wider population.
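To make the distinction stick, here's a minimal Python sketch. The population of 10,000 maze times is completely made up (my assumption, not from the book), and in real research we would never have the whole population to check against - that is the whole point of sampling.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: maze-completion times (seconds) for 10,000 white mice.
# These numbers are invented purely for illustration.
population = [random.gauss(60, 10) for _ in range(10_000)]

# We can't time every mouse, so we draw a random sample of 30.
sample = random.sample(population, 30)

# Descriptive statistics: summarise the sample we actually observed.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print(f"Sample of {len(sample)}: mean {sample_mean:.1f}s, SD {sample_sd:.1f}s")

# Inferential statistics: use the sample to say something about the population.
# Our best guess for the population mean is simply the sample mean.
estimated_population_mean = sample_mean

# Only possible here because we invented the population ourselves.
true_population_mean = statistics.mean(population)
print(f"Estimate {estimated_population_mean:.1f}s vs true {true_population_mean:.1f}s")
```

The first print is description (what the 30 mice did); the second is inference (what we guess all 10,000 would do), and the gap between the two numbers is where probability comes in.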
Now we come back to 'probability' - how safe are the generalisations we make? Are the samples truly representative of our population? What about the 'paradox of sampling': a sample is misleading unless it is representative of the population, but how can we know it is representative without already knowing about the population?
So, that brings me to the term 'random'. Researchers select a sample at random from the population they are interested in so that it is representative of the whole - not always easy to achieve, as there's always a possibility of 'bias' creeping into a sample, especially with human methods of sample selection rather than mechanical methods. Even with mechanical methods the sample could still end up with 'bias' simply by accident, say by ending up with all male white mice as opposed to a mix of male and female (possible, though extremely unlikely, if there are 100 mice - 50 female, 50 male - and a sample of 50 is required). When we end up with this kind of bias, our generalisations can only be applied from, say, the white male mice in our sample to white male mice in the population. We may get a more representative sample using a 'stratified random sample', where we recognise that there are other characteristics we need represented in our sample, say white mice with black eyes, and we identify those groups or 'strata' and choose randomly from each - it's then less likely that the sample will consist of all male white mice; rather, we are likely to get a mix of male and female.
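Here's a rough Python sketch of the 100-mice example, just to put numbers on it. The mouse list and the choice to stratify by sex (25 from each group) are my own assumptions for illustration.

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 white mice: 50 male, 50 female.
mice = [("male", i) for i in range(50)] + [("female", i) for i in range(50)]

# Chance that a simple random sample of 50 is all male:
# exactly one of the C(100, 50) equally likely samples is all male.
p_all_male = 1 / math.comb(100, 50)
print(f"P(all-male sample of 50) = {p_all_male:.2e}")

# Simple random sample: the male/female split is left to chance.
simple = random.sample(mice, 50)
simple_males = sum(1 for sex, _ in simple if sex == "male")
print(f"Simple random sample: {simple_males} male, {50 - simple_males} female")

# Stratified random sample: split into strata, then sample randomly within each.
males = [m for m in mice if m[0] == "male"]
females = [m for m in mice if m[0] == "female"]
stratified = random.sample(males, 25) + random.sample(females, 25)
strat_males = sum(1 for sex, _ in stratified if sex == "male")
print(f"Stratified sample: {strat_males} male, {50 - strat_males} female")
```

So an all-male sample can happen by accident but the odds are astronomically small, and if we stratify by sex like this the mix is fixed in advance, so it can't happen at all. Stratifying on a different characteristic (eye colour, say) would only control that characteristic, not sex.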
To be continued ...