Why user research reporting is often flawed

It just occurred to me why the UX community often conflicts with the traditional marketing point of view, and why stakeholders are perplexed when one study's finding seems to conflict with another's.

The problem
The reason is that the User Experience industry often makes generalisations based on statistically insignificant samples.

You hear consultants say "Users don't use search" or "Users don't understand X". These statements are inferences from non-causal observations: we are erroneously claiming that the effect we observed (y) has one particular cause (x). The theory goes like this:

If x is a sufficient cause of y, then the presence of x necessarily implies the presence of y. However, another cause z may alternatively cause y. Thus the presence of y does not imply the presence of x.
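
The argument above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical model in which either x or z alone is enough to produce y; the function name `y_occurs` is made up for illustration.

```python
from itertools import product


def y_occurs(x: bool, z: bool) -> bool:
    # Hypothetical model: either cause on its own is sufficient to produce y.
    return x or z


# Every combination of "x present / absent" and "z present / absent".
cases = list(product([False, True], repeat=2))

# x is a sufficient cause: whenever x is present, y is present.
x_implies_y = all(y_occurs(x, z) for x, z in cases if x)

# The reverse fails: y can be present without x, because z caused it.
y_without_x = [(x, z) for x, z in cases if y_occurs(x, z) and not x]

print(x_implies_y)  # True
print(y_without_x)  # [(False, True)]
```

The single counterexample (x absent, z present) is the case a consultant overlooks when inferring "users don't use search" from an observation that may have had another cause.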

When we practise these pseudo-scientific reporting methods, we run the risk of discrediting our findings and coming to false conclusions.

Caveat
This article is aimed at research involving small numbers of users, where there are too few participants to build reliable behavioural patterns. With these small samples the primary value is in the individual stories and experiences.

Of course, where an empirical fact is established, such as a link taking users to the wrong page, it only takes a single user to establish this as a valid fact.

The solution
If we have spotted a pattern, or even one occurrence of a behaviour, we can form a hypothesis. This then needs to be tested to prove or disprove it. Alternatively, we could use published studies that have scientifically established behavioural patterns and quote these as evidence.

We are missing the point
The goal of qualitative research is to answer the question 'why'. As the answer to that question is subjective to the individuals we interview, we should be telling their subjective stories.

The reason that one person rejects a proposition may be fundamentally different from another person's. Each separate reason may have design implications that wouldn't be apparent in a generalised report. For example, one user may reject a proposition because they think it's too technical, whereas another thinks it would be unaffordable. If the research company only generalises that 'the proposition was unattractive to users', then no concrete steps can be taken to address the issue.

Only by understanding who each person is, what their motivations are, their mental models, contextual factors and internal & external cognitive variables can we qualify their statements and behaviour.

An example of knowing your participants
A friend of mine in a research and build company was questioned about a report finding which stated that their users had disliked a specific function. Based on the research report, the design team had earmarked that function for deletion, until the client asked to see who each of the participants was. One by one, he analysed what their financial role was to understand the reason they had rejected that function. As it turned out, these users were atypical in this one respect. Only by knowing the participants and the context of their particular jobs could they 'qualify' the finding and make an appropriate design decision.

The generalisation was correct in isolation, but in context it was invalid.

The practicalities
How does this translate into our reports? I suggest that agencies structure findings primarily by participant. They should tell us their name, show us a photo, give us a 'capsule' of that person. Only by knowing the person can we make sense of their behaviour.

The report shouldn't blend together an average of 10 people's experiences, but rather tell each individual journey.

Only at the end of the report should patterns be identified, as without the context and history of these patterns a reader would be unable to judge what they are reading.

Reports should allow the reader to trace findings back to the source material in a user-friendly manner.
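
One way to picture such traceability is a report structure in which every pattern carries the observations behind it. This is only a sketch under assumptions: the class names `Observation` and `Pattern`, their fields, and the sample data are hypothetical and not taken from any real reporting tool.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    participant: str  # who said or did it
    source: str       # pointer to the raw material, e.g. a transcript or clip
    note: str         # what was observed, in the participant's own context


@dataclass
class Pattern:
    summary: str
    evidence: list[Observation] = field(default_factory=list)

    def participants(self) -> list[str]:
        # Lets a reader see exactly whose stories sit behind the pattern.
        return sorted({o.participant for o in self.evidence})


# Hypothetical data, echoing the two rejection reasons described earlier.
observations = [
    Observation("Participant A", "session 1 transcript", "Found the proposition too technical"),
    Observation("Participant B", "session 2 transcript", "Assumed it would be unaffordable"),
]
pattern = Pattern("Proposition rejected, but for different reasons", observations)

print(pattern.participants())  # ['Participant A', 'Participant B']
```

Because a pattern is nothing more than a summary plus its evidence, a reader can always step back from the generalisation to the individual journeys that produced it.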

The principle
Research reports shouldn't make generalisations from small samples; instead, they should tell participants' stories. These have validity, whereas non-causal inferences do not.

Understanding qualitative research

There seems to be some confusion within the industry as to the role of qualitative research, as some consultants are introducing quantitative reporting within qualitative research. Qualitative research runs the risk of being discredited as a result.

I wrote this piece to help clarify the difference between quantitative and qualitative research.
-----------------------------------
Overview
This article examines the relative merits of qualitative and quantitative research methods. The aim is to identify their strengths and weaknesses in order to help User Experience Consultants to employ the most appropriate method.

As the audience for this paper may not be familiar with qualitative research, the focus of this article is on explaining its merits.

Quantitative definition
This method uses data based on large numbers of participants to yield insight into preferences and behaviour. Roughly speaking, this tells you WHAT people are doing or what they say they might do. This type of research is carried out using surveys of large samples, yielding statistics which are often represented as percentages.

Qualitative definition
This method uses small numbers of people to uncover WHY people act in a certain way. This yields an in-depth understanding of human behaviour and the reasons that govern such behaviour. This research usually involves one-on-one interviews, ethnography and group discussions.

How can you trust qualitative research that only uses a few users?
Our experience from traditional marketing research involves hundreds, if not thousands, of participants. Interviewing only a handful of people seems counterintuitive if we want to learn what most people think or do.

Some typical concerns are: 
  • If testing with one user yields some insight, then testing with four is even better. Surely then, the larger the number, the better the research? Why would you limit yourself to only a few participants?
  • If you only talk to six people, surely you run the risk of interviewing 'outliers' that aren't representative of the majority?
  • Our statistics show that thousands of people are completing the purchase process, therefore we know our service is successful and we don't need to ask anyone.

Horses for courses

What measurement system is relevant?
Which research method you should employ boils down to the question you are trying to answer. Each method has its strengths and weaknesses. Knowing which scale of measurement to choose is critical because it determines the types of statistics you can use to analyse the data.

Here are the main systems of measurement...

  1. Nominal Data: Male, Female, Race, Political Party (categorical data that cannot be ranked)
  2. Ordinal Data: Degree of Satisfaction at Restaurant (data that can be ranked)
  3. Interval Data: Temperature, Dates (data that has an arbitrary zero)
  4. Ratio Data: Height, Weight, Age, Length (data that has an absolute zero)
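
As a rough illustration of how the scale constrains the statistics you can use, the sketch below applies one summary to each level. The sample values are invented for the example, and the pairing of one statistic per scale is a common textbook simplification rather than a complete rule.

```python
from statistics import mean, median, mode

# Hypothetical sample data, invented purely for illustration.
party = ["red", "blue", "blue", "green"]  # nominal: categories with no order
satisfaction = [1, 2, 2, 4, 5]            # ordinal: 1 = poor ... 5 = excellent
temperature_c = [10.0, 20.0, 30.0]        # interval: zero is arbitrary
height_cm = [150.0, 180.0]                # ratio: zero means "none"

most_common_party = mode(party)             # nominal: counting is all we can do
middle_rating = median(satisfaction)        # ordinal: ranking permits a median
average_temp = mean(temperature_c)          # interval: differences permit a mean
times_taller = height_cm[1] / height_cm[0]  # ratio: "1.2 times as tall" is meaningful

print(most_common_party, middle_rating, average_temp, times_taller)
```

Note that a ratio such as "20 degrees Celsius is twice as hot as 10" would not be meaningful, because the Celsius zero is arbitrary.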

As we can see, two of these scales involve subjective measurement, which lends itself to qualitative interpretation and is impossible to measure empirically. Even if you wanted to use quantitative research, it would prove inadequate, as it is unable to measure certain subjective attributes.

What quantitative research can't measure
We have established that there are important factors that quantitative research is unable to measure. In addition there are other weaknesses to quantitative research:

  • Quant can only give you answers to the questions you ask
  • With quant you won't learn new things that you didn't think to ask about
  • Asking the wrong questions can lead to erroneous conclusions

How many users do you need?
We have established that you don't necessarily need large numbers; in fact, some things absolutely can't be measured using quantities. So how many people should you include in a qualitative study?

There is no straightforward answer to that question, as it depends on several factors such as what you are researching, the quality of your participants, the ability of your facilitator and some luck.

One method developed to solve this problem is called grounded theory. This method involves open investigation, where interviews over a period of time yield patterns. These patterns form the basis of a hypothesis that can then be tested with further research. With this method you continue until patterns begin to repeat. There is therefore no specific number of users, but the sample is likely to increase with the complexity of the subject.

Jakob Nielsen advocates testing with 3-5 users, as his statistical model indicates that 5 users will uncover roughly 85% of the usability problems in a typical test. This is highly contingent upon the subject and audience for the product being tested.
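
The model Nielsen published with Tom Landauer is usually written as found(n) = 1 - (1 - L)^n, where L is the share of problems a single user reveals. The sketch below assumes L = 31%, the average Nielsen reports across his projects; your own L may differ considerably, and the function name `share_found` is made up for this example.

```python
def share_found(n_users: int, p_single: float = 0.31) -> float:
    """Expected share of usability problems seen at least once by n users.

    p_single is the assumed chance that any one user reveals a given problem.
    """
    return 1 - (1 - p_single) ** n_users


for n in (1, 3, 5, 10, 15):
    print(n, f"{share_found(n):.0%}")
```

With these assumptions five users reveal about 84% of the problems, and each additional user adds less than the one before, which is the reasoning behind running several small tests rather than one large one.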

As a rule of thumb, it is usually recommended to test with 20 users when collecting quantitative usability metrics, which gives a margin of error of +/- 19%.
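
The 19% figure can be reproduced with a standard margin-of-error calculation, z * sd / sqrt(n). This is a sketch under two assumptions that the text above does not state: a standard deviation of about 52% of the mean, and a 90% confidence level (z = 1.645). The function name `relative_margin` is hypothetical.

```python
from math import sqrt


def relative_margin(n_users: int, rel_sd: float = 0.52, z: float = 1.645) -> float:
    """Margin of error as a fraction of the mean.

    rel_sd (standard deviation relative to the mean) and z (90% confidence)
    are assumed values, not figures given in the article.
    """
    return z * rel_sd / sqrt(n_users)


print(f"{relative_margin(20):.0%}")  # 19%
```

Because the margin shrinks with the square root of the sample size, halving it would take roughly four times as many users.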

When one person can be more important than millions; n=1
It may surprise you to hear that the observation of one person can overturn the opinions of the majority. For example, the dominant logic for some 1500 years was that the earth was the centre of the universe (geocentrism). The evidence seemed to support this hypothesis and the majority of people adhered to this received logic. It wasn't until one person, Copernicus, published De revolutionibus orbium coelestium in 1543 that the old model was overturned. This is called falsifiability: a claim can be proved wrong by a single observation.

We could envisage a scenario in which a quant study of 100,000 people found that users thought a bank's website was secure. However, it would only take one person discovering a security flaw to overturn the opinions of everyone else.

Discovering the unknowns
Quantitative methods aim to derive insights from statistics, so they can only infer from the questions people were asked. This limits the insights to the data set, whose basis is a fixed set of questions. Qualitative research, by contrast, can uncover unknowns, as the researcher is free to explore anything a participant may wish to speak about. In short, quantitative is limited and qualitative is unlimited.

What it feels like
Qualitative research gives us a very accurate picture of people's EXPERIENCE. After all, people understand life in the form of an emotional narrative. Our first and foremost reaction to the outside world is emotional, not rational. Therefore, the most compelling insights are those that understand these drives.

For example, a quant study may tell us that 70% of people are able to purchase an application via the Nokia Ovi store. This might seem, on the surface, to be a complete success. However, until we talk to some of those people we don’t know whether they struggled through every step (a white knuckle experience) and found the whole process tedious or even horrible. If so, they will switch to a more pleasant service as soon as they can.

The advantages of qualitative research
In summary, here is a list of the main advantages of using qualitative research:

  • Understanding the ‘WHY’ 
  • Building empathy 
  • Understanding users' mental models
  • Sharing in people’s experiences
  • Learning the unknown
  • Making leaps into areas we hadn’t envisaged
  • Context of use - the real world conditions where the product must operate

Quantitative and qualitative in harmony
As we have established, there are strengths and weaknesses to both research methodologies. It is, therefore, wise to offset the weaknesses of one with the strengths of the other. Properly implemented, each method can work together to give you insights that form a complete picture.

One theoretical scenario would be to use qualitative interviews with users to uncover needs, behaviours, goals, mental models and their emotional experiences in connection with a particular product. To ascertain if these aspects were borne out in the population as a whole, it would be prudent to run a larger study with a statistically relevant sample of the population that would 'qualify' the smaller sample.
