Machine Learning and Ethnography: An Unlikely Pair

Posted October 23, 2018 By Riana Johnson
Blog by Riana Johnson

A statewide service territory is filled with different people with different backgrounds, houses, and energy uses. They aren’t the same gender and don’t live in the same place. How can we find their connections? When we look at customers independently, they look dramatically different. We know there must be some similarities; but what are they? And how do they translate into their notions on energy efficiency? We could send out a survey and ask about their demographic information and their current energy practices, but we might miss something from this prescribed method. By combining ethnographic research with the machine learning technique of topic modeling, we make connections in the data that we may have missed using one of these tactics alone.

How do we find the deeper, contextual information?

A statistical model cannot capture the breadth of human behavior – we need another type of research method to unpack some of the omitted variable bias. Human beings are weird and irrational. That’s why we use observational research, conduct in-depth interviews, even do ethnographic research. We use these approaches to understand people’s decision-making and positioning toward energy efficiency through their words and not their demographic characteristics.

Through 22 in-home interviews and 4 focus groups, we accumulated 250,000 transcribed words.

Analysts coded the interview transcripts; themes arose, connections emerged. While some people in the sample were very obviously connected (similar personalities, backgrounds, predilections for energy efficiency and conservation), there were also people who were entirely different. These four people are some of these connectionless participants. Ronald is wealthy and loves tech but isn’t an early adopter. Raj is a techy and embraces living downtown in a major metropolitan area. Becca is incredibly independent but wants to be near family. Dana is liberal, and her family is in the military. The interviewees would not be in the same customer segment. They don’t even live in the same city.

The research process is all about layers – zooming out and zooming in. When we interviewed customers, we were broad and zoomed out. As we started the coding process, we zoomed way back in to look at the minute details of each interview. These steps set up the foundation for the research findings. Once we had this foundation, we zoomed out once more to see if there were connections we missed. At this point, we could bring in other analysts to read transcripts and think about the coding scheme. Or, we could use the machine learning technique of topic modeling to add another layer of analysis.

Can topic modeling be another set of eyes on your data?

Through topic modeling, we can read large amounts of unstructured data, in our case, text from interviews, to find the underlying connections. Unlike other machine learning techniques, it uses qualitative data and compares them across groups. (See Topic Modeling, University of Massachusetts – Amherst, MALLET Machine Learning for Language Toolkit, accessed October 18, 2018.

This model can find the latent connections between the speech patterns of individuals by looking within each document and the documents as a whole. It is a simple way to analyze large volumes of unlabeled text. It identifies a “topic” – a cluster of words that frequently occur together – and, using contextual clues, connects words with similar meanings and distinguishes between uses of words with multiple meanings. Thousands of words and meanings in the interviews can now be systematically checked for their connections. Our brains naturally do this…this model can act as another analyst pointing out connections between each customer’s speech!

This is a representation of the five topics our model found. Family was built from words like child and husband; community was built from words like god and home; DIY was built of words like saving and house.

The model placed each of the documents, or customers, in a topic with some probability based on their latent connections through speech. Each customer’s nuanced placement within the topic structure was the most interesting output from this model. Just like our analysts did when they coded the interviews and compared them to each other, the model coded and grouped each interview using their own speech patterns .

Are things becoming clearer with topic modeling as another analyst on the team?

Looking again at Ronald, Raj, Becca, and Dana, we can see if they are, in fact, connected based on what they said.

They all engage in DIY projects around their house; they all take control of their family’s finances; they are independent and have control of their spaces. Topic modeling, on top of coding and conversations, allowed us to see the connections between these people. We knew about their DIY proclivity but didn’t necessarily make the connection. These were random people more obviously connected to other interviewees. What did we pick up on as researchers? After seeing the topic modeling matrix, we saw why these people were connected.

How can ethnography + machine learning help our industry?

While this model works very well for assessing context and syntactic meaning…it’s still a computer. The importance of the interview still holds. This should not be run instead of a qualitative analysis – the context of the words/interview and the person’s physical reactions and storytelling is hugely important. This analysis should be conducted in tandem as “another analyst” on the team.

Topic modeling is another tool in the tool box. A tool that can help us tailor a more nuanced message to customers about energy efficiency. This tactic can be useful beyond looking at interview transcripts. It could be used on systematic literature reviews or across public evaluations on utility programs. Any place in our industry with large amounts of unstructured data could use topic modeling as a supplemental method.

Sometimes, we can get lost in the weeds of customer research. We can be too zoomed in and lose some of the rich contextual information about our human customers. We can also become too attached to things like demographic and consumer patterns that seem to provide all the answers. To really start to pull back some of the layers, we should recalibrate our research efforts. Starting with deep, contextual information and layering on topic modeling, the connections between our customers become clearer. Ethnography allows us to meet people where they are – topic modeling uses people’s words explicitly to find out how they connect.


For more detail and fun imagery, check out Riana’s BECC presentation.