BIG DATA: TYRANT OR TOOL? / ILLUME Advising, LLC

The energy industry is on the cusp of a big data revolution. More than 50% of households have a smart meter, and some regions are nearing 100%. Smart meter deployments are projected to reach 70 million in 2016 and 90 million by 2020. Even without Advanced Metering Infrastructure (AMI) data, there have been growing efforts to use existing data to build models for micro-targeting and segmentation. These efforts offer benefits to customers. However, examples of big data use in other industries suggest that we should think critically about how we apply big data modeling. We should also take steps to ensure its applications in the energy space are fair to all customers.

In her recent book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, author and data scientist Cathy O’Neil argues that many commonplace applications of big data modeling disproportionately penalize particular segments of the population, including the incarcerated, job seekers, and people with poor credit. Through these and other examples, O’Neil identifies three characteristics of big data models that threaten equality: opacity, scale, and capacity for damage.

Opaque models combine data from many sources into an algorithm that the people it affects cannot access, and the algorithm itself is often deemed proprietary. In human resources, for example, this includes models that predict how suitable applicants are for a job opening. Job seekers are typically unaware of the factors the algorithm uses. If they are rejected, they are not told why or what they could do to improve their prospects in the future. In a recursive twist, these models often include credit rating as an input, and a credit rating is itself the output of a model.

These algorithms are increasingly packaged into software applications and made widely available at relatively low cost. As a result, more and more companies use them to sift through the large number of applications received for each job opening in some industries. This effectively institutionalizes a single approach to job screening on a large scale. And because the approach is based on data, it carries a sheen of objectivity and authority.

Job seekers may be rejected from multiple jobs without any feedback as to why. As finding a job becomes more difficult, their credit ratings suffer, creating a vicious cycle for the job seeker. Furthermore, the job screening model never has a chance to learn. If a rejected job seeker finds a job elsewhere and succeeds in that position, the model is never recalibrated based on that data point.

This is one of many examples offered in the book. A model may sometimes outperform human judgment. Even so, we need to remember that humans program the models, interpret their output, and are subject to their results. Assuming that a model is better than human judgment is no substitute for critically examining its inputs and its validity.

I challenge all of us working with data in the energy space to view our models critically. Are we unfairly leaving some customers out of energy efficiency program targeting because those customers have not typically participated in the past? Low past participation may reflect how programs were communicated rather than a lack of interest. Some customers would benefit from energy efficiency upgrades paid for through on-bill financing. Are those customers being denied that option because of their credit history? Customers who have trouble paying their bills may be the ones who would benefit most from energy efficiency.

In our planning and implementation of big data projects, whatever the field, mathematicians and data scientists must continually use an empathetic lens to question each metric: “Are there other possible explanations for the patterns we are seeing?” “Are these metrics amenable to change?” “What are the long-term ramifications of this application of big data?” “Can these data models be paired with other forms of data gathering to create a more complete picture?”

As Dr. Liz Kelley recently argued in her blog post on the March for Science, building cross-disciplinary teams can help rein in the tyranny of runaway big data. Such teams should represent a diversity of thought, experience, and training. They can also help relegate big data to its proper place as a tool to be leveraged for greater understanding. With careful thought leadership, the energy industry can embrace the big data revolution, improve the customer experience, and reduce energy use for a more resilient energy future.