How linear regression on retail customer surveys reveals what really drives recommendations

First of all, if you do not know what linear regression is: it is a statistical model that estimates the relationship between a dependent variable (Y) and one or more independent variables (X1, X2, …). The goal is to explain how Y’s value changes as the independent variables vary, and to quantify the strength of the relationship between each independent variable and the output Y.
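For readers who prefer a formula, the model described above is usually written as follows. The symbols are standard textbook notation rather than anything specific to this article: the β values are the coefficients to estimate, k is the number of independent variables, and ε is the part of Y the model does not explain.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon
```

Each coefficient β_i is the expected change in Y for a one-unit increase in X_i, with the other variables held constant.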

In retail consumer surveys, it is very common to ask respondents to rate various aspects of the retailer they buy products or services from, for instance prices, opening hours, or accessibility.

It is also common to ask them how likely they would be to recommend this particular store or service to a friend or colleague, on a scale from 0 to 10 (0 meaning they would be very unlikely to recommend it, and 10 very likely).

This question is used to calculate what the customer success industry calls the Net Promoter Score (NPS), a metric that aims to measure how loyal consumers are to a provider.

Those who respond with a score of 9 or 10 are called Promoters, and are considered likely to exhibit value-creating behaviors, such as buying more, remaining customers for longer, and making more positive referrals to other potential customers. Those who respond with a score of 0 to 6 are labeled Detractors, and they are believed to be less likely to exhibit those value-creating behaviors. Responses of 7 and 8 are labeled Passives, and their behavior falls between that of Promoters and Detractors. The Net Promoter Score is calculated by subtracting the percentage of customers who are Detractors from the percentage of customers who are Promoters, so it ranges from -100 to 100. As a rule of thumb, an NPS above 0 is considered good, above 50 excellent, and above 70 exceptional.
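The calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `net_promoter_score` and the sample answers are made up for the example and do not come from a real survey.

```python
def net_promoter_score(scores):
    """Return the NPS (from -100 to 100) for a list of 0-10 answers."""
    if not scores:
        raise ValueError("at least one response is required")
    promoters = sum(1 for s in scores if s >= 9)   # answers of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # answers of 0 to 6
    # Passives (7 and 8) only count in the denominator.
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical answers to the "how likely are you to recommend" question.
responses = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
print(net_promoter_score(responses))
```

With these made-up answers there are 4 Promoters and 3 Detractors out of 10 respondents, so the score is 40% minus 30%.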

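You might see where we are going with this: we are going to run a linear regression to estimate the relationship between each respondent’s answer to the NPS question (our Y value, the 0-to-10 likelihood to recommend) and the ratings that respondent gave to each aspect (our X1, X2, …). Note that Y is the individual answer rather than the NPS itself, which is a single aggregate number for the whole sample. In other words, we are going to see how strongly each of the different ratings is related to the likelihood of recommending a given retailer.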

Let’s see this with an example. Say consumers are surveyed about a clothing store and rate it out of 5 on prices, staff friendliness, opening hours, product availability and product variety. They also answer the NPS question. A classic consumer survey would just show the raw results, whether consumers are satisfied or not with each aspect, and act on them. But what if we could highlight which aspect carries the most weight in the consumer’s true satisfaction? Fitting a linear regression model on the whole dataset of respondents tells us exactly that.

Our results could look like this:

Prices: 0.9
Staff friendliness: 0.6
Opening hours: 0.1
Product availability: 0.4
Product variety: 0.8

Those figures are the regression coefficients, that is, the weight each aspect carries for the given store. In our case, an increase of 1 point in the consumer’s rating of the “Prices” aspect is associated with an increase of 0.9 in the likelihood-to-recommend score, all other ratings being equal, whereas an increase of 1 point in the “Opening hours” rating is associated with an increase of only 0.1. Because every aspect is rated on the same 5-point scale, the coefficients can be compared directly: prices weigh about nine times more than opening hours for the consumer, so the store should put its focus there. Product variety would be the second criterion to focus on in order to keep improving the customer experience and get as many Promoters as possible.
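To make the mechanics concrete, here is a minimal sketch of how such coefficients could be estimated with ordinary least squares. Everything in it is an assumption made for illustration: the data is synthetic, generated from the example figures above plus random noise, the intercept is arbitrary, and the helper names `fit_ols` and `solve_linear_system` are hypothetical. It uses only the Python standard library so that it runs anywhere; on real survey data a statistics library such as statsmodels or scikit-learn would be the more usual choice.

```python
import random

ASPECTS = ["Prices", "Staff friendliness", "Opening hours",
           "Product availability", "Product variety"]
TRUE_COEFS = [0.9, 0.6, 0.1, 0.4, 0.8]  # the example figures from the text
TRUE_INTERCEPT = -1.0                    # arbitrary, for illustration only


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares via the normal equations; returns [intercept, b1, ...]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 models the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic respondents: ratings from 1 to 5, and a noisy recommendation
# score built from the example coefficients (not rounded or clipped to 0-10).
rng = random.Random(0)
ratings = [[rng.randint(1, 5) for _ in ASPECTS] for _ in range(500)]
recommend = [
    TRUE_INTERCEPT + sum(c * x for c, x in zip(TRUE_COEFS, row))
    + rng.gauss(0, 0.5)
    for row in ratings
]

intercept, *coefs = fit_ols(ratings, recommend)
for name, coef in sorted(zip(ASPECTS, coefs), key=lambda p: -p[1]):
    print(f"{name}: {coef:.2f}")
```

Since the synthetic scores were built from the example coefficients, the fit should land close to them. With real respondents the true weights are unknown, and the estimates are only as reliable as the sample size and the independence of the ratings allow.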

This linear regression method gives us insights that a traditional survey report does not. It shows which aspects of a retailer truly matter to the consumer, where the focus should be put, and how strongly each aspect is related to the consumer’s satisfaction.