Pricing House Cleaning Assignments

HCA Statistical Analysis with Practical Applications

10. HCA Pricing Formula

10.1.	To meet clients’ requirements, cleaning companies provide recurring services at agreed time intervals, often at fixed prices. The choice of interval often depends on the customer’s preferences, behavior, and financial status. The two most common intervals are weekly and biweekly. Other common intervals, such as every three weeks and monthly, were not considered in this analysis but may be examined in future studies.
10.2.	The interval between cleans can have a significant impact on the man-minutes required to clean a home, depending on the usage variables. Simply put, the longer customers have to mess up their house, the longer it generally takes the cleaners to clean it. We capture the marginal effect of frequency by taking into account the square footage of the house and the number of inhabitants. In performing our analysis, we found that adding other factors to the measurement of the marginal effect of frequency, such as the lifestyle factor and basement square footage, did not significantly improve the predictive quality of the formula for biweekly assignments.
10.3.	Similarly, we expect the total contribution of the lifestyle, floor, dusting, clutter, and pet factors, team leader efficiency, client home, and the after-4PM factor to man-minutes to depend on square footage as well. Under these assumptions, we ran regressions on two models, varying the explanatory variables according to their significance.
10.4.	The mathematical forms of models I and II are shown below; model II is essentially model I with the less significant variables removed.
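As a rough illustration of the regression structure described in 10.3 (and not a reproduction of either model), the sketch below assumes that each usage factor enters the regression multiplied by the natural logarithm of square footage, which is the form implied by the worked examples in 12.5 and 12.6, and fits the coefficients by ordinary least squares. The function names, the two-factor setup, and the toy data are hypothetical and are not the study's variables or observations.

```python
import math

def design_row(factors, sqft):
    # Hypothetical row layout: intercept, then each usage factor scaled by ln(square footage)
    log_sqft = math.log(sqft)
    return [1.0] + [f * log_sqft for f in factors]

def ols(X, y):
    # Ordinary least squares: solve the normal equations (X'X) b = X'y
    # using Gauss-Jordan elimination with partial pivoting.
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                m = A[r][col]
                A[r] = [a - m * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Toy observations (hypothetical): (lifestyle factor, pet factor, square footage)
homes = [(10, 0, 1500), (15, 5, 2400), (20, 10, 3000),
         (10, 10, 2000), (15, 0, 4000), (20, 5, 1200)]
true_b = [60.0, 1.0, 0.5]  # made-up intercept and per-point coefficients
X = [design_row([lf, pet], sqft) for lf, pet, sqft in homes]
y = [sum(b * x for b, x in zip(true_b, row)) for row in X]  # noise-free man-minutes

print([round(b, 3) for b in ols(X, y)])
```

Because the toy man-minutes are generated without noise, the fit recovers the made-up coefficients exactly; a model II style variant would simply refit after dropping the columns for the less significant factors.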

11. A Note on t-statistics and R-squared

11.1.

To provide background for readers unfamiliar with statistical concepts: Student’s t-statistics, or t-stats, which are usually presented in regression result tables, measure how confident we can be that a coefficient differs from zero for reasons other than chance. The larger the absolute value of a t-stat, the more likely it is that the coefficient genuinely contributes predictive value to the equation. For example, in model I, the t-stat of the Dusting Factor is very low compared with that of the Floor Factor. This indicates that the Floor Factor has a statistically significant effect on man-minutes, whereas the effect of the Dusting Factor cannot be distinguished from zero.
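For a single explanatory variable, the t-stat is the estimated coefficient divided by its standard error. The sketch below computes both for a simple linear regression; the two small datasets are made up to contrast a strong relationship with a weak one and are not drawn from the study.

```python
import math

def slope_and_t_stat(x, y):
    # Simple linear regression y = a + b*x; returns (b, t-stat of b)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters)
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    return b, b / se_b

x = [1, 2, 3, 4, 5]
strong = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical: y rises steadily with x
weak = [5, 3, 6, 2, 5]               # hypothetical: no clear trend

print(slope_and_t_stat(x, strong))  # large t: coefficient clearly nonzero
print(slope_and_t_stat(x, weak))    # small |t|: indistinguishable from zero
```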

11.2.

R-squared is a measure of the goodness-of-fit of the model: the share of the variation in the observed man-minutes that the model explains. The higher R-squared, the better the model fits the data.
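R-squared compares the model’s squared prediction errors with the total variation of the observed values around their mean: R² = 1 − SS_res / SS_tot. A minimal sketch, using made-up man-minute values:

```python
def r_squared(actual, predicted):
    # 1 - (residual sum of squares) / (total sum of squares around the mean)
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [120, 150, 180, 210]                   # hypothetical man-minutes
print(r_squared(actual, actual))                # 1.0: perfect fit
print(r_squared(actual, [165] * 4))             # 0.0: always predicting the mean
print(r_squared(actual, [125, 145, 185, 205]))  # about 0.978
```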

12. Discussion of the Statistical Analysis Results

12.1.

Choosing a Model: As shown, the two models do not differ much in the size of the derived coefficients or in goodness of fit. Our post-regression analysis and the HCA Pricing Tools are based on model II because, aside from yielding slightly more accurate results, it is more parsimonious and easier to apply.

12.2.

Predictive value: The “x % predictive error” term is defined as:

12.3.

Comparing the two models, the results show that model II has slightly higher predictive power than model I. Specifically, for model II, 87% of the observations have an absolute predictive error of less than 25%. To investigate this further, the graph below plots the relationship between the percent predictive error and predictive power (measured by the share of observations).
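Predictive power at a given error level is the share of observations whose absolute predictive error falls below that threshold. The sketch below assumes the percent predictive error is (predicted − actual) / actual; that formula, the function names, and the sample values are assumptions for illustration, not taken from the study.

```python
def percent_errors(actual, predicted):
    # Assumed definition: signed error relative to the actual man-minutes
    return [(p - a) / a for a, p in zip(actual, predicted)]

def predictive_power(actual, predicted, threshold):
    # Fraction of observations with |percent error| strictly below the threshold
    errors = percent_errors(actual, predicted)
    return sum(abs(e) < threshold for e in errors) / len(errors)

actual = [100, 200, 150, 120, 300]     # hypothetical observed man-minutes
predicted = [108, 170, 155, 155, 290]  # hypothetical model predictions

for t in (0.10, 0.20, 0.25, 0.30):
    print(f"{t:.0%} error: {predictive_power(actual, predicted, t):.0%} of observations")
```

Evaluating the function over a range of thresholds yields the kind of predictive power curve plotted for models I and II.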

12.4.

Comparing the two alternative models tells us two things:

12.4.1.	Model I and model II have almost the same predictive power: about 80% at 20% error and 86% at 25% error. This is not surprising, because the variables dropped in model II are both statistically and practically insignificant, and the analysis shows that dropping them from the equation has no major impact on the predictive value of model II.
12.4.2.	Predictive power improves dramatically, from about 45% to 80%, if we allow the predictive error to rise by 10 percentage points, from 10% to 20%. However, the improvement is much less noticeable in the 20% to 30% error range. Therefore, a 20% error is a good trade-off between predictive power and predictive error.
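The trade-off in 12.4.2 can be made concrete by computing the gain in predictive power per percentage point of allowed error, using the approximate figures quoted in 12.4 (45% at 10% error, 80% at 20%, 86% at 25%). The text gives no figure at 30%, so this sketch covers only the 20% to 25% part of that range; the function name is hypothetical.

```python
def marginal_gains(curve):
    # Points of predictive power gained per extra point of allowed error, per segment
    return [(p1 - p0) / (e1 - e0) for (e0, p0), (e1, p1) in zip(curve, curve[1:])]

# Approximate (error threshold %, predictive power %) pairs quoted in 12.4
curve = [(10, 45), (20, 80), (25, 86)]

for ((e0, _), (e1, _)), gain in zip(zip(curve, curve[1:]), marginal_gains(curve)):
    print(f"{e0}% -> {e1}% error: {gain:.1f} points of power per point of error")
```

The gain drops from 3.5 to 1.2 points of predictive power per point of error, which is the flattening that makes 20% a reasonable cut-off.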

12.5.

Lifestyle and Floor Factor: These two represent the most important usage factors and have a moderate effect on man-minutes. A one-unit increase in either factor adds about 0.6 times the log of the square footage in man-minutes. This means that at the median house size of 2,400 square feet, having a “sloppy” instead of a “normal” lifestyle, or a “mostly hardwood” instead of a “normal mix” floor (each change increasing its factor by 5 points), would add about 5 × log(2400) = 39 man-minutes. For a large and a small house, the differential associated with these two factors would be 43 and 35 man-minutes, respectively.
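The 2,400-square-foot example can be reproduced as follows. The sketch assumes that “log” means the natural logarithm, since that is the reading under which 5 × log(2400) comes to about 39 (a base-10 logarithm would give about 17). The multiplier of 5 is taken directly from that worked calculation; the function name is hypothetical.

```python
import math

def added_man_minutes(multiplier, sqft):
    # Extra man-minutes = multiplier x ln(square footage); grows slowly with house size
    return multiplier * math.log(sqft)

# Multiplier of 5, as in the worked calculation for a 5-point factor change
print(round(added_man_minutes(5, 2400)))  # 39 at the median house size
```

Because the effect scales with the logarithm, doubling the house from 2,400 to 4,800 square feet raises this differential by only 5 × ln 2, or about 3.5 man-minutes.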

12.6.

Pet Factor: A one-unit increase in the pet factor is associated with an increase in man-minutes of about half the logarithm of the square footage. To illustrate, having a dog instead of a non-shedding pet in a 2,400-square-foot house adds 20 man-minutes to the cleaning time.
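Read the same way, a pet-factor coefficient of about 0.5 × log(square footage) per point gives roughly 3.9 man-minutes per point at 2,400 square feet, so the 20-minute dog example implies a pet-factor gap of about 5 points between a dog and a non-shedding pet. That implied gap is back-calculated here rather than stated in the source, and the sketch again assumes the natural logarithm; the constant name is hypothetical.

```python
import math

PET_COEF = 0.5  # approximate multiplier on ln(square footage) per pet-factor point
sqft = 2400     # median house size used in the text

per_point = PET_COEF * math.log(sqft)  # man-minutes per pet-factor point
implied_gap = 20 / per_point           # points implied by the 20-minute dog example

print(f"{per_point:.2f} man-minutes per point; implied gap of {implied_gap:.1f} points")
```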

12.7.

Dusting and Clutter Factor: These two factors were found to add negligible explanatory power to the model. This is partly because the clutter and lifestyle factors are highly correlated, as explained in the previous section. Even setting aside their statistical significance, their coefficients are so small that dropping them from model II did not degrade the model’s predictive quality. In addition, these two factors are subjective to measure and their definitions are too often confused with those of the other factors, so we have dropped them from the HCA Pricing Tools, as explained below.

12.8.

Number of Team Members: The coefficient represents the marginal man-minutes required to clean when the team has fewer or more than 3 members. In terms of marginal returns, the regression results indicate a significant penalty of 33 man-minutes for performing a given clean with a 3-person team rather than a 2-person team. In contrast, the analysis showed a much smaller penalty for adding a fourth person. It is unclear why a larger difference was not found, and further analysis and more data might clarify this matter.

12.9.

Team Leader Efficiency: Our analysis did not find a statistically significant impact of Team Leader Efficiency on man-minutes. This is counter-intuitive and suggests that this factor may require further analysis. Several explanations are possible:

12.9.1.	It is possible that the team leaders in our dataset do not actually have substantial differences in their management skills and time efficiency. Although it seems unlikely, it is not out of the question, since they are all evaluated and trained to conduct their assignments in a similar fashion, and particularly in light of the test company’s practice of rotating a Quality and Training Manager among Team Leaders daily.
12.9.2.	It is possible that efficiency differences among team leaders are insignificant compared with the overall efficiency of cleaning, and that a team leader cannot influence the efficiency of the team as a whole.
12.9.3.	Less intensive training and evaluation, or a different payroll scheme, might allow for more diversity in Team Leader skills, in which case this variable might reveal some significance.
12.9.4.	An analysis of the efficiency of all team members, not just Team Leaders, might capture some significant element relating to efficiency. While this may be considered in future analysis, it was not undertaken as part of this study.

12.10.

Toilets with and without Shower in Use: It is counter-intuitive that cleaning a toilet with a shower in use and one without would take about the same amount of time, approximately 32 man-minutes, and the coefficients are suspected to be biased. This can be explained by the asymmetry of the data in terms of the number of showers in use: about half of the observations have only 1 shower in use. Consequently, we might be overestimating the coefficient on the number of toilets without showers in use. When we estimated the model using only the observations with more than 1 shower in use, the increase in man-minutes corresponding to a one-unit increase in the number of toilets without showers in use fell to about 16 man-minutes.