# Run a Regression Model in Excel or RapidMiner and Answer the Following Questions

Jessica and Paul run a company that provides IT support to local nonprofits and public organizations. To better estimate their staffing needs, they have been recording data on the number of support calls received from their clients per day, as shown: https://drive.google.com/file/d/1iyD02Jy69acQryj2d…

For each day, in addition to the number of calls received, they also recorded the value of several other attributes:

ActiveContracts – the number of clients with whom they are working
AverageClientSize – the median size of their current clients (measured by # of employees)
NewSoftwareReleased – a binary variable specifying whether or not there was a new software release that day that would be relevant to their clients; a 0 indicates no new release, and a 1 indicates a new release.

Run a regression model in either Excel or RapidMiner to predict the number of calls on a given day based on the three other attributes. If using RapidMiner, be sure to change the role of Calls to label when importing, and in the Linear Regression operator, set the feature selection parameter to none. Use your regression model to answer the following questions:

1) What does the coefficient on ActiveContracts tell us? Be as specific as possible.

2) What does the coefficient on NewSoftwareReleased tell us? Be as specific as possible. (Note that it is binary; its only possible values are 0 and 1.)

3) Is AverageClientSize significant? How can you tell?

4) Using this model, predict the number of calls on a day in which there are 10 active contracts, the average client size is 25, and no new software was released.

5) Would we get a better model if we removed any of these attributes? If so, which one(s)? If not, why not?

6) Jessica and Paul suspect that calls might be higher on some days of the week than others. They have added an attribute to the dataset specifying what day of the week it is. However, a text attribute can’t be used in a regression model. What should be done to allow the day of the week to be included in the regression model along with the other attributes? How would you do it? (You do not have to actually build the model.)
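
If you want to cross-check the mechanics outside Excel or RapidMiner, the workflow above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the six rows below are invented toy data (not the assignment's dataset) built from an exact linear rule so the fit is easy to verify, and the names `predict_calls`, `encode_day`, and `DAYS` are hypothetical helpers. The day-of-week part shows one common approach, dummy (0/1) columns with one day left out as the baseline. It is not necessarily the only acceptable answer.

```python
import numpy as np

# Hypothetical toy rows (NOT the assignment's data). Columns:
# ActiveContracts, AverageClientSize, NewSoftwareReleased
X = np.array([
    [5, 20, 0],
    [8, 30, 1],
    [10, 15, 0],
    [12, 40, 1],
    [7, 35, 0],
    [15, 20, 1],
], dtype=float)

# Toy Calls values, constructed exactly as
# 2 + 3*ActiveContracts + 0.2*AverageClientSize + 4*NewSoftwareReleased
calls = np.array([21, 36, 35, 50, 30, 55], dtype=float)

# Add a column of ones so the model has an intercept, then solve
# the ordinary least-squares problem.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, calls, rcond=None)
intercept, b_contracts, b_size, b_release = coef


def predict_calls(active_contracts, average_client_size, new_software_released):
    """Plug attribute values into the fitted linear equation."""
    return float(
        intercept
        + b_contracts * active_contracts
        + b_size * average_client_size
        + b_release * new_software_released
    )


# Same inputs as question 4, but evaluated on the toy model only.
prediction = predict_calls(10, 25, 0)

# Assumed day labels; Monday is arbitrarily chosen as the baseline.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def encode_day(day, baseline="Monday"):
    """Return 0/1 dummy values for every day except the baseline."""
    return [1.0 if day == d else 0.0 for d in DAYS if d != baseline]


print(coef)
print(prediction)
print(encode_day("Wednesday"))
```

With real, noisy data the coefficients will not match a clean rule like this, and you would still need the p-values from Excel's regression output or RapidMiner's results table to judge significance; this sketch does not compute them.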