Assignment Task
Question 1
Imagine that you work for an energy tech company called gSUNSolar, which sells and installs solar panels for homeowners. The recent rise in oil and gas prices has increased interest in renewable energy sources, and gSUNSolar wants to make the most of this situation.
To this end, the CEO decided to run the following two types of ad campaign on Facebook:
1) ‘No targeting’: do not target any specific user on Facebook
2) ‘Targeting’: target users who are interested in one of the following topics (which is provided by Facebook):
- “Vegan food”
- “Saving on energy bills”
- “Saving the planet”
- “Luxury products”
- “Green energy”
The CEO of gSUNSolar is now done with the campaign and is asking you to evaluate the results. In particular, the CEO would like to understand the following:
A) Did the ‘Targeting’ campaign generate more shares (compared to the ‘No targeting’ campaign)? (explain)
B) Which topic(s) should be considered for targeting users in future campaigns (that potentially create more shares)?
To this end, you are provided with a sample of Facebook campaign results, containing around 100,000 users (around 40% were in the ‘Targeting’ campaign and the rest in the ‘No targeting’ campaign; see the ‘gSUNSolar_a.csv’ dataset). The dataset includes the following variables:
| Variable Name | Description |
| id | ID of the user (anonymized by Facebook) |
| share | number of times the user shared the ad |
| target | “yes” if the user was in the ‘Targeting’ campaign and “no” otherwise |
| topic | the topic that the user was interested in: a = “Vegan food”; b = “Saving on energy bills”; c = “Saving the planet”; d = “Luxury products”; e = “Green energy” |
| device | the device type of the user |
| distance | the distance of the user from the location of the company (in miles; based on the user’s IP address) |
| browsing_time | time (in minutes) spent on the ad |
Use the provided dataset (i.e., ‘gSUNSolar_a.csv’) and run a linear model that helps you address the CEO’s questions.
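As a starting point only, a linear model of this kind could be set up in R roughly as follows. This is a minimal sketch on synthetic data, not a model answer: the simulated values, the object names `ads`, `fit_a` and `fit_b`, the choice of `browsing_time` as the only control, and the decision to compare topics among targeted users only are all assumptions made for illustration.

```r
# Synthetic stand-in for 'gSUNSolar_a.csv'; all values are made up for illustration.
# With the real file one would start from: ads <- read.csv("gSUNSolar_a.csv")
set.seed(1)
n <- 1000
ads <- data.frame(
  target        = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.6, 0.4))),
  topic         = factor(sample(c("a", "b", "c", "d", "e"), n, replace = TRUE)),
  browsing_time = runif(n, 0, 10)
)
ads$share <- rpois(n, lambda = 1 + 0.5 * (ads$target == "yes"))

# Question A: difference in shares between campaigns, holding browsing time constant.
# "no" is the reference level, so the coefficient 'targetyes' is the campaign effect.
fit_a <- lm(share ~ target + browsing_time, data = ads)

# Question B (assumption: topics are compared among targeted users only).
# Topic "a" is the reference level; other topic coefficients are differences from it.
fit_b <- lm(share ~ topic + browsing_time, data = subset(ads, target == "yes"))

summary(fit_a)$coefficients
summary(fit_b)$coefficients
```

On the real data, the sign and significance of `targetyes` would speak to question A, and the topic coefficients (relative to the reference topic) to question B.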
In addition to answering the questions raised by the CEO, you conducted a similar analysis (as above) to investigate:
C) Did the ‘Targeting’ campaign generate more profit than the ‘No targeting’ campaign? Discuss ‘why’ based on your results (10 points).
To investigate the question in part C, you use the same dataset as above and replace the variable ‘share’ with two other variables: ‘profit’ (i.e., expected profit in £ realized from those users that shared the ad) and ‘ad_cost’ (i.e., the money in £ that you paid to Facebook to display the ad to a user).
Notes that you should consider in your answer:
- Include your R code and its respective results in your solution.
- Make sure you clearly explain, justify, and detail all the assumptions and steps in your solution. These might include data cleaning (e.g., dropping variable(s), observation(s), changing type of variable(s), etc.) or any other assumptions or steps.
- Carefully and completely interpret your results (including all your coefficients).
- Critically evaluate the implications (based on all your results) for gSUNSolar. Make sure that you use specific and concrete examples in your solution.
- The only part not counted in your word limit is your output from RStudio’s console. Everything else (e.g., your code, tables, and words in your figures) counts.
Question 2
An apparel retailer would like to understand its current customers better. To this end, the retailer is asking you to identify and group its existing customers into meaningful clusters, such that individuals within a cluster are similar to each other but different from individuals in other clusters.
To this end, you put together a dataset (i.e., ‘Retailer.csv’) including the following list of variables:
| Variable Name | Description |
| ID | ID of the customer |
| service.sat | expressed satisfaction with the retailer (0–100, where 0 is extremely dissatisfied) |
| sustainable | the number of previous ‘sustainable products’ that the customer purchased |
| male | “yes” if the customer is male, “no” otherwise |
| rent | “no” if the customer owns a house, “yes” otherwise |
| income | the income of the respective customer (in £) |
| child | number of children in the customer’s family |
| referral | the number of other customers that the focal customer referred to the retailer |
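One possible way to set up such a grouping in R is sketched below, purely as an illustration. It uses k-means on standardised numeric variables, which is only one of several suitable methods. The data are synthetic, the names `customers`, `z`, `wss`, `km` and `profile` are hypothetical, the binary variables ‘male’ and ‘rent’ are left out for simplicity, and k = 3 is an arbitrary choice rather than a recommendation.

```r
# Synthetic stand-in for 'Retailer.csv'; all values are made up for illustration.
# With the real file one would start from: customers <- read.csv("Retailer.csv")
set.seed(42)
n <- 300
customers <- data.frame(
  service.sat = runif(n, 0, 100),
  sustainable = rpois(n, 3),
  income      = rnorm(n, 40000, 12000),
  child       = rpois(n, 1),
  referral    = rpois(n, 2)
)

# Standardise so that income (in £) does not dominate the distance calculation.
z <- scale(customers)

# Total within-cluster sum of squares for k = 1..8 (input to an "elbow" plot).
wss <- sapply(1:8, function(k) kmeans(z, centers = k, nstart = 20)$tot.withinss)

# Illustrative 3-cluster solution (k = 3 is arbitrary here).
km <- kmeans(z, centers = 3, nstart = 20)

# Profile the clusters on the original (unscaled) variables.
profile <- aggregate(customers, by = list(cluster = km$cluster), FUN = mean)
profile
```

Plotting `wss` against k (for example with `plot(1:8, wss, type = "b")`) is one common way to support a decision on the number of clusters; the cluster means in `profile` are what one would then describe and interpret.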
Notes that you should consider in your answer:
- Based on the structure and the information in the dataset, apply your suggested method using R. Include your R code and its respective results in your solution.
- Make sure you clearly explain, justify, and detail all the assumptions and steps in your solution. These might include data cleaning (e.g., dropping variable(s), observation(s), changing type of variable(s), etc.), your decision (and justification why!) on the number of clusters, or any other assumptions or steps.
- Carefully and completely interpret your results. Your answer should cover but not be limited to explaining why the final solution is appropriate, describing the characteristics of the clusters, and discussing managerial implications.
- Critically evaluate the implications (based on your results). Make sure that you use specific and concrete examples in your solution.
- The only part not counted in your word limit is your output from RStudio’s console. Everything else (e.g., your code, tables, and words in your figures) counts.
Question 3
Assume you are the CEO of a real estate company. You would like to gain more insights about the market.
| Variable Name | Description |
| ID | ID of the property |
| price | price of the property in £ |
| type | type of the property |
| bedrooms | number of bedrooms the property has |
| bathrooms | number of bathrooms the property has |
| area_a | area of the property (in m²) |
| area_b | area of the property (in ft²) |
| furnished | the furnishing situation of the property |
| level | the level (floor) on which the property is located |
| price_b | price of the property converted from £ into $ |
| payment_option | the payment option under which the property is available for purchase |
| delivery_term | the delivery situation of the property |
Based on the structure and the information in the dataset:
A) Find the correlation between ‘price’ and ‘area_a’ and interpret it. Find the correlation between ‘area_a’ and ‘area_b’ and interpret it.
B) Suggest a tree-based model that allows you to understand which house features affect the property’s price (in £). Apply your suggested method (using R) and explain your results.
C) Evaluate the performance of your model in B (note that you are not required to split your dataset into train and test).
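The three tasks above could be approached in R along the following lines. This is a hedged sketch on synthetic data, not a reference solution: the simulated values, the property types "flat" and "house", and the object names (`homes`, `tree`, `rmse`, `r2`) are assumptions for illustration. The regression tree uses `rpart`, which is one possible tree-based method, and performance is measured in-sample because no train/test split is required.

```r
library(rpart)  # recommended package shipped with standard R installations

# Synthetic stand-in for the property dataset; all values are made up.
set.seed(7)
n <- 500
homes <- data.frame(
  area_a   = runif(n, 30, 250),                       # area in m²
  bedrooms = sample(1:5, n, replace = TRUE),
  type     = factor(sample(c("flat", "house"), n, replace = TRUE))  # hypothetical levels
)
homes$area_b <- homes$area_a * 10.7639                # same area in ft² (1 m² is about 10.7639 ft²)
homes$price  <- 2000 * homes$area_a + 15000 * homes$bedrooms + rnorm(n, 0, 20000)

# Correlations: price vs. area, and the two area variables against each other.
cor_price_area <- cor(homes$price, homes$area_a)
cor_areas      <- cor(homes$area_a, homes$area_b)     # pure unit conversion, so this is 1

# Regression tree for price; area_b is left out because it duplicates area_a.
tree <- rpart(price ~ area_a + bedrooms + type, data = homes, method = "anova")

# In-sample performance: root mean squared error and R-squared.
pred <- predict(tree, newdata = homes)
rmse <- sqrt(mean((homes$price - pred)^2))
r2   <- 1 - sum((homes$price - pred)^2) / sum((homes$price - mean(homes$price))^2)

tree$variable.importance
```

A correlation of 1 between two variables signals that one of them carries no additional information, which is a typical justification for dropping it during data cleaning.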
Notes that you should consider in your answer:
- Include your R code and its respective results in your solution.
- Make sure you clearly explain, justify, and detail all the assumptions and steps in your solution. These might include data cleaning (e.g., dropping variable(s), observation(s), changing type of variable(s), etc.) or any other assumptions or steps.
- Carefully and completely interpret your results.
- Critically evaluate the implications (based on all your results). Make sure that you use specific and concrete examples in your solution.
- The only part not counted in your word limit is your output from RStudio’s console. Everything else (e.g., your code, tables, and words in your figures) counts.
