Bellabeat
Google Data Analytics Professional Certificate Case Study Using R
By Jennifer Morris
Introduction:
Thank you for taking the time to review my Bellabeat data analysis case study! In this case study, I perform the real-world tasks of a professional data analyst, following the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.
About the Company:
Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.
By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company's own e-commerce channel on its website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.
Scenario of the Study:
You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy.
Business Task:
Bellabeat is looking for growth opportunities by reviewing data from open, publicly available datasets. This analysis focuses on accuracy as it relates to the Bellabeat app. The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to the company's line of smart wellness products.
This analysis examines the accuracy of two comparable devices, the Apple Watch and the Fitbit. Is the target customer concerned with the degree of accuracy when making a purchasing decision? Should Bellabeat invest in additional technology to give itself a boost in the global market?
Preparing the Data
Sršen encourages you to use public data that explores smart device users’ daily habits. She points you to a specific data set:
FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius):
https://www.kaggle.com/datasets/arashnic/fitbit
This dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors and preferences.
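The recommended dataset does not pass the ROCCC test (Reliable, Original, Comprehensive, Current, Cited): it covers only thirty users and dates from 2016, and Fitbit has since discontinued the trackers that were available then. To provide a clear, accurate analysis, another data source is required.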
For analysis I will use the 2020 Apple Watch and Fitbit Data from Harvard Dataverse. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZS2Z2J
Description:
Objectives: There is considerable promise for using commercial wearable devices for measuring physical activity at the population level. The objective of this study was to examine whether commercial wearable devices could accurately predict lying, sitting, and different physical activity intensity in a lab based protocol.
Methods: We recruited a convenience sample of 46 participants (26 women) to wear three devices, a GENEActiv, and Apple Watch Series 2, a Fitbit Charge HR2. Participants completed a 65-minute protocol with 40-minutes of total treadmill time and 25-minutes of sitting or lying time. Indirect calorimetry was used to measure energy expenditure. The outcome variable for the study was the activity class; lying, sitting, walking self-paced, 3 METS, 5 METS, and 7 METS. Minute-by-minute heart rate, steps, distance, and calories from Apple Watch and Fitbit were included in four different machine learning models.
Results: Our analysis dataset included 3656 and 2608 minutes of Apple Watch and Fitbit data, respectively. We test decision trees, support vector machines, random forest, and rotation forest models. Rotation forest models had the highest classification accuracies at 82.6% for Apple Watch and 89.3% for Fitbit. Classification accuracies for Apple Watch data ranged from 72.5% for sitting to 89.0% for 7 METS. For Fitbit, accuracies varied between 86.2 for sitting to 92.6% for 7 METS.
Conclusion: This study demonstrated that commercial wearable devices, Apple Watch and Fitbit, were able to predict physical activity type with a reasonable accuracy. The results support the use of minute by minute data from Apple Watch and Fitbit combined machine learning approaches for scalable physical activity type classification at the population level.
Because both devices are widely used consumer wearables, this dataset offers insight into the kind of tracking data the target customer relies on. I do not foresee issues with this dataset.
The following steps have been taken:
- Downloaded the dataset from Harvard Dataverse.
- Saved it to the local computer in an appropriate folder.
- Confirmed that the download is a CSV file in long format.
Process: Cleaning the Dataset
# Installing packages (only needed once per machine):
install.packages("tidyverse")
install.packages("lubridate")
install.packages("dplyr")
install.packages("here")
install.packages("skimr")
install.packages("janitor")
install.packages("readr")
install.packages("tidyr")
install.packages("ggplot2")
install.packages("openair")
install.packages("psych")
install.packages("DataExplorer")  # package names are case-sensitive
install.packages("VennDiagram")   # package names are case-sensitive
install.packages("formattable")
install.packages("ggpubr")        # provides ggbarplot(), used for the bar chart
# Loading packages:
library(tidyverse)  # attaches dplyr, readr, tidyr, and ggplot2, among others
library(lubridate)
library(dplyr)
library(here)
library(skimr)
library(janitor)
library(readr)
library(tidyr)
library(ggplot2)
library(openair)
library(psych)
library(ggpubr)
Importing Activity Dataset
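# Read the Harvard Dataverse CSV; the path is relative to the working directory
aw_fb_data <- read_csv("Bellabeat Case Study/aw_fb_data.csv")
# Preview the first rows, the column names, and the column types
head(aw_fb_data)
colnames(aw_fb_data)
str(aw_fb_data)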
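Before cleaning, it can help to confirm that the columns the analysis relies on actually arrived in the import. The following is a minimal sketch in base R, not part of the original analysis: check_columns() is a hypothetical helper, and the toy data frame with invented values stands in for the imported file.

```r
# Hypothetical helper: stop early if any required column is missing.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data frame standing in for the imported file (values are invented).
toy <- data.frame(device = "fitbit", activity = "Lying", hear_rate = 70)

# hear_rate follows the dataset's own spelling of the heart rate column.
required <- c("device", "activity", "hear_rate")
ok <- check_columns(toy, required)
```

On the real data, the same call would take the imported data frame in place of toy.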
Cleaning:
1. Observe: Height is in cm
2. Observe: Weight is in kg
3. Create a unique ID by merging the Height column (E) and the Weight column (F)
a. Excel: =CONCATENATE(@E:E,@F:F)
4. Observe: 20 men = 1 in the gender column
5. Observe: 26 women = 0 in the gender column
6. Column O, norm_heart, is the difference between the heart rate (column H, hear_rate) and the baseline resting heart rate (column M).
7. Column P: The Karvonen formula gives a target heart rate as the heart rate reserve (HRR) multiplied by the percentage of intensity, plus the resting heart rate (RHR). For example, a 50-year-old with a resting heart rate of 65 would calculate as follows: 220 - 50 = 170 for HRmax, and 170 - 65 = 105 for HRR.
8. Entropy: in this dataset, entropy most likely refers to information (Shannon) entropy, a measure of how variable or unpredictable a signal such as heart rate is, rather than the thermodynamic quantity of the same name.
9. Apple vs Fitbit
a. Lying
b. Sitting
c. Self Pace Walk
d. Running 3 METs
e. Running 5 METs
f. Running 7 METs
g. Heart rate
h. Calories
10. No null, duplicate, or missing records
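Several of the cleaning steps above can also be carried out in R rather than in Excel. The following is a minimal sketch in base R under stated assumptions: the toy data frame and its values are invented for illustration, the column names height, weight, and resting_heart are assumed, karvonen_target() is a hypothetical helper, and the ID step assumes that no two participants share the same height and weight.

```r
# Toy rows standing in for aw_fb_data; values are invented for illustration.
toy <- data.frame(
  age           = c(50, 50, 30),
  gender        = c(1, 1, 0),
  height        = c(170, 170, 160),
  weight        = c(65.4, 65.4, 55.0),
  hear_rate     = c(128, 90, 110),  # spelling follows the dataset's column name
  resting_heart = c(65, 65, 60)
)

# Step 3: build a participant ID from height and weight (the R analogue of
# CONCATENATE). Assumption: height plus weight is unique per participant.
toy$participant_id <- paste(toy$height, toy$weight, sep = "_")
n_participants <- length(unique(toy$participant_id))

# Step 6: heart rate above the resting baseline.
toy$norm_heart <- toy$hear_rate - toy$resting_heart

# Step 7: Karvonen target heart rate; intensity is a fraction between 0 and 1.
karvonen_target <- function(age, resting_hr, intensity) {
  hr_max <- 220 - age         # estimated maximum heart rate
  hrr <- hr_max - resting_hr  # heart rate reserve
  hrr * intensity + resting_hr
}
target_60 <- karvonen_target(age = 50, resting_hr = 65, intensity = 0.60)

# Step 10: count missing values and fully duplicated rows.
n_missing <- sum(is.na(toy))
n_duplicated <- sum(duplicated(toy))
```

For the 50-year-old in step 7, a 60% intensity gives 105 x 0.60 + 65 = 128 beats per minute.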
Summarizing the Dataset
In this study, 46 participants each wore multiple devices while performing various activities so that device accuracy could be compared. Data from both devices predicted the activity class (lying, sitting, self-paced walking, and running at 3 METs, 5 METs, and 7 METs) with reasonable accuracy.
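A numeric summary by device and activity is one way to see how the two devices' readings compare. The following is a minimal sketch in base R, so it runs without extra packages: the toy values and the device labels are invented, and comparing mean heart rates describes the readings only; it does not reproduce the study's classification accuracy.

```r
# Toy per-minute readings standing in for aw_fb_data (values are invented).
toy <- data.frame(
  device    = c("apple watch", "apple watch", "fitbit", "fitbit",
                "apple watch", "fitbit"),
  activity  = c("Lying", "Lying", "Lying", "Lying",
                "Running 7 METs", "Running 7 METs"),
  hear_rate = c(70, 74, 68, 72, 150, 146)
)

# Mean heart rate for every device and activity combination.
hr_summary <- aggregate(hear_rate ~ device + activity, data = toy, FUN = mean)
names(hr_summary)[3] <- "mean_hear_rate"

print(hr_summary)
```

The same aggregate() call should work on the full dataset, provided hear_rate is numeric.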
Data Visualization
library(ggpubr)  # provides ggbarplot()
# Treat device and activity as categorical variables
aw_fb_data$device <- as.factor(aw_fb_data$device)
aw_fb_data$activity <- as.factor(aw_fb_data$activity)
# hear_rate (the dataset's spelling) must stay numeric so that the mean and SD can be computed
aw_fb_data$hear_rate <- as.numeric(aw_fb_data$hear_rate)
# Mean heart rate per activity with SD error bars, one bar per device, side by side
ggbarplot(data = aw_fb_data, x = "activity", y = "hear_rate", fill = "device", add = "mean_sd", position = position_dodge(0.8), width = 0.5)
Conclusion
These data were collected in a controlled lab environment in which participants performed prescribed activities so that device accuracy could be measured. Fitbit data predicted activity type more accurately (89.3% versus 82.6% for Apple Watch with rotation forest models), yet Fitbit is not as popular as the Apple Watch brand. According to AppleInsider, 30% of iPhone users own an Apple Watch, and 150 million watches had been sold globally as of Q2 2022. (Gallagher)
According to Statista, 111 million Fitbits have been registered for use and 127 million have been sold.
In 2022, Apple was the leading wearables vendor, occupying 29.7 percent of the market. Prior to Apple becoming the leading wearable devices vendor in 2017, Fitbit was the market leader with market shares close to 40 percent. (Laricchia)
Because Bellabeat caters to a niche market of women, the company should gather and examine internal data from its own customers rather than rely on publicly sourced information. It should identify its ideal customer and target that customer in its marketing plan.
This study suggests that consumers are more concerned with the brand of a wearable than with its accuracy and technology.
Bellabeat has the opportunity to grow and break into the global market by telling its brand story.
Citations:
(Dataset)
Fuller, Daniel, 2020, "Replication Data for: Using machine learning methods to predict physical activity types with Apple Watch and Fitbit data using indirect calorimetry as the criterion.", https://doi.org/10.7910/DVN/ZS2Z2J, Harvard Dataverse, V1
Fitbit products:
https://en.wikipedia.org/wiki/List_of_Fitbit_products