I walk a lot and wear a Fitbit most days. I got interested in seeing trends in my walking and in whether my iPhone recorded similar measurements. I downloaded the data from Fitbit’s data portal and from my iPhone’s step counter and took a look.
The first thing I noticed was how similar the distances measured by the iPhone and the Fitbit were. Note: In this plot I have removed all days when I wasn’t wearing the Fitbit.
You can see this more clearly if you plot the distances recorded by the two devices against each other. The red line shows the one-to-one line. In this plot, you can also see a group of Fitbit points at 0 miles. Those are the days when either the battery ran out or I forgot to wear the Fitbit. The close agreement is interesting because the Fitbit has only a step counter, while the iPhone can access GPS information. It might be interesting to look at the distance per step of each device.
I also found that each device has a very similar relationship between steps and distance (set by my height when I set up the devices). The Fitbit seems to have a floor, with some points rising above this floor (likely when it thinks I am running). The iPhone, on the other hand, has a lot more scatter.
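As a minimal sketch of the distance-per-step comparison, the stride length for each day is the distance divided by the step count, and the floor is the smallest stride seen. The arrays below are made-up values for illustration, not the exported data, and the 2% margin used to flag days above the floor is an arbitrary assumption.

```python
import numpy as np

# Made-up daily totals for illustration only (not real Fitbit or iPhone data)
steps = np.array([4000, 8000, 12000, 10000])
distance_miles = np.array([1.8, 3.6, 5.4, 5.0])

# Distance per step, converted from miles to feet
stride_feet = distance_miles * 5280 / steps

# The "floor" is the shortest stride seen on any day
floor_feet = stride_feet.min()

# Flag days more than 2% above the floor (the 2% margin is an arbitrary choice)
above_floor = stride_feet > floor_feet * 1.02

print(stride_feet)
print(above_floor)
```

With real data, the same calculation on each device's steps and distance columns would show whether the stride is fixed or varies from day to day.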
For the Fitbit data, I also downloaded stair and calorie data. The first thing I noticed is that I apparently have climbed a lot more stairs recently.
Each day has two measurements of calories, one just from activities, and the other from everything.
The algorithm that Fitbit uses to figure out calorie burning isn’t public, so I tried to figure it out using a linear regression. I used pandas and statsmodels in Python for the analysis. I first set out to fit the activity calories. A plain linear regression on the minutes spent at each activity level is enough here, because it captures more than 99% of the variance.
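import statsmodels.api as sm

# fitbit_data is a pandas DataFrame of the Fitbit export, one row per day
model = sm.OLS(fitbit_data['Activity Calories'],
               fitbit_data[['Minutes Lightly Active',
                            'Minutes Fairly Active',
                            'Minutes Very Active']])
results = model.fit()
print(results.summary())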
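One caveat when reading the variance explained: the regression above has no constant column, and in that case statsmodels reports the uncentered R-squared, which is never lower than the usual centered one. The sketch below shows the difference on synthetic data. It uses NumPy's least-squares solver in place of statsmodels, and the calorie rates and noise level are made-up assumptions, not values from Fitbit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 200

# Synthetic minutes per day at each activity level (not real Fitbit data)
minutes = rng.uniform(0, 120, size=(n_days, 3))
true_rates = np.array([4.0, 7.0, 10.0])  # hypothetical calories per minute
calories = minutes @ true_rates + rng.normal(0, 5, size=n_days)

# Least squares with no intercept column, so the fit goes through zero
coef, *_ = np.linalg.lstsq(minutes, calories, rcond=None)
residuals = calories - minutes @ coef
ssr = residuals @ residuals

# Uncentered R^2: compares residuals to the raw sum of squares of y
r2_uncentered = 1 - ssr / (calories @ calories)
# Centered R^2: compares residuals to the variation of y around its mean
r2_centered = 1 - ssr / np.sum((calories - calories.mean()) ** 2)

print(coef)
print(r2_uncentered, r2_centered)
```

The fitted coefficients read directly as calories per minute at each activity level, which is what makes the no-intercept form convenient.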
Predicting the total calories is a bit trickier. There is a floor from the basal metabolic rate set by my height, weight, and age. After I subtracted out the minimum value, I was able to fit another simple linear regression. This one explains roughly 98% of the variance in the data.
# Subtract the minimum daily total as a stand-in for the basal metabolic floor
baseline = fitbit_data['Calories Burned'].min()
model = sm.OLS(fitbit_data['Calories Burned'] - baseline,
               fitbit_data[['Steps', 'Minutes Lightly Active',
                            'Minutes Fairly Active',
                            'Minutes Very Active']])
results = model.fit()
print(results.summary())
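An alternative to subtracting the minimum by hand is to let the regression estimate the floor as an intercept. This is a different technique from the one used above, sketched here on synthetic data with NumPy's least-squares solver. The baseline of 1600 calories and the per-step and per-minute rates are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 200

# Synthetic predictors (not real Fitbit data)
steps = rng.uniform(2000, 15000, size=n_days)
active_minutes = rng.uniform(0, 120, size=(n_days, 3))
X = np.column_stack([steps, active_minutes])

baseline = 1600.0                            # hypothetical resting calories per day
true_coef = np.array([0.02, 3.0, 6.0, 9.0])  # hypothetical rates
total = baseline + X @ true_coef + rng.normal(0, 10, size=n_days)

# A leading column of ones lets the fit estimate the baseline as an intercept
X_const = np.column_stack([np.ones(n_days), X])
coef, *_ = np.linalg.lstsq(X_const, total, rcond=None)
intercept, slopes = coef[0], coef[1:]

print(intercept)
print(slopes)
```

In statsmodels the same idea would use `sm.add_constant` on the predictor columns before calling `sm.OLS`.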
One interesting thing to notice is that I didn’t use the number of floors climbed in this regression. It seems like Fitbit does not use the floor data for calories. Another thing to note is that almost all of my activities include walking. I don’t know if Fitbit assigns different calories to different activities, and someone who has a variety of activities may not be able to predict their calories with as much certainty.