I've been wanting to write about this subject for a while, and when I saw a study last week presenting the revelation that the placement of activity trackers affects step tracking accuracy as some kind of amazing breakthrough, I decided it was time to share my two cents on these studies.
Now don't get me wrong, studies that aim to find out whether fitness-focused wearables are useful and accurate are important. They support a lot of what we already know at Wareable: most of these devices are not without their flaws. Things are getting better, sure, but there's still a lot of work to do, and tech companies need to keep striving to improve.
My real problem is that it's all too easy to pick out the problems in a lot of the studies we've covered, and that raises alarm bells about whether they're really telling us anything of great value or insight.
The first big issue is always the choice of devices used and the data these studies seek to draw from them. Take the study by the Journal of the American Medical Association that wanted to determine whether using a fitness tracker could help people lose weight. It came to the conclusion that fitness trackers did not help its test subjects keep off the weight.
In the study, 800 participants were split into four groups: control (no tracker); Fitbit (given a Fitbit Zip to track steps); charity (financial incentives given to charity); and cash (participants given financial rewards). A Fitbit Zip? Seriously? This is the most basic of Fitbit's trackers, and it would have made far more sense to use one of the company's more recent, capable devices. Even a Fitbit spokesperson suggested that the study was undermined by not drawing on more up-to-date data.
That tendency to use older devices, some of which are no longer even available, is something that continually crops up. I've seen the Microsoft Band (first gen, not second gen), BodyMedia, Mio Alpha 2 and the recalled Basis Peak pop up in these studies, but how can devices launched a few years ago really produce the kind of information that represents newer, more sophisticated products? There is the obvious problem that these studies require a long testing and analysis period, whether that's six months, a year, maybe two years, which will no doubt have influenced the selection of devices. But it's hard to accept that this is really going to provide meaningful insights when software and hardware have improved so rapidly.
Another problem is whether those carrying out the studies really know what these devices are truly capable of. It's something that cropped up in a recent Stanford study that explored the tracking reliability of smartwatches and fitness trackers and found that the Apple Watch offered the most accurate heart rate monitoring. The study pitted the original Apple Watch, Samsung Gear S2, Fitbit Surge, Basis Peak, PulseOn, Mio Alpha 2 and Microsoft Band against FDA-approved equipment; 60 volunteers strapped on up to four devices and took part in a total of 80 physical tests. The study also raised questions about the reliability of these wearables in calculating energy expenditure.
PulseOn, one of the devices included, had energy expenditure readings that were off by 93%. Not long after that study, Firstbeat, the company behind the heart rate based analytics in the PulseOn device, got in touch with me and claimed there were major methodological problems with the study. Apparently the calorie figures were pulled from the PulseOn database from an intermediate reading that was never meant to be shown to the user; that estimate was based on accelerometer motion alone. From Firstbeat's point of view, this called the reliability of the study's data into question.
Some of the decisions made about how these tests are carried out are questionable as well. We've spoken a lot about the whole Fitbit heart rate accuracy debate, and in fairness, it's not just Fitbit that has problems on this front; pretty much every optical HR monitor does. When Consumer Reports sought to show that Fitbit's heart rate monitoring was accurate, it did so with its test subjects wearing the trackers on their wrists and on their forearms, with accuracy against a chest strap heart rate monitor said to have improved. I'd love to know how many people are wearing their Fitbits on their forearms. Facepalm. It's something that can be levelled at that Stanford study as well, which made people wear up to four devices at once. When companies have been keen to stress that wrist-based optical heart rate monitors should be worn in very specific places, it raises further doubts about what kind of information those wearables served up.
As I've already said, I think these studies are important and can have value, provided they're done in a way that you can't easily pick holes in. Unfortunately, that hasn't been the case on most occasions. At Wareable, we don't always have the resources or high tech lab conditions to carry out the same kind of testing that many of these institutions can, but what we do have is an understanding of how these wearables work. We also have the benefit of using them in real-life situations, which is where a lot of people will rely on getting that reliable data hit. While it might be easy to throw an Apple Watch, a Garmin or a Fitbit into your test group, it pays to know how each works to best deliver the data. I'm not convinced that taking the time to learn how each individual wearable works is always done, and until that happens, my opinion on these studies is not going to change.
Do you agree with the view that fitness tracker studies are flawed? Let us know in the comments below.