In his 2015 talk Haunted By Data, Maciej Ceglowski argued convincingly that much of the way companies talk about personal or aggregated personal data is almost exactly wrong: rather than thinking about it as a pristine resource that flows in limpid streams, is pooled in data lakes, and is then stored in the cloud, it's much more sensible to think about it as toxic waste or radioactive material that we don't yet know how to handle.
This obviously applies to the data that users actively choose to share on social media platforms, as we are seeing at the moment with the ongoing debate around the possible use or misuse of personal data. However, exactly the same concerns apply to the personal data collected, stored and transmitted by the consumer Internet of Things (IoT) devices that more and more of us have within our homes. The qualitative difference between IoT data and the data actively shared by users is that IoT data is often collected completely passively and unobtrusively. It is much harder for users to raise concerns about how their personal data is being used when it's almost impossible to know what is being collected.
What's the damage?
There have been a number of striking examples of this over recent years, ranging from the sex toy manufacturer whose devices were collecting and storing intimate details of users' sexual habits without their consent, to connected toys or dolls that collected data from millions of children (including images, audio and private chats) and were then hacked, with the resulting databases sold online. There was also the case of the fitness tracking app that inadvertently leaked the locations and layouts of secret US army bases, as well as a recent, slightly troubling story of a dental insurer whose business model appears to include sending free "smart" toothbrushes to end users and strongly encouraging them to install its app and share their brushing data.
Why share at all?
A reasonable response to the above might be to ask: why share any data at all? Despite the horror stories above, we at Thingful do still see value in creating aggregated datasets, whether to inform travel planning (e.g. Citymapper, which was initially built using data provided by the Greater London Authority's open data store) or to inform local government on issues like noise or air pollution. However, we feel that the decision to contribute data to an aggregated dataset should belong to the user, not to whoever happened to provide the user with a connected device.
At Thingful, our part of the DECODE project was to explore some possible technical means by which control over personal IoT data might be shifted back into the hands of the end user. As a starting point, we might identify the following principles:
there must be a mechanism by which a user can claim and prove ownership of a specific device and control over the data it generates
users must be able to transfer or revoke their ownership of a device
when ownership of a device is transferred, the new owner must not be able to access any data from the previous owner
the provider of the device must not be able to see any data from the device without the user's explicitly granted consent, which may be revoked at any time
devices should use strong encryption techniques, and never store or transmit unencrypted data
users should be able to entitle specific third parties to receive data, but this should also use strong encryption techniques to ensure only the entitled party can read the data
users must be able to revoke a recipient's access at any point, after which that recipient must not receive any new data
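One way to make these principles concrete is a small Python model of the bookkeeping involved. This is only a sketch under stated assumptions: the `DeviceRecord` class and its method names are hypothetical and not part of any DECODE tool, and an `outbox` entry addressed to a single reader stands in for a payload encrypted to that reader's public key. No real cryptography happens here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class DeviceRecord:
    """Hypothetical bookkeeping for one device (illustrative only)."""

    device_id: str
    owner: Optional[str] = None
    recipients: Set[str] = field(default_factory=set)
    # Each entry is (reader, reading): a copy addressed to exactly one reader,
    # standing in for a payload encrypted to that reader's public key.
    outbox: List[Tuple[str, str]] = field(default_factory=list)

    def _require_owner(self, user: str) -> None:
        if user != self.owner:
            raise PermissionError(f"{user} does not own {self.device_id}")

    def claim(self, user: str) -> None:
        if self.owner is not None:
            raise PermissionError("device already has an owner")
        self.owner = user

    def transfer(self, user: str, new_owner: str) -> None:
        self._require_owner(user)
        self.owner = new_owner
        self.recipients.clear()  # the new owner starts with no grants

    def grant(self, user: str, recipient: str) -> None:
        self._require_owner(user)
        self.recipients.add(recipient)

    def revoke(self, user: str, recipient: str) -> None:
        self._require_owner(user)
        self.recipients.discard(recipient)

    def publish(self, reading: str) -> None:
        if self.owner is None:
            return  # unclaimed devices send nothing
        for reader in sorted({self.owner} | self.recipients):
            self.outbox.append((reader, reading))

    def readable_by(self, reader: str) -> List[str]:
        return [reading for who, reading in self.outbox if who == reader]


device = DeviceRecord("sensor-1")
device.publish("r0")               # dropped: nobody has claimed the device
device.claim("alice")
device.grant("alice", "city")
device.publish("r1")               # addressed to alice and city
device.revoke("alice", "city")
device.publish("r2")               # addressed to alice only
device.transfer("alice", "bob")
device.publish("r3")               # addressed to bob only
```

Because every reading is addressed to specific readers at the moment it is published, a revoked recipient simply stops appearing in later entries, and a new owner never receives copies of readings made before the transfer.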
The approach we hope to take is to build a scale model of such a system, using existing tools being developed within the wider DECODE ecosystem, that conforms to the principles outlined above.
How might this work?
In order for a user to claim and prove ownership of a device, we will use asymmetric cryptography: a device will not transmit any data to a cloud-based data store until it has been configured with a public/private key pair of its own, as well as the public key of the device's "owner". At this point it can start uploading data encrypted with the owner's public key, which only the owner is able to decrypt. In addition, once this association has been set up, the owner can use other cryptographic primitives to prove that they have this link with the device (i.e. by signing a token that others can verify).
Implement a workflow by which a device may be configured with additional public keys representing other data recipients with differing rights of access to the device's data. Data for these additional entitled recipients (representing aggregate datasets the user is choosing to share data with) could then be encrypted just for them and broadcast without having to trust all of the parties between the device and the intended recipient.
Implement a UI for managing these interactions, based on a Wallet app that users would install on their mobile device. This Wallet UI would have to provide controls that let a user enable or disable access for specific recipients.
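The provisioning rule in this step (stay silent until the device holds both its own key pair and the owner's public key) can be sketched with a simulated device. Everything named here is an assumption made for illustration: `SimulatedDevice`, the `Seal` callable type, and `placeholder_seal` are hypothetical, keys are opaque byte strings, and `placeholder_seal` deliberately performs no encryption. A real build would inject a vetted public-key encryption routine in its place.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical shape of a public-key "seal" operation:
# (recipient_public_key, plaintext) -> ciphertext.
Seal = Callable[[bytes, bytes], bytes]


@dataclass
class SimulatedDevice:
    """A simulated device that refuses to upload until fully provisioned."""

    seal: Seal
    device_keypair: Optional[Tuple[bytes, bytes]] = None  # (public, private)
    owner_public_key: Optional[bytes] = None

    def provision(self, device_keypair: Tuple[bytes, bytes],
                  owner_public_key: bytes) -> None:
        self.device_keypair = device_keypair
        self.owner_public_key = owner_public_key

    @property
    def provisioned(self) -> bool:
        return (self.device_keypair is not None
                and self.owner_public_key is not None)

    def upload(self, reading: bytes) -> Optional[bytes]:
        """Return the payload to send, or None if the device must stay silent."""
        if not self.provisioned:
            return None
        return self.seal(self.owner_public_key, reading)


def placeholder_seal(public_key: bytes, plaintext: bytes) -> bytes:
    """NOT encryption: only tags the payload with the key it is addressed to."""
    return b"sealed(" + public_key + b")" + plaintext


dev = SimulatedDevice(seal=placeholder_seal)
before = dev.upload(b"21.5C")   # None: no keys configured yet
dev.provision((b"dev-pub", b"dev-priv"), b"owner-pub")
after = dev.upload(b"21.5C")    # payload addressed to the owner's key
```

Passing the seal operation in as a parameter keeps the gating logic testable on its own and leaves the choice of cryptographic library open. The sign/verify step for proving ownership is not modelled here.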
Our plan in developing the above scale model is to simulate the devices that would actually be generating the IoT data, meaning we won't have to implement the full cryptographic flow within a device. However, we do hope that the scale model will produce lessons that are directly applicable to at least one of the later pilots within the wider DECODE ecosystem.