[Excerpts] Data is the foundation of artificial intelligence. As the importance of A.I. grows in modern medicine, there’s a huge need for data (as well as data annotation) – the latter being one of the most important aspects of the work in building an algorithm. In healthcare, collecting data means utilising existing databases and using images, radiology results, samples, CT or MR scans, patient records and more. The more data you feed the system, the better the results can become.
It’s easy to guess that this data includes your own health-related data: EMRs, smartwatches, genetic reports, wearables and so on are all sources of datasets to feed the A.I. But what if we could never obtain enough data to keep A.I. in healthcare progressing?
Why is data important in healthcare A.I.?
The biggest obstacle to A.I. is the inadequacy of the available data. Without patient data, there is no A.I. in healthcare. On one hand, effective algorithms in healthcare need a huge amount of data to learn from. On the other hand, that data needs to be annotated: drawing lines around tumours, pinpointing cells or designating ECG rhythm strips. That’s why the altruistic role of data annotators is so important.
On top of that, privacy concerns limit the amount of available data in medicine. Working with sensitive patient data is a tricky issue. It seems we cannot keep our privacy intact AND also benefit from A.I.’s advantages in our care. We have seen in many cases how sensitive information can get leaked even unintentionally – and we are not even talking about hacking or deliberate privacy breaches, just a poorly protected database. New methods like federated learning might make it possible to train A.I. without breaching patients’ privacy, but their scope is limited.
That is where synthetic data could be of help. It can stand in for the missing data: entirely fabricated patient datasets can be produced that are just as useful for training A.I. as the real thing, while keeping patient data protected.
Hands-on use
Synthetic data already has a number of practical use cases. A group of researchers in Michigan has developed a computer vision model to support pathologists in diagnosing brain tumours more accurately. Their challenge was that when they wanted to use brain scans from other institutions, the algorithm’s performance dropped as it could not compare the different types of scans.