E-commerce giant Amazon has built a system that can automatically remove patients’ identifying Protected Health Information (PHI) from medical images used in research. The system uses the company’s machine learning service Rekognition to detect and extract text included in images, and is intended to assist healthcare professionals with meeting their HIPAA (Health Insurance Portability and Accountability Act) requirements. In a statement, AWS Senior Healthcare Solutions Architect James Wiggins says that medical images often contain PHI, which must be removed to comply with regulatory requirements. Removing this information, however, has historically required images to be manually reviewed and edited, making it time-consuming and expensive to de-identify large datasets.
The system first uses Amazon’s machine learning image analysis service, Rekognition, to identify and extract text from images. The extracted text is then run through Amazon Comprehend Medical, the company’s EHR-mining Natural Language Processing (NLP) software, to help detect PHI. “In 2017, Amazon Web Services (AWS) announced the ability to easily detect and extract text from images using our machine learning service Amazon Rekognition,” Wiggins continues. “In 2018, we announced a new Machine Learning Natural Language Processing (NLP) service for medical text called Amazon Comprehend Medical that can help customers to detect and identify PHI in a string of text. You can use these two services, plus some Python code, as demonstrated in this blog post, to inexpensively and quickly detect, identify, and redact PHI from within medical images.”
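The two-step flow Wiggins describes can be sketched roughly as follows. This is a minimal illustration, not the code from the AWS blog post: the function names `phi_line_boxes` and `find_phi_in_image` are hypothetical, and the sketch assumes the boto3 SDK with configured AWS credentials, using the Rekognition `detect_text` and Comprehend Medical `detect_phi` calls. As a simplification, it flags whole lines of text that contain PHI rather than individual words.

```python
def phi_line_boxes(text_detections, detect_phi, image_size):
    """Return pixel boxes (left, top, right, bottom) of text lines that contain PHI.

    text_detections: the "TextDetections" list from Rekognition's DetectText.
    detect_phi: callable taking a string and returning Comprehend Medical
        style entities (dicts with "BeginOffset" and "EndOffset").
    image_size: (width, height) of the image in pixels.
    """
    # Rekognition reports both LINE and WORD items; lines alone cover all text.
    lines = [d for d in text_detections if d["Type"] == "LINE"]

    # Join the lines into one string, remembering each line's character span.
    spans, position = [], 0
    for line in lines:
        end = position + len(line["DetectedText"])
        spans.append((position, end, line))
        position = end + 1  # +1 for the newline separator
    full_text = "\n".join(line["DetectedText"] for line in lines)
    if not full_text:
        return []

    entities = detect_phi(full_text)
    width, height = image_size
    boxes = []
    for start, end, line in spans:
        # A line is flagged if any PHI entity overlaps its character span.
        if any(e["BeginOffset"] < end and e["EndOffset"] > start for e in entities):
            box = line["Geometry"]["BoundingBox"]  # ratios of the image size
            boxes.append((
                int(box["Left"] * width),
                int(box["Top"] * height),
                int((box["Left"] + box["Width"]) * width),
                int((box["Top"] + box["Height"]) * height),
            ))
    return boxes


def find_phi_in_image(image_bytes, image_size):
    """Call both AWS services; needs boto3 and configured AWS credentials."""
    import boto3  # imported here so phi_line_boxes has no AWS dependency

    rekognition = boto3.client("rekognition")
    comprehend_medical = boto3.client("comprehendmedical")
    detections = rekognition.detect_text(Image={"Bytes": image_bytes})["TextDetections"]
    return phi_line_boxes(
        detections,
        lambda text: comprehend_medical.detect_phi(Text=text)["Entities"],
        image_size,
    )
```

The returned boxes could then be painted over with an imaging library to produce the redacted image; that step is left out here.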
Wiggins adds that when Amazon Comprehend Medical is used to detect and identify protected health information, the service provides a confidence score for each entity it identifies, indicating how confident it is in the accuracy of that detection.
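One way those confidence scores might be used is to separate detections that are redacted automatically from those sent for manual review. The sketch below assumes each entity is a dictionary with a `Score` between 0 and 1, as in Comprehend Medical's `detect_phi` response. The helper name `split_by_confidence` and the 0.5 default threshold are illustrative assumptions, not values recommended by AWS.

```python
def split_by_confidence(entities, threshold=0.5):
    """Split PHI entities into (confident, uncertain) lists by their Score.

    The threshold is an assumed, tunable value: lowering it redacts more
    text automatically, raising it sends more detections to manual review.
    """
    confident = [e for e in entities if e["Score"] >= threshold]
    uncertain = [e for e in entities if e["Score"] < threshold]
    return confident, uncertain


# Hand-written example entities, shaped like detect_phi results.
sample_entities = [
    {"Text": "John Doe", "Type": "NAME", "Score": 0.98},
    {"Text": "CT", "Type": "ID", "Score": 0.12},
]
to_redact, to_review = split_by_confidence(sample_entities)
print([e["Text"] for e in to_redact])  # ['John Doe']
print([e["Text"] for e in to_review])  # ['CT']
```

For de-identification work, a lower threshold generally errs on the side of removing too much text rather than leaving PHI behind.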