The Lab of Medical Imaging and Computation (LMIC) at Massachusetts General Hospital (MGH) is a collaboration between MGH and Harvard Medical School. It uses deep learning, a branch of artificial intelligence (AI), to develop machine learning applications in medical imaging and other areas of healthcare. Machine learning trains computers to perform tasks by processing data and identifying patterns. By feeding a computer examples along with the correct answers, one can “train” it to perform a human-like task, in some cases with greater precision than a person. The LMIC has six research fellows who work under the guidance of the principal investigator, Dr. Do. I had the incredible opportunity to work in this lab during my summer internship.
The main component of my internship was working on a Kaggle competition with two other interns, Joel Mannaseh and Hannah Daniel. Kaggle is the world’s largest data science and machine learning community and provides a platform for developing code to solve real-world machine learning problems. The site hosted a research competition sponsored by the Society for Imaging Informatics in Medicine (a healthcare organization whose mission is to advance the field of medical imaging) to develop an accurate AI algorithm that detects pneumothorax (a collapsed lung) on chest x-rays. A total of 1,444 teams competed to build the most accurate model.
At the start of the internship, I had to familiarize myself with a variety of programming tools. I realized that the ability to adapt to new languages and libraries is a fundamental skill for a programmer, because different tools are designed to optimize different tasks. For example, I used Python as my base language, TensorFlow to access machine learning functions, and Pandas to import data sets. Keras, an incredibly powerful neural network library that runs on top of TensorFlow, is what I used to translate pixel images into numerical data that a model can process.
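To give a sense of what “translating a pixel image into data” means, here is a minimal sketch using only NumPy. It is not the lab’s code: it assumes an 8-bit grayscale image (values 0 to 255) and the channels-last layout that Keras convolutional layers use by default, and the function name `to_model_input` is hypothetical. Keras offers its own helpers for loading image files; this only shows the shape and scaling step.

```python
import numpy as np

def to_model_input(pixels):
    """Turn a 2-D grid of 8-bit grayscale pixel values (0-255) into a
    float32 array shaped (batch, height, width, channels).

    Hypothetical helper: assumes one grayscale image and the
    channels-last layout Keras uses by default."""
    arr = np.asarray(pixels, dtype=np.float32) / 255.0  # scale to [0, 1]
    return arr[np.newaxis, :, :, np.newaxis]  # add batch and channel axes

# A tiny 2x3 "x-ray" standing in for a real image file.
image = [[0, 128, 255],
         [64, 32, 16]]
batch = to_model_input(image)
print(batch.shape)  # (1, 2, 3, 1)
```

Scaling to the range 0 to 1 is a common convention that tends to make training more stable than feeding raw 0 to 255 values.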
Before I started working on the competition, I also needed experience working with large volumes of data. My first project at LMIC was to convert data from an inefficient format into one that was more useful and secure. Hospitals use the FHIR (Fast Healthcare Interoperability Resources) standard to exchange healthcare information electronically. Administrators often store data in Excel spreadsheets, which are cumbersome to process and not secure. My task was to automate translating patient data in Excel files into the JSON (JavaScript Object Notation) format used by FHIR. My program could take hundreds of rows and columns of patient data from Excel and convert them into JSON files that FHIR-based systems could access securely and efficiently.
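A minimal sketch of the row-to-FHIR step, assuming each spreadsheet row has already been read into a dictionary (for example with pandas `read_excel` followed by `to_dict(orient="records")`). The column names `PatientID`, `LastName`, `FirstName`, `Sex`, and `BirthDate` are hypothetical; the output fields (`resourceType`, `name`, `gender`, `birthDate`, and a `collection` Bundle) follow the FHIR Patient and Bundle resources. This is an illustration, not the program I wrote.

```python
import json
from datetime import date

# Hypothetical mapping from spreadsheet values to FHIR administrative-gender codes.
GENDER_CODES = {"m": "male", "male": "male", "f": "female",
                "female": "female", "o": "other", "other": "other"}

def row_to_patient(row):
    """Map one spreadsheet row (column name -> value) to a FHIR Patient.
    Column names are assumptions; adjust them to the real sheet."""
    birth = row["BirthDate"]
    if isinstance(birth, date):  # Excel readers often return date/datetime objects
        birth = birth.strftime("%Y-%m-%d")  # FHIR dates are YYYY-MM-DD
    sex = str(row.get("Sex", "")).strip().lower()
    return {
        "resourceType": "Patient",
        "id": str(row["PatientID"]),
        "name": [{"family": row["LastName"], "given": [row["FirstName"]]}],
        "gender": GENDER_CODES.get(sex, "unknown"),
        "birthDate": birth,
    }

def rows_to_bundle(rows):
    """Wrap many Patient resources in a single FHIR Bundle."""
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": row_to_patient(r)} for r in rows],
    }

# Made-up example rows, not real patient data.
rows = [
    {"PatientID": 1001, "LastName": "Doe", "FirstName": "Jane",
     "Sex": "F", "BirthDate": "1980-04-12"},
    {"PatientID": 1002, "LastName": "Roe", "FirstName": "John",
     "Sex": "male", "BirthDate": date(1975, 9, 3)},
]
bundle = rows_to_bundle(rows)
print(json.dumps(bundle, indent=2))
```

A production version would also validate each resource against the FHIR specification and send the JSON over an authenticated, encrypted connection.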
I then moved on to the Kaggle competition. The other interns had already identified a base kernel (a shared notebook of code on Kaggle) that detected pneumothorax in the competition data set with 68% accuracy. Our next task was to modify the kernel to improve its accuracy. The data sets came with “masks,” which were essentially answer keys that highlighted the area of each x-ray showing the disease. I developed a function to crop the chest x-ray images and their masks, removing extraneous portions so the model could better learn to zero in on where the disease was located.
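The key detail in cropping is that an image and its mask must be cut identically, or the answer key no longer lines up with the x-ray. A minimal NumPy sketch, assuming 2-D arrays of equal shape and fixed pixel margins; the function name `crop_pair` and the margin values are hypothetical, since the actual crop regions are not described here.

```python
import numpy as np

def crop_pair(image, mask, top=0, bottom=0, left=0, right=0):
    """Remove the same pixel margins from an x-ray and its mask.

    Cropping both arrays with identical slices keeps every mask pixel
    aligned with the image pixel it labels. Margins are hypothetical."""
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    height, width = image.shape[:2]
    if top + bottom >= height or left + right >= width:
        raise ValueError("margins would remove the whole image")
    rows = slice(top, height - bottom)
    cols = slice(left, width - right)
    return image[rows, cols], mask[rows, cols]

# Toy 6x6 "x-ray" and a mask marking a 2x2 region near the center.
image = np.arange(36).reshape(6, 6)
mask = np.zeros((6, 6), dtype=np.uint8)
mask[2:4, 2:4] = 1

cropped_img, cropped_mask = crop_pair(image, mask, top=1, bottom=1, left=1, right=1)
print(cropped_img.shape, int(cropped_mask.sum()))  # (4, 4) 4
```

The marked region survives the crop intact, which is the property that matters when the cropped pairs are used for training.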
Next, I worked on binarizing the data. The masks in the data set had blurry margins and muted colors, which made it difficult to accurately train the model to recognize the location of the disease. I used Python and Keras to write code that addressed this issue. Binarizing set the most prominent parts of each mask to white and converted the rest of the mask to black. In other words, I made it “black and white” so the computer could tell exactly where the disease was located. By the time I completed my internship, the collective efforts of the team had improved the accuracy of our model to 80.06%.
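Binarizing a mask is essentially thresholding: every pixel above a cutoff becomes 1 (white, disease) and everything else becomes 0 (black, background). A minimal sketch, assuming 8-bit masks and a hypothetical cutoff of 127; the right threshold depends on the data, and methods such as Otsu’s can choose one automatically.

```python
import numpy as np

def binarize_mask(mask, threshold=127):
    """Set pixels brighter than `threshold` to 1 and all others to 0.

    Assumes an 8-bit mask (0-255); the default cutoff is an assumption."""
    return (np.asarray(mask) > threshold).astype(np.uint8)

# A "soft" mask with muted, in-between values.
soft_mask = np.array([[0, 40, 200],
                      [90, 130, 255]], dtype=np.uint8)
print(binarize_mask(soft_mask))
# [[0 0 1]
#  [0 1 1]]
```

Using 0 and 1 rather than 0 and 255 keeps the mask directly usable as a training target for a model whose output is a per-pixel probability.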
Working on this research competition hit close to home. When I was 2 years old, I was misdiagnosed with a massive lung tumor when in fact I had pneumothorax. The failure to detect the pneumothorax early led to my being hospitalized for over a week and left me with a permanent pulmonary condition. Had this technology been available 15 years ago, I might have been spared this condition.
Although I spent the majority of my day working independently on research and developing code, I managed to have some interesting interactions that made me feel part of the team. One day, I noticed Dr. Baik and Dr. Kim, two of the research fellows, in deep discussion as a router lay exposed on the ground. I had spent the better part of my childhood fixing the router in my house, so I knew this was the perfect opportunity for me and my freakishly small hands to rise to the occasion. I asked if I could help and ended up snaking a wire through the router box and connecting it to its rightful socket, an accomplishment that allowed us to carry on with our coding that day.
Another day, Dr. Kim popped around the corner and said, “I need a human.” I was thrilled because he needed help with the one task I was an expert at: being an error-prone human. Dr. Kim’s research focused on improving the accuracy of early breast cancer detection in dense breast tissue. That day, he was trying to account for human error in visual detection as he developed his complex algorithm. My task was to review thousands of mammogram files and visually locate the densest areas of tissue. Being a tiny part of his important research on cancer diagnosis, and learning about what he was trying to accomplish, was rewarding, especially given my family’s history of breast cancer.
In my computer science class, the practical applications of what I was learning often eluded me. After working in this lab, I realize that even with my basic knowledge of coding, I can contribute to a project as vital and relevant as early disease detection. This exposure has also shown me how widely AI is applied: in medical diagnosis, algorithmic trading, targeted marketing and social media strategy, and even cybersecurity and threat prevention.
I have also come to appreciate the importance of collaboration in research. During our weekly meetings, we shared the progress made on our projects and the hurdles we were trying to overcome. By observing the research fellows, I gained a better understanding of the research mindset and the importance of creative, persistent problem solving. I also came to value open discussion and found the fellows’ suggestions helpful in shaping how I approached my work. Those weekly meetings were a forum for alternative perspectives and helped me step back and look at my work with a fresh eye.
On a more practical note, I quickly learned that a long commute can be highly detrimental to maintaining a healthy work/life balance. It would have been nice to have a commute like Mr. Parsons’s!