The Rivers School

Brady Johnsen '23: Mass General

This summer, I worked at the Lab of Medical Imaging and Computation at Massachusetts General Hospital in Boston. This lab aims to use machine learning to supplement radiologists’ findings in x-rays and other medical images. Under Dr. Do’s supervision, I worked alongside seven other interns throughout the summer, who ranged in experience from high-school freshmen to Ph.D. candidates.

My main project was to create an image classification model that could differentiate x-rays of various body parts. These AI neural networks are like black boxes, as we can only observe their behavior from their inputs (training images) and outputs (testing performance) rather than looking at their complex internal mechanisms. To train the model, I downloaded a dataset from the lab consisting of about 2500 x-ray images of nine different categories. I worked with another Rivers intern, Nina Minicozzi, to develop our first model and create some performance metrics. It only had an accuracy of about 80 percent, but it was a start.

During the second week, I started an additional project to fix a server that had been inoperable for months due to overheating. I diagnosed the problem as a faulty pump in the CPU cooler, so Dr. Do and I went to Micro Center to source a replacement. We learned later that we needed another power supply cable to work with the new cooler, so I took the Red Line to Cambridge the next day to retrieve the cable from the store. The server was finally able to boot up without throwing an error. Dr. Do advised me to reinstall the operating system; I did so and, in my free time over the next few weeks, reconfigured it to work with the network. I also synced the server with the network storage systems and installed all the necessary plug-ins so I could use the server to train models. I had to do most of this work in the office’s server room, where a massive cooling system was blasting frigid air right in my face.

That same week, I sought to create a more accurate classification model and better visualizations of its performance. I used a pre-trained neural network called "ResNet" and a method called "transfer learning" to adapt it to this project. Since ResNet had already learned how to look at images, it was much easier for this AI to classify x-rays than it was for my previous model, which was built from scratch. This model had an incredible 99.3% accuracy after I trained it. I also made some heatmaps and graphs from each category for that week’s presentation.

The heatmap shows where the AI is looking to make its decision. The model probably emphasizes the thumb more since that differentiates the hand images from the foot category. On the right is a ROC (Receiver Operating Characteristic) curve, which plots the true positive rate (sensitivity) against the false positive rate (one minus specificity). Essentially, the closer the curve is to the top left corner, the more reliably the AI separates this category from the others. My second model performed extremely well on this measure.
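To make the ROC definition concrete, here is a minimal Python sketch that computes the curve's points and the area under it for a single category treated as "this category vs. everything else". It is an illustration only, not the lab's code: the function names `roc_points` and `auc` and the example labels and scores are made up for this sketch.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 if the image belongs to the category, 0 otherwise.
    scores: the model's score for that category (higher = more confident).
    The threshold is swept from the highest score down to the lowest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Visit examples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they produce a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoid rule (1.0 is a perfect score)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical example: 1 = "hand" image, 0 = any other body part.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.95, 0.90, 0.80, 0.70, 0.30, 0.10]
example_curve = roc_points(example_labels, example_scores)
example_area = auc(example_curve)
```

A curve that hugs the top left corner has an area close to 1, while a model that guesses at random has an area near 0.5.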

During the following week, I worked on using this model to rank data. I extracted the model’s confidence in each of its decisions as a percentage, and then I displayed this number alongside each image in a grid for each body-part category.
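One common way to turn a classifier's raw outputs into a confidence percentage is the softmax function, sketched below in Python. The post does not say which method was actually used, so softmax is an assumption here, and the category names, file names, and raw output values are hypothetical.

```python
import math


def softmax(raw_outputs):
    """Convert raw model outputs into probabilities that sum to 1."""
    peak = max(raw_outputs)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(raw_outputs, class_names):
    """Return the predicted category and its confidence as a percentage."""
    probs = softmax(raw_outputs)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], 100.0 * probs[best]


def rank_by_confidence(outputs_by_image, class_names):
    """Return (image, category, percent) rows, most confident first."""
    rows = []
    for image_id, raw_outputs in outputs_by_image.items():
        label, percent = confidence(raw_outputs, class_names)
        rows.append((image_id, label, percent))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical categories and raw outputs for three images.
categories = ["hand", "foot", "chest"]
outputs = {
    "a.png": [2.0, 0.5, 0.1],
    "b.png": [0.2, 0.1, 0.0],
    "c.png": [0.0, 5.0, 0.0],
}
ranked = rank_by_confidence(outputs, categories)
```

Sorting by the percentage puts the images the model is most sure about first and the borderline ones last.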

The same week, I also trained the model again, but this time I saved a copy of the model at the end of each epoch (each full pass the AI made through the dataset). I chose one of the x-ray images, created a heatmap of it with each saved model, and displayed the heatmaps in a grid.

Currently, I’m in my fifth week at the lab and working on ranking the training images by their contribution to the model’s performance. To do this, I excluded one image from a smaller dataset to determine how the model’s performance changed and repeated this process 45 times for each image in this dataset. As expected, the overall accuracy suffered greatly, since the original training dataset had 2200 images. However, the accuracy is only used to compare the value of different images to the model’s performance. A lower accuracy means that the excluded image is more important to the model.
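The leave-one-out idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the lab's code: instead of retraining a neural network, it uses a tiny stand-in classifier (nearest average value per category) on made-up one-number "images", and the names `leave_one_out_ranking` and `make_centroid_scorer` are hypothetical.

```python
def leave_one_out_ranking(train_items, train_and_score):
    """Retrain with each item removed and rank items by the resulting accuracy.

    train_items: list of training examples.
    train_and_score: function that trains on a list and returns test accuracy.
    Returns (item, accuracy_without_item, drop_from_baseline) rows, with the
    lowest accuracy (the most important item) first.
    """
    baseline = train_and_score(train_items)
    rows = []
    for i, item in enumerate(train_items):
        subset = train_items[:i] + train_items[i + 1:]
        score = train_and_score(subset)
        rows.append((item, score, baseline - score))
    return sorted(rows, key=lambda row: row[1])


def make_centroid_scorer(test_items):
    """Build a stand-in 'train and evaluate' function for the sketch.

    Each example is (value, label). Training averages the values per label;
    prediction picks the label whose average is closest.
    """
    def train_and_score(train_items):
        sums, counts = {}, {}
        for value, label in train_items:
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
        centroids = {label: sums[label] / counts[label] for label in sums}
        correct = 0
        for value, label in test_items:
            predicted = min(centroids, key=lambda c: abs(value - centroids[c]))
            if predicted == label:
                correct += 1
        return correct / len(test_items)
    return train_and_score


# Made-up data: one number stands in for each image.
toy_train = [(0.0, "hand"), (1.0, "hand"), (6.0, "foot"), (10.0, "foot")]
toy_test = [(3.0, "hand"), (5.0, "foot"), (0.5, "hand"), (9.0, "foot")]
toy_ranking = leave_one_out_ranking(toy_train, make_centroid_scorer(toy_test))
```

In this toy data, removing the `(6.0, "foot")` example is the only change that lowers the accuracy, so it lands at the top of the ranking as the most important training example.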

As I near the end of my internship in the coming weeks, I appreciate how much I am discovering about machine learning and, more generally, about working in an office in downtown Boston. Taking what I’ve learned in computer science class at school and applying it to real-world research, as well as working independently and showcasing what I have created to my peers and Dr. Do, has been a very gratifying experience. According to Dr. Do, my work has the potential to improve the healthcare of patients and possibly save lives by improving the efficiency of radiologists in under-resourced hospitals. Thank you to Dr. Do, Mr. Schlenker, and my fellow interns for this memorable experience.
