This summer, I decided to take on a challenge: learning to program with AI and applying that knowledge to a real-world project.
Through live group lessons, I teamed up with two others who shared my interest in the implications of AI in medical research and development.
Our project focuses on using convolutional neural networks to improve the accuracy of lung disease classification. We combined two Kaggle datasets of lung X-ray images. The first contained grayscale images showing various lung diseases, such as COVID-19, tuberculosis, and pneumonia. The second comprised nine categories: Normal (No Disease), Pneumonia, Higher Density, Lower Density, Obstructive Pulmonary Diseases (OPD), Degenerative Infectious Diseases, Encapsulated Lesions, Mediastinal Alterations, and Chest Disorders.
Let me walk you through the process of making this idea come to life:
Brainstorming: We collaborated to select lung disease detection as our project topic. Tasks and responsibilities were evenly distributed among team members, and a schedule was created to guide our progress.
Data Collection: We found two datasets on Kaggle and cleaned the data to get the images ready for pre-processing. We also performed exploratory data analysis (EDA) and used image augmentation to even out the distribution of the data.
Organizing Data: We moved the images into three separate folders: train, test, and validation. We then used tools such as Keras's ImageDataGenerator to load the images, resize them, normalize their pixel values, and label them by class.
Creating a Model: We built and experimented with multiple models, including a convolutional neural network (CNN) and an artificial neural network (ANN), to fit our data. Our CNN performed best in terms of accuracy, precision, and F1 score, as shown by our confusion matrix.
Deploying the Model: We built a user interface for the model with Gradio and hosted it on Hugging Face, making it available for public use.
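To make the Organizing Data step more concrete, here is a minimal sketch of a three-way folder split using only Python's standard library. It is an illustration rather than our exact code: the function name `split_dataset`, the default split fractions, the random seed, and the assumption that the raw images sit in one sub-folder per class are all hypothetical choices made for this example.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, output_dir, train_frac=0.7, val_frac=0.15, seed=42):
    """Copy images from source_dir/<class>/ into output_dir/<split>/<class>/.

    Assumes (hypothetically) one sub-folder per class inside source_dir.
    Whatever is left after the train and validation shares goes to test.
    Returns the total number of files copied into each split.
    """
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {"train": 0, "validation": 0, "test": 0}

    for class_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(images)

        # Split each class separately so every split sees every class.
        n_train = int(len(images) * train_frac)
        n_val = int(len(images) * val_frac)
        splits = {
            "train": images[:n_train],
            "validation": images[n_train:n_train + n_val],
            "test": images[n_train + n_val:],
        }

        for split_name, files in splits.items():
            target = output_dir / split_name / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for image_path in files:
                shutil.copy2(image_path, target / image_path.name)
            counts[split_name] += len(files)

    return counts
```

Calling `split_dataset("raw_images", "data")` (both folder names are placeholders) would produce `data/train`, `data/validation`, and `data/test`, each with one sub-folder per class. A directory-based loader such as ImageDataGenerator can use that layout to infer the labels.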
This project was particularly meaningful to my team and me because of its real-world applications. We were able to create a model that can assist in the early detection of lung diseases, potentially enabling timely intervention and treatment, which can significantly improve patient outcomes. A model like this could help make diagnoses more accurate and consistent, reducing the risk of human error and the need for defensive medicine. By deploying the model on Hugging Face, we were able to make it publicly accessible, which is beneficial in countries where access to advanced healthcare is limited.
After multiple rounds of training different models, we found that our CNN fit our image data best, with an accuracy of 95%. Because it classified the images in our dataset more reliably than the other models, we decided to deploy it to Hugging Face.
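For readers wondering how accuracy, precision, and F1 score come out of a confusion matrix, here is a small sketch in plain Python. The function name `classification_metrics`, the two class labels, and the numbers in the example matrix are made up purely for illustration; they are not our results.

```python
def classification_metrics(confusion, labels):
    """Compute accuracy plus per-class precision, recall, and F1.

    confusion[i][j] is the number of images whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    report = {"accuracy": correct / total, "per_class": {}}

    for i, label in enumerate(labels):
        true_positives = confusion[i][i]
        predicted_as_label = sum(row[i] for row in confusion)  # column sum
        actually_label = sum(confusion[i])                     # row sum

        precision = true_positives / predicted_as_label if predicted_as_label else 0.0
        recall = true_positives / actually_label if actually_label else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0

        report["per_class"][label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }

    return report


# Made-up example: 20 images, two classes (illustrative numbers only).
example_labels = ["normal", "pneumonia"]
example_confusion = [
    [8, 2],  # true normal: 8 predicted normal, 2 predicted pneumonia
    [1, 9],  # true pneumonia: 1 predicted normal, 9 predicted pneumonia
]
example_report = classification_metrics(example_confusion, example_labels)
```

Accuracy is the share of all images on the diagonal of the matrix. Precision and recall look at one class at a time, which is why a model can have high overall accuracy and still do poorly on a rare disease class.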
Although this model fits our dataset accurately, it does have limitations: its performance depends on the quality and diversity of the training data, which may not cover all possible variations of lung diseases, and its accuracy in real-world clinical settings needs further validation.
In the near future, we plan to write a research paper detailing the step-by-step process and the code behind this project. We aim to make our work publicly accessible so that it can contribute to the broader scientific community and support ongoing research in lung disease detection. Our detailed documentation and code can serve as a valuable resource for those seeking practical examples and insights into developing AI models for medical applications. We also want to widen the scope of our project by adding patient data to improve results and validate the model further.
One of the goals we set as a team from the beginning was to raise awareness of AI and its potential to improve healthcare outcomes. We hope to have our paper published in a STEM journal very soon!
This is the official public outcome of our project: LUNG DISEASE CNN MODEL
Written by Nidhi Kulkarni from MEDILOQUY