Since early diagnosis and treatment of gastric cancer increase the survival rate, regular gastric cancer screening is recommended for people at high risk of the disease. Objective risk stratification and assessment would help in developing personalized strategies for follow-up screening. Using annual medical check-up data, we aim to develop machine learning (ML)-based risk stratification models for gastric cancer.
Comprehensive annual medical check-up data, including endoscopic findings and blood test results, were collected from 129,223 patients who visited one of the largest medical screening facilities in South Korea. We trained the models using several survival-based ML algorithms (e.g., Extreme Gradient Boosting [XGB] Survival, DeepSurv, Random Survival Forest) as well as conventional Cox Proportional Hazards (CPH) regression. We also compared the performance of our models against benchmark models and feature sets from previous studies, and used SHapley Additive exPlanations (SHAP) analysis to explain the models' predictions.
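As an illustration of the conventional CPH baseline, the following is a minimal sketch of the negative Cox partial log-likelihood that such a regression minimizes. It is not the authors' implementation: the function name, argument layout, and toy inputs are assumptions made for this example, and tied event times are handled in the simple Breslow manner.

```python
import math


def cox_neg_partial_log_likelihood(times, events, features, beta):
    """Negative Cox partial log-likelihood (Breslow handling of ties).

    times    : follow-up time for each subject
    events   : 1 if the event (e.g., a cancer diagnosis) was observed, 0 if censored
    features : one list of covariate values per subject
    beta     : one coefficient per covariate
    All names here are hypothetical and chosen for illustration only.
    """
    # Linear predictor (log relative hazard) for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in features]

    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= eta[i] - math.log(risk_sum)
    return nll
```

With all coefficients set to zero, every subject has the same hazard, so each observed event contributes the logarithm of the size of its risk set. This gives a quick sanity check for the function.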
The XGBoost (XGB) Survival model with sixteen clinically explainable features achieved the best performance (average concordance index [c-index]: 0.78). Among these features, Helicobacter pylori (H. pylori) infection, chronic atrophic gastritis, and intestinal metaplasia were the most significant risk factors contributing to cancer development. The explanations of how the model makes its predictions align well with clinical intuition.
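For readers unfamiliar with the c-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustrative assumption about how the metric can be computed, not the evaluation code used in this study. The function name and inputs are hypothetical, and pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index: the fraction of comparable pairs ranked correctly.

    times       : follow-up time for each subject
    events      : 1 if the event was observed, 0 if censored
    risk_scores : model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        if times[j] < times[i]:
            i, j = j, i  # make i the subject with the shorter follow-up
        if not events[i]:
            continue  # the earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # the higher-risk subject had the event first
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a c-index of 0.78 means that in roughly 78% of comparable patient pairs the model assigns the higher risk to the patient who develops cancer earlier.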
Our model could serve as a basis for a clinical decision support system that helps clinicians assess a patient's individual gastric cancer risk. We expect the results of this work to inform further studies on tailoring gastroscopy screening intervals to individual gastric cancer risk.