[2-G-3-06] Depth Image Multi-scale Fusion Network for Food Nutrition Estimation
Food nutrient, Nutrition estimation, Food composition, Deep learning
Automatic nutrition estimation methods based on food images using artificial intelligence provide a promising solution for monitoring daily nutritional intake. Although nutrient estimation is convenient, limited accuracy remains a significant challenge. To address this problem, we propose a Depth Image Multi-scale Fusion Network (DIMF-Net), a nutrition estimation method that aims to improve the accuracy of nutrition assessment over the state-of-the-art method. In DIMF-Net, we use residual neural networks (ResNet) to encode RGB images and RGB-Depth images. We design a Multi-scale Feature Fusion Module (MFFM) to fuse both image features with predicted depth information, and apply a Dual Attention module after the MFFM. We evaluated predictive performance on five nutritional contents (calories, mass, fat, carbohydrate, and protein) using the Nutrition5k dataset. We compared the predicted values among the ground truth, the state-of-the-art method, and our method using the Friedman test with Bonferroni correction. The mean absolute error (MAE) and the percentage mean absolute error (PMAE) were used as accuracy metrics. There was a significant difference in calories between the ground truth and the state-of-the-art method (P<0.01), but no difference between the ground truth and our method (P=0.33). The mean MAE and PMAE over all five contents with our method reached 15.1 and 18.8%, improvements of 1.0 and 1.4% over the state-of-the-art method, respectively. In particular, the PMAE of fat and carbohydrate with our method reached 23.0% and 20.7%, each an improvement of 2.0%. These results show that our method outperforms the state-of-the-art method. However, our method still faces challenges such as food occlusion and coverage. We will explore further methods to address these problems and improve accuracy.
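The two evaluation metrics can be sketched as follows. This is a minimal illustration, assuming PMAE is defined as MAE normalized by the mean ground-truth value; the sample calorie values are illustrative only, not drawn from Nutrition5k.

```python
def mae(pred, truth):
    # Mean absolute error over paired predictions and ground truth.
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def pmae(pred, truth):
    # Assumed definition: MAE normalized by the mean ground-truth
    # value, expressed as a percentage.
    return 100.0 * mae(pred, truth) / (sum(truth) / len(truth))

# Hypothetical calorie predictions for four dishes (kcal).
calories_truth = [250.0, 400.0, 150.0, 320.0]
calories_pred  = [230.0, 430.0, 170.0, 300.0]

print(f"MAE:  {mae(calories_pred, calories_truth):.1f} kcal")  # 22.5 kcal
print(f"PMAE: {pmae(calories_pred, calories_truth):.1f} %")    # 8.0 %
```

In the abstract, these metrics are computed per nutritional content (calories, mass, fat, carbohydrate, protein) and then averaged across the five contents.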
