Cross-Validated Regularization for Robust Mahalanobis Metric Learning

Mohammed Mohsen Mones

doi:10.47134/ppm.v3i1.2078

Authors

Mohammed Mohsen Mones, Mohaghegh Ardabili University

Keywords:

Mahalanobis Metric Learning, Regularization Techniques, Cross-Validation, Robust Machine Learning, Generalization Performance

Abstract

Conventional Mahalanobis metric learning (MML) algorithms are highly sensitive to outliers and noise in training data, which leads to biased distance metrics that generalize poorly to unseen data. To address this limitation, we propose a systematic framework that integrates tunable regularization with K-fold cross-validation for robust metric learning. Specifically, we augment standard MML objectives with a Frobenius norm regularization term λ‖M‖²_F to penalize solution complexity and control overfitting. Crucially, we employ K-fold cross-validation as a data-driven mechanism to automatically select the regularization hyperparameter λ* that maximizes estimated generalization performance. The resulting learned metric M* demonstrates enhanced resistance to noise and superior generalization capability. Empirical evaluation across 12 benchmark datasets (including real-world noisy data such as Food-101N and CheXpert) shows that our approach significantly outperforms non-regularized baselines and manually tuned alternatives: it reduces overfitting to noisy training constraints by 13.8–22.4% and improves test accuracy on distance-based tasks (k-NN classification, clustering) by 10.3–17.2% under severe noise conditions (40% label flips, 30% feature corruption). These results indicate that combining mathematical regularization with cross-validated hyperparameter selection provides a principled, effective solution for learning reliable Mahalanobis metrics in noisy real-world environments.
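The abstract does not specify the base MML objective or the optimizer, so the following is only a minimal sketch of the general idea: a Frobenius-penalized metric is fitted for each candidate λ, and K-fold cross-validation picks the λ with the best held-out 1-NN accuracy. The pairwise hinge loss, the projected subgradient solver, the distance thresholds, and all function names (`fit_metric`, `select_lambda`, and so on) are assumptions made for illustration, not the paper's method.

```python
import numpy as np

def pair_diffs(X, y, rng, n_pairs=200):
    """Sample random pairs; return difference vectors and +1 (same label) / -1."""
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    keep = i != j
    i, j = i[keep], j[keep]
    return X[i] - X[j], np.where(y[i] == y[j], 1.0, -1.0)

def project_psd(M):
    """Project onto the PSD cone by clipping negative eigenvalues."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.clip(w, 0, None)) @ V.T

def fit_metric(X, y, lam, rng, steps=100, lr=0.01):
    """Projected subgradient descent on a pairwise hinge loss + lam * ||M||_F^2.

    Assumed loss per pair: max(0, 1 + s * (d2 - 2)), i.e. similar pairs are
    pushed below squared distance 1 and dissimilar pairs above 3.
    Keep lr * 2 * lam well below 1 so the update stays stable.
    """
    D, s = pair_diffs(X, y, rng)
    M = np.eye(X.shape[1])
    for _ in range(steps):
        d2 = np.einsum("ni,ij,nj->n", D, M, D)   # squared Mahalanobis distances
        active = 1.0 + s * (d2 - 2.0) > 0        # pairs that violate the margin
        Da = D[active]
        grad = (Da * s[active][:, None]).T @ Da / len(D) + 2.0 * lam * M
        M = project_psd(M - lr * grad)
    return M

def knn_accuracy(M, X_tr, y_tr, X_te, y_te):
    """1-NN accuracy under the metric M."""
    diff = X_te[:, None, :] - X_tr[None, :, :]
    d2 = np.einsum("abi,ij,abj->ab", diff, M, diff)
    return float(np.mean(y_tr[d2.argmin(axis=1)] == y_te))

def select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the highest mean K-fold validation accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for lam in lambdas:
        accs = []
        for f in range(k):
            val = folds[f]
            tr = np.concatenate([folds[g] for g in range(k) if g != f])
            M = fit_metric(X[tr], y[tr], lam, np.random.default_rng(seed + f))
            accs.append(knn_accuracy(M, X[tr], y[tr], X[val], y[val]))
        scores.append(float(np.mean(accs)))
    return lambdas[int(np.argmax(scores))], scores
```

A call such as `select_lambda(X, y, [0.0, 0.01, 0.1, 1.0], k=5)` returns the selected λ together with the mean validation accuracy for each candidate. In this sketch λ = 0 corresponds to the non-regularized baseline, so the comparison described in the abstract can be read directly from the returned scores.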
