О сложности двух задач поиска кластеров с большой мощностью

Нещадим С. М., Хандеев В. И.

Дискретный анализ и исследование операций, 32, 4, 172-190 (2025)

УДК 519.8
DOI: 10.33048/daio.2025.32.834

Аннотация:

Для конечного множества точек евклидова пространства рассматриваются задачи поиска его непересекающихся подмножеств. В одной из задач мощность каждого из подмножеств должна быть не меньше заданного числа. В другой задаче мощность всех подмножеств одинакова, а их объединение должно совпадать с исходным множеством. В обеих задачах дополнительно требуется, чтобы для каждого подмножества сумма квадратов расстояний до центроида не превосходила заданной величины. Доказано, что задачи NP-полны в сильном смысле в случае, когда число кластеров равно двум, а размерность пространства является частью входа задачи. Кроме того, доказано, что задачи NP-полны в одномерном случае для любого фиксированного числа кластеров.

Ил. 2, библиогр. 16.

Литература:

Pérez-Ortega J., Almanza-Ortega N. N., Vega-Villalobos A. [et al.]. The $K$-means algorithm evolution // Introduction to data science and machine learning. Rijeka: IntechOpen, 2019. 22 p. DOI: 10.5772/intechopen.85447.
Ikotun A. M., Ezugwu A. E., Abualigah L. [et al.]. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data // Inf. Sci. 2023. V. 622. P. 178–210. DOI: 10.1016/j.ins.2022.11.139.
Grant R. W., McCloskey J., Hatfield M. [et al.]. Use of latent class analysis and $k$-means clustering to identify complex patient profiles // JAMA Netw. Open. 2020. V. 3, No. 12. Article ID e2029068. 13 p. DOI: 10.1001/ jamanetworkopen.2020.29068.
Dhanachandra N., Manglem K., Chanu Y. J. Image segmentation using k-means clustering algorithm and subtractive clustering algorithm // Proc. Comput. Sci. 2015. V. 54. P. 764–771. DOI: 10.1016/j.procs.2015.06.090.
Kumari R., Sheetanshu, Singh M. K., Jha R. Anomaly detection in network traffic using $K$-mean clustering // 2016 3rd Int. Conf. Recent Advances in Information Technology (Dhanbad, India, Mar. 3–5, 2016). Piscataway: IEEE, 2016. P. 387–393. DOI: 10.1109/RAIT.2016.7507933.
Aloise D., Deshpande A., Hansen P. [et al.]. NP-hardness of Euclidean sum-of-squares clustering // Mach. Learn. 2009. V. 75, No. 2. P. 245–248. DOI: 10.1007/s10994-009-5103-0.
Jørgensen A. G., Larsen K. G., Mathiasen A. [et al.]. Fast exact $k$-means, $k$-medians and Bregman divergence clustering in 1D. Ithaca, NY, 2017. 16 p. (ePrint Archive / Cornell Univ.; arXiv:1701.07204). DOI: 10.48550/arXiv.1701. 07204.
Khodadadi A., Saeidi S. Discovering the maximum $k$-clique on social networks using bat optimization algorithm // Comput. Soc. Netw. 2021. V. 8, No. 1. Article ID 6. 15 p. DOI: 10.1186/s40649-021-00087-y.
Tomita E., Akutsu T., Matsunaga T. Efficient algorithms for finding maximum and maximal cliques: Effective tools for bioinformatics // Biomedical engineering, trends in electronics, communications and software. Rijeka: Intech Open, 2011. P. 625–640. DOI: 10.5772/13245.
Кельманов А. В., Пяткин А. В. NP-полнота некоторых задач выбора подмножества векторов // Дискрет. анализ и исслед. операций. 2010. Т. 17, № 5. С. 37–45.
Aggarwal A., Imai H., Katoh N., Suri S. Finding $k$ points with minimum diameter and related problems // J. Algorithms. 1991. V. 12, No. 1. P. 38–56. DOI: 10.1016/0196-6774(91)90022-Q.
Kel’manov A. V., Ruzankin P. S. An accelerated exact algorithm for the one-dimensional $M$-variance problem // Pattern Recognit. Image Anal. 2019. V. 29, No. 4. P. 573–576. DOI: 10.1134/S1054661819040072.
Пяткин А. В. О сложности задачи выбора кластеров большого размера // Дискрет. анализ и исслед. операций. 2024. Т. 31, № 2. С. 136–143.
Кельманов А. В., Пяткин А. В., Хандеев В. И. О сложности некоторых максиминных задач кластеризации // Тр. Ин-та математики и механики. 2018. Т. 24, № 4. C. 189–198.
Garey M. R., Johnson D. S. Computers and intractability: A guide to the theory of NP-completeness. San Francisco: Freeman, 1979. 338 p.
Khandeev V. I., Neshchadim S. M. Pseudo-polynomial algorithms for some problems of searching for the largest subsets // Mathematical optimization theory and operations research: Recent trends. Rev. Sel. Pap. 23th Int. Conf. (Omsk, Russia, June 30 – July 6, 2024). Cham: Springer, 2024. P. 319–333. (Commun. Comput. Inf. Sci.; V. 2239). DOI: 10.1007/978-3-031-73365-9_22.

Исследование выполнено в рамках государственного задания Института математики им. С. Л. Соболева (проект № FWNF–2022–0015). Дополнительных грантов на проведение или руководство этим исследованием получено не было.

Нещадим Сергей Михайлович

Новосибирский гос. университет,
ул. Пирогова, 2, 630090 Новосибирск, Россия

E-mail: s.neshchadim@g.nsu.ru

Хандеев Владимир Ильич

Институт математики им. С. Л. Соболева,
пр. Акад. Коптюга, 4, 630090 Новосибирск, Россия

E-mail: khandeev@math.nsc.ru

Статья поступила 5 мая 2025 г.
После доработки — 25 августа 2025 г.
Принята к публикации 22 сентября 2025 г.

Abstract:

Clustering problems for a finite point set in Euclidean space are considered. The first problem requires each subset to have cardinality no smaller than a given threshold (not necessarily covering the entire set), while the second one requires all subsets to have the same cardinality and form a partition of the given set. In both problems for each subset, it is additionally required that the sum of squared distances to the centroid does not exceed a given value. Both problems are proven to be strongly NP-complete when the number of clusters is two and the space dimension is part of the input. Furthermore, NP-completeness is established for the one-dimensional case with an arbitrary fixed number of clusters.

Illustr. 2, bibliogr. 16.

References:

J. Pérez-Ortega, N. N. Almanza-Ortega, A. Vega-Villalobos, [et al.], The $K$-means algorithm evolution, in Introduction to Data Science and Machine Learning (IntechOpen, Rijeka, 2019), DOI: 10.5772/intechopen.85447.
A. M. Ikotun, A. E. Ezugwu, L. Abualigah, [et al.], $K$-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data, Inf. Sci. 622, 178–210 (2023), DOI: 10.1016/j.ins.2022.11.139.
R. W. Grant, J. McCloskey, M. Hatfield, [et al.], Use of latent class analysis and $k$-means clustering to identify complex patient profiles, JAMA Netw. Open. 3 (12), ID e2029068 (2020), DOI: 10.1001/jamanetworkopen.2020. 29068.
N. Dhanachandra, K. Manglem, and Y. J. Chanu, Image segmentation using $k$-means clustering algorithm and subtractive clustering algorithm, Proc. Comput. Sci. 54, 764–771 (2015), DOI: 10.1016/j.procs.2015.06.090.
R. Kumari, M. K. Sheetanshu, Singh, and R. Jha, Anomaly detection in network traffic using $K$-mean clustering, in 2016 3rd Int. Conf. Recent Advances in Information Technology (Dhanbad, India, Mar. 3–5, 2016) (IEEE, Piscataway, 2016), pp. 387–393, DOI: 10.1109/RAIT.2016.7507933.
D. Aloise, A. Deshpande, P. Hansen, [et al.], NP-hardness of Euclidean sum-of-squares clustering, Mach. Learn. 75 (2), 245–248 (2009), DOI: 10.1007/ s10994-009-5103-0.
A. G. Jørgensen, K. G. Larsen, A. Mathiasen, [et al.], Fast exact $k$-means, $k$-medians and Bregman divergence clustering in 1D (Ithaca, NY, 2017) (e-Print Archive / Cornell Univ., arXiv:1701.07204), DOI: 10.48550/ arXiv.1701.07204.
A. Khodadadi and S. Saeidi, Discovering the maximum $k$-clique on social networks using bat optimization algorithm, Comput. Soc. Netw. 8 (1), ID 6 (2021), DOI: 10.1186/s40649-021-00087-y.
E. Tomita, T. Akutsu, and T. Matsunaga, Efficient algorithms for finding maximum and maximal cliques: Effective tools for bioinformatics, in Biomedical Engineering, Trends in Electronics, Communications and Software (Intech Open, Rijeka, 2011), pp. 625–640, DOI: 10.5772/13245.
A. V. Kel’manov and A. V. Pyatkin, NP-completeness of some problems of choosing a vector subset, Diskretn. Anal. Issled. Oper. 17 (5), 37–45 (2010) [Russian] [J. Appl. Ind. Math. 5 (3), 352–357 (2011) DOI: 10.1134/ S1990478911030069].
A. Aggarwal, H. Imai, N. Katoh, and S. Suri, Finding k points with minimum diameter and related problems, J. Algorithms 12 (1), 38–56 (1991), DOI: 10.1016/0196-6774(91)90022-Q.
A. V. Kel’manov and P. S. Ruzankin, An accelerated exact algorithm for the one-dimensional $M$-variance problem, Pattern Recognit. Image Anal. 29 (4), 573–576 (2019), DOI: 10.1134/S1054661819040072.
A. V. Pyatkin, On the complexity of the problem of choice of large clusters, Diskretn. Anal. Issled. Oper. 31 (2), 136–143 (2024) [Russian] [J. Appl. Ind. Math. 18 (2), 312–315 (2024), DOI: 10.1134/S1990478924020121].
A. V. Kel’manov, A. V. Pyatkin, and V. I. Khandeev, On the complexity of some max–min clustering problems, Tr. Inst. Mat. Mekh. 24 (4), 189–198 (2018) [Russian] [Proc. Steklov Inst. Math. 309 (Suppl. 1), S65–S73 (2020) DOI: 10.1134/S0081543820040082].
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (Freeman, San Francisco, 1979).
V. I. Khandeev and S. M. Neshchadim, Pseudo-polynomial algorithms for some problems of searching for the largest subsets, in Mathematical Optimization Theory and Operations Research: Recent Trends, Rev. Sel. Pap. 23th Int. Conf. (Omsk, Russia, June 30 – July 6, 2024) (Springer, Cham, 2024), pp. 319–333 (Commun. Comput. Inf. Sci., V. 2239), DOI: 10.1007/ 978-3-031-73365-9_22.

электронное сетевое издание
«Дискретный анализ и исследование операций»

О сложности двух задач поиска кластеров с большой мощностью