Sorting and grouping: What is the name of this type of problem? How can I implement the optimizations I've thought of?

I have a facial recognition algorithm that compares two images A and B and returns the probability that they match.

I also have 50,000 images and would like to sort these images into groups.

Here is the straightforward way I thought of doing this:

  1. Start with image 0 and compare it to the other 49,999 images. Store the similarities in a table.
  2. Go to image 1 and compare it to the remaining 49,998 images. Store the similarities in a table.
  3. Continue like this until every pair has been compared (see the sketch just below).
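
In code, I imagine the naive approach looking roughly like this. This is only a sketch: `match_probability` stands in for my recognition algorithm, and `threshold` is whatever cutoff I'd pick for calling two images a match.

```python
from itertools import combinations

def build_verified_list(images, match_probability, threshold=0.9):
    """Compare every pair of images once; record pairs above the threshold."""
    n = len(images)
    verified_list = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):  # nCr(n, 2) comparisons in total
        # match_probability is hypothetical: my algorithm returning a
        # probability in [0, 1] that the two images show the same face
        if match_probability(images[i], images[j]) >= threshold:
            verified_list[i].append(j)  # image i matches image j
    return verified_list
```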

At the end of all this I'm left with a Verified_list of images that match, and I can basically feed it into a graph to combine the matches, i.e. if I have a Verified_list like:

[[1,2,3],[2,5,6],[8,9]] (Verified_list is the same length as the number of images, so Verified_list[1] contains references to all the images that image_1 matches)

then the graph combines them into:

[[1,2,3,5,6], [8,9]]

Indicating that I have two groups.
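
If I understand my own combining step correctly, it is just computing connected components, which a union-find (disjoint-set) structure does cheaply. A minimal sketch, assuming Verified_list[i] holds the indices of the images that image_i matches:

```python
def merge_groups(verified_list):
    """Merge overlapping match lists into groups (connected components)."""
    n = len(verified_list)
    parent = list(range(n))  # each image starts in its own group

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for i, matches in enumerate(verified_list):
        for j in matches:
            union(i, j)  # image i and image j belong to the same group

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

On the example above this would yield the groups [1,2,3,5,6] and [8,9], plus a singleton group for every image that matched nothing.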

Obviously, the amount of processing is huge; I think it comes to nCr(50000, 2) comparisons, a huge number!
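
(Checking that count with Python's math.comb:)

```python
import math
print(math.comb(50_000, 2))  # 1249975000, i.e. roughly 1.25 billion comparisons
```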

I would like to add some logic to improve this. I can see what needs to be done, but I don't know how to implement it.

Basically, I just want to skip as many images as possible without reducing the overall quality of the Verified_list.

Say I've got to image 4. I can loop over Verified_list[1] to find out whether image_4 already matched image_1, but it seems to me that if I get to image 39,543, I'd have to scan all 39,542 earlier Verified_list entries before determining whether image_39543 has already been assigned to a group or not.
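
The skip I have in mind would look something like this (again only a sketch, with the same hypothetical `match_probability`): a set makes the "has this image already been grouped?" test O(1) instead of a scan over all earlier entries.

```python
def build_groups_with_skip(images, match_probability, threshold=0.9):
    """Variant of the pairwise loop that skips already-grouped images."""
    n = len(images)
    verified_list = [[] for _ in range(n)]
    grouped = set()  # O(1) membership test instead of scanning earlier lists
    for i in range(n):
        if i in grouped:
            continue  # image i already joined a group via an earlier image
        for j in range(i + 1, n):
            if j in grouped:
                continue  # likewise, don't re-compare already-grouped images
            if match_probability(images[i], images[j]) >= threshold:
                verified_list[i].append(j)
                grouped.add(j)
    return verified_list
```

The catch is exactly the quality question above: skipping an already-grouped image means its own row of comparisons never happens, so matches only reachable through that image may be missed.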

I have the impression that I'm overcomplicating this... Is there a name for this kind of problem? Is there a better-known way to do it?

There is also the problem that the recognition algorithm is not 100% accurate. That is, img_1 matches img_2 and img_5, but img_5 matches only img_1 and not img_2.