python – Multi-core OpenCV denoising

I have ~40,000 JPEG images from the Kaggle melanoma classification competition. I created the following functions to denoise the images:

# Denoising functions
import concurrent.futures
import os

import cv2
import tqdm

def denoise_single_image(img_path):
    img = cv2.imread(f'../data/jpeg/{img_path}')
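    # dst=None lets OpenCV allocate the output; h=10, 7x7 template window, 21x21 search window
    dst = cv2.fastNlMeansDenoising(img, None, h=10, templateWindowSize=7, searchWindowSize=21)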
    cv2.imwrite(f'../processed_data/jpeg/{img_path}', dst)
    print(f'{img_path} denoised.')

def denoise(data):
    img_list = os.listdir(f'../data/jpeg/{data}')
    with concurrent.futures.ProcessPoolExecutor() as executor:
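        paths = (f'{data}/{img_path}' for img_path in img_list)
        # list() consumes the results so the progress bar advances and worker errors surface
        list(tqdm.tqdm(executor.map(denoise_single_image, paths), total=len(img_list)))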

The denoise function uses concurrent.futures to map the denoise_single_image() function over the full list of images.

I used ProcessPoolExecutor() on the assumption that denoising is a CPU-bound task rather than an I/O-bound one.
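
One way to check that assumption would be to time the read, denoise, and write stages separately on a single image. The following is only a minimal sketch: time_stages is a hypothetical helper (not part of OpenCV or concurrent.futures), and it assumes the three stages are passed in as callables.

```python
import time


def time_stages(load, process, save):
    """Time each stage of a load -> process -> save pipeline, in seconds.

    Hypothetical helper for illustration; the three arguments are callables.
    """
    timings = {}

    start = time.perf_counter()
    data = load()
    timings['load'] = time.perf_counter() - start

    start = time.perf_counter()
    result = process(data)
    timings['process'] = time.perf_counter() - start

    start = time.perf_counter()
    save(result)
    timings['save'] = time.perf_counter() - start

    return timings
```

With the functions above, this could be called as time_stages(lambda: cv2.imread(src_path), lambda img: cv2.fastNlMeansDenoising(img, None, h=10, templateWindowSize=7, searchWindowSize=21), lambda dst: cv2.imwrite(out_path, dst)), where src_path and out_path are placeholder paths. If the 'process' time dominates, the work is CPU-bound and a process pool is a reasonable fit; if 'load' and 'save' dominate, disk I/O is the bottleneck instead.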

As it stands, this function takes hours to run. I have a CUDA-configured NVIDIA GPU with 6 GB of VRAM and would like to optimize this further.

Is there any way to improve the speed of this function?

Multi-core processing documentation
OpenCV documentation