Cleaning the harvested URLs

Hi,

I'm sure there's an option for this, but I don't know which one or how to use it. I harvest a lot of URLs, and the first step of my process is deleting the duplicates.

Then I want to delete any URL that contains certain words, like:

youtube
wiki
cnn
bbc

What I tried so far: I found a blacklisted-words file and edited it, putting those words in, and then ran the delete, but the URLs containing them always remain, so maybe there's something wrong with the way I'm doing it.
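
To make it concrete, the end result I'm after is the equivalent of this small Python sketch (the file names and the word list here are just examples, not what the tool actually uses):

# Minimal sketch: dedupe the harvested list, then drop any URL
# containing a blacklisted word. File names are placeholders.

BLACKLIST = {"youtube", "wiki", "cnn", "bbc"}

with open("harvested_urls.txt", encoding="utf-8") as f:
    urls = [line.strip() for line in f if line.strip()]

# Remove duplicates while keeping the original order
unique_urls = list(dict.fromkeys(urls))

# Keep only URLs that contain none of the blacklisted words
# (case-insensitive substring match)
clean_urls = [
    url for url in unique_urls
    if not any(word in url.lower() for word in BLACKLIST)
]

with open("cleaned_urls.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(clean_urls) + "\n")

That is the behavior I'd like the tool to do for me.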

It would also be good if you could tell me how to set up the harvest so that URLs containing these words are never collected in the first place.

Thanks again.