My client’s WordPress website was hit by a “pharma hack”. Since then we have reworked the website (design and logic) and it is a completely new site that no longer uses WordPress or any other CMS; it is just plain PHP, JS and CSS files with a few forms.
The website is hosted on https://www.digitalocean.com/ and I have rebuilt the droplet it was using, added a firewall and redirected all HTTP traffic to HTTPS; the only things that stayed the same are the IP address and the domain name.
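For context, the HTTP-to-HTTPS redirect is the standard mod_rewrite pattern, roughly like this (a sketch; my actual .htaccess/vhost config differs only in details):

RewriteEngine On
# Redirect any plain-HTTP request to the same URL over HTTPS with a 301
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

This 301 matters later, because it shows up in the crawler logs below.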
After cleaning the server and the website I began cleaning up the search results using Google Search Console. Among the tools Google provides, I used the URL Inspection tool to request indexing of the website, submitted a sitemap.xml, and used the Removals tool to remove cached content. Sadly, the search results stayed the same.
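The sitemap.xml simply lists the site’s real pages, roughly like this (example.com and the page paths here are placeholders for the real URLs):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/contact.php</loc></url>
  <!-- ...one <url> entry for each of the site's 6 pages -->
</urlset>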
The next thing I tried was returning a 410 status code for every page that doesn’t exist, by using an .htaccess file to route requests for non-existent pages to a 404.php page containing the code below.
header($_SERVER["SERVER_PROTOCOL"] . " 410 Gone");
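Put together, the setup looks roughly like this (a sketch; the exact rewrite conditions in my .htaccess may differ slightly):

# .htaccess: route anything that isn't an existing file or directory to 404.php
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /404.php [L]

<?php
// 404.php: tell clients and crawlers the page is permanently gone
header($_SERVER["SERVER_PROTOCOL"] . " 410 Gone");
?>
<!DOCTYPE html>
<html><body><h1>410 Gone</h1><p>This page has been permanently removed.</p></body></html>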
After these changes I can see in the Apache logs that crawler bots (e.g. SemrushBot, Dotbot, Googlebot, PetalBot, etc.) and some unknown user agents (e.g. The Knowledge AI, ANTIPIDERSIA) are still requesting the infected pages (which no longer exist). They mostly receive a 410, or sometimes a 301 (presumably my HTTP-to-HTTPS redirect) immediately followed by a duplicate request that gets the 410:
18.104.22.168 - - [23/Jul/2021:18:24:51 +0000] "GET / HTTP/1.1" 301 557 "-" "ANTIPIDERSIA"
22.214.171.124 - - [23/Jul/2021:18:24:51 +0000] "GET / HTTP/1.1" 200 43030 "-" "ANTIPIDERSIA"
126.96.36.199 - - [23/Jul/2021:20:22:35 +0000] "GET /cialis-long-term-effects/ HTTP/1.1" 301 675 "-" "The Knowledge AI"
188.8.131.52 - - [23/Jul/2021:18:38:41 +0000] "GET /viagra-discounts/ HTTP/1.1" 301 659 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
184.108.40.206 - - [23/Jul/2021:18:38:42 +0000] "GET /viagra-discounts/ HTTP/1.1" 410 5502 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I was hoping that once bots receive a 410 status code they would stop indexing those pages and remove them, but as before, my search results showed no sign of clearing up.
Next I found out I can disavow links to my website using Google’s Disavow Links tool. I copied all the unwanted search results, put them in a .txt file and submitted it to the tool.
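The disavow file is plain text with one URL or domain per line (lines starting with # are comments); mine looked roughly like this, with the real domain in place of example.com:

# unwanted URLs copied from the search results
https://www.example.com/viagra-discounts/
https://www.example.com/cialis-long-term-effects/
# whole domains can also be disavowed with the domain: prefix
domain:spammy-referrer.example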
All of this happened within the span of a few days, and I understand that it can take a while to clear up, but what’s bothering me is that the number of search results keeps changing. Sometimes it goes down and sometimes it goes up: one day there are 60+, another day 100+, the next day 80+, and so on…
To inspect the search results I use a site:sitename query, and it gives a different number of results depending on whether I include www, non-www, http or https. There are only 6 pages on my website, so only 6 results should be present.
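Concretely, I compare variants along these lines (example.com stands in for the real domain):

site:example.com
site:www.example.com
site:http://example.com
site:https://www.example.com

Each variant returns a different result count.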
Currently I have no robots.txt, so bots are allowed to crawl everything.
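As I understand it, having no robots.txt is equivalent to an explicit allow-all file like the one below, so crawlers can still request the old URLs and see the 410s:

User-agent: *
Disallow: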
Am I missing something? Can someone point me in the right direction?