seo – How can a search engine crawl a dynamically generated website?

Short answer: That PHP code is run on the server before sending the response to the crawler, so by the time the page reaches the crawler, all that info is already populated.

For sites written using server-side languages such as your example, here’s the full lifecycle when a user visits a page:

  1. The user’s browser sends an HTTP request to the server for a certain path (such as /an/example/page/).

  2. The server receives the request and determines the appropriate server-side code to run to generate the page. It executes this code, if any (or none if it’s a static site).

  3. The server sends the final generated HTML page (by that point, plain static markup) back to the user’s browser.

Note that all the code is finished running on the server before the server actually sends any information back to the user’s browser (or a web crawler).
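
To make that concrete, here is a minimal sketch of steps 2 and 3 in Python (standing in for the PHP in the question, not taken from it): the server runs all of its code and builds the complete HTML string before anything is written back, so a crawler only ever receives the finished markup.

from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import date

class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Step 2: server-side code runs and produces plain HTML.
        html = f"<html><body><p>Generated on {date.today()}</p></body></html>"
        body = html.encode("utf-8")
        # Step 3: only the finished, static HTML is sent back to the client.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PageHandler).serve_forever()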

Things are a little different when the page is generated in part by client-side code (JavaScript) instead, which is a topic for a different discussion.

googlebot – Will a high keep-alive value help increase Google's crawl rate?

I have a website with thousands of pages, most of them indexed. The pages receive frequent updates (usually monthly). Google’s crawl rate is good, but not fast enough to capture all the changes within the month. My current keep-alive value is 5. Will increasing it have an impact on the crawl rate by speeding up crawling (thanks to persistent connections)?
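
For context, and assuming an Apache server (the question does not say), the "value of 5" presumably refers to the KeepAliveTimeout directive, i.e. how many seconds a persistent connection is held open for further requests. The related directives look like this:

# Hypothetical Apache keep-alive settings; the values shown are the one
# mentioned in the question plus common defaults.
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5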

Google image crawler won’t respect my robots.txt entry to not crawl images

I was looking for a way to prevent reverse image searching (namely, I didn’t want people who had a copy of one of my images to be able to upload it to Google and discover where it originated). I created the following robots.txt file and put it at the root of my blogspot blog:

User-agent: *
Disallow: /hide*.jpg$
Disallow: /hide*.jpeg$
Disallow: /hide*.png$

User-agent: Googlebot-Image
Disallow: /hide*.jpg$
Disallow: /hide*.jpeg$
Disallow: /hide*.png$

With it, I was expecting that all jpg and png image files that start with the word hide (e.g. hide1023939.jpg) would not appear in Google Images (or any other search engine). I was inspired by the official documentation here and here.

However, Google Images keeps showing them, both when reverse searching and when searching the site for any images. I’ve added many new images since I implemented the robots directives, but even these new files get crawled.

As an observation, the images on blogspot/blogger.com are hosted on http://1.bp.blogspot.com/....file.jpg instead of my own subdomain (http://domain.blogspot.com), and I wonder if this is the cause of the issue.
Any ideas how to solve this?
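
One detail worth stating explicitly about that observation: crawlers read robots.txt per host, so a file at domain.blogspot.com only governs URLs on that host. A minimal Python sketch (using the hostnames from the question and the hypothetical file name hide1023939.jpg) makes the distinction visible:

from urllib.parse import urlparse

# Host that serves the blog's robots.txt file.
robots_host = urlparse("http://domain.blogspot.com/robots.txt").netloc
# Host that Blogger actually serves the image files from (hypothetical file name).
image_host = urlparse("http://1.bp.blogspot.com/hide1023939.jpg").netloc

print(robots_host)                # domain.blogspot.com
print(image_host)                 # 1.bp.blogspot.com
print(robots_host == image_host)  # False: that robots.txt never applies to the image host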

google analytics – Why is the date range different in the Exploration and Behavior report tabs?

I use Google Analytics with a Web + App property. When I try to create custom reports using Exploration in the Analysis Hub, the date range available to select does not include data prior to February 6, while other tabs in the left pane, like Behavior and Demographics, do have options to select data prior to February 6. Please let me know how to get data prior to February 6 in reports created using Exploration.

Exploration: [screenshot]

Behaviour: [screenshot]

html – Question about getting the value of an attribute using a Scrapy CrawlSpider

I am a beginner in programming and I am now learning Python. One of its useful applications is getting data from websites using Scrapy. I have tried it on different pages and it has worked well, but when I test it on www.sxyprn.com and try to obtain the link that holds the video, the output is an empty string. Let me explain the code that I have.

The spider is as follows:

import scrapy

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from Sxyprn.items import SxyprnItem

class SpdvideosSpider(CrawlSpider):
    name = 'spdvideos'
    allowed_domains = ['sxyprn.com']
    start_urls = ['https://www.sxyprn.com']

    # Follow the links to the individual video pages and hand each one to parse_item.
    rules = (
        Rule(LinkExtractor(restrict_xpaths='//div[@id="content_div"]/div[3]/div[3]//div[2]/a'),
             follow=True, callback='parse_item'),
    )

    def parse_item(self, response):
        item = SxyprnItem()
        # Save the src attribute of the <video id="player_el"> element.
        item['videos'] = str(response.xpath('//video[@id="player_el"]/@src').get())
        yield item

The rule lets the spider reach all the pages where the videos are, and in parse_item I save the value of the src attribute of each video.

When I run the spider, this is the result (I only include a sample, but it scrapes around 35 items), where you can see that it saves an empty value.

2020-03-22 14:11:32 [scrapy.core.engine] INFO: Spider opened
2020-03-22 14:11:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-22 14:11:32 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-03-22 14:11:32 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: None)
2020-03-22 14:11:32 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: None)
2020-03-22 14:11:32 [scrapy.dupefilters] DEBUG: Filtered duplicate request:  - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2020-03-22 14:11:32 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: https://www.sxyprn.com)
2020-03-22 14:11:33 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: https://www.sxyprn.com)
2020-03-22 14:11:33 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sxyprn.com/post/5e76138156bc8.html>
{'videos': ''}
2020-03-22 14:11:33 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: https://www.sxyprn.com)
2020-03-22 14:11:33 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: https://www.sxyprn.com)
2020-03-22 14:11:33 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: https://www.sxyprn.com)
2020-03-22 14:11:33 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: https://www.sxyprn.com)
2020-03-22 14:11:33 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: https://www.sxyprn.com)
2020-03-22 14:11:33 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: https://www.sxyprn.com)
2020-03-22 14:11:33 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sxyprn.com/post/5e7637ca30adc.html>
{'videos': ''}
2020-03-22 14:11:33 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sxyprn.com/post/5e766d272f600.html>
{'videos': ''}

With the inspector, I checked the page structure, and the src attribute does have a value, as shown below.


If I change the attribute to data-postid, for example, then it does return the value.

What am I doing wrong? Thank you so much.
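
One way to see what the spider actually receives (the raw HTML before any JavaScript runs, as opposed to what the browser inspector shows afterwards) is to fetch the page outside the crawl and run the same XPath against it. A small diagnostic sketch, separate from the spider above and using one of the post URLs from the log:

import requests
from scrapy.selector import Selector

# Fetch the raw HTML the way a non-browser client would (no JavaScript executes here).
url = "https://www.sxyprn.com/post/5e76138156bc8.html"
html = requests.get(url).text
sel = Selector(text=html)

# The same XPath the spider uses for the src attribute...
print(sel.xpath('//video[@id="player_el"]/@src').get())
# ...and the attribute that reportedly does come back with a value.
print(sel.xpath('//video[@id="player_el"]/@data-postid').get())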

googlebot – Slow crawl speed for many pages (over 100,000) on a travel website

Context:

We have a relatively new hotel search site where users can freely search by their preferences, such as "a child-friendly hotel with bunk beds, good breakfast and clean rooms". Relevant review excerpts are displayed for each hotel in the results, according to the concepts mentioned in the query, in this case "child-friendly", "bunk beds", "breakfast", "clean".

We believe our website can offer unique value to travelers: we can save users the time of reading numerous reviews to find the relevant information. We have identified some concepts that we would like Google to index; for example, we have a dedicated page for "Boutique Hotels in Chicago, IL". Given the number of cities in the region, we have over 100,000 pages of this type.

However, Google currently indexes our pages at a rate of only ~350 pages per day; at that pace, indexing more than 100K pages will take a year. I would love to hear your suggestions/tips for speeding up indexing.

Currently our ideas to improve the speed of indexing / SEO in general:

  1. Create internal links / improve site navigation – Is creating internal links important for SEO in this case? How should a hotel search site set up internal links? (Search seems like the natural way to navigate the results. Perhaps breadcrumbs (city -> concepts, e.g. child-friendly hotels)? See the sketch after this list.)
  2. Add an About page – state our mission and who we are.
  3. Server-side rendering – the website is currently built in React.js, so Googlebot needs more resources to render each SEO page.
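
For item 1, a minimal sketch of what breadcrumb-style internal links could look like for pages like the one described above (the function name and URL scheme are hypothetical, purely to illustrate the city -> concept hierarchy):

def breadcrumb_links(city_slug, city_name, concept):
    # Each level links to a broader hub page, so the long-tail concept pages
    # can be discovered from a small number of city pages.
    return [
        ("Home", "/"),
        (f"Hotels in {city_name}", f"/hotels/{city_slug}/"),
        (f"{concept} hotels in {city_name}", f"/hotels/{city_slug}/{concept.lower()}/"),
    ]

# e.g. the "Boutique Hotels in Chicago, IL" page would link up the chain:
for label, url in breadcrumb_links("chicago-il", "Chicago, IL", "Boutique"):
    print(label, "->", url)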

In the long term, we will reach out and build awareness of our website. However, given the current pandemic, we would like to focus more on the website / content itself.

Are there any other suggestions / comments on the above SEO ideas? Thank you very much for your time and your help!

How can I ask Google to crawl my site?

How can I ask Google to crawl my site?

googlebot – Why does Google create pages to crawl on my site?

For some reason, Google lists a series of pages that don't exist on my website, such as:

https://www.my_domain.com/index.php/about_us.php

Search Console lists them as "Duplicate, Google chose different canonical than user".

Google seems to create every combination of a "real page" with another page tacked onto the end.

My page index.php is not a folder, so why is Google crawling it as if it were a folder with all of my pages underneath?

dungeon crawl classics rpg – How does spellburn healing work with multiple ability scores?

In DCC RPG, a wizard can sacrifice points of Strength, Agility, or Stamina to gain a one-time bonus to a spell check. These ability score points are not lost permanently, but can be healed back.

Ability scores lost in this manner return when the sorcerer heals. Every day he doesn't try to cast spells, he recovers 1 ability score point.

(Emphasis mine.)

My question is what "1 ability score point" means here: one point in total, restoring only one of the three ability scores that can be healed, or one point per ability score that has been spent? I can't find any further clarification in the book.

seo – Manually adjusting the crawl rate after an increase in "Time spent downloading a page"?

I started using Cloudflare, and the "Time spent downloading a page" metric immediately went from 180 ms to 610 ms. It has started to hurt the crawl budget of my website, which has over 2,000,000 pages.

I wonder whether I should manually change the crawl rate in Google Search Console. Currently, the rate is set to be optimized automatically by Google.