performance – Why does my site load slower on the first attempt from an incognito browser?

Chrome’s incognito window uses a fresh cache, so you could say that Chrome has two caches – a “normal” cache and an “incognito” cache. When you close ALL incognito windows and tabs, the incognito cache gets cleared. With default Chrome settings, the normal cache persists across windows and browsing sessions.

Your site takes extra time to load in incognito because the browser starts with a fresh incognito cache, so it has to re-download many of the page’s resources that it would otherwise already have cached.

You can improve load times for repeat visitors by making sure your site’s resources are being served with long-life cache headers where appropriate, so the browser won’t re-download them for a long time.
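
For example, a fingerprinted static asset that never changes can be served with a response header along these lines (a sketch – the exact lifetime is your call):

Cache-Control: public, max-age=31536000, immutable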

You can improve load times for first-time visitors by keeping your site’s resources lean: don’t include tons of JavaScript, images, or other things that slow down page load, or load them on demand, e.g. with the loading="lazy" image attribute or the async and defer script attributes (see the snippet below). This may come down to going lean on plugins.
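
As an illustration (generic markup, not taken from any particular site), deferring non-critical resources can look like this:

<img src="large-photo.jpg" loading="lazy" alt="A large photo below the fold">
<script src="analytics.js" defer></script>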

You can improve load times for both by making sure your server hardware is fast (i.e. not on over-provisioned shared hosting) and your server software is lean (e.g. not bogged down by too many plugins).

performance – Production database so much slower than development database?

I am running a script on two servers: one on my local Docker development system and one on production. However, the same script runs much slower on production than on development. I am using MariaDB 10.5 on both. My development system has 12 GB of RAM while my production server has 32 GB. I temporarily turned off nginx and ran the test at midnight, when there were basically no requests being made. It’s also the exact same database. I also ran the queries directly in the MySQL console and they were still much slower.

This is in my development:
[screenshot: query timings on development]

This is in my production:
[screenshot: query timings on production]

htop on development while running the script
[screenshot]

htop on production while running the script
[screenshot]

iotop had similar results.

I’m not asking for someone to solve this for me, but how can I investigate it? What could the possible culprits be?

performance tuning – Why ParallelTable is even slower than Table for large tensor construction?

I would like to create a large rank-4 tensor by using both Table and ParallelTable.

What is actually constructed is given below:
$\mathcal{L}_{m_1,m_2,V_1,V_2}=i\left(\xi^*_2(m_1,v_1)-\xi^*_2(m_2,v_2)\right)V_d(m_1,v_1)V_d(v_2,m_2)$

where the function $\xi_2(m,v;V)=i\frac{\beta_2^2}{2\beta_1^2}\left(1-\left(e^{\frac{E_{v}-E_m-V/2}{kT}}+1\right)^{-1}\right)\sqrt{4\beta^2_1-(E_{v}-E_m-V/2)^2}$

and matrix elements $V_d(m,v)=\langle m|v\rangle=e^{-\lambda^2/2\Omega^2}\sum_{i=0}^{\nu}\sum_{j=0}^{m}\delta_{m-j,\nu-i}(-1)^j\left(\frac{\lambda}{\Omega}\right)^{i+j}\frac{1}{i!j!}\sqrt{\frac{m!\nu!}{(m-j)!(\nu-i)!}}$

But I don’t know why ParallelTable doesn’t give any performance improvement.

[screenshot: timing results]
See the code below

Clear["Global`*"]
Ns = 41; (*number of basis states*)
Nb = 7;
V = 1.84;
\[Lambda] = 0.3;
\[CapitalOmega] = 0.5;
\[Epsilon]0 = 0.5;
\[Beta]1 = 1; \[Beta]2 = 0.05; kT = 0.0259/300*10; (*10 K*)
(*Franck-Condon factors*)
FK = E^(-\[Lambda]^2/(2 \[CapitalOmega]^2)) Table[
    Sum[KroneckerDelta[m - j, \[Nu] - i] (-1)^j (\[Lambda]/\[CapitalOmega])^(i + j) (1./(i! j!)) Sqrt[(m! \[Nu]!)/((m - j)! (\[Nu] - i)!)],
     {i, 0, \[Nu]}, {j, 0, m}],
    {m, 0, Ns - 1}, {\[Nu], 0, Ns - 1}];
(*Eigenenergies of oscillator*)
Em = Table[\[CapitalOmega] (m + 1/2), {m, 0, Ns - 1}];
(*Eigenenergies of shifted oscillator*)
E\[Nu] = Table[\[CapitalOmega] (\[Nu] + 1/2) + \[Epsilon]0 - \[Lambda]^2/\[CapitalOmega], {\[Nu], 0, Ns - 1}];
(*Define elementary functions; the energy argument is named "en" because E is Protected*)
cf = Compile[{{x, _Real}},
   If[x < -300., 1., If[x > 300., 0., 1./(1. + Exp[x])]],
   CompilationTarget -> "C", RuntimeAttributes -> {Listable},
   Parallelization -> True, RuntimeOptions -> "Speed"];
c\[CapitalGamma]0 = Compile[{{en, _Real}, {qV, _Real}, {\[Beta]1, _Real}, {\[Beta]2, _Real}},
   If[Abs[en - qV/2.] <= 2 \[Beta]1,
    \[Beta]2^2/\[Beta]1^2 Sqrt[4 \[Beta]1^2 - (en - qV/2.)^2], 0.],
   CompilationTarget -> "C", RuntimeAttributes -> {Listable},
   Parallelization -> True, RuntimeOptions -> "Speed"];
\[Xi]2L[en_, V_] := I/2 (1 - cf[(en - V/2)/kT]) c\[CapitalGamma]0[en, V, \[Beta]1, \[Beta]2];
\[Xi]2Lmv = Table[\[Xi]2L[E\[Nu][[v + 1]] - Em[[m + 1]], V], {m, 0, Ns - 1}, {v, 0, Ns - 1}];
Table[(Conjugate[\[Xi]2Lmv[[m1, V1]]] - \[Xi]2Lmv[[m2, V2]]) (FK[[m1, V1]] FK[[m2, V2]]),
   {m1, 1, Ns}, {V1, 1, Ns}, {m2, 1, Ns}, {V2, 1, Ns}]; // AbsoluteTiming
ParallelTable[(Conjugate[\[Xi]2Lmv[[m1, V1]]] - \[Xi]2Lmv[[m2, V2]]) (FK[[m1, V1]] FK[[m2, V2]]),
   {m1, 1, Ns}, {V1, 1, Ns}, {m2, 1, Ns}, {V2, 1, Ns}]; // AbsoluteTiming

Project Euler #12 in Julia slower than Python?

My Julia code, which is almost identical to the Python code (see below), runs in 4.6 s while the Python version runs in 2.4 s. Obviously there is a lot of room for improvement.

function Problem12()
    #=
     The sequence of triangle numbers is generated by adding the natural
    numbers. So the 7th triangle number would be:
    1 + 2 + 3 + 4 + 5 + 6 + 7 = 28.

    The first ten terms would be:
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...

    Let us list the factors of the first seven triangle numbers:

     1: 1
     3: 1,3
     6: 1,2,3,6
    10: 1,2,5,10
    15: 1,3,5,15
    21: 1,3,7,21
    28: 1,2,4,7,14,28

    We can see that 28 is the first triangle number to have over five divisors.

    What is the value of the first triangle number to have over five hundred
    divisors?
    =#

    function num_divisors(n)
        res = floor(sqrt(n))
        divs = []
        for i in 1:res
            if n%i == 0
                append!(divs,i)
            end
        end
        if res^2 == n
            pop!(divs)
        end
        return 2*length(divs)
    end

    triangle = 0
    for i in Iterators.countfrom(1)
        triangle += i
        if num_divisors(triangle) > 500
            return string(triangle)
        end
    end
end

Python version below:

import itertools
from math import sqrt, floor


# Returns the number of integers in the range (1, n) that divide n.
def num_divisors(n):
    end = floor(sqrt(n))
    divs = []
    for i in range(1, end + 1):
        if n % i == 0:
            divs.append(i)
    if end**2 == n:
        divs.pop()
    return 2*len(divs)


def compute():
    triangle = 0
    for i in itertools.count(1):
        # This is the ith triangle number, i.e. num = 1 + 2 + ... + i =
        # = i*(i+1)/2
        triangle += i
        if num_divisors(triangle) > 500:
            return str(triangle)
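
For reference, a minimal entry point with timing (my addition – the original post doesn’t say how the two versions were timed) would be:

if __name__ == "__main__":
    import time
    start = time.perf_counter()
    print(compute())
    print(f"elapsed: {time.perf_counter() - start:.2f} s")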

mysql – Why is limit 0,1 slower than limit 0, 17

I’m trying to analyze why the following query is slower with LIMIT 0,1 than with LIMIT 0,100.

I’ve added SQL_NO_CACHE for testing purposes.

Query:

 SELECT 
  SQL_NO_CACHE  SQL_CALC_FOUND_ROWS wp_posts.*, 
  low_stock_amount_meta.meta_value AS low_stock_amount
FROM 
  wp_posts 
  LEFT JOIN wp_wc_product_meta_lookup wc_product_meta_lookup ON wp_posts.ID = wc_product_meta_lookup.product_id 
  LEFT JOIN wp_postmeta AS low_stock_amount_meta ON wp_posts.ID = low_stock_amount_meta.post_id 
  AND low_stock_amount_meta.meta_key = '_low_stock_amount' 
WHERE 
  1 = 1 
  AND wp_posts.post_type IN ('product', 'product_variation') 
  AND (
    (wp_posts.post_status = 'publish')
  ) 
  AND wc_product_meta_lookup.stock_quantity IS NOT NULL 
  AND wc_product_meta_lookup.stock_status IN('instock', 'outofstock') 
  AND (
    (
      low_stock_amount_meta.meta_value > '' 
      AND wc_product_meta_lookup.stock_quantity <= CAST(
        low_stock_amount_meta.meta_value AS SIGNED
      )
    ) 
    OR (
      (
        low_stock_amount_meta.meta_value IS NULL 
        OR low_stock_amount_meta.meta_value <= ''
      ) 
      AND wc_product_meta_lookup.stock_quantity <= 2
    )
  ) 

ORDER BY 
  wp_posts.ID DESC 
LIMIT 
  0, 1

EXPLAIN shows the exact same output for both:

1   SIMPLE  wp_posts    index   PRIMARY,type_status_date    PRIMARY 8   NULL    27071   Using where
1   SIMPLE  low_stock_amount_meta   ref post_id,meta_key    meta_key    767 const   1   Using where
1   SIMPLE  wc_product_meta_lookup  eq_ref  PRIMARY,stock_status,stock_quantity,product_id  PRIMARY 8   woocommerce-admin.wp_posts.ID   1   Using where

The average query time is 350ms with LIMIT 0,1

The average query time is 7ms with LIMIT 0,100

The query performance gets faster starting with LIMIT 0,17

I’ve added another column to the order by clause as suggested in this question, but that triggers Using filesort in the explain output

Order by wp_posts.post_date, wp_posts.ID desc

1   SIMPLE  wp_posts    ALL PRIMARY,type_status_date    NULL    NULL    NULL    27071   Using where; Using filesort
1   SIMPLE  low_stock_amount_meta   ref post_id,meta_key    meta_key    767 const   1   Using where
1   SIMPLE  wc_product_meta_lookup  eq_ref  PRIMARY,stock_status,stock_quantity,product_id  PRIMARY 8   woocommerce-admin.wp_posts.ID   1   Using where

Is there a way to work around it without altering indices and why is this happening?

It’s also interesting that the query time improves starting with LIMIT 0,17. I’m not sure why 17 is a magic number here.

Update 1: I just tried adding FORCE INDEX(PRIMARY) and now LIMIT 0,100 has the same performance as LIMIT 0,1 smh
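
For clarity, in MySQL/MariaDB the index hint goes right after the table name in the FROM clause, so the only change to the query above is on that line (my reconstruction of the edit, not the exact statement from the post):

SELECT
  SQL_NO_CACHE SQL_CALC_FOUND_ROWS wp_posts.*,
  low_stock_amount_meta.meta_value AS low_stock_amount
FROM
  wp_posts FORCE INDEX (PRIMARY)
  LEFT JOIN wp_wc_product_meta_lookup wc_product_meta_lookup ON wp_posts.ID = wc_product_meta_lookup.product_id
  -- ... rest of the query exactly as above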

c++ – Is a 32 bits game slower compared to 64 bits?

I was developing a graphics engine from scratch using Direct3D11 and some APIs, but I stumbled into a situation where it would be beneficial to use a certain library whose binaries are only available for 32 bits. I’ve changed my engine to 32 bits and didn’t see much difference – would it be troublesome to keep using it in 32 bits?

python – One data object instead of array of objects – why is it slower?

I’m currently writing a Game of Life with small extra features, but I ran into a problem. In my first try I decided to use a simple list of Cell objects:

class Cell:
    def __init__(self, x, y, is_alive):
        self.is_alive = is_alive
        self.x = x
        self.y = y
        self.will_be_alive = False
        self.neighbours = 0

    def kill(self):
        self.will_be_alive = False

    def revive(self):
        self.will_be_alive = True

    def iterate(self):
        self.is_alive = self.will_be_alive

but now I’m using a different approach:

class Cells:
    def __init__(self, width, height):
        self.current_states = [[False for i in range(width)] for j in range(height)]
        self.next_states = [[False for i in range(width)] for j in range(height)]
        self.neighbours = [[0 for i in range(width)] for j in range(height)]

As you can see, instead of a list of objects it’s now an object with lists. I expected my program to run slightly faster, but it’s the total opposite – it’s 20 fps slower. I think the problem may be in the iterate method of the second approach:

def next_state(cells):
    for i, row in enumerate(cells.current_states):
        for j, cell in enumerate(row):
            if cells.neighbours[i][j] > 0:
                count_state(cells, j, i)
            elif cells.neighbours[i][j] == 0 and cells.current_states[i][j]:
                cells.next_states[i][j] = False
    for i, row in enumerate(cells.current_states):
        for j, cell in enumerate(row):
            neigh_iterate(cells, j, i)


def count_state(cells, x, y):
    nei = neighboors(cells.current_states, x, y)
    if (nei > 3 or nei < 2) and cells.current_states[y][x]:
        cells.next_states[y][x] = False
    elif nei == 3 and not cells.current_states[y][x]:
        cells.next_states[y][x] = True


def neigh_iterate(cells, x, y):
    prev = cells.current_states[y][x]
    iterate(cells, x, y)
    if cells.current_states[y][x] != prev:
        if cells.current_states[y][x]:
            add_neighbour(cells.neighbours, x, y, 1)
        else:
            add_neighbour(cells.neighbours, x, y, -1)


def neighboors(current_states, x, y):
    how_many = -1 if current_states[y][x] else 0
    for i in range(-1, 2):
        for j in range(-1, 2):
            if current_states[(y + i) % BOARD_HEIGHT][(x + j) % BOARD_WIDTH]:
                how_many += 1
    return how_many


def iterate(cells, x, y):
    cells.current_states[y][x] = cells.next_states[y][x]

Do you have any ideas guys?
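
One way to narrow down where the time actually goes (my suggestion, not something from the original post) is to profile a single generation of each implementation and compare the hot spots, e.g.:

import cProfile

# cells is an already-initialised Cells board and next_state is the
# per-generation update function shown above; sort by cumulative time
cProfile.run("next_state(cells)", sort="cumtime")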

postgresql – One-row postgres query as CTE/subquery much slower when passing subquery field into function / maybe related to inlining?

I’m guessing here, since I don’t have the plans or any of that, but what I would do is the following:

WITH t1 AS (
        SELECT
            (SELECT COUNT(*) FROM racing.all_computable_xformula_bday_combos) AS all_count,
            (SELECT COUNT(*) FROM racing.xday_todo_all) AS todo_count,
            (SELECT COUNT(*) FROM racing.xday) AS xday_row_count
        OFFSET 0 -- this is to prevent inlining
)

SELECT
            t1.all_count,
            t1.all_count-t1.todo_count AS done_count,
            t1.todo_count,
            t1.xday_row_count,
            -- the line below is the only difference to Query1:

All the same to this point – and then:

  CASE
    WHEN t1.all_count = 0 THEN NULL -- (or maybe 0 might be suitable?)
    ELSE (t1.todo_count::REAL/t1.all_count::REAL) * 100
  END AS todo_percentage
FROM t1;

The use of the casts to ::REAL means that you will only get a percentage accurate to 6 decimal places (see the PostgreSQL documentation here), but I have rarely come across situations where more than this was required.

I searched for the default precision of the FLOAT type in PostgreSQL and couldn’t find anything, so I did a little experiment here! You have to cast to ::REAL twice (numerator and denominator) if you only want 6 decimal places – bit of a surprise really!
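
If you want to reproduce that experiment, a one-liner along these lines will do it (my sketch, not the author’s original one):

SELECT 2::REAL / 3::REAL AS both_cast,
       2::REAL / 3       AS numerator_only;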

The above might be helpful in this case – because I think the functions you call in the CTE are possibly being called once for each row of t1?

You might want to check out the plans as I suggested in my comments?

There are other ways and means of doing what you require…

Take a look here for a couple of suggestions from the PostgreSQL site if you don’t require an exact count and an exact percentage. And then there is (yet another) magisterial answer by @Erwin Brandstetter here – he gives you a few ways to accomplish your goal and explains the pros and cons of each…

postgresql – Why is OR statement slower than UNION

Database version: Postgresql 12.6

I have a table with 600000 records.

The table has columns:

  • name (varchar)
  • location_type (int) enum values: (1,2,3)
  • ancestry (varchar)

Indexes:

The ancestry column is a way to build a tree where every row has an ancestry value containing all parent ids separated by "/".

Consider the following example:

id  name  ancestry
1   root  null
5   node  '1'
12  node  '1/5'
22  leaf  '1/5/12'

The following query takes 686 ms to execute:

SELECT * FROM geolocations
WHERE EXISTS (
   SELECT 1 FROM geolocations g2
   WHERE g2.ancestry = 
      CONCAT(geolocations.ancestry, '/', geolocations.id)
)

This query runs in 808 ms:

SELECT * FROM geolocations
WHERE location_type = 2

When combining both queries with an OR it takes around 4 s 475 ms to finish, if it ever finishes.

SELECT * FROM geolocations
WHERE EXISTS (
   SELECT 1 FROM geolocations g2
   WHERE g2.ancestry = 
      CONCAT(geolocations.ancestry, '/', geolocations.id)
) OR location_type = 2

Explain:

[
  {
    "Plan": {
      "Node Type": "Seq Scan",
      "Parallel Aware": false,
      "Relation Name": "geolocations",
      "Alias": "geolocations",
      "Startup Cost": 0,
      "Total Cost": 2760473.54,
      "Plan Rows": 582910,
      "Plan Width": 68,
      "Filter": "((SubPlan 1) OR (location_type = 2))",
      "Plans": [
        {
          "Node Type": "Index Only Scan",
          "Parent Relationship": "SubPlan",
          "Subplan Name": "SubPlan 1",
          "Parallel Aware": false,
          "Scan Direction": "Forward",
          "Index Name": "index_geolocations_on_ancestry",
          "Relation Name": "geolocations",
          "Alias": "g2",
          "Startup Cost": 0.43,
          "Total Cost": 124.91,
          "Plan Rows": 30,
          "Plan Width": 0,
          "Index Cond": "(ancestry = concat(geolocations.ancestry, '/', geolocations.id))"
        }
      ]
    },
    "JIT": {
      "Worker Number": -1,
      "Functions": 8,
      "Options": {
        "Inlining": true,
        "Optimization": true,
        "Expressions": true,
        "Deforming": true
      }
    }
  }
]

While combining them with a UNION takes 1 s 916 ms:

SELECT * FROM geolocations
WHERE EXISTS (
   SELECT 1 FROM geolocations g2
   WHERE g2.ancestry = 
      CONCAT(geolocations.ancestry, '/', geolocations.id)
) UNION SELECT * FROM geolocations WHERE location_type = 2

Explain

[
  {
    "Plan": {
      "Node Type": "Unique",
      "Parallel Aware": false,
      "Startup Cost": 308693.44,
      "Total Cost": 332506.74,
      "Plan Rows": 865938,
      "Plan Width": 188,
      "Plans": [
        {
          "Node Type": "Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Startup Cost": 308693.44,
          "Total Cost": 310858.29,
          "Plan Rows": 865938,
          "Plan Width": 188,
          "Sort Key": [
            "geolocations.id",
            "geolocations.name",
            "geolocations.location_type",
            "geolocations.pricing",
            "geolocations.ancestry",
            "geolocations.geolocationable_id",
            "geolocations.geolocationable_type",
            "geolocations.created_at",
            "geolocations.updated_at",
            "geolocations.info"
          ],
          "Plans": [
            {
              "Node Type": "Append",
              "Parent Relationship": "Outer",
              "Parallel Aware": false,
              "Startup Cost": 15851.41,
              "Total Cost": 63464.05,
              "Plan Rows": 865938,
              "Plan Width": 188,
              "Subplans Removed": 0,
              "Plans": [
                {
                  "Node Type": "Hash Join",
                  "Parent Relationship": "Member",
                  "Parallel Aware": false,
                  "Join Type": "Inner",
                  "Startup Cost": 15851.41,
                  "Total Cost": 35074.94,
                  "Plan Rows": 299882,
                  "Plan Width": 68,
                  "Inner Unique": true,
                  "Hash Cond": "(concat(geolocations.ancestry, '/', geolocations.id) = (g2.ancestry)::text)",
                  "Plans": [
                    {
                      "Node Type": "Seq Scan",
                      "Parent Relationship": "Outer",
                      "Parallel Aware": false,
                      "Relation Name": "geolocations",
                      "Alias": "geolocations",
                      "Startup Cost": 0,
                      "Total Cost": 13900.63,
                      "Plan Rows": 599763,
                      "Plan Width": 68
                    },
                    {
                      "Node Type": "Hash",
                      "Parent Relationship": "Inner",
                      "Parallel Aware": false,
                      "Startup Cost": 15600.65,
                      "Total Cost": 15600.65,
                      "Plan Rows": 20061,
                      "Plan Width": 12,
                      "Plans": [
                        {
                          "Node Type": "Aggregate",
                          "Strategy": "Hashed",
                          "Partial Mode": "Simple",
                          "Parent Relationship": "Outer",
                          "Parallel Aware": false,
                          "Startup Cost": 15400.04,
                          "Total Cost": 15600.65,
                          "Plan Rows": 20061,
                          "Plan Width": 12,
                          "Group Key": [
                            "(g2.ancestry)::text"
                          ],
                          "Plans": [
                            {
                              "Node Type": "Seq Scan",
                              "Parent Relationship": "Outer",
                              "Parallel Aware": false,
                              "Relation Name": "geolocations",
                              "Alias": "g2",
                              "Startup Cost": 0,
                              "Total Cost": 13900.63,
                              "Plan Rows": 599763,
                              "Plan Width": 12
                            }
                          ]
                        }
                      ]
                    }
                  ]
                },
                {
                  "Node Type": "Seq Scan",
                  "Parent Relationship": "Member",
                  "Parallel Aware": false,
                  "Relation Name": "geolocations",
                  "Alias": "geolocations_1",
                  "Startup Cost": 0,
                  "Total Cost": 15400.04,
                  "Plan Rows": 566056,
                  "Plan Width": 68,
                  "Filter": "(location_type = 2)"
                }
              ]
            }
          ]
        }
      ]
    },
    "JIT": {
      "Worker Number": -1,
      "Functions": 15,
      "Options": {
        "Inlining": false,
        "Optimization": false,
        "Expressions": true,
        "Deforming": true
      }
    }
  }
]

My question is: why does PostgreSQL execute the OR query so much more slowly?