Azure SQL database query performance degradation (caused by Query Store?)

I have an Azure SQL database (compatibility level = 150), and I’m running into a performance problem executing a database query.

After I rebuild all my indexes, the query runs fast. When testing with different parameters, it suddenly slows down. The same query with the same parameters is fast in the beginning and slow after I run around 50 tests.

Now when I purge the Query Store data via the UI or via SQL:

ALTER DATABASE [mydb] SET QUERY_STORE CLEAR;

the performance is back.

I’ve tried every Query Store setting, but the problem remains; only clearing the collected data gives me back the initial performance.
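
For reference, this is the kind of query I can run against the Query Store catalog views to see which plans have been captured for the statement and how they performed (a rough sketch; the LIKE filter is just a placeholder for part of the slow query’s text):

SELECT qt.query_sql_text,
       q.query_id,
       p.plan_id,
       p.is_forced_plan,
       rs.count_executions,
       rs.avg_duration
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE '%<part of the slow query>%'  -- placeholder
ORDER BY rs.avg_duration DESC;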

Any help would be hugely appreciated.
Frank

applications – Cannot connect to database when publishing app in Play Store

I am using SQL Server for my mobile app, but during beta testing the app said it cannot connect to the database, and when I check my app's report it shows this error:

android.database.sqlite.SQLiteException: no such table: server_preferences (code 1): , while compiling: SELECT * FROM server_preferences WHERE (name = ?)

It seems to work on my Android emulator, but when I publish to the Play Store it gives me this error. Please help!

optimization – Should a GraphQL API implementation select specific fields from the database before resolving?

I recently had to optimize my GraphQL API server by selecting only specific fields from the database before returning the actual result to the client. For example, let’s say my GraphQL schema has an entity called Product:

type Product {
  name: String!
  details: String
  company: String
}

In addition to the 3 fields above there are another 30 fields, some of which are nested objects. I noticed that one of my resolvers took too long to return its result, so as part of the optimisation I thought to myself: “The client currently asks for only about 5% of the fields available on Product, so maybe I should use a projection in my query to select only those fields.” (For an SQL-based database, a SELECT of specific columns could be used similarly, as sketched below.) But I’m wondering whether this is conceptually the right thing to do, because one of the cool things about GraphQL is that the client can select any field of an entity, and here I’m relying on the fact that only 5% of the fields are currently requested. To clarify, the only client using my API is myself (my company).
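
As a concrete sketch of the projection idea for a SQL-backed resolver (the products table and its columns are assumptions, and $1 stands for the product id parameter):

-- Without projection: every one of the 30+ columns comes back,
-- even though the client asked for only a handful of fields.
SELECT * FROM products WHERE id = $1;

-- With projection: only the columns backing the requested GraphQL fields.
SELECT name, details, company FROM products WHERE id = $1;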

What do you think, is this OK?

postgresql – VERY slow lateral join on relatively small database

My database consists of apps and their reviews (schema below). I’m trying to answer the following question:

Given a series of dates from the earliest reviews.review_date to the latest reviews.review_date (incrementing by a day), for each date, D, which apps had the most reviews if the app’s earliest review was on or later than D?

This is the query that I’ve come up with to try and answer that question:

select
  review_windows.review_window_start,
  id,
  slug,
  review_total,
  earliest_review
from
  (
    select
      date_trunc('day', review_windows.review_windows) :: date as review_window_start
    from
      generate_series(
        (
          SELECT
            min(reviews.review_date)
          FROM
            reviews
        ),
        (
          SELECT
            max(reviews.review_date)
          FROM
            reviews
        ),
        '1 day'
      ) review_windows
    order by
      1 desc
  ) review_windows
  left join lateral (
    SELECT
      apps.id,
      apps.slug,
      count(reviews.*) as review_total,
      min(reviews.review_date) as earliest_review
    FROM
      reviews
      INNER JOIN apps ON apps.id = reviews.app_id
    where
      reviews.review_date >= review_windows.review_window_start
    group by
      1,
      2
    having
      min(reviews.review_date) >= review_windows.review_window_start
    order by
      3 desc,
      4 desc
    limit
      2
  ) apps_most_reviews on true;

It is extremely slow and I’m not sure why. To get any results at all I use week instead of day in the generate_series call, and even then it can take a minute or longer.

Where should I start when debugging a performance issue like this?

Visualized query plan here

There are ~5K rows in apps and ~400K rows in reviews so it’s a mystery to me why this is taking so long.

Running the individual subquery that the lateral join executes for each date takes only 161 ms for a single date (below), and the generate_series subquery takes only 4 ms. I’m clearly doing something very wrong. Any help would be much appreciated!

Individual subquery with an explicit date

SELECT
  apps.id,
  apps.slug,
  count(reviews.*) as review_total,
  min(reviews.review_date) as earliest_review
FROM
  reviews
  INNER JOIN apps ON apps.id = reviews.app_id
where
  reviews.review_date >= '2018-04-17'::date
group by
  1,
  2
having
  min(reviews.review_date) >= '2018-04-17'::date
order by
  3 desc,
  4 desc
limit
  2
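
For what it’s worth, wrapping that same statement in EXPLAIN (ANALYZE, BUFFERS) is how I can see where the 161 ms goes (a sketch; ANALYZE actually executes the statement):

-- Executed plan with per-node timings, actual row counts, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT apps.id, apps.slug,
       count(reviews.*) AS review_total,
       min(reviews.review_date) AS earliest_review
FROM reviews
INNER JOIN apps ON apps.id = reviews.app_id
WHERE reviews.review_date >= '2018-04-17'::date
GROUP BY 1, 2
HAVING min(reviews.review_date) >= '2018-04-17'::date
ORDER BY 3 DESC, 4 DESC
LIMIT 2;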

apps

Schema

|   | column_name | data_type    | is_nullable | foreign_key |
|---|-------------|--------------|-------------|-------------|
| 1 | id          | int4         | NO          |             |
| 2 | name        | varchar(255) | NO          |             |
| 3 | slug        | varchar(255) | NO          |             |

Indexes

| index_name      | index_algorithm | is_unique | column_name |
|-----------------|-----------------|-----------|-------------|
| apps_slug_index | BTREE           | t         | slug        |
| apps_pkey       | BTREE           | t         | id          |

reviews

Schema

|   | column_name   | data_type    | is_nullable | foreign_key     |
|---|---------------|--------------|-------------|-----------------|
| 1 | id            | int4         | NO          |                 |
| 2 | rating        | int4         | NO          |                 |
| 3 | review_date   | date         | NO          |                 |
| 4 | reviewer_name | varchar(255) | NO          |                 |
| 5 | review_body   | text         | NO          |                 |
| 6 | app_id        | int4         | NO          | public.apps(id) |

Indexes

| index_name                  | index_algorithm | is_unique | column_name   |
|-----------------------------|-----------------|-----------|---------------|
| reviews_reviewer_name_index | BTREE           | f         | reviewer_name |
| reviews_review_date_index   | BTREE           | f         | review_date   |
| reviews_pkey                | BTREE           | t         | id            |
| reviews_app_id_index        | BTREE           | f         | app_id        |
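
For reference, one index shape not listed above is a combined b-tree on the columns the lateral subquery joins and filters on; a sketch (whether it would actually help here is part of what I’m asking):

-- Combined index on the join column and the filter column of reviews
-- (column names taken from the schema above; not created yet).
CREATE INDEX reviews_app_id_review_date_index ON reviews (app_id, review_date);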

database development – How do services distribute their servers around the globe while keeping their whole dataset intact?

I’ve always been curious how services such as Google/YouTube run multiple datacenters across the globe to serve requests to users faster while keeping their whole dataset intact. There has to be a “master database”, right? But then again, if the database is in the US and a server in Ireland is handling the request, database access would be slow; it would be the equivalent of the user querying a US server from Ireland. Do they use a “DNS lookup” type of strategy where there are multiple instances of the database, a request queries the closest one to see whether it has the data, and the data gets cached in that closest instance? I might use something like this once my application gets big enough, but I’m simply not sure how they manage to keep their database intact across datacenters in different countries while keeping latency low.

database design – SQL Server Unique Constraint on two columns with an exception

Hi all and thanks for your advice.

Expense(SupplierID (foreign key), DocumentID (varchar))

I understand how to add a simple unique constraint on two columns. However, if DocumentID = ‘NA’, I would like to ignore the rules of the constraint.

Some suppliers in our system do not provide an invoice ID, for example, so I currently leave the field NULL. I would like to remove all NULLs from the ‘DocumentID’ field so that I don’t have to account for them in my client code.

I am new to SQL Server, but I could figure out how to do this using a trigger. The reason I’m asking here is to see whether there is a better way to handle this scenario, or to avoid it entirely with a different design.
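
To make the requirement concrete, this is the rule I’m trying to express, sketched here as a filtered unique index (I’m not sure that’s the right tool, which is part of the question; names are as described above):

-- (SupplierID, DocumentID) must be unique, except that any number of rows
-- may carry DocumentID = 'NA'.
CREATE UNIQUE INDEX UX_Expense_Supplier_Document
    ON dbo.Expense (SupplierID, DocumentID)
    WHERE DocumentID <> 'NA';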

Thanks!

postgresql – Is it a fundamentally flawed design concept to put everything in the same database (with different schemas) and only ever connect to that database?

I’ve long wrestled with an annoying problem:

If I put different “projects” in different databases, it immediately becomes a huge pain to connect to both, to exchange information between them, or to do anything across them. The application code and my “mental picture” of the database are greatly simplified if I use a single database and instead use logical schemas to segment what one might at first think to put into separate databases.
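
For example, with everything in one database a cross-“project” query is just a schema-qualified join, whereas across two separate databases the same thing would need postgres_fdw, dblink, or application-side stitching (the schema and table names below are made up):

-- Two "projects" living as schemas in the same database.
SELECT o.id, o.total, c.email
FROM project_a.orders AS o
JOIN shared.customers AS c ON c.id = o.customer_id;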

I don’t do this to get around some sort of “database limit” or any other reason like that. I simply find it much better to do it this way, but have a constant nagging feeling at the back of my mind that this is “wrong”.

Since I have a bunch of “general data” tables that I wouldn’t want to replicate into multiple databases, keeping them all in one actual database is much nicer for me.

I’m interested in hearing (reasonable) criticism of this approach. Sometimes I spend years doing something one way, only to realize that there is some major reason not to do it, which ends up coming back to bite me, and then I have to redo a lot of work that seemed sensible at the time.

psycopg2 – Django not reconnecting to database?

I am using Django Channels, and once the database connection gets closed it keeps returning the following error for subsequent events in the consumer:

Exception inside application: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

My question is: why is Django not trying to reconnect to the database server automatically? If I have to handle it, where should I handle it?

Should I wrap a try/except around the whole code and reconnect whenever this error occurs?

I am using a PgBouncer layer in between, and my database config looks like this:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'db_pgbouncer',
        'USER': '<user_name>',
        'PASSWORD': '<password>',
        'HOST': '<ip>',
        'PORT': '6432',
        'CONN_MAX_AGE': None,
    }
}

oracle – How can I write procedures that I can loop over to insert a series of files into my database?

My goal:

Insert a series of CSV files into my database by creating a procedure for each individual table and then looping over them. My CSV files will all be named very similarly to this:

  • 1_to_be_inserted_into_table_1
  • 1_to_be_inserted_into_table_2
  • 1_to_be_inserted_into_table_3
  • 1_to_be_inserted_into_table_4
  • 2_to_be_inserted_into_table_1
  • 2_to_be_inserted_into_table_2
  • 2_to_be_inserted_into_table_3
  • 2_to_be_inserted_into_table_4
  • 3_to_be_inserted_into_table_1
  • 3_to_be_inserted_into_table_2
  • 3_to_be_inserted_into_table_3
  • 3_to_be_inserted_into_table_4

This is the pseudocode for the final loop where I’d like to reference all of my procedures:

CREATE OR REPLACE DIRECTORY all_the_data AS 'D:Favorites1. ProgrammingProjectsLYS_databaseDATA TO INPUT';

DECLARE @file_selector INT
SET @file_selector=1
    
BEGIN
    FOR files IN all_the_data LOOP 
        
        EXEC procedure_1 ((file_selector || 'to_be_inserted_into_table_1'|| '.csv')),
        EXEC procedure_2 ((file_selector || 'to_be_inserted_into_table_2'|| '.csv')),
        EXEC procedure_3 ((file_selector || 'to_be_inserted_into_table_3'|| '.csv')),
        EXEC procedure_4 ((file_selector || 'to_be_inserted_into_table_4'|| '.csv')),
        
    SET @file_selector= file_selector+1
    commit;

END;
/
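
Roughly, what I’m aiming for is something like the following PL/SQL block (just a sketch: it assumes each procedure takes a single file-name argument and that there are 3 file sets, as in the list above):

BEGIN
  -- One iteration per file set (the 1_, 2_, 3_ prefixes in the file names above).
  FOR file_selector IN 1 .. 3 LOOP
    procedure_1(file_selector || '_to_be_inserted_into_table_1.csv');
    procedure_2(file_selector || '_to_be_inserted_into_table_2.csv');
    procedure_3(file_selector || '_to_be_inserted_into_table_3.csv');
    procedure_4(file_selector || '_to_be_inserted_into_table_4.csv');
    COMMIT;  -- commit after each file set so a late failure doesn't roll back everything
  END LOOP;
END;
/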

QUESTION 1: What am I doing wrong in creating the procedure below? The code worked perfectly fine for inserting data into a table before I tried to turn it into a procedure.


CREATE OR REPLACE PROCEDURE INSERT_CPP 
    (file_name IN varchar2)

IS
    cpp_data VARCHAR(200) := 'D:Favorites1. ProgrammingProjectsLYS_database';

BEGIN
    insert into cpp
    SELECT * FROM EXTERNAL (
        (
      cpp VARCHAR2 (50),
      rfu1 NUMBER (6, 2),
      rfu2 NUMBER (6, 2),
      mean_rfu NUMBER (6, 2),
      charge_ph7_4 NUMBER (2),
      hydropathy NUMBER (3, 1))
    
        TYPE ORACLE_LOADER
        DEFAULT DIRECTORY (" || cpp_data || ")
        ACCESS PARAMETERS (
            RECORDS DELIMITED BY NEWLINE
            skip 1
            badfile (' || cpp_data || 'badflie_cpp.bad')
            FIELDS TERMINATED BY ','
            MISSING FIELD VALUES ARE NULL 
            ) 
        LOCATION (file_name)
        REJECT LIMIT UNLIMITED) ext
        where not exists (
            select * from cpp c
            where c.cpp = ext.cpp );
END;
/


I get an error:

5/5       PL/SQL: SQL Statement ignored
16/27     PL/SQL: ORA-00922: missing or invalid option
30/1      PLS-00103: Encountered the symbol "end-of-file" when expecting one of the following:     ; 

QUESTION 2: Is there a way to write a FOR files IN all_the_data LOOP in SQL? I tried a solution I found, but the code from its first step wasn’t recognized as a command. I ran

EXEC sp_configure 'show advanced options', 1

RECONFIGURE

and got

RECONFIGURE
Error report -
Unknown Command

QUESTION 3: Can I write commit; at the end of every loop iteration so that if something goes wrong on the very last file it doesn’t roll back everything? Will that work?

sql server – Scalable database design review

I’m doing some research on how to scale our current SQL database and need some advice on a possible solution. The goal is to handle more data and to do it in a way that performs well.

We have one MSSQL database that’s a couple of TB in size, with several hundred tables, some holding billions of records. Everything is mostly normalized (3NF) with a few exceptions. Indexes are decent, but we frequently run into SQL timeouts and really slow performance. Most of this is due, in my opinion, to tables that are too large, incorrect indexes, and having to join a ton of tables in queries.

We have really beefy hardware, so I believe that for our amount of data a lot could be gained from a new design.

My current plan is to first split the data into two types: transactional and historical. Using orders as an example, the transactional order tables would have minimal indexes and stay normalized, but only contain orders until they can no longer be changed, i.e. refunds can’t happen anymore.

Once orders can no longer change, they would get denormalized and moved to historical tables with more indexing. The historical tables would eventually grow too large as well, so I was thinking partitioning by year would help solve this.
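
A rough sketch of the yearly-partitioning idea for a denormalized history table (all names, columns, and boundary dates below are placeholders, not our actual schema):

CREATE PARTITION FUNCTION pf_OrderHistoryYear (date)
    AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

CREATE PARTITION SCHEME ps_OrderHistoryYear
    AS PARTITION pf_OrderHistoryYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.OrderHistory
(
    OrderId      bigint         NOT NULL,
    OrderDate    date           NOT NULL,  -- partitioning column
    CustomerName nvarchar(200)  NOT NULL,  -- denormalized from the customer table
    TotalAmount  decimal(18, 2) NOT NULL,
    CONSTRAINT PK_OrderHistory PRIMARY KEY (OrderId, OrderDate)
) ON ps_OrderHistoryYear (OrderDate);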

To help prevent denormalization issues in code, like data getting out of sync, I would have an API layer of stored procedures used to do CRUD operations from code instead of accessing the tables directly.
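
And a minimal sketch of the stored-procedure API idea, so callers never depend on whether an order still lives in the transactional table or has already moved to history (names are placeholders again):

CREATE PROCEDURE dbo.usp_GetOrder
    @OrderId bigint
AS
BEGIN
    SET NOCOUNT ON;

    -- Return the order from whichever table currently holds it.
    SELECT OrderId, OrderDate, TotalAmount
    FROM dbo.[Order]
    WHERE OrderId = @OrderId

    UNION ALL

    SELECT OrderId, OrderDate, TotalAmount
    FROM dbo.OrderHistory
    WHERE OrderId = @OrderId;
END;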

I’m not sure if this is a practical approach or would even work at all.