hashing – Is it OK to use Redis SCAN extensively?

The Redis docs state that the KEYS command should not be used in production, since it blocks the server while it executes; it is better to use SCAN to iterate over all keys in batches.
I've also read in the docs that Redis uses a hash index for its keyspace, so I assume that index can't serve the kind of range-style iteration that SCAN and KEYS perform.

But our system is designed in such a way that we need to use SCAN extensively. Is that OK, or could it significantly degrade the performance of ordinary key lookups?
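
For reference, the iteration pattern I mean is the cursor loop from the docs (the MATCH pattern and COUNT batch size here are just examples; COUNT is only a hint to the server):

SCAN 0 MATCH user:* COUNT 1000

Each call returns the next cursor plus a batch of keys, and the loop repeats with the returned cursor until it comes back as 0.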

network – Why do infected hosts scan port 80?

I have my server set up with AbuseIPDB so I can keep a log of all suspicious requests and hacking attempts. I noticed an IP that sent a GET request to port 80.

177.154.28.130 - - [07/Jan/2021:12:07:20 -0500] "GET / HTTP/1.1" 200 326 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"

I had a look at the other requests and noticed the IP attempting to connect to telnet ports, typical behavior of an infected host. Why did this IP send a request to / on port 80? How does this help the botnet grow?

linux – Is there a program to scan and correct the names of many files, to avoid errors and failures during copying?

It seems that the problem reported in this question about Mac and Windows not being able to access some files and folders created in Linux is due to bad characters hidden somehow in their names (see also the comments under it). I personally think that such errors happened during copying, but that the bad results affect Linux less than the other operating systems. See the end of my answer under the linked question about a second event.

When I tried to do the copying on a Mac, the process was again stopped with an error saying something like "some file couldn't be read or written." I guess it's the same type of error.

I have to start all over again, and I guess I should do it in Linux, where I have many tools at my disposal and can most easily identify an error when it happens.

I guess a good solution for me would be a program (preferably for Linux or Windows) that could identify and fix those names (and eliminate those 'phantom' characters).
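
To be clear, by 'bad characters' I mean the kind of names a check like this would flag (a rough sketch; it treats anything outside printable ASCII as suspect, which may be too strict for legitimately non-English names):

LC_ALL=C find . -name '*[! -~]*'

This lists every file or directory whose name contains a byte outside the space-to-tilde ASCII range, including control characters.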

networking – Unable to ping scan my machine (but I can ping)

I need to perform an nmap scan on all of the (responding) IPs in an array of VLANs.

The command is nested in a PowerShell for loop and looks something like this:

nmap.exe -Pn -T4 -A -oG [FILE].txt -oX [FILE].xml "$($subnet).0/24"

We had to add the -Pn flag as the command wasn’t picking up all of the hosts in each VLAN with a ping scan.

The issue is that we believe the command is taking too long, so we would like to scan only the hosts that respond to the ping scan.

I found one machine that responds to a simple ping ([MACHINE]); however, running nmap -sn -Pn [MACHINE] results in: Note: Host seems down. If it is really up, but blocking our ping probes, try -Pn

I'm not sure why I can ping the machine but nmap can't. Running nmap from an Administrator PowerShell console does not resolve the issue.

I ran netsh advfirewall firewall add rule name="ICMP Allow incoming V4 echo request" protocol=icmpv4:8,any dir=in action=allow on the problem machine in order to allow ICMP traffic, but this did not work either.
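
The only other host-discovery variant I can think of trying is forcing a plain ICMP echo probe, on the guess that nmap's default discovery probes are being filtered somewhere along the path (-PE makes nmap use an ICMP echo request as the discovery probe):

nmap.exe -sn -PE [MACHINE]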

This is driving me crazy. Is anyone able to help me with this?

Thanks,

TheCube

Why is this query on JSON values in SQL Server using an index scan instead of an index seek?

I have a table with the following schema:

CREATE TABLE [dbo].[Obj1Json](
    [ObjectID] [int] IDENTITY(1,1) NOT NULL,
    [PointerToSource] [nvarchar](255) NULL,
    [CreateDate] [datetime2](7) NOT NULL,
    [ModifiedDate] [datetime2](7) NOT NULL,
    [Indexes] [nvarchar](max) NULL,
    [vAccountID] AS (json_value([Indexes],'$.AccountID'))
)

where the Indexes column contains JSON that looks like:

{
  "AccountID": 73786,
  "AccountName": "5869b4e9-f441-463f-8f6d-93b4f4ff8c75",
  "ProcessLocation": "Start",
  "IsPasswordProtected": true,
  "InvoiceDate": "2020-12-30T09:00:32.8473077-05:00"
}

The vAccountID column is used for an index on the JSON in Indexes:

CREATE INDEX IDX_Obj1Json_AccountID
ON Obj1Json(vAccountID)

The table has about 10.5 million rows of randomly generated data (all data in the Indexes column has the same structure, however). The query I’m running is

SELECT JSON_VALUE(Indexes, '$.AccountID')
  FROM [Obj1Json]
  WHERE JSON_VALUE(Indexes, '$.AccountID') = 69725

which returns 110 results. When looking at the execution plan in SSMS, I see that an index scan is being used, whereas I would expect an index seek to be used instead.

Query plan of the above query, showing an index scan being used

The query returns quickly (under a second), but I’m curious why an index seek isn’t being used. Is there any obvious reason why SQL Server is using an index scan instead of a seek?
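
One detail I suspect but haven't confirmed: JSON_VALUE returns nvarchar(4000), and my literal 69725 is an int, so the comparison may force an implicit conversion that makes the predicate non-sargable. If that's the cause, querying through the computed column with a string literal should allow a seek (a sketch to test, not a confirmed fix):

SELECT vAccountID
  FROM [Obj1Json]
  WHERE vAccountID = N'69725'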

mobile application – UX when showing multiple QR codes, but making sure people don't accidentally scan the wrong one

We have created a poster with the most frequently ordered products for our customers. It also shows QR codes and barcodes for each product, which customers can scan with their phone (and our app) to re-order the product with ease. The following user flow happens:

  1. User opens the app on their phone (the scan function is on the start screen).
  2. User holds the phone towards the poster and visually scans for the correct product.
  3. The phone has already detected a QR code on the poster and opens the link, or asks the user to open it.
  4. The user is annoyed because they can't scan the correct product right away, since the phone keeps detecting other QR codes.

Does anyone have experience with this scenario, and how did you solve it?
Sadly, I can't change the layout of the poster, since another department has already printed it.

One of the solutions I have in mind is letting the phone 'scan' every QR code it can get a lock on, but only giving feedback that it has found a valid QR code, without opening or using it.
If the phone is pointed at the correct product, the user would still have to tap to scan again and then open the product. The drawback is that this takes a lot of taps.

sql server – Remote Scan when updating using functions

I'm not 100% sure, but the difference in the execution plan suggests the optimizer naively switches to a Remote Scan instead of a Remote Query (perhaps because of the order of operations between the SET clause and the function applied to the value being set).

What if you wrote the query like this instead? What does the execution plan show?

UPDATE LINKEDSERVER1.database1.dbo.table1 WITH(ROWLOCK)
SET number = 0
WHERE accounts = '123'
    AND number IS NULL

Another trick I've found to work in enforcing a Remote Query operator is to first pull the clustered index columns of the rows that need to be updated into a local temp table, then join the remote table to that temp table as part of the UPDATE to filter the rows, ensuring a Remote Query occurs.

Example:

SELECT ClusteredIndexColumn
INTO #TempTable1
FROM LINKEDSERVER1.database1.dbo.table1 -- if you have a local copy of table1 here that'd be even better
WHERE accounts = '123'
    AND number IS NULL

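-- then update only the rows whose clustered index keys were captured above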
UPDATE T1
SET number = 0
FROM LINKEDSERVER1.database1.dbo.table1 AS T1 WITH(ROWLOCK)
INNER JOIN #TempTable1 AS TT1
    ON T1.ClusteredIndexColumn = TT1.ClusteredIndexColumn

postgresql – Postgres 11: Query plan uses seq scan after upgrade

the situation

We have a database hosted on RDS with a few hundred tables, a few of which are quite large. The tables have the following (simplified) structure.

                    Table "public.flights"
 Column       | Type                     | Modifiers | Storage | Stats target 
--------------+--------------------------+-----------+---------+--------------
 uuid         | uuid                     | not null  | plain   |              
 date_created | timestamp with time zone | not null  | plain   |              

Indexes:
    "flights_pkey" PRIMARY KEY, btree (uuid)


                    Table "public.passengers"
 Column    | Type                   | Modifiers                     | Storage | Stats target
-----------+------------------------+-------------------------------+---------+-------------
 id        | bigint                 | not null default nextval(...) | plain   |             
 flight_id | uuid                   | not null                      | plain   |             
 name      | character varying(128) | not null                      | plain   |             

Indexes:
    "passengers_pkey" PRIMARY KEY, btree (id)
    "passengers_a08cee2d" btree (flight_id)
Foreign-key constraints:
    "p_flight_id_75a46b87233dc365_fk_flights_uuid" FOREIGN KEY (flight_id) REFERENCES flights(uuid) DEFERRABLE INITIALLY DEFERRED

The flights table has approx 17 million rows.
The passengers table has approx 2.6 billion rows.

We recently upgraded the database from 9.5.22 to 11.8 and performance is significantly degraded.

After upgrading, we ran VACUUM ANALYZE on the instance (as opposed to ./analyze_new_cluster.sh, since we're unable to run a shell on RDS instances).

This has not helped the situation. I spun up another standalone 11.8 instance of the database and ran VACUUM FULL ANALYZE, and that database exhibits the same query-planner behavior, so including FULL in the VACUUM command did not help (as is suggested in some SO answers).

We have found one query that shows the most drastic change in performance before and after the upgrade:

SELECT f.uuid, p.name
FROM flights f 
LEFT OUTER JOIN passengers p 
    ON f.uuid = p.flight_id 
WHERE f.uuid IN (< UUIDs >)
ORDER BY f.date_created ASC;

Previously P95 latency was under 4 ms. Now, P95 is 15 seconds.

The trouble arises when the WHERE clause includes roughly 10 or more UUIDs.

the execution plans

postgres 9.5 instance (with 50 UUIDs in the WHERE clause)

 Sort  (cost=7273695.73..7273707.45 rows=4688 width=36) (actual time=0.420..0.420 rows=0 loops=1)
   Sort Key: f.date_created
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop Left Join  (cost=1652.68..7273409.89 rows=4688 width=36) (actual time=0.408..0.408 rows=0 loops=1)
         ->  Index Scan using flights_pkey on flights f  (cost=0.56..428.86 rows=50 width=24) (actual time=0.406..0.406 rows=0 loops=1)
               Index Cond: (uuid = ANY ('{2c0adac6-79bb-48a1-a0ba-bd8f537d68de,...,a6605812-9a5b-46c4-9989-4d24d195e1c0}'::uuid[]))
         ->  Bitmap Heap Scan on passengers p  (cost=1652.12..145082.56 rows=37706 width=28) (never executed)
               Recheck Cond: (f.uuid = flight_id)
               ->  Bitmap Index Scan on passengers_a08cee2d  (cost=0.00..1642.70 rows=37706 width=0) (never executed)
                     Index Cond: (f.uuid = flight_id)
 Planning time: 0.289 ms
 Execution time: 0.479 ms
(12 rows)

postgres 11 instance (with 50 UUIDs in the WHERE clause)

 Gather Merge  (cost=3149109.16..3149552.99 rows=3804 width=36) (actual time=3880.756..3882.219 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=3148109.14..3148113.89 rows=1902 width=36) (actual time=3878.194..3878.194 rows=0 loops=3)
         Sort Key: f.date_created
         Sort Method: quicksort  Memory: 25kB
         Worker 0:  Sort Method: quicksort  Memory: 25kB
         Worker 1:  Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Left Join  (cost=745.27..3148005.54 rows=1902 width=36) (actual time=3878.170..3878.170 rows=0 loops=3)
               ->  Parallel Seq Scan on flights f  (cost=0.00..669647.32 rows=21 width=24) (actual time=3878.167..3878.168 rows=0 loops=3)
                      Filter: (uuid = ANY ('{2c0adac6-79bb-48a1-a0ba-bd8f537d68de,...,a6605812-9a5b-46c4-9989-4d24d195e1c0}'::uuid[]))
                     Rows Removed by Filter: 5631600
               ->  Bitmap Heap Scan on passengers p  (cost=745.27..117695.86 rows=32120 width=28) (never executed)
                     Recheck Cond: (f.uuid = flight_id)
                     ->  Bitmap Index Scan on passengers_a08cee2d  (cost=0.00..737.24 rows=32120 width=0) (never executed)
                           Index Cond: (f.uuid = flight_id)
 Planning Time: 0.286 ms
 Execution Time: 3882.262 ms
(18 rows)

My Best Assessment

In both scenarios, the scans on the passengers table are not executed. This is because the UUIDs I provided to the query do not exist in the flights table; I merely wanted to pass in a larger number of them to trigger the different behavior in how the flights table is scanned.

In the postgres 9.5 instance, it performs an index scan with an index condition: it expects 50 rows (the number of UUIDs I provide to the query) and returns none (as none of them exist).

In the postgres 11 instance, it wants to perform a sequential scan (in parallel) on the table with a filter. The filter essentially removes all rows returned by the sequential scan(s).

When fewer than 10 UUIDs are passed to the WHERE clause, the postgres 11 instance generates the same index scan plan as the postgres 9.5 instance. Again, that points to statistics, in that it estimates costs differently; however, from what I checked, those statistics appear similar in both instances (unless I am not pulling the right values, which is very likely).

I have read many SO answers about "bad queries", but they don't address what I think may be the result of a major version upgrade.

I’ve checked the default_statistics_target for each database (both are 100) and the random_page_cost (both are 4).

I recognize that setting enable_seqscan to OFF is not a permanent solution, however it does coerce the postgres 11 instance to return a query plan identical to that of the postgres 9.5 instance.

-- on pg 11 instance with enable_seqscan = OFF

 Sort  (cost=5901559.44..5901570.85 rows=4566 width=36)
   Sort Key: f.date_created
   ->  Nested Loop Left Join  (cost=745.83..5901281.90 rows=4566 width=36)
         ->  Index Scan using flights_pkey on flights f  (cost=0.56..428.99 rows=50 width=24)
               Index Cond: (uuid = ANY ('{2c0adac6-79bb-48a1-a0ba-bd8f537d68de,...,a6605812-9a5b-46c4-9989-4d24d195e1c0}'::uuid[]))
         ->  Bitmap Heap Scan on passengers p  (cost=745.27..117695.86 rows=32120 width=28)
               Recheck Cond: (f.uuid = flight_id)
               ->  Bitmap Index Scan on passengers_a08cee2d  (cost=0.00..737.24 rows=32120 width=0)
                     Index Cond: (f.uuid = flight_id)
(9 rows)

I'm reaching the point where I'm stabbing in the dark. I have been comparing the pg_stats values for the uuid column in the flights table, and both instances show similar values for null_frac, avg_width, n_distinct, and correlation.

My Question

Given the above, what am I missing to help the postgres query planner avoid the expensive sequential scan?

All settings and statistics appear to be the same between the two instances; only the postgres version differs.

The 9.5 instance does not have any columns with stats targets that differ from the default. So before anyone suggests increasing that value: why would that help the postgres 11 instance when the postgres 9.5 instance produces a "good" plan without it?

Is there something about postgres 11 (parallel workers?) that makes it think it can perform the sequential scan faster than the index scan? This seems unlikely, given that the planner expects to return only 21 rows, but at a huge cost:

Parallel Seq Scan on flights f  (cost=0.00..669647.32 rows=21
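
One more thing I plan to try, to test the parallel-worker theory, is disabling parallel plans for a single session and re-running the query (a diagnostic sketch only, not a fix; max_parallel_workers_per_gather is a standard Postgres 11 setting):

SET max_parallel_workers_per_gather = 0;  -- no parallel workers for this session
EXPLAIN (ANALYZE, BUFFERS)
SELECT f.uuid, p.name
FROM flights f
LEFT OUTER JOIN passengers p
    ON f.uuid = p.flight_id
WHERE f.uuid IN (< UUIDs >)
ORDER BY f.date_created ASC;

If the planner falls back to the index scan with parallelism off, that would at least confirm where the costing goes wrong.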

Thanks.