performance – Resolving contradictions in WFC more effectively than naive backtracking

I recently started working with the Wave Function Collapse algorithm (WFC) in three-dimensional space. I have the basics working and now want to move on to letting the algorithm automatically resolve any contradictions / error states it runs into.

For this, I have created (inspired by this tweet by Oskar Stålberg) a very small set of tiles that runs into such contradictions quite often.

(Some examples of levels built on a 3x3x3 grid using this tile set.)

Note that this set of tiles, for example, will always run into a contradiction if the number of bottom connections is odd, because every pipe that comes out at the bottom must also end at the bottom. I have now implemented basic (naive) backtracking: each time the algorithm encounters a contradiction, it reverts the last decision taken, cancels it and tries another option. This works, but because the possibility space for the placed modules can get huge, generation can become very slow: all remaining possibilities are tried before the search has stepped back far enough to reach the modules that actually caused the contradiction. (GIF 1, GIF 2)
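For reference, the naive backtracking I have right now boils down to roughly the following sketch (grid, place, undo_last_placement, etc. are stand-ins for my actual implementation, not real API names):

# Rough sketch of the naive backtracking described above; the grid API is hypothetical.
def collapse_with_backtracking(grid):
    decisions = []  # stack of (cell, untried_options)
    while not grid.fully_collapsed():
        cell = grid.lowest_entropy_cell()
        decisions.append((cell, list(grid.possible_modules(cell))))
        while True:
            cell, options = decisions[-1]
            if options:
                grid.place(cell, options.pop())      # collapse the cell and propagate constraints
                if not grid.has_contradiction():
                    break                            # decision stands, move on to the next cell
                grid.undo_last_placement()           # contradiction: try the next option for this cell
            else:
                # all options exhausted here: revert exactly one earlier decision (the slow part)
                decisions.pop()
                if not decisions:
                    raise RuntimeError("no solution found")
                grid.undo_last_placement()
    return grid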

Now, I'm looking to optimize this algorithm.

I thought about using backjumping instead of stepping back one decision at a time, with a jump distance that doubles each time the same contradiction is hit again (first we go back by 1, then 2, then 4, 8, 16, …). But I don't know how to detect that the original contradiction has been passed, so that I can then reset the jump distance.

I also don't understand Oskar Stålberg's solution to this. In the comments of that post, he talks about cutting out pieces to try to resolve the contradiction, but how does he calculate those pieces?

usability – How to motivate users to archive / delete to improve system performance?

I don't see anything that could motivate me to go through thousands (did I understand you correctly?) of lists and archive the ones I don't use. It sounds like I would need to hire someone to do this job for me for the whole of next month.

If I create 100 playlists on Apple Music, why the hell should I care about how the platform handles that? Why would I need to delete / archive playlists? I need every one of them.

As a developer, I can clearly see that you are trying to ignore a technical problem by getting users to solve it for you.

There are many approaches that do not require any attention from the user:

  • If a list is not changed / not used, why would you recalculate it at all? Ask yourself, dev 🙂 It's 2020; lazy loading, caching, etc. are ready to use for all major languages and platforms.

  • If a list is not used for a long time, why not automatically archive it for the user?
    Put a (..zZZ) icon on such a list and restore it when the user clicks on it. Restoring does not have to happen instantly.

In my company, we suspend the entire account if it is not active for a certain period: the data disappears from the database, etc., so it consumes nothing but some disk space. But it is 100% transparent to the user: when they visit their suspended account, they see a message, "You have been inactive for a while. Please wait while we unlock your account", and reactivation takes 10-20 seconds.

I know for sure that if you ask "You have been inactive for a while, can we archive your account?", most users will choose "Of course not!! Even if I don't use it, I still need it."

5CloudHost Cloud Hosting, SSD storage protected by RAID, high performance! | NewProxyLists

Tired of paying huge monthly fees for slow web hosting?

5CloudHost.com offers incredibly fast loading speed for your website. We have powerful servers with 40 Intel Xeon CPU cores, 128 GB RAM, RAID-protected SSD storage and 10 Gbps Internet connectivity. We run a custom web server configuration powered by Apache with PHP 7.3, mod_http2, php-fpm, mod_lsapi, opcache and MariaDB for MySQL databases, which will make any site open instantly. We are a CloudFlare Optimized Partner, so you can activate the Cloudflare CDN from your cPanel account in 1 click; CloudFlare allows any site to be as fast and secure as the Internet giants. We use RAID-protected, SSD-only storage, which improves fault tolerance and server read/write speed (especially for MySQL), so your website will load at least 10 times faster. The servers that make up our cloud are hosted in multiple data centers in the United States (Miami, FL and Dallas, TX) and in Europe (Manchester/UK, Frankfurt/DE, The Hague/NL and Bucharest/RO).

Our features:

  • Daily backups
  • Free SSL certificates
  • Firewall and WAF
  • Protection against viruses and malware
  • CDN in 1 click (Cloudflare)
  • 1-click application installation (Softaculous)
  • Several versions of PHP
  • Powered by cPanel®
  • 30 day money back guarantee. No questions asked.

Get 5 years of super-fast web hosting for a fraction of what you pay your current host for a single year. Deploy your website on our super-fast, easy-to-use, secure and ultra-reliable cloud platform with enterprise-grade servers, powered by cPanel. Check out 5CloudHost.com's cloud hosting packages:

Cloud Starter (personal websites with medium traffic)
1 hosted domain
5 GB SSD storage
Dallas, TX Datacenter / Bucharest, RO Datacenter
Dedicated resources (LVE):
50% CPU (~1.4 GHz)
512 MB RAM
20 entry processes
Price – $47 / 5 years – ORDER NOW

Cloud Business (high-traffic corporate sites – the most popular!)
10 hosted domains
15 GB SSD storage
Dallas, TX Datacenter / Bucharest, RO Datacenter
Dedicated resources (LVE):
100% CPU (~2.8 GHz)
1 GB RAM
50 entry processes
Price – $97 / 5 years – ORDER NOW

Cloud Enterprise (enterprise content management – best value!)
Unlimited hosted domains
50 GB SSD storage
Dallas, TX Datacenter / Bucharest, RO Datacenter
Dedicated resources (LVE):
200% CPU (~5.6 GHz)
2 GB RAM
100 entry processes
Price – $147 / 5 years – ORDER NOW

Why choose us?

  • 450+ web applications with one-click installation – Your WordPress site is just a click away, along with more than 450 other well-known web applications. And it is not only these apps that are guaranteed to work with our hosting; your custom app will too.
  • Accepted payment methods – Flexible billing is one of the things that separates good hosting companies from mediocre ones, so we do our best to support your preferred payment method on our site.
  • Easy to use control panel – cPanel is the most popular web control panel that helps you easily manage your website. It has automation tools designed to simplify the process of hosting a website.
  • Cloudflare plugin – CloudFlare allows any site to be as fast and secure as the Internet giants. Just look for the CloudFlare icon, choose the domain you want to activate, then click on the orange cloud. That's it!
  • CloudLinux OS – Each tenant is isolated in its own lightweight virtualized environment and has its own dedicated resources. This also prevents abusive clients from affecting your performance.

Visit >>> 5cloudhost.com


Best regards,
5CloudHost Team

SQL Server – How to improve performance of a complex SQL query

This is a required query, but its execution time is very high. How can I improve the performance of the query given below?

Select p.DepotId, P.ProgressiveRunKMDate As [Date], ST.TripNumber As TripCode, D.RunCode As SheduleNo, Cast((VT.DistanceTravelled/1000)as decimal(10,2)) As SheduleKM,
         (Select  IsNull(Cast(Sum(R.Distance/1000) as decimal(10,2)),0)
From [Route] R
Join ScheduledTrip ST
  On R.RouteId = ST.RouteId
    Join ProgressiveRunKM P
      On DateOfOperation = P.ProgressiveRunKMDate
and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758')
and P.DriverId is not Null and AllocationMode = 3) As CancelledKm,
(Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
From VehicleTripLog
  Where TripId = '' And VehicleCode is null) As [ExtraKms],
(Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
From CasualContract C
Join ProgressiveRunKM P
     On C.Enddate = P.ProgressiveRunKMDate
    And (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
    And (Null is null or P.DepotId = 'DEP00304501804111758')) As CasualKm,
  (Cast((VT.DistanceTravelled/1000)as decimal(10,2)) - (Select  IsNull(Cast(Sum(R.Distance/1000) as decimal(10,2)),0)
From [Route] R
Join ScheduledTrip ST
  On R.RouteId = ST.RouteId
  Join ProgressiveRunKM P
      On DateOfOperation = P.ProgressiveRunKMDate
  and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
  and (Null is null or P.DepotId = 'DEP00304501804111758')
  and P.DriverId is not Null and AllocationMode = 3) + (Select IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' and DriverCode is null And VehicleCode is null) + (Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
   Join ProgressiveRunKM P
 On C.Enddate = P.ProgressiveRunKMDate
  and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
  and (Null is null or P.DepotId = 'DEP00304501804111758'))) As EffectiveKm, A.DeadKm As DeadKm,
  ((Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' And VehicleCode is null)+(Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
 On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758'))+(Cast((VT.DistanceTravelled/1000)as decimal(10,2)) - (Select  IsNull(Cast(Sum(R.Distance/1000) as decimal(10,2)),0)
From [Route] R
Join ScheduledTrip ST
  On R.RouteId = ST.RouteId
 Join ProgressiveRunKM P  
  On ST.DateOfOperation = P.ProgressiveRunKMDate
  and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
  and (Null is null or P.DepotId = 'DEP00304501804111758')
  and P.DriverId is not Null and AllocationMode = 3) + (Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' and DriverCode is null And VehicleCode is null) + (Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
 On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758')))+(A.DeadKm)) As GrossKm,
  IsNull(F.Qty,0) As [HSD],
  ((Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' And VehicleCode is null)+(Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
 On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758'))+(Cast((VT.DistanceTravelled/1000)as decimal(10,2)) - (Select  IsNull(Cast(Sum(R.Distance/1000) as decimal(10,2)),0)
From [Route] R
Join ScheduledTrip ST
  On R.RouteId = ST.RouteId
 Join ProgressiveRunKM P  
  On ST.DateOfOperation = P.ProgressiveRunKMDate
  and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
  and (Null is null or P.DepotId = 'DEP00304501804111758')
  and P.DriverId is not Null and AllocationMode = 3) + (Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' and DriverCode is null And VehicleCode is null) + (Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
  From CasualContract C
  Join ProgressiveRunKM P
On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758')))+(A.DeadKm)/10) As Kmpl,
   Case when (((Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' And VehicleCode is null)+(Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
  On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758'))+(Cast((VT.DistanceTravelled/1000)as decimal(10,2)) - (Select  IsNull(Cast(Sum(R.Distance/1000) as decimal(10,2)),0)
From [Route] R
Join ScheduledTrip ST
  On R.RouteId = ST.RouteId
 Join ProgressiveRunKM P  
  On ST.DateOfOperation = P.ProgressiveRunKMDate
  and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
  and (Null is null or P.DepotId = 'DEP00304501804111758')
  and P.DriverId is not Null and AllocationMode = 3) + (Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' and DriverCode is null And VehicleCode is null) + (Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
 On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758')))+(A.DeadKm)/10))>3 Then 'A' When ((((Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' And VehicleCode is null)+(Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
 On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758'))+(Cast((VT.DistanceTravelled/1000)as decimal(10,2)) - (Select  IsNull(Cast(Sum(R.Distance/1000) as decimal(10,2)),0)
From [Route] R
Join ScheduledTrip ST
  On R.RouteId = ST.RouteId
 Join ProgressiveRunKM P  
  On ST.DateOfOperation = P.ProgressiveRunKMDate
  and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
  and (Null is null or P.DepotId = 'DEP00304501804111758')
  and P.DriverId is not Null and AllocationMode = 3) + (Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' and DriverCode is null And VehicleCode is null) + (Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
     On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758')))+(A.DeadKm)/10))>=2.5
      And (((Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' And VehicleCode is not null)+(Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
 On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758'))+(Cast((VT.DistanceTravelled/1000)as decimal(10,2)) - (Select  IsNull(Cast(Sum(R.Distance/1000) as decimal(10,2)),0)
From [Route] R
Join ScheduledTrip ST
  On R.RouteId = ST.RouteId
 Join ProgressiveRunKM P  
  On ST.DateOfOperation = P.ProgressiveRunKMDate
  and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
  and (Null is null or P.DepotId = 'DEP00304501804111758')
  and P.DriverId is not Null and AllocationMode = 3) + (Select  IsNull(Cast(Sum(DistanceTravelled/1000) as decimal(10,2)),0)
   From VehicleTripLog
  Where TripId = '' and DriverCode is null And VehicleCode is null) + (Select IsNull(Cast(Sum(C.RunKM/1000) as decimal(10,2)),0)
   From CasualContract C
Join ProgressiveRunKM P
 On C.Enddate = P.ProgressiveRunKMDate
    and (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10')
and (Null is null or P.DepotId = 'DEP00304501804111758')))+(A.DeadKm)/10))<3 ) Then 'B' Else 'C' End  As [ABC]
   From ProgressiveRunKM P
Join Vehicle V
    On V.VehicleId = P.VehicleId
     Join FuelLog F
       On F.VehicleId = P.VehicleId
Join ScheduledTrip S
  On S.VehicleId = V.VehicleId
Join Duty D
  On D.DutyCode = S.DutyCode
Join Attendance AD
  On S.DriverId = AD.EmployeeId  
Join [Shift] SF
       On D.ShiftId = SF.ShiftId
Join [Route] R
  On S.RouteId = R.RouteId  
Join ScheduledTrip ST
  On ST.RouteId = R.RouteId
Join VehicleTripLog VT
  On VT.TripId = ST.ScheduledTripId
 Join(
      Select R.RouteBlockId,(Case When RB.RouteCategory = 0 Then R.Distance Else 0 End) As DeadKm
     From [Route] R
    Join RouteBlock RB
  On R.RouteBlockId = RB.RouteBlockId
    -- Join [Corridor] C
  --On RB.CorridorId = C.CorridorId
Group By R.RouteBlockId, RB.RouteCategory, R.Distance
      ) A
On A.RouteBlockId = R.RouteBlockId
Where ((Null is null or P.DepotId = 'DEP00304501804111758')
        And (Null is Null Or D.RunCode = Null)
   And (P.ProgressiveRunKMDate Between '2019-05-01' And '2020-01-10'))
   Group By p.DepotId, D.RunCode, ST.TripNumber, VT.DistanceTravelled, P.ProgressiveRunKMDate, P.DriverId, V.RTONumber, A.DeadKm, F.Qty

MySQL – slow performance when UPDATEing with IN on a large dataset

I need to update a main dataset from various sources, each of which provides some of the existing records with modifications, such as a new mobile phone number.

The execution time of each query is more than 10 hours.

Environment: MySQL 8, 8-core CPU, 32 GB of memory.

I have the following main table; it currently holds 3M records:

CREATE TABLE `contact_data` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `email` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `email_status` tinyint(3) unsigned DEFAULT '0',
  `mobile_phone` varchar(32) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `firstname` varchar(128) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `lastname` varchar(128) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `nickname` varchar(128) COLLATE utf8mb4_general_ci DEFAULT NULL,
  PRIMARY KEY (`email`),
  UNIQUE KEY `id` (`id`),
  KEY `country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci

So far, I have tried to update in different ways. Most source tables have only 10,000 to 100,000 records. I have tried the same thing with MyISAM and "id" as the primary key.

Join:

UPDATE contact_data cd
LEFT JOIN (SELECT email, firstname FROM source2 WHERE firstname <> '' GROUP BY email ORDER BY id DESC) AS t2
ON cd.email = t2.email
SET cd.firstname = t2.firstname

Direct:

UPDATE contact_data SET mobile_phone = (SELECT phone FROM source1 WHERE email = contact_data.email ORDER BY id DESC LIMIT 1) 
WHERE mobile_phone IS NULL 

Direct with record limit:

UPDATE contact_data SET mobile_phone = (SELECT phone FROM source1 WHERE email = contact_data.email ORDER BY id DESC LIMIT 1) 
WHERE mobile_phone IS NULL 
AND email IN (SELECT DISTINCT email FROM source1)

config:

innodb_buffer_pool_size = 16G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 10M

key_buffer_size = 512M

Further tuning of the configuration file only brings small improvements. I don't know whether there is a better way to handle this, or whether I should move to another database.
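One direction I am considering is driving the update from a small script in id-ranged batches instead of one huge statement, so each transaction stays small. A rough, untested sketch (Python with mysql-connector-python; connection settings and the batch size are placeholders, the table and column names are the ones above):

# Rough sketch: apply the newest phone per email in id-ranged batches.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app", password="...", database="crm")
cur = conn.cursor()

BATCH = 20_000  # placeholder batch size

cur.execute("SELECT MIN(id), MAX(id) FROM contact_data")
lo, hi = cur.fetchone()

sql = """
    UPDATE contact_data cd
    JOIN (SELECT email, MAX(id) AS max_id FROM source1 GROUP BY email) latest
      ON latest.email = cd.email
    JOIN source1 src
      ON src.id = latest.max_id
    SET cd.mobile_phone = src.phone
    WHERE cd.mobile_phone IS NULL
      AND cd.id BETWEEN %s AND %s
"""

start = lo
while start <= hi:
    cur.execute(sql, (start, start + BATCH - 1))
    conn.commit()          # keep each transaction small
    start += BATCH

cur.close()
conn.close()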

performance – Word2vec implementation in TensorFlow 2.0 (performance optimization)

I want to implement word2vec using TensorFlow 2.0.
I have prepared a dataset according to the skip-gram model and I have around 18 million observations (target and context words).

I used the following dataset for my goal:
https://www.kaggle.com/c/quora-question-pairs/notebooks

I created a new dataset for the skip-gram model. I used a window size of 2 and a number of skips equal to 2 in order to create, for each target word (our input), a context word (which is what should be predicted). It looks like this:

target  context
  1        3
  1        1
  2        1 
  2       1222

Here is my code:

dataset_train = tf.data.Dataset.from_tensor_slices((target, context))
dataset_train = dataset_train.shuffle(buffer_size=1024).batch(64)

#Parameters:
num_words = len(word_index)#approximately 100000
embed_size = 300
num_sampled = 64
initializer_softmax = tf.keras.initializers.GlorotUniform()
#Variables:
embeddings_weight = tf.Variable(tf.random.uniform((num_words,embed_size),-1.0,1.0))
softmax_weight = tf.Variable(initializer_softmax((num_words,embed_size)))
softmax_bias = tf.Variable(initializer_softmax([num_words]))

optimizer = tf.keras.optimizers.Adam()


#As before, we are supplying a list of integers (that correspond to our validation vocabulary words) to the embedding_lookup() function, which looks up these rows in the normalized_embeddings tensor, and returns the subset of validation normalized embeddings.  
#Now that we have the normalized validation tensor, valid_embeddings, we can multiply this by the full normalized vocabulary (normalized_embedding) to finalize our similarity calculation:
@tf.function
def training(X,y):
    with tf.GradientTape() as tape:
        embed = tf.nn.embedding_lookup(embeddings_weight,X)
        loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(weights = softmax_weight, biases = softmax_bias, inputs = embed,
                                   labels = y, num_sampled = num_sampled, num_classes = num_words))
    variables = (embeddings_weight,softmax_weight,softmax_bias)  
    gradients = tape.gradient(loss,variables)
    optimizer.apply_gradients(zip(gradients,variables))


EPOCHS = 30
for epoch in range(EPOCHS):
    print('Epoch:',epoch)
    for X,y in dataset_train:
        training(X,y)  


#compute similarity of words: 
norm =  tf.sqrt(tf.reduce_sum(tf.square(embeddings_weight), 1, keepdims=True))
norm_embed = embeddings_weight/ norm
temp_emb = tf.nn.embedding_lookup(norm_embed,X)
similarity = tf.matmul(temp_emb,tf.transpose(norm_embed))

But a single epoch takes far too long to compute. Is it possible to improve the performance of my code in some way? (I use Google Colab to run the code.)

EDIT: this is the form of my training dataset:

dataset_train


I followed the instructions in this guide: https://adventuresinmachinelearning.com/word2vec-tutorial-tensorflow/
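For illustration, here is a variant of my input pipeline with a larger batch size plus caching and prefetching (an untested sketch; target, context, EPOCHS and training are the same objects as above, and 1024 is an arbitrary batch size, not a tuned value):

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

# Same data as above, but with bigger batches and an overlapped input pipeline.
dataset_train = (
    tf.data.Dataset.from_tensor_slices((target, context))
    .cache()                      # keep the ~18M pairs in memory after the first pass
    .shuffle(buffer_size=10_000)
    .batch(1024)                  # arbitrary example value
    .prefetch(AUTOTUNE)           # prepare the next batches while the accelerator is busy
)

for epoch in range(EPOCHS):
    for X, y in dataset_train:
        training(X, y)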

performance – What is the optimal number of CPU intensive data mining jobs to run in parallel?

I have a CPU-heavy data mining job that is parallelizable. The work is not IO-heavy; it is mostly CPU computation. I have a platform that exposes 32 logical CPUs, with 2 threads per core (hyper-threading).

My first question is whether it is better to run 32 or 64 jobs in parallel. My second question is what per-job performance I should expect when going from 32 parallel jobs to 64 (assuming I/O is not a bottleneck). The lscpu output is below, followed by a small benchmark sketch.

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               45
Model name:          Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Stepping:            7
CPU MHz:             3092.334
CPU max MHz:         3500.0000
CPU min MHz:         1200.0000
BogoMIPS:            5386.65
Virtualisation:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            20480K
NUMA node0 CPU(s):   0-7,16-23
NUMA node1 CPU(s):   8-15,24-31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
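Rather than guessing, I suppose I can also measure the throughput curve directly. A minimal Python sketch of such a benchmark (work() is a stand-in for one CPU-bound data mining job, and the counts are example values):

# Throughput benchmark: run the same CPU-bound task under different pool sizes
# and compare completed tasks per second.
import time
from multiprocessing import Pool

def work(_):
    total = 0
    for i in range(5_000_000):   # pure CPU burn, no I/O
        total += i * i
    return total

def throughput(n_workers, n_tasks=128):
    start = time.perf_counter()
    with Pool(processes=n_workers) as pool:
        pool.map(work, range(n_tasks))
    return n_tasks / (time.perf_counter() - start)

if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(f"{n:>3} workers: {throughput(n):6.2f} tasks/s")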

c++ – High-performance multithreaded parsing of text files

Parsing and processing text files remains a common task. It is often slow and can become a performance bottleneck when the data is large or many files need to be processed.

Context

This is the follow-up to this question. As before, the "Yahtzee" programming challenge serves as an illustrative example. Follow the link for full details, but the task is essentially:

  • Read a ~1 MB file with about ~100,000 newline-separated integers (we use this file repeated 1,000 times because we are otherwise too fast => ~900 MB; see below).
  • Group them with a hash map. We use the third-party ska::bytell_hash_map instead of std::unordered_map because it is about 2.5 times faster.
  • Find the group with the largest sum and return that sum.

In the first code review, I focused on:

  • How to read the file efficiently => conclusion: mmap (Linux specific; see the other question for details and alternatives).
  • How to parse efficiently => once the file is read efficiently, parsing becomes the bottleneck. The other question gradually stripped the parsing down to a tight loop incrementing a pointer over the mmapped file with a minimum of checking / branching.

The file format here is very simple: one integer per line, terminated by '\n'. We assume the data is ASCII and contains no illegal characters; only [0-9].

The initial test file is repeated 1000 times with: for i in {1..1000}; do cat yahtzee-upper-1.txt >> yahtzee-upper-big.txt ; done. This produces a file of about 900 MB, but with a very low cardinality (only 791 unique integers), which means the hashing part of the algorithm is not much of a bottleneck. See below for the second part of the challenge with a higher-cardinality file. With such stripped-down code and mmap we were able to reach ~540 MB/s (1.7 seconds for this file) using a single thread.

(All times are for a Sandy Bridge i7-2600 with 16 GB of DDR3-1600 and a 500 MB/s SSD. Note, however, that we make sure the file is in the OS cache, so already in RAM; the 500 MB/s is therefore not really relevant.)

The next level

In order to go even faster, this question / code review focuses on multi-threading the parsing. It turns out the algorithm is embarrassingly parallel if we chunk the mmap and just let each thread build its own hash map, then combine them at the end. The cost of this last step is insignificant (certainly for the yahtzee-upper-big.txt file, due to its low cardinality).

The code is at the bottom. It can reach <400 ms for the full (cached) read, parse, hash-table build and combine using 8 threads (this processor has 4 cores + HT). (Note that the os::fs::MemoryMappedFile class is my handy RAII wrapper around mmap.) Scaling is not bad up to 8 threads, but it visibly tails off. (A tight single-threaded loop spinning a pointer across the mmap takes ~140 ms.)

1 thread:  ~1700ms
4 threads: ~680ms
6 threads: ~437ms
8 threads: ~390ms

We tried pinning the threads to CPUs (see code), but that didn't make much difference.

That's pretty good, but I welcome feedback on the coding / concurrency style.

The real challenge for this approach comes when we use a file with higher cardinality (i.e. more unique integers and therefore a much larger hash table with more entries).

Here is a small utility to generate a more difficult file:

#include <cstddef>
#include <fstream>
#include <iostream>
#include <random>

int main(int argc, char* argv[]) {
  if (argc < 2) return 1;

  auto gen = std::mt19937{std::random_device{}()};
  std::uniform_int_distribution<int> dist(1'000'000, 2'000'000);

  auto file = std::ofstream{argv[1]}; // NOLINT

  for (std::size_t i = 0; i < 100'000'000; ++i) {
    file << dist(gen) << '\n';
  }

  return 0;
}

Run it like this to create a file of ~800 MB with a cardinality of 1,000,001:

./build/bin/yahtzee_gen yahtzee-upper-big2.txt

Performance on this file, although it is slightly smaller, is much worse: ~2.1 s on 8 threads. Single-threaded performance is 4.7 seconds, showing poor scalability. perf stat reports this:

 Performance counter stats for 'build/bin/yahtzee_mthr yahtzee-upper-big2.txt':

         14,159.77 msec task-clock                #    6.044 CPUs utilized          
             2,789      context-switches          #    0.197 K/sec                  
                 8      cpu-migrations            #    0.001 K/sec                  
           169,114      page-faults               #    0.012 M/sec                  
    49,300,366,303      cycles                    #    3.482 GHz                      (83.26%)
    44,125,329,192      stalled-cycles-frontend   #   89.50% frontend cycles idle     (83.26%)
    39,070,916,691      stalled-cycles-backend    #   79.25% backend cycles idle      (66.68%)
    16,818,483,760      instructions              #    0.34  insn per cycle         
                                                  #    2.62  stalled cycles per insn  (83.36%)
     2,613,261,878      branches                  #  184.555 M/sec                    (83.47%)
        24,712,823      branch-misses             #    0.95% of all branches          (83.33%)

       2.342779054 seconds time elapsed

      13.755426000 seconds user
       0.412222000 seconds sys

Note the frontend and backend stalled cycles. I suspect this kind of poor scalability may be typical when streaming through a large file while building a sizeable data structure from the input. A common use case?

Comments / ideas sought

  • Coding style and technique for the concurrent implementation.
  • Ideas / code / tuning suggestions to improve the multithreaded performance on the high-cardinality file.

And finally here is the code:

#include "flat_hash_map/bytell_hash_map.hpp"
#include "os/fs.hpp"
#include 
#include 
#include 
#include 
#include 
#include 
#include 

using uint64 = std::uint64_t;
using map_t  = ska::bytell_hash_map;
// using map_t  = std::unordered_map;  // ~2.5x slower

// Set the given thread's affinity to be exclusively on the given logical CPU number.
// the hope is to keep the per physcial CPU core L1/L2 cache warm
// results are a bit mixed: 5-10% inconsistent gain, no evidence of loss
void pin_thread_to_cpu(std::thread& t, unsigned cpu_num) {
  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(cpu_num, &cpuset);
  int rc = pthread_setaffinity_np(t.native_handle(), sizeof(cpu_set_t), &cpuset);
  if (rc != 0) std::cerr << "Error calling pthread_setaffinity_np: " << rc << "n";
}

std::pair from_sv(std::string_view sv) {
  return std::make_pair(sv.data(), sv.data() + sv.size());
}

std::string_view to_sv(const char* const begin, const char* const end) {
  return std::string_view{begin, static_cast(end - begin)};
}

void parse(std::string_view buf, map_t& map) {
  auto (begin, end) = from_sv(buf);
  const char* curr  = begin;
  uint64      val   = 0;
  while (curr != end) {
    if (*curr == 'n') {
      map(val) += val;
      val = 0;
    } else {
      val = val * 10 + (*curr - '0');
    }
    ++curr;
  }
}

std::vector chunk(std::string_view whole, int n_chunks, char delim = 'n') {
  auto (whole_begin, whole_end) = from_sv(whole);
  auto chunk_size = std::ptrdiff_t{(whole_end - whole_begin) / n_chunks};
  auto chunks = std::vector{};
  const char* end    = whole_begin;
  for (int i = 0; end != whole_end && i < n_chunks; ++i) {
    const char* begin = end;
    if (i == n_chunks - 1) {
      end = whole_end; // always ensure last chunk goes to the end
    } else {
      end = std::min(begin + chunk_size, whole_end);   // NOLINT std::min for OOB check
      while (end != whole_end && *end != delim) ++end; // NOLINT ensure we have a whole line
      if (end != whole_end) ++end;                     // NOLINT one past the end
    }
    chunks.push_back(to_sv(begin, end));
  }
  return chunks;
}

uint64 yahtzee_upper(const std::string& filename) {
  auto     mfile     = os::fs::MemoryMappedFile{filename};
  unsigned n_threads = std::thread::hardware_concurrency();

  auto chunks  = chunk(mfile.get_buffer(), n_threads); // NOLINT
  auto threads = std::vector{};
  auto maps    = std::vector{n_threads, map_t{}};

  for (unsigned i = 0; i < n_threads; ++i) {
    threads.push_back(std::thread(parse, chunks(i), std::ref(maps(i)))); // NOLINT
    pin_thread_to_cpu(threads(i), i);
  }

  for (auto&& t: threads) t.join();

  uint64 max_total = 0;
  auto   final_map = map_t{};
  for (auto&& m: maps) {
    for (auto p: m) {
      uint64 total = final_map(p.first) += p.second;
      if (total > max_total) max_total = total;
    }
  }
  return max_total;
}

int main(int argc, char* argv()) {
  if (argc < 2) return 1;
  std::cout << yahtzee_upper(argv(1)) << 'n'; // NOLINT
  return 0;
}

2013 – Improving the performance of creating / copying files and folders (in code) in our large document library

We have an on-premises SharePoint 2013 team site with a document library that contains around 29,000+ items, structured under around 4,000 folders and subfolders. Inside our server-side event receiver, I use the following method to copy files and folders:

private void copyfiles(SPItemEventProperties properties, SPListItem currentItem, string from, string to)
{
    try
    {
        SPFolder folder = properties.Web.GetFolder(properties.Web.ServerRelativeUrl + "/Shared Documents/" + currentItem["ReferenceNumber"] + "/" + from + "/");
        SPDocumentLibrary currentDL = (SPDocumentLibrary)properties.Web.GetList(properties.Web.ServerRelativeUrl + "/Shared Documents/");
        SPListItem softemplete = null;
        TaxonomyFieldValue taxFieldValue = currentItem["CustomerName"] as TaxonomyFieldValue;

        var label = taxFieldValue.Label;
        var titlewithoutspecialchar = currentItem["Title"].ToString().Replace("~", " ").Replace("\"", " ").Replace("#", " ").Replace("%", " ").Replace("&", " ").Replace("*", " ").Replace(":", " ")
            .Replace("<", " ").Replace(">", " ").Replace("?", " ").Replace("/", " ").Replace("\\", " ").Replace("{", " ").Replace("}", " ").Replace("|", " ").Replace(".", " ");
        if (folder.ItemCount > 0)
        {
            SPList list = properties.Web.GetList(properties.Web.ServerRelativeUrl + "/Shared Documents/");
            SPQuery query = new SPQuery();
            query.Folder = folder;
            SPListItemCollection listitem = list.GetItems(query);
            foreach (SPListItem i in listitem)
            {
                if ((i.Name.ToLower().Contains(currentItem["ReferenceNumber"].ToString().ToLower() + " -") &&
                     (i.Name.ToLower().Contains("sof") || i.Name.ToLower().Contains("pof") || i.Name.ToLower().Contains("qo")))
                    ||
                    i.Name.ToLower().Contains("request for approval"))
                {
                    softemplete = i;
                    if (softemplete != null)
                    {
                        byte[] fileBytes = softemplete.File.OpenBinary();
                        string destUrl = properties.Web.ServerRelativeUrl + "/" + currentDL.RootFolder.Url + "/" +
                            currentItem["ReferenceNumber"] + "/" + to + "/" + currentItem["ReferenceNumber"] + " - " + label + " - " + titlewithoutspecialchar.Trim() + ".xlsx";
                        if (i.Name.ToLower().Contains("request for approval.oft"))
                        {
                            destUrl = properties.Web.ServerRelativeUrl + "/" + currentDL.RootFolder.Url + "/" +
                                currentItem["ReferenceNumber"] + "/" + to + "/Request for Approval.oft";
                        }
                        else if (i.Name.ToLower().Contains("request for approval.xlsm"))
                        {
                            destUrl = properties.Web.ServerRelativeUrl + "/" + currentDL.RootFolder.Url + "/" +
                                currentItem["ReferenceNumber"] + "/" + to + "/Request for Approval.xlsm";
                        }
                        try
                        {
                            SPFile destFile = currentDL.RootFolder.Files.Add(destUrl, fileBytes, false);
                        }
                        catch
                        {
                        }
                    }
                }
            }
        }
    }
    catch
    {
    }
}

Now this method takes quite a long time to finish copying (sometimes around 10 seconds), even though I use a CAML query to find only the files in the desired folder via query.Folder = folder;. Can anyone provide advice on improving this method? I think it is already optimized as far as it can go; is that correct?

Performance optimization – How to speed up this SparseArray construction

This code (obviously) slows down as the number of trials increases; the real run currently takes more than 5 hours. The stylized example below takes 2.5 minutes. It is not entirely representative, as the real situation is much more sparse and less spread out, but I do not know how to capture that in the example below.

How can I speed it up? The first section defines the input, a set of two-dimensional slabs in a 4-dimensional space: 5 racks and 20 shelves are assigned via random discrete numbers, and the slabs are defined by horStart, horEnd, verStart and verEnd (also random), which give the coordinates of their borders in a two-dimensional array. The idea is to stack individual cells within these arrays and measure the stack height.

rack = RandomInteger[5, trials];
shelf = RandomInteger[20, trials];
horStart = RandomInteger[100, trials];
verStart = RandomInteger[200, trials];
horEnd = 
 RandomChoice[{.8, .15, .04, .01} -> {0, 10, 100, 200}, trials] + 
  horStart;
verEnd = 
 RandomChoice[{.8, .15, .04, .01} -> {0, 10, 100, 200}, trials] + 
  verStart;

These are then assembled into input, a list of length-6 integer tuples identifying the rack and shelf position and how many individual cells are covered by each slab.

input = {rack, shelf, horStart, verStart, horEnd, verEnd}\[Transpose];

The SparseArray must be initialized with values for its dimensions.

reach = Max /@ {rack, shelf, horEnd, verEnd};
sa2 = SparseArray[{}, reach, 0];

The second section allocates this to a SparseArray in which the number of overlaps is counted to determine the height of the stacks.

The area of interest is:

sa2 = SparseArray[{{ra_, sh_, hor_, ver_} /; 
     Apply[Or, 
      Or[And[ra == #1, sh == #2, Between[hor, {#3, #5}], 
          Between[ver, {#4, #6}]] & @@@ input]] :> (sa2[[ra, sh, 
       hor, ver]] + 1)}, reach]

which sits at the bottom of the integrated code that follows.

AbsoluteTiming[
 sa = Module[{rack, shelf, horStart, verStart, horEnd, verEnd, 
     trials = 1000, input, reach, sa2},

   (*first section: define rack, shelf and slab sizes to stack*)

   rack = RandomInteger[5, trials];
   shelf = RandomInteger[20, trials];
   horStart = RandomInteger[100, trials];
   verStart = RandomInteger[200, trials];
   horEnd = 
    RandomChoice[{.8, .15, .04, .01} -> {0, 10, 100, 200}, trials] + 
     horStart;
   verEnd = 
    RandomChoice[{.8, .15, .04, .01} -> {0, 10, 100, 200}, trials] + 
     verStart;
   reach = Max /@ {rack, shelf, horEnd, verEnd};
   input = {rack, shelf, horStart, verStart, horEnd, verEnd}\[Transpose];

   (* second section, allocate to SparseArray, measure size of stacks *)

   sa2 = SparseArray[{}, reach, 0];
   sa2 = SparseArray[{{ra_, sh_, hor_, ver_} /; 
       Apply[Or, 
        Or[And[ra == #1, sh == #2, Between[hor, {#3, #5}], 
            Between[ver, {#4, #6}]] & @@@ input]] :> (sa2[[ra, sh, 
         hor, ver]] + 1)}, reach]];]


Out (* 140 *)
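For reference only (to restate what is being computed rather than to speed it up), my reading of the stylized example is: every slab bumps each covered (rack, shelf, hor, ver) cell by one, and the resulting array holds the stack height per cell. A rough NumPy sketch of that reading (variable names mirror the Mathematica code; the indexing details are illustrative, not exact):

import numpy as np

rng = np.random.default_rng()
trials = 1000

rack = rng.integers(0, 6, trials)         # like RandomInteger[5, trials]
shelf = rng.integers(0, 21, trials)
hor_start = rng.integers(0, 101, trials)
ver_start = rng.integers(0, 201, trials)
extent = np.array([0, 10, 100, 200])
hor_end = hor_start + rng.choice(extent, trials, p=[.8, .15, .04, .01])
ver_end = ver_start + rng.choice(extent, trials, p=[.8, .15, .04, .01])

heights = np.zeros((rack.max() + 1, shelf.max() + 1,
                    hor_end.max() + 1, ver_end.max() + 1), dtype=np.int32)

for ra, sh, h0, v0, h1, v1 in zip(rack, shelf, hor_start, ver_start, hor_end, ver_end):
    heights[ra, sh, h0:h1 + 1, v0:v1 + 1] += 1   # stack this slab's cells

print(heights.max())   # tallest stack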