data.table – how to get more efficient row counts and uniqueN in a very BIG dataset

I have a very large dataset. It has, say, three dimensions: firm ID, bank code and bank branch, plus the distance from each firm to every bank branch.

I need to get these two statistics for each firm:
(i) the number of banks, by taking the uniqueN of the bank code
(ii) the number of branches, by counting the bank branches

I coded it as follows, but it is tedious and extremely slow, far from what the reputation of data.table would suggest.

library(data.table)

n <- 1e7 # a simplified sample
DTsample <- data.table(Dist = abs(rnorm(n)*15 + 30),
                       BankCode = sample(1:200, n, replace = TRUE),
                       firmID = sample(1:5000, n, replace = TRUE)
                       )[, BankBranch := .GRP, by = BankCode]

# (i) calculate uniqueN of bank code
DTsample[, .(Dleq05 = uniqueN(BankCode[Dist <=  5]),
             Dleq15 = uniqueN(BankCode[Dist <= 15]),
             Dleq25 = uniqueN(BankCode[Dist <= 25]),
             Dleq35 = uniqueN(BankCode[Dist <= 35]),
             Dleq45 = uniqueN(BankCode[Dist <= 45])), by = firmID]

# (ii) calculate the sum of the number of bank branches
DTsample[, .(Nleq05 = sum(+(Dist <=  5)),
             Nleq15 = sum(+(Dist <= 15)),
             Nleq25 = sum(+(Dist <= 25)),
             Nleq35 = sum(+(Dist <= 35)),
             Nleq45 = sum(+(Dist <= 45))), by = firmID]

Is it possible to make this run faster? I ran something similar on my real data, and it takes days.
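One idea that might reduce the work, sketched here in plain Python rather than data.table: because the thresholds are nested, each distance can be binned once into the first threshold it satisfies, and the five counts then follow from a cumulative sum. For the distinct-bank counts, it should be enough to keep only the closest branch of each bank per firm before binning. The names `cumulative_counts` and `firm_stats` are hypothetical, and whether the same idea is faster in data.table on the real data is untested.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

THRESHOLDS = [5, 15, 25, 35, 45]  # same cut-offs as in the question


def cumulative_counts(distances, thresholds=THRESHOLDS):
    """Count how many distances are <= each threshold in a single pass."""
    bins = [0] * len(thresholds)
    for d in distances:
        k = bisect_left(thresholds, d)  # index of the first threshold >= d
        if k < len(thresholds):
            bins[k] += 1
    return list(accumulate(bins))


def firm_stats(rows, thresholds=THRESHOLDS):
    """rows is an iterable of (firm_id, bank_code, dist) tuples."""
    dists_by_firm = defaultdict(list)
    min_dist = {}  # (firm, bank) -> distance to the closest branch of that bank
    for firm, bank, dist in rows:
        dists_by_firm[firm].append(dist)
        key = (firm, bank)
        if key not in min_dist or dist < min_dist[key]:
            min_dist[key] = dist

    nearest_by_firm = defaultdict(list)
    for (firm, _bank), dist in min_dist.items():
        nearest_by_firm[firm].append(dist)

    return {
        firm: {
            "branches": cumulative_counts(dists, thresholds),
            "banks": cumulative_counts(nearest_by_firm[firm], thresholds),
        }
        for firm, dists in dists_by_firm.items()
    }


rows = [(1, 10, 3.0), (1, 10, 12.0), (1, 20, 14.0), (1, 30, 50.0), (2, 10, 5.0)]
stats = firm_stats(rows)
```

A bank is within a threshold exactly when its closest branch is, which is why the minimum distance per (firm, bank) pair is sufficient for the distinct count.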

algorithms – Efficient way to concatenate strings to find specific split up values

I’m working with a list of objects (“Section”) that each contain a string value among other fields. The strings of all the sections can be concatenated to form the complete text. I want to implement a method that searches for a specific (given) value and returns the Section object(s) in which it found the value. There are two different scenarios, as described below.

Value: restaurant

Scenario 1

  1. “John went to the restaurant to pick ” ✔️

  2. “up the food he ordered online before ”

  3. “he went to his mother’s house.”

Current behaviour: Found in Section 1

Desired behaviour: Found in Section 1

Scenario 2

  1. “John went to the restaura”

  2. “nt to pick up the food he ord”

  3. “ered online before he went to his mother’s house.”

Current behaviour: Couldn’t find the value

Desired behaviour: Found in list(Section 1, Section 2)

Scenario 1 is easily solved, as the value is found completely in section 1. In the second scenario, the value is never found because it is split across sections. I have written an algorithm that tries to stitch these sections together. It appears to work when the given value is simple, e.g. “car” or “boat”. I’m having trouble with compound strings such as “the supermarket” or “a green flower”. I wrote the code a couple of months ago and tried debugging it, but even I fail to understand my own code here.

I deliberately left my code out of this post, as it has rough hardcoded rules and extreme levels of code complexity. I’d like to rethink the algorithm without looking at the current one, to avoid blatantly copying sections of the existing code.

The code is written in Java in case anyone knows some handy Java functions that might help.
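One possible way to rethink this, sketched in Python rather than Java: concatenate the section strings once, remember the offset at which each section starts, search the joined text, and map every match span back to the sections it overlaps. The function name `find_sections` and the use of plain strings instead of Section objects are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def find_sections(texts, value):
    """For each occurrence of value, return the 1-based section numbers it spans."""
    if not value:
        raise ValueError("value must be a non-empty string")
    # starts[i] is the offset of section i inside the concatenated text
    starts = [0] + list(accumulate(len(t) for t in texts))[:-1]
    joined = "".join(texts)

    hits = []
    pos = joined.find(value)
    while pos != -1:
        first = bisect_right(starts, pos) - 1                   # section holding the first character
        last = bisect_right(starts, pos + len(value) - 1) - 1   # section holding the last character
        hits.append(list(range(first + 1, last + 2)))
        pos = joined.find(value, pos + 1)
    return hits


scenario_1 = [
    "John went to the restaurant to pick ",
    "up the food he ordered online before ",
    "he went to his mother's house.",
]
scenario_2 = [
    "John went to the restaura",
    "nt to pick up the food he ord",
    "ered online before he went to his mother's house.",
]
found_1 = find_sections(scenario_1, "restaurant")
found_2 = find_sections(scenario_2, "restaurant")
found_compound = find_sections(scenario_2, "the food he ordered")
```

Because the search runs on the joined text, compound values with spaces need no special rules. In Java the same approach should be expressible with `String.indexOf` and a sorted array of start offsets.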

modular arithmetic – A more efficient way to solve system of congruences?

I'm trying to solve the following problem: find all $x \in \mathbb{Z}/3705\mathbb{Z}$ that the Chinese Remainder Theorem map sends to $(\bar{4}, \bar{7})$ in $\mathbb{Z}/57\mathbb{Z} \times \mathbb{Z}/65\mathbb{Z}$.

I tried solving the system of congruences
$$x \equiv 4 \pmod{57}$$
$$x \equiv 7 \pmod{65}$$

which left me with
$$57k \equiv 3 \pmod{65}$$
where $x = 4 + 57k$. Now I could solve this by repeatedly adding (or subtracting) $65$ to $3$ until I get a multiple of $57$. However, when I typed it into Wolfram, the first solution that comes up is when I add or subtract $78$ and $38$ times respectively, which isn't exactly efficient in a test scenario. So I was wondering whether there is a faster method to solve this system of congruences, or some systematic way to find the number of times I need to add or subtract $65$.
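One systematic route is the extended Euclidean algorithm: it yields an inverse of $57$ modulo $65$, after which $k$ follows from a single multiplication. The Python sketch below illustrates this; the helper names `extended_gcd` and `solve_pair` are hypothetical.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def solve_pair(r1, m1, r2, m2):
    """Solve x = r1 (mod m1) and x = r2 (mod m2) for coprime m1, m2."""
    g, inv, _ = extended_gcd(m1, m2)  # inv * m1 == 1 (mod m2) when g == 1
    if g != 1:
        raise ValueError("moduli must be coprime")
    k = ((r2 - r1) * inv) % m2        # solves m1 * k == r2 - r1 (mod m2)
    return (r1 + m1 * k) % (m1 * m2)


x = solve_pair(4, 57, 7, 65)
```

By hand the same steps are short: $65 = 1 \cdot 57 + 8$ and $57 = 7 \cdot 8 + 1$, so $1 = 8 \cdot 57 - 7 \cdot 65$, giving $57^{-1} \equiv 8 \pmod{65}$, hence $k \equiv 3 \cdot 8 = 24 \pmod{65}$ and $x = 4 + 57 \cdot 24 = 1372$.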

ag.algebraic geometry – Effective semi-group of a singular abelian surface

Let $A$ be a singular abelian surface over $\mathbb{C}$; that is to say, an abelian surface of maximal Picard rank $\rho(A) = 4$. By Shioda-Mitani, we know $A \cong E \times E'$, where $E, E'$ are isogenous elliptic curves with CM in an imaginary quadratic field $\mathbb{Q}(\sqrt{-d})$. I do not know if this is standard terminology, but by the effective semi-group I mean the semi-group $\text{NS}^{+}(A) \subset \text{NS}(A)$ of integral points in the effective cone of $A$.

We can take as a basis for $\text{NS}(A)$ the four classes $v, h, \Gamma, \Gamma_{\text{CM}}$, where $v, h$ are the vertical and horizontal classes of $E \times E'$, $\Gamma$ is the graph of an isogeny between $E$ and $E'$, and $\Gamma_{\text{CM}}$ is the graph of the CM map. Obviously, we get effective classes by taking non-negative integer linear combinations of these basis elements. However, $\text{NS}^{+}(A)$ is not finitely generated (see, page 1 of …). My questions are therefore:

  1. Do we have an understanding of the lattice points in $\text{NS}^{+}(A)$ which are not non-negative linear combinations of $v, h, \Gamma, \Gamma_{\text{CM}}$? Has this been studied somewhere? There are infinitely many such points, but I really lack intuition for them.

  2. Given an explicit class in $\text{NS}(A)$, is there a useful way to determine when it is effective, other than the fact that it must intersect positively with an ample class? I have not heard of such a criterion in general, but I hope this particular case may be easier.

python – PyQt 5 CheckBox State Efficient Way

I think there is an error: this line is defined twice (line 28):


Therefore, the onStateChange event is triggered twice when you check check_box2, which is logical.

I think you copied and pasted, and the last one should be check_box3. Your naming conventions are also not intuitive: give your objects more meaningful names, otherwise how will you tell them apart in your code?

If what you want is mutually exclusive check boxes, the implementation could be simpler. Personally, I prefer radio buttons, as in plain HTML, because they are more intuitive (it is immediately obvious that only one answer is allowed).

First approach: a generic method that loops over the check boxes of your form and deselects all of them except the sender. Then you can simplify the code and get rid of the if/elif chain.

Second approach: use the built-in functionality of Qt. You can wrap your check boxes in a QButtonGroup container.

A rectangular box before the text label appears when a QCheckBox
object is added to the parent window. Just like QRadioButton, it is also
a selectable button. Its common use is in a scenario where the user is
prompted to choose one or more of the available options.

Unlike radio buttons, check boxes are not mutually exclusive by
default. In order to restrict the choice to one of the available
items, the check boxes must be added to a QButtonGroup.


As mentioned earlier, check box buttons can be made mutually exclusive
by adding them to a QButtonGroup object.

QButtonGroup provides an abstract container for buttons and
has no visual representation. It emits the buttonClicked() signal
and sends a reference to the button object to the slot function btngroup().

Source: PyQt – QCheckBox widget

python – Efficient manipulation of numpy arrays to convert an identity matrix to a permutation matrix


I want to generate the permutation matrix that rearranges a 1D array of alternating entries (i.e. even, odd, even, odd, even, odd, …) into a 1D array where the first half holds the evens and the second half holds the odds. So (even1, odd1, even2, odd2, even3, odd3) goes to (even1, even2, even3, odd1, odd2, odd3).

For example, with N = 6, the permutation matrix would be:

M = array([[1, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 1, 0],
           [0, 1, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0, 1]])

You can verify this by multiplying: M * array([0, 1, 2, 3, 4, 5]) = array([0, 2, 4, 1, 3, 5]).

My pseudocode approach

(Full code below.) Here is the mathematical rule for generating this:

I = NxN identity matrix
for i in [0:N-1]:
    if i < N/2:
        move the 1 in row i to column 2*i
    if i >= N/2:
        move the 1 in row i to column 2*(i - N/2)+1

You can see how it works to generate M above.

Code (Python)

I implemented the above pseudocode using numpy array manipulation (this code can be copied and pasted):

import numpy as np

def permutation_matrix(N):
    N_half = N // 2 #Computed once so that it is not repeated on each array slice
    I = np.identity(N)
    I_even, I_odd = I[:N_half], I[N_half:] #Split the identity matrix into the top and bottom half, since they have different shifting formulas

    #Loop through the row indices
    for i in range(N_half):
        # Apply method to the first half
        i_even = 2 * i #Set up the new (shifted) index for the 1 in the row
        zeros_even = np.zeros(N) #Create a zeros array (will become the new row)
        zeros_even[i_even] = 1. #Put the 1 in the new location
        I_even[i] = zeros_even #Replace the row in the array with our new, shifted, row

        # Apply method to the second half
        i_odd = (2 * (i - N_half)) + 1 #Negative index, so it counts from the end of the row
        zeros_odd = np.zeros(N)
        zeros_odd[i_odd] = 1.
        I_odd[i] = zeros_odd

    M = np.concatenate((I_even, I_odd), axis=0)

    return M

N = 8
M = permutation_matrix(N)

array(((1., 0., 0., 0., 0., 0., 0., 0.),
       (0., 0., 1., 0., 0., 0., 0., 0.),
       (0., 0., 0., 0., 1., 0., 0., 0.),
       (0., 0., 0., 0., 0., 0., 1., 0.),
       (0., 1., 0., 0., 0., 0., 0., 0.),
       (0., 0., 0., 1., 0., 0., 0., 0.),
       (0., 0., 0., 0., 0., 1., 0., 0.),
       (0., 0., 0., 0., 0., 0., 0., 1.)))

My problems

I have a feeling that there are more efficient ways to do this. To summarize what I do for each matrix:

  1. Loop through the rows

  2. In each row, identify where the 1 must be moved to, call it idx

  3. Create a separate zero array and insert a 1 in the index idx

  4. Replace the row we are evaluating with our modified zero array

Is it necessary to split the array in half?

Is there a Pythonic way to apply two different functions to two halves of the same array without splitting it?

Is there an approach where I can move the 1s without needing to create a separate array of zeros in memory?

Do I even need to loop over the rows?

Are there more efficient libraries than numpy for that?
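One way to look at these questions: the matrix is fully determined by a single index list (the even positions followed by the odd positions), so neither the split nor the per-row zero arrays seem strictly necessary. The sketch below shows this in plain Python with nested lists; the helper name `even_odd_permutation` is hypothetical.

```python
def even_odd_permutation(n):
    """Column of the 1 in each row: even positions first, then odd positions."""
    return list(range(0, n, 2)) + list(range(1, n, 2))


def permutation_matrix(n):
    """Build the n x n permutation matrix as nested lists, with one 1 per row."""
    perm = even_odd_permutation(n)
    return [[1 if col == target else 0 for col in range(n)] for target in perm]


def apply_permutation(values):
    """Reorder a sequence directly, without forming the matrix at all."""
    return [values[j] for j in even_odd_permutation(len(values))]


M = permutation_matrix(6)
reordered = apply_permutation([0, 1, 2, 3, 4, 5])
```

With numpy, the same index list can be used to pick rows of the identity, e.g. `np.eye(n)[perm]`, and a vector `x` can be reordered as `x[perm]` without building the matrix.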

usb – What is the fastest / most efficient way to transfer a 50 GB data folder from the SSD to the USB stick?

The details may be of some interest:

The 50 GB folder (with fragmented files of different types and sizes) is located on a mid-2015 2.8 GHz i7 MBP with a 500 GB SSD.
The USB drive is a garden-variety 64 GB Lexar™.

What settings allow the best transfer?
Bluetooth on/off; Wi-Fi on/off; screen on/off; right/left USB port; etc. Is there anything I should do first to prepare the files for a better transfer? For example, back in the old days when computers were powered by steam, I would defragment a hard drive before transferring data.

…Or does any of this even matter, since the bottleneck is the USB stick?

c – A more efficient way to parse command-line input?

In the main() of the following code, I wrote a very naive command parser. Is there a more efficient way?

This program performs the four arithmetic operations ADD, SUB, MUL and DIV on integers. The input has the following form: ADD 1 1, MUL 12 90. There is also a special input character %, which means "use the last result". For example, ADD 1 1 would return 2, then ADD % 1 would return 3.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*calc_function)(int, int);

typedef struct operation {
        char *name;
        calc_function calculate;

} Operation;

int add(int a, int b) { return a+b; }
int sub(int a, int b) { return a-b; }
int mul(int a, int b) { return a*b; }
int divi(int a, int b) { return a/b; }

int calc(int a, int b, int (*c)(int, int)) {
        return c(a, b);
}

int main() {

        char line[32];  /* large enough for inputs such as "MUL 12 90\n" */
        int result = 0;
        int a;
        int b;

        Operation ADD = {"ADD", add};
        Operation SUB = {"SUB", sub};
        Operation MUL = {"MUL", mul};
        Operation DIV = {"DIV", divi};

        Operation ops[4] = {ADD, SUB, MUL, DIV};

        while (fgets(line, sizeof line, stdin) != NULL) {

                for(int i = 0; i < 4; ++i) {
                        if (0 == strncmp(ops[i].name, line, 3)) {
                                /* first operand: step past the space that strchr finds */
                                char *p = strchr(line, ' ');
                                if (p == NULL) break;
                                ++p;
                                if (*p == '%') {
                                        a = result;
                                } else {
                                        sscanf(p, "%d", &a);
                                }
                                /* second operand: a separate cursor keeps the buffer start intact */
                                p = strchr(p, ' ');
                                if (p == NULL) break;
                                ++p;
                                if (*p == '%') {
                                        b = result;
                                } else {
                                        sscanf(p, "%d", &b);
                                }
                                result = ops[i].calculate(a, b);
                                printf("%d\n", result);
                        }
                }
        }

        return 0;
}

In addition, any advice on improving the performance and style of this program would be greatly appreciated!

[WTS] Large VPS accounts with high availability and efficient support.

KVC hosting started in 2010 for the sole purpose of creating a hosting business accessible to all. Since its creation, the company has grown steadily thanks to our practical business model, to the point that we have increased the size of our data center in order to meet the demands of our customers. Putting customers first is what drives us, which has won us multiple awards for providing quality technology at a great price. We focus on providing cost-effective web hosting solutions, and our extensive industry experience ensures that you will receive high-quality, cost-effective web hosting options that can be customized to suit your needs. We are proud of the trust placed in us by our partners and customers, and we invite everyone to try our business.
Be in charge of your own virtual private server (VPS hosting)! It will work as if you had your own machine.

Our virtual private servers are affordable and powerful, giving you your own mail server and better security control. SSD VPS Hosting is ideal for businesses that anticipate rapid growth, huge website traffic, or run highly dynamic and interactive websites.


Enterprise Cloud SSD VPS

100 GB SSD RAID 10
4 XEON CPU @ 2.13 GHz (8.52 GHz)
6TB premium bandwidth
$ 49.99 / M


300 GB SSD RAID 10
8 XEON CPU @ 2.13 GHz (17.0 GHz)
22 TB premium bandwidth
$ 169.99 / M



Disk space – 75 GB
Bandwidth – 200 GB
RAM – 4 GB
$ 25.99 / M


Disk space – 225 GB
Bandwidth – 3 TB
RAM – 10 GB
$ 79.99 / M

Have a question?

Do not hesitate to contact us: sales on


c# – Fast, efficient hash of a small byte array (length ~200) to Int32

I have a lot of small byte arrays (±200 bytes each, around 300+ million instances) and I want to produce fast and efficient 32-bit hashes.

Looking around, there seem to be three categories of hash functions:

  1. String hashes
  2. byte[] hashes such as MD5, SHA, etc.; but they produce relatively large hashes of 128+ bits (vs. 32 bits)
  3. Home-grown XOR-type functions with prime numbers thrown in for luck

So far, I haven't found anything that meets my needs. Any suggestion?
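One commonly used non-cryptographic option with a native 32-bit output is FNV-1a. The sketch below is in Python rather than C#, purely to show the algorithm; the constants are the standard 32-bit FNV offset basis and prime, and the function name `fnv1a_32` is hypothetical. Whether its speed and collision behaviour suit 300+ million arrays of about 200 bytes is an assumption that would need measuring.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of a byte sequence."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte                            # mix in the next byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # multiply and wrap to 32 bits
    return h


digest = fnv1a_32(b"a")
```

The loop translates almost line for line to C#, where unchecked `uint` multiplication wraps to 32 bits without the explicit mask.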