probability distributions – why does the inverse of the CDF generate samples from the PDF?

I implemented “inverse transform sampling” by following its Wikipedia page. It sounds amazing:
Given a PDF:
$$
p(x)
$$

we can generate the samples by
$$
s= F^{-1}(r)
$$

where $r \in (0,1)$ is drawn from a uniform distribution and $F(x)$ denotes the CDF of $p(x)$.
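
For concreteness, here is a minimal sketch of the procedure. The exponential distribution is just an assumed example, chosen because its inverse CDF has a closed form:

# Minimal sketch of inverse transform sampling, assuming the target PDF is
# the exponential distribution p(x) = lam * exp(-lam * x), whose CDF
# F(x) = 1 - exp(-lam * x) inverts to F^{-1}(r) = -ln(1 - r) / lam.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
r = rng.uniform(0.0, 1.0, size=100_000)  # r ~ Uniform(0, 1)
s = -np.log(1.0 - r) / lam               # s = F^{-1}(r)

print(s.mean())  # should be close to the exponential mean 1/lam = 0.5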

But why? Could someone explain it to me?

postgresql – How to generate a VALUES literal expression using a query?

During development and testing, VALUES literal expressions are useful because they let you keep the data definition inside your SQL query.

WITH foobar(foo, bar) AS (
  VALUES
  (1::integer,'a'::text),
  (2,'b'),
  (3,'c'),
  (4,'d')
)
SELECT * FROM foobar;

However, this becomes tedious when trying to write out a really wide or big table.
And it’s even more frustrating when the literal could have been generated from an existing table.


So is there a way to output rows in a format that can easily be copied and pasted as a literal VALUES expression?

The closest I could come up with is to output each row as a record
(note the meta aspect: this SQL actually tries to reverse-engineer a literal VALUES expression back from a given literal VALUES expression).

WITH foobar(foo, bar) AS (
  VALUES
  (1::integer,'a'::text),
  (2,'b'),
  (3,'c'),
  (4,'d')
)

SELECT foobar::record FROM foobar;

Here is the psql output:

 foobar
--------
 (1,a)
 (2,b)
 (3,c)
 (4,d)
(4 rows)

However, this needs extra editing to properly quote, type-cast, and escape the content, so should I look for an output-formatting trick or an SQL trick?
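
For illustration, a rough sketch of one possible direction, assuming format() with the %L literal-quoting specifier and string_agg() are acceptable building blocks (explicit ::type casts would still need to be added by hand):

-- Sketch: render an existing table as a copy/paste-able VALUES literal.
-- format('%L', x) quotes and escapes each value; string_agg() joins the rows.
SELECT 'VALUES' || E'\n' ||
       string_agg(format('  (%L, %L)', foo, bar), E',\n')
FROM foobar;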

address – How is it possible that the same WIF Private Key generates two different addresses?

I am wondering if you can explain this.

I went here:
https://coinb.in/#newSegWit

Created the SegWit Address below:

3ASaGJ8h2bLn6Jha3hncBzXUKRCLyfv9bk

RedeemScript

037fa484e4b870082298d6acca0702a48714323664c647604e6461d4739feb0d9b

Public key

00147a201aa7e1cfb799dcfc2b447a0238421ae2fa60

Private key (WIF key)

KzVonapu3Cf7DaMpaqkUQ9tpsBQPBH7Yk6tXJdvMwuJSy97GK7dc

Then I opened Electrum v3.1.3 and imported the WIF private key above.

And I got this address instead:

1C8ju39MFSPXNqtYDhsuB3Ek45L46dBr3G

So now there are two addresses: a SegWit (or, I think, P2SH) address starting with 3… and the legacy address from Electrum starting with 1…

I’d appreciate an explanation of this.

Thank you.

Resources for creating a JavaScript library that generates HTML components?

Edit: I see that this question has received a couple of downvotes so far. Could someone please explain why, so I can try to prevent it in the future?

I want to create a JavaScript library that can create HTML components through JavaScript method calls. Libraries like chart.js or sigma.js are good examples of the kind of library I desire to create.

Do libraries like chart.js and sigma.js implement their visual components without any prior blueprint, not following any “best practice” for similar libraries that generate HTML components? Or are there known, well-established methods/structures for creating a JavaScript library that generates HTML?

If you know of any resources, would you mind sharing them? I was not able to find resources that discuss this specific question. One thing I could do is study these libraries; however, they are large, and I wanted to see whether there were other resources I could check out first. I expect my library to be much simpler in functionality and smaller in scale than libraries like chart.js and sigma.js.

Why do I get AutoSave Clipping.txt files auto-generated on my desktop?

I don’t know how, but I keep seeing a lot of these auto-clipping files being generated on my desktop.
This is how my desktop looks:
[screenshot of the desktop]

windows 10 – Why does the same call instruction in assembly generate different machine code?

I am learning about stack overflows and their exploitation, but along the way I’m stuck!

I am trying to generate shellcode for a call instruction, but it seems that the same call instruction to the system() function generates a different machine-code equivalent each time.

In my program, I called the system function twice, e.g.: system("notepad");

The first call generates the following machine code: E8 DE100000

The second call to the same system function generates the following machine code: E8 D2100000

We can see that the generated machine code is different, and the operand doesn’t look like the address of the system function.
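
A minimal sketch of how such an operand would decode if this is the x86 near relative call form (opcode E8 followed by a signed 32-bit little-endian displacement measured from the next instruction); the load addresses below are hypothetical:

# Decode the rel32 operand of an x86 near call (opcode E8).
# The four bytes after E8 are a little-endian signed displacement,
# measured from the address of the *next* instruction (call address + 5).
import struct

def call_target(call_address: int, operand_bytes: bytes) -> int:
    rel32 = struct.unpack("<i", operand_bytes)[0]  # little-endian signed 32-bit
    return call_address + 5 + rel32                # E8 + 4 operand bytes = 5 bytes

# Hypothetical addresses for two call sites that are 12 bytes apart:
first = call_target(0x00401000, bytes.fromhex("DE100000"))
second = call_target(0x0040100C, bytes.fromhex("D2100000"))
print(hex(first), hex(second))  # both decode to the same target address

With the two call sites 12 bytes apart (0x10DE − 0x10D2 = 0x0C), both displacements resolve to the same destination.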

This prevents me from hard-coding the call instruction into shellcode, since there is no single static encoding for the call. Am I unable to do this?

Why does this happen?
Is there any way around it?

I am using windows 10

public key infrastructure – Certificate Authority generates private key for Extended Validation code signing certificate?

My company upgraded to an Extended Validation code signing security certificate, which was delivered via mail on a physical USB key, called a “token.” The token contains a private key and the digital certificate, both generated by the Certificate Authority (CA). I was surprised that the CA created the private key. It is my understanding that private keys should never be shared with a third party, including a CA. I’m used to the Certificate Signing Request (CSR) process, where a company keeps its private key private and only shares its public key.

My question: What security concerns are there with a private key being generated and initially owned by (in possession of) a Certificate Authority? Is this standard practice for EV certificates delivered on a physical token? We are told that the private key only exists on the token and there are no other copies.

Perhaps I’m missing the point. Maybe it’s more about establishing trust with a CA, and therefore we should also trust that the private key was handled correctly and that we have the only copy (e.g., why do business with them if we don’t trust them?). At the same time, alarm bells go off because a third party had our private key. I realize that it might not be practical to create a token unless the private key is present, so maybe it’s inevitable that the CA possesses it at some point.

Fresh Catalina install (10.15.6) generates a flood of diskarbitrationd and deleted daemon messages. How do I stop them?

Since doing a fresh install of Catalina (10.15.6) last week, my machine’s Console has been flooded with messages from the deleted daemon and diskarbitrationd. The deleted daemon is consuming significant CPU time (30%). Between the two processes, about 100 messages per second are being logged in the Console at peak times.

The machine has a 2 TB drive. I see the drive as two volumes: one I created, and another with the same name but with “ – Data” appended. I’m not familiar with why newer installations use this two-volume configuration, but I’m starting to suspect it is related to this problem.

In the Console display, I see some entries like those in the following image. The highlighted one is the only one that contains further details in the lower panel.

Detail from Console

Other sections of the Console entries look more like this:

[screenshot of Console entries]

There is no useful information in the Console entries for the diskarbitrationd messages: the lower panel just says <private>.

I’m interested in resolving the underlying problem, but I would also welcome guidance on how I might at least temporarily curb the heavy CPU consumption of the deleted daemon.

python – Calculating text similarity, filtering the results, and reshaping the matrix into a list of tuples generates a MemoryError

I have a corpus of currently ~36,000 documents (growing daily), and I want to calculate the similarity between each pair of documents. After calculating the similarity scores, I want to filter the results to only include scores above 0.9 and capture the (row label, column label, metric) triples in a list of tuples that can be written to a CSV file.

The process that I currently have works as intended, but as my corpus of documents has continued to grow, I’m now receiving a MemoryError when trying to process my actual dataset.

Error generated with my actual dataset:

2020-08-10 10:37:08,933 There are 35845 records loaded
2020-08-10 10:37:19,570 Completed pre-processing all documents
2020-08-10 10:38:05,458 Completed calculating similarity
Traceback (most recent call last):
  File "/home/curtis/project/text_similarity.py", line 97, in <module>
    scores = get_scores(pairwise_similarity, doc_keys, threshold=0.9)
  File "/home/curtis/project/text_similarity.py", line 61, in get_scores
    arr[np.tril_indices(arr.shape[0], -1)] = np.nan
  File "/home/curtis/miniconda3/envs/scraper-dev/lib/python3.7/site-packages/numpy/lib/twodim_base.py", line 868, in tril_indices
    return nonzero(tri(n, m, k=k, dtype=bool))
  File "<__array_function__ internals>", line 6, in nonzero
  File "/home/curtis/miniconda3/envs/project-dev/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1896, in nonzero
    return _wrapfunc(a, 'nonzero')
  File "/home/curtis/miniconda3/envs/project-dev/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
    return bound(*args, **kwds)
MemoryError: Unable to allocate 9.57 GiB for an array with shape (642414090, 2) and data type int64

Existing process text_similarity.py with sample data:

import csv
import logging
import os
import string

import numpy as np
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.datasets import fetch_20newsgroups


class clean_document:
    """Class to pre-process documents"""

    def __init__(self, input_string, stopwords):
        self.input_string = input_string
        self.string_lower = self.lower_string()
        self.string_no_emoji = self.drop_emoji()
        self.string_no_punct = self.remove_punction()
        self.tokens = self.tokenizer()
        self.tokens_no_stopwords = self.remove_stopwords(stopwords)

    def lower_string(self):
        string_lower = self.input_string.lower()
        return string_lower

    def drop_emoji(self):
        """Thanks to https://stackoverflow.com/a/49986645"""
        no_emoji = self.string_lower.encode("ascii", "ignore").decode("ascii")
        return no_emoji

    def remove_punction(self):
        """Thanks to https://stackoverflow.com/a/266162"""
        no_punct = self.string_no_emoji.translate(str.maketrans("", "", string.punctuation))
        return no_punct

    def tokenizer(self):
        tokens = word_tokenize(self.string_no_punct)
        return tokens

    def remove_stopwords(self, stopwords):
        no_stopwords = (line for line in self.tokens if line not in stopwords)
        return no_stopwords


def calc_pairwise_similarity(corpus):
    """Thanks to https://stackoverflow.com/a/8897648"""
    vect = TfidfVectorizer(min_df=1)
    tfidf = vect.fit_transform(corpus)
    pairwise_similarity = tfidf * tfidf.T

    return pairwise_similarity


def get_scores(pairwise_similarity, doc_keys, threshold=0.9):
    """Extract scores into a list-of-tuples"""
    arr = pairwise_similarity.toarray()
    arr[arr <= threshold] = np.nan
    np.fill_diagonal(arr, np.nan)
    arr[np.tril_indices(arr.shape[0], -1)] = np.nan
    idx = (~np.isnan(arr)).nonzero()
    vals = arr[idx].tolist()
    keys = list(zip(idx[0].tolist(), idx[1].tolist()))
    output = ((x[0][0], x[0][1], x[1]) for x in list(zip(keys, vals)))
    final = ((doc_keys[line[0]], doc_keys[line[1]], line[2]) for line in output)

    return final


# MAIN PROGRAM

# set up basic logging
logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)

# load the dataset
newsgroups_train = fetch_20newsgroups(subset='train')
documents = {i: line for i, line in enumerate(newsgroups_train['data'])}
logging.info(f"There are {len(documents)} records loaded")

# define the stop words to use
stop_words = set(stopwords.words("english"))

# process the original strings and create a cleaned corpus
corpus = []
for line in documents.values():
    x = clean_document(line, stop_words)
    corpus.append(x.string_no_punct)
logging.info("Completed pre-processing all documents")

# calculate pairwise similarity
pairwise_similarity = calc_pairwise_similarity(corpus)
logging.info("Completed calculating similarity")

# get similarity metrics
doc_keys = list(documents.keys())
scores = get_scores(pairwise_similarity, doc_keys, threshold=0.9)
logging.info("Extracted similarity scores")

# write scores to CSV file
with open("scores.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerows(scores)
logging.info("Successfully wrote metrics to file")