## algorithms – Need help understanding Knuth’s proof that: The set of all pure words is well-ordered by the relation >

In the paper linked below by Knuth and Bendix, theorem 1:

The set of all pure words is well-ordered by the relation ‘$$>$$

has a proof that I can’t seem to follow despite staring at it all day. I can’t even decipher what the notation is trying to represent.

So, we have four fixed sequences:

• An infinite sequence of variables $$v_1, v_2, v_3, …$$ etc. (not relevant for Theorem 1)
• A finite sequence of operators $$f_1, f_2, f_3, …, f_n$$
• A finite sequence of degrees $$d_1, d_2, d_3, .., d_n$$
• (i.e. degree of $$f_4 = d_4$$)
• and a finite sequence of weights $$w_1, w_2, w_3, …, w_n$$
• (i.e. weight of $$f_4 = w_4$$)

A word is then defined inductively as either a variable alone or an operator followed by an appropriate number of other words depending on the degree of said operator.

A ‘pure word’ is a word with no variables and the weight of a pure word, $$alpha$$ is defined as follows:

• $$w(alpha) = sum_j w_jn(f_j , alpha)$$ where $$n(f_j, alpha)$$ is the number of occurrences of $$f_j$$ in the word $$alpha$$

Finally the relation $$>$$ that orders all pure words that is proved to be well-ordered is defined as:

$$alpha > beta$$ if and only if:

1. $$w(alpha) > w(beta)$$
• (I’m assuming the ‘$$>$$‘ here is strict integer inequality and not the same ‘$$>$$‘ that is being defined.)
1. $$w(alpha) = w(beta)$$ and $$alpha = f_jalpha_1 … alpha_{dj}$$, $$beta = f_kbeta_1 … beta_{dk}$$, and either:
• 2a) $$j > k$$; or
• 2b) $$j = k$$ and $$alpha_1 = beta_1,…, alpha_{t-1} = beta_{t-1}, alpha_t > beta_t$$, for some $$t, 1 leq t leq d_j$$

Finally on page 266, for the proof that $$>$$ is well-ordered the exposition states:

Now let $$alpha$$ be a pure word with $$n_j$$ symbols of degree $$d_j$$. It is easy to prove inductively that:

$$n_0 + n_1 + n_2 + … = 1 + 0.n_0 + 1.n_1 + 2.n_2 + …,$$

i.e. $$n_0 = 1 + n_2 + 2n_3 + …$$

I don’t know whether it really is ‘easy’ to prove the above because I don’t really know what the above is.

I’m fairly sure symbols are just operators in this context of pure words but if $$n_0, n_1,$$ etc. are symbols then how are they being summed? Does $$n_0$$ in the above actually represent $$w(n_0)$$. That doesn’t seem to provide any clarity as to what is going on with the proof.

I’ve reread the first four pages of the paper multiple times and nothing seems to be getting any clearer:

https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/knuth-bendix.pdf

Any insight would be appreciated.

## algorithms – What is O(…) and how do I calculate it?

### What are the asymptotic functions? What is an asymptote, anyway?

Given a function f(n) that describes the amount of resources (CPU time, RAM, disk space, etc) consumed by an algorithm when applied to an input of size of n, we define up to three asymptotic notations for describing its performance for large n.

An asymptote (or asymptotic function) is simply some other function (or relation) g(n) that f(n) gets increasingly close to as n grows larger and larger, but never quite reaches. The advantage of talking about asymptotic functions is that they are generally much simpler to talk about even if the expression for f(n) is extremely complicated. Asymptotic functions are used as part of the bounding notations that restrict f(n) above or below.

(Note: in the sense employed here, the asymptotic functions are only close to the original function after correcting for some constant nonzero factor, as all the three big-O/Θ/Ω notations disregard this constant factors from their consideration.)

### What are the three asymptotic bounding notations and how are they different?

All three notations is used like this:

f(n) = O(g(n))

where f(n) here is the function of interest, and g(n) is some other asymptotic function that you are trying to approximate f(n) with. This should not be taken as an equality in a rigorous sense, but a formal statement between how fast f(n) grows with respect to n in comparison to g(n), as n becomes large. Purists will often use the alternative notation f(n) ∈ O(g(n)) to emphasize that the symbol O(g(n)) is really a whole family of functions that share a common growth rate.

Big-ϴ (Theta) notation states an equality on the growth of f(n) up to a constant factor (more on this later). It behaves similar to an `=` operator for growth rates.

Big-O notation describes an upper-bound on the growth of f(n). It behaves similar to a `≤` operator for growth rates.

Big-Ω (Omega) notation describes a lower-bound on a growth of f(n). It behaves similar to a `≥` operator for growth rates.

There are many other asymptotic notations, but they do not occur nearly as often in computer science literature.

Big-O notations and its ilk are often as a way to compare the time complexity.

### What is time complexity?

Time complexity is a fancy term for the amount of time T(n) it takes for an algorithm to execute as a function of its input size n. This can be measured in the amount of real time (e.g. seconds), the number of CPU instructions, etc. Usually it is assumed that the algorithm will run on your everyday von Neumann architecture computer. But of course you can use time complexity to talk about more exotic computing systems, where things may be different!

It is also common to talk about space complexity using Big-O notation. Space complexity is the amount of memory (storage) required to complete the algorithm, which could be RAM, disk, etc.

It may be the case that one algorithm is slower but uses less memory, while another is faster but uses more memory. Each may be more appropriate in different circumstances, if resources are constrained differently. For example, an embedded processor may have limited memory and favor the slower algorithm, while a server in a data center may have a large amount of memory and favor the faster algorithm.

### Calculating Big-ϴ

Calculating the Big-ϴ of an algorithm is a topic that can fill a small textbook or roughly half a semester of undergraduate class: this section will cover the basics.

Given a function f(n) in pseudocode:

``````int f(n) {
int x = 0;
for (int i = 1 to n) {
for (int j = 1 to n) {
++x;
}
}
return x;
}
``````

What is the time complexity?

The outer loop runs n times. For each time the outer loop runs, the inner loop runs n times. This puts the running time at T(n) = n2.

Consider a second function:

``````int g(n) {
int x = 0;
for (int k = 1 to 2) {
for (int i = 1 to n) {
for (int j = 1 to n) {
++x;
}
}
}
return x;
}
``````

The outer loop runs twice. The middle loop runs n times. For each time the middle loop runs, the inner loop runs n times. This puts the running time at T(n) = 2n2.

Now the question is, what is the asymptotic running time of both functions?

To calculate this, we perform two steps:

• Remove constants. As algorithms increase in time due to inputs, the other terms dominate the running time, making them unimportant.
• Remove all but the largest term. As n goes to infinity, n2 rapidly outpaces n.

They key here is focus on the dominant terms, and simplify to those terms.

T(n) = n2 ∈ ϴ(n2)
T(n) = 2n2 ∈ ϴ(n2)

If we have another algorithm with multiple terms, we would simplify it using the same rules:

T(n) = 2n2 + 4n + 7 ∈ ϴ(n2)

The key with all of these algorithms is we focus on the largest terms and remove constants. We are not looking at the actual running time, but the relative complexity.

### Calculating Big-Ω and Big-O

First off, be warned that in informal literature, “Big-O” is often treated as a synonym for Big-Θ, perhaps because Greek letters are tricky to type. So if someone out of the blue asks you for the Big-O of an algorithm, they probably want its Big-Θ.

Now if you really do want to calculate Big-Ω and Big-O in the formal senses defined earlier, you have a major problem: there are infinitely many Big-Ω and Big-O descriptions for any given function! It’s like asking what the numbers that are less than or equal to 42 are. There are many possibilities.

For an algorithm with T(n) ∈ ϴ(n2), any of the following are formally valid statements to make:

• T(n) ∈ O(n2)
• T(n) ∈ O(n3)
• T(n) ∈ O(n5)
• T(n) ∈ O(n12345 × en)
• T(n) ∈ Ω(n2)
• T(n) ∈ Ω(n)
• T(n) ∈ Ω(log(n))
• T(n) ∈ Ω(log(log(n)))
• T(n) ∈ Ω(1)

But it is incorrect to state T(n) ∈ O(n) or T(n) ∈ Ω(n3).

### What is relative complexity? What classes of algorithms are there?

If we compare two different algorithms, their complexity as the input goes to infinity will normally increase. If we look at different types of algorithms, they may stay relatively the same (say, differing by a constant factor) or may diverge greatly. This is the reason for performing Big-O analysis: to determine if an algorithm will perform reasonably with large inputs.

The classes of algorithms break down as follows:

• Θ(1) – constant. For example, picking the first number in a list will always take the same amount of time.

• Θ(n) – linear. For example, iterating a list will always take time proportional to the list size, n.

• Θ(log(n)) – logarithmic (base normally does not matter). Algorithms that divide the input space at each step, such as binary search, are examples.

• Θ(n × log(n)) – linear times logarithmic (“linearithmic”). These algorithms typically divide and conquer (log(n)) while still iterating (n) all of the input. Many popular sorting algorithms (merge sort, Timsort) fall into this category.

• Θ(nm) – polynomial (n raised to any constant m). This is a very common complexity class, often found in nested loops.

• Θ(mn) – exponential (any constant m raised to n). Many recursive and graph algorithms fall into this category.

• Θ(n!) – factorial. Certain graph and combinatorial algorithms are factorial complexity.

### Does this have anything to do with best/average/worst case?

No. Big-O and its family of notations talk about a specific mathematical function. They are mathematical tools employed to help characterize the efficiency of algorithms, but the notion of best/average/worst-case is unrelated to the theory of growth rates described here.

To talk about the Big-O of an algorithm, one must commit to a specific mathematical model of an algorithm with exactly one parameter `n`, which is supposed to describe the “size” of the input, in whatever sense is useful. But in the real world, inputs have much more structure than just their lengths. If this was a sorting algorithm, I could feed in the strings `"abcdef"`, `"fedcba"`, or `"dbafce"`. All of them are of length 6, but one of them is already sorted, one is reversed, and the last is just a random jumble. Some sorting algorithms (like Timsort) work better if the input is already sorted. But how does one incorporate this inhomogeneity into a mathematical model?

The typical approach is to simply assume the input comes from some random, probabilistic distribution. Then, you average the algorithm’s complexity over all inputs with length `n`. This gives you an average-case complexity model of the algorithm. From here, you can use the Big-O/Θ/Ω notations as usual to describe the average case behavior.

But if you are concerned about denial-of-service attacks, then you might have to be more pessimistic. In this case, it is safer to assume that the only inputs are those that cause the most amount of grief to your algorithm. This gives you a worst-case complexity model of the algorithm. Afterwards, you can talk about Big-O/Θ/Ω etc of the worst-case model.

Similarly, you can also focus your interest exclusively to the inputs that your algorithm has the least amount of trouble with to arrive at a best-case model, then look at Big-O/Θ/Ω etc.

## algorithms – Maximum weighted subgraph

Problem: Given a graph G with weights on nodes as well as edges. All weights are positive and integers. Find an induced subgraph H with maximum total weight per node, i.e., the goal is to maximize $$big( sum_{e in H}w(e) + sum_{v in H} w(v) big)$$/(number of vertices in H).

If you allow negative weights, then it is NP-hard (easy reduction from Clique). I have a strong feeling that the above is not NP-Hard.

I would love to have some first-thoughts on the above, if not a full-fledged optimal algorithm. Does it seem to be NP-Hard? What is the closest known problem — my thoughts were Prize Collection, Clique — but both seem much harder.

## algorithms – How to implement a dirty flag on a complex object hierarchy?

I’ve got an object with a complex hierarchy, something like:

``````Document
Paragraphs
Paragraph
Words
Word
``````

and I need to implement a dirty flag on the `Document` object. In general, there can be a huge number of Paragraphs and Words, and paragraphs and words have many properties that can be changed. So I’m having trouble figuring out where to capture state change so I know when the document is dirty. I could:

• Give every single Paragraph a reference back to the Document object, and give every single Word a reference to its containing Paragraph, and then in every property change, walk up the hierarchy and tell the document that it’s dirty. But that’s a lot of references and hooks on property changes.
• Find any place in the application that changes any paragraph or word property, and have that code also tell the document that it’s dirty. But then it would be hard to know that I had found all such places, and I’d have to make sure that any future document changes also made sure to do this, which seems hard to maintain.

Is there a better design pattern for this situation?

## algorithms – Time complexity of finding median in data stream

I was reading a solution to the problem in the title on leetcode and the article says that the time complexity of the following solution is O(n)

1. setup a data structure to hold stream value and insert new element in the right place using linear search or binary search
2. return median

my solution is as follows:

``````class MedianFinder:

def __init__(self):
"""
"""
self.arr = ()

def addNum(self, num: int) -> None:
idx = bisect.bisect_left(self.arr, num)
self.arr.insert(idx, num)

def findMedian(self) -> float:
# self.arr.sort()
if len(self.arr) % 2 != 0:
return self.arr(len(self.arr)//2)
else:
return (self.arr(len(self.arr)//2 -1) + self.arr(len(self.arr)//2 ))/2

``````

My question is about the time complexity of the push method. the binary search will take O(log n) to find the index. the insert will take O(n). but since the method will be called on the stream, will the complexity be O(n^2) or O(n) ?

## algorithms – Do the following two CFGs describe the same language

Do the following two CFGs describe the same language

1. S → aS | bS | ε
2. S → aS | Sb | ε

Would the answer to this be no, because the order can’t be switched? bS and Sb are different. I’m a bit confused about how I would find out if they described the same language.

## algorithms – How could a bank software intend to transfer \$82.56 but result in a error transfer of \$1,205,619.56

algorithms – How could a bank software intend to transfer \$82.56 but result in a error transfer of \$1,205,619.56 – Software Engineering Stack Exchange

## algorithms – How to prove that one problem belongs to class P?

Is there any typical proving method when proving that one problem belongs to class P?

For example, when proving that

The problem of finding n to the kth power is the P problem. (Each multiplication can be done in unit time)

If you present an algorithm that can solve this problem with $$O (log n)$$, can it be a proof?

## algorithms – Is this radix sort implementation in-place or out-place?

Below is the implementation of Radix Sort in python taken from here

``````def countingSort(arr, exp1):
n = len(arr)
# The output array elements that will have sorted arr
output = (0) * (n)
# initialize count array as 0
count = (0) * (10)
# Store count of occurrences in count()
for i in range(0, n):
index = (arr(i) / exp1)
count(int(index % 10)) += 1
# Change count(i) so that count(i) now contains actual
# position of this digit in output array
for i in range(1, 10):
count(i) += count(i - 1)
# Build the output array
i = n - 1
while i >= 0:
index = (arr(i) / exp1)
output(count(int(index % 10)) - 1) = arr(i)
count(int(index % 10)) -= 1
i -= 1
# Copying the output array to arr(),
# so that arr now contains sorted numbers
i = 0
for i in range(0, len(arr)):
arr(i) = output(i)

# Find the maximum number to know number of digits
max1 = max(arr)
# Do counting sort for every digit. Note that instead
# of passing digit number, exp is passed. exp is 10^i
# where i is current digit number
exp = 1
while max1 / exp > 0:
countingSort(arr, exp)
exp *= 10
# This code is contributed by Mohit Kumra
# Edited by Patrick Gallagher
``````

Generally, we can implement algorithms in-place and out-of-place. However, I can’t tell whether the above implementation is out-of-place.