C++ STL: How does the distance() method work for a set/multiset (stored internally as a self-balancing tree)?

The complexity of the distance function depends on the category of the iterators supplied: in general it is only required to take time linear in the distance but, in the special case in which the input iterators are random access iterators, the worst-case running time is constant. (I believe this accounts for the time spent in the function itself, and assumes that advancing an iterator takes constant time.)

The C++ specification does not mandate any particular implementation as long as it conforms to the required complexity, so your question cannot be answered without inspecting the particular implementation that you are using.
However, just to convey the intuition, here are two possible implementations that would conform to the requirements:

  • Given random access iterators $x$ and $y$, distance($x$, $y$) returns $y-x$.
  • For general iterators, increment $x$ until it equals $y$ and return the number of increments performed.

The type std::set does not provide random access iterators, so std::distance is allowed to take linear time and the second implementation above can be used. Now your question reduces to “how can the standard library iterate over the elements of a std::set in sorted order?”

The answer to this question depends once again on the implementation as there is no particular data structure mandated by the standard to implement std::set.
Since you mention red-black trees, which are a special kind of BST, this can easily be done by noticing that the order of iteration coincides with the order in which the vertices of a BST are visited by an in-order traversal.
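For intuition, here is a minimal in-order traversal of a plain BST (just a sketch of the idea; real std::set iterators advance incrementally rather than materializing the whole sequence):

```cpp
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// In-order visit: left subtree, then the node itself, then the right
// subtree. For a BST this emits the keys in ascending order.
void in_order(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    in_order(n->left, out);
    out.push_back(n->key);
    in_order(n->right, out);
}
```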

Notice that the concept of distance abstracts completely from the data structure used to store the elements of the set. It refers only to the positions at which two elements appear when using an iterator to access the collection’s contents (in the case of std::set, the elements appear in sorted order).

version control – Best practice for organizing build products of dependencies and project code in your repo source tree?

I’ve checked quite a few related questions on source tree organization, but couldn’t find the answer for my exact need:

For a project I’m working on, my source tree is organized this way

  • build: all build scripts and resources required by continuous integration
  • src: all first-party source code and IDE projects of our team
  • test: all the code and data required for automated tests
  • thirdparty: all external dependencies
    • _original_: all downloaded open-source library archives
    • libthis: unzipped open-source lib with possible custom changes
    • libthat: …
    • ….

So far I’ve been building our first-party build products right in the src folder inside each IDE project (Visual Studio, Xcode), and building the third-party products in their own working-copy folders.

Problem

However, this approach has several drawbacks, e.g.:

  • In order to accommodate the variety of dependency locations, the library search paths of the first-party IDE projects become messy
  • it’s hard to track the output products through the file system

Intentions

So I’d love to centralize all the build products, including dependencies and our first-party products, so that

  • the build products don’t clutter the repository’s tidiness under SCM
  • all the targets and intermediates are easy to rebuild, archive, or purge
  • it’s easy to trace each product back to its sub source tree from the file system

Current Ideas

I’ve tried creating another folder, e.g., _prebuilt_, under the thirdparty folder, so that it looks like this:

  • thirdparty
    • _original_
    • _prebuilt_: holding build products from all thirdparty libs for all platforms
      • platform1
      • platform2
      • platform3
      • ….
    • libthis
    • libthat

One complaint I have about this scheme: mixing derivatives with working copies (lib…) and archives (original) forces me to make the folders of derivatives and archives stand out by naming them with ugly markup, in this case underscores (_).

Another idea is to use a single folder right at the root of the repo and have all the artifacts of dependencies and project products sit there in a jumble. But that sounds messy and would make it hard to track each artifact’s source.

Either way, some post-build scripts must be put in place to move artifacts out of their original working copies.

Question

In general, what would be the best practice to organize the build products?

I’d love to achieve at least the goals in the Intentions above.

bitcoin core – Get a proof in a Merkle tree

Suppose I have a set of transaction hashes for a block, (“ab12” -> “bn56” -> “lk87” -> “op92”); what would be the code to generate a simple Merkle root of the transactions above?

Given a transaction hash “lm46” at index 1, what would be the code to prove that it does not belong to the above Merkle tree? (Here the index starts from 0.)
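As an illustration only, here is a C++ sketch of the pairing structure. The `combine` function is a toy stand-in: Bitcoin actually uses double-SHA256 over the concatenated (byte-reversed) hashes. The sketch does include Bitcoin's rule of duplicating the last hash when a level has an odd count:

```cpp
#include <string>
#include <vector>

// Toy stand-in for Bitcoin's double-SHA256 of two concatenated hashes.
std::string combine(const std::string& a, const std::string& b) {
    return "h(" + a + "|" + b + ")";
}

// Repeatedly pair up the current level until one hash (the root) remains.
std::string merkle_root(std::vector<std::string> level) {
    if (level.empty()) return "";
    while (level.size() > 1) {
        if (level.size() % 2 != 0)        // odd count: duplicate last hash
            level.push_back(level.back());
        std::vector<std::string> next;
        for (std::size_t i = 0; i < level.size(); i += 2)
            next.push_back(combine(level[i], level[i + 1]));
        level = next;
    }
    return level[0];
}
```

For the four hashes above this yields h(h(ab12|bn56)|h(lk87|op92)). Note also that a Merkle proof can only demonstrate membership; to show that “lm46” is not at index 1, you would instead reveal the actual leaf at index 1 (“bn56”) together with its Merkle branch, proving the slot holds a different hash.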

Algorithm to determine if binary tree is a Binary Search Tree (BST)

Continuing with algorithms, I’ve implemented a binary search tree validator. I don’t like the two boolean variables within NodeFollowsBSTContract as it feels too complicated. I feel like it should be cleaned up but don’t yet see how.

Also, before each recursive step down to check child nodes, a new list is created. Is there a better way to implement this check that doesn’t repeatedly create new lists?

using System.Collections.Generic;
using System.Linq;

public class BinaryTreeNode
{
    public BinaryTreeNode Left { get; set; }
    public BinaryTreeNode Right { get; set; }

    public int? Value { get; }

    public BinaryTreeNode(int value)
    {
        Value = value;
    }
}

public class ValidateBST
{
    BinaryTreeNode _root;
    public ValidateBST(BinaryTreeNode root)
    {
        _root = root;
    }

    public bool IsBinarySearchTree()
    {
        if ((_root.Left == null || _root.Left.Value <= _root.Value)
            && (_root.Right == null || _root.Right.Value > _root.Value))
        {
            var listIncludingRootValue = new List<int>()
            {
                _root.Value.Value
            };

            var leftLegValid = NodeFollowsBSTContract(_root.Left, new List<int>(), new List<int>(listIncludingRootValue));

            var rightLegValid = NodeFollowsBSTContract(_root.Right, new List<int>(listIncludingRootValue), new List<int>());

            return leftLegValid && rightLegValid;
        }
        else
        {
            return false;
        }   
    }

    private bool NodeFollowsBSTContract(BinaryTreeNode node, List<int> parentSmallerValues, List<int> parentLargerValues)
    {
        if (node == null)
        {
            return true;
        }

        bool isLessThanAllParentLargerValues =
            parentLargerValues.All(value => node.Value.Value <= value);

        bool isGreaterThanAllParentSmallerValues =
            parentSmallerValues.All(value => node.Value.Value > value);

        if (!isLessThanAllParentLargerValues || !isGreaterThanAllParentSmallerValues)
        {
            return false;
        }

        if (node.Left != null)
        {
            var updatedLargerValues = GenerateUpdatedLists(node.Value.Value, parentLargerValues);
            var updatedSmallerValues = new List<int>(parentSmallerValues);

            if (!NodeFollowsBSTContract(node.Left, updatedSmallerValues, updatedLargerValues))
            {
                return false;
            }
        }

        if (node.Right != null)
        {
            var updatedSmallerValues = GenerateUpdatedLists(node.Value.Value, parentSmallerValues);

            if (!NodeFollowsBSTContract(node.Right, updatedSmallerValues, parentLargerValues))
            {
                return false;
            }
        }

        return true;
    }

    private List<int> GenerateUpdatedLists(int addValue, List<int> values)
    {
        var updatedValues = new List<int>(values)
        {
            addValue
        };

        return updatedValues;
    }
}
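As a point of comparison (not a rewrite of your C# class), the two lists can be replaced by a single pair of running bounds, which avoids allocating lists at every recursive step; a sketch in C++:

```cpp
#include <climits>

struct Node {
    int value;
    Node* left;
    Node* right;
};

// Each node must lie strictly inside (low, high); the bounds tighten as
// we descend, playing the role of the lists of ancestor values.
bool is_bst(const Node* n, long long low, long long high) {
    if (n == nullptr) return true;
    if (n->value <= low || n->value >= high) return false;
    return is_bst(n->left, low, n->value)
        && is_bst(n->right, n->value, high);
}

bool is_bst(const Node* root) {
    return is_bst(root, LLONG_MIN, LLONG_MAX);
}
```

Note this variant enforces strictly increasing keys, whereas your code admits duplicates on the left (node.Value <= larger values); loosening one comparison recovers that behavior.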

discrete mathematics – Prove that each tree $T=(V,E)$ with $|V| \geq 2$ contains at least $2$ leaves.

I’m not sure if this lemma has a name or not but here goes:

Every tree $T=(V,E)$ with $|V| \geq 2$ contains at least $2$ leaves.

The proof for this relies on the fact that if we start with only $2$ nodes $v_1,v_2 \in V$, then there is an $e \in E$ connecting $v_1$ and $v_2$, so $T$ has $2$ leaves. If we have more than $2$ nodes, then we can pick either $v_1$ or $v_2$ and go in either direction. If we go in the direction of $v_1$ and it is not an end node, then we go to the succeeding node. If that is an end node, then we have one leaf. However, since we didn’t move in the direction of $v_2$, we have a total of $2$ leaves. We can repeat this for any number of nodes on either side.

However, how can we argue that it’s possible for there to be more than $2$ leaves?
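For what it’s worth, the standard argument avoids this case analysis by looking at a longest path; a sketch:

```latex
\begin{proof}[Sketch]
Let $P = v_1, v_2, \dots, v_k$ be a path of maximum length in $T$.
Since $|V| \geq 2$ and $T$ is connected, $k \geq 2$.
We claim $v_1$ and $v_k$ are both leaves.
If $v_1$ had a neighbor $u \neq v_2$, then $u \notin P$
(otherwise $T$ would contain a cycle), so $u, v_1, \dots, v_k$
would be a longer path, a contradiction; hence $\deg(v_1) = 1$.
The same argument applies to $v_k$, giving two distinct leaves.
Moreover, the star $K_{1,3}$ shows that more than $2$ leaves are possible.
\end{proof}
```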

Can all derivation trees generated by a context-free grammar in CNF be recognized by a bottom-up tree automaton?

G is a context-free grammar in Chomsky normal form.

We define $L(G)$ to be the set of all derivation trees formed by $G$.

Is it possible to construct a non-deterministic bottom-up tree automaton that accepts $L(G)$ exactly?
If so, how would one construct such an automaton?

I think it’s true, so I’m trying to construct the automaton, but I’m having a hard time defining the transition function precisely.

I hope to get some help.
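One possible construction (a sketch; details such as the treatment of the start symbol or of unit cases may need adjusting to your exact definition of derivation trees): take the states to be the grammar symbols and let the transitions mirror the productions.

```latex
Let $G = (N, \Sigma, P, S)$ be in Chomsky normal form. Define the
bottom-up tree automaton $\mathcal{A} = (Q, F, \Delta)$ with states
$Q = \{q_X : X \in N \cup \Sigma\}$, accepting states $F = \{q_S\}$,
and transitions
\[
  a \to q_a \quad \text{for each terminal leaf } a \in \Sigma,
\]
\[
  A(q_B, q_C) \to q_A \quad \text{for each production } A \to BC \in P,
\]
\[
  A(q_a) \to q_A \quad \text{for each production } A \to a \in P.
\]
Then $\mathcal{A}$ accepts a labeled tree iff every internal node is
labeled consistently with some production and the root is labeled $S$,
i.e.\ iff the tree is a derivation tree of $G$.
```

Note that this automaton is in fact deterministic, so non-determinism is not even needed.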

complexity theory – Best case “skew height” of an arbitrary tree

Given an arbitrary binary tree on $n$ nodes, choose an assignment $A$ from each parent to one of its children (the “favored child”, as it were). We define the skew height of the tree by $H_A(\mathsf{nil})=0$ and $H_A(\mathsf{node}\;a\;b)=\max(H_A(a), H_A(b)+1)$ if $A(\mathsf{node}\;a\;b)=a$ is the favored child, and symmetrically $\max(H_A(a)+1, H_A(b))$ if $b$ is favored.

The question is: for a fixed tree $T$, what is the minimum skew height over all assignments? I would like to get an asymptotic bound on $f(n)=\max_{|T|=n}\min_A H_A(T)$.

Other variations on this problem I am interested in are when the trees are not binary (but there is still one favored child and all others add one to the height), and when there is sharing (i.e. it is a dag), which doesn’t affect the height computation but allows for much wider “trees” while staying under the $n$ node bound.

The obvious bounds are $f(n)=\Omega(\log n)$ and $f(n)=O(n)$. My guess is that $f(n)=\Theta(\log n)$ for binary trees, and $f(n)=\Theta(\sqrt n)$ for dags (with some kind of grid graph as a counterexample).
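Since the favored-child choices at different nodes are independent, $\min_A H_A(T)$ for a concrete tree can be computed bottom-up by taking the cheaper of the two choices at each node; a sketch:

```cpp
#include <algorithm>

struct Node {
    Node* left;
    Node* right;
};

// Minimum skew height over all favored-child assignments. The choices
// in disjoint subtrees don't interact, so a local minimum at each node
// yields the global minimum.
int min_skew_height(const Node* n) {
    if (n == nullptr) return 0;            // H(nil) = 0
    int a = min_skew_height(n->left);
    int b = min_skew_height(n->right);
    return std::min(std::max(a, b + 1),    // favor the left child
                    std::max(a + 1, b));   // favor the right child
}
```

For example, a path always has minimum skew height 1 (keep favoring the lone child), while a perfect binary tree of height $h$ has minimum skew height $h$.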

algorithms – Augmenting AVL tree to calculate sum of subtree

Suggest a way to augment an AVL tree to support an $O(\log n)$ implementation of the function
calculateSum(key), which receives the key of a node and returns the sum of its subtree.

I implemented it this way:

sumSubtree(node):
    if node != null:
        return sumSubtree(node.left) + sumSubtree(node.right) + node.key
    return 0
    
calculateSum(key):
    node = Search(key) // assuming I have a search function
    return sumSubtree(node)

which solves it in $O(\log n)$.

But I read that it is possible to maintain the sum during insertion and deletion, and to augment an AVL tree this way.

Which solution would be better? Mine, or the other method? Does it matter?
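For reference, the augmented approach stores in every node the sum of its subtree and updates it along the search path during insertion. Here is a sketch of that bookkeeping on a plain (unbalanced) BST, leaving out the AVL rotations; in a real AVL tree each rotation must also recompute the sums of the two rotated nodes:

```cpp
struct Node {
    int key;
    long long sum;          // key plus the sums of both subtrees
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k) : key(k), sum(k) {}
};

// Insert while maintaining subtree sums along the search path.
// Assumes distinct keys; a sketch, so deallocation is ignored.
Node* insert(Node* n, int key) {
    if (n == nullptr) return new Node(key);
    n->sum += key;                      // the new key lands in this subtree
    if (key < n->key) n->left = insert(n->left, key);
    else              n->right = insert(n->right, key);
    return n;
}

// With the augmentation, calculateSum is a search plus one field read.
long long calculate_sum(Node* n, int key) {
    if (n == nullptr) return 0;
    if (key == n->key) return n->sum;
    return key < n->key ? calculate_sum(n->left, key)
                        : calculate_sum(n->right, key);
}
```

The trade-off: the augmented version pays $O(\log n)$ extra sum updates per insert/delete but answers calculateSum in $O(\log n)$, whereas the on-demand sumSubtree visits every node of the queried subtree, which is linear in the subtree’s size (up to $\Theta(n)$ at the root).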

Is Merkle tree pruning described in the whitepaper feasible/useful? If not, would there be any alternative?

When I was reading bitcoin-paper-errata-and-details.md, written by David A. Harding, I realized that there’s probably a common misunderstanding or over-simplification about Merkle tree pruning. What Nick ODell said might be a live example:

  • A leaf (transaction) can be pruned when all of its outputs have been spent.

This once seemed to be true for me, until I read what David had written:

there is currently no way in Bitcoin to prove that a transaction has not been spent

I’m not sure whether I have grasped it, so first I made a diagram to illustrate (part of) my understanding of this problem:

[diagram: inconsistent pruning]

Still, I don’t think this problem alone can kill the whole idea of Merkle tree pruning; I think it just means that the reclaimable disk capacity is much lower than expected. In other words, if I’m not mistaken, Nick ODell’s claim could be “corrected” like this:

  • A leaf (transaction) can be pruned when all of its outputs have been spent, and all of its previous transactions have been pruned.

However, I now think that, even if the “corrected” claim is taken into consideration, the idea of Merkle tree pruning still doesn’t seem to be feasible/useful:

  1. A new full node joining the network needs to download & verify everything. Even if the problem mentioned above is avoided, a malicious node can still deceive the new full node by hiding/picking some Merkle branches. In other words, a malicious node can lie about the actual ownership of coins (the spent/unspent state) without breaking the Merkle tree structure at all.

  2. If a full node needs to enable pruning to reduce its disk space requirement, directly reading/modifying the blockchain files seems much less efficient than the current implementation, in which the UTXO set is completely separated from the blockchain storage, so that a full node (whether pruning or not) only needs to query and update the UTXO set database during the download & validation process. The blockchain itself doesn’t need to be touched again for validation purposes at all, which is why old blocks can simply be deleted when “pruning” (not Merkle tree pruning) is enabled.

However, I’m still not sure about this conclusion. Is this related to the idea of fraud proofs, in the sense that as long as there’s still at least one honest full node, the new node would be able to spot which piece of data is correct? What if the UTXO set were also committed to the blockchain? What if some more commitments, like the block height of the previous transaction, were also added to the blockchain?

Furthermore, I’ve heard that the Mimblewimble protocol enables secure blockchain pruning. I’m also curious how Mimblewimble achieves this, and whether a similar goal could eventually be achieved in Bitcoin.

recurrence relation – Clarifying statements involving asymptotic notation in the solution of $T(n) = 3T(\lfloor n/4 \rfloor) + \Theta(n^2)$ using a recursion tree and substitution

Below is a problem worked out in Introduction to Algorithms by Cormen et al.

(I am not having a problem with the proof; I only want to clarify the meaning conveyed by a few statements in the text while solving the recurrence, and these statements are given as an ordered list at the end. Simply because I want to master the text.)

$$T(n) = 3T(\lfloor n/4 \rfloor) + \Theta(n^2)$$

Now the authors attempt to first find a good guess for the recurrence using the recursion tree method, and for that they allow sloppiness and assume $T(n)=3T(n/4) + cn^2$.

Recursion Tree

Though the above recursion tree is not quite required for my question but I felt like including it to make the background a bit clearer.

The guessed candidate is $T(n)=O(n^2)$. Then the authors prove the same using the substitution method.

In fact, if $O(n^2)$ is indeed an upper bound for the recurrence (as we shall verify in a moment), then it must be a tight bound. Why? The first recursive call contributes a cost of $\Theta(n^2)$, and so $\Omega(n^2)$ must be a lower bound for the recurrence. Now we can use the substitution method to verify that our guess was correct, that is, $T(n)=O(n^2)$ is an upper bound for the recurrence $T(n) = 3T(\lfloor n/4 \rfloor) + cn^2$. We want to show that $T(n)\leq d n^2$ for some constant $d > 0$.
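For reference, the substitution step that the text then carries out, assuming the inductive hypothesis $T(\lfloor n/4 \rfloor) \leq d \lfloor n/4 \rfloor^2$, is:

```latex
\begin{aligned}
T(n) &\leq 3T(\lfloor n/4 \rfloor) + cn^2 \\
     &\leq 3d\lfloor n/4 \rfloor^2 + cn^2 \\
     &\leq 3d(n/4)^2 + cn^2 \\
     &= \tfrac{3}{16}\,dn^2 + cn^2 \\
     &\leq dn^2,
\end{aligned}
```

where the last step holds for any constant $d \geq (16/13)c$.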

Now there are a few things which I want to get clarified…

(1) “if $O(n^2)$ is indeed an upper bound for the recurrence.” Here the sentence means (probably): $\exists$ a function $f(n) \in O(n^2)$ such that $T(n)\in O(f(n))$

(2) “$\Omega(n^2)$ must be a lower bound for the recurrence.” Here the sentence means (probably): $\exists$ a function $f(n) \in \Omega(n^2)$ such that $T(n)\in \Omega(f(n))$

(3) “$T(n)=O(n^2)$ is an upper bound for the recurrence $T(n) = 3T(\lfloor n/4 \rfloor) + cn^2$.” This sentence can be interpreted as follows: assume that $T'(n) = 3T'(\lfloor n/4 \rfloor) + cn^2$; then $\exists$ a function $T(n) \in O(n^2)$ such that $T'(n)\in O(T(n))$

(4) “$T(n)\leq d n^2$ for some constant $d > 0$.” We are using induction to verify the definition of Big Oh…

I feel that the authors could simply have written that $T(n)$ is upper bounded by $n^2$ and lower bounded by $n^2$, i.e., $T(n) = O(n^2)$ and $T(n)=\Omega(n^2)$. Did the authors use the style of statements pointed out in $(1),(2),(3)$ just for a clearer explanation, or is there some extra meaning conveyed which I am missing?