performance – Python Rabin-Karp Code Optimization

Here’s the Rabin-Karp implementation that I’ve written in Python:

from functools import lru_cache
from itertools import islice
from typing import List, Optional


BASE = 139
MOD = 2 ** 31 - 1


@lru_cache(maxsize=None)
def char_to_int(c: str) -> int:
    return ord(c) + 1

def calculate_hash(s: str, length: Optional[int] = None) -> int:
    ret_hash = 0
    if length is not None:
        s = islice(s, 0, length)
    power = len(s) - 1 if length is None else length - 1
    for i, c in enumerate(s):
        ret_hash += char_to_int(c) * BASE ** power
        power -= 1
    return ret_hash % MOD

def roll_hash(prev_hash: int, prev_char: str, next_char: str, length: int) -> int:
    new_hash = ((prev_hash - char_to_int(prev_char) * BASE ** (length - 1)) * BASE) % MOD
    new_hash += char_to_int(next_char)
    return new_hash % MOD

def rabin_karp(text: str, pattern: str) -> List[int]:
    """
    Returns a list of indices where each entry corresponds to the starting index of a
    substring of `text` that exactly matches `pattern`
    """
    p_hash = calculate_hash(pattern)
    n = len(pattern)
    curr_hash = calculate_hash(text, n)
    indices = []
    if p_hash == curr_hash:
        indices.append(0)
    for i in range(1, len(text) - n + 1):
        curr_hash = roll_hash(
            curr_hash, text[i - 1], text[i + n - 1], n
        )
        if p_hash == curr_hash:
            indices.append(i)
    return indices

if __name__ == "__main__":
    with open("lorem_ipsum.txt", "r") as f:
        text = f.read()
    pattern = "lorem"
    indices = rabin_karp(text, pattern)
    print(f"indices: {indices}")

I’m trying to optimize the code as much as possible, so I’ve done some dynamic analysis to better understand the bottlenecks. I used cProfile to understand the function calls and made changes accordingly to arrive at the code above. Here is the final output from cProfile:

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    60301    0.091    0.000    0.091    0.000 rabin_karp.py:25(roll_hash)
        1    0.035    0.035    0.126    0.126 rabin_karp.py:30(rabin_karp)
       42    0.000    0.000    0.000    0.000 rabin_karp.py:11(char_to_int)
        2    0.000    0.000    0.000    0.000 rabin_karp.py:15(calculate_hash)
       57    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
       42    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
        3    0.000    0.000    0.000    0.000 {built-in method builtins.len}
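
For reference, a listing like the one above can be produced with something along these lines (the exact invocation is my assumption, not part of the original post):

import cProfile

# Profile one full search over the same input and sort by internal time,
# which yields a report in the same format as the one shown above.
cProfile.run('rabin_karp(text, "lorem")', sort="tottime")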

Are there any other ways I could further optimize the code? Another interesting observation: adding @lru_cache actually increases execution time as measured by timeit, even though the cache reduces the number of calls to char_to_int() (from 120612 to 42).
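
For context, a minimal sketch of how such a timeit comparison might be run (this harness is my assumption about the measurement, not the original one):

import timeit

# Time 10 repeated searches; running this once with @lru_cache applied to
# char_to_int and once without gives the comparison described above.
# Assumes `rabin_karp` and `text` are defined as in the script shown earlier.
elapsed = timeit.timeit(
    "rabin_karp(text, 'lorem')",
    globals=globals(),
    number=10,
)
print(f"{elapsed:.3f} s for 10 runs")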

notation – The meaning of arrow in an optimization problem

I would like to know the interpretation of the following notations in the context of an optimisation problem.

I want to minimize $f_0(\vec{x})$, where the vector $\vec{x}=(x_1,x_2,\cdots,x_n)$ is the optimization variable of the problem, and $f_0: \mathbf{R}^n \rightarrow \mathbf{R}$ is the objective function.

How can I understand this part $f_0: \mathbf{R}^n \rightarrow \mathbf{R}$?
Does this mean the objective function takes an $n$-dimensional vector of real numbers and gives a single one-dimensional real number as its output?
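
A small concrete example (my own, not from the quoted text): the function

$$f_0(\vec{x}) = x_1^2 + x_2^2 + \cdots + x_n^2$$

takes the $n$ real components of $\vec{x}\in\mathbf{R}^n$ and returns the single real number $\|\vec{x}\|^2\in\mathbf{R}$, so it is one concrete instance of a map $f_0:\mathbf{R}^n\rightarrow\mathbf{R}$.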

performance – Monte Carlo Tree Search Optimization and Loss Prevention

I’m working on an implementation of Monte Carlo Tree Search in Swift.

It’s not bad, but it could be better! I’m principally interested in making my algorithm:

  1. faster (more iterations/second)
  2. prioritize moves that prevent instant losses (you’ll see…)

Here is the main driver:

import Foundation

final class MonteCarloTreeSearch {
    var player: Player
    var timeBudget: Double
    var maxDepth: Int
    var explorationConstant: Double
    var root: Node?
    var iterations: Int

    init(for player: Player, timeBudget: Double = 5, maxDepth: Int = 5, explorationConstant: Double = sqrt(2)) {
        self.player = player
        self.timeBudget = timeBudget
        self.maxDepth = maxDepth
        self.explorationConstant = explorationConstant
        self.iterations = 0
    }
    
    func update(with game: Game) {
        if let newRoot = findNode(for: game) {
            newRoot.parent = nil
            newRoot.move = nil
            root = newRoot
        } else {
            root = Node(game: game)
        }
    }

    func findMove(for game: Game? = nil) -> Move? {
        iterations = 0
        let start = CFAbsoluteTimeGetCurrent()
        if let game = game {
            update(with: game)
        }
        while CFAbsoluteTimeGetCurrent() - start < timeBudget {
            refine()
            iterations += 1
        }
        print("Iterations: (iterations)")
        return bestMove
    }
    
    private func refine() {
        let leafNode = root!.select(explorationConstant)
        let value = rollout(leafNode)
        leafNode.backpropogate(value)
    }
    
    private func rollout(_ node: Node) -> Double {
        var depth = 0
        var game = node.game
        while !game.isFinished {
            if depth >= maxDepth { break }
            guard let move = game.randomMove() else { break }
            game = game.update(move)
            depth += 1
        }
        let value = game.evaluate(for: player).value
        return value
    }
    
    private var bestMove: Move? {
        root?.selectChildWithMaxUcb(0)?.move
    }
    
    private func findNode(for game: Game) -> Node? {
        guard let root = root else { return nil }
        var queue = [root]
        while !queue.isEmpty {
            let head = queue.removeFirst()
            if head.game == game {
                return head
            }
            for child in head.children {
                queue.append(child)
            }
        }
        return nil
    }
}

I built this driver with a maxDepth argument because playouts/rollouts in my real game are fairly long and I have access to a decent static evaluation function. Also, the BFS findNode method is there so that I can reuse parts of the tree.

Here’s what a node in the driver looks like:

final class Node {
    weak var parent: Node?
    var move: Move?
    var game: Game
    var untriedMoves: [Move]
    var children: [Node]
    var cumulativeValueFor: Double
    var cumulativeValueAgainst: Double
    var visits: Double

    init(parent: Node? = nil, move: Move? = nil, game: Game) {
        self.parent = parent
        self.move = move
        self.game = game
        self.children = []
        self.untriedMoves = game.availableMoves()
        self.cumulativeValueFor = 0
        self.cumulativeValueAgainst = 0
        self.visits = 0
    }
    
    var isFullyExpanded: Bool {
        untriedMoves.isEmpty
    }
    
    lazy var isTerminal: Bool = {
        game.isFinished
    }()
    
    func select(_ c: Double) -> Node {
        var leafNode = self
        while !leafNode.isTerminal {
            if !leafNode.isFullyExpanded {
                return leafNode.expand()
            } else {
                leafNode = leafNode.selectChildWithMaxUcb(c)!
            }
        }
        return leafNode
    }
    
    func expand() -> Node {
        let move = untriedMoves.popLast()!
        let nextGame = game.update(move)
        let childNode = Node(parent: self, move: move, game: nextGame)
        children.append(childNode)
        return childNode
    }
    
    func backpropogate(_ value: Double) {
        visits += 1
        cumulativeValueFor += value
        if let parent = parent {
            parent.backpropogate(value)
        }
    }
    
    func selectChildWithMaxUcb(_ c: Double) -> Node? {
        children.max { $0.ucb(c) < $1.ucb(c) }
    }

    func ucb(_ c: Double) -> Double {
        q + c * u
    }
    
    private var q: Double {
        let value = cumulativeValueFor - cumulativeValueAgainst
        return value / visits
    }
    
    private var u: Double {
        sqrt(log(parent!.visits) / visits)
    }
}

extension Node: CustomStringConvertible {
    var description: String {
        guard let move = move else { return "" }
        return "(move) ((cumulativeValueFor)/(visits))"
    }
}

I don’t think there’s anything extraordinary about my node object. I am hoping, though, that I can do something to or with q so that I might prevent an “instant” loss in my test game.


I’ve been testing this implementation of MCTS on a 1-D variant of “Connect 4”.

Here’s the game and all of its primitives:

enum Player: Int {
    case one = 1
    case two = 2
    
    var opposite: Self {
        switch self {
        case .one: return .two
        case .two: return .one
        }
    }
}

extension Player: CustomStringConvertible {
    var description: String {
        "(rawValue)"
    }
}

typealias Move = Int

enum Evaluation {
    case win
    case loss
    case draw
    case ongoing(Double)
    
    var value: Double {
        switch self {
        case .win: return 1
        case .loss: return 0
        case .draw: return 0.5
        case .ongoing(let v): return v
        }
    }
}

struct Game {
    var array: Array<Int>
    var currentPlayer: Player
    
    init(length: Int = 10, currentPlayer: Player = .one) {
        self.array = Array.init(repeating: 0, count: length)
        self.currentPlayer = currentPlayer
    }
    
    var isFinished: Bool {
        switch evaluate() {
        case .ongoing: return false
        default: return true
        }
    }

    func availableMoves() -> [Move] {
        array
            .enumerated()
            .compactMap { $0.element == 0 ? Move($0.offset) : nil}
    }
    
    func update(_ move: Move) -> Self {
        var copy = self
        copy.array[move] = currentPlayer.rawValue
        copy.currentPlayer = currentPlayer.opposite
        return copy
    }
    
    func evaluate(for player: Player) -> Evaluation {
        let player3 = three(for: player)
        let oppo3 = three(for: player.opposite)
        let remaining0 = array.contains(0)
        switch (player3, oppo3, remaining0) {
        case (true, true, _): return .draw
        case (true, false, _): return .win
        case (false, true, _): return .loss
        case (false, false, false): return .draw
        default: return .ongoing(0.5)
        }
    }
    
    private func three(for player: Player) -> Bool {
        var count = 0
        for slot in array {
            if slot == player.rawValue {
                count += 1
            } else {
                count = 0
            }
            if count == 3 {
                return true
            }
        }
        return false
    }
}

extension Game {
    func evaluate() -> Evaluation {
        evaluate(for: currentPlayer)
    }
    
    func randomMove() -> Move? {
        availableMoves().randomElement()
    }
}

extension Game: CustomStringConvertible {
    var description: String {
        return array.reduce(into: "") { result, i in
            result += String(i)
        }
    }
}

extension Game: Equatable {}

While there are definitely efficiencies to be gained in optimizing the evaluate/three(for:) scoring methods, I’m more concerned about improving the performance of the driver and the node as this “1d-connect-3” game isn’t my real game. That said, if there’s a huge mistake here and a simple fix I’ll take it!

Another note: I am actually using ongoing(Double) in my real game (I’ve got a static evaluation function that can reliably score a player as 1-99% likely to win).


A bit of Playground code:


var mcts = MonteCarloTreeSearch(for: .two, timeBudget: 5, maxDepth: 3)
var game = Game(length: 10)
// 0000000000
game = game.update(0) // player 1
// 1000000000
game = game.update(8) // player 2
// 1000000020
game = game.update(1) // player 1
// 1100000020
let move1 = mcts.findMove(for: game)!
// usually 7 or 9... and not 2
print(mcts.root!.children)
game = game.update(move1) // player 2
mcts.update(with: game)
game = game.update(4) // player 1
mcts.update(with: game)
let move2 = mcts.findMove()!

Unfortunately, move1 in this sample playthrough doesn’t try to prevent the instant win condition for player 1 on the next turn. (I know that orthodox Monte Carlo Tree Search is in the business of maximizing wins rather than minimizing losses, but not picking 2 here is unfortunate.)

So yeah, any help in making all this faster (perhaps through parallelization), and fixing the “instant-loss” business would be swell!
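
For what it’s worth, one common mitigation for the “instant-loss” behaviour is to bias rollouts toward decisive moves: play an immediately winning move, or block the opponent’s immediately winning move, before falling back to a random one. The sketch below is an illustrative assumption built on the Game/Move/Player types above, not something already in the code:

// Decisive-move rollout policy: prefer an immediate win, then a block of
// the opponent's immediate win, otherwise pick a random move.
extension Game {
    func decisiveOrRandomMove() -> Move? {
        let moves = availableMoves()
        // 1. Take an immediate win for the side to move.
        for move in moves {
            if case .win = update(move).evaluate(for: currentPlayer) {
                return move
            }
        }
        // 2. Block a square that would let the opponent win on their turn.
        for move in moves {
            var mirrored = self
            mirrored.currentPlayer = currentPlayer.opposite
            if case .win = mirrored.update(move).evaluate(for: currentPlayer.opposite) {
                return move
            }
        }
        return moves.randomElement()
    }
}

Swapping game.randomMove() for game.decisiveOrRandomMove() inside rollout(_:) trades some iterations per second for playouts that no longer walk past forced wins and losses, which tends to make the root statistics punish moves that ignore an immediate threat.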

[FREE] 30+ Search Engine Optimization tools for FREE!

30+ Exclusive SEO tools for BHW users:

Download:
full2design.com/seotools/tools.php

optimization – Optimal clustering with optimal number of clusters as well as max and min cluster size constraints

I need to cluster $N$ data points.

I don’t know the number of clusters to be formed. It needs to be found optimally.

Also, there are maximum and minimum cluster size constraints, where $C_{max}$ is the maximum size a cluster can have and $C_{min}$ is the minimum size a cluster must have.

The coordinates of the $N$ data points are stored in a matrix $D\in\mathbb{R}^{N\times 2}$: $D$ has $N$ rows, and each row contains the $x$- and $y$-coordinates of one point. However, $D$ can be expressed in any convenient form, for example $D\in\mathbb{R}^{2\times N}$ or $D\in\mathbb{C}^{N\times 1}$ (where each coordinate pair is expressed as a complex number $x+iy$).

How can I formulate this as a mathematical optimization problem and solve it efficiently?

The data points can be uniformly distributed over a 2D plane of any given size.
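
One possible starting point (my own sketch, under the assumption that an upper bound $K$ on the number of clusters is acceptable, and not a claim that this is the best formulation) is a mixed-integer assignment program with binary variables $z_{ik}$ (point $i$ assigned to cluster $k$) and $y_k$ (cluster $k$ is used):

$$\begin{align}\min_{z,\,y,\,\mu}\quad & \sum_{i=1}^{N}\sum_{k=1}^{K} z_{ik}\,\lVert d_i-\mu_k\rVert^2 \\ \text{s.t.}\quad & \sum_{k=1}^{K} z_{ik}=1, \quad i=1,\dots,N, \\ & C_{min}\,y_k \le \sum_{i=1}^{N} z_{ik} \le C_{max}\,y_k, \quad k=1,\dots,K, \\ & z_{ik}\in\{0,1\},\quad y_k\in\{0,1\},\end{align}$$

where $d_i$ is the $i$-th data point and $\mu_k$ is the centroid of cluster $k$. The number of clusters $\sum_k y_k$ is then chosen by the optimization itself rather than fixed in advance (a penalty term on $\sum_k y_k$ can be added if fewer clusters are preferred).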

Polynomial time optimization problems belong to which complexity class?

Take a look at the FP complexity class.

Technically speaking, calling a search problem a $P$ problem is wrong. However, people usually don’t make a big deal of the difference between the two. It’s usually clear from the context whether the problem is a decision problem or a search problem, hence it’s clear whether it should be in $P$ or in $FP$.

So, writing that “finding the shortest $(s,t)$ path in the graph $G$ is a $P$ problem” would actually mean “finding the shortest $(s,t)$ path in the graph $G$ is an $FP$ problem” (since the context is clear).
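
To make the distinction concrete (my own illustrative example): the decision problem “given $G$, $s$, $t$ and a bound $k$, is there an $(s,t)$-path of length at most $k$?” asks for a yes/no answer and sits in $P$, while the search problem “given $G$, $s$, $t$, output a shortest $(s,t)$-path” asks for an object as output and sits in $FP$.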

mathematical optimization – Bug in NMaximize in 12.2?

It seems (clearly!) to be a bug in the preprocessing for the new convex optimizer. Use one of the other methods (e.g. "DifferentialEvolution"):

Trace[
 NMaximize[E^(-x^2) - 1, x],
 _Optimization`MinimizationProblem,
 TraceForward -> True,
 TraceInternal -> True
 ]

Workaround:

NMaximize[E^(-x^2) - 1, x, Method -> "DifferentialEvolution"]

(*  {0., {x -> -5.45643*10^-9}}  *)

Alternatively, you can turn off the convex minimizer:

Block[{Optimization`UseConvexMinimize = False},
 NMinimize[-(E^(-x^2) - 1), x]
 ]

(*   {0., {x -> -5.45643*10^-9}}  *)

optimization – In a language interpreted line by line – is consolidating similar lines of code within a module into functions better in terms of efficiency?

Python does not interpret line by line. The default Python implementation (CPython) compiles the entire module to bytecode and then runs it. However, the CPython implementation does not place an emphasis on optimizations. The interpreter will do exactly what you tell it to do, which means that small changes to your code can have a big performance effect. In particular, function or method calls are relatively “slow” in CPython.
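
As a quick illustration (my own example, not from the original answer), the standard dis module shows the bytecode that CPython compiles a function to:

import dis

def greet(name):
    return "Hello, " + name

# Prints the bytecode instructions CPython actually executes for greet().
dis.dis(greet)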

But in absolute terms, it doesn’t matter for performance. You’re writing code for GUI automation. The interaction with the GUI will be far slower than calling a function or parsing some lines of code. This is a bit like being on a journey between two cities. You are proposing to take a shortcut that will save you a minute on this tour, when you’ll actually spend an hour waiting in an airport to board a flight.

So performance doesn’t really matter here. What does matter? Maintainability. Instead of copy and pasting the same code four times, it is usually better to clearly separate the things that stay the same from the things that differ between instances of this code. A function is the perfect tool to express that. Thus, while your alternative solution with a function might run 200 nanoseconds slower, it is the objectively better approach here.

In reality, writing maintainable code with good abstractions is good for performance. When your code is easy to understand, it is easier to see how performance can be improved, and to implement those improvements. For example, if you find that there’s a faster alternative to the checkExists() method, you’ll now only have to update it in one place. Of course, most code is not performance-critical, so such optimizations are unlikely to have a noticeable effect.
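
As a sketch of the kind of refactor being described (the FakeElement stand-in and the element names are hypothetical placeholders; only checkExists() mirrors the method mentioned above):

from dataclasses import dataclass

@dataclass
class FakeElement:
    """Minimal stand-in for a GUI element, only here to make the sketch runnable."""
    name: str

    def checkExists(self, timeout: float = 5.0) -> bool:
        return True  # pretend the element is always present

    def click(self) -> None:
        print(f"clicked {self.name}")

def click_if_present(element: FakeElement, timeout: float = 5.0) -> bool:
    """The shared steps written once, instead of copy-pasted four times."""
    if element.checkExists(timeout=timeout):
        element.click()
        return True
    return False

for name in ("OK", "Apply", "Close", "Help"):
    click_if_present(FakeElement(name))

If a faster alternative to checkExists() ever materializes, only click_if_present() needs to change.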

algebra precalculus – A proof method for optimizing an algebraic expression without using any inequality or calculus tools

Problem says:

Let $x^2+y^2=100$, where $x,y>0$. For which ratio of $x$ to $y$ is the value of $x^2y$ maximal?

I know these possible tools:

  • AM-GM inequality

  • Calculus tools


Here I want to avoid all of the tools I mentioned above.

Here I will try to explain my attempt in the simplest sentences (my English is not great, unfortunately). I will not prove any strong theorem, and I’m also not sure whether what I’m doing exactly matches the mathematics, rigorously speaking.


My solution:

First, it is not necessary to make these substitutions. I’m just doing this to work with smaller numbers.

Let $x=10m, ~y=10n$,

where $0<m<1,~ 0<n<1$, then we have

$$x^2+y^2=100 \iff m^2+n^2=1$$

$$x^2y=1000m^2n$$

This means

$$\max\left\{x^2y\right\}=10^3\max\left\{m^2n\right\}$$

$$m^2n=n(1-n^2)=n-n^3$$

Then suppose that

$$\max\left\{n-n^3 \mid 0<n<1\right\}=a, \qquad a>0.$$

This implies

$$n-n^3-a\le 0,\qquad \forall n\in(0,1),$$

$$n^3-n+a\ge 0,\qquad \forall n\in(0,1).$$

Then, we observe that for all $n\ge 1$ we have

$$n^3-n=n(n-1)(n+1)\ge 0,\qquad a>0,$$

which means

$$\forall n\ge 1:\quad n^3-n+a\ge 0.$$

Finally, combining the two cases, we deduce that

$$\forall n>0:\quad n^3-n+a\ge 0.$$


Using the last conclusion, I will apply the following assumption/lemma (which I am partly making up):

Let $a>0$. If

$$\begin{cases} n^3-n+a\ge 0 & \forall n>0 \\ n^3-n+a<0 & \text{for some } n\le 0 \end{cases}$$

then there exist $u,v>0$ such that

$$n^3-n+a=(n-u)^2(n+v)\ge 0.$$

Equality occurs if and only if

$$n=u>0.$$


Based on these, we have

$$n^3-n+a=(n-u)^2(n+v)\ge 0.$$

Expanding the right-hand side,

$$\begin{align}n^3-n+a &= n^3 - n^2(2u-v) + n(u^2-2uv) + u^2v.\end{align}$$

Matching coefficients (and requiring $u,v>0$),

$$\begin{align}\begin{cases} 2u-v=0 \\ u^2-2uv=-1 \\ u^2v=a \\ u,v>0\end{cases} &\implies \begin{cases} v=2u \\ u^2-4u^2=-1 \\ 2u^3=a \\ u,v>0\end{cases}\\ &\implies \begin{cases} u=\frac{\sqrt 3}{3} \\ v=\frac{2\sqrt 3}{3} \\ a=2\left(\frac{\sqrt 3}{3}\right)^3=\frac{2\sqrt 3}{9}\end{cases}\end{align}$$

$$n^3-n+\frac{2\sqrt 3}{9} =\left(n-\frac{\sqrt 3}{3}\right)^2\left(n+\frac{2\sqrt 3}{3}\right)\ge 0.$$

As a result, we deduce that

$$n-n^3-\frac{2\sqrt 3}{9} =-\left(n-\frac{\sqrt 3}{3}\right)^2\left(n+\frac{2\sqrt 3}{3}\right)\le 0, \qquad \forall n\in (0,1),$$

$$\max\left\{n-n^3 \mid 0<n<1\right\}=\frac{2\sqrt 3}{9}, \quad \text{at }~ n=\frac{\sqrt 3}{3}.$$

Finally, we obtain

$$m=\sqrt{1-n^2}=\sqrt{1-\frac 13}=\frac{\sqrt 6}{3},$$

$$\frac xy=\frac mn=\frac{\sqrt 6/3}{\sqrt 3/3}=\sqrt 2.$$


Question:

  • How much of what I’ve done here is correct?

Why Promote Abhi for Social Media Optimization Services?

We have built a successful presence among the large number of social media optimization agencies in India. We keep everything transparent between us and our customers in order to build trust. We have hired the best talent in the industry to take your company to new heights. Our priority is to deliver high-quality results on time, and we strictly adhere to our schedules and deadlines. With our social media optimization services, you can see in real time how your business is getting results. Affordable pricing that gives you exactly what your budget is looking for! You can choose our low-cost SMO solutions for the best value for money. Sitting in a chair and running a business is not enough; if you want to succeed, call us now!
