A few things to get out of the way first. I’m not asking what properties the function must have for a global optimum to exist; we assume that the objective function has a (possibly non-unique) global optimum which could in principle be found by an exhaustive search of the candidate space. I’m also using “theoretically useful” in a slightly misleading way, because I couldn’t work out how else to phrase this question. A “theoretically useful cost function”, as I’m defining it, is:

A function to which some theoretical optimisation algorithm can be applied such that the algorithm has a non-negligible chance of finding the global optimum in less time than exhaustive search.

A few simplified, 1-dimensional examples of where this thought process came from:

Here’s a function which, while not being convex or differentiable (as it’s discrete), is easily optimisable (in terms of finding the global maximum) with an algorithm such as Simulated Annealing.

Here is a function which clearly cannot be a useful cost function, as being able to optimise it efficiently would imply that the arbitrary search problem can be classically solved faster than exhaustive search.

Here is a function which I do not believe can be a useful cost function, as moving between points gives no meaningful information about the direction in which one must move to find the global maximum.
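The three cases above can be mimicked with a small Python sketch. The tables in it are hypothetical stand-ins for the plotted functions (a single-peaked function, a boolean "needle" function, and uniformly random values), and plain greedy hill climbing is swapped in for Simulated Annealing to keep the example deterministic. The function `hill_climb` and all the names below are assumptions made for illustration; the sketch only counts how many distinct points get evaluated before the search stops.

```python
import random


def hill_climb(f, n, start):
    """Greedy ascent over {0, ..., n-1} (n >= 2).

    Returns (final index, number of distinct points evaluated).
    """
    cache = {}

    def value(i):
        # Evaluate f at most once per point so the count is honest.
        if i not in cache:
            cache[i] = f(i)
        return cache[i]

    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < n]
        best = max(neighbours, key=value)
        if value(best) <= value(current):
            # No neighbour is strictly better: stop here.
            return current, len(cache)
        current = best


N_POINTS = 1000  # hypothetical domain size


def peaked(i):
    """Single peak at 700: neighbours always point towards it."""
    return -abs(i - 700)


def needle(i):
    """Boolean function: 1 at a single point, 0 everywhere else."""
    return 1 if i == 700 else 0


rng = random.Random(0)
table = [rng.random() for _ in range(N_POINTS)]


def noise(i):
    """Independent random value at every point."""
    return table[i]


print(hill_climb(peaked, N_POINTS, 0))  # (700, 702)
print(hill_climb(needle, N_POINTS, 0))  # (0, 2)
print(hill_climb(noise, N_POINTS, 0))   # some local maximum, seed-dependent
```

On the peaked table the climb reaches the maximum after evaluating 702 of the 1000 points, even when started from the far end. On the needle table it stops at once, because every neighbour looks the same. On the random table it stops at whatever local maximum happens to be nearby.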

The crux of my thinking so far is along the lines of “applying the cost function to points in the neighbourhood of a point must yield some information about the location of the global optimum”. I attempted to formalise this (in a perhaps convoluted manner) as:

Consider the set $D$ representing the search space of the problem (and thus the domain of the function), and the undirected graph $G$ in which each element of $D$ is assigned a node and each node has edges connecting it to its neighbours in $D$. We then remove elements from $D$ until the objective function has no non-global local optima over this domain and no plateaus exist (i.e. the value of the cost function at each point in the domain is different from its value at each of that point's neighbours). Every time we remove an element $e$ from $D$, we remove the corresponding node from $G$ and add edges directly connecting the neighbours of $e$ to one another, so that they become each other’s new neighbours. The number of elements which remain in the domain after this process is applied is designated $N$. If $N$ is a non-negligible proportion of $\#(D)$ (i.e. significantly greater than the proportion of $\#(\{\text{possible global optima}\})$ to $\#(D)$), then the function is a useful objective function.
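For the 1-dimensional case, this procedure can be sketched in a few lines of Python. It is only one possible reading of the definition. It assumes that we are maximising, that the domain is a path graph (so removing a point simply joins its two neighbours), that exactly one global maximum (the first) is kept, and that as few points as possible are removed. Under those assumptions the surviving points form the longest subsequence that rises strictly to the global maximum and then falls strictly, so the value computed is the largest $N$ the procedure could give. The function names and test tables are hypothetical.

```python
import random


def rise_lengths(values):
    """rise[i] = length of the longest strictly increasing
    subsequence of `values` that ends at index i (O(n^2) DP)."""
    rise = [1] * len(values)
    for i in range(len(values)):
        for j in range(i):
            if values[j] < values[i]:
                rise[i] = max(rise[i], rise[j] + 1)
    return rise


def surviving_points(values):
    """Largest N for a 1-D table, keeping the first global maximum."""
    peak = values.index(max(values))
    # Longest strict climb that ends at the peak, using points to its left.
    up = rise_lengths(values[: peak + 1])[-1]
    # Longest strict descent from the peak, found by reversing the right part.
    down = rise_lengths(values[peak:][::-1])[-1]
    return up + down - 1  # the peak is counted in both halves


n = 200  # hypothetical domain size
unimodal = [-abs(i - 120) for i in range(n)]
needle = [1 if i == 120 else 0 for i in range(n)]
rng = random.Random(0)
noise = [rng.random() for _ in range(n)]

print(surviving_points(unimodal), "of", n)  # 200 of 200
print(surviving_points(needle), "of", n)    # 3 of 200
print(surviving_points(noise), "of", n)     # seed-dependent
```

The single-peaked table keeps every point, and the needle table keeps only the peak plus one point on either side. No figure is quoted for the random table because it depends on the seed.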

Whilst this works well for the function which is definitely useful and for the boolean function which is definitely not, the process seems to give the wrong answer for the random function, as the number of elements that remain once no local optima are left IS a non-negligible proportion of the total domain.

Is my definition on the right track? Is this a well-known question that I just can’t figure out how to find the answer to? Does there exist some optimisation algorithm that would theoretically be able to find the optimum of a completely random function faster than exhaustive search, or am I correct in asserting that no such algorithm exists?

In conclusion, what is different about the first function that makes it a good candidate for optimisation, compared to the other functions, which are not?