User:Enkidu19/sandbox

= Oracle complexity (Optimization) =

In mathematical optimization, oracle complexity is a standard theoretical framework for studying the computational requirements of solving classes of optimization problems. It is suitable for analyzing iterative algorithms that proceed by computing local information about the objective function at various points (such as the function's value, gradient, or Hessian). The framework has been used to provide tight worst-case guarantees on the number of required iterations for several important classes of optimization problems.

Formal description
Consider the problem of minimizing some objective function $$f:\mathcal{X}\rightarrow \mathbb{R}$$ (over some domain $$\mathcal{X}$$), where $$f$$ is known to belong to some family of functions $$\mathcal{F}$$. Rather than having direct access to $$f$$, the algorithm is assumed to obtain information about $$f$$ via an oracle $$\mathcal{O}$$, which, given a point $$\mathbf{x}$$ in $$\mathcal{X}$$, returns some local information about $$f$$ in the neighborhood of $$\mathbf{x}$$. The algorithm begins at some initialization point $$\mathbf{x}_1$$, uses the information provided by the oracle to choose the next point $$\mathbf{x}_2$$, uses the additional information to choose the following point $$\mathbf{x}_3$$, and so on.
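The interaction described above can be sketched in Python as follows, assuming that the oracle is an ordinary function of the query point and that the algorithm is represented by a rule mapping the history of queries and answers to the next point. The names `run_oracle_algorithm`, `step`, `Oracle` and `History` are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, List, Tuple

import numpy as np

# Hypothetical type aliases, for illustration only.
Oracle = Callable[[np.ndarray], Any]
History = List[Tuple[np.ndarray, Any]]


def run_oracle_algorithm(step: Callable[[History], np.ndarray],
                         oracle: Oracle,
                         x1: np.ndarray,
                         T: int) -> List[np.ndarray]:
    """Make T oracle calls and return the query points x_1, ..., x_{T+1}.

    The algorithm never sees f directly: its only information is the
    list of (query point, oracle answer) pairs collected so far.
    """
    points = [np.asarray(x1, dtype=float)]
    history: History = []
    for _ in range(T):
        x_t = points[-1]
        answer = oracle(x_t)                 # local information about f at x_t
        history.append((x_t, answer))
        points.append(np.asarray(step(history), dtype=float))  # choose x_{t+1}
    return points


# Example: f(x) = 0.5 * ||x||^2, whose gradient at x is x itself.
def gradient_oracle(x):
    return x


def halve(history):
    # Uses only the most recent query point and oracle answer.
    x_t, g_t = history[-1]
    return x_t - 0.5 * g_t


points = run_oracle_algorithm(halve, gradient_oracle, np.array([4.0]), T=3)
# points holds x_1, ..., x_4 = [4.0], [2.0], [1.0], [0.5]
```

Representing the algorithm as a function of the full history makes explicit that, in this framework, everything the algorithm knows about $$f$$ comes from the oracle answers.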

To give a concrete example, suppose that $$\mathcal{X}=\mathbb{R}^d$$ (the $$d$$-dimensional Euclidean space), and consider the gradient descent algorithm, which initializes at some point $$\mathbf{x}_1$$ and proceeds via the recursive equation


 * $$ \mathbf{x}_{t+1} = \mathbf{x}_t-\eta\nabla f(\mathbf{x}_t)$$,

where $$\eta$$ is a step size parameter. This algorithm can be modeled in the framework above: given any $$\mathbf{x}_t$$, the oracle returns the gradient $$\nabla f(\mathbf{x}_t)$$, which is then used to choose the next point $$\mathbf{x}_{t+1}$$.
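As a sketch of this example, the following Python code treats the oracle as a function returning $$\nabla f(\mathbf{x}_t)$$ and makes exactly one oracle call per iteration. The quadratic objective, the function name `gradient_descent` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def gradient_descent(grad_oracle, x1, eta, T):
    """Run T iterations of x_{t+1} = x_t - eta * grad f(x_t).

    Returns the final point and the number of oracle calls made.
    """
    x = np.asarray(x1, dtype=float)
    calls = 0
    for _ in range(T):
        g = grad_oracle(x)  # one first-order oracle call
        calls += 1
        x = x - eta * g
    return x, calls


# Illustrative objective: f(x) = 0.5 * ||x - c||^2, so grad f(x) = x - c
# and the unique minimizer is c.
c = np.array([1.0, -2.0])
x_T, calls = gradient_descent(lambda x: x - c, x1=np.zeros(2), eta=0.5, T=50)
```

With $$\eta=0.5$$ each step halves the distance to $$c$$, so after 50 oracle calls $$\mathbf{x}_T$$ agrees with $$c$$ to within floating-point precision.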

In this framework, for each choice of function family $$\mathcal{F}$$ and oracle $$\mathcal{O}$$, one can study how many oracle calls (iterations) are required to guarantee some optimization criterion (for example, ensuring that the algorithm produces a point $$\mathbf{x}_T$$ such that $$f(\mathbf{x}_T)-\inf_{\mathbf{x}\in\mathcal{X}}f(\mathbf{x})\leq \epsilon$$ for some $$\epsilon>0$$). This number is known as the oracle complexity of this class of optimization problems: namely, the number of iterations such that, on the one hand, there is an algorithm that provably succeeds within this many iterations for any function in $$\mathcal{F}$$, and on the other hand, there is a proof that no algorithm can succeed with fewer iterations uniformly for all functions in $$\mathcal{F}$$.
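For a single fixed function, the number of oracle calls needed to meet the criterion above can be measured empirically, as in the Python sketch below. This only illustrates the quantity being counted: oracle complexity takes the worst case over all of $$\mathcal{F}$$ and also requires a matching lower bound, which no experiment on one function can provide. The objective $$f(u,v)=\tfrac12(Lu^2+\mu v^2)$$, the helper name `calls_until_eps_optimal` and all parameter values are assumptions for illustration; the known optimal value $$0$$ is used only to check the stopping criterion, not by the algorithm.

```python
import numpy as np


def calls_until_eps_optimal(f, grad, f_star, x1, eta, eps, max_calls=10_000):
    """Count gradient-descent oracle calls until f(x_t) - f_star <= eps.

    f and f_star are used only to evaluate the stopping criterion;
    the algorithm itself only sees gradients.
    """
    x = np.asarray(x1, dtype=float)
    calls = 0
    while f(x) - f_star > eps:
        if calls >= max_calls:
            raise RuntimeError("criterion not met within max_calls")
        x = x - eta * grad(x)  # one first-order oracle call
        calls += 1
    return calls


# Illustrative quadratic f(u, v) = 0.5 * (L u^2 + mu v^2), minimized at 0 with value 0.
L, mu = 1.0, 0.1
f = lambda x: 0.5 * (L * x[0] ** 2 + mu * x[1] ** 2)
grad = lambda x: np.array([L * x[0], mu * x[1]])
calls = calls_until_eps_optimal(f, grad, 0.0, x1=[1.0, 1.0], eta=1 / L, eps=1e-3)
```

With these values the first step sets $$u$$ to zero, and every step shrinks $$v$$ by the factor $$1-\mu/L=0.9$$, so the loop stops after 19 oracle calls.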

The oracle complexity approach is inherently different from computational complexity theory, which uses the Turing machine to model algorithms and requires the algorithm's input (in this case, the function $$f$$) to be represented as a string of bits in memory. In the oracle framework, the algorithm is not computationally constrained, but its access to the function $$f$$ is. This means that, on the one hand, oracle complexity results apply only to specific families of algorithms that access the function in a certain manner, rather than to arbitrary algorithms as in computational complexity theory. On the other hand, the results apply to most if not all iterative algorithms used in practice, do not rely on any unproven assumptions, and lead to a nuanced understanding of how the function's geometry and the type of information used by the algorithm affect practical performance.

Common settings
Oracle complexity has been applied to many different settings, depending on the optimization criterion, the function class $$\mathcal{F}$$, and the type of oracle $$\mathcal{O}$$.

In terms of optimization criterion, by far the most common one is finding a near-optimal point, namely ensuring that $$f(\mathbf{x}_T)-\inf_{\mathbf{x}\in\mathcal{X}}f(\mathbf{x})\leq \epsilon$$ for some small $$\epsilon>0$$. Other criteria include finding an approximately stationary point ($$\|\nabla f(\mathbf{x}_T)\|\leq \epsilon $$) or finding an approximate local minimum.
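The difference between near-optimality and approximate stationarity can be seen in a small one-dimensional Python sketch. The nonconvex function $$f(x)=x^4/4-x^2/2$$, the helper name `gd_until_stationary`, the step size and the starting points are assumptions chosen for illustration. The gradient $$x^3-x$$ vanishes at $$x=-1,0,1$$, so an $$\epsilon$$-stationary point need not be a minimum: $$x=0$$ is a local maximum.

```python
def grad(x):
    """Gradient of the illustrative nonconvex f(x) = x**4 / 4 - x**2 / 2."""
    return x ** 3 - x


def gd_until_stationary(x1, eta, eps, max_calls=10_000):
    """Gradient descent that stops once |f'(x_t)| <= eps (an eps-stationary point).

    Returns the final point and the number of oracle calls made.
    """
    x = float(x1)
    for calls in range(1, max_calls + 1):
        g = grad(x)  # oracle call number `calls`
        if abs(g) <= eps:
            return x, calls
        x -= eta * g
    raise RuntimeError("no eps-stationary point found within max_calls")


# From x = 0.5 the iterates increase towards the local minimizer x = 1.
x_T, calls = gd_until_stationary(x1=0.5, eta=0.1, eps=1e-6)
# Starting exactly at the local maximum x = 0 stops after one call, since f'(0) = 0.
x_0, calls_0 = gd_until_stationary(x1=0.0, eta=0.1, eps=1e-6)
```

Both runs satisfy the stationarity criterion, but only the first returns a local minimum, which is why approximate local minima are treated as a separate, stronger criterion.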

Many function classes $$\mathcal{F}$$ have been studied. Common choices include convex vs. strongly convex vs. non-convex functions, smooth vs. non-smooth functions (for example, in terms of Lipschitz continuity of the gradient or of higher-order derivatives), domains of bounded dimension $$d$$ vs. domains of unbounded dimension, and sums of two or more functions with different properties.

In terms of the oracle $$\mathcal{O}$$, it is common to assume that given a point $$\mathbf{x}$$, it returns the value of the function at $$\mathbf{x}$$, as well as derivatives up to some order (say, value only, value and gradient, value and gradient and Hessian, etc.). Sometimes, one studies more complicated oracles. For example, a stochastic oracle returns the values and derivatives corrupted by some random noise, and is useful for studying stochastic optimization methods. Another example is a proximal oracle, which given a point $$\mathbf{x}$$ and a parameter $$\gamma$$, returns the point $$\mathbf{y}$$ minimizing $$f(\mathbf{y})+\gamma \|\mathbf{y}-\mathbf{x}\|^2$$.
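The oracle types above can be sketched in Python for a quadratic objective $$f(\mathbf{y})=\tfrac12\mathbf{y}^\top A\mathbf{y}-\mathbf{b}^\top\mathbf{y}$$ with $$A$$ symmetric positive semidefinite, for which the proximal oracle has a closed form: setting the gradient of $$f(\mathbf{y})+\gamma\|\mathbf{y}-\mathbf{x}\|^2$$ to zero gives $$(A+2\gamma I)\mathbf{y}=\mathbf{b}+2\gamma\mathbf{x}$$, which is solvable for any $$\gamma>0$$. The quadratic objective, the class name `QuadraticOracles` and the choice of Gaussian noise for the stochastic oracle are all assumptions made for illustration.

```python
import numpy as np


class QuadraticOracles:
    """Oracles for f(y) = 0.5 * y^T A y - b^T y, with A symmetric positive semidefinite."""

    def __init__(self, A, b, noise_std=0.0, seed=0):
        self.A = np.asarray(A, dtype=float)
        self.b = np.asarray(b, dtype=float)
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def value(self, x):
        """Zeroth-order information: f(x)."""
        return 0.5 * x @ self.A @ x - self.b @ x

    def gradient(self, x):
        """First-order information: grad f(x) = A x - b."""
        return self.A @ x - self.b

    def hessian(self, x):
        """Second-order information (constant for a quadratic)."""
        return self.A

    def stochastic_gradient(self, x):
        """Gradient corrupted by zero-mean Gaussian noise (assumed noise model)."""
        return self.gradient(x) + self.noise_std * self.rng.standard_normal(x.shape)

    def proximal(self, x, gamma):
        """argmin_y f(y) + gamma * ||y - x||^2, via (A + 2 gamma I) y = b + 2 gamma x."""
        n = x.shape[0]
        return np.linalg.solve(self.A + 2 * gamma * np.eye(n), self.b + 2 * gamma * x)


oracles = QuadraticOracles(A=[[2.0, 0.0], [0.0, 1.0]], b=[1.0, 1.0], noise_std=0.1)
x = np.array([0.0, 0.0])
y = oracles.proximal(x, gamma=0.5)  # solves diag(3, 2) y = (1, 1)
```

Here $$A+2\gamma I=\mathrm{diag}(3,2)$$, so the proximal oracle returns $$\mathbf{y}=(1/3,1/2)$$. For non-quadratic objectives the proximal step generally has no closed form, which is one reason it is treated as a separate, more powerful oracle.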

Examples of oracle complexity results
The following are a few known oracle complexity results (up to numerical constants) for obtaining an optimization error of at most $$\epsilon$$, for sufficiently small $$\epsilon$$, over the domain $$\mathbb{R}^d$$, where $$d$$ is not fixed and can be arbitrarily large (unless stated otherwise). We also assume that the initialization point $$\mathbf{x}_1$$ satisfies $$\|\mathbf{x}_1-\mathbf{x}^*\|\leq B$$ for some parameter $$B$$, where $$\mathbf{x}^*$$ is some global minimizer of the objective function.