Thanks!
@spmontecarlo
Lecturer in Maths & Stats at Bristol. Interested in probabilistic + numerical computation, statistical modelling + inference. (he / him). Homepage: https://sites.google.com/view/sp-monte-carlo Seminar: https://sites.google.com/view/monte-carlo-semina
Thanks!
I guess the thinking was that a specific example might be more interesting than the general phenomenon of convex / concave as sup / inf of linear.
Sure, yeah.
An odd miscellaneum: viewed as a functional of the data-generating process, the Bayes risk is concave.
Corollary: variance and mean absolute deviation are concave functions of the measure in question.
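(To spell out the mechanism, per the sup / inf of linear remark: for any fixed decision rule, the expected loss is a linear functional of the data-generating measure P, and the Bayes risk is the infimum of these over rules; a pointwise infimum of linear functionals is concave. Likewise Var(P) = inf_c E_P[(X - c)^2] and the mean absolute deviation inf_c E_P[|X - c|] are infima over c of P-linear quantities, hence concave in P.)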
".. and our host: Danieeeel Stroock!"
With that being said, the substance of the proof is much as it ever was, and the quantitative part is easy to extract.
The more 'modern' version of the result then appears in Stroock's book on Large Deviations with Deuschel, and again in a collaboration of Deuschel-Holley-Stroock.
1) The 'usual' lemma does not appear as a stand-alone result in the original paper, but is rather used as part of another lemma (5.1, for posterity).
2) The usual lemma is also quantitative in character, whereas in this initial paper, it is only used qualitatively.
Something which I finally got around to: figuring out the origins of "the Holley-Stroock perturbation lemma" (in the context of functional inequalities). This seems self-explanatory enough, since the relevant 1987 paper of Holley and Stroock is relatively well-documented. Still, some surprises:
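(For reference, the statement I have in mind, in its usual textbook form rather than as quoted from the paper: if ν satisfies a log-Sobolev or Poincaré inequality with constant C, and dμ ∝ e^h dν with h bounded, then μ satisfies the same inequality with constant C·e^{osc(h)}, where osc(h) = sup h - inf h.)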
(not that i'll be dropping £170 any time soon ...)
do mine eyes deceive me? a release date in the present year?
Somehow, theorems about P_{\hat{\theta}} should 'always' be viable, not depend on parameterisations, etc., which seems ideal on the theory side.
Ah, sure; this isn't really one of those. This is coming out of teaching a statistical theory course, and trying to sound out what sort of theorems are "natural". Increasingly, I see that I want to prove things about e.g. P_{\hat{\theta}} rather than just \hat{\theta}, which is more usual.
*- any functional of the data-generating process is a parameter
A byproduct of this framing is that it (not necessarily beneficially) precludes the existence of certain types of non-identifiability.
My relationship with "parameters" in statistical models has passed through approximately three phases:
1) parameters are convenient ways of thinking about models
2) parameters are a red herring; focus directly on the data-generating process
3) parameters are good, because anything* is a parameter.
Let me be explicit in highlighting Rocco as leading the work on this project and doing an excellent job - very talented junior researcher, and well worth keeping on your radar for all the expected reasons.
A little part of the paper which I like a lot: expressing the main algorithm (MTM) as an approximation to an approximation of a certain 'ideal' algorithm. Among other things, this 'twice-approximated' perspective helps to pin down which of the approximations is more problematic / delicate.
Some work freshly published at EJS: projecteuclid.org/journals/ele...
'Analysis of Multiple-try Metropolis via Poincaré inequalities'
- Rocco Caprio, Sam Power, Andi Q. Wang
We conduct a convergence analysis of a specific class of MCMC procedures based on multiple-proposal strategies.
Splendid!
It has a name (the Cayley transform), but its relevance to this problem is not a priori clear, e.g. it doesn't clearly extend or generalise to cube roots.
Generalisations come relatively easily when they are also expressible as an LP / convex program (so e.g. maximising worst-case power over some set is good, while maximising best-case power is not as straightforward; adding in extra significance-type constraints is usually fine, etc.).
Basically, write the test in terms of an f mapping to [0, 1]; the significance level is the expectation of that f under the null, and the power is its expectation under the alternative. Hence, optimising power s.t. the significance constraint (and constraining f to map to [0, 1]) is an LP, and the form of the optimiser is informed by those constraints.
Or maybe just linear programming by itself; need to think more carefully about whether the duality aspect is key. I guess it often is with LPs.
Fun thing to clarify for myself: the Neyman-Pearson Lemma is, at its core, an application of linear programming duality. Once this perspective clicks, it becomes clearer why certain extensions are and are not possible.
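To make the LP concrete, a toy numerical check (a quick sketch in Python with numpy / scipy; the null pmf p0, alternative pmf p1 and level alpha below are made up):

import numpy as np
from scipy.optimize import linprog

# Finite sample space with K outcomes; p0 = null pmf, p1 = alternative pmf.
# A (randomised) test is a vector f with entries in [0, 1].
rng = np.random.default_rng(0)
K, alpha = 10, 0.05
p0 = rng.dirichlet(np.ones(K))
p1 = rng.dirichlet(np.ones(K))

# Neyman-Pearson as an LP: maximise power E_{p1}[f] subject to E_{p0}[f] <= alpha, 0 <= f <= 1.
# linprog minimises, hence the minus sign on the objective.
res = linprog(c=-p1, A_ub=p0[None, :], b_ub=[alpha], bounds=[(0.0, 1.0)] * K)
f = res.x

# At a vertex of the feasible set, f is 0/1 apart from (at most) one fractional entry,
# and the 1s sit where the likelihood ratio p1/p0 is largest -- i.e. a likelihood-ratio test.
order = np.argsort(-(p1 / p0))
print("power:", -res.fun)
print("f sorted by decreasing likelihood ratio:", np.round(f[order], 3))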
A fun thing which is not so widely known: consider the 'Newton ODE' for convex minimisation. Then g_t := grad f (x_t) satisfies a very simple universal ODE.
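(To spell it out, taking the 'Newton ODE' to be dx_t/dt = -[∇²f(x_t)]^{-1} ∇f(x_t): the chain rule gives dg_t/dt = ∇²f(x_t) dx_t/dt = -g_t, so g_t = e^{-t} g_0, independently of f.)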
Very cool!
(Save yourself time in the calculations by taking a = 1!)
A simple calculation with a neat consequence:
Consider using Newton's method to solve
x^2 = a
for some a > 0.
The iterates x_n don't obviously lead to a solvable recurrence.
However, upon setting
w = (x - sqrt(a)) / (x + sqrt(a)),
the recursion for w_n becomes remarkably simple.
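(The punchline, for the record: with x_{n+1} = (x_n + a / x_n) / 2, one finds x_{n+1} - sqrt(a) = (x_n - sqrt(a))^2 / (2 x_n) and x_{n+1} + sqrt(a) = (x_n + sqrt(a))^2 / (2 x_n), so w_{n+1} = w_n^2 and hence w_n = w_0^{2^n}; the quadratic convergence of Newton's method is then completely explicit.)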
Haven't checked the first yet; the second is a classic (but equally slightly out of date); I think the third is associated to a one-off workshop / grant / similar.
The office library expands ... (courtesy of Peter Green)
Maybe the more serious limitations would kick in when you move to the nonparametric setting.