## Regression as derivative

In two recent posts here and here, I looked at a moving OLS trend calculation as a numerical derivative for a time series. I was mainly interested in improving the noise performance, leading to an acceleration operator.

Along the way I claimed that you could get essentially the same results by either smoothing and differentiating the smooth, or differencing and smoothing the differences. In this post, I'd like to develop that, because I think it is a good way of seeing the derivative functionality.

This has some relevance in the light of a recent paper of Marotzke et al, discussed here. Marotzke used "sliding" regressions in this way, and Carrick linked to my earlier posts.

#### Integrating by parts

My earlier derivation was for continuous functions. If we define an operator:

R(t) = t/X for t from -N to N, and zero outside,

and X is a normalising constant, then the OLS moving trend is

β(t) = ∫R(τ)y(τ+t) dτ

where ∫ is over all reals (OK since R has compact support). X is chosen so that a series with unit trend returns β = 1: ∫R(τ)·(τ+t) dτ = 1.
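As a sketch of the discrete analogue (my own illustration, not from the post): over a centred window j = -N..N, the kernel R(j) = j/X with X = Σj² reproduces the OLS slope of each window, so the sliding trend is just a correlation with R.

```python
# Illustrative sketch (not from the post): the moving OLS trend as a
# correlation with the kernel R(j) = j / X, X = sum of j^2.
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))      # an example noisy series

N = 10
j = np.arange(-N, N + 1)
R = j / np.sum(j**2)                     # normalised: unit trend -> beta = 1

# beta(i) = sum_j R(j) y(i+j), wherever the full window fits
beta = np.correlate(y, R, mode="valid")

# Cross-check against an explicit per-window OLS fit
slopes = np.array([np.polyfit(j, y[i - N:i + N + 1], 1)[0]
                   for i in range(N, len(y) - N)])
assert np.allclose(beta, slopes)
```

Note `np.correlate` (unlike `np.convolve`) does not reverse the kernel, which is what the formula β(i) = Σ R(j)y(i+j) requires.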

I'll define W(t) = -∫_{-∞}^{t} R(τ) dτ (modified following a suggestion from HaroldW, thanks), and use D = d/dτ, so DW = -R and W = -D^{-1}R. Then

∫D(W(τ)y(τ+t)) dτ = 0 = -∫R(τ)y(τ+t) dτ + ∫W(τ)Dy(τ+t) dτ,

or, β(t) = ∫R(τ)y(τ+t) dτ = ∫W(τ)Dy(τ+t) dτ

Now W is a standard Welch taper.

It is (minus) the cumulative integral of R, and since R is odd, its integral over the whole range is zero; so W is a quadratic that starts from zero at -N, returns to zero at N, and is zero outside that range. That establishes our first proposition:

β(t) = ∫W(τ)Dy(τ+t) dτ

ie a Welch-smoothed derivative of y.
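To make that concrete (a worked check, my own arithmetic using the definitions above): with X = ∫_{-N}^{N} τ² dτ = 2N³/3,

W(t) = -∫_{-∞}^{t} R(τ) dτ = (N² - t²)/(2X) for |t| ≤ N, and zero outside.

That is the parabola vanishing at ±N, and ∫W(τ) dτ = 1, which is consistent: a series with unit trend has Dy = 1, and the smoothed derivative then returns β = 1.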

Now D is wrt τ, but would give the same result if it were wrt t. In that case, it can be taken outside the integration:

β(t) = D∫W(τ)y(τ+t) dτ

That is our second result - the sliding trend β(t) is just the derivative of the Welch-smoothed y.

#### Application to time series

I introduced D because it has a nice difference analogue

Δy_i = y_i - y_{i-1}

Its inverse Δ^{-1} is a cumulative sum (from -∞). So the same summation by parts works:

β(i) = Σ_j R(j)y(i+j)

Again W = -Δ^{-1}R is a symmetric parabola coming to zero at each end of the range - ie Welch. Then

Σ_j Δ(W(j)y(i+j)) = 0 = -Σ_j R(j)y(i+j) + Σ_j W(j)Δy(i+j)

or β(i) = Σ_j W(j)Δy(i+j)

Again that's the first result - the sliding trend is exactly the Welch smooth of the differences of y. Smoothed differentiation.
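Numerically (my sketch, not from the post; note `np.diff` uses the forward difference y_{i+1} - y_i, which shifts the index by one relative to the backward Δ above, and the closed form for W is my own algebra):

```python
# Verify: sliding OLS trend == Welch smooth of first differences.
import numpy as np

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=300))

N = 12
j = np.arange(-N, N + 1)
X = np.sum(j**2)
R = j / X
W = -np.cumsum(R)                        # W(j) = -sum_{k<=j} R(k)

# W is the discrete Welch parabola, zero at j = N (and at the
# half-step j = -N-1), and it sums to 1:
assert np.allclose(W, (N - j) * (N + 1 + j) / (2 * X))
assert np.isclose(W.sum(), 1.0)

beta = np.correlate(y, R, mode="valid")  # sliding OLS trend
dy = np.diff(y)                          # forward differences y[i+1]-y[i]
smooth_of_diffs = np.correlate(dy, W, mode="valid")

assert np.allclose(beta[:len(smooth_of_diffs)], smooth_of_diffs)
```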

Again, Δ can be regarded as applying to i rather than j.

β(i) = Σ_j W(j)y(i+j) - Σ_j W(j)y(i+j-1) = ΔΣ_j W(j)y(i+j)

The sliding trend is exactly the differences of the Welch smooth of y.
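And the second form can be checked the same way (my sketch, with the same caveat about the forward-difference convention of `np.diff`):

```python
# Verify: sliding OLS trend == differences of the Welch smooth.
import numpy as np

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=300))

N = 12
j = np.arange(-N, N + 1)
R = j / np.sum(j**2)
W = -np.cumsum(R)                           # Welch taper

beta = np.correlate(y, R, mode="valid")     # sliding OLS trend
welch_smooth = np.correlate(y, W, mode="valid")
diff_of_smooth = np.diff(welch_smooth)      # smooth first, then difference

assert np.allclose(beta[:len(diff_of_smooth)], diff_of_smooth)
```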