## Sunday, November 13, 2011

### ## A numbers puzzle.

OK, this has nothing to do with climate. But it's connected with the trends gadget. I made a nudger that looked like this <<<<>>>>. If you click on the center symbols, you nudge by one month. Next out by 2, then four, and in this version, the outermost move by 8.

Well, suppose the sequence of symbols continued, in powers of 2, as far as needed. How far can you be sure of being able to move with, say, 2, 3, 4 or 5 clicks. More precisely, what is the least number that requires n clicks.

For n=2, it's 3. For n=3 it's 11 (8+2+1 or 16-4-1). But 4? 5? Can anyone think of a rule?

I can think of an upper bound for any n. It's less that 4^(n-1). But not a general formula.

1. 00 = 0 (N - 0)
01 = 1
10 = 2 (N = 1)
11 = 3 (N = 2)
110 = 4
101 = 5
110 = 6
111 = 7
1000 = 8
1001 = 9
1010 = 10
1011 = 11 (8,2,1) (N = 3)
1100 = 12
1101 = 13
1110 = 14
1111 = 15
10000 = 16
10001 = 17
10010 = 18
10011 = 19
10100 = 20
10101 = 21
10110 = 22
10111 = 23 (16,8,-1)
11000 = 24
11001 = 25
11010 = 26
11011 = 27 (32,-4,-1)
11100 = 28
11101 = 29 (32,-4,1)
11110 = 30 (32,-1,-1)
11111 = 31 (32,-1)
100000 = 32
100001 = 33
100010 = 34
100011 = 35
100100 = 36
100101 = 37
100110 = 38
100111 = 39 (32,8,-1)
101000 = 40
101001 = 41
101010 = 42
101011 = 43 (32,8,2,1) (N = 4)
10101011 = 167 (128,32,8,2,1) (N = 5)
1010101011 = 683 (512,128,32,8,2,1) (N = 6)
10101010101010101010101010101010101011 = ? (N = 20)

All are odd numbers, all are prime numbers, all represented in binary start with an on bit, end in two on bits, and in between, alternate between off and on bits.

Just my quess, don't really know if it's correct, but I think writing down in binary makes finding a solution easier.

2. Oops, not all prime numbers. Rats.

"10101011 = 167 (128,32,8,2,1) (N = 5)"

Should be;

"10101011 = 171 (128,32,8,2,1) (N = 5)"

171 = 3 * 57

Oh well, at least they're all odd numbers.

3. Well I popped this into Excel along with the 4^(N-1), and if the progression I'm suggesting is correct. than your equation needs a 2/3 in front of it;

upper bound = 2*4^(N-1)/3 (the ratio is 1 at N = 1, but rather quickly drops down to the 2/3 (~say by N = 5) as an asymptotic value for large N)

Excel poops out at N = 26 (in significant digits anyways), but I ran it up a ways beyond that, what the heck.

But an exponential fit appears to be correct (R^2 = 1), and is a straight line with the y-axis plotted on a log scale.

4. EFS
Yes, I think binary is the way, and you've got the pattern. The way I reasoned in the end was this. Suppose you want an algorithm for efficiently advancing N months. That amounts to reducing N to zero by adding or subtracting powers of 2.

In binary, N can be written N=1x.y, where x is 0 or 1. The "decimal" point is just a marker There are 3 cases for the first step:
x=0: then subtract 10.00.. and two digits have gone; .y remains
x=1 and y>0: subtract 100.00.. That leaves -y, and again 2 digits gone
x=1 and y=0; then N can be reduced in 2 steps.

Then do the same with y. So at each step you can remove two digits or go to zero in two steps. The only number for which the last case hurts is 3, where it takes 2 steps to remove the last two digits.

If at any stage, y begins with 0, you gain a digit. So then you can construct the smallest number that actually requires the theoretical maximum steps (y always starts with 1). And, as you say, it is
10101....011.

5. Oops "This leaves -y" I meant "This leaves 1.-y". But it's less than 1., so the 2 digits have gone.

6. This is off topic.

Nick, if you're ever feeling bored, could you re-run your "Just 60 stations" analysis but include a land mask? There's some talk over at Lucia's about the results of analyses with such few stations.

7. CCE, well the topic is off-topic anyway. I've never actually used a land mask, because I have mainly been interested in the 60 stations idea for global temperatures. So stations which represent a lot of ocean are good.

The triangular mesh methods work a bit like that for regions, because only area within the mesh is used for weighting. I did some crude land-masking in Antarctica where I blocked out the Weddell Sea.

So it's not high priority - it's quite a lot of work, and I don't think land-only temperatures are all that important.

8. If you restrict yourself to only positive or negative clicks, them the number of clicks needed is the number of ones in the binary representation of the absolute value of the number of months that you want to move. So +/- (2^n)-1 requires n clicks.

That gives you a lower bound, and also gives you an efficient way of finding the clicks you need.

9. ...1010101011
= 1 + 2 + 8 + 32 + 128 + ...
= 1 + 2[1 + 4 + 16 + 64 + ...]
= 1 + 2 [4^(n-1) - 1] / [4 - 1] (geometric series)
= (1/3) + (2/3) . 4^(n-1)

(which confirm's EFS's 2/3 value)

10. I think this question may be related to the mathematics behind error correcting codes. The question there is how to include redundancy in the data such that a certain number of bits of error can be corrected. Errors become nudges. We are looking for an optimal sampling of a high dimensional space such that every point is within a certain distance of a sample point.
Power laws are a simple answer - if you want a power law less than one, the Fibonacci sequence has a ratio equal to the golden ratio. However IIRC they are either non-optimal or optimal but non-general. It's ages since I looked at any of this stuff though.
Kevin C

11. Nick,
since you've explored the BEST data stuff a bit is there any way to extract for example subsets of station data based upon lat/long. I know you've plotted the lat/long data but is there any way to extract the data for a selected range of points easily? I'm interested for the purpose of incorporation into a GIS platform.

12. Robert,
You need to organise the data in some structure where it is collected by station. Then in R it is easy to select. TempLS facilitates this - you just write down the latitude and longitude conditions you want, and it analyses that. TempLS doesn't currently write out the selected data, but it could.

Steve McI organises the data into a time series for each station. That isn't ideal for all purposes, but would work well here.

13. Yeah I think that what I would want to do is something like Steve McI does... I can manually do this in excel etc... but the issue is that BEST has access to more stations than I would through GHCN or CHCN. I will have a look at some of his functions over at climate audit. Thanks.

(My intention is to spatially interpolate raw temperatures using kriging in a GIS system within a region because I think it may be an interesting way of looking at the data)

14. If you are bored, could you amuse me please?