## Thursday, September 28, 2017

### ## Simple use of a complex grid - Earth temperature.

This is a follow-up to my last post, which refined ideas from an earlier post on using platonic solids as a basis for grids on the sphere that were reasonably uniform, and free of the pole singularities of latitude/longitude. I gave data files for use, as I did with an earlier post on a special case, the cubed sphere.

The geometry involved can be complicated, but a point I made in that last post was that users need never deal with the complexity. I gave a minimal set of data for grids of varying resolution, which basically recorded the mid-points of the cells, and their area. That is all you need to make use of them.

I should add that I don't think the hexagon method I recommend is a critical improvement over, say, the cubed sphere method. Both work well. But since this method of application is the same for any variant, just using cell centres and areas in the same way, there is no cost in using the optimal.

In this post, I'd like to demonstrate that with an example, with R code for definiteness. I'd also like to expand on the basic idea, which is that near-regular grids of any complexity have he Voronoi property, which is that cells are the domain of points closest to the mid-points. That is why mid-point location is sufficient information. I can extend that to embedding grids in grids of lower resolution; I will recommend a method of hierarchical integration in which empty cells inherit estimates from the most refined grid that has information for their area. I think this is the most logical answer to the empty cell problem.

In the demonstration, I will take the inventory of stations that I use for TempLS. It has all GHCN V3 stations together with a selection of ERSST cells, treated as stations located at grid centres. It has 10997 locations. I will show how to bin these, and use the result to do a single integration of data on those points.

I start with calling the data for the inventory ("invo.sav") (posted in the dataset for cubed sphere above). Then I call the Moyhuhexmin data that I posted in the last post. I am going to do the integration over all 8 resolution levels, so I loop over variable K, collecting results in ave[]:
```load("invo.sav")
ave=rep(NA,8) # initialising
for(K in 1:8){
h=hexmin[[K]]  # dataframe for level K
g=as.matrix(h\$cells) # 3D coords of centres, and areas
y=invo\$z; n=nrow(y);  # invo\$z are stations;
pointer=rep(0,n)
```
This is just gathering the information. g and y are the two sets of 3D cartesian coordinates on the sphere to work with. Next I locate y in the cells which have centre g:
``` pointer=rep(0,n)
for(i in 1:n) pointer[i]=which.min(g[,1:3]%*%y[i,]) # finding closest g to y
```
If this were a standalone calculation, I wouldn't have done this as a separate loop. But the idea is that, once I have found the pointers, I would  store them as a property of the stations, and never have to do this again. Not that it is such a pain; although I discussed last time a multi-stage process, first identifying the face and then searching that subset, in fact with near 11000 nodes and 7682 cells (highest resolution), the time taken is still negligible - maybe 2 seconds on my PC.

Now to do an actual integration. I'll use a simple known function, where one would normally use temperature anomalies assigned to a subset of stations y. I'll use the y-coord in my 3D, which is sin(latitude),, and sunce that has zero integral, I'll integrate the square. The answer should be 1/3.
```integrand=y[,2]^2
area=g[,4];
cellsum=cellnum=rep(0,nrow(g))  # initialising
for(i in 1:n){
j=pointer[i]
cellsum[j]=cellsum[j]+integrand[i]
cellnum[j]=cellnum[j]+1
}
```
area[] is just the fourth column of data from hexmin; it is the area of each cell on the sphere. cellsum[] with be the sum of integrand values in the cell, and cellnum[] the count (for averaging). This is where the pointers are used. The final stage is the weighted summation:
```o=cellnum>0 # cells with data
ave[K] = sum(cellsum[o]*area[o]/cellnum[o])/sum(area[o]) # weighted average
} # end of K loop
```
This is, I hope, fairly obvious R stuff. o[] marks cells with data which are the only ones included in the sum. area[o] are the weights, and to get the averages I divide by the sum of weights. This is just conventional grid averaging.

#### Integration results

Here are the results of grid integration of sin^2(lat) at various resolution levels:

 level Number of cells average 1 32 0.3292 2 122 0.3311 3 272 0.3275 4 482 0.3256 5 1082 0.3206 6 1922 0.3167 7 3632 0.3108 8 7682 0.3096
The exact answer is 1/3. This was reached at fairly coarse resolution, which is adequate for this very smooth function. At finer resolution, empty cells are an increasing problem. Simple averaging ignoring empty cells effectively assigns to those eells the average of the rest. Because the integrand has peak value 1 at the poles, where many cells are empty, those cells are assigned a value of about 1/3, when they really should be 1. That is why the integral diminishes with increasing resolution. It is also why the shrinkage tapers off, because once most cells in the region are empty, further refinement can't make much difference.

This is the problem that Cowtan and Way 2013 studied with HADCRUT. HADCRUT averages hemispheres separately, so they effectively infill empty cells with hemisphere averages. But Arctic areas especially are warming faster than average, and HADCRUT tends to miss this. C&W tried various methods of interplating, particularly polar values, and got what many thought was an improvement, more in line with other indices. I showed at the time that just averaging by latitude bands went a fair way in the same direction.

With the new grid methods here, that can be done more systematically. The Voronoi based matching can be used to embed grids in grids of lower resolution, but fewer empty cells. Integrtaion can be done starting with a coarse grid, and then going to higher resolution. Infilling of an empty cell can be done with the best value from the heirarchy.

I use an alternative diffusion based interpolation as one of the four methods for TempLS. It works very well, and gives results similar to the other two of the three best (node-nased triangular mesh and spherical harmonics). I have tried variants of the heirarchical method, with similar effect.

#### Next

In the next post, I will check out the hierarchical method applied to this simple example, and also to HADCRUT4 gridded version. I'm hoping from a better match with Cowtan and Way.