Thursday, April 18, 2019

Description and code of TempLS V4 global temperature program.

It's now time to post the code for TempLS V4. I know that it is a day when many folks who really want to get their message read put it out to the world. I didn't want to steal anyone's thunder; the week just didn't have enough days. I've been writing background articles leading up to the release of V4 of TempLS; here are the links:
ideas about spatial integration
icosahedral grid with equal area mapping
documentation system
details of new methods
tests
math methods

TempLS is a program written in R, dating back to about 2010; there is a summary of its history up to V3 here. The release post (Aug 2010) for V2 is here; V3 was rolled out in three parts, of which the last, here, links to the earlier parts.

Release of V4 is, as I mentioned in the documentation post, complicated by the fact that I make extensive use of a library of R functions that I maintain, which needs to be included. But I hope the documentation will also help; I'll be relying on it for this release.

The release is in three zip files, LScode.zip, Moyhupack.zip and LSdata.zip
  • There is a file of code and some internal data - LScode.zip. To run it, you should extract it in an empty directory; it will create six subdirectories (x,g,i,l,m,n). The R file to run it is LS_wt.r, explained below. This zip file is about 98 Kb.
  • As mentioned before, the code now embeds a lot of generic functions from my general library. There is a documentation system explained at that link, which I also used for the TempLS functions below. This zip file includes a list, Moyhupack.rda, which you can attach, R-style, as a package; that has advantages. The same functions are also in a file Moyhupack.r, which you can just source() into your environment (a sketch of both follows this list). It has a lot of functions with short names, most of which you won't want to know about, so there is potential for naming clashes. Finally, there is a documentation file Moyhupack.html. If there is interest, I will write more about the library; I think it does have useful things for math programming. The zip file is about 800 Kb: Moyhupack.zip
  • Finally, there is a set of data. The main one is a recent version of GHCN V4 unadjusted TAVG, named ghcnv.dat. There is also the inventory file that came with it, called ghcnv.inv. The SST data is from ERSST V5; it is partly processed in TempLS style and sits in a directory called x. Note that the code zipfile also creates an x directory, with just one list, which would overwrite this larger version if unzipped later. This is a big file - about 54 Mb (mainly GHCN). LSdata.zip
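
For anyone who just wants the library functions available, here is a minimal sketch of the two loading styles (my assumption here is that loading the .rda file creates a list called Moyhupack):

load("Moyhupack.rda")  # creates the list of functions (assumed name: Moyhupack)
attach(Moyhupack)      # attach it package-style, keeping the global environment clean
# or, more simply, put everything into the global environment:
source("Moyhupack.r")

The attach() route makes naming clashes easier to manage, since detach("Moyhupack") removes everything at once.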

The math task and code structure

As before (V3 and earlier), the code is divided into four parts. The code starts with a text version of GHCN V4, and goes looking for a NetCDF version of ERSST V5, if needed. It assembles all this data into 12 (month) arrays of station temperature (rows) vs year (cols). ERSST grid cells count as stations. Missing values are marked with NA. The objective is to fit the model
T = L + G
where T is the station temperature, L the time-invariant normal, and G the space-invariant global anomaly. The fitting requires integration, which comes down to a weighted sum of the data. The third section of the code uses the data about stations reporting, but not the temperatures, to work out weights for integration by various methods. This can be by far the most time-consuming part of the code, alleviated greatly by the use of past calculated weights where the list of stations reporting hasn't changed.

The fourth section is where the fitting is done, via a brief iteration (few seconds) to get convergent values of L and G, which is the main output. It is also where the optional Spherical Harmonics enhancement is done.
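
To make the fitting concrete, here is a toy sketch of the alternating iteration (not the production code - that is seidel(), documented below). It assumes x is a station-by-year matrix for one month, w a matching weight matrix, and that every station and every year has some data:

fitLG=function(x,w,tol=1e-3){ # toy alternating fit of T = L + G
ox=!is.na(x); x0=ifelse(ox,x,0); w[!ox]=0
n1=rowSums(ox); w2=colSums(w) # observation counts per station, total weight per year
G=rep(0,ncol(x))
repeat{
L=drop(rowSums(x0)-ox%*%G)/n1 # L: per-station mean of x - G over observed years
G1=drop(colSums(w*x0)-L%*%w)/w2 # G: weighted mean of x - L for each year
if(mean(abs(G1-G))<tol)break
G=G1
}
list(L=L,G=G1-mean(G1)) # anchor G to zero mean (the real code uses a 1961-90 base)
}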

Code details

The code is now almost entirely done as function calls. I call the main file LS_wt.r, which calls functions from LS_fns.r. These are documented below. The main sequence functions (the four parts) are
  • SortLandData() handles land data
  • SortSSTData() handles ERSST5 data.
  • DeriveWeights() calculates weights from the x array of the previous parts
  • FitGlobal() does the iterative fitting to produce L and G.
There are three lists of data that are switched in and out as needed; they are:
  • I the inventory data, which governs the dimensions of later data matrices. It is updated in a controlled way by newinv() on LSinv.rds

  • J is to describe the results in SortLandData and SortSSTData - assembling x. On LS.rds
  • K is for the method-dependent data in DeriveWeights and FitGlobal, including the main result, K$G.
The 12 (month) temperature files are stored in directory x. The 12 weight files are stored in directories, one for each method: ("g","i","m","n","l") for ("grid","infilled","mesh","none","loess"). Since the K data is method-dependent, a separate version is stored in each directory as LS.rds.

Code sequence

The main code is:
source("LS_fns.r")
if(!exists("job"))job="UEMO"
print(paste("Starting",job,Sys.time()))
i=vsplit(job,"")
kind=tolower(i)
kinds=c("g","i","m","n","l")
RDS=pc(kind[3],"/LS.rds")
K=readRDS(RDS) # I is inv list, J is list about x, K is list for method kind (w)
K$edition=I$edition
K$kind=kind; K$job=job;
do=i!=kind
wix=pc(kind[3],"/") # info about w
tm=timex(0); tx=unlist(tm); t0=Sys.time();
yr=c(1900,tx[1]+(tx[2]>1)-1); yrs=yr[1]:yr[2]; # you can change these
ny=diff(yr)+1
saveR(K)

if(do[1])SortLandData()

if(do[2])SortSSTData()

if(do[3])DeriveWeights()

if(do[4])K=FitGlobal()


As previously, the user supplies a job string of four characters. They are uppercase if that code section is to be performed. A typical value is job="UEMO". At the moment there aren't realistic alternatives for "UE", which is unadjusted GHCN V4 and ERSST V5. M stands for mesh method, but could be any of ("G","I","M","N","L"). "O" just means perform last section; "P" means go on to produce graphics with another program.

A second control variable that can be set is nSH=4, say. It induces the spherical harmonics enhancement, and the value sets the order, using (nSH+1)^2 functions. Going past 10 or so is risky for stability (and time-consuming).

The third control is called recalc. It lets you override the system of using stored weights when the set of stations reporting in a given month is unchanged. That system saves time, but you may suspect the stored data is wrong; something might have gone awry, or the inventory might have changed. The default setting is FALSE, or 0; if you don't want stored data used, set recalc to 1. There is also a setting that I find useful, recalc=2, which recalculates only the most recent year. This takes very little time, but checks that the code is currently working for that method option; if it ran entirely from stored data, errors could take some time to surface.

So the actual code here just brings in K, which also acts as a record of what was done. It stores some information and puts K back on disk. The rest just sets up some time information for use in the main sequence. Note that the last step outputs K; that is where the results are (K$G and K$L).
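
Putting the controls together, a typical run from an R session would look something like this (the plot at the end is just an illustration, not part of the package):

job="UEMO"   # read GHCN and ERSST, mesh weights, fit
nSH=0        # no spherical harmonics enhancement
recalc=2     # recompute weights for the latest year only, as a working check
source("LS_wt.r")
yrs=1900:(1900+ncol(K$G)-1)
plot(yrs,colMeans(K$G,na.rm=TRUE),type="l",xlab="Year",ylab="Anomaly (°C), base 1961-90")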

Documentation of functions

Remember there are a lot of generic functions in the Moyhupack library. The functions here are those specific to TempLS.

Topic

Code for TempLS V4.

SortLandData()
SortSSTData()
DeriveWeights()
FitGlobal()
saveR()
todate()
dozen()
getsst()
fixK()
putw()
getw()

fx()
putx()
getx()
getput()
newinv()
txt2dat()
seidel()
getGL()
convertH()
convertoH()
convertcgH()

wtM()
wtN()
nearest()
makewt()
HHa()
wtL()
wtK()
diffuse()
wtG()
wtI()

Description

Code for TempLS V4.

The main sequence calls the first four functions, SortLandData() to FitGlobal().

The analysis loops over the 12 months. There are 2 large entities, temperatures x and weights w, and these are stored by month on disk, and never held at once in RAM. The stages are:

SortLandData() puts on disk the land data, from GHCN V4. I have a scouting system that collects data on the web, and it puts the text files from GHCN under the names ghcnv.dat and ghcnv.inv. TempLS has a delay system so that actions are not attempted until a time has elapsed since the last success. After waiting a month, newinv() is called. If the inventory is different from the old, the list I is updated with the new information and various subsidiary info, including a database of, e.g., distances between stations and a string of nodes (for LOESS). All this is put on files u1.rda etc. in the x directory.


SortSSTData() handles ERSST5 data. It fetches the files, processes them, and puts the results on e1.rda etc.


DeriveWeights() is the first analysis. It calls up the combined x data, determines which method is used, and puts the resulting weights w on disk. The functions for the various methods are called wtG, wtI, wtL, wtM and wtN.


FitGlobal() is the iterative loop which fits the basic T = L + G model, where L are the norms and G the global offsets. It takes the x and w data from disk, and returns the results as K$L and K$G on list K.


There are 3 lists used for passing results:

I is for the inventory, which governs the dimensions of later data matrices. It is updated in a controlled way by newinv() on LSinv.rds

J is to describe the results in SortLandData and SortSSTData - assembling x. On LS.rds

K is for the method-dependent data in DeriveWeights and FitGlobal, including the main result, K$G. On X/LS.rds, where X is the method letter (n,g,i,m,l)
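
In an interactive session you can inspect the three lists directly; the component names below are the ones used in the code that follows:

I=readRDS("LSinv.rds") # inventory: I$iv, I$z (3D station coords), I$edition, I$time
J=readRDS("LS.rds")    # metadata for the x arrays: J$yr, J$etime
K=readRDS("m/LS.rds")  # e.g. mesh-method results: K$G, K$L, K$ix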


  • SortLandData()

    SortLandData() puts on disk the land data, from GHCN V4. I have a scouting system that collects data on the web, and it puts the text files from GHCN under the names ghcnv.dat and ghcnv.inv. TempLS has a delay system so that actions are not attempted until a time has elapsed since the last success. After waiting a month, newinv() is called. If the inventory is different from the old, the list I is updated with the new information and various subsidiary info, including a database of, e.g., distances between stations and a string of nodes (for LOESS). All this is put on files u1.rda etc. in the x directory.


    function(){  # Stage 1 - read land (GHCN) 
    ki=kind[1]
    I=readRDS("LSinv.rds")
    if(t0-I$time>30)I=newinv(I)
    J=readRDS("LS.rds")
    fdat="ghcnv.dat"
    J$yr=yr; J$ny=diff(yr)+1
    if(file.mtime(fdat)-J$time>0)txt2dat(fdat,J,I) # makes x for land and puts on x
    #cal(x) # records date of new data for stations
    saveR(J)
    print(paste("End stage 1 land stations"))
    }

  • SortSSTData()

    SortSSTData() handles ERSST5 data. It fetches the files, processes them, and puts the results on e1.rda etc.


    function(){ # Stage 2 - read SST 
    # ERSST has lon 0:358:2, lat -88:88:2
    J=readRDS("LS.rds")

    if(t0-J$etime> 28) J$etime=getsst() #Don't look until ready
    saveR(J)
    print(pc("End stage 2 ",job))
    }

  • DeriveWeights()

    DeriveWeights() is the first analysis. It calls up the combined x data, determines which method is used, and puts the resulting weights w on disk. Because weights are only a function of the stations reporting, they don't change much in past years, so a system of reusing stored w (where nothing has changed) is used. The test for no change is the total number of stations, stored on K$ix; ox[] determines whether to use stored weights or calculate new ones.

    The ok[] list implements global preliminaries appropriate to each method. Then the appropriate wt function is chosen from f3 and applied in a loop over months and then years. Then w is stored and K updated.

    function(){ #stage 3 -  wts from supplied x 
    rad=pi/180; Cx=list();
    I=readRDS("LSinv.rds")
    iv=I$iv
    f3=list(wtG,wtI,wtM,wtN,wtL)
    oj=o=kind[3]==kinds
    ok=list(g=o[1],i=o[2],m=o[3],n=o[4],l=o[5])
    if(ok$i | ok$g){ # infilled grid
    Msize=20
    Cx=cubemake(Msize)
    Cx$og=ok$g
    Kface=Cx$Kface
    Mn3=Cx$Mn3
    N2=Msize/2
    ck=c_s(I$z)*N2
    o=abs(ck)<N2
    ck[o]=round(ck[o]+.5)-.5
    Cx$kF=k_c(ck/N2) # cell score for each node with data
    }
    if(ok$l){
    Cx$z2=I$z
    mk1=makeicosN(24)
    Cx$z1=mk1$zj
    #step 2
    Cx$M2=20
    Cx$eg=readRDS("LSnearest.rds")
    #step 3
    Cx$HH=HHa(Cx$z2)

    }
    if(ok$m){ # Mesh
    fixhelp() # to fix a R bug with tmp dir
    require("geometry")
    }# end mesh/grid

    fw=f3[[which(oj)]] # choose the function
    oK=is.list(K$ix); if(!oK){K$ix=list();length(K$ix)=12;}
    K$mth=0;
    if(!exists("recalc"))recalc=F
    for(im in 1:12){ # over months
    x=getx(im,"ue") # get x
    w=getw(im,x)
    ix0=K$ix[[im]]
    ix1=colSums(!is.na(x))
    oa=is.null(ix0) | recalc==1 | sum(w)==0
    if(!oa){
    ox=ix0==ix1
    if(recalc==2)ox[ncol(x)]=F
    print(sum(ox))
    }
    print(paste("im",im,format(Sys.time(),"%M:%S")))
    for(ii in 1:ncol(x)){ # over years (columns)
    oJ=!is.na(x[,ii]) # stations with data
    lJ=sum(oJ)
    if(lJ<2000) next;
    if(ii==ny)K$mth=im # latest month count
    if(!oa)if(ox[ii])next
    yj=fw(oJ,Cx)
    w[oJ,ii]=yj/sum(yj)
    w[!oJ,ii]=0
    }
    w[is.na(w)]=0
    putw(im,w) # writes w
    K$ix[[im]]=ix1
    }
    K$ns=nrow(w)
    K$yr=dim(x)[2]+1899
    saveR(K)
    print(paste("End stage 3",job))
    }# end DeriveWeights

  • FitGlobal()

    FitGlobal() is the iterative loop which fits the basic T = L + G model, where L are the norms and G the global offsets. It takes the x and w data from disk, and returns the final results as K$L and K$G on list K.

    function(){ 
    K=readRDS(RDS)
    if(!exists("nSH"))nSH=0
    if(nSH>0){
    H=makeSH(I$z,nSH); H=H/H[1]; HL=NULL
    pt("Made SH")
    }
    K$nSH=nSH
    np=(nSH+1)^2
    K$time=Sys.time()
    ns=K$ns
    K=getGL(K,c(ns,ny))
    L=K$L;G=K$G;G[,]=0
    is=1:ns; iy=1:ny; ip=1:np
    tv=rep(0,3)
    K$w=K$r=array(NA,c(ns,12,3))
    for(im in 1:12){
    x=getx(im,"ue")
    w=getw(im,x)
    if(nSH>0){
    wl=convertoH(w,H,HL)
    w=wl$w
    if(im==1)HL=wl$HL
    }
    else{
    if(nSH>0)w=convertH(w,H)$w
    }
    K$w[,im,]=w[,-2:0+ny]
    ox=is.na(x)
    wx=w*x
    wx[ox]=0
    w[ox]=0
    tv[1]=tv[1]+system.time(
    {g=seidel(w,x,G[im,])} # gauss and seidel are alternative routines
    )[1]
    gt=g[ns+iy]
    gt[colSums(!ox)<2000]=NA
    Gm=mean(gt[62:91]); # Set base 1961-90
    L[im,]=g[is]+Gm; G[im,]=gt-Gm
    i=-2:0+ny
    K$r[,im,]=x[,i]-L[im,] #-rep(G[im,i],each=ns)

    }
    pt(4)
    print(tv)
    K$G=G
    K$L=L
    # post processing
    if(kind[2]!="e")stop("Finished")
    #doGall(G)
    saveR(K)
    dt=pc(kind[3],nSH,format(Sys.time(),"%b%d-%T"))
    #lt=K; for(i in 1:length(K)){a=K[[i]]; if(is.list(a)|(length(a)>50000))lt[[i]]=NA}
    #diary=getput(lt,"diary")
    #if(kind[4]=="p")source("LSpic.r")
    pt("Done")
    K
    }

  • saveR(K)

    Can save any I,J,K lists using the file destination recorded on the list.

    function(K)saveRDS(K,K$file) 

  • todate(y,m)

    Returns a data frame of useful things about calendar year y, month m. Used in LSsph

    function(y,m){ # n number of months to go back from now 
    j=(y-1900)*12+m
    t1=1:j-1; mr=t1%%12+1; mm=month.abb[mr]; yr=t1%/%12+1900;
    data.frame(t1=t1+1,nmth=mr,abb=mm,yr=yr,yfrac=round(yr+mr/12,2),yint=yr*100+mr,yalph=sprintf("%s_%02i",y,m))
    }

  • dozen(a,j=2)

    Takes a 2D array with months as columns and reshapes it to month/year dimensions, padding to the next year boundary with NA.

    function(a,j=2){   # converts to mth/yr dims, padding if needed 
    d=dim(a);n=length(d); i=d[1];
    if(n==j)return(a)
    if(n==2){
    k=d[2]%%12
    if(k>0)a=cbind(a,matrix(NA,i,12-k)) # pad to a full year
    dim(a)=c(i,12,ncol(a)%/%12)
    }
    if(n==3)dim(a)=c(i,d[2]*d[3])
    a
    }

  • getsst(s=dater(30))

    The major call of SortSSTData(). It uses curl to fetch the latest month's netcdf file, if it is more recent than the one on hand. The filename changes every month. Then the month of data is downloaded, ice data marked with NA, and the emask applied (see here) to make the grid density more even. It is then stored on the x/e?.rda file. In contrast to land, just the current month is updated.

    function(s=dater(30)){ 
    mth=as.numeric(s[2])
    fl=sprintf("ftp://ftp.ncdc.noaa.gov/pub/data/cmb/ersst/v5/netcdf/ersst.v5.%s%s.nc",s[1],s[2])
    o=getcurl(fl,"ersst.nc") # o=0 success
    if(length(o)!=0)return(J$etime)
    d1="ersst.nc"
    require("ncdf4")
    nc=nc_open(d1)
    s=t(ncvar_get(nc,"sst")) # 89*180
    nc_close(nc)
    s=s[,c(91:180,1:90)] # to start at -180
    o=s< -1 # eliminate ice -1
    s[o]=NA
    load("emask.rda"); # emask includes culling and landmask, also ivs complete\ersst4\mask.r and seamesh/seamesh.r
    oe=emask>0 # cull mask; ERSST is 88:-88:2, 0:358:2, lat first, then long
    x=getx(mth,"e"); d=dim(x)
    if(d[2]<ny)x=pad(x,d+0:1)
    x[,ny]=s[oe]
    putx(x,mth,"e")
    return(t0)
    }

  • fixK(K,s=NULL)

    A frame for modifying all the K files for various methods

    function(K,s=NULL){  # a is string of names) 
    b=c("n","g","i","l","m")
    browser()
    for(i in b){
    f=pc(i,"/LS.rds")
    K=readRDS(f)
    s=c("ivx","ivs","iv")
    for(j in s)K[[j]]=NULL
    saveRDS(K,f)
    }
    }

  • putw(im,x,s="w")

    Stores array x (a w file, DeriveWeights()) for month im (1:12)

    function(im,x,s="w"){saveRDS(x,pc(wix,s,im,".rds"));} 

  • getw(im,x,s="w")

    gets a weight file w from storage.

    function(im,x,s="w"){ 
    ff=pc(wix,s,im,".rds")
    o=file.exists(ff)
    if(o)w=readRDS(ff);
    if(o)o=is.matrix(w)
    if(o)o=prod(dim(w)==dim(x))==1
    if(!o)w=array(0,dim(x))
    w
    }

  • fx(f,i)

    Just makes the filename to look up for x data. f is the type.

    function(f,i)pc("x/",f,i,".rda") 

  • putx(x,i,f="u")

    Stores temperature data x for month i. f is "u" for land, "e" for SST, and "ue" for combined (the main use). In fact, the data is stored separately (as u and e) on the x directory, and put together when requested.

    function(x,i,f="u"){ 
    save(x,file=fx(f,i))
    }

  • getx(i,f="u")

    Recovers temperature data for month i; again could be "u","e" or "ue" (see putx). These mechanisms are used to avoid having more than one month data in RAM at one time.

    function(i,f="u"){ 
    y=NULL
    g=vsplit(f,""); if(g[1]=="w")g=f;
    n=length(g)
    for(j in 1:n){
    fg=fx(g[j],i)
    if(!file.exists(fg)) return(NULL)
    load(fg);
    if(n==1)return(x)
    y=rbind(y,x)
    }
    y
    }

  • getput(list,s="LS")

    s is the stem of a file with a list (I,J,K). The first argument, list, has components with the same names as on s. The function reads s, adds the elements of list, usually overwriting, and stores the updated s.

    function(list,s="LS"){ # reads,RDS file with list; updates with items from argument list, then writes back  
    f=pc(s,".rds")
    a=readRDS(f)
    for(i in names(list)) a[i]=list[i]
    saveRDS(a,f)
    invisible(a)
    }

  • newinv(I)

    A major call from SortLandData(). It is only called once a month. It checks if there is a new GHCN inventory, and if so, updates it and subsidiary information on I, and stores it. It is important that I is only changed in this routine. There is currently no scheme for updating the inventory for SST, which comes from ../emask.sav. It won't change unless ERSST changes the 2-degree scheme, or I change the culling.

    function(I){ # enter new inventory 
    print("newinv")
    b=readLines("ghcnv.inv")
    j=gregexpr(" *",b[2])[[1]] # works out spacing
    j=c(1,rbind(j-1,j),max(nchar(b)))
    b=substr(b,1,56)
    inv=Split(b,j) # returns array
    ivx=data.frame(inv)
    colnames(ivx)=c("code","lat","lon","alt","name")
    i=match(names(I$ivs),names(ivx))
    ivx=ivx[,i]
    for(i in 1:3)ivx[,i]=as.numeric(ivx[,i])
    i=I$ivx$code;j=ivx$code;
    o=length(j)!=length(i)
    if(!o)o=sum(i!=j)>0
    if(o){ # make new if needed
    I$ivx=ivx
    I$iv=rbind(I$ivx,I$ivs)
    I$z=tosph(I$iv[,1:2]) # 3D coords
    I$nx=nrow(ivx)
    I$edition=i=I$edition+1
    #r saveRDS(ivx$name,file=pc("invs/inv",0,".rds"))
    # for LOESS method
    mk1=makeicosN(24)
    z1=mk1$zj
    #step 2
    saveRDS(nearest(z1,I$z,200),"LSnearest.rds")
    }
    I$time=t0
    saveR(I)
    I
    }

  • txt2dat(fdat,J,I)

    The main call from SortLandData(). It takes the latest GHCN data file fdat (ghcnv.dat) (as found by the scouting scheme), reads the text, checks against the inventory I$ivx to remove lines from unknown stations, and replaces the names with a pointer to the inventory. Then it reorganises each year's data into the x-array format (stations in rows, years in columns, one array for each month 1:12). It sets to NA any flagged data. It stores the x-files in the x directory as u1.rda etc., and updates and stores the J list of metadata.

    function(fdat,J,I){ 
    b=readLines(fdat)
    j=gregexpr(" *",b[1])[[1]]
    j=c(1,11,12,15,16,rbind(j-1,j),max(nchar(b)))
    g=Split(b,j)
    m=match(g[,1],I$ivx$code)
    o=!is.na(m)
    g=g[o,];m=m[o];
    i=seq(1,ncol(g),2) # pick out data not flags
    g0=g[,i]
    qcf=substr(g0[,3:14],2,2)
    oq=qcf!=" "
    d=g[,-i]; dm=dim(d);
    d=as.numeric(d)
    dim(d)=dm
    d[,2:13][oq]=NA
    d=t(cbind(m,d))
    d=d[,d[2,]>1899]
    d[d< -9998]=NA
    d[3:14,]=d[3:14,]/100
    d=d[,!is.na(d[1,])]
    yr[2]=min(yr[2],max(d[2,]))
    ny=diff(yr)+1
    dm=c(I$nx,ny)
    ic=d[1,]+(d[2,]-yr[1])*I$nx
    for(i in 1:12){
    x=rep(NA,I$nx*ny)
    x[ic]=d[i+2,]
    dim(x)=dm
    putx(x,i,"u")
    }
    }

  • seidel(w,x,G)

    The core routine for FitGlobal(). It takes the temperature data x, weights w, and a previous estimate of G (can be zero), and iterates finding L and G to convergence (usually quick). In the process, it fixes the mean of G in years 1961-90 to zero, for each month.

    function(w,x,G){ 
    ox=!is.na(x)
    wx=w*x
    wx[!ox]=0
    w1=rS(ox);
    iw=w1<=0;
    w[iw,]=0; wx[iw,]=0; w1[iw]=1e-9
    w2=colSums(w)+1e-9; d=dim(w)
    wx1=RowSums(x); wx2=colSums(wx);
    for(i in 1:20){ # gauss-seidel iterations
    g=G;
    L=c(wx1-ox%*%G)/w1 # Solve for L
    G=c(wx2-L%*%w)/w2 # then G
    Gm=mean(G[62:91]); G=G-Gm; # Set to basis 1961:90
    gm=mean(abs(G-g));
    if(is.na(gm))browser()
    if(gm<0.001)break
    }
    L[iw]=NA
    c(L,G)
    }

  • getGL(K,d)

    Not really that useful now. It tries to return a prior estimate of L and G to start the seidel loop; if none, then it returns an appropriate matrix of zeroes.

    function(K,d){ 
    a=c("L","G")
    for(i in 1:2){
    s=a[i]
    g=K[[s]];
    if(is.null(g))g=diag(12)*0
    g=pad(g,c(12,d[i]),pd=0)
    g[is.na(g)]=0
    K[[s]]=g
    }
    K
    }

  • convertH(w,H)

    Takes a vector of weights (length ns) (for 1 month) w and a matrix H (ns*nh) of spherical harmonics, and returns the weights that result from fitting the SH. Uses solve(); does an exact fit for each step, which is rather slow.

    function(w,H){# converts w to version with H fitted and removed 
    d=dim(w); v=w;e=NULL
    for(i in 1:d[2]){
    if(sum(w[,i])==0)next
    wH=w[,i]*H
    HwH=t(H)%*%wH
    if(i%%20==1){
    q=eigen(HwH)$val
    e=c(e,min(q)/max(q))
    }
    u=solve(HwH,1:ncol(H)==1)
    v[,i]=wH%*%u # new wts
    }
    list(w=v, e=e)
    }

  • convertoH(w,H,HL)

    Same function as convertH, but doesn't do an exact fit every month. Instead a regression matrix is set for every decade, so the SH fit that is subtracted is approximate. However, the residual is integrated as well, so the result is very good, and significantly faster. Currently the version used.

    function(w,H,HL){# converts w to version with H fitted and removed 
    d=dim(w); v=w; pi4=pi*4
    if(is.null(HL)){
    HL=list()
    for(i in 1:(d[2]/10)){
    wH=w[,i*10-5]*H
    HwH=t(H)%*%wH
    Hi=solve(HwH)
    HL[[i]]=H%*%Hi
    }
    pt("HL done")
    }
    for(i in 1:d[2]){
    if(sum(w[,i])==0)next
    if(i%%10==1)HHi=HL[[i%/%10+1]]
    b=w[,i]*HHi; wh=drop(w[,i]%*%H)
    v[,i]=w[,i]+pi4*b[,1]-b%*%wh # new wts
    }
    list(w=v, HL=HL)
    }

  • convertcgH(w,HH)

    Takes a vector of weights (length ns) (for 1 month) w and a matrix H ns*nh of spherical harmonics, and returns the weights that result from fitting the SH. Uses conjugate gradient pcg(). Can be faster than convertH. These alternative routines exist because the SH fitting can take several minutes for higher order (parameter 10 or higher).

    function(w,HH){# converts w to version with H fitted and removed 
    d=dim(w); w=matrix(c(w),nrow=d[1])
    v=w*0
    np=ncol(HH)
    p=diag(np)*2
    A=list()
    A$f=function(x){
    t(H)%*%(A$w*(H%*%x))
    }
    A$b=function(x){A$p%*%x}
    iu=1:np==1
    for(i in 1:d[2]){
    o=w[,i]>0; H=HH[o,]
    A$w=w[o,i]/sum(w[o,i])
    if(i%%12==1)A$p=solve(t(H)%*%(A$w*H))
    v[o,i]=(H%*%pcg(A,iu,iu))*A$w
    }
    dim(v)=d
    v # new wts
    }

  • wtM(oJ,Cx)

    The weights routine for mesh, called by DeriveWeights(). oJ, length I$ns, is logical for stations reporting in that month. It first calls convhulln (package geometry) to return the mesh m (three cols, pointing into I$iv). It then calculates the triangle areas A, and then sums these for each node, using the sparse op spmul(). These are the weights.

    function(oJ,Cx){ # need z,X makes mesh wts 
    y=I$z[oJ,]
    m=convhulln(y) # convex hull (tri mesh)
    A=abs(area(m,y))
    i3=rep(1:nrow(m),each=3)
    i=cbind(c(t(m)),i3) #make index for A
    wt=spmul(index=i,NULL,b=A)+1e-9 # rowsums
    #print(c(nrow(m),round(sum(wt)/pi/.0024),round(sum(A)/pi/.0008)))
    wt
    } # end wtM

  • wtN(oJ,Cx)

    The weights function for uniform weighting; it just returns a constant value for each reporting station (the caller normalises so the sum is 1 for each month).

    function(oJ,Cx){oJ[oJ]+0} # uniform weighting 

  • nearest(z1,z2,K=100)

    Finds the K nearest points in z2 (N2x3) to z1 (N1x3). Both are sets of points on the unit sphere, though that is not essential. The list returned has 2 items:
    $num - the K numbers for each z1 pointing to the closest z2
    $dist - the K distances in km corresponding to those numbers (straight line)
    There is also a brief $doc
    Since it is a function only of the inventory, it is run at renewal time, and the results stored. The routine is also run if a node does not have 20 nearest points from the initial 200 stored. Rare.

    function(z1,z2,K=100){#Finds K nearest points in z2 (N2x3@,100) to z1 
    doc="Finds K nearest points in z2 (N2x3@,100) to z1, all on unit sphere"
    z1=rbind(z1)
    N1=nrow(z1);N2=nrow(z2)
    g=e=matrix(NA,N1,K);
    for(i in seq(1,N1,100)){
    k=0:min(99,N1-i)
    j=i+k
    h=1-z2%*%drop(t(z1[j,]))
    d=t(apply(h,2,order)[1:K,])
    e[j,]=d;
    g[j,]=h[d+k*N2]
    #if(i%%1000==1)print(paste(i,format(Sys.time())))
    }
    list(num=e,dist=round(sqrt(g)*18100),doc=doc) # g -> km
    }

  • makewt(L,or)

    For LOESS method wtL(). Takes the eg list from nearest, for the subset or of z2, and cuts down to 20 nearest for each z1, of the z2 that report in that month.
    It returns an array k with 3 columns, and a row for each z1-z2 pair in the reduced list. The first column is the pointer to z1, the second to z2, and the third is a weight made from exp(-dist/1000), with dist in km.

    function(L,or){ # Makes sparse matrix of distance wts 
    M2=L$M2
    en=t(L$eg$n); d=dim(en)
    k=cbind(col(en),en,t(L$eg$dist))
    dim(k)=c(d,3)
    no=or[en]
    dim(no)=d
    op=ColSums(no)<=M2
    io=which(op)
    if(sum(op)>0){
    eo=nearest(L$z1[op,],L$z2[or,],M2+1)
    k[0:M2+1,io,2]=which(or)[t(eo$n)]
    k[0:M2+1,io,3]=t(eo$dist)
    }
    k=matrix(k,ncol=3)
    k=k[or[k[,2]],]
    i=c(1,diff(k[,1]))
    m=which(i>0)
    j=i*0;j[m]=1;j[m+M2]=-1
    k=k[cumsum(j)>0,]
    k[,3]=exp(-k[,3]/1000)
    k
    }

  • HHa(H)

    A utility used in wtL() for forming an N x nh^2 array of outer products of the N x nh matrix H (prior to a sparse op).

    function(H){ #HwH all nodes 
    s=array(0,c(nrow(H),9))
    k=ncol(H)
    for(i in 1:k)for(j in 1:k){
    ki=i+j*k-k
    if(i<=j){s[,ki]=H[,i]*H[,j]}
    else{s[,ki]=s[,j+i*k-k]}
    }
    s
    }

  • wtL(oJ,L)

    The first order LOESS weight function calculator. oJ is the logical vector of stations reporting. z1 is the set of nodes, precomputed for the icosahedron in DeriveWeights(). z2 is the set of stations in 3D on the unit sphere. First makewt() produces a 3-col array w of weights from stations reporting; see makewt() for details (w[,3] are the wts). Then the array of outer products HH of station coords z2, which was calculated in DeriveWeights() using HHa(), is reduced with spmul() to a vector of 3x3 matrices on nodes z1, HHp. The regressor variables are actually the coordinates z1, so these are adjoined to make an array to which the R function apply() can be applied. The function loreg() actually does the local regression for each z1 (via apply). The resulting set needs to be sparse multiplied by the transpose of w to get rt, and in a final step this is multiplied by the coords z2 of stations to get the weights. See here for the algebra of this process in index notation.

    function(oJ,L){ # first order regression 
    # Currently out of use because negative wts
    N1=nrow(L$z1); N2=nrow(L$z2)
    w=makewt(L,oJ)
    loreg=function(r){
    solve(r[,1:3],r[,4])
    }
    #step 5 Make HwH
    HHp=spmul(index=w[,1:2],A=w[,3],b=L$HH) # reduce over p => N2*9
    #Step 7 Adjoin z1
    Hpz=array(cbind(HHp,L$z1),c(N1,3,4))
    #step 8 regress
    rp=t(apply(Hpz,1,loreg)) # N1*3 - nodal values
    rt=spmul(index=w[,2:1],A=w[,3],b=rp) # N2*3
    rt=pad(rt,c(N2,3),pd=0)
    wts=RowSums(rt*L$z2)
    wts[oJ]
    }

  • wtK(oJ,Cx)

    This is the zero order version of LOESS. It is not routinely called from TempLS, but makes wtL() easier to understand. The regression denominator is just a single number for each node, but still formed by sparse multiplication. And then the formation of wts is a transpose sparse multiplication.

    function(oJ,Cx){ # zero order regression 
    N1=nrow(Cx$z1); N2=nrow(Cx$z2)
    noquote(paste("N1 ",N1,N2,sum(oJ)))
    w=makewt(Cx,oJ)
    #step 5 Make HwH
    denom=spmul(w[,1:2],w[,3],NULL)
    wts=spmul(w[,2:1],w[,3],1/denom)
    wts[oJ]
    }

  • diffuse(o,Cx)

    Diffuse is the routine used in wtI() for solving the diffusion equation to infill empty cells with local information. wtI and wtG use the improved cubed-sphere grid; this was set up in DeriveWeights() (ok$i) with a list of associated functions and data. So the first line here just calculates the cell areas, and for wtG that is all that is needed. For wtI(), the neighbors of each cell are listed, and in a loop over i, weight held by each empty cell is transferred to its neighbors. Cells with data keep the weight they receive, so eventually the empty cells have no weight, but the nearby cells with data have the extra weight. This is the conjugate diffusion process to the diffusion of temperature.

    function(o,Cx){ # solves diffusion eq for empty cells 
    w=s_c(c_k(Cx$Kf),area=1) # area wts
    if(Cx$og)return(w)
    jk=Cx$Knb[,o] # points to neighbors
    i=cbind(c(t(jk)),rep(Cx$Kf[o],4))
    m=match(i,Cx$Kf)
    dim(m)=dim(i)
    mm=getu(m[,1]) # get list of unique destinations
    ml=length(mm)
    for(i in 1:92){
    ww=w
    for(im in mm){
    wh=w[m[im,2]]/4
    ww[m[im,1]]=ww[m[im,1]]+wh
    }
    ww[o]=ww[o]-w[o]
    we=sum(ww[o])
    #if(i%%4==0)print(round(we,4))
    w=ww
    if(we<1e-2)break;
    }
    w
    }

  • wtG(oJ,Cx)

    Simple grid on a cubed sphere; the mechanics are handled by wtI.

    function(oJ,Cx)wtI(oJ,Cx) 

  • wtI(oJ,Cx)

    Cx is a list made by the cubemake() function called in DeriveWeights(). It includes data and functions, and a doc. This function sorts out the cells with data and the station count per cell (iu), calls diffuse(), and divides each cell's weight by its count.

    function(oJ,Cx){ 
    kf=Cx$kF
    ks=kf[which(oJ)] # cell score for each node with data
    ik=match(ks,Cx$Kf)
    iu=tabl(ik)
    ok=Cx$Kf<0;ok[ik]=T;
    # branch
    # integrate
    cw=diffuse(!ok,Cx)
    io=ok; io[ok]=iu$t; wy=cw[ik]/io[ik]
    wy;
    }

    Alphabetical

    convertcgH()
    convertH()
    convertoH()
    DeriveWeights()
    diffuse()
    dozen()
    FitGlobal()
    fixK()
    fx()
    getGL()
    getput()
    getsst()
    getw()
    getx()
    HHa()
    makewt()
    nearest()
    newinv()
    putw()
    putx()
    saveR()
    seidel()
    SortLandData()
    SortSSTData()
    todate()
    txt2dat()
    wtG()
    wtI()
    wtK()
    wtL()
    wtM()
    wtN()
