moyhu: Tips on File Download

Sunday, April 4, 2010

Tips on File Download

I see there have been some problems in getting files down and running, so here are some suggestions that may help.

Text version of download

To download direct from the repository page, the best way may be cut and paste - other ways may lead to HTML issues. But best may be to use the document repository. I'd suggest just downloading the All-R.zip - it's very small. The zipfile has a directory structure that collects files related to blog posts. If you download All-Files.zip you'll get the same directory structure, but with results and anything else that seemed relevant.

Input files for R

Here I haven't been systematic enough. Most R files call input files with a scan() or a read.fwf(). I've sometimes referenced the files with aspects of my directory structure, whereas of course they should reflect your structure.

I'll try to make sure to reference the input files as if in the current directory. But I'll also put at the start of the program a set of commands saying

loc="path/file"

These will list the input files. You should insert your own paths and filenames (if different)

Update I'll extend this idea to putting a section near the top where all the variables (including input files) )that you might want to change are initialised. Instead of loc=, for input files I'll write "input1 = " etc

Text conventions for repository

I have tried to separate R code from my text with font changes (italic). But I'll try to improve by marking the beginning and end with comment lines
######## Start of prog.R ###########

######## End of prog.R ###########

The name is just a suggestion 0 if you down load from there you'll have to supply a name, which can be anything you like. But it probably corresponds to the name in the doc drop.

Readme

I'll try to put readme files in the doc repository and the zip files. These should describe where the files belong.

General

The blog post should tell you where to find the input files needed. GHCN files for example come from http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/

Rememner to uncompress files before passing to R.

Other

I've heard of trouble in lodging comments. The settings, as I understand them, should provide no restrictions. Let me know if you can't get through.

Blogspot makes pesky issues for commenting without identity. You won't be able to paste, for example. But if you use any identity (Google, Typekey, Wordpress etc) it's much better. This may also help if comments aren't being accepted.

If all else fails
My email name is anstokes, and you can find me at westnet + com + au

69 comments:

CarrickApril 5, 2010 at 6:42 AM
You might try putting the files in a zip or tarball archive. And include in the tarball a Readme.txt file that gives instructions on what order to do what.
ReplyDelete
Replies
Nick StokesApril 5, 2010 at 6:46 AM
Carrick - yes, I've done that (this is new) - see "Text Downloads" above.
ReplyDelete
Replies
CarrickApril 5, 2010 at 8:50 AM
Thanks, Nick. I was just trying to save you as well as some of the less experienced readers some trouble.
ReplyDelete
Replies
UnknownApril 5, 2010 at 10:26 AM
Nick

Running code line by line from after #PROGRAM STARTS HERE#

First error shows up at:
con<-file(loc,"r");
ERROR: Cannot open the connection

By the way how do you use the utilities. Do they only show up if you are using rconsole or the cammand line?
ReplyDelete
Replies
Nick StokesApril 5, 2010 at 11:00 AM
Sorry about that Steve, I think you must have downloaded at a critical moment while i was editing. The latest version says
con<-file("input1","r")
and that should work.

The utilities are used internally (further down in the code), although once you've sourced the file, you can use them at console level too.

Ver V1.2 is coming, which allows you to specify at top of file level what trends you want calculated, what year ranges plotted etc. I'll be using that convention of putting the vars the user might change at the top.
ReplyDelete
Replies
stevenApril 5, 2010 at 2:11 PM
Hi Nick,

I'm Struggling a Bit with some of the Code.. It's definately my learning curve on R. If you can help a bit with this code I would appreciate it:

My Question are in the code:

wd = c(3,5,3,31,7, 8,5,5,1,5, 2,2,2,2,1,2,16,1 );
colnames=c("country","wmo","mod","name", "lat", "lon","alt","ialt","pop","ipop",
"topo","sveg","loc","iloc","airpt", "town", "veg","urban");
tv = read.fwf(input2,widths = wd,col.names=colnames,comment.char="");
} #FF

## Questions "tv" is now a data.frame is that correct or a table?

###### Allocate stations to grid cells
nt = length(tv[,1]);
# is this length the number of stations in the *inv?
# the [,1] construction is opaque to me
yr0 = 1880; yr1 = 2009; yrn = rep(yr0,nt);
di=c(12,yr1 - yr0); hs=1:2;

wtt = sin((1:36-0.5)*pi/36); wt = rep(wtt,each=72) # 36x72 area weight;
d = floor(tv[,5:6]/5); # station lat and long;
# not sure exactly what this is doing?
cellnums = unlist(d[1]*36+d[2]+1333); # 5 deg x 5 deg cells numbered 1 to 72*36=2592;

# Define choice names (make your own)
title = c("Globe","NH","SH","ConUS","Airport","Rural","Urban");
choices = 1:7 #c(4,5,6,7);
restext=paste(c("Output",format(Sys.Date(),"%F:%k:%M"),title[choices],".txt"),collapse="_");
pr(restext); ic=numeric(0); results=ic;

for(choice in choices){

i01 = 1:nt; i10=rep(NA,nt); orn=rep(F,nt);
## not clear what this is doing
## i01 is a list of 1 to the number of stations
## i10 gets filled with nt repetitions of NA
## orn gets filled with nt intances of F
tc = title[choice]; pr(" ");
pr(paste(tc," time (s)",timer(time0)));
# Here you select the stations to analyse. tv is the table from the .inv file - see colnames above
if(tc == "NH") i01 = i01[cellnums>1296];
if(tc == "SH") i01 = i01[cellnums<1297];
if(tc == "ConUS") i01 = i01[tv$country == 425 && tv$lat<51 && tv$lon > -130];
if(tc == "Airport") i01 = i01[tv$airpt == "A"];
if(tc == "Rural") i01 = i01[tv$urban == "A"];
if(tc == "Urban") i01 = i01[tv$urban != "A"];
# what is the code above doing? what is being written in i01?

orn[i01] = T; nk=length(i01);
i10[i01] = 1:nk;

Any Help is appreciated

# Not sure how this is working.. is this masking?
ReplyDelete
Replies
CarrickApril 5, 2010 at 2:22 PM
Nick, I'm getting an error while running GlobTempLSv1.1.r

I'm including the output below. Any ideas?
-----------
Read 6179124 items
[1] "Output_2010-04-04: 0:00_Globe_NH_SH_ConUS_Airport_Rural_Urban_.txt"
[1] " "
[1] "Globe time (s) 8"
Error in x[ii[ok], j] = y[ok] : subscript out of bounds
ReplyDelete
Replies
stevenApril 5, 2010 at 5:06 PM
I think I got it figured out, the masking is kinda neat, More later.
ReplyDelete
Replies
Nick StokesApril 5, 2010 at 5:16 PM
Hi Steve,
Glad it makes more sense. I had been working on a detailed response - here it is (italics mine):

wd = c(3,5,3,31,7, 8,5,5,1,5, 2,2,2,2,1,2,16,1 );
colnames=c("country","wmo","mod","name", "lat", "lon","alt","ialt","pop","ipop",
"topo","sveg","loc","iloc","airpt", "town", "veg","urban");
tv = read.fwf(input2,widths = wd,col.names=colnames,comment.char="");
} #FF

## Questions "tv" is now a data.frame is that correct or a table?
I believe it's a data frame

###### Allocate stations to grid cells
nt = length(tv[,1]);
# is this length the number of stations in the *inv?
# the [,1] construction is opaque to me
Yes. tv[,] also can be treated as a matrix, and in R you can just fix any variable, and what you yave is the array with everything else varying. So tv[,1] is the first column, and it's length is the number of rows, or stations.

yr0 = 1880; yr1 = 2009; yrn = rep(yr0,nt);
di=c(12,yr1 - yr0); hs=1:2;

wtt = sin((1:36-0.5)*pi/36); wt = rep(wtt,each=72) # 36x72 area weight;
d = floor(tv[,5:6]/5); # station lat and long;
# not sure exactly what this is doing?
cols 5 and 6 are lat and long in degrees. Divide by 5 for 5x5 cells, then floor() converts to (lower) integer
cellnums = unlist(d[1]*36+d[2]+1333); # 5 deg x 5 deg cells numbered 1 to 72*36=2592;
This then places the station in a list of gridcells from 1 (~S Pole) to 2592. 1:1296 is SH

# Define choice names (make your own)
title = c("Globe","NH","SH","ConUS","Airport","Rural","Urban");
choices = 1:7 #c(4,5,6,7);
restext=paste(c("Output",format(Sys.Date(),"%F:%k:%M"),title[choices],".txt"),collapse="_");
pr(restext); ic=numeric(0); results=ic;

for(choice in choices){

i01 = 1:nt; i10=rep(NA,nt); orn=rep(F,nt);
## not clear what this is doing
## i01 is a list of 1 to the number of stations
Yes. The point is that you use a logic test to select a subset, which is then a numerical list.
To explain the odd notation, we have a mixture of vectors, some length 7280 (station list) and some the length of the subset. I think of the first as type 0, and the second as type 1. i01 is a permutation vec that takes you from type 1 to 0, and i10 takes you back.
## i10 gets filled with nt repetitions of NA
## orn gets filled with nt intances of F
Yes, this is a typical way of initialising vectors. You often start with NA with the idea that some code should fill the vector, and so of any are not filled (NA) you want to kinow.
tc = title[choice]; pr(" ");
pr(paste(tc," time (s)",timer(time0)));
# Here you select the stations to analyse. tv is the table from the .inv file - see colnames above
if(tc == "NH") i01 = i01[cellnums>1296];
if(tc == "SH") i01 = i01[cellnums<1297];
if(tc == "ConUS") i01 = i01[tv$country == 425 && tv$lat<51 && tv$lon > -130];
if(tc == "Airport") i01 = i01[tv$airpt == "A"];
if(tc == "Rural") i01 = i01[tv$urban == "A"];
if(tc == "Urban") i01 = i01[tv$urban != "A"];
# what is the code above doing? what is being written in i01?
The subset vector. eg for SH, the 1648 stations there are now listed by number in the inv file. And the length, nk, is 1648

orn[i01] = T; nk=length(i01);
i10[i01] = 1:nk;
And this is how you form a reverse mapping. i10 has length 7280, but (for SH) only 1648 slots are filled. That's why the NA initialisation.
Any Help is appreciated
ReplyDelete
Replies
Nick StokesApril 5, 2010 at 5:19 PM
Carrick, I'll check on that
ReplyDelete
Replies
Nick StokesApril 5, 2010 at 5:38 PM
Carrick, while I'm figuring that out, here's an explanation of the partial output you saw. The full set, as it appears on the diary.txt file, is:

Output_2010-04-03: 0:00_Globe_NH_SH_ConUS_Airport_Rural_Urban_.txt

Globe time (s) 8
Number stations 7280 time 29
Now solve design matrix time 38
Trend 0.0760 0.2717 time (s) 90

NH time (s) 90
Number stations 6056 time 111
Now solve design matrix time 118
Trend 0.08242 0.31446 time (s) 158

SH time (s) 158
Number stations 1224 time 161
Now solve design matrix time 162
Trend 0.06903 0.11060 time (s) 170

ConUS time (s) 170
Number stations 0 time 171

Airport time (s) 171
Number stations 2390 time 180
Now solve design matrix time 183
Trend 0.05965 0.26280 time (s) 198

Rural time (s) 198
Number stations 2712 time 203
Now solve design matrix time 206
Trend 0.06061 0.25608 time (s) 223

Urban time (s) 223
Number stations 4568 time 251
Now solve design matrix time 255
Trend 0.07525 0.26313 time (s) 284

On the screen, the same info looks a bit different.

The diary file is a cumulative record of runs. The name of the output file is quite informative, so is used as the header. Then the "choices" - 4 lines each.

First is echoed the choice it is currently working on. The times refer to secs since the program started.
Then the number of stations that it found (my CONUS test was faulty). Then it reports that it is at the SS fitting stage, then it finally reports the trend. Here this is hard-coded to 1990- and 1978-, but in ver 1,2 you'll be able to choose your own range (easily).

After each stage you should see 2 jpeg files with names relating to the title vector.
ReplyDelete
Replies
Nick StokesApril 5, 2010 at 6:00 PM
Oops - I said above
The latest version says con<-file("input1","r")
It should, of course, be input1 without quotes, as it is indeed in the current version. I think I just made the mistake in the comment - I don't think it is in any code.
ReplyDelete
Replies
Nick StokesApril 5, 2010 at 6:33 PM
Error in x[ii[ok], j] = y[ok] : subscript out of bounds
Carrick, I may need your help here. This is actually the line which fills one year of x - the big data array, with data from 1 line of v2mean.txt. j just refers to the year (yr-yr0) and I don't think it should go wrong. So x is probably dealing with a station number higher than what is listed in the v2.temperature.inv that you have.

I'll have to look more into what might happen if the .inv file is out of synch - ie v2.mean references more stations than appear in the .inv file.

But anyway, if you could tell me what is dim(x) (2 nums) and what is ii[ok] (max 12 numbers) and j, that would help.
Thanks, Nick
ReplyDelete
Replies
Nick StokesApril 5, 2010 at 8:39 PM
Carrick,
I found a bug in ver 1.1, which I think may have been responsible for your problem. I've updated the previous post, and will relabel fixed codes as v1.11 (V1.2 coming soon). But you can fix it yourself. On line 87,

if( j>0 && orn[ii[1]] ){
should be simply
if( j>0 ){
ReplyDelete
Replies
stevenApril 6, 2010 at 2:07 AM
Thanks Nick, I posted something over to Lucia's. The whole data structure thing in R is like chinese to me ( ha it was like that with sas
ages ago. Anyways, On my side of things I was trying to build some simple functions that did something like this:

List_Of_Stations<-function(( Inventory_data_structure, Choices){

# where the inventory data structure is like your version of tv.
a data.frame read using fwf..
# where Choices is an slection criteria for a specfic
# column in Tv ( like country code = 403 )

And the the return would just be the list of station ids..
or array of ids..

Anyways, I've been fooling aruond with that mostly to learn R
and not to even look at temperature.. ha.

I'm even wondering if you can pass in grep..

Your thoughts appreciated.

I'm putting together an entire enviroment. Basically a single file
to download and then build up a directory structure, populate it with the files you need, abstract out some of 'which file where' problems that some guys like steven WH had. Also looking for a way to make the code more re usable. Which just means writing functions.

Ideally, the goal would be for somebody to be able to call your analysis as a routine and call ROMIN as a routine or function.

But I'm having fun struggling. A mac laptop is no place to write code on, especially after a 20 year hiatus
ReplyDelete
Replies
stevenApril 6, 2010 at 4:46 AM
Hi Got the same bug
[1] "Output_2010-04-05:11:43_Global_ConUS_Cut_Kept_BC_Africa_.txt"
[1] " "
[1] "Global time (s) 12"
Error in x[ii[ok], j] = y[ok] : subscript out of bounds

Will try your fix..

Its unclear to me if I have to set choice for [choice in choices]

where and how.. if at all. just trying to run what you sent
ReplyDelete
Replies
stevenApril 6, 2010 at 5:01 AM
Dump: value of j.

Running the file Fresh out of the box with no changes whatsoever.

[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100
[1] 101
[1] 102
[1] 103
[1] 104
[1] 105
[1] 106
[1] 107
[1] 108
[1] 109
[1] 110
[1] 111
[1] 112
[1] 113
[1] 114
[1] 115
[1] 116
[1] 117
[1] 118
[1] 119
[1] 120
[1] 121
[1] 122
[1] 123
[1] 124
[1] 125
[1] 126
[1] 127
[1] 128
[1] 129
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100
[1] 101
[1] 102
[1] 103
[1] 104
[1] 105
[1] 106
[1] 107
[1] 108
[1] 109
[1] 110
[1] 111
[1] 112
[1] 113
[1] 114
[1] 115
[1] 116
[1] 117
[1] 118
[1] 119
[1] 120
[1] 121
[1] 122
[1] 123
[1] 124
[1] 125
[1] 126
[1] 127
[1] 128
[1] 129
Error in x[ii[ok], j] = y[ok] : subscript out of bounds
ReplyDelete
Replies
stevenApril 6, 2010 at 5:13 AM
Ok...

I printed out dim(x)

j, ii[ok]

First entry to routine dumps this:

[1] " DIM of x added by steve"
[1] 87360 129
[1] 7281 29121 36401 50961 58241 65521 72801 80081
[1] 51
[1] 1 7281 21841 29121 43681 58241
[1] 52
[1] 29121 65521 72801 80081
[1] 53
[1] 1 7281 14561 21841 29121 43681 50961 58241 65521 72801 80081
[1] 54
[1] 1 7281 14561 21841 36401 58241 72801 80081
[1] 55
[1] 1 7281 14561 21841 36401 43681 50961 58241 65521 72801 80081
[1] 56
[1] 1 7281 14561 21841 36401 43681 50961 58241 65521 72801 80081
[1] 57
[1] 1 7281 14561 21841 29121 36401 43681 50961 58241 65521 72801 80081
[1] 58
[1] 1 7281 14561 21841 29121 36401 43681 50961 58241 65521 72801 80081
[1] 59
[1] 1 7281 14561 21841 29121 36401 43681 50961 58241 65521 72801 80081
[1] 60
[1] 1 14561 21841
ReplyDelete
Replies
stevenApril 6, 2010 at 5:15 AM
And here is the dump right before the Crash

here, j followed by ii[ok]

[1] 120
[1] 2 7282 14562 21842 29122 36402 43682 50962 65522 72802 80082
[1] 121
[1] 2 7282 14562 21842 29122 36402 58242 65522 72802 80082
[1] 122
[1] 2 7282 14562 21842 29122 36402 50962 65522 72802 80082
[1] 123
[1] 2 7282 14562 21842 29122 36402 50962 58242 65522 72802 80082
[1] 124
[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 125
[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 126
[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 127
[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 128
[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 129
Error in x[ii[ok], j] = y[ok] : subscript out of bounds
>
ReplyDelete
Replies
stevenApril 6, 2010 at 5:20 AM
Switching the print to before the offending line

Found it!

[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 126
[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 127
[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 128
[1] 2 7282 14562 21842 29122 36402 43682 50962 58242 65522 72802 80082
[1] 129
[1] 2 7282 14562
[1] 130
Error in x[ii[ok], j] = y[ok] : subscript out of bounds

Looks like j goes to 130...for an array of 129
ReplyDelete
Replies
stevenApril 6, 2010 at 5:25 AM
I'm assuming that j is the number of years.. so from a start date
of 1880 130 years would put you past the end date.

I'll test by changing start date
ReplyDelete
Replies
stevenApril 6, 2010 at 5:27 AM
Still fails.

When I set start date to 1890 (109 years to the end date of 2009,
the value of j got to 110 and failed
ReplyDelete
Replies
stevenApril 6, 2010 at 5:37 AM
sorry I meant 1900 above.

I added a check on j to not exceed the total number of years
( J>0 && j<(yr1-yr0))
seems to be crunching.. hope that check makes sense at that point in the code.
ReplyDelete
Replies
Nick StokesApril 6, 2010 at 6:54 AM
Thanks, Steve, that gives me a lot to work with. Looks like it's overrunning by a year. I don't know why it doesn't happen with me, but memory overflows are variable with R. It tends to just let them happen, and the consequences depend on how the local implementation arranged things.
ReplyDelete
Replies
stevenApril 6, 2010 at 7:18 AM
Hmm

It's running on and on.. the global chart got output

last year was a drop to zero..

Looking at your year start and year end dates file.. I have files that
go through 2010?

Anyways, maybe its just slow on the MAC, I'll check back in a few hours
tell me if there is anything you want run
ReplyDelete
Replies
stevenApril 6, 2010 at 7:34 AM
Walking through your code has really helped me understand R,

At times I was wondering if the fricking language was broken and not me.

I'll be back in 6 hours or so to see if it completed.. the first file global got done fast...

I think I just should have changed the choices list... and done them one at a time, but I didnt want to change anything till I ran..

I'll be back. and thanks
ReplyDelete
Replies
Nick StokesApril 6, 2010 at 8:34 AM
Steven, thanks for that. I think I've worked out what happened. I set yr1 to 2009 which was OK for me, because I was using a v2.mean from January that didn't have any 2010 data. Your fix is the right idea -
if( j>0 && j <= dix[2]){
is more direct.
Another better remedy may be to just set yr1 to 2010 - it's one of the "user choice" variables. However, it should be possible to choose an end year short of what's in the data, so I'll need to fix that properly.

I'm surprised that it's running slowly for you - I'd be interested in the final time that it prints out. It may be a memory issue, but I've only got 1 Gb.

I do have ConUS results, which I'll post soon. The trends were:
1900-2009 0.04348 1978:2009 0.2652
It shows 1934 as hot, but not topping 1998.
ReplyDelete
Replies
UnknownApril 6, 2010 at 1:03 PM
OK guys I have been driving Nick crazy for 2 days so now it is your turn.

I cannot even get to 1st base here. ie I cannot successfully run preprocessv1.2.r.

When I use R commander I set the working directory as C:/R/GHCN. That is where I have v2.mean.

I have modified preprocessv1.2.r., called it preprocessv1.2a.r. I have commented out input1="v2.mean"; and #con<-file(input1,"r"); and put in:
time0=timer(0);
pr(c("Preprocessing v2.mean ",date()));
loc1="tmp3712.txt"; # temp file will appear in your dir
file.choose() # this line
#con<-file(input1,"r");

The script does go to GHCN and I can choose v2.mean. Nothing seems to happen and at the message line I get
ERROR:missing value where TRUE/FALSE Needed
and
ERROR:object 'jk' not found.

This is consistent with what happens when I just run the script normally. (Most of the time!!)

In Rcmdr Output window this line appears:
> file.choose()
[1] "C:\\R\\GHCN\\v2.mean" notice the double //

Now if I run my modified script in Rconsole as
> source("C:/R/GHCN/preprocessorv1.2a.r")

I get this:

[1] "Preprocessing v2.mean Mon Apr 05 19:58:58 2010"
[1] "Times 0 0 0"
Error in if (time0 > 0) { : missing value where TRUE/FALSE needed

And everything stops:

Help !!
ReplyDelete
Replies
UnknownApril 6, 2010 at 1:06 PM
Nick changed the name!!

I am using preprocessorv1.2.r modified by me to
preprocessorv1.2a.r
ReplyDelete
Replies
Nick StokesApril 6, 2010 at 2:00 PM
Steve,
I changed the name of GlobTempLSv1.2,r to GlobTempLSv1.2.1.r because I;d fixed a bug - it's so people can know that the bug has been fixed. But only one line has changed.

It's possible that there are linux/windows problems that are causing you trouble with directory names. That's another reason to put everything in one dir so you don't have to worry about dir prefixes at all.
ReplyDelete
Replies
Nick StokesApril 6, 2010 at 2:25 PM
Steve,
I've uploaded a zipfile (trythis.zip) which has all that you need - preprocessed data and code, to run the second stage. Anyone else has having difficulty might like to try it too. It doesn't needed anything from any other directory. You should be able to just extract to an empty directory, start R and type source("GlobTempLSv1.2.1.r"), and it should run.

I've emailed you the zipfile too.
ReplyDelete
Replies
stevenApril 6, 2010 at 2:31 PM
Nick,

It hung after doing the global chart. The global temp went to zero in the last year.. So maybe it ran off and started to excute garbage.

I just got in ( soory basketball )

I'll try your fix and report back

Steve WH.

Download the Zip. Unzip it. Make sure you have everything in the same
folder.. processor 2 and the other program.

Copy in the *.inv file and the v2.mean file.

Start your R, select source And run.

You should have NO file not found issues as Nick codes doesnt do any fiddling with the directory.

If you are working on a MAC with R go to the misc pull down.

Slect " change working directory. Then click on the folder that contains
the Source files and the data files. Again should be NO file issues.

If you have file issues just post your directory path down to the v2mean file and I'll tell you what to do.
ReplyDelete
Replies
stevenApril 6, 2010 at 2:55 PM
Nick on the Conus issue I'm more interested inhow it compares to GISS who use slightly different data for CoNus.

That gets us closer to an apples to apples look at the various methods
except for CRU, but I think Zeke can be used as a good approximation to CRU if he fiddles a bit with his base period. Joseph at residual ..interesting approach to the sampling issue so his can be in the mix and then jeffid as well.
ReplyDelete
Replies
Nick StokesApril 6, 2010 at 3:24 PM
Steven, about the zeroes, I've put up a file on the repository which shows how you can make fixes without downloading a new file. The latest bug produces those zeroes, and I'll fix it in the next version. But here's the story:

One of the great things about R coding is that with a little knowledge you can update for bug fixes etc yourself, and keep your own modifications. So I'll try to post simple fixes here.

V1.1 had a problem with line 88
if( j>0 && orn[ii[1]] ){
This caused inaccuracy with subsets, especially small ones like SH, Replace in V1.1.1 with
if( j>0 ){

V1.2 had a problem with this same line, now #109. In v2.1.1 it is replaced by
if( j>0 && j<=dix[2] ){

Another non-critical bug arises in v1.2.1 with subsets that have missing years - ie no data for the whole region. Values had been set to zero which goes wrong with plots and trends. After L 171:
G = solve(S,b2); G = G - mean(G); # Global temp;
add
G[r2==0]=NA;
ReplyDelete
Replies
stevenApril 6, 2010 at 4:32 PM
Ha failed on find inv

forgot to change the flag back.
ReplyDelete
Replies
stevenApril 6, 2010 at 4:36 PM
[1] "Output_2010-04-05:23:33_Global_ConUS_Cut_Kept_BC_Africa_.txt"
[1] " "
[1] "Global time (s) 19"
[1] "Number stations 7280 time 39"
[1] "Now solve design matrix time 45"
Error in jpeg(paste(title[choice], ky[1], ".jpeg", collapse = "")) :
X11 module cannot be loaded
In addition: Warning message:
In jpeg(paste(title[choice], ky[1], ".jpeg", collapse = "")) :
unable to load shared library '/Library/Frameworks/R.framework/Resources/modules/i386/R_X11.so':
dlopen(/Library/Frameworks/R.framework/Resources/modules/i386/R_X11.so, 6): Library not loaded: /usr/X11/lib/libpng12.0.dylib
Referenced from: /Library/Frameworks/R.framework/Resources/modules/i386/R_X11.so
Reason: Incompatible library version: R_X11.so requires version 36.0.0 or later, but libpng12.0.dylib provides version 25.0.0
ReplyDelete
Replies
stevenApril 6, 2010 at 4:37 PM
argg maybe I should do my mac updates
ReplyDelete
Replies
stevenApril 6, 2010 at 4:46 PM
commented the plots out......got this

[1] "Output_2010-04-05:23:42_Global_ConUS_Cut_Kept_BC_Africa_.txt"
[1] " "
[1] "Global time (s) 0"
[1] "Number stations 7280 time 17"
[1] "Now solve design matrix time 22"
Error in model.frame.default(formula = G[kt] ~ kt, drop.unused.levels = TRUE) :
variable lengths differ (found for 'kt')
>
ReplyDelete
Replies
Nick StokesApril 6, 2010 at 5:11 PM
Steven,
That looks like the trend calculation near the end:
res1[i+1]=paste( signif(lm(G[kt] ~ kt)$coef[2]*10,4));

Obviously G[kt] and kt do have the same length - I think it's an arcane issue about the setting of na.action default. I've set mine so the default is na.omit. Anyway, changing to
res1[i+1]=paste( signif(lm(G[kt] ~ kt, na.action=na.omit)$coef[2]*10,4));
should work
ReplyDelete
Replies
Nick StokesApril 6, 2010 at 5:22 PM
Steven, following up on that, you may need to add the same option to lm() in the plotyrs() function near the top
h = lm(G[k]~k)

It reminds me of another fix I made The plot command objected to
lines(ky, h$fitted, col = "red");
when there are missing values, so I changed it to:

lines(ky[1:length(h$fitted)], h$fitted, col = "red");
ReplyDelete
Replies
UnknownApril 7, 2010 at 6:06 AM
OK

So I downloaded trythis.zip. Expanded it to a directory in my R directory called trythis (:]) (ie C:/R/Trythis); changed my working directory to trythis and ran it 1st in Rcmdr. Same problems.

Ran it in Rconsole with:

> source("GLobTempLSv1.2.1.r")

Got this readout

Read 6183730 items
Read 21840 items
[1] "Output_:12:54_Global_ConUS_Cut_Kept_BC_Africa_.txt"
[1] " "
Error in if (time0 > 0) { : missing value where TRUE/FALSE needed

This is the same stopping point as with preprocessov

So are any of you running this in a Windows operating system?

I kind of get the impression you are not - seems you are using Linux, a Mac or ??

This may be a windows operating system glitch.

I could dig up an old Linux OS, partition my drive and run R.

Notice this line is causing the program not to run:

Error in if (time0 > 0) { : missing value where TRUE/FALSE needed
ReplyDelete
Replies
Nick StokesApril 7, 2010 at 6:52 AM
Steve,
You could comment out the line (about line 46) in the code that has that - it's just diagnostics. It's a puzzle, though, and as you say, may be Windows related. I could also try to get R running on a windows machine here.

You're actually a fair way into the code, so maybe it's some special issue with passing arguments.
ReplyDelete
Replies
Nick StokesApril 7, 2010 at 6:57 AM
Steve,
Yes, I think commenting out should work. I think it's to do with the Sys.time() command, which is implementation dependent. I suspect that it's returning NULL, and next time around, the program says that time0 isn't there.

The only purpose of timer() is to print estimates of elapsed time, to see how you're going. You'll just get zeroes.
ReplyDelete
Replies
UnknownApril 7, 2010 at 9:24 AM
Bingo

That did it!! Ya

Now's the time to play.
ReplyDelete
Replies
Nick StokesApril 7, 2010 at 9:33 AM
Great news, Steve - congratulations!
ReplyDelete
Replies
UnknownApril 7, 2010 at 10:47 AM
Nick

OK now the learning begins.

What does this mean?

if(T){ # Change this to F if you have read in already

Say I have run the script already. Do I change this to F (I assume this is false) to do a second run if I have not exited the program?

I have noticed before that the Cities, towns in BC are written as XXXXX,BC, XXXXX,B.C., XXXXX.B
XXXXX etc. Knowing that computers are powerful, but stupid I am thinking that the script is going to miss some of these towns.

Although, I have not checked, I sure the other Provinces are likewise abbreviated wrong.

Also some cities (mostly) are XXXXX. Canada (No province designation)

I am assuming that it will be necessary to correct these shortcomings to get all the Canadian cities properly into the database.

Guess I could put them in Excel, FIND/REPLACE and save them back as a txt file? I could then post the corrected Canadian Data

And, have you put in the option to ask for only records of say 30+, 60 + or whatever years? I think I saw a possible pace but I'm not sure.
ReplyDelete
Replies
Nick StokesApril 7, 2010 at 11:39 AM
Steve,
R keeps a workspace. During a session, any variables created by the program, unless removed, remain available. Even when you exit, if you ask it to keep the workspace they will still be there when you start.

This can be a trap, esp if you don't have a lot of Gb. But it's the default. This program keeps the data that it reads, so you don't have to read it again.

However, reading is only a few seconds, so I'd say leave it as T.

There are all sorts of ways you can put in a province list. The vector i01 is just a list of the stations you want, listed by their line number in v2.temperature.inv. If you can work that out by hand, or any other means, you can just read it in. The basic command is something like
mylist = c(i1,i2,...)
where i1 etc are the line numbers. You could source that from a file, or even just paste it in to the working window. Then you'd say
if(tc="BC") i01 = mylist;

You could get R to make a first pass with a grep command, and then fill in the ones you know should be there.

One crude but effective trick is just to replace v2,temperature.inv with a list with just Canadian stations (starting with 403). They are consecutive, so it's a simple edit. Then you could be pretty sure that anything containing ",B" was BC.

Yes, the option for years is there in v2 and above. tv$startyr, tv$endyr, and tv$length, which is total yrs (allowing for missings). You could check the "Cut" and "Kept" in my latest post.
ReplyDelete
Replies
Nick StokesApril 7, 2010 at 11:42 AM
Another crude trick is that if you do modify the file, you can just go through and edit the names to have proper abbreviations. That might be quickest. In fact, you don't need to truncate to do that.
ReplyDelete
Replies
UnknownApril 7, 2010 at 12:54 PM
Can you get a printout of the data that goes into making a plot so I could plot the data in Excel?
ReplyDelete
Replies
stevenApril 7, 2010 at 2:35 PM
He might also get at BC by specifying a Lat Lon

But the cleanest way is probably a Special

Inv file.
ReplyDelete
Replies
UnknownApril 7, 2010 at 3:23 PM
Amazing program. Just done all of Canada 1900 to 1997 and 2009 in 20 minutes. (With uncorrected data).

Thanks Nick. You have saved me months of work!!
ReplyDelete
Replies
Nick StokesApril 7, 2010 at 3:57 PM
That's great Steve - thanks. I was afraid I'd given you bad advice with truncating the .inv file - I just realised it won't work with v1.2. Anyway I'm very glad your tenacity had its reward..
ReplyDelete
Replies
Nick StokesApril 7, 2010 at 4:01 PM
Steve, if your still wondering, the text output is on a file called Output... I'm actually curious to know what it is called on your system, because I had embedded the date in the filename.
ReplyDelete
Replies
UnknownApril 8, 2010 at 11:49 AM
Nick,

This shows up in my diary.txt file:

Output_:18:18_BC_AL_SASK_MAN_ON_QU_NB_NS_YT_NWT_N_NFL_.txt

Along with info on each province, but nothing else.

However, if I search my drives no such file exits.

I went through v2.temperature.inv in excel and made all the references to the provinces consistent. However, after saving the file from Excel, something is different and the script fails. Something about expecting read in and getting SKIKDA.

Anyone got some ideas on how to modify the v2.tempature.inv without screwing it up?
ReplyDelete
Replies
Nick StokesApril 8, 2010 at 12:45 PM
Steve,
I think the Output file is there, but NULL's in the name (due to the trouble with Sys.time()) stop the file browser from showing it. You might be able to see (and rename) it in a DOS window.

The filename is "restext" in the code. If you find this line:
restext=paste(c("Output",format(Sys.time(),"%F:%H:%M"),title[choices],".txt"),collapse="_");
and take out the
format(Sys.time(),"%F:%H:%M"),
leaving the line as:
restext=paste(c("Output",title[choices],".txt"),collapse="_");
that should work.

Not sure about the modified file issue. Does the revised file still have 7280 lines?
SKIKDA is the first station on the file, so it has had trouble there.
ReplyDelete
Replies
UnknownApril 9, 2010 at 5:28 AM
Nick,

Re the filename. Thanks that worked. Ended up in my active directory as Output_Global blah, blah....

Now, getting out the Canada info might be presenting me with a few gotchas.

I noticed in running your trythis that BC comes up with number stations 262 - seems a bit high even considering multiple entries for a station. I'm assuming this is number of stations it thinks it is analysing for BC. BC has only about 145 stations. Now other Provinces eg Saskatchewan have 79 but only 10 are in number stations for Sask.

Is it possible to take the Canadian Chunk out of v2.temperature.inv and run it in your script as say v2.temperatureC.inv? Or are there some pitfalss here.

Also, some people have suggested using Lat Lon. How do you enter this? Just by editing this line:

if(tc == "NH") i01 = i01[tv$lat>0];

to say if(tc == "SASK") i01 = i01[tv$lat>49, <60, lon >-100, <-110];

Betcha that's not correct!!

Guess the timing problems would got away if I ran Linux? Thinking of running Linux off CD.
ReplyDelete
Replies
Nick StokesApril 9, 2010 at 5:53 AM
Steve,
In the trythis file, I made a mistake, writing
if(tc == "BC") i01 = c(grep(",BC",tv$name),grep(",BC",tv$name));
The second ",BC" should have been ",B.C".
The consequence is that every station with ".BC" (most of them) was listed twice. THis doesn't matter except for the station count, but of course doesn't achieve the aim of including the stations with ",B.C".

Yes, there is a pitfall with using just the Canada chunk (which I had mistakenly suggested too), The preprocessed file relies on the line sequence of stations in the .inv file to get the numbering right,

The way to combine conditions is like this:
if(tc == "SASK") i01 = i01[tv$lat>49 & tv$lat<60 & tv$lon >-100 & tv&lon < -110];
A fine point - I separated the < and the - in the last bit. <- means something else in R. I'm not sure whether that would be a problem - just cautious.
ReplyDelete
Replies
Nick StokesApril 9, 2010 at 6:24 AM
Steve,
On your last point of trying to run under linux, which means a new R installation. What you're currently missing is just info on time progress (just clock time) and having the date/time in the output file name. The latter has the benefit that if you run the same job a second time, you don't overwrite the previous result file, But it's not a big deal.
ReplyDelete
Replies
UnknownApril 9, 2010 at 7:53 AM
Nick

Re:
if(tc == "BC") i01 = c(grep(",BC",tv$name),grep(",BC",tv$name));

That is why I am trying to clean up the v2.temperature.inv file with regards to Canada. There are many variations of Province abbreviations in the file, including no province at all and just CANADA (mostly Ontario which used to be known in ancient times as Upper Canada) more than what your line above took into account.

Now I got your preprocessor script to run sucessfully commenting ou the line:

if(time0>0){tm=tm-time0;}

I ran that with my modified (for Canada) in Excel v2.temperature.inv file in the active directory where I was running preprocessor and I got the impression that worked, but I'm not sure. Anyway, might I try the same thing with just the Canada chunk of V2.tmperature.inv? Does the preprocessor script somehow synchronize the v2.mean with v2.temperature.inv file??

Oh,and on the blog is there a home page link so you can get back to the most recent posting. Find this a little confusing, but maybe all I need to do is click on the latest comment or something.
ReplyDelete
Replies
UnknownApril 9, 2010 at 8:32 AM
Nick

How do you do southern latitudes? This didn't seem to work.

if(tc == "SExTropics") i01 = i01[tv$lat>-20 & tv$lat<-40];

This did for northern climes

if(tc == "NExTropics") i01 = i01[tv$lat>20 & tv$lat<40];
ReplyDelete
Replies
UnknownApril 9, 2010 at 8:36 AM
Nick

Forget the blog question - I found it in your script. Would be nice to have a home button though.
ReplyDelete
Replies
Nick StokesApril 9, 2010 at 9:02 AM
Steve,
Yes, I think the best way for you is to just go through the file editing the irregular province notations. I wouldn't use Excel, though - something like Wordpad would be simpler. It's important to keep everything lined up, since it's a formatted read, The name part should remain at exactly 30 characters.

As to
if(tc == "SExTropics") i01 = i01[tv$lat>-20 & tv$lat<-40];
I think again it's that pesky <-, which is R for assignment. Put a space in, < -, or write <=-.

Cutting down the .inv won't work, because the current setup does rely on the line numbering to know what's what. But you can edit the big file, as long as you don't rempve or add lines, and keep each field to the same number of characters.

On most blogs, incl Moyhu, you can click on the banner name at the top to get to the home page.
ReplyDelete
Replies
UnknownApril 9, 2010 at 9:44 AM
Nick on running with your suggestion (I think either a space or = works)got:
SExTropics time (s) NA
Number stations 0 time NA

SMidLat time (s) NA
Number stations 0 time NA

SHighLat time (s) NA
Number stations 0 time NA

Isn't that interesting - 0 stations? all three areas:
ReplyDelete
Replies
Nick StokesApril 9, 2010 at 10:03 AM
Steve,
This is just the funny way we think in the SH. You have to turn the signs around:
if(tc == "SExTropics") i01 = i01[tv$lat< -20 & tv$lat>-40];
ReplyDelete
Replies
Nick StokesApril 9, 2010 at 10:13 AM
Hey, "SExTropics" - that's where I live :)
ReplyDelete
Replies
UnknownApril 9, 2010 at 12:54 PM
Nick

I have scripts by latitude:
"Tropics","NExTropics","NMidLat","NHighLat","SExTropics","SMidLat","SHighLat"

and

by 10 deg north and south of the equator. You interested in them to post on the home blog?
ReplyDelete
Replies
Nick StokesApril 9, 2010 at 1:13 PM
Steve,
Yes, I'd be very glad to host them. There are various options - you could write a post yourself, or send me the scripts and pictures. Anyway, we can sort it out by email - although for the next few hours I'm online but with no very good access to email.
ReplyDelete
Replies
UnknownApril 9, 2010 at 1:49 PM
Nick:

I'll do this tomorrow as it is too late now.

Also,how do I change this to reflect only one abbreviation (I've changed them all to one)

if(tc == "BC") i01 = c(grep(",BC",tv$name),grep(",BC",tv$name));

I'm not sure about the brackets.
ReplyDelete
Replies
Nick StokesApril 9, 2010 at 8:53 PM
Steve,
You don't really need to do anything. It doesn't hurt to search unsuccessfully. But you can take out a grep - in fact
if(tc == "BC") i01 = grep(",BC",tv$name);
There's also a logical form of grep:
if(tc == "BC") i01 = i01[grepl(",BC",tv$name)];
The point of this is that you can combine the grep with latitude conditions etc
ReplyDelete
Replies

Add comment

An interactive topic index for all Moyhu posts.
Latest Ice and Temperature data
Climate Data Portals
A gallery of Javascript-enhanced graphics
Temperature trend viewer
Google Maps and GHCN
WebGL map of past GHCN/SST station temperatures
WebGL map of GHCN/SST station temperature trends
HiRes NOAA OI SST with WebGL and Movie
Regional Hi-Res SST movies
WebGL Facility
TempLS Guide
More pages, and blog glossary

moyhu

Sunday, April 4, 2010

Tips on File Download

Tips on File Download

69 comments:

Maintained Pages

Search This Blog

Recent Comments

Blogroll

Blog Archive

Translate

Resources

About Me

moyhu

Sunday, April 4, 2010

Tips on File Download

Tips on File Download

69 comments:

Maintained Pages

Search This Blog

Recent Comments

Blogroll

Subscribe To

Blog Archive

Translate

Resources

About Me