Just got into grad school! I'm finally settled in and ready to write more blog posts.
Here is a one-hour project that I worked on. My friend has a huge data file on a dung beetle population, with a lot of important information recorded daily. It looks something like this:
Figure 1: Fake Randomly Generated Data
So he comes up to me and says,
Him: "I want to average the 5 different variables"
Me: "Okay, like weekly? So like every 7 rows?"
Him: "Oh no, I have a data file of the dates that I want to use as start dates and end dates.
So the duration between each start date and end date will be different."
He then opens up another CSV file:
Figure 2: Fake Dates
Him: "I want the data averaged from 1/1/1989 to 1/4/1989, 1/4/1989 to 1/5/1989, etc."
And then I threw a string of obscenities at him for making it harder. So basically, I ran a for-loop (don't judge me).
Steps (Code is below: NOTE THAT YOU MUST CHANGE CERTAIN THINGS FOR YOUR OWN SPECIFIC NEEDS)
1) I loaded the two files into R. One is the record of dates (Figure 2) and the other is the main data (Figure 1). My friend had some other variables in his record file and I just wanted the dates, so I created a new variable, x.
2) I created an empty list so I could put the newly averaged data into it.
3) I ran a for-loop over the rows of the record data (x).
4) I used the "match" function to find the row indices of the start-date and end-date in the main data, and used those to subset the main data (from the start-date to the end-date).
5) I then averaged the columns of that subset.
6) Because the averages came out in column form, I transposed them into a single row.
7) I then column-bound the start date, end date, and the averages.
8) I put this row into the list.
REPEAT steps 4 to 8 for every row of the record data.
9) After the loop finished, I turned the list into a data.frame.
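To make the steps above concrete, here is a minimal sketch on tiny made-up data. It is not my friend's data or the exact code from this project: the column names (date, v1 to v5, start, end) and the values are assumptions for illustration, and it assumes every start-date and end-date appears exactly as written in the main data's date column.

```r
# Hypothetical stand-in for the main data (Figure 1): one row per day.
set.seed(42)
maindata <- data.frame(
  date = paste0("1/", 1:10, "/1989"),
  v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10),
  v4 = rnorm(10), v5 = rnorm(10),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the record file (Figure 2), dates only (step 1).
x <- data.frame(
  start = c("1/1/1989", "1/4/1989", "1/5/1989"),
  end   = c("1/4/1989", "1/5/1989", "1/9/1989"),
  stringsAsFactors = FALSE
)

# Step 2: empty list to collect one row of averages per date range.
results <- vector("list", nrow(x))

# Step 3: loop over the rows of the record data.
for (i in seq_len(nrow(x))) {
  # Step 4: find the row positions of the start and end dates.
  first <- match(x$start[i], maindata$date)
  last  <- match(x$end[i], maindata$date)
  if (is.na(first) || is.na(last)) {
    stop("Date not found in main data for record row ", i)
  }
  # Subset the rows from start to end (both included), dropping the date column.
  chunk <- maindata[first:last, -1]

  # Step 5: average each column.
  avgs <- colMeans(chunk)

  # Step 6: transpose the averages into a one-row matrix.
  avgs_row <- t(avgs)

  # Steps 7 and 8: bind the dates to the averages and store the row.
  results[[i]] <- cbind(
    data.frame(start = x$start[i], end = x$end[i], stringsAsFactors = FALSE),
    avgs_row
  )
}

# Step 9: stack the list into a single data.frame.
averaged <- do.call(rbind, results)
print(averaged)
```

Note that this sketch includes both the start-date and the end-date rows in each average, so a day like 1/4/1989 gets counted in two ranges. If that is not what you want, change first:last to suit your own needs.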