Friday, May 8, 2015

How to: Plotting multiple smooth regression lines in a single graph with R (ggplot2) (Star Wars)

Hopefully you're a Star Wars fan, because that's the example we're going to use.
In the second and third episodes of the prequel trilogy, there's a war going on between the Republic (The Goodies) and the Separatists (The Baddies).

All you really have to know is that:

On one side (the "Good" side) there are the Jedi Masters (the ones with the lightsabres) and the Clone Troopers.
(Skip to SCENARIO if you want to see the R code.)

Yes, I am aware that Satele Shan was not in the Clone Wars.
On the "evil" side:

We have Sith Lords and then we have the droids:
Yeah, yeah, breaking the Rule of Two. Shush.

SCENARIO:

Even with a war going on there are still people who use R!
They want to see whether the number of fatalities among the droids, Sith Lords, and Clone Troopers is affected by the number of Jedi Masters on the battlefield. Do these arrogant, know-it-all hippies actually contribute? To keep this simple, the data analysts decide to use smoothed regression lines! And they want to show, on one graph:

Droids Destroyed versus Number of Jedi Masters
Clone Troopers killed versus Number of Jedi Masters
Sith Lords killed versus Number of Jedi Masters.

The Jedi supporters probably want droid and Sith Lord fatalities to go up with more Jedi Masters on the field, and Clone Trooper deaths to go down.

Because I'm lazy, I decided to randomly generate numbers in an Excel file!

Now I kinda set it up in this format:
Download from Google Sheet (StarWars
Note that it is an Excel file and I'm working with a CSV.



Well, let's do it!

(Disclaimer: There are multiple, and better, ways of getting the same end product! Seriously, please feel free to point out "Hey, why did you write 15 lines of code when it's easy with just 2 lines?" That's how I learn! Also, this example is for ggplot2, not for stats.)

So let's say you downloaded the file!
Set your working directory to wherever you put the file!
And then let's write some code!

One thing we're going to need is the reshape2 package and the ggplot2 package.
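Here's a rough sketch of the setup, under a few assumptions: the file name `StarWars.csv` and the column names (`JediMasters`, `Droids`, `CloneTroopers`, `SithLords`) are made up for illustration, and the numbers are random stand-ins written to a temporary file so the snippet runs anywhere. With the real download, you'd point `read.csv()` at your own CSV instead.

```r
library(ggplot2)
library(reshape2)

# setwd("path/to/your/folder")  # hypothetical path: wherever you saved the file

# Stand-in for the downloaded file: random numbers, one row per battle.
# Column names are assumptions, not necessarily those in the real sheet.
set.seed(66)
fake <- data.frame(
  JediMasters   = sample(1:50, 30),
  Droids        = sample(0:500, 30, replace = TRUE),
  CloneTroopers = sample(0:200, 30, replace = TRUE),
  SithLords     = sample(0:5, 30, replace = TRUE)
)
csv_path <- file.path(tempdir(), "StarWars.csv")
write.csv(fake, csv_path, row.names = FALSE)

# Read the CSV back in, the same way you would read the real file.
starwars <- read.csv(csv_path)
str(starwars)
```

If you set the working directory first, `read.csv("StarWars.csv")` with just the file name is enough.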

So one problem that I found when I was using a data file in the format above is that I couldn't figure out a way to say to R (specifically ggplot2), "Hey, the y value is going to be the droids, Clone Troopers, and Sith Lords. The x value is just the number of Jedi Masters."

So after some minutes of fooling around, I decided to use the reshape2 package.

After I melted the data, it came to look like this:
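A minimal sketch of the melt step, assuming a wide table with one column for the Jedi Masters and one per fatality type. The column names and the random numbers are stand-ins for illustration, and the `Faction` / `Fatalities` names are my choice (leave out `variable.name` and `value.name` and reshape2 calls them `variable` and `value`).

```r
library(reshape2)

# Stand-in data with assumed column names, one row per battle.
set.seed(66)
starwars <- data.frame(
  JediMasters   = sample(1:50, 30),
  Droids        = sample(0:500, 30, replace = TRUE),
  CloneTroopers = sample(0:200, 30, replace = TRUE),
  SithLords     = sample(0:5, 30, replace = TRUE)
)

# Keep JediMasters as the identifier and stack the other three columns
# into a single value column, with a label column saying which is which.
melted <- melt(
  starwars,
  id.vars       = "JediMasters",
  variable.name = "Faction",
  value.name    = "Fatalities"
)
head(melted)
```

The result is in "long" format: three rows per battle, so ggplot2 can map `Fatalities` to y and use `Faction` to tell the groups apart.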




So let's plot the points!

Graph 1: Just the points of Fatalities versus Jedi Masters 
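A scatter plot like this can be sketched as follows. The data is random stand-in numbers and the column names are assumptions, so your picture will differ.

```r
library(ggplot2)
library(reshape2)

# Stand-in data with assumed column names, melted to long format.
set.seed(66)
starwars <- data.frame(
  JediMasters   = sample(1:50, 30),
  Droids        = sample(0:500, 30, replace = TRUE),
  CloneTroopers = sample(0:200, 30, replace = TRUE),
  SithLords     = sample(0:5, 30, replace = TRUE)
)
melted <- melt(starwars, id.vars = "JediMasters",
               variable.name = "Faction", value.name = "Fatalities")

# One point per battle and faction, coloured by faction.
p <- ggplot(melted, aes(x = JediMasters, y = Fatalities, colour = Faction)) +
  geom_point()
p
```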


Hm... Not seeing much of a pattern. Maybe Order 66 was the right thing to do.

Now let's fit some lines.
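One way to fit a smooth line per group is `geom_smooth()`. This is a sketch on random stand-in data with assumed column names. I'm asking for loess explicitly here (it is also what `geom_smooth()` picks by default for small data sets), and switching off the confidence bands is just my choice to keep the picture clean.

```r
library(ggplot2)
library(reshape2)

# Stand-in data with assumed column names, melted to long format.
set.seed(66)
starwars <- data.frame(
  JediMasters   = sample(1:50, 30),
  Droids        = sample(0:500, 30, replace = TRUE),
  CloneTroopers = sample(0:200, 30, replace = TRUE),
  SithLords     = sample(0:5, 30, replace = TRUE)
)
melted <- melt(starwars, id.vars = "JediMasters",
               variable.name = "Faction", value.name = "Fatalities")

# Because colour is mapped to Faction, geom_smooth() fits one curve per faction.
p <- ggplot(melted, aes(x = JediMasters, y = Fatalities, colour = Faction)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
p
```

Drop `se = FALSE` if you want the shaded confidence bands back.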



The Chancellor demands that we take out the distracting points.
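Taking the points out just means leaving off the `geom_point()` layer. Again a sketch on random stand-in data with assumed column names:

```r
library(ggplot2)
library(reshape2)

# Stand-in data with assumed column names, melted to long format.
set.seed(66)
starwars <- data.frame(
  JediMasters   = sample(1:50, 30),
  Droids        = sample(0:500, 30, replace = TRUE),
  CloneTroopers = sample(0:200, 30, replace = TRUE),
  SithLords     = sample(0:5, 30, replace = TRUE)
)
melted <- melt(starwars, id.vars = "JediMasters",
               variable.name = "Faction", value.name = "Fatalities")

# Smooth lines only: the curves are still fitted to all the data,
# the points are simply not drawn.
p <- ggplot(melted, aes(x = JediMasters, y = Fatalities, colour = Faction)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
p
```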


Miscellaneous: Change the Legend!
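One way to change the legend title and labels is through the colour scale. This is a sketch on random stand-in data; the column names, legend title, and label text are all made up for illustration.

```r
library(ggplot2)
library(reshape2)

# Stand-in data with assumed column names, melted to long format.
set.seed(66)
starwars <- data.frame(
  JediMasters   = sample(1:50, 30),
  Droids        = sample(0:500, 30, replace = TRUE),
  CloneTroopers = sample(0:200, 30, replace = TRUE),
  SithLords     = sample(0:5, 30, replace = TRUE)
)
melted <- melt(starwars, id.vars = "JediMasters",
               variable.name = "Faction", value.name = "Fatalities")

# The labels follow the order of the factor levels:
# Droids, CloneTroopers, SithLords.
p <- ggplot(melted, aes(x = JediMasters, y = Fatalities, colour = Faction)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  scale_colour_discrete(
    name   = "Fatalities",
    labels = c("Droids destroyed", "Clone Troopers killed", "Sith Lords killed")
  ) +
  labs(x = "Number of Jedi Masters", y = "Number of fatalities")
p
```

If you only want to rename the legend title, `labs(colour = "Fatalities")` does that without touching the labels.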

While we'd need to actually run some statistical tests to verify it, from the graph the presence of Jedi Masters seems to have little effect on the number of fatalities.




