Module #7: Visual Distribution Analysis

 

How to Visualize & Compare Distributions in R.

1) This week we are doing a visual distribution analysis. This time we are going to use a grid layout to make it easier to compare our plots. Then we are going to go over Few's recommendations for these visuals. Without further ado, let's get to it.

2) The data I am going to be using is mtcars. Why? Because everyone who has R installed has it, it's free and simple to use, and I don't have much time to complete this assignment. A hurricane knocked out my power for a while, so we are going to do this as fast as I can, in the simplest way I can come up with. For our distribution analysis we are going to take several variables in mtcars and compare them to mpg, so we can visually see how these variables relate to the cars' fuel efficiency. We are also going to make them all appear on the same plot grid in R so we don't have to click through them. They are all together and easy to see. We are not going to do anything fancy that adds extra confusion, just some simple, clean graphics.

> library(ggplot2)
> library(gridExtra)
Warning message:
package ‘gridExtra’ was built under R version 4.3.3 
> data("mtcars")
> # Create scatter plots to compare variables
> p1 <- ggplot(mtcars, aes(x=wt, y=mpg)) + 
+   geom_point() + 
+   theme_minimal() +
+   ggtitle("Weight vs MPG")
> 
> p2 <- ggplot(mtcars, aes(x=hp, y=mpg)) + 
+   geom_point() + 
+   theme_minimal() +
+   ggtitle("Horsepower vs MPG")
> 
> p3 <- ggplot(mtcars, aes(x=disp, y=mpg)) + 
+   geom_point() + 
+   theme_minimal() +
+   ggtitle("Displacement vs MPG")
> 
> p4 <- ggplot(mtcars, aes(x=cyl, y=mpg)) + 
+   geom_point() + 
+   theme_minimal() +
+   ggtitle("Cylinders vs MPG")
> 
> # Arrange the plots in a grid
> grid.arrange(p1, p2, p3, p4, ncol=2)
> 


3) We are using gridExtra so we can fit all four plots on the same page in RStudio, and ggplot2 to make the graphs easier to work with. From our 4 visuals we can see that fuel efficiency is best at the low end of each x-axis. That is to say: the lighter the car, the better its fuel efficiency; the less horsepower it has, the better its fuel efficiency; the less displacement it has, the more fuel efficient it is; and the fewer cylinders it has, generally the more fuel efficient it is.
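
If you want rough numbers to go with the trends described above, here is a minimal sketch using base R's cor(). The names predictors and mpg_cor are just illustrative choices of mine, and Pearson correlation is only one possible summary; it assumes the relationships are roughly linear, which the scatter plots suggest is not perfectly true.

```r
data("mtcars")

# The four variables plotted against mpg in the grid.
predictors <- c("wt", "hp", "disp", "cyl")

# Pearson correlation of mpg with each variable (illustrative names).
mpg_cor <- sapply(predictors, function(v) cor(mtcars$mpg, mtcars[[v]]))

# Negative values mean mpg tends to fall as the variable rises.
print(round(mpg_cor, 2))
```

All four values come out negative, which matches what the plots show: as each variable goes up, mpg tends to go down.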

4) Now what does Few have to say about all this? Well, Few recommends using consistent interval sizes and avoiding clutter in visualizations. Few would also like the grid layout, since it allows for an organized comparison between variables and reduces the cognitive load on the viewer. Few also prefers 2D scatter plots over 3D visualizations with too much noise. One thing we could do that Few would appreciate is to treat cylinders as an extra dimension, since cylinder count is closely related to mpg, and color code the points by cylinder count across the other scatter plots. So let's do that.
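
One way the color coding could be done is sketched below. This is a minimal sketch and not necessarily the exact code behind the final figure: the copy named cars, the cyl_f column, and the mpg_scatter() helper are hypothetical names introduced here for illustration, and it assumes ggplot2 (3.0.0 or later, for the .data pronoun) and gridExtra are installed.

```r
library(ggplot2)
library(gridExtra)

# Work on a copy so the built-in dataset is left untouched.
cars <- mtcars

# Treat cylinder count as a category so ggplot2 assigns discrete colors
# instead of a continuous color gradient.
cars$cyl_f <- factor(cars$cyl)

# Hypothetical helper: scatter plot of one variable against mpg,
# with points colored by cylinder count.
mpg_scatter <- function(xvar, title) {
  ggplot(cars, aes(x = .data[[xvar]], y = mpg, color = cyl_f)) +
    geom_point() +
    theme_minimal() +
    labs(title = title, color = "Cylinders")
}

p1 <- mpg_scatter("wt", "Weight vs MPG")
p2 <- mpg_scatter("hp", "Horsepower vs MPG")
p3 <- mpg_scatter("disp", "Displacement vs MPG")
p4 <- mpg_scatter("cyl", "Cylinders vs MPG")

# Arrange the four color-coded plots in a 2 x 2 grid.
grid.arrange(p1, p2, p3, p4, ncol = 2)
```

Converting cyl to a factor is the important step. If it is left numeric, ggplot2 uses a continuous color scale, which makes the 4, 6, and 8 cylinder groups harder to tell apart.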




5) Now everything is color coded, side by side, and clearly laid out with grid lines. This makes the individual points much easier to make sense of. It is easy to see which cars have how many cylinders across the scatter plots, and where each cylinder group generally falls on the graphs.

6) If you would like to see the code presented here in its full form, I will post it up on my GitHub here. 





















