10 June 2014

#pairs function short tutorial

The pairs function creates a scatterplot matrix, a handy way to look at the correlations between the parameters in a dataset.

  1. First you need to format your dataset. The first row holds the headers, like: No, temp, tds, etc. The following rows are the samples or cases. The columns are the parameters or variables.

  2. Then you should save the dataset as csv (comma separated values). R can also read xls files, but you will need a separate package for that (for example gdata or readxl; the foreign package is meant for formats such as SPSS and Stata, not Excel). The dataset contains eight columns: No, elev, lith, turb, pH, hard, tds, and temp; and 51 samples (numbered 1 to 51).

  3. Type the following code:

# Pairs function By Dasapta Erwin Irawan and Willem Vervoort
# ---------------------------------------------

# Don't forget to set the working directory with setwd()

# Load data (file named 'testpairs.csv').
# stringsAsFactors = TRUE keeps 'lith' a factor (needed in R >= 4.0.0).
testpairs <- read.csv("testpairs.csv", stringsAsFactors = TRUE)

# Select the columns to plot (this drops the 'No' column)
pairdata <- testpairs[, c("elev", "lith", "turb", "pH", "hard", "tds", "temp")]

# One colour per lithology level
lithcols <- c("red", "green3", "blue", "yellow")

# Using pairs
pairs(pairdata, labels = colnames(pairdata), main = "Pairs matrix", pch = 21,
    bg = lithcols[unclass(testpairs$lith)], upper.panel = NULL)

# Adding legend
legend(x = 0.5, y = 2, levels(testpairs$lith), pt.bg = lithcols, pch = 21,
    bty = "n", ncol = 2)

[Figure: pairs plot]

We select the columns that we are going to analyze by name (binding them with cbind, column bind, is an alternative). So if you have, say, 20 columns of measured parameters, you can always pick the ones that suit your needs. It might be seven columns (the ones in the code) for the first plot, seven other columns for the second, and six other columns for the third. The columns do not have to stand next to each other. For instance, you can combine column no 6 with no 8 and no 12, etc.
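To make the column selection concrete, here is a minimal sketch. It assumes the built-in iris data set as a stand-in, because testpairs.csv is not available here; the names dat, by_name and by_index are only for illustration.

```r
# Stand-in data: the built-in iris data set (an assumption; the tutorial's
# own testpairs.csv is not used here).
dat <- iris

# Select columns by name ...
by_name <- dat[, c("Sepal.Length", "Petal.Length", "Species")]

# ... or by position; the columns need not be adjacent.
by_index <- dat[, c(1, 3, 5)]

# Check that both selections give the same data frame.
identical(by_name, by_index)

# Each subset can be passed to pairs() on its own.
pairs(by_name, main = "Subset selected by name")
```

Selecting by name is usually safer than selecting by position, because it keeps working if the column order in the csv file changes.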

In this example, we used the 'lith' (lithology) column to group the samples with the [unclass(testpairs$lith)] index. We also used four colors to show the four different lithologies that we had in the dataset.
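The grouping trick can be tried on its own with a small sketch. The factor below is a hypothetical stand-in with four levels named A to D, not the real lithologies; lith_demo, palette4, codes and point_cols are illustrative names.

```r
# Hypothetical stand-in for the 'lith' column: a factor with four levels.
lith_demo <- factor(c("C", "A", "B", "D", "A", "C"))

# The same four colours as in the tutorial code.
palette4 <- c("red", "green3", "blue", "yellow")

# unclass() exposes the integer codes behind the factor levels
# (levels are sorted alphabetically: A = 1, B = 2, C = 3, D = 4).
codes <- unclass(lith_demo)

# Indexing the colour vector by those codes gives one colour per sample.
point_cols <- palette4[codes]

# Show each sample next to its code and colour.
data.frame(lith = lith_demo, code = as.integer(codes), colour = point_cols)
```

The colour vector needs at least as many entries as the factor has levels, otherwise some samples get NA as their colour.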

The legend may not show up in the example. We are still looking for the best way to display it, because its position really depends on the size of the pairs matrix.
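One possible workaround, offered only as a sketch and not as a definitive fix, is to place the legend by keyword instead of by x and y coordinates, and to let it draw outside the plot region with par(xpd = TRUE). The example assumes the built-in iris data set (three groups instead of four lithologies); where the legend ends up still depends on the size of the graphics device.

```r
# Stand-in data: built-in iris, with Species playing the role of 'lith'.
group <- iris$Species
cols  <- c("red", "green3", "blue")

# Lower-triangle pairs plot; the upper triangle stays empty.
pairs(iris[, 1:4], main = "Pairs matrix", pch = 21,
      bg = cols[unclass(group)], upper.panel = NULL)

# Allow drawing outside the plot region, then anchor the legend by keyword
# so it is aimed at the empty upper-right corner.
old_par <- par(xpd = TRUE)
legend("topright", legend = levels(group), pt.bg = cols,
       pch = 21, bty = "n")

# Restore the previous clipping setting.
par(old_par)
```

If the legend overlaps the panels, resizing the plot window or trying another keyword such as "right" may help.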

Good luck.

This code was run on:

  • R-base version 3.1.0 (Spring Dance)
  • RStudio version 0.98.507
  • Linux-Ubuntu 13.10
