R

What is R?
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

R commands
Start R from the shell (with the command R) and change the working directory to where the statistics files are found.

    > setwd("/home/student/Genomes/")
    > getwd()
    [1] "/home/student/Genomes"
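As a minimal sketch of the same step inside a script, the directory can be checked before switching to it and the previous directory restored afterwards. The variable names data_dir and old_dir are illustrative, and tempdir() is only a stand-in for the real path.

```r
# Stand-in for "/home/student/Genomes/" (assumption: use your own path here).
data_dir <- tempdir()

# Fail early if the directory does not exist.
stopifnot(dir.exists(data_dir))

# setwd() invisibly returns the directory you were in before the change.
old_dir <- setwd(data_dir)

print(getwd())        # the new working directory
print(list.files())   # files that read.table() would now find by bare name

# Go back to where you started.
setwd(old_dir)
```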

Reshape table to matrix (heatmap)
This example illustrates a formatting situation that you might run into when working with multiple values per genome. Suppose you have:

    State, Year, Value
    KY, 1998, 56
    KY, 1997, 78
    IL, 1998, 48
    IL, 1997, 72

and you want:

    State, 1997_value, 1998_value
    KY, 78, 56
    IL, 72, 48

Use the reshape function:

    reshape(data, idvar="State", timevar="Year", direction="wide")
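The reshape call can be tried on the small State/Year/Value table from this section. This is a sketch under stated assumptions: the data frame is typed in by hand instead of being read from a file, and the final reordering step is an addition, needed because reshape creates the wide columns in the order the years first appear.

```r
# The example table, entered by hand.
data <- data.frame(
  State = c("KY", "KY", "IL", "IL"),
  Year  = c(1998, 1997, 1998, 1997),
  Value = c(56, 78, 48, 72)
)

# Long to wide: one row per State, one Value column per Year.
wide <- reshape(data, idvar = "State", timevar = "Year", direction = "wide")

# reshape() names the new columns "Value.<Year>"; sort them so 1997 comes first.
wide <- wide[, c("State", sort(setdiff(names(wide), "State")))]
rownames(wide) <- NULL

print(wide)
```

The resulting columns are named Value.1997 and Value.1998 instead of 1997_value and 1998_value; they can be renamed with colnames() if the exact headers matter.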

Reference the last column of a data frame

    codon[,length(codon)]
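For a data frame, length() returns the number of columns, which is why the expression above picks the last one. A short sketch on a made-up three-column data frame (the values are illustrative, not real codon scores) shows the equivalent ncol() form and how to keep the result as a data frame:

```r
# Toy data frame with made-up values.
codon <- data.frame(
  Name  = c("Organism_A", "Organism_B"),
  codon = c("AAA", "CAA"),
  score = c(3.1, 0.9)
)

# The form used above: length() of a data frame is its column count.
last_by_length <- codon[, length(codon)]

# Same result, arguably easier to read.
last_by_ncol <- codon[, ncol(codon)]

# drop = FALSE keeps a one-column data frame instead of a plain vector.
last_as_df <- codon[, ncol(codon), drop = FALSE]

print(last_by_ncol)
print(names(last_as_df))
```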

Codon usage heatmap
To create a heatmap of the codon usage, follow this pipeline. Make sure that your data structures look like the examples below.

    install.packages("gplots")   # only needed once
    library(gplots)

    # Read the table and name its four columns
    codon <- read.table("codonUsage.all")
    colnames(codon) <- c('Name', 'codon', 'score', 'count')
    codon <- codon[1:3]          # keep Name, codon and score

    # Long to wide: one row per organism, one column per codon
    test <- reshape(codon, idvar="Name", timevar="codon", direction="wide")
    codonMatrix <- data.matrix(test[2:length(test)])
    rownames(codonMatrix) <- test$Name

    codon_heatmap <- heatmap.2(codonMatrix, scale="column",
                               main="Codon usage",
                               xlab="Codon fraction",
                               ylab="Organism",
                               trace="none",
                               margins=c(8, 20))

    # Copy the plot to a PDF file, then close the graphics device
    dev.print(pdf, "codonUsage.pdf")
    dev.off()

The format of each data structure is shown below:

    > codon
                                      Name codon   score
    1 Acidaminococcus_fermentans_DSM_20731   AAA 3.05528
    2 Acidaminococcus_fermentans_DSM_20731   CAA 0.30650
    ........
    > test
       Name                                 score.AAA score.CAA score.GAA score.TAA
    1  Acidaminococcus_fermentans_DSM_20731   3.05528   0.30650   5.23985   0.15237
    65 Acidaminococcus_intestini_RyC-MR95     3.02789   0.91191   4.91588   0.16988
    ........
    > codonMatrix
                                         score.AAA score.CAA score.GAA score.TAA score.ACA
    Acidaminococcus_fermentans_DSM_20731   3.05528   0.30650   5.23985   0.15237   0.34499
    Acidaminococcus_intestini_RyC-MR95     3.02789   0.91191   4.91588   0.16988   0.98450
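Before plotting, it can help to confirm that the reshaped matrix really has one row per organism, one column per codon and no missing cells. The sketch below runs the same reshape steps on a made-up table of two organisms and three codons; the organism names and scores are placeholders, not real data.

```r
# Toy long-format table: 2 organisms x 3 codons (made-up values).
codon <- data.frame(
  Name  = rep(c("Organism_A", "Organism_B"), each = 3),
  codon = rep(c("AAA", "CAA", "GAA"), times = 2),
  score = c(3.1, 0.3, 5.2, 3.0, 0.9, 4.9),
  stringsAsFactors = FALSE
)

# Every organism should contribute the same number of rows.
counts <- table(codon$Name)
stopifnot(length(unique(as.vector(counts))) == 1)

# Same reshape steps as in the pipeline above.
test <- reshape(codon, idvar = "Name", timevar = "codon", direction = "wide")
codonMatrix <- data.matrix(test[2:length(test)])
rownames(codonMatrix) <- test$Name

# Rows = organisms, columns = codons; a missing codon would show up as NA.
print(dim(codonMatrix))
print(anyNA(codonMatrix))
```

If anyNA() reports TRUE on real data, at least one organism lacks a row for some codon, and the clustering step of the heatmap is likely to fail.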

Amino acid heatmap
To create a heatmap of the amino acid usage, follow this pipeline. Make sure that your data structures look like the examples below.

    library(gplots)

    # Read the table and name its three columns
    aa <- read.table("aaUsage.all")
    colnames(aa) <- c('Name', 'aa', 'score')

    # Long to wide: one row per organism, one column per amino acid
    test <- reshape(aa, idvar="Name", timevar="aa", direction="wide")
    aaMatrix <- data.matrix(test[2:length(test)])
    rownames(aaMatrix) <- test$Name

    stat_heatmap <- heatmap.2(aaMatrix, scale="column",
                              main="Amino acid usage",
                              xlab="Amino acid fraction",
                              ylab="Organism",
                              trace="none",
                              margins=c(8, 20),
                              col = cm.colors(256))

    # Copy the plot to a PDF file, then close the graphics device
    dev.print(pdf, "aaUsage.pdf")
    dev.off()
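The dev.print step copies a plot that is already on screen, so it may not work when the script is run non-interactively (for example with Rscript). An alternative is to open the PDF device first, draw, and then close it. In the sketch below, the matrix values and the output path are made up, base R's heatmap() stands in for heatmap.2 so that the example runs without gplots, and the margins are reduced to suit the tiny toy matrix.

```r
# Toy matrix standing in for aaMatrix (values are made up).
aaMatrix <- matrix(
  c(8.1, 9.0, 7.3,
    7.8, 8.8, 7.1,
    6.9, 9.5, 6.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Organism_A", "Organism_B", "Organism_C"),
                  c("score.G", "score.A", "score.V"))
)

# Hypothetical output location; replace with "aaUsage.pdf" in practice.
out_file <- file.path(tempdir(), "aaUsage.pdf")

pdf(out_file)                      # open the PDF device before plotting
heatmap(aaMatrix, scale = "column",
        main = "Amino acid usage",
        xlab = "Amino acid fraction",
        ylab = "Organism",
        margins = c(8, 12),
        col = cm.colors(256))
invisible(dev.off())               # close the device so the file is written

print(file.exists(out_file))
```

The same pdf() and dev.off() pair can wrap the heatmap.2 call from the pipeline above.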

The format of each data structure is shown below (aa is printed as read, before its columns are renamed):

    > aa
      V1                                   V2 V3
    1 Acidaminococcus_fermentans_DSM_20731 G  8.1275
    2 Acidaminococcus_fermentans_DSM_20731 A  9.0013
    ........
    > str(aa)
    'data.frame': 620 obs. of 3 variables:
     $ Name : Factor w/ 31 levels "Acidaminococcus_fermentans_DSM_20731",..: 1 ...
     $ aa   : Factor w/ 20 levels "A","C","D","E",..: 6 1 18 10 8 5 20 19 7 9 ...
     $ score: num  8.13 9 7.3 10.12 5.8 ...
    > test
       Name                                 score.G score.A score.V score.L score.I score.F
    1  Acidaminococcus_fermentans_DSM_20731  8.1275  9.0013  7.2975 10.1203  5.7992  3.8577
    21 Acidaminococcus_intestini_RyC-MR95    7.7623  8.7881  7.0802  9.7019  6.3094  4.0698
    ........
    > aaMatrix
                                         score.G score.A score.V score.L score.I score.F
    Acidaminococcus_fermentans_DSM_20731  8.1275  9.0013  7.2975 10.1203  5.7992  3.8577
    Acidaminococcus_intestini_RyC-MR95    7.7623  8.7881  7.0802  9.7019  6.3094  4.0698