User:Jimbotyson/sandbox

R - a first course

[TOC]

R and RStudio: the difference

R is a programming language. It is an open-source implementation of the S language; S-Plus is a commercial implementation of the same language. R is designed for statistical programming and graphics. You can install an R interpreter on most computer systems. The interpreter lets you send single-line commands to R and see R's response straight away. This is the R command line. Alternatively, you can write R code in a text file using a text editor (Windows Notepad or vi, for example). You can then run the whole file: R executes the commands in sequence and shows you the results.

RStudio is an integrated environment for developing R programs. It combines access to the command line interpreter with an editor for creating code files, a file browser, a graphics browser that shows the graphs you have produced, and a data browser for inspecting data as it is created in your R session. RStudio also lets you integrate R with LaTeX or Markdown to produce documents that combine formatted text with the results of R programs. Many people find RStudio a very productive way to use R, and it is a very good tool to use while you are learning R.

The RStudio interface

When you open RStudio, you will initially see three frames, or panes. The main pane on the left shows the R command line interpreter, or console. This is where you will issue commands and see the results. At the top right you see your workspace and, on a second tab, the history. The workspace contains your data objects and the history lists the commands issued in your session. Below this are the file and graph browsers. You can browse files that might contain data or R code, and you can browse the graphs and charts that you produce in R.
Explain and use basic RStudio command line R statements

When the console is ready for you to enter a command, it shows a prompt:

    >

For our first R command we will enter a bit of simple arithmetic:

    > 2+2
    [1] 4
    >

R acts as a simple calculator here and returns the result of its calculation to the screen. But what is the [1] telling us? When R displays data it wraps the lines to fit the width of your window. The number in square brackets at the beginning of each line tells you the position of the first element shown on that line (the first, the seventeenth, and so on). It is a display aid only and is not part of the data. You can see this by entering the following commands:

    > x <- c(1:100)
    > x

which display

     [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
    [17]  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
    [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
    [49]  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
    [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
    [81]  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
    [97]  97  98  99 100

What does this code snippet mean? R uses the symbol <- to take the value on the right-hand side and assign it to the variable on the left-hand side. The symbol <- is read as "gets". In this case the right-hand side is c(1:100), which means combine the numbers from 1 to 100 into a vector. (The expression 1:100 already produces this vector on its own, so the call to c is optional here.) The c function is often read as "collect". Because R understands vectors, we can easily operate on all the elements of our collected values at once.
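As a minimal sketch of the ideas above, the following lines show assignment with <-, building a vector with c, and an operation applied to every element at once. The variable names v and w are only illustrative and are not used elsewhere in this course.

```r
# Assign the integers 1 to 100 to x; c(1:100) and 1:100 give the same vector
x <- c(1:100)
identical(x, 1:100)   # TRUE

# c() can also combine individual values into a vector (v is an illustrative name)
v <- c(2, 4, 6)

# Arithmetic is applied element by element
w <- v * 10
w                     # 20 40 60

# Square brackets select an element by its position
x[17]                 # 17
length(x)             # 100
```

Typing the name of an object on its own, as in the line `w`, prints its value in the same way that typing x did above.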
For example,

    > y <- x^2
    > y
      [1]     1     4     9    16    25    36    49    64    81   100   121
     [12]   144   169   196   225   256   289   324   361   400   441   484
     [23]   529   576   625   676   729   784   841   900   961  1024  1089
     [34]  1156  1225  1296  1369  1444  1521  1600  1681  1764  1849  1936
     [45]  2025  2116  2209  2304  2401  2500  2601  2704  2809  2916  3025
     [56]  3136  3249  3364  3481  3600  3721  3844  3969  4096  4225  4356
     [67]  4489  4624  4761  4900  5041  5184  5329  5476  5625  5776  5929
     [78]  6084  6241  6400  6561  6724  6889  7056  7225  7396  7569  7744
     [89]  7921  8100  8281  8464  8649  8836  9025  9216  9409  9604  9801
    [100] 10000

squares all the elements of x. (Notice that we use ^ for raising to a power.) Of course, you may not wish to create all your data objects by typing values at the command line. Next we will see how R lets you read data from external sources: files created in other applications such as Microsoft Excel.

Import datasets into the RStudio workspace from plain text and Excel file formats

You can import data into R from a plain text file. The file should contain data that can be understood as cases and variables. Most commonly, the file holds cases in rows and variables in columns. The columns should be separated by some character, often a tab or a comma.
Here is an example, just the first few lines of a file in which the columns are separated by commas:

    patient,gender,surname,age,postcode,income,smoker,hbefore,hafter
    735745,1,ROBBINS,32,KT13,46000,N,94.58,88.79
    1009009,2,MCGREGOR,33,KY1,58000,N,106.12,78.25
    845260,2,KUMAR,38,FY7,47000,Y,88.11,102.45
    780768,1,ALLINSON-HENRY,51,KA11,55000,N,83.62,63.82
    1176135,2,OLDER,44,DE23,28000,N,72.31,77.50
    1275041,1,BERRY,41,S80,28000,Y,84.37,68.21
    845772,1,SMITH,56,BT43,61000,N,76.95,81.47

It isn't very easy to understand set out like this, so let's line up the columns visually:

    patient gender        surname age postcode income smoker hbefore hafter
     735745      1        ROBBINS  32     KT13  46000      N   94.58  88.79
    1009009      2       MCGREGOR  33      KY1  58000      N  106.12  78.25
     845260      2          KUMAR  38      FY7  47000      Y   88.11 102.45
     780768      1 ALLINSON-HENRY  51     KA11  55000      N   83.62  63.82
    1176135      2          OLDER  44     DE23  28000      N   72.31  77.50
    1275041      1          BERRY  41      S80  28000      Y   84.37  68.21
     845772      1          SMITH  56     BT43  61000      N   76.95  81.47

We can see that this is a file with nine columns, or nine variables if you will. Some of the variables are numeric and some are character data. Because this is mixed data, numeric and text, we will read it into an R data frame. Here is the command:

    > dat <- read.csv("excsv.txt", header=TRUE, sep=",")

We have used the function read.csv because of the format of the data: it is separated by commas (sep=","). And we have told R that the first line contains headers (header=TRUE). Both settings are in fact the defaults for read.csv, so they are written out here only for clarity. The resulting object is a data frame: a two-dimensional, table-like object with variables in columns and observations, or cases, in rows. Internally R stores a data frame as a list of columns, which is what allows each column to hold a different type of data. You can read tab-separated data with the read.table command (using sep="\t"). You can also read spreadsheet data.
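The following sketch shows one way to try read.csv without needing the file excsv.txt on disk. It assumes nothing beyond base R: the first three data rows of the example are held in a string (csv_text is a hypothetical name chosen for this illustration) and passed through the text argument, which read.csv hands on to read.table.

```r
# A few rows of the example data, held in a string so the sketch is self-contained
csv_text <- "patient,gender,surname,age,postcode,income,smoker,hbefore,hafter
735745,1,ROBBINS,32,KT13,46000,N,94.58,88.79
1009009,2,MCGREGOR,33,KY1,58000,N,106.12,78.25
845260,2,KUMAR,38,FY7,47000,Y,88.11,102.45"

# text= makes read.csv parse the string instead of opening a file
dat <- read.csv(text = csv_text, header = TRUE, sep = ",")

class(dat)    # "data.frame"
typeof(dat)   # "list": a data frame is stored as a list of columns
dim(dat)      # 3 rows and 9 columns
str(dat)      # one line per variable, showing its type and first values
dat$age       # the age column as a vector: 32 33 38
```

With a real file, the only change would be to replace text = csv_text with the file name, as in the command shown earlier.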
If our example data were in an Excel spreadsheet instead of a text file, we could read it provided the package xlsx were installed in R. We will look at installing packages later, but the correct command would be:

    > library(xlsx)
    > dat <- read.xlsx("datafile.xlsx", sheetIndex=1, header=TRUE)

Here sheetIndex=1 tells read.xlsx to read the first worksheet; the function needs either a sheet index or a sheet name. You can see that it's very similar in form.

R help and documentation

Add on packages in R

Explain and use some R data structures, e.g. vectors, matrices, data frames

Perform basic data manipulation routines in R, e.g. label variables and factor levels; derive new variables from existing data; transform variables (e.g. use log transformation of skewed data); create ordinal from continuous data; test data for normality

Summary statistics and tables in R: location, dispersion, shape

<h1 id="simple-r-plots">Simple R plots</h1>
<h1 id="statistical-testing-in-r">Statistical testing in R</h1>
<h2 id="the-general-linear-model">the general linear model</h2>
<h3 id="the-t-test">the t-test</h3>
<h3 id="the-anova">the anova</h3>
<h3 id="simple-linear-regression">simple linear regression</h3>
<h3 id="multiple-regression">multiple regression</h3>
<h3 id="categorical-predictors">categorical predictors</h3>
<h2 id="tabulations-and-tests-of-independence">tabulations and tests of independence</h2>
<h1 id="basic-literate-programming-with-r-and-knitr-reproducible-analyses">Basic literate programming with R and Knitr: reproducible analyses</h1>