User:Timhowardriley


R programming language
R is a programming language for statistical computing, data visualization, and data analysis.

Mean -- a measure of center
A numeric data set may have a central tendency — where some of the most typical data points reside. The arithmetic mean (average) is the most commonly used measure of central tendency. The mean of a numeric data set is the sum of the data points divided by the number of data points.


 * Let $$x$$ = a list of data points.
 * Let $$n$$ = the number of data points.
 * Let $$\bar{x}$$ = the mean of a data set.
 * $$\bar{x} = \frac{x_1+x_2+\cdots +x_n}{n}$$

Suppose a sample of four observations of Celsius temperature measurements was taken 12 hours apart.


 * Let $$x$$ = a list of degrees Celsius data points of 30, 27, 31, 28.

This R computer program will output the mean of $$x$$:

Note: R can have the same identifier represent both a function name and its result. For more information, visit scope.

Output:

This R program will execute the native mean() function to output the mean of $$x$$:

Output:
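The two programs described above might look like the following minimal sketch. The object names `total`, `n`, and `my_mean` are illustrative assumptions, not taken from the original listing.

```r
# Celsius readings from the example above
x <- c(30, 27, 31, 28)

# Manual calculation: sum of the data points divided by their count
total <- sum(x)
n <- length(x)
my_mean <- total / n
print(my_mean)   # [1] 29

# The native mean() function gives the same result
print(mean(x))   # [1] 29
```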

Standard Deviation -- a measure of dispersion
The standard deviation of a numeric data set is an indication of the average distance of the data points from the mean. In a data set with a small amount of variation, each data point is close to the mean, so the standard deviation is small.


 * Let $$x$$ = a list of data points.
 * Let $$n$$ = the number of data points.
 * Let $$s$$ = the standard deviation of a data set.
 * $$s = \sqrt{\frac{\sum\left(x_i - \bar{x}\right)^2}{n - 1}}$$

Suppose a sample of four observations of Celsius temperature measurements was taken 12 hours apart.


 * Let $$x$$ = a list of degrees Celsius data points of 30, 27, 31, 28.

This R program will output the standard deviation of $$x$$:

Output:

This R program will execute the native sd() function to output the standard deviation of $$x$$:

Output:
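As a hedged sketch of the calculation above, the formula for $$s$$ can be written out step by step and compared with the native `sd()` function. The names `x_bar` and `s` mirror the symbols in the formula and are illustrative choices.

```r
# Celsius readings from the example above
x <- c(30, 27, 31, 28)

n <- length(x)
x_bar <- mean(x)

# Sample standard deviation: squared deviations summed, divided by n - 1
s <- sqrt(sum((x - x_bar)^2) / (n - 1))
print(s)       # [1] 1.825742

# The native sd() function uses the same n - 1 denominator
print(sd(x))   # [1] 1.825742
```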

Linear regression -- a measure of relation
A phenomenon may be the result of one or more observable events. For example, the phenomenon of skiing accidents may be the result of having snow in the mountains. A method to measure whether or not a numeric data set is related to another data set is linear regression.


 * Let $$x$$ = a data set of independent data points, in which each point occurred at a specific time.
 * Let $$y$$ = a data set of dependent data points, in which each point occurred at the same time as an independent data point.

If a linear relationship exists, then a scatter plot of the two data sets will show a pattern that resembles a straight line. If a straight line is embedded into the scatter plot such that the sum of the squared vertical distances from the points to the line is minimal, then the line is called a regression line. The equation of the regression line is called the regression equation.

The regression equation is a linear equation; therefore, it has a slope and y-intercept. The format of the regression equation is $$\hat{y} = b_{0} + b_{1}x$$.


 * Let $$b_{1}$$ = the slope of the regression equation.
 * $$b_{1} = \frac{\sum\left(x - \bar{x}\right)\left(y - \bar{y}\right)}{\sum\left(x - \bar{x}\right)^2}$$


 * Let $$b_{0}$$ = the y-intercept of the regression equation.
 * $$b_{0} = \bar{y} - b_{1}\bar{x}$$

Suppose a sample of four observations of Celsius temperature measurements was taken 12 hours apart. At each observation, the thermometer was switched to Fahrenheit and another measurement was taken.


 * Let $$x$$ = a list of degrees Celsius data points of 30, 27, 31, 28.
 * Let $$y$$ = a list of degrees Fahrenheit data points of 86.0, 80.6, 87.8, 82.4.

This R program will output the slope and y-intercept of a linear relationship in which $$y$$ depends upon $$x$$:

Output:

This R program will execute the native functions to output the slope and y-intercept:

Output:
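The slope and y-intercept formulas above can be sketched in R as follows, alongside the native `lm()` and `coef()` functions. The names `b0`, `b1`, and `model` are illustrative assumptions.

```r
# Paired readings from the example above
x <- c(30, 27, 31, 28)            # degrees Celsius
y <- c(86.0, 80.6, 87.8, 82.4)    # degrees Fahrenheit

# Slope: sum of cross-deviations divided by sum of squared x deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Y-intercept: mean of y minus slope times mean of x
b0 <- mean(y) - b1 * mean(x)
print(c(intercept = b0, slope = b1))   # approximately 32 and 1.8

# Native approach: fit a linear model and extract its coefficients
model <- lm(y ~ x)
print(coef(model))                     # approximately 32 and 1.8
```

The result matches the familiar conversion of Fahrenheit = 32 + 1.8 × Celsius.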

Coefficient of determination -- a percentage of variation
The coefficient of determination measures the proportion of variation in the dependent variable that is explained by the independent variable. It always lies between 0 and 1. A value of 0 indicates no linear relationship between the two data sets, and a value near 1 indicates the regression equation is extremely useful for making predictions.


 * Let $$\hat{y}$$ = the data set of predicted response data points when the independent data points are passed through the regression equation.
 * Let $$r^{2}$$ = the coefficient of determination in a relationship between an independent variable and a dependent variable.


 * $$r^{2} = \frac{\sum\left(\hat{y} - \bar{y}\right)^2}{\sum\left(y - \bar{y}\right)^2}$$

This R program will output the coefficient of determination of the linear relationship between $$x$$ and $$y$$:

Output:

This R program will execute the native functions to output the coefficient of determination:

Output:
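A minimal sketch of the formula above, reusing the Celsius and Fahrenheit readings. For the native check, this sketch squares the result of `cor()`, which equals the coefficient of determination when there is a single independent variable; the original program may have used a different native call.

```r
x <- c(30, 27, 31, 28)            # degrees Celsius
y <- c(86.0, 80.6, 87.8, 82.4)    # degrees Fahrenheit

# Regression coefficients, as in the linear regression section
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Predicted responses from the regression equation
y_hat <- b0 + b1 * x

# Explained variation divided by total variation
r_squared <- sum((y_hat - mean(y))^2) / sum((y - mean(y))^2)
print(r_squared)      # approximately 1: Fahrenheit is a linear function of Celsius

# Native check (assumption: squared correlation, valid for one predictor)
print(cor(x, y)^2)    # approximately 1
```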

Scatter plot
This R program will display a scatter plot with an embedded regression line and regression equation illustrating the relationship between $$x$$ and $$y$$:

Output:
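One possible way to draw such a plot with base R graphics is sketched below. The titles, axis labels, and the use of `legend()` to show the equation are assumptions made for illustration.

```r
x <- c(30, 27, 31, 28)            # degrees Celsius
y <- c(86.0, 80.6, 87.8, 82.4)    # degrees Fahrenheit

# Fit the regression line
model <- lm(y ~ x)
b <- coef(model)

# Scatter plot of the observations
plot(x, y,
     main = "Fahrenheit versus Celsius",
     xlab = "Degrees Celsius",
     ylab = "Degrees Fahrenheit")

# Embed the regression line
abline(model)

# Show the regression equation in a corner of the plot
equation <- sprintf("y = %.1f + %.1fx", b[1], b[2])
legend("topleft", legend = equation, bty = "n")
```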

Programming
R is an interpreted language, so programmers typically access it through a command-line interpreter. If a programmer types an expression such as 1+1 at the R command prompt and presses enter, the computer replies with the result, 2. Programmers also save R programs to a file and then execute them with the batch interpreter Rscript.

Object
R stores data inside an object. An object is assigned a name, which the computer program uses to set and retrieve a value. An object is created by placing its name to the left of the symbol-pair <-. The symbol-pair <- is called the assignment operator.

To create a named object and assign it an integer value:

Output:

The [1] displayed before the number is a subscript. It shows that the container for this integer is index one of an array.

Vector
The most primitive R object is the vector. A vector is a one-dimensional array of data. To assign multiple elements to the array, use the c() function to "combine" the elements. The elements must be the same data type. R lacks scalar data types, which are placeholders for a single word (usually an integer). Instead, a single integer is stored in the first element of an array. The single integer is retrieved using the index subscript of [1].

R program to store and retrieve a single integer:

Output:
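A minimal sketch of storing and retrieving a single integer. The object name `store` and the value 82 are hypothetical, chosen only for illustration.

```r
# Assign a single integer; the L suffix requests integer storage
store <- 82L

# Printing the object shows the [1] index prefix
print(store)           # [1] 82

# The same value is retrieved with the index subscript [1]
print(store[1])        # [1] 82

# The "single" integer is really a vector of length one
print(length(store))   # [1] 1
```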

Element-wise operation
When an operation is applied to a vector, R will apply the operation to each element in the array. This is called an element-wise operation.

This example creates an object and assigns it the integers 1 through 3. The object is displayed and then displayed again with one added to each element:

Output:

To achieve the many additions, R implements vector recycling. The numeral one following the plus sign is converted into an internal array of three ones. The + operation simultaneously loops through both arrays and performs the addition on each element pair. The results are stored in another internal array of three elements, which is returned to the print() function.
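The element-wise addition and recycling described above can be sketched as follows. The object name `x` is an assumption.

```r
# Integers 1 through 3
x <- 1:3
print(x)                 # [1] 1 2 3

# Adding a single number applies the addition to every element
print(x + 1)             # [1] 2 3 4

# Recycling makes this equivalent to adding a vector of three ones
print(x + c(1, 1, 1))    # [1] 2 3 4
```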

Numeric vector
A numeric vector is used to store integers and floating point numbers. The primary characteristic of a numeric vector is the ability to perform arithmetic on the elements.

Integer vector
By default, integers (numbers without a decimal point) are stored as floating point. To force integer memory allocation, append an L to the number. As an exception, the sequence operator : will, by default, allocate integer memory.

R program:

Output:

R program:

Output:

R program:

Output:

Double vector
A double vector stores real numbers, which are also known as floating point numbers. The memory allocation for a floating point number is double precision. Double precision is the default memory allocation for numbers with or without a decimal point.

R program:

Output:

R program:

Output:
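The storage rules for integer and double vectors can be checked with the native `typeof()` function. This is a minimal sketch; the literal values are arbitrary.

```r
# A number without a decimal point is still stored as a double
print(typeof(30))      # [1] "double"

# The L suffix forces integer storage
print(typeof(30L))     # [1] "integer"

# The sequence operator allocates integers by default
print(typeof(1:3))     # [1] "integer"

# A number with a decimal point is a double
print(typeof(30.5))    # [1] "double"
```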

Logical vector
A logical vector stores binary data: either TRUE or FALSE. The purpose of this vector is to store the result of a comparison. A logical datum is expressed as either TRUE, T, FALSE, or F. The capital letters are required, and no quotes surround the constants.

R program:

Output:

Two vectors may be compared using the following logical operators:
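As an illustration, the sketch below applies some of R's standard comparison operators to two small vectors. The names `a` and `b` and their values are hypothetical.

```r
a <- c(1, 2, 3)
b <- c(3, 2, 1)

# Each comparison is element-wise and returns a logical vector
print(a == b)   # FALSE TRUE FALSE  (equal to)
print(a != b)   # TRUE FALSE TRUE   (not equal to)
print(a < b)    # TRUE FALSE FALSE  (less than)
print(a >= b)   # FALSE TRUE TRUE   (greater than or equal to)
```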

Character vector
A character vector stores character strings. Strings are created by surrounding text in double quotation marks.

R program:

Output:

R program:

Output:

Factor
A factor is a vector that stores a categorical variable. The factor() function converts a text string into an enumerated type, which is stored as an integer.

In experimental design, a factor is an independent variable to test (an input) in a controlled experiment. A controlled experiment is used to establish causation, not just association. For example, one could notice that an increase in hot chocolate sales is associated with an increase in skiing accidents.

An experimental unit is an item that an experiment is being performed upon. If the experimental unit is a person, then it is known as a subject. A response variable (also known as a dependent variable) is a possible outcome from an experiment. A factor level is a characteristic of a factor. A treatment is an environment consisting of a combination of one level (characteristic) from each of the input factors. A replicate is the execution of a treatment on an experimental unit and yields response variables.

This example builds two R programs to model an experiment to increase the growth of a species of cactus. Two factors are tested:
 * 1) water levels of none, light, or medium
 * 2) superabsorbent polymer levels of not used or used

R program to set up the design:

Output:

R program to store and display the results:

Output:
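One way to sketch the design step is to build a factor for each input and let the native `expand.grid()` function enumerate the treatments. The object names are assumptions, and no growth measurements are invented here.

```r
# Factor 1: water level, with an explicit ordering of the levels
water <- factor(c("none", "light", "medium"),
                levels = c("none", "light", "medium"))

# Factor 2: superabsorbent polymer
polymer <- factor(c("not used", "used"),
                  levels = c("not used", "used"))

# Each level is stored internally as an integer code
print(levels(water))
print(as.integer(water))   # [1] 1 2 3

# A treatment is one level from each factor: 3 x 2 = 6 combinations
design <- expand.grid(water = water, polymer = polymer)
print(design)
print(nrow(design))        # [1] 6
```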

Data frame
A data frame stores a two-dimensional array. The horizontal dimension is a list of vectors. The vertical dimension is a list of rows. It is the most useful structure for data analysis. Data frames are created using the data.frame() function. The input is a list of vectors (of any data type). Each vector becomes a column in a table. The elements in each vector are aligned to form the rows in the table.

R program:

Output:

Data frames can be deconstructed by providing a vector's name between double brackets. This returns the original vector. Each element in the returned vector can be accessed by its index number.

R program to extract the word "world". It is stored in the second element of the "string" vector:

Output:
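A minimal sketch of building a data frame and extracting one element. The column name `string` and the word "world" come from the text above; the `integer` column and the remaining values are assumptions.

```r
# Two vectors of equal length become two columns
df <- data.frame(integer = c(1L, 2L),
                 string = c("hello", "world"),
                 stringsAsFactors = FALSE)
print(df)

# Double brackets return the original vector for a column
words <- df[["string"]]

# Index the second element of that vector
print(words[2])   # [1] "world"
```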

Vectorized coding
Vectorized coding is a method to produce quality R computer programs that take advantage of R's strengths. The R language is designed to be fast at logical testing, subsetting, and element-wise execution. On the other hand, R does not have a fast for loop. For example, R can search-and-replace faster using logical vectors than by using a for loop.

For loop
A for loop repeats a block of code for a specific number of iterations.

Example to search-and-replace using a for loop:

Output:

Subsetting
R's syntax allows for a logical vector to be used as an index to a vector. This method is called subsetting.

R example:

Output:

Change a value using an index number
R allows the assignment operator <- to overwrite an existing value in a vector by using an index number.

R example:

Output:

Change a value using subsetting
R also allows the assignment operator <- to overwrite an existing value in a vector by using a logical vector.

R example:

Output:

Vectorized code to search-and-replace
Because a logical vector may be used as an index, and because a logical operator returns a logical vector, a search-and-replace can take place without a for loop.

R example:

Output:
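The contrast between the loop and the vectorized approach can be sketched as follows. The data values and the names `looped` and `vectorized` are hypothetical.

```r
x <- c(3, 7, 3, 9)

# Loop version: visit each element and replace every 3 with 0
looped <- x
for (i in seq_along(looped)) {
  if (looped[i] == 3) {
    looped[i] <- 0
  }
}

# Vectorized version: the comparison yields a logical vector,
# which is then used as the index for the assignment
vectorized <- x
vectorized[vectorized == 3] <- 0

print(looped)       # [1] 0 7 0 9
print(vectorized)   # [1] 0 7 0 9
```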

Functions
A function is an object that stores computer code instead of data. The purpose of storing code inside a function is to be able to reuse it in another context.

Native functions
R comes with over 1,000 native functions to perform common tasks. To execute a function:
 * 1) type in the function's name
 * 2) type in an open parenthesis
 * 3) type in the data to be processed
 * 4) type in a close parenthesis

This example rolls a die one time. The native function's name is sample(). The data to be processed are:
 * 1) a numeric integer vector from one to six
 * 2) the size parameter instructs sample() to execute the roll one time

Possible output:

The R interpreter provides a help screen for each native function. The help screen is displayed after typing in a question mark followed by the function's name:

Partial output:

Function parameters
The sample() function has four input parameters available. Input parameters are pieces of information that control the function's behavior. Input parameters may be communicated to the function in a combination of three ways:
 * 1) by position separated with commas
 * 2) by name separated with commas and the equal sign
 * 3) left empty

For example, each of these calls to sample() will roll a die one time:

Every input parameter has a name. If a function has many parameters, setting them by name will make the source code more readable. If the parameter's name is omitted, R will match the data in position order. Usually, parameters that are rarely used will have a default value and may be omitted.
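The three ways of passing parameters can be sketched with the native `sample()` function, whose parameters are `x`, `size`, `replace`, and `prob`. The object names `roll1` through `roll4` are assumptions.

```r
# By position: x first, size second
roll1 <- sample(1:6, 1)

# By name: the order no longer matters
roll2 <- sample(size = 1, x = 1:6)

# Mixed: first by position, second by name
roll3 <- sample(1:6, size = 1)

# replace and prob were left empty above, so their defaults apply;
# here replace is given explicitly with its default value
roll4 <- sample(1:6, size = 1, replace = FALSE)

print(c(roll1, roll2, roll3, roll4))   # four values, each from 1 to 6
```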

Data coupling
The output from a function may become the input to another function. This is the basis for data coupling.

This example executes the function sample() and sends the result to the function sum(). It simulates the roll of two dice and adds them up.

Possible output:
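A minimal sketch of this coupling, assuming `replace = TRUE` so that both dice can show the same face. The name `total` is illustrative.

```r
# sample() produces two rolls; sum() receives them as its input
total <- sum(sample(1:6, size = 2, replace = TRUE))
print(total)   # a value from 2 to 12
```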

Functions as parameters
A function has parameters typically to input data. Alternatively, a function (A) can use a parameter to input another function (B). Function (A) will assume responsibility to execute function (B).

For example, the function replicate() has an input parameter that is a placeholder for another function. This example will execute replicate() once, and replicate() will execute sample() five times. It will simulate rolling a die five times:

Possible output:
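One way to sketch this uses the native `replicate()` function, which re-evaluates the expression it is given the requested number of times. The name `rolls` is an assumption.

```r
# replicate() runs once and evaluates the sample() call five times
rolls <- replicate(5, sample(1:6, size = 1))
print(rolls)   # five values, each from 1 to 6
```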

Uniform distribution
Because each face of a die is equally likely to appear on top, rolling a die many times generates the uniform distribution. This example displays a histogram of a die rolled 10,000 times:

The output is likely to have a flat top:
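A hedged sketch of such a histogram using base R graphics. The break points, title, and label are assumptions chosen so that each face gets its own bar.

```r
# Roll one die 10,000 times
rolls <- sample(1:6, size = 10000, replace = TRUE)

# One bar per face: breaks fall halfway between the integers
hist(rolls,
     breaks = seq(0.5, 6.5, by = 1),
     main = "10,000 rolls of one die",
     xlab = "Face")
```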

Central limit theorem
A numeric data set may or may not have a central tendency. Nonetheless, a data set made up of the arithmetic means of many samples will have a central tendency that converges to the population's mean. The arithmetic mean of a sample is called the sample mean. By the central limit theorem, with a commonly used rule of thumb of a sample size of 30 or more, the distribution of the sample mean ($$\bar{x}$$) is approximately normal, regardless of the distribution of the variable under consideration ($$x$$). A histogram of the sample means will therefore resemble a bell-shaped curve.

For example, rolling one die many times generates the uniform distribution. Nonetheless, rolling 30 dice and calculating each average ($$\bar{x}$$) over and over again generates a normal distribution.

R program to roll 30 dice 10,000 times and plot the frequency of averages:

The output is likely to have a bell shape:
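The experiment above might be sketched as follows. The name `averages` and the plot labels are assumptions.

```r
# Roll 30 dice and take their average; repeat 10,000 times
averages <- replicate(10000, mean(sample(1:6, size = 30, replace = TRUE)))

# The averages cluster around the population mean of 3.5
print(mean(averages))

hist(averages,
     main = "Averages of 30 dice, 10,000 repetitions",
     xlab = "Sample mean")
```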

Programmer-created functions
To create a function object, execute the function() statement and assign the result to a name. A function receives input both from global variables and from input parameters (often called arguments). Objects created within the function body remain local to the function.

R program to create a function:

Usage output:

Function arguments are passed in by value.
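Pass-by-value can be illustrated with a small sketch: the function changes its own copy of the argument, and the caller's object is left untouched. The names `add_one`, `original`, and `changed` are hypothetical.

```r
# Create a function and assign it to a name
add_one <- function(v) {
  v[1] <- v[1] + 1   # modifies only the local copy of v
  return(v)
}

original <- c(10, 20)
changed <- add_one(original)

print(original)   # [1] 10 20
print(changed)    # [1] 11 20
```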

Generic functions
R supports generic functions, a form of polymorphism. Generic functions act differently depending on the class of the argument passed in. The process is to dispatch the method specific to the class. A common implementation is R's print() function. It can print almost every class of object. For example, print() dispatches a different method for a data frame than for a numeric vector.
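Method dispatch can be sketched by defining a `print()` method for a made-up class. The class name `temperature` and the object `reading` are hypothetical and exist only for this illustration.

```r
# A print method for objects of the hypothetical class "temperature"
print.temperature <- function(x, ...) {
  cat(unclass(x), "degrees Celsius\n")
  invisible(x)
}

# Attach the class to an ordinary number
reading <- structure(30, class = "temperature")

# print() dispatches to print.temperature() because of the class
print(reading)            # 30 degrees Celsius

# Without the class, the default method is used
print(unclass(reading))   # [1] 30
```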

If statements
R program illustrating if statements:

Output:
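A minimal sketch of an if statement with else branches. The thresholds, labels, and object names are hypothetical.

```r
temperature <- 28

# Only the first branch whose condition is TRUE is executed
if (temperature > 30) {
  label <- "hot"
} else if (temperature >= 20) {
  label <- "warm"
} else {
  label <- "cold"
}

print(label)   # [1] "warm"
```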

Programming shortcuts
R provides three notable shortcuts available to programmers.

Omit the print function
If an object is present on a line by itself, then the interpreter will send the object to the print() function.

R example:

Output:

Omit the return statement
If a programmer-created function omits the return() statement, then the interpreter will return the value of the last evaluated expression.

R example:

Usage output:

Alternate assignment operator
The symbol-pair <- assigns a value to an object. Alternatively, = may be used as the assignment operator. However, care must be taken because = closely resembles the logical operator for equality, which is ==.

R example:

Output:
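The three shortcuts can be seen together in one short sketch. The names `x`, `square`, and `y` are assumptions.

```r
# 1) Omit print(): an object on a line by itself is printed
x <- 5
x                    # [1] 5

# 2) Omit return(): the last evaluated expression is the result
square <- function(v) {
  v^2
}
print(square(4))     # [1] 16

# 3) Alternate assignment operator: = assigns, == compares
y = 10
print(y == 10)       # [1] TRUE
```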

Normal distribution
If a numeric data set has a central tendency, it also may have a symmetric looking histogram — a shape that resembles a bell. If a data set has an approximately bell-shaped histogram, it is said to have a normal distribution.

Chest size of Scottish militiamen data set
In 1817, a Scottish army contractor measured the chest sizes of 5,732 members of a militia unit. The frequency of each size was:

Create a comma-separated values file
R has the write.csv() function to convert a data frame into a CSV file.

R program to create chestsize.csv:

Import a data set
The first step in data science is to import a data set.

R program to import chestsize.csv into a data frame:

Output:
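The export and import steps can be sketched as a round trip with the native `write.csv()` and `read.csv()` functions. The toy values, the column names `chest` and `count`, and the temporary file path are assumptions; they are not the militia data.

```r
# Toy frequency table (illustrative values only)
toy <- data.frame(chest = c(38L, 39L, 40L),
                  count = c(2L, 3L, 1L))

# Write the data frame to a CSV file in a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, file = csv_path, row.names = FALSE)

# Import the CSV file back into a data frame
imported <- read.csv(csv_path)
print(imported)
```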

Transform a data set
The second step in data science is to transform the data into a format that the functions expect. The chest-size data set is summarized to frequency; however, R's normal distribution functions require a numeric double vector.

R function to convert a data frame summarized by frequency into a vector:

R has the source() function to include another R source file in the current program.

R program to load and display a summary of the 5,732-member data set:

Output:
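One way to sketch the transformation is with the native `rep()` function, which repeats each value by its frequency. The function name `freq_to_vector`, the column names, and the toy values are hypothetical; the real chest-size frequencies are not reproduced here.

```r
# Expand a frequency table into one element per observation
freq_to_vector <- function(df, value_col, count_col) {
  as.numeric(rep(df[[value_col]], times = df[[count_col]]))
}

# Toy frequency table (illustrative values only)
toy <- data.frame(chest = c(38, 39, 40),
                  count = c(2, 3, 1))

sizes <- freq_to_vector(toy, "chest", "count")
print(sizes)            # [1] 38 38 39 39 39 40
print(summary(sizes))
```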

Visualize a data set
The third step in data science is to visualize the data set. If a histogram of a data set resembles a bell shape, then the data set is approximately normally distributed.

R program to display a histogram of the data set:

Output:

Standardized variable
Any variable ($$x_i$$) in a data set can be converted into a standardized variable ($$z_i$$). The standardized variable is also known as a z-score. To calculate the z-score, subtract the mean and divide by the standard deviation.


 * Let $$x$$ = a set of data points.
 * Let $$\bar{x}$$ = the mean of the data set.
 * Let $$\sigma$$ = the standard deviation of the data set.
 * Let $$x_i$$ = the $$i^{th}$$ element in the set.
 * Let $$z_i$$ = the z-score of the $$i^{th}$$ element in the set.
 * $$z_i = \frac{x_i - \bar{x}}{\sigma}$$

R function to convert a measurement to a z-score:

R program to convert a chest size measurement of 38 to a z-score:

Output:

R program to convert a chest size measurement of 42 to a z-score:

Output:

Standardized data set
A standardized data set is a data set in which each member of an input data set was run through the z-score function.

R function to convert a numeric vector into a z-score vector:

Standardized chest size data set
R program to standardize the chest size data set:

Output:
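The z-score formula above can be sketched as a short function. The function name `z_score` and the toy data are hypothetical, and the sketch uses the native `sd()` function, which computes the sample standard deviation with an $$n - 1$$ denominator.

```r
# Convert one measurement (or a whole vector) to z-scores
z_score <- function(x_i, x) {
  (x_i - mean(x)) / sd(x)
}

# Toy data (illustrative values only, not the militia data)
toy <- c(38, 38, 39, 39, 39, 40)

# z-score of a single measurement
print(z_score(38, toy))

# Because the arithmetic is element-wise, the same function
# standardizes the entire data set in one call
z_all <- z_score(toy, toy)
print(z_all)

# A standardized data set has mean 0 and standard deviation 1
print(mean(z_all))
print(sd(z_all))
```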



Standard normal curve
A histogram of a normally distributed data set that is converted to its standardized data set also resembles a bell-shaped curve. The curve is called the standard normal curve or the z-curve. The four basic properties of the z-curve are:


 * 1) The total area under the curve is 1.
 * 2) The curve extends indefinitely to the left and right. It never touches the horizontal axis.
 * 3) The curve is symmetric and centered at 0.
 * 4) Almost all of the area under the curve lies between -3 and 3.

Area under the standard normal curve
The probability that a future measurement will fall within a designated range is equal to the area under the standard normal curve between the z-scores of the range's two endpoints.

For example, suppose the Scottish militia's quartermaster wanted to stock up on uniforms. What is the probability that the next recruit will need a size between 38 and 42?

R program:

Output:

The pnorm() function can compute the probability for a range without first calculating the z-scores.

R program:

Output:
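Both approaches can be sketched with the native `pnorm()` function. The mean of 40 and standard deviation of 2 used here are placeholder values for illustration, not the statistics of the militia data set.

```r
# Placeholder parameters (assumptions, not the real data set's values)
m <- 40
s <- 2

# Approach 1: convert the endpoints to z-scores, then take the
# area under the standard normal curve between them
z_low <- (38 - m) / s
z_high <- (42 - m) / s
p_from_z <- pnorm(z_high) - pnorm(z_low)
print(p_from_z)   # [1] 0.6826895

# Approach 2: give pnorm() the mean and standard deviation directly
p_direct <- pnorm(42, mean = m, sd = s) - pnorm(38, mean = m, sd = s)
print(p_direct)   # [1] 0.6826895
```

With these placeholder values the endpoints sit one standard deviation either side of the mean, so the result is the familiar figure of about 68 percent.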



XMLHttpRequest
XMLHttpRequest is a JavaScript class containing methods to asynchronously transmit HTTP requests from a web browser to a web server. The methods allow a browser-based application to make a fine-grained server call and store the result in the XMLHttpRequest responseText attribute. The XMLHttpRequest class is a component of Ajax programming. Without Ajax, the "Submit" button sends an entire HTML form to the server, and the server responds by returning an entire HTML page to the browser.

Constructor
Generating an asynchronous request to the web server first requires instantiating (allocating the memory of) the XMLHttpRequest object. The allocated memory is assigned to a variable. The programming statement in JavaScript to instantiate a new object is new. The new statement is followed by the constructor function of the object. The custom among object-oriented language developers is to give the constructor function the same name as the class. In this case, the class name is XMLHttpRequest. To instantiate a new XMLHttpRequest and assign it to a variable:

The open method
The open method prepares the XMLHttpRequest. It can accept up to five parameters, but requires only the first two.


 * RequestMethod: The HTTP request method may be GET for smaller quantities of data. Among the other request methods available, POST will handle substantial quantities of data. After the return string is received, then send the   request method to   to free the XMLHttpRequest memory. If   is sent, then the SubmitURL parameter may be.


 * SubmitURL: The SubmitURL is a URL containing the execution filename and any parameters that get submitted to the web server. If the URL contains the host name, it must be the web server that sent the HTML document. Ajax supports the same-origin policy.
 * AsynchronousBoolean: If supplied, it should be set to true. If set to false, then the browser will wait until the return string is received. Programmers are discouraged from setting AsynchronousBoolean to false, and browsers may raise an exception if they do.
 * UserName: If supplied, it will help authenticate the user.
 * Password: If supplied, it will help authenticate the user.

The setRequestHeader method
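If the request method of POST is invoked, then the additional step of sending the media type of Content-Type: application/x-www-form-urlencoded is required. The setRequestHeader method allows the program to send this or other HTTP headers to the web server. Its usage is setRequestHeader(HeaderField, HeaderValue). To enable the POST request method: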

The send method
If the request method of POST is invoked, then the web server expects the form data to be read from the standard input stream. To send the form data to the web server, execute send(FormData), where FormData is a text string. If the request method of GET is invoked, then the web server expects only the default headers. To send the default headers, execute send(null).

The onreadystatechange event listener
onreadystatechange is a callback method that is periodically executed throughout the Ajax lifecycle. To set a named callback method, assign the function's name to the onreadystatechange attribute. For convenience, the syntax allows for an anonymous method to be defined. To define an anonymous callback method:

The XMLHttpRequest lifecycle progresses through several stages, from 0 to 4. Stage 0 is before the open() method is invoked, and stage 4 is when the text string has arrived. To monitor the lifecycle, XMLHttpRequest has the readyState attribute available. Stages 1-3 are ambiguous and interpretations vary across browsers. Nonetheless, one interpretation is:
 * Stage 0: Uninitialized
 * Stage 1: Loading
 * Stage 2: Loaded
 * Stage 3: Interactive
 * Stage 4: Completed

When readyState reaches 4, the text string has arrived and is set in the responseText attribute.

Linux examples
Upon request, the browser will execute a JavaScript function to transmit a request for the web server to execute a computer program. The computer program may be the PHP interpreter, another interpreter, or a compiled executable. In any case, the JavaScript function expects a text string to be transmitted back and stored in the responseText attribute.

To create an example JavaScript function:


 * Edit a file named :

PHP example
PHP is a scripting language designed specifically to interface with HTML. Because the PHP engine is an interpreter – interpreting program statements as they are read – there are programming limitations and performance costs. Nonetheless, its simplicity may place the XMLHttpRequest set of files in the same working directory – probably.

PHP server component
The server component of a PHP XMLHttpRequest is a file located on the server that does not get transmitted to the browser. Instead, the PHP interpreter will open this file and read in its PHP instructions. The XMLHttpRequest protocol requires an instruction to output a text string.


 * Edit a file named :

PHP browser component
The browser component of a PHP XMLHttpRequest is a file that gets transmitted to the browser. The browser will open this file and read in its HTML instructions.


 * Edit a file named :


 * 1) Point your browser to
 * 2) Type in your name.
 * 3) Press

CGI example
The Common Gateway Interface (CGI) process allows a browser to request the web server to execute a compiled computer program.

CGI server component
The server component of a CGI XMLHttpRequest is an executable file located on the server. The operating system will open this file and read in its machine instructions. The XMLHttpRequest protocol requires an instruction to output a text string.

Compiled programs have two files: the source code and a corresponding executable.


 * Edit a file named :


 * Compile the source code to create the executable:

CGI browser component
The CGI browser component is the same as the PHP browser component, except for a slight change in the. The syntax to tell the web server to execute an executable is  followed by the filename. For security, the executable must reside in a chroot jail. In this case, the jail is the directory.


 * Edit a file named :


 * 1) Point your browser to
 * 2) Type in your name.
 * 3) Press