Learning Objectives

Having finished this chapter, you should be able to:

- Interact with an RMarkdown notebook in RStudio
- Describe the difference between a variable and a function
- Describe the different types of variables
- Create a vector or data frame and access its elements
- Install and load an R library
- Load data from a file and view the data frame

This chapter is the first of several distributed throughout the book that will introduce you to increasingly sophisticated things that you can do using the R programming language. The name “R” is a play on the names of the two authors of the software package (Ross Ihaka and Robert Gentleman) as well as an homage to an older statistical software package called “S”. R has become one of the most popular programming languages for statistical analysis and “data science”. Unlike general-purpose programming languages such as Python or Java, R is purpose-built for statistics. That doesn’t mean that you can’t do more general things with it, but the place where it really shines is in data analysis and statistics.

## Reading Questions - R Intro

To check your answers put them in the appropriate box and click the 'Check' button. Every checker box can do arithmetic and calculate standard functions (see calculator help). If you give decimal answers, give them to at least 3 decimal places.

As you work you should have pencil and paper handy for calculations and thinking!

Note: some questions ask for a formula. For the checker we ask you to plug a value into the formula. For your pset you still need to give the whole formula.

At this point you should have installed R and RStudio. If not, you should do that now.

The first thing to do with R is to make sure you can start it; then we will do some simple calculations. Give your answers to at least 2 decimal places of accuracy. Go ahead and start RStudio. You should see a window with 4 panes. The **command prompt** (the >) is in the bottom-left pane.

## R as a programming environment

R is a programming environment for statistical computing and graphics. It:

- serves as a data analysis and storage facility
- is designed to perform operations on vectors and matrices
- uses a well-developed but simple programming language (a dialect of the S language)
- allows for rapid development of new tools according to user demand

These tools are distributed as packages, which any user can download to customize the R environment.

## Saving your code

When you analyze your own data, we strongly recommend that you keep a record of all commands used, along with copious notes, so that weeks or years later you can retrace the steps of your earlier analysis.

In RStudio, you can create a text file (sometimes called a script), which contains R commands that can be reloaded and used at a later date. Under the menu at the top, choose “File”, then “New File”, and then “R Script”. This will create a new section in RStudio with the temporary name “Untitled1” (or similar). You can copy and paste any commands that you want from the Console, or type directly here. (When you copy and paste, it’s better to not include the > prompt in the script.)

If you want to keep this script for later, just hit Save under the File menu. In the future you can open this file in all the normal ways to have those commands available for use again.

It is best to type all your commands in the script window and run them from there, rather than typing directly into the console. This lets you save a record of your session so that you can more easily re-create what you have done later.

## Conditional Statements

There is yet another way to combine two statements. Suppose we have in mind a specific integer *a*. Consider the following statement about *a*.

*R* : If the integer a is a multiple of 6, then a is divisible by 2.

We immediately spot this as a true statement based on our knowledge of integers and the meanings of the words “if” and “then.” If integer a is a multiple of 6, then a is even, so therefore a is divisible by 2. Notice that *R* is built up from two simpler statements:

P : The integer a is a multiple of 6.

Q : The integer a is divisible by 2.

R : If P, then Q.

In general, given any two statements *P* and *Q* whatsoever, we can form the new statement “*If P, then Q*.” This is written symbolically as *P* ⇒ *Q* which we read as “*If P, then Q*,” or “*P implies Q*.” Like ∧ and ∨, the symbol ⇒ has a very specific meaning. When we assert that the statement *P* ⇒ *Q* is true, we mean that *if* *P* is true *then* *Q* must also be true. (In other words we mean that the condition *P* being true forces *Q* to be true.) A statement of form *P* ⇒ *Q* is called a **conditional** statement because it means *Q* will be true *under the condition* that *P* is true.

You can think of *P* ⇒ *Q* as being a promise that whenever *P* is true, *Q* will be true also. There is only one way this promise can be broken (i.e. be false) and that is if *P* is true but *Q* is false. Thus the truth table for the promise *P* ⇒ *Q* is as follows:

P | Q | P ⇒ Q |
---|---|---|
T | T | T |
T | F | F |
F | T | T |
F | F | T |
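
The truth table can also be checked mechanically. The following is a minimal Python sketch, not part of the original text; the helper name `implies` is chosen only for this illustration. It encodes *P* ⇒ *Q* as "(not *P*) or *Q*", which is false only when *P* is true and *Q* is false.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


# Enumerate the four rows in the same order as the table: TT, TF, FT, FF.
rows = [(p, q, implies(p, q)) for p, q in product([True, False], repeat=2)]

for p, q, result in rows:
    print(p, q, result)
```

Only the second row (true hypothesis, false conclusion) produces `False`, matching the single way the promise can be broken.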

Perhaps you are bothered by the fact that *P* ⇒ *Q* is true in the last two lines of this table. Here’s an example to convince you that the table is correct. Suppose your professor makes the following promise:

**If** you pass the final exam, **then** you will pass the course.

Your professor is making the promise

(You pass the exam) ⇒ (You pass the course).

Under what circumstances did she lie? There are four possible scenarios, depending on whether or not you passed the exam and whether or not you passed the course. These scenarios are tallied in the following table.

You pass exam | You pass course | (You pass exam) ⇒ (You pass course) |
---|---|---|
T | T | T |
T | F | F |
F | T | T |
F | F | T |

The first line describes the scenario where you pass the exam and you pass the course. Clearly the professor kept her promise, so we put a *T* in the third column to indicate that she told the truth. In the second line, you passed the exam, but your professor gave you a failing grade in the course. In this case she broke her promise, and the *F* in the third column indicates that what she said was untrue.

Now consider the third row. In this scenario you failed the exam but still passed the course. How could that happen? Maybe your professor felt sorry for you. But that doesn’t make her a liar. Her only promise was that if you passed the exam then you would pass the course. She did not say passing the exam was the **only way** to pass the course. Since she didn’t lie, then she told the truth, so there is a *T* in the third column.

Finally, look at the fourth row. In that scenario you failed the exam and you failed the course. Your professor did not lie; she did exactly what she said she would do. Hence the *T* in the third column.

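In mathematics, whenever we encounter the construction “*If P, then Q*” it means exactly what the truth table for ⇒ expresses. But of course there are other grammatical constructions that also mean *P* ⇒ *Q*. Here is a summary of the main ones.

- *P is a sufficient condition for Q.*
- *For Q, it is sufficient that P.*
- *Q is a necessary condition for P.*
- *P only if Q.*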

These can all be used in the place of (and mean exactly the same thing as) “*If P, then Q*.” You should analyze the meaning of each one and convince yourself that it captures the meaning of *P* ⇒ *Q*. For example, *P* ⇒ *Q* means the condition of *P* being true is enough (i.e., sufficient) to make *Q* true; hence “*P is a sufficient condition for Q*.”

The wording can be tricky. Often an everyday situation involving a conditional statement can help clarify it. For example, consider your professor’s promise:

(You pass the exam) ⇒ (You pass the course)

This means that your passing the exam is a sufficient (though perhaps not necessary) condition for your passing the course. Thus your professor might just as well have phrased her promise in one of the following ways.

Passing the exam is a sufficient condition for passing the course.

For you to pass the course, it is sufficient that you pass the exam.

However, when we want to say “*If P, then Q*” in everyday conversation, we do not normally express this as “*Q is a necessary condition for P*” or “*P only if Q*.” But such constructions are not uncommon in mathematics. To understand why they make sense, notice that *P* ⇒ *Q* being true means that it’s impossible that *P* is true but *Q* is false, so in order for *P* to be true it is necessary that *Q* is true; hence “*Q is a necessary condition for P*.” And this means that *P* can only be true if *Q* is true, i.e., “*P only if Q*.”

## Study Guide :: Unit 3

In Unit 2, given a probability experiment, you assigned probabilities to various events based on probability concepts and rules. In Unit 3, you will use specific *probability distributions* to compute probabilities for various events.

You will begin by discussing probability distributions at a general level. A probability distribution describes a list of all possible outcomes for an experiment, along with the probabilities of each of these outcomes. Each outcome is described as specific values of a *random variable*. A random variable (*x*) represents a numerical value associated with each outcome of a probability experiment. After you construct a probability distribution, you can compute the mean or expected value, variance, and standard deviation of the probability distribution.
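
As an illustration of these computations, here is a minimal Python sketch. The distribution (number of heads in two fair coin tosses) and all variable names are assumptions made for this example and do not come from the e-text. It uses the standard definitions: the mean is the sum of x·P(x), and the variance is the sum of (x − mean)²·P(x).

```python
import math

# Hypothetical distribution: number of heads in two fair coin tosses.
# Keys are values x of the random variable; values are probabilities P(x).
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# A valid probability distribution has probabilities that sum to 1.
total_probability = sum(distribution.values())

# Mean (expected value): sum of x * P(x).
mean = sum(x * p for x, p in distribution.items())

# Variance: sum of (x - mean)^2 * P(x).
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(total_probability, mean, variance, std_dev)
```

For this distribution the mean is 1 head and the variance is 0.5, so the standard deviation is about 0.71 heads.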

Once you understand the notion of a probability distribution in a general sense, you will examine *discrete probability distributions*. These distributions involve *discrete random variables*. Discrete random variables can assume only certain distinct values, typically determined through a counting process. In this unit, you will study the *discrete binomial probability distribution*.

The most common probability distribution that statisticians deal with is a continuous probability distribution called the *normal probability distribution*. Such a distribution is also called a bell curve or a mound-shaped curve, terms that describe the shape of the graphical representation of the probability distribution: a smooth, bell-shaped curve that is symmetrical around the mean of the distribution.

The exact shape of the normal curve, and therefore the probability distribution, is determined by the *mean* and the *standard deviation* of the distribution. A normal distribution with a mean of *zero* and a standard deviation of *one unit* is called a *standard normal probability distribution*. Any normal distribution can be transformed into a standard normal distribution using a transformation formula that converts statistical observations into the standardized values of a standard normal distribution. This transformation will enable you to compute probabilities using the Standard Normal Distribution table of probabilities (Table 4, at the back of your textbook).
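
The transformation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method prescribed by the course: the function names and the example values (mean 100, standard deviation 15) are hypothetical, and the cumulative probability is computed from the error function rather than read from a printed table. The standardizing formula used is z = (x − mean) / standard deviation.

```python
import math


def z_score(x: float, mean: float, std_dev: float) -> float:
    """Convert an observation x into a standardized value z."""
    return (x - mean) / std_dev


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical normal distribution with mean 100 and standard deviation 15.
z = z_score(115.0, mean=100.0, std_dev=15.0)
p_below = standard_normal_cdf(z)

print(z, round(p_below, 4))
```

Here an observation of 115 is one standard deviation above the mean, and the area to its left is approximately 0.8413, which is the kind of value a standard normal table lists for z = 1.00.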

When you have completed this unit, you will be ready to study topics of inferential statistics, in which you will make statements about population parameters based on sample statistics.

### Probability Distributions

##### Learning Objectives

After completing the readings and exercises assigned for this topic, you should be able to:

- Explain the meaning of the key terms:
  - discrete probability distribution
  - standard deviation of a discrete probability distribution
  - mean (expected value)
  - random variables
  - discrete random variables
  - continuous random variables
  - variance

- Given a probability experiment, construct a discrete probability distribution in table and graph format.
- Given a discrete probability distribution, compute the mean, variance, and standard deviation of this distribution.
- Compute the expected value of a discrete probability distribution. Interpret your results in terms of the context of the problem.

**Important Note**: For help accessing the e-text resources referred to below, see the navigation notes under eText on the course home page.

##### Required Reading

*Elementary Statistics*, Chapter 4, Section 4.1 Probability Distributions (pages 190-196)

##### Try It Yourself Examples

Work through each Try It Yourself example in this section of the e-textbook. Check your work against the solutions provided.

##### Exercises in Your e-Textbook

Do the following exercises in your e-textbook:

Chapter 4, Section 4.1 Exercises 5, 13, 15, 25, 27, 29, 31, 37 (pages 197-200). Write out the step-by-step solutions or explanations. Check your work against the solutions provided.

### Binomial Distributions

##### Learning Objectives

After completing the readings and exercises assigned for this topic, you should be able to:

- Explain the meaning of the key terms:
  - binomial experiment
  - mean and standard deviation of a binomial distribution

- Given a word problem, identify the problem as a binomial experiment.
- Compute binomial probabilities using binomial tables.
- Given a binomial probability distribution, compute the mean and standard deviation of this distribution.
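
The objectives above can be checked numerically. The following Python sketch is an illustration only: the values n = 10 and p = 0.3 and the function name `binomial_pmf` are assumptions for this example, and the probability is computed directly from the binomial formula rather than looked up in a table. It also uses the standard results that a binomial distribution has mean n·p and standard deviation √(n·p·(1 − p)).

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical binomial experiment: 10 trials, success probability 0.3.
n, p = 10, 0.3

prob_exactly_3 = binomial_pmf(3, n, p)

# The probabilities over all possible counts 0..n should sum to 1.
total_probability = sum(binomial_pmf(k, n, p) for k in range(n + 1))

mean = n * p
std_dev = math.sqrt(n * p * (1 - p))

print(round(prob_exactly_3, 4), mean, round(std_dev, 4))
```

A binomial table entry for n = 10, p = 0.3, x = 3 should agree with `prob_exactly_3` to the number of decimal places the table prints.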

##### Required Reading

*Elementary Statistics*, Chapter 4, Section 4.2 Binomial Distributions (pages 201-209)

##### Try It Yourself Examples

Work through each Try It Yourself example in this section of the e-textbook. Check your work against the solutions provided.

##### Exercises in Your e-Textbook

Chapter 4, Section 4.2 Exercises 11, 13, 15, 19 (pages 210-211). Write out the step-by-step solutions or explanations. Check your work against the solutions provided.

##### Optional Multimedia Resources

Additional optional multimedia resources related to Chapter 4 Section 4.2 are available on the textbook publisher’s MyStatLab website.

### Chapter 4 Review (Extra Online Practice)

For more practice working with the topics in this chapter of the e-textbook, work through this review. Or, if you feel you have mastered this material, you may skip to Computer Lab 3A.

##### Review Learning Objectives

Before proceeding to the online exercises, briefly review the Learning Objectives for each of the following topics, which are presented in previous sections of this study guide.

- Probability Distributions
- Binomial Distributions

##### Optional Practice in Study Plan at MyStatLab

For more practice on the topics/sections of this chapter of your e-textbook, visit MyStatLab, and work interactively through the exercises in the Study Plan. For help accessing this resource, see MyStatLab navigation hints on the course home page.

### Computer Lab 3A

##### Computer Lab 3A Detailed Instructions

In Computer Lab 3A, you will learn to use StatCrunch to develop solutions to exercises related to topics in Chapter 4 of your e-text.

For Computer Lab 3A activities, and step-by-step instructions (Guided Solutions) to familiarize you with StatCrunch, see the Computer Lab 3A file.

##### Computer Lab 3A Quick Reviews

The Quick Reviews (QRs) summarize a few key steps (but not all steps) needed to complete each Activity in Computer Lab 3A. These QRs will be useful when you are preparing for the computer components of the assignments, midterm exam, and final exam. To access the QRs, click Computer Lab 3A QRs.

### Introduction to Normal Distributions

##### Learning Objectives

After completing the readings and exercises assigned for this topic, you should be able to:

- Explain the meaning of the key terms:
  - continuous random variable
  - normal distribution
  - standard normal distribution
  - *z*-score

- Describe the key properties of a normal distribution.
- Describe the key properties of a standard normal distribution.
- Using standard normal distribution tables, find the numerical values for areas under the standard normal curve.
- Using standard normal distribution tables, find the probabilities associated with different *z*-score intervals. *We strongly suggest that you first sketch the corresponding area under the standard normal curve before using the standard normal distribution tables.*

##### Required Reading

*Elementary Statistics*, Chapter 5, Section 5.1 Introduction to Normal Distributions and the Standard Normal Distribution

##### Try It Yourself Examples

Work through each Try It Yourself example in this section of the e-textbook. Check your work against the solutions provided.

##### Exercises in Your e-Textbook

Do the following exercises in your e-textbook:

Chapter 5, Section 5.1 Exercises 17, 19, 21, 23, 27, 31. Write out the step-by-step solutions or explanations. Check your work against the solutions provided.

##### Optional Multimedia Resources

Additional optional multimedia resources related to Chapter 5 Section 5.1 are available on the textbook publisher’s MyStatLab website.

### Normal Distributions: Finding Probabilities

##### Learning Objective

After completing the readings and exercises assigned for this topic, you should be able to achieve the following learning objective.

- Using the standard normal distribution tables at the back of the textbook, find the probabilities associated with different *x* intervals for normal distributions with any mean and standard deviation. *We strongly suggest that you first sketch the corresponding area under the normal curve before using the standard normal distribution tables.*

##### Required Reading

*Elementary Statistics*, Chapter 5, Section 5.2

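##### Try It Yourself Examples

Work through each Try It Yourself example in this section of the e-textbook. Check your work against the solutions provided.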

##### Exercises in Your e-Textbook

Do the following exercises in your e-textbook:

Chapter 5, Section 5.2 Exercises 13, 15, 17, 19. Write out the step-by-step solutions or explanations. Check your work against the solutions provided.

##### Optional Multimedia Resources

Additional optional multimedia resources related to Chapter 5 Section 5.2 are available on the textbook publisher’s MyStatLab website.

### Normal Distributions: Finding Values

##### Learning Objectives

After completing the readings and exercises assigned for this topic, you should be able to:

- Using standard normal distribution tables (e.g., Appendix B, Table 4), find the *z*-scores associated with different areas under the normal curve.
- Using standard normal distribution tables (e.g., Appendix B in the e-text), find the *z*-scores associated with different percentiles.
- Find the *x*-value corresponding to a given *z*-score.
- Given a normal probability for a normal distribution with any mean and standard deviation, first sketch the given area (probability) under the normal curve, and then use standard normal distribution tables (e.g., Appendix B in the e-text) to find a specific data value (*x*-value).
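
Working backwards from an area to a data value can be sketched in Python as a cross-check on table lookups. This is an illustration under assumptions: the mean of 100, standard deviation of 15, and the 90th percentile are hypothetical values, and the sketch uses `NormalDist.inv_cdf` from Python's standard `statistics` module (Python 3.8 or later) in place of a printed table. The *x*-value is recovered from the *z*-score with x = mean + z · standard deviation.

```python
from statistics import NormalDist

# Hypothetical normal distribution with mean 100 and standard deviation 15.
mean, std_dev = 100.0, 15.0

# z-score with 90% of the area under the standard normal curve to its left.
z_90 = NormalDist().inv_cdf(0.90)

# Convert the z-score back into a data value: x = mean + z * std_dev.
x_90 = mean + z_90 * std_dev

# Cross-check: ask the non-standard distribution for the same percentile.
x_direct = NormalDist(mean, std_dev).inv_cdf(0.90)

print(round(z_90, 2), round(x_90, 2), round(x_direct, 2))
```

The two routes to the *x*-value agree, which mirrors the table-based procedure: find the *z*-score for the area first, then transform it to the original scale.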

##### Required Reading

*Elementary Statistics*, Chapter 5, Section 5.3 Normal Distributions: Finding Values (pages 252-256)

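##### Try It Yourself Examples

Work through each Try It Yourself example in this section of the e-textbook. Check your work against the solutions provided.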

##### Exercises in Your e-Textbook

Do the following exercises in your e-textbook:

Chapter 5, Section 5.3 Exercises 1, 3, 5, 9, 13, 17, 19, 21, 31, 39 (pages 257-259). Write out the step-by-step solutions or explanations. Check your work against the solutions provided.

##### Optional Multimedia Resources

Additional optional multimedia resources related to Chapter 5 Section 5.3 are available on the textbook publisher’s MyStatLab website.

### Chapter 5 Review (Extra Online Practice)

For more practice working with the topics in Sections 1-3 of Chapter 5 of the e‑textbook, work through this review. Or, if you feel you have mastered this material, you may skip to Computer Lab 3B.

##### Review Learning Objectives

Before proceeding to the online exercises, briefly review the Learning Objectives for each of the following topics, which are presented in previous sections of this study guide.

- Introduction to Normal Distributions
- Normal Distributions: Finding Probabilities
- Normal Distributions: Finding Values

##### Optional Practice in Study Plan at MyStatLab

For more practice on the topics/sections of this chapter of your e-textbook, visit MyStatLab, and work interactively through the exercises in the Study Plan. For help accessing this resource, see MyStatLab navigation hints on the course home page.

### Computer Lab 3B

##### Computer Lab 3B Detailed Instructions

In Computer Lab 3B, you will learn to use StatCrunch to develop solutions to exercises related to topics in Chapter 5 of your e-text.

Your Computer Lab 3B activities, and step-by-step instructions (Guided Solutions) to familiarize you with StatCrunch, are in the Computer Lab 3B file on your course home page.

##### Computer Lab 3B Quick Reviews

The Quick Reviews (QRs) summarize a few key steps (but not all steps) needed to complete each Activity in Computer Lab 3B. These QRs will be useful when you are preparing for the computer components of the assignments, midterm exam, and final exam. To access the QRs, click Computer Lab 3B QRs.

### Self-Test 3

To access Self-Test 3, click MATH 216 Self-Test 3.

It is important that you work through all the exercises in the unit self-tests and the e-text chapter quizzes. No grades are assigned to the self-tests. Along with the unit assignments, they are designed to help you master the content presented in each unit.

Each unit self-test has two parts: one on theory (A) and one on computer work (B). Working through these will help you review key exercises in the unit, which will help you prepare for assignments and exams.

### Assignment 3

After completing Self-Test 3, complete Assignment 3, which you will find on the course home page. Submit your solutions to this assignment to your tutor for marking.

## Linear Algebra Concepts

### Vectors

Each vertex of a character can be represented as a vector with *n* components. Each of these components represents a displacement along the x-, y- or z-axis. For example, a vertex represented by the vector (2,3,1) represents a *displacement* of two units along the x-axis, three units along the y-axis, and one unit along the z-axis.

Vectors have no concept of position. Two vectors located at **different** positions in a coordinate system are identical if they have the same magnitude and direction.

### Matrix

The usefulness of a matrix in computer graphics lies in its ability to transform geometric data into different coordinate systems. A matrix is composed of elements arranged in rows and columns. The number of rows and the number of columns determine the **dimension** of a matrix.

A matrix containing 2 rows and 3 columns is of dimension **2x3**. Dimensions are very important in matrix arithmetic, since some operations, such as addition, are not possible unless the matrices have identical dimensions.
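
To make the role of dimensions concrete, here is a minimal Python sketch using plain nested lists. The helper names `shape` and `add` and the sample matrices are assumptions made for this illustration; a graphics library would provide its own matrix type. Addition is used as the example of an operation that requires identical dimensions.

```python
def shape(matrix):
    """Return the dimension of a matrix as (rows, columns)."""
    return (len(matrix), len(matrix[0]))


def add(a, b):
    """Element-wise sum; only defined for matrices of identical dimension."""
    if shape(a) != shape(b):
        raise ValueError(f"cannot add {shape(a)} and {shape(b)} matrices")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


a = [[1, 2, 3], [4, 5, 6]]          # dimension 2x3
b = [[10, 20, 30], [40, 50, 60]]    # dimension 2x3
c = [[1, 2], [3, 4], [5, 6]]        # dimension 3x2

total = add(a, b)
print(shape(a), total)

try:
    add(a, c)
except ValueError as error:
    print("rejected:", error)
```

The 2x3 matrices add element by element, while the attempt to add a 2x3 matrix to a 3x2 matrix is rejected because the dimensions differ.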

### Transformation

A vector’s coordinate system can be rotated, scaled or skewed. How this occurs depends on the elements of the **Transformation Matrix**. Transformation matrices that rotate, scale or skew a coordinate system are called **Rotation**, **Scale** and **Skew** transformation matrices, respectively.

When a vector is multiplied by a **Rotation Transformation Matrix**, the elements of the matrix manipulate the vector and rotate its coordinate system. The same applies to scale or skew transformations.

A rotation transformation matrix can rotate a coordinate system about the x-, y- or z-axis. Rotation transformation matrices can also be combined to form double or triple rotations. These types of rotations are called **Euler Rotations**. For example, we can combine a rotation *about the x-axis* with a rotation *about the y-axis*, producing a new transformation matrix that rotates the coordinate system about both the x- and y-axes in a single multiplication.
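
Combining rotations amounts to multiplying their matrices. The Python sketch below is one way to illustrate this and rests on several assumptions that the text does not specify: column vectors, a right-handed coordinate system, angles in radians, and hypothetical helper names (`rotation_x`, `rotation_y`, `matmul`, `apply`). Under these conventions the matrix written on the right is applied first.

```python
import math


def rotation_x(angle):
    """3x3 rotation about the x-axis (right-handed, column vectors)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rotation_y(angle):
    """3x3 rotation about the y-axis (right-handed, column vectors)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def matmul(a, b):
    """Matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


quarter_turn = math.pi / 2

# Rotate about x first, then about y: the x rotation sits on the right.
x_then_y = matmul(rotation_y(quarter_turn), rotation_x(quarter_turn))
# The opposite order gives a different combined matrix.
y_then_x = matmul(rotation_x(quarter_turn), rotation_y(quarter_turn))

v = [0, 1, 0]
result_x_then_y = apply(x_then_y, v)
result_y_then_x = apply(y_then_x, v)

print([round(value, 6) for value in result_x_then_y])
print([round(value, 6) for value in result_y_then_x])
```

With quarter turns, the vector (0, 1, 0) ends up along the x-axis in the first ordering and along the z-axis in the second, so the order in which rotations are combined matters.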

## Introduction to Multilevel Modeling, Chapter 3 | R Textbook Examples

Note: This page is designed to show how multilevel models can be fit using R and to allow the results to be compared with those in the book.

On this page we will use the `lmer` function, which is found in the `lme4` package. There are several other possible choices, but we will go with `lmer`.

The data were downloaded in Stata format from here and imported into R using the `foreign` library from a directory called `rdata` on the local computer. This page was updated using R 2.11.1 in January 2011.

Table 3.2, page 46. OLS regression lines over 10 schools.

Two equations at the top of page 47.

Equation near bottom of page 47 and Table 3.3.

Equation near bottom of page 49 and Table 3.4.

Equation at the bottom of page 50 and Table 3.5. The negative value for the interaction coefficient in the book is probably a typo; it should be positive.

## 3: Introduction to R - Mathematics

A (very) short introduction to R

Here you'll find three documents that my colleague Paul Torfs and I wrote about learning R:

The base document, with 10 pages of background and exercises and 2 pages listing useful functions (to use as a reference). Working through this document takes 1 to 2 hours (depending on your background). An old version of this document can also be downloaded from the R website (as contributed document), but the newest version can always be found here.

Instead of reading the pdf and doing the ToDo exercises, you can also go through the text and exercises in an interactive environment called swirl (developed by swirlstats.com). This short manual gets you started with the (very) short introduction to R. It also points you to some nice follow-up classes created by others.

This swirl course is relatively new, so it may still contain errors. If you find any, let me know on the issues page.

After learning the basics, you have to gain experience in building R scripts. In this document you learn to set up a script step by step. The examples are from hydrology, but the exercises are useful for everyone.

If you want to take R everywhere you go, you can install the programs on a USB stick (this also helps in case of administrator rights issues).

Here we collect scripts for hydrological data analysis, which you can adapt for your own application.

To learn R step by step, we made 8 self-study modules of 1-3 hours each (depending on your background):

- A (very) short introduction to R
- R Programming MOOC first part
- R Programming MOOC second part
- Basic plotting
- Pretty plotting
- Reading data files
- Matrix operations
- Spatial data

Modules 2 and 3 are swirl lessons belonging to the course R Programming (the credits go to the developers of that course). Modules 4-8 are script writing assignments. More information can be found in the folder called Self study modules.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.