(1.) The table provided below shows paired data for the heights of a certain country's presidents and their main opponents in the election campaign.
(a.) Construct a scatterplot.
(b.) Does there appear to be a correlation between the president's height and his opponent's height?
A. Yes, there appears to be a correlation. As the president's height increases, his opponent's height decreases.
B. Yes, there appears to be a correlation. As the president's height increases, his opponent's height increases.
C. Yes, there appears to be a correlation. The candidate with the highest height usually wins.
D. No, there does not appear to be a correlation because there is no general pattern to the data.
(1.) Step 1: Open the dataset in Excel
(2.) Step 2: Save as a text file
(3.) Step 3: Open the text file in RStudio
(4.) Step 4: Rename the file with a suitable file name and import it into RStudio
(a.)
I used the file name PresidentHeightVersusOpponentHeight.
This is easier to follow because I can connect it with the pattern XaxisVersusYaxis:
The x-axis is the President's Height.
The y-axis is the Opponent's Height.
It is highly recommended to use file names that are meaningful in the context of the data.
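If you prefer typing a command to using the import dialog, the text file can also be read in code. The sketch below is a minimal example under assumptions that are not part of the steps above: the file saved in Step 2 is tab-delimited, its first row holds the column names, and the president's height is in the first column. The helper name import_heights is hypothetical; read.delim is base R.

```r
# Minimal sketch. ASSUMPTIONS (not from the original steps): the file is
# tab-delimited, the first row holds column names, column 1 is the
# president's height and column 2 is the opponent's height.
import_heights <- function(path) {
  # read.delim() uses sep = "\t" and header = TRUE by default
  df <- read.delim(path)
  # Expect exactly two numeric columns (x first, y second)
  stopifnot(ncol(df) == 2, all(vapply(df, is.numeric, logical(1))))
  df
}

# Hypothetical usage; the file name is illustrative:
# PresidentHeightVersusOpponentHeight <-
#   import_heights("PresidentHeightVersusOpponentHeight.txt")
```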
(b.)
As we can see, there are 16 obs (observations) and 2 variables in the PresidentHeightVersusOpponentHeight dataset.
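To confirm the observation and variable counts in code instead of reading them off the screen, a small sketch like the following can be used. The helper describe_dataset is hypothetical; nrow, ncol, and str are base R functions.

```r
# describe_dataset() is a hypothetical helper; nrow() and ncol() are base R.
describe_dataset <- function(df) {
  c(obs = nrow(df), variables = ncol(df))
}

# Hypothetical usage; per the text above this should report 16 and 2:
# describe_dataset(PresidentHeightVersusOpponentHeight)
# str(PresidentHeightVersusOpponentHeight)  # also lists column names and types
```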
(5.)
1st Solution: the plot function with only one argument
The function is plot, and its only argument is the name of the dataset, PresidentHeightVersusOpponentHeight.
By default, plot displays the first variable (the variable in the first column) on the x-axis and the second variable (the variable in the second column) on the y-axis.
This is a quick and easy solution. In the console window, type the command:
plot(PresidentHeightVersusOpponentHeight)
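Why does one argument give a scatterplot? For a data frame with exactly two columns, plot draws the first column against the second and takes the axis labels from the column names. The hypothetical helper below spells that out as an explicit x/y call; it is a sketch of the equivalent call, not a replacement for the command above.

```r
# Equivalent explicit form of plot(df) for a two-column data frame df.
# explicit_scatter() is a hypothetical helper.
explicit_scatter <- function(df) {
  plot(x = df[[1]], y = df[[2]],                   # column 1 on x, column 2 on y
       xlab = names(df)[1], ylab = names(df)[2])   # labels from column names
}

# Hypothetical usage:
# explicit_scatter(PresidentHeightVersusOpponentHeight)
```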
But here's the reason why we need more arguments:
(I.) Some people may be unsure whether the correct option is Option A or Option C.
After expanding both options and carefully comparing them with the RStudio graph, you may be able to spot the correct one.
Be that as it may, we want the graph in RStudio to match the correct option exactly.
The minimum and maximum values on the axes of the graphs in the options differ from those on the RStudio graph.
So, it is better to adjust the RStudio graph to match the graphs in the options.
We shall use the following arguments, each separated by a comma:
xlim = c(160, 200)
ylim = c(160, 200)
where:
xlim is the limit for the x-axis. It gives the minimum value and the maximum value for the x-axis.
ylim is the limit for the y-axis. It gives the minimum value and the maximum value for the y-axis.
c is the function that combines values into a vector. It is used when we need to pass several values (in this case, the minimum and maximum of an axis) as a single argument.
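Since xlim and ylim fix the visible range, it is worth checking that every height falls inside 160 to 200; points outside the limits may be cut off the graph. A minimal sketch, where fits_limits is a hypothetical helper:

```r
# c(160, 200) is a numeric vector of length 2: (minimum, maximum).
axis_limits <- c(160, 200)

# fits_limits() is a hypothetical helper, not part of base R.
# TRUE if every non-missing value lies within [limits[1], limits[2]].
fits_limits <- function(values, limits) {
  all(values >= limits[1] & values <= limits[2], na.rm = TRUE)
}

# Hypothetical usage:
# fits_limits(PresidentHeightVersusOpponentHeight[[1]], axis_limits)  # x-axis
# fits_limits(PresidentHeightVersusOpponentHeight[[2]], axis_limits)  # y-axis
```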
(II.) The points on the RStudio graph are open circles, while the ones in the options are filled (closed) circles.
By default, plot draws the points as open circles, but we want filled circles.
To fix this, we shall use the argument:
pch = 16
where:
pch is the plotting character, and pch = 16 is the value of the plotting character for a filled circle.
(III.) The axis labels on the graphs in the options are not exactly the same as those on the RStudio graph.
To label the RStudio graph accordingly, we use the arguments:
xlab = "President's height"
ylab = "Opponent's height"
(6.)
2nd Solution: let us use the plot function with more arguments (the ones we just listed):
plot(PresidentHeightVersusOpponentHeight, xlab = "President's height", ylab = "Opponent's height", xlim = c(160, 200), ylim = c(160, 200), pch = 16)
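If you need to repeat the plot or save it for a report, the call above can be wrapped in a function. This is a sketch only: plot_heights and its arguments are hypothetical, and it writes a PDF (using base R's pdf device) when a file name is given.

```r
# plot_heights() and its arguments are hypothetical, for illustration only.
plot_heights <- function(df, file = NULL, limits = c(160, 200)) {
  if (!is.null(file)) {
    pdf(file, width = 6, height = 6)  # open a PDF graphics device (inches)
    on.exit(dev.off())                # close the device when the function exits
  }
  plot(df,
       xlab = "President's height", ylab = "Opponent's height",
       xlim = limits, ylim = limits,
       pch = 16)                      # 16 = filled circle
  invisible(file)
}

# Hypothetical usage:
# plot_heights(PresidentHeightVersusOpponentHeight)                 # on screen
# plot_heights(PresidentHeightVersusOpponentHeight, "heights.pdf")  # to a file
```

Passing the limits as one argument keeps xlim and ylim equal, which matches the square 160 to 200 frame used in the options.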
We now see that the RStudio graph matches the scatterplot in Option C, which answers part (a).
For part (b), the points are scattered and there is no clear trend.
Hence, the answer is Option D: there does not appear to be a correlation because there is no general pattern to the data.
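The question asks only for a visual judgement, but Pearson's correlation coefficient r gives a numeric check: values near 0 are consistent with no linear correlation, while values near -1 or +1 indicate a strong one. A minimal sketch using base R's cor.test; correlation_summary is a hypothetical helper, and its result depends on the table's data, which is not reproduced here.

```r
# correlation_summary() is a hypothetical helper built on base R's cor.test().
# Returns Pearson's r and the p-value for testing r = 0.
correlation_summary <- function(df) {
  test <- cor.test(df[[1]], df[[2]], method = "pearson")
  c(r = unname(test$estimate), p_value = test$p.value)
}

# Hypothetical usage:
# correlation_summary(PresidentHeightVersusOpponentHeight)
```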