Computers and Technology, 27.08.2021 01:00 heavenwagner

We will practice building a machine learning algorithm using a new dataset, iris, which provides multiple predictors we can use for training. To start, we will remove the setosa species and focus on the versicolor and virginica species using the following code:
library(caret)
data(iris)
iris <- iris[-which(iris$Species=='setosa') , ]
y <- iris$Species
The following questions all involve work with this dataset.
1. First, let us create an even split of the data into train and test partitions using createDataPartition() from the caret package. The code, with one line missing, is given below:
# set.seed(2) # if using R 3.5 or earlier
set.seed(2, sample.kind = "Rounding") # if using R 3.6 or later
# line of code
test <- iris[test_index, ]
train <- iris[-test_index, ]
2. Which code should be used in place of # line of code above?
a. test_index <- createDataPartition(y, times=1, p=0.5)
b. test_index <- sample(2, length(y), replace=FALSE)
c. test_index <- createDataPartition(y, times=1, p=0.5, list=FALSE)
d. test_index <- rep(1, length(y))
Note: for this question, you may ignore any warning message generated by the code. If you have R 3.6 or later, you should always use the sample.kind argument in set.seed for this course.
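A minimal sketch in R of why option c works, assuming R 3.6 or later with caret installed. With list = FALSE, createDataPartition() returns a one-column integer matrix of row positions that can index the data frame directly (and be negated with -). With the default list = TRUE (option a) it returns a list, which cannot be used as a row index as-is. Option b fails because sample() cannot draw 100 values from 1:2 without replacement, and option d just repeats row 1. The warning you may see most likely comes from the now-empty setosa factor level, which caret reports and ignores. The name index_list is illustrative only.

```r
# Sketch: assumes R >= 3.6 and the caret package installed.
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

set.seed(2, sample.kind = "Rounding")

# Option (c): list = FALSE gives an integer matrix of row positions,
# usable directly as a row index and, negated, to get the complement.
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
test <- iris[test_index, ]
train <- iris[-test_index, ]

# Option (a): the default list = TRUE returns a list (one element per
# resample), which would need unlisting before it could index rows.
index_list <- createDataPartition(y, times = 1, p = 0.5)
class(index_list)  # "list"

# Each partition holds half of the 100 remaining flowers.
c(test = nrow(test), train = nrow(train))
```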
3. Next, we will find the single feature in the dataset that yields the greatest overall accuracy when predicting species. You can use the code from the introduction and from Q1 to start your analysis.
Using only the train dataset, perform a simple search for each feature to find the cutoff that produces the highest accuracy, predicting virginica if the feature value is greater than the cutoff and versicolor otherwise. Use the seq function over the range of each feature in intervals of 0.1 for this search. Which feature produces the highest accuracy?
a. Sepal.Length
b. Sepal.Width
c. Petal.Length
d. Petal.Width
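The cutoff search described in Q3 can be sketched in R as follows, assuming R 3.6 or later with caret installed and recreating the Q1 split first. The helper best_accuracy() is a hypothetical name, not a caret function: for one feature it tries every cutoff in seq(min, max, by = 0.1), predicts virginica when the value exceeds the cutoff, and returns the best accuracy found.

```r
# Sketch: assumes R >= 3.6 and the caret package installed.
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

set.seed(2, sample.kind = "Rounding")
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
test <- iris[test_index, ]
train <- iris[-test_index, ]

# Hypothetical helper: best accuracy over a 0.1-step grid of cutoffs,
# predicting "virginica" when x > cutoff and "versicolor" otherwise.
best_accuracy <- function(x, species) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accs <- sapply(cutoffs, function(cut) {
    pred <- ifelse(x > cut, "virginica", "versicolor")
    mean(pred == species)
  })
  max(accs)
}

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
train_acc <- sapply(features, function(f) best_accuracy(train[[f]], train$Species))
train_acc                   # best training accuracy for each feature
names(which.max(train_acc)) # feature with the highest training accuracy
```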
4. For the feature selected in Q3, use the smart cutoff (the cutoff that maximized accuracy on the training data) to calculate overall accuracy on the test data. What is the overall accuracy?
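A minimal R sketch of this step, assuming R 3.6 or later with caret installed. Rather than hard-coding an answer, it reruns the Q3 search on the training set, keeps the cutoff of the best feature, and applies that cutoff unchanged to the test set. The helper smart_cutoff() is a hypothetical name introduced here for illustration.

```r
# Sketch: assumes R >= 3.6 and the caret package installed.
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

set.seed(2, sample.kind = "Rounding")
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
test <- iris[test_index, ]
train <- iris[-test_index, ]

# Hypothetical helper: returns the best cutoff on a 0.1-step grid and the
# accuracy it achieves, predicting "virginica" when x > cutoff.
smart_cutoff <- function(x, species) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accs <- sapply(cutoffs, function(cut) {
    mean(ifelse(x > cut, "virginica", "versicolor") == species)
  })
  c(cutoff = cutoffs[which.max(accs)], accuracy = max(accs))
}

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
# 2 x 4 matrix: rows "cutoff" and "accuracy", one column per feature.
fits <- sapply(features, function(f) smart_cutoff(train[[f]], train$Species))
best <- features[which.max(fits["accuracy", ])]
cutoff <- fits["cutoff", best]

# Apply the training cutoff, unchanged, to the test set.
test_pred <- ifelse(test[[best]] > cutoff, "virginica", "versicolor")
test_acc <- mean(test_pred == test$Species)
test_acc
```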
Notice that the overall accuracy was greater than 96% on the training data but lower on the test data. This often happens when we overtrain: optimizing the cutoff of a single feature on the training data, as we did, can lead to overfitting. It may also be that a single feature is not the best choice; a combination of features might be optimal.
Because we know the test data, we can treat it as we treated the training data to see whether the same feature, with a different cutoff, would optimize our predictions. Repeat the analysis in Q3, but this time using the test data instead of the training data. Which feature best optimizes overall accuracy on the test set?
a. Sepal.Length
b. Sepal.Width
c. Petal.Length
d. Petal.Width
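The same search can be pointed at the test set instead, sketched here in R under the same assumptions (R 3.6 or later, caret installed, the Q1 split recreated). best_accuracy() is again a hypothetical helper; the only change from a training-set search is that both the feature values and the labels come from test.

```r
# Sketch: assumes R >= 3.6 and the caret package installed.
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

set.seed(2, sample.kind = "Rounding")
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
test <- iris[test_index, ]

# Hypothetical helper: best accuracy over a 0.1-step grid of cutoffs.
best_accuracy <- function(x, species) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  max(sapply(cutoffs, function(cut) {
    mean(ifelse(x > cut, "virginica", "versicolor") == species)
  }))
}

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
# Search cutoffs on the test set itself (for comparison only: this is
# not a fair estimate of performance on new data).
test_search_acc <- sapply(features, function(f) best_accuracy(test[[f]], test$Species))
test_search_acc
names(which.max(test_search_acc))
```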
5. Now we will perform some exploratory data analysis on the data.
plot(iris, pch=21, bg=iris$Species)
Notice that Petal.Length and Petal.Width together could carry more information than either feature alone. Optimize the cutoffs for Petal.Length and Petal.Width separately in the train dataset, using the seq function with increments of 0.1. Then report the overall accuracy on the test dataset of a rule that predicts virginica if Petal.Length is greater than the length cutoff OR Petal.Width is greater than the width cutoff, and versicolor otherwise. What is the overall accuracy on the test data now?
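A minimal R sketch of the combined rule, assuming R 3.6 or later with caret installed and the Q1 split recreated. Each petal cutoff is optimized on its own on the training set with a hypothetical helper, smart_cutoff(), and the two are then combined with a logical OR on the test set.

```r
# Sketch: assumes R >= 3.6 and the caret package installed.
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

set.seed(2, sample.kind = "Rounding")
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
test <- iris[test_index, ]
train <- iris[-test_index, ]

# Hypothetical helper: cutoff on a 0.1-step grid with the highest
# accuracy, predicting "virginica" when x > cutoff.
smart_cutoff <- function(x, species) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accs <- sapply(cutoffs, function(cut) {
    mean(ifelse(x > cut, "virginica", "versicolor") == species)
  })
  cutoffs[which.max(accs)]
}

# Optimize each petal cutoff separately on the training data.
length_cutoff <- smart_cutoff(train$Petal.Length, train$Species)
width_cutoff  <- smart_cutoff(train$Petal.Width,  train$Species)

# Predict virginica if EITHER measurement exceeds its own cutoff.
combined_pred <- ifelse(test$Petal.Length > length_cutoff |
                          test$Petal.Width > width_cutoff,
                        "virginica", "versicolor")
combined_acc <- mean(combined_pred == test$Species)
combined_acc
```

Because the rule uses OR, it can only move predictions toward virginica relative to either single-feature rule, so whether it helps depends on how many versicolor flowers cross one of the two cutoffs.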
