Assume you have the following code:

/* Accumulate in temporary */
void inner4(vec_ptr u, vec_ptr v, data_t *dest)
{
    long int i;
    int length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;
    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;
}
and you modify the code to use 4-way loop unrolling and four parallel accumulators. Measurements for this function on the x86-64 architecture show that it achieves a CPE (cycles per element) of 2.0 for all data types.
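A minimal sketch of what the modified function might look like is given below. The question does not show the definitions of `data_t`, `vec_ptr`, `vec_length`, or `get_vec_start`, so the versions here are assumed stand-ins, and the name `inner4_4x4` is hypothetical.

```c
#include <stddef.h>

/* Assumed stand-ins for the vector abstraction; the real
   definitions are not shown in the question. */
typedef double data_t;

typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

static long vec_length(vec_ptr v) { return v->len; }
static data_t *get_vec_start(vec_ptr v) { return v->data; }

/* 4-way unrolling with four parallel accumulators
   (hypothetical name, not from the original question) */
void inner4_4x4(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(u);
    long limit = length - 3;   /* last index where a full group of 4 fits */
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum0 = (data_t) 0;
    data_t sum1 = (data_t) 0;
    data_t sum2 = (data_t) 0;
    data_t sum3 = (data_t) 0;

    /* Process four elements per iteration, each into its own accumulator,
       so the four additions do not depend on one another. */
    for (i = 0; i < limit; i += 4) {
        sum0 = sum0 + udata[i]     * vdata[i];
        sum1 = sum1 + udata[i + 1] * vdata[i + 1];
        sum2 = sum2 + udata[i + 2] * vdata[i + 2];
        sum3 = sum3 + udata[i + 3] * vdata[i + 3];
    }

    /* Finish any remaining elements one at a time. */
    for (; i < length; i++) {
        sum0 = sum0 + udata[i] * vdata[i];
    }

    /* Combine the partial sums. */
    *dest = (sum0 + sum1) + (sum2 + sum3);
}
```

Each iteration still performs eight loads (four from `udata`, four from `vdata`) for four elements. With floating-point `data_t`, combining partial sums in a different order can change rounding slightly compared with the original loop.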

Assuming the model of the Intel i7 architecture shown in class (one branch unit, two arithmetic units, one load unit, and one store unit), the performance of this loop with any arithmetic operation cannot get below 2.0 CPE because of:

a.) the number of available registers
b.) the number of available load units
c.) the number of available integer units
d.) the number of available floating-point units
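One way to reason about a lower bound of this kind is to count memory reads per element. Each element of the result needs two loads, `udata[i]` and `vdata[i]`. Assuming the single load unit in the model can start at most one load per cycle (an assumption about the model, which the question describes only as "one load unit"), a rough throughput bound would be:

```latex
\text{CPE} \;\ge\; \frac{\text{loads per element}}{\text{loads issued per cycle}}
          \;=\; \frac{2}{1} \;=\; 2.0
```

Under that assumption, no amount of unrolling or extra accumulators lowers the bound, since neither changes the number of loads per element.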

When the same 4x4 code is compiled for the IA32 architecture, it achieves a CPE of 2.75, worse than the CPE of 2.25 achieved with just four-way unrolling. The most likely reason for this is:

a.) the number of available registers
b.) the number of available load units
c.) the number of available integer units
d.) the number of available floating-point units
