You do not need to import any K-means clustering libraries or modules, because you will implement the algorithm from scratch. A code template is provided; write your code only at the locations marked "your code is here". Download the dataset 'k_means_clustering_data.csv' and save it in the working directory that holds your source code for this homework. The dataset has two columns ('x' and 'y') and 42 records, which are 42 points in a 2D plane. Your goal is to group them into K clusters using the K-means clustering algorithm.

The basic procedure is simple. First, choose the number of clusters K and randomly select K points from the dataset as the initial centroids (cluster centers). The algorithm then repeats the following steps until convergence:

a. Update each centroid's coordinates based on the data points in its cluster.
b. Measure the distance from each point in the dataset to each of the K centroids.
c. Assign each point to the cluster with the nearest centroid.

This is the code provided; please fill it in.
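Before turning to the template, one round of steps a to c can be sketched on a handful of points using only the standard library. This is a minimal illustration, not the assignment's method: the helper names `assign_points` and `update_centroids`, the six sample points, and the two starting centroids are assumptions chosen so the result is easy to check by hand.

```python
import math


def assign_points(points, centroids):
    """Steps b and c: return, for each point, the index of its nearest centroid."""
    labels = []
    for point in points:
        distances = [math.dist(point, centroid) for centroid in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update_centroids(points, labels, k):
    """Step a: each new centroid is the mean of the points in its cluster."""
    new_centroids = []
    for cluster_id in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster_id]
        # Assumption: every cluster has at least one member in this toy example.
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        new_centroids.append((mean_x, mean_y))
    return new_centroids


# Hypothetical toy input: six points and two starting centroids taken from them.
points = [(1, 0), (1, 1), (2, 0), (7, 9), (7, 10), (8, 8)]
centroids = [(1, 0), (8, 8)]

labels = assign_points(points, centroids)
new_centroids = update_centroids(points, labels, k=2)
print(labels)         # [0, 0, 0, 1, 1, 1]
print(new_centroids)  # roughly (1.33, 0.33) and (7.33, 9.0)
```

Repeating these two calls until the centroids stop changing gives the convergence loop the assignment describes. The assignment's own template follows.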

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def k_means_clustering(data, centroids, k):
    centroid_current = centroids
    centroid_last = pd.DataFrame()
    clusters = pd.DataFrame()
    data = pd.read_csv('k_means_clustering_data.csv')
    # keep 'data' as a DataFrame (with float coordinates) so that data['Cluster'] can be assigned below
    data[['x', 'y']] = data[['x', 'y']].astype(float)
    # iterate until convergence
    while not centroid_current.equals(centroid_last):

        cluster_count = 0  # it counts the number of clusters. Cluster IDs start from 0.
        # calculate the distance of each point to the K centroids
        for idx, position in centroid_current.iterrows():
            # your code is here. Save the Euclidean distances into 'clusters'

            # your code ends
            cluster_count += 1

        # update cluster, assign the points to clusters
        clusterIDs = []
        for row_idx in range(len(clusters)):
            # your code is here. Check the distances at every row in 'clusters'. Save the assigned cluster IDs to points. The IDs start from 0

            # your code ends
        # assign points to clusters. The information is saved in the list and assigned to the dataset.
        data['Cluster'] = clusterIDs

        # store previous cluster
        centroid_last = centroid_current

        # Update the centroid of each cluster. All information is in 'data'. You have to calculate the new centroids based on the points in the same cluster.
        # The centroid is the center of a list of points. For example, (x1, y1), (x2, y2), ..., (xn, yn). The centroid is (x, y), where x = the mean of (x1, x2, ..., xn) and y = the mean of (y1, y2, ..., yn).
        centroids = []
        points = []  # save k lists of points in the list. The points in the same list are in the same cluster.
        # your code is here. The K centroids will be saved in 'centroids', e.g. [[1, 2], [3, 4], [5, 6]]

        # your code ends
        centroid_current = pd.DataFrame(data=centroids, columns=['x', 'y'])

        print("No updates on clusters: ", centroid_current.equals(centroid_last))

    print("Convergence! Final centroids:", centroid_current)
    # plotting
    print('Plotting...')
    colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k']

    # scatter plot all points. All points are colored circles
    for i in range(k):
        p = np.array(points[i])
        x, y = p[:, 0], p[:, 1]

        plt.scatter(x, y, color=colors[i])
        plt.scatter(centroid_current['x'], centroid_current['y'], marker='^', color=colors[i])

    # scatter plot all centroids. All centroids are colored triangles
    for j in range(k):
        plt.scatter(centroid_current.iloc[j, 0], centroid_current.iloc[j, 1], marker='^', color=colors[j])

    plt.show()
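One possible way to fill in the three marked gaps is sketched below. It is not the assignment's reference solution: the function name `k_means_sketch`, the six-point toy input, the choice to keep the old centroid when a cluster becomes empty, and the decision to take `data` as a DataFrame argument rather than re-reading the CSV are all assumptions made for illustration. Plotting is omitted to keep the sketch short.

```python
import numpy as np
import pandas as pd


def k_means_sketch(data, centroids, k):
    """Hypothetical fill-in of the template's loop; returns (labelled data, centroids)."""
    data = data[['x', 'y']].astype(float).copy()
    centroid_current = centroids[['x', 'y']].astype(float).reset_index(drop=True)
    centroid_last = pd.DataFrame()

    # Iterate until the centroids stop changing.
    while not centroid_current.equals(centroid_last):
        # Gap 1: one column of Euclidean distances per centroid.
        clusters = pd.DataFrame(index=data.index)
        for cluster_id, (_, position) in enumerate(centroid_current.iterrows()):
            clusters[cluster_id] = np.sqrt(
                (data['x'] - position['x']) ** 2 + (data['y'] - position['y']) ** 2
            )

        # Gap 2: each point joins the cluster whose centroid is nearest.
        cluster_ids = [
            int(np.argmin(clusters.iloc[row_idx].values))
            for row_idx in range(len(clusters))
        ]
        data['Cluster'] = cluster_ids

        centroid_last = centroid_current

        # Gap 3: each new centroid is the mean of its member points.
        new_centroids = []
        for cluster_id in range(k):
            members = data.loc[data['Cluster'] == cluster_id, ['x', 'y']]
            if members.empty:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroid_last.iloc[cluster_id].tolist())
            else:
                new_centroids.append(members.mean().tolist())
        centroid_current = pd.DataFrame(data=new_centroids, columns=['x', 'y'])

    return data, centroid_current


# Hypothetical toy input: six points and two starting centroids picked from them.
toy = pd.DataFrame({'x': [1, 1, 2, 7, 7, 8], 'y': [0, 1, 0, 9, 10, 8]})
start = toy.iloc[[0, 5]]
labelled, final_centroids = k_means_sketch(toy, start, k=2)
print(labelled)
print(final_centroids)
```

For the full dataset, one would presumably pass `pd.read_csv('k_means_clustering_data.csv')` as `data` and draw the initial centroids with something like `data.sample(n=k)`, matching the random selection the assignment describes.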

And this is the data provided:

x y
1 0
1 1
1 2
2 0
2 1
2 2
2 7
2 9
3 0
3 2
3 4
3 6
3 8
4 4
4 7
4 9
5 5
5 6
5 7
5 8
5 9
5 10
6 2
6 3
6 8
7 0
7 1
7 2
7 4
7 7
7 9
7 10
8 0
8 1
8 2
8 3
8 8
9 0
9 2
9 3
4 2
5 3
