</p> <p class="has-line-data" data-line-end="3" data-line-start="0">Among clustering algorithms, K-means is the most popular. It is fast, scalable and easy to interpret. Therefore, it is almost the default first choice when data scientists want to cluster data and get insight into the inner structure of a dataset. A good introduction to this method can be found in the <a href="https://blogs.oracle.com/ai-and-datascience/post/introduction-to-k-means-clustering">Oracle datascience blog</a>.</p> <p class="has-line-data" data-line-end="3" data-line-start="0">However, there is one key parameter for K-means clustering that needs to be selected appropriately. That is K, the number of clusters for the algorithm to generate, which is left for the user to choose. For a given dataset without any prior knowledge, we cannot know for sure how many clusters naturally exist in the dataset. Choosing the wrong K often leads to an undesirable result.</p> <p class="has-line-data" data-line-end="3" data-line-start="0">There are many methods proposed to solve this problem, such as the Elbow method, <a href="https://en.wikipedia.org/wiki/Silhouette_(clustering)">Silhouette</a>, and the <a href="https://statweb.stanford.edu/~gwalther/gap">gap statistic</a>. Among these methods, the elbow method is the most intuitive to understand and easiest to implement. In this blog, we will show how to implement the elbow method to choose the parameter K using <a href="https://docs.oracle.com/en/database/oracle/machine-learning/oml4py/1/index.html">OML4Py</a> for both performance and scalability.</p> <p class="has-line-data" data-line-end="7" data-line-start="5">The elbow method works as follows. Assuming the best K lies within a range [1, n], search for the best K by running K-means over each K = 1, 2, …, n. Based on each K-means result, calculate the mean distance between data points and their cluster centroid. For short, we call it mean in-cluster distance. 
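</p> <p class="has-line-data">As a minimal sketch of this metric, the snippet below computes the mean in-cluster distance for a tiny hand-made dataset. It uses plain Python rather than OML4Py, and the function name <code>mean_in_cluster_distance</code> as well as the toy points, labels and centroids are assumptions made purely for illustration.</p>

```python
import math


def mean_in_cluster_distance(points, labels, centroids):
    """Average Euclidean distance between each point and its assigned centroid."""
    distances = [math.dist(p, centroids[k]) for p, k in zip(points, labels)]
    return sum(distances) / len(distances)


# Toy data (hypothetical): two clusters with two points each.
points = [(0.0, 0.0), (0.0, 2.0), (6.0, 0.0), (6.0, 4.0)]
labels = [0, 0, 1, 1]                  # cluster index assigned to each point
centroids = [(0.0, 1.0), (6.0, 2.0)]   # one centroid per cluster

# The individual distances are 1, 1, 2 and 2.
print(mean_in_cluster_distance(points, labels, centroids))
```

<p class="has-line-data" data-line-end="7" data-line-start="5">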
Naturally, this number decreases as K increases, because with more centers each point lies closer to its nearest centroid. In the extreme case where the number of clusters equals the number of data points, every distance is zero.</p> <p class="has-line-data" data-line-end="7" data-line-start="5">To select the best K, we plot the mean in-cluster distance for each K. As K increases from 1 toward the optimal K, the distance drops quickly: there are too few centers at first, so each new center yields a large decrease in the mean distance. Beyond the optimal K, the decrease is slower because the correct cluster structure has already been discovered, and any newly added center merely splits a cluster that has already formed, which does not reduce the mean in-cluster distance by much. The entire curve looks like an L shape, and the best K lies at the turning point, or elbow, of the L. A typical plot looks like the one below.</p> <p class="has-line-data" data-line-end="9" data-line-start="8"><img alt src="https://i0.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/0a15d11805c985a2611cf3e8c194825f/elbow_annotated.png?w=1440&ssl=1" style="width: 1008px; height: 648px;" data-recalc-dims="1" data-lazy-src="https://i0.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/0a15d11805c985a2611cf3e8c194825f/elbow_annotated.png?w=1440&is-pending-load=1#038;ssl=1" srcset="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" class=" jetpack-lazy-image"><noscript><img alt="" src="https://i0.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/0a15d11805c985a2611cf3e8c194825f/elbow_annotated.png?w=1440&ssl=1" style="width: 1008px; height: 648px;" 
data-recalc-dims="1"/></noscript></p> <p class="has-line-data" data-line-end="9" data-line-start="8"><a id="NYC_Yellow_Taxi_Data_10"/>NYC Yellow Taxi Data</p> <p class="has-line-data" data-line-end="13" data-line-start="11">To illustrate this method, we use a real-world dataset from NYC taxi data, which is available <a href="http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml">online</a>.</p> <p class="has-line-data" data-line-end="13" data-line-start="11">Let’s take a look at the data:</p> <pre> <code class="has-line-data" data-line-end="17" data-line-start="14">NYC_DF = oml.sync(query = <span class="hljs-string">'select START_LON, START_LAT, END_LON, END_LAT FROM NYC_DATA_200901'</span>) NYC_DF.head() </code><code class="has-line-data" data-line-end="40" data-line-start="18"> VENDOR_NAME TRIP_PICKUP_DATETIME TRIP_DROPOFF_DATETIME PASSENGER_COUNT <span class="hljs-number">1</span> VTS <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">26</span> <span class="hljs-number">23</span>:<span class="hljs-number">04</span>:<span class="hljs-number">00</span> <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">26</span> <span class="hljs-number">23</span>:<span class="hljs-number">05</span>:<span class="hljs-number">00</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> CMT <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">23</span> <span class="hljs-number">10</span>:<span class="hljs-number">11</span>:<span class="hljs-number">33</span> <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">23</span> <span class="hljs-number">10</span>:<span class="hljs-number">36</span>:<span class="hljs-number">24</span> <span class="hljs-number">1</span> <span class="hljs-number">3</span> CMT <span class="hljs-number">2009</span>-<span 
class="hljs-number">01</span>-<span class="hljs-number">23</span> <span class="hljs-number">11</span>:<span class="hljs-number">40</span>:<span class="hljs-number">37</span> <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">23</span> <span class="hljs-number">12</span>:<span class="hljs-number">07</span>:<span class="hljs-number">09</span> <span class="hljs-number">1</span> <span class="hljs-number">4</span> VTS <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">26</span> <span class="hljs-number">16</span>:<span class="hljs-number">32</span>:<span class="hljs-number">00</span> <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">26</span> <span class="hljs-number">16</span>:<span class="hljs-number">32</span>:<span class="hljs-number">00</span> <span class="hljs-number">2</span> <span class="hljs-number">5</span> CMT <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">23</span> <span class="hljs-number">06</span>:<span class="hljs-number">20</span>:<span class="hljs-number">49</span> <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">23</span> <span class="hljs-number">06</span>:<span class="hljs-number">23</span>:<span class="hljs-number">44</span> <span class="hljs-number">1</span> <span class="hljs-number">6</span> DDS <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">06</span> <span class="hljs-number">07</span>:<span class="hljs-number">50</span>:<span class="hljs-number">21</span> <span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">06</span> <span class="hljs-number">08</span>:<span class="hljs-number">06</span>:<span class="hljs-number">08</span> <span class="hljs-number">4</span> TRIP_DISTANCE START_LON START_LAT 
RATE_CODE STORE_AND_FORWARD END_LON END_LAT <span class="hljs-number">1</span> <span class="hljs-number">0.71</span> -<span class="hljs-number">73.97717</span> <span class="hljs-number">40.74292</span> NA NA -<span class="hljs-number">73.98488</span> <span class="hljs-number">40.73889</span> <span class="hljs-number">2</span> <span class="hljs-number">3.70</span> -<span class="hljs-number">73.98622</span> <span class="hljs-number">40.76235</span> NA NA -<span class="hljs-number">74.00987</span> <span class="hljs-number">40.72123</span> <span class="hljs-number">3</span> <span class="hljs-number">7.50</span> -<span class="hljs-number">73.99476</span> <span class="hljs-number">40.68473</span> NA NA -<span class="hljs-number">73.98017</span> <span class="hljs-number">40.75152</span> <span class="hljs-number">4</span> <span class="hljs-number">0.99</span> -<span class="hljs-number">73.95994</span> <span class="hljs-number">40.77095</span> NA NA -<span class="hljs-number">73.94618</span> <span class="hljs-number">40.77272</span> <span class="hljs-number">5</span> <span class="hljs-number">0.60</span> -<span class="hljs-number">73.97818</span> <span class="hljs-number">40.75410</span> NA NA -<span class="hljs-number">73.97782</span> <span class="hljs-number">40.76318</span> <span class="hljs-number">6</span> <span class="hljs-number">2.10</span> -<span class="hljs-number">73.95949</span> <span class="hljs-number">40.77119</span> NA NA -<span class="hljs-number">73.97552</span> <span class="hljs-number">40.79190</span> PAYMENT_TYPE FARE_AMT SURCHARGE MTA_TAX TIP_AMT TOLLS_AMT TOTAL_AMT <span class="hljs-number">1</span> CASH <span class="hljs-number">4.1</span> <span class="hljs-number">0.5</span> NA <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">4.6</span> <span class="hljs-number">2</span> Cash <span class="hljs-number">14.9</span> <span class="hljs-number">0.0</span> NA <span class="hljs-number">0</span> <span 
class="hljs-number">0</span> <span class="hljs-number">14.9</span> <span class="hljs-number">3</span> Cash <span class="hljs-number">21.7</span> <span class="hljs-number">0.0</span> NA <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">21.7</span> <span class="hljs-number">4</span> Credit <span class="hljs-number">4.9</span> <span class="hljs-number">1.0</span> NA <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">6.9</span> <span class="hljs-number">5</span> Cash <span class="hljs-number">4.1</span> <span class="hljs-number">0.0</span> NA <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">4.1</span> <span class="hljs-number">6</span> CASH <span class="hljs-number">9.7</span> <span class="hljs-number">0.0</span> NA <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">9.7</span> </code></pre> <p class="has-line-data" data-line-end="42" data-line-start="40">It will be interesting if we can generate clusters of the pickup and drop off locations, i.e., the longitude and latitude of both start and end points of taxi trips. This will display the usual patterns of trips across NYC. For example, perhaps there are more frequent trips starting from midtown to downtown Manhattan. Naturally, those particular start and end points will form clusters. Therefore, we want to do K-means clustering on the four coordinate features: START_LON, START_LAT, END_LON, END_LAT.</p> <p class="has-line-data" data-line-end="42" data-line-start="40">The remaining problem is to determine the number of clusters. We will show how to generate the elbow plot using OML4Py.</p> <p class="has-line-data" data-line-end="45" data-line-start="44">Before we start to do clustering, we need to make sure the data is clean. Usually the data contain outliers, which are extremely far from the center of the distribution. 
In that case, the clustering algorithm tends to form small clusters consisting only of outliers and to lump everything else together, which is not useful. Let us take a look at the distribution of the coordinate features.</p> <pre> <code class="has-line-data" data-line-end="57" data-line-start="46">NYC_DF.describe().round(<span class="hljs-number">2</span>)

         START_LON    START_LAT      END_LON      END_LAT
count  <span class="hljs-number">14092413.00</span>  <span class="hljs-number">14092413.00</span>  <span class="hljs-number">14092413.00</span>  <span class="hljs-number">14092413.00</span>
mean        -<span class="hljs-number">72.85</span>        <span class="hljs-number">40.14</span>       -<span class="hljs-number">72.87</span>        <span class="hljs-number">40.15</span>
std           <span class="hljs-number">9.10</span>         <span class="hljs-number">4.99</span>         <span class="hljs-number">8.97</span>         <span class="hljs-number">4.96</span>
min        -<span class="hljs-number">775.45</span>        -<span class="hljs-number">7.34</span>      -<span class="hljs-number">784.30</span>        -<span class="hljs-number">7.34</span>
<span class="hljs-number">25</span>%         -<span class="hljs-number">73.99</span>        <span class="hljs-number">40.74</span>       -<span class="hljs-number">73.99</span>        <span class="hljs-number">40.74</span>
<span class="hljs-number">50</span>%         -<span class="hljs-number">73.98</span>        <span class="hljs-number">40.75</span>       -<span class="hljs-number">73.98</span>        <span class="hljs-number">40.75</span>
<span class="hljs-number">75</span>%         -<span class="hljs-number">73.97</span>        <span class="hljs-number">40.77</span>       -<span class="hljs-number">73.96</span>        <span class="hljs-number">40.77</span>
max        <span class="hljs-number">3555.91</span>       <span class="hljs-number">935.53</span>         <span class="hljs-number">0.10</span>      <span class="hljs-number">1809.96</span>
</code></pre> <p class="has-line-data" data-line-end="58" data-line-start="57">Note that NYC is located at approximately latitude 40.7 and longitude -74, yet the maximum values of some coordinates exceed 1000. 
Values of this magnitude are not valid coordinates (latitude is bounded by ±90 degrees and longitude by ±180 degrees), so they are most likely recording errors; other extreme values may correspond to trips that end far away from NYC. Since our focus is the traffic pattern within NYC, we need to remove those outliers. The following functions remove rows whose value for a given feature lies more than 1.5 times the interquartile range (IQR) below the 25% quantile or above the 75% quantile.</p> <pre> <code class="has-line-data" data-line-end="73" data-line-start="59"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">IQR</span><span class="hljs-params">(SUMMARY_DF, features)</span>:</span>
    <span class="hljs-comment"># Interquartile range (75% quantile minus 25% quantile) of each feature</span>
    result = [<span class="hljs-number">0</span>]*len(features)
    <span class="hljs-keyword">for</span> i, feature <span class="hljs-keyword">in</span> enumerate(features):
        result[i] = abs(SUMMARY_DF[feature][<span class="hljs-string">'75%'</span>] - SUMMARY_DF[feature][<span class="hljs-string">'25%'</span>])
    <span class="hljs-keyword">return</span> result

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">remove_outlier</span><span class="hljs-params">(DF, SUMMARY_DF, features)</span>:</span>
    iqrs = IQR(SUMMARY_DF, features)
    <span class="hljs-keyword">for</span> i, iqr <span class="hljs-keyword">in</span> enumerate(iqrs):
        <span class="hljs-comment"># Keep only rows within 1.5 * IQR of the quartiles for this feature</span>
        H = <span class="hljs-number">1.5</span>*iqr
        DF = DF[ ( DF[features[i]] &gt; SUMMARY_DF[features[i]][<span class="hljs-string">'25%'</span>] - H ) &amp; ( DF[features[i]] &lt; SUMMARY_DF[features[i]][<span class="hljs-string">'75%'</span>] + H )]
        print(DF.shape)
    <span class="hljs-keyword">return</span> DF
</code></pre> <p class="has-line-data" data-line-end="74" data-line-start="73">We remove the outliers by applying the functions above to each coordinate, then save the result as a new table. 
Note that all of this computation occurs in the database, leveraging Oracle Database as a high-performance compute engine through the OML4Py Transparency Layer.</p> <pre> <code class="has-line-data" data-line-end="83" data-line-start="75">SUMMARY_DF = NYC_DF.describe()
NYC_DF_CLEAN = remove_outlier(NYC_DF, SUMMARY_DF, NYC_DF.columns)
<span class="hljs-comment"># Drop any table left over from a previous run before materializing the result</span>
<span class="hljs-keyword">try</span>:
    oml.drop(table = <span class="hljs-string">'NYC_DF_CLEAN'</span>)
<span class="hljs-keyword">except</span> Exception:
    print(<span class="hljs-string">"No such table"</span>)
_ = NYC_DF_CLEAN.materialize(table = <span class="hljs-string">'NYC_DF_CLEAN'</span>)
</code></pre> <p class="has-line-data" data-line-end="84" data-line-start="83">Let’s take a look at the range of the coordinates after the filtering.</p> <pre> <code class="has-line-data" data-line-end="96" data-line-start="85">NYC_DF_CLEAN.describe().round(<span class="hljs-number">2</span>)

         START_LON    START_LAT      END_LON      END_LAT
count  <span class="hljs-number">12468206.00</span>  <span class="hljs-number">12468206.00</span>  <span class="hljs-number">12468206.00</span>  <span class="hljs-number">12468206.00</span>
mean        -<span class="hljs-number">73.98</span>        <span class="hljs-number">40.75</span>       -<span class="hljs-number">73.98</span>        <span class="hljs-number">40.75</span>
std           <span class="hljs-number">0.02</span>         <span class="hljs-number">0.02</span>         <span class="hljs-number">0.02</span>         <span class="hljs-number">0.02</span>
min         -<span class="hljs-number">74.03</span>        <span class="hljs-number">40.69</span>       -<span class="hljs-number">74.03</span>        <span class="hljs-number">40.68</span>
<span class="hljs-number">25</span>%         -<span class="hljs-number">73.99</span>        <span class="hljs-number">40.74</span>       -<span class="hljs-number">73.99</span>        <span class="hljs-number">40.74</span>
<span class="hljs-number">50</span>%         -<span class="hljs-number">73.98</span>        <span class="hljs-number">40.75</span>       -<span class="hljs-number">73.98</span>        <span class="hljs-number">40.75</span>
<span class="hljs-number">75</span>%         -<span class="hljs-number">73.97</span>        <span class="hljs-number">40.77</span>       -<span class="hljs-number">73.97</span>        <span class="hljs-number">40.77</span>
max         -<span class="hljs-number">73.93</span>        <span class="hljs-number">40.82</span>       -<span class="hljs-number">73.92</span>        <span class="hljs-number">40.82</span>
</code></pre> <p class="has-line-data" data-line-end="97" data-line-start="96">The range now looks much better and we are good to go!</p> <p class="has-line-data" data-line-end="103" data-line-start="98">Although sklearn provides an implementation of K-means, it is not always the most desirable choice. If a data set is large, building each clustering model can take a significant amount of time. The default sklearn kmeans is in-memory and single threaded, so it may not scale to bigger data. When memory is limited, sklearn kmeans may not be able to build a model at all.</p> <p class="has-line-data" data-line-end="103" data-line-start="98">In contrast, the algorithms provided by OML have a huge advantage. Their parallel, distributed, in-database implementations address the issues of scalability and performance. Moreover, we can build multiple models in parallel, which also improves overall execution performance. We will leave that topic for a future blog.<br />Using the NYC Yellow Taxi dataset as an example, let us measure the time taken by both the open source and the in-database algorithms. Suppose we run K-means with K = 4.<br />Let us first look at the result using open source sklearn kmeans. Assume we start from the data stored in Oracle Database. We need to pull the data from the database into memory first. Here, the hardware environment we are using is an Exadata server with two nodes. Because there is sufficient memory, it is safe to do the pull. 
If the environment did not have enough memory, it would not even be possible to use the open source solution.</p> <p class="has-line-data" data-line-end="103" data-line-start="98">Let us see how long it takes to load the data.</p> <pre> <code class="has-line-data" data-line-end="106" data-line-start="104">X = NYC_DF_CLEAN.pull().values
</code></pre> <p class="has-line-data" data-line-end="107" data-line-start="106">This takes 11.99s. Then we can train the model on the data in memory:</p> <pre> <code class="has-line-data" data-line-end="111" data-line-start="108"><span class="hljs-keyword">from</span> sklearn.cluster <span class="hljs-keyword">import</span> KMeans
km_skmod = KMeans(n_clusters=<span class="hljs-number">4</span>, random_state=<span class="hljs-number">0</span>).fit(X)
</code></pre> <p class="has-line-data" data-line-end="113" data-line-start="111">Training takes 513s. The two steps together cost 523s.<br />Using OML K-Means, we can kick off the training inside the database right away.</p> <pre> <code class="has-line-data" data-line-end="116" data-line-start="114">km_mod = oml.km(n_clusters = <span class="hljs-number">4</span>, **setting).fit(NYC_TRIMMED_1G)
</code></pre> <p class="has-line-data" data-line-end="117" data-line-start="116">This takes 45.72s. 
We can see that the in-database algorithm is more than 10 times faster.</p> <p class="has-line-data" data-line-end="119" data-line-start="118"><img alt src="https://i2.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/40c747c7b895a0349105ceba01a96e17/c2.PNG?w=1440&ssl=1" style="width: 604px; height: 368px;" data-recalc-dims="1" data-lazy-src="https://i2.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/40c747c7b895a0349105ceba01a96e17/c2.PNG?w=1440&is-pending-load=1#038;ssl=1" srcset="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" class=" jetpack-lazy-image"><noscript><img alt="" src="https://i2.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/40c747c7b895a0349105ceba01a96e17/c2.PNG?w=1440&ssl=1" style="width: 604px; height: 368px;" data-recalc-dims="1"/></noscript></p> <p class="has-line-data" data-line-end="122" data-line-start="121">To generate the elbow plot, we need to compute the in-cluster sum of squared errors for each K. In OML4Py, the .score() method provides this metric. Note that .score() returns the negative of the original metric, so we take the absolute value with abs(). 
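</p> <p class="has-line-data">To make the metric and its sign concrete, here is a small plain-Python sketch of the in-cluster sum of squared errors. The helper <code>toy_score</code> is a hypothetical stand-in that merely mimics a scorer returning the negated metric; it is not the OML4Py API, and the data is made up for illustration.</p>

```python
def in_cluster_sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


def toy_score(points, labels, centroids):
    """Hypothetical scorer using a 'higher is better' convention: returns -SSE."""
    return -in_cluster_sse(points, labels, centroids)


# Toy data (hypothetical): two clusters on a line, centroids at x = 1 and x = 11.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 0.0), (11.0, 0.0)]

raw = toy_score(points, labels, centroids)
print(raw)       # negative, because the scorer negates the metric
print(abs(raw))  # the in-cluster SSE that goes into the elbow plot
```

<p class="has-line-data" data-line-end="122" data-line-start="121">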
With this API, we only need a loop to generate all the metrics required for the elbow plot.</p> <pre> <code class="has-line-data" data-line-end="129" data-line-start="123">incluster_sum = []
<span class="hljs-comment"># Build one K-means model per candidate K and record its in-cluster sum of squared errors</span>
<span class="hljs-keyword">for</span> cluster <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">9</span>):
    setting = {<span class="hljs-string">'kmns_iterations'</span>: <span class="hljs-number">5</span>, <span class="hljs-string">'KMNS_RANDOM_SEED'</span>: <span class="hljs-number">1</span>}
    km_mod = oml.km(n_clusters = cluster, **setting).fit(NYC_DF_CLEAN)
    <span class="hljs-comment"># score() returns a negative value, so take the absolute value</span>
    incluster_sum.append(abs(km_mod.score(NYC_DF_CLEAN)))
</code></pre> <p class="has-line-data" data-line-end="131" data-line-start="130">Running the code produces the following graph. As the number of clusters K increases from 1, the in-cluster sum of squared errors decreases quickly over the first few steps and then slows down after K = 4. The point where K = 4 is the elbow of this graph. 
This suggests choosing K = 4 for clustering this dataset.</p> <p class="has-line-data" data-line-end="133" data-line-start="132"><img alt src="https://i2.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/78742e611bb0bcca8374aeba52c5799f/elbow2_anno.png?w=1440&ssl=1" style="width: 1008px; height: 648px;" data-recalc-dims="1" data-lazy-src="https://i2.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/78742e611bb0bcca8374aeba52c5799f/elbow2_anno.png?w=1440&is-pending-load=1#038;ssl=1" srcset="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" class=" jetpack-lazy-image"><noscript><img alt="" src="https://i2.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/78742e611bb0bcca8374aeba52c5799f/elbow2_anno.png?w=1440&ssl=1" style="width: 1008px; height: 648px;" data-recalc-dims="1"/></noscript></p> <p class="has-line-data" data-line-end="138" data-line-start="135">How can we know that K = 4 is a reasonable choice? Let us visualize the clustering result. In this case, there are four features, which makes the result hard to visualize directly. One way is to plot the start locations and then mark the start and end locations given by each cluster centroid to get an idea of the patterns.</p> <p class="has-line-data" data-line-end="138" data-line-start="135">The figure shows that the start points form a rough shape of NYC, with most of the dots concentrated in Manhattan. Notice the square gap in the upper part of the shape: Central Park. We also plot the centroids, each of which consists of both a start and an end point, with black arrows from the start to the end.</p> <p class="has-line-data" data-line-end="138" data-line-start="135">The arrows indicate the general direction of the traffic. 
Note that the arrow lengths at cluster 3 and 5 are very short because clusters 3 and 5 have start and end points very close to each other. This means that the trips are within uptown and downtown. The other two clusters represent traffic flows from midtown to downtown and from midtown to uptown.</p> <p class="has-line-data" data-line-end="140" data-line-start="139"><img alt src="https://i0.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/b8460c992c38ec0457a361f389d7219a/c4.png?w=1440&ssl=1" style="width: 1008px; height: 1008px;" data-recalc-dims="1" data-lazy-src="https://i0.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/b8460c992c38ec0457a361f389d7219a/c4.png?w=1440&is-pending-load=1#038;ssl=1" srcset="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" class=" jetpack-lazy-image"><noscript><img alt="" src="https://i0.wp.com/cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/00b40098-051d-415c-be23-4ceb933d5311/Image/b8460c992c38ec0457a361f389d7219a/c4.png?w=1440&ssl=1" style="width: 1008px; height: 1008px;" data-recalc-dims="1"/></noscript></p> <p class="has-line-data" data-line-end="143" data-line-start="142">In this blog, we demonstrated how to use OML4Py and the in-database K-means clustering algorithm to implement the elbow method to choose the best number of clusters. We used data from NYC yellow taxi as an example. 
After the best value for K is found, we then generated a plot to visualize the results and some interesting patterns are found.</p> </p></div> <p><br /> <br /><a href="https://blogs.oracle.com/machinelearning/selecting-the-best-number-of-clusters-at-scale-using-oml4py-%E2%80%93-elbow-method"> Source link </a></p> </div> </div><!-- .entry-content --> <div class="screen-reader-text" itemprop="datePublished" itemtype="https://schema.org/Date">2021-06-30</div> </article><!-- .entry --> </div><!-- #content-wrap --> </main><!-- #content --> <aside id="sidebar-primary" class="sidebar sidebar-primary hgrid-span-3 layout-narrow-right " role="complementary" itemscope="itemscope" itemtype="https://schema.org/WPSideBar"> <div class=" sidebar-wrap"> <section id="tag_cloud-3" class="widget widget_tag_cloud"><h3 class="widget-title"><span>Categories</span></h3><div class="tagcloud"><a href="https://machinelearningmastery.in/category/articles/" class="tag-cloud-link tag-link-404 tag-link-position-1" style="font-size: 11.529411764706pt;" aria-label="Articles (7 items)">Articles</a> <a 
href="https://machinelearningmastery.in/category/automation-anywhere/" class="tag-cloud-link tag-link-158 tag-link-position-2" style="font-size: 9.0588235294118pt;" aria-label="Automation Anywhere (2 items)">Automation Anywhere</a> <a href="https://machinelearningmastery.in/category/certification/" class="tag-cloud-link tag-link-12 tag-link-position-3" style="font-size: 10.352941176471pt;" aria-label="Certification (4 items)">Certification</a> <a href="https://machinelearningmastery.in/category/cloud/" class="tag-cloud-link tag-link-289 tag-link-position-4" style="font-size: 10.352941176471pt;" aria-label="Cloud (4 items)">Cloud</a> <a href="https://machinelearningmastery.in/category/code/" class="tag-cloud-link tag-link-511 tag-link-position-5" style="font-size: 8pt;" aria-label="Code (1 item)">Code</a> <a href="https://machinelearningmastery.in/category/database-2/" class="tag-cloud-link tag-link-593 tag-link-position-6" style="font-size: 8pt;" aria-label="Database (1 item)">Database</a> <a href="https://machinelearningmastery.in/category/data-science/" class="tag-cloud-link tag-link-9 tag-link-position-7" style="font-size: 12.117647058824pt;" aria-label="Data Science (9 items)">Data Science</a> <a href="https://machinelearningmastery.in/category/data-science-topics/" class="tag-cloud-link tag-link-530 tag-link-position-8" style="font-size: 9.0588235294118pt;" aria-label="data science topics (2 items)">data science topics</a> <a href="https://machinelearningmastery.in/category/data-science-update/" class="tag-cloud-link tag-link-13 tag-link-position-9" style="font-size: 22pt;" aria-label="Data Science Update (475 items)">Data Science Update</a> <a href="https://machinelearningmastery.in/category/deep-learning/" class="tag-cloud-link tag-link-290 tag-link-position-10" style="font-size: 11.235294117647pt;" aria-label="Deep Learning (6 items)">Deep Learning</a> <a href="https://machinelearningmastery.in/category/financial-assistance/" class="tag-cloud-link 
tag-link-8 tag-link-position-11" style="font-size: 8pt;" aria-label="Financial assistance (1 item)">Financial assistance</a> <a href="https://machinelearningmastery.in/category/google-cloud/" class="tag-cloud-link tag-link-583 tag-link-position-12" style="font-size: 9.0588235294118pt;" aria-label="Google Cloud (2 items)">Google Cloud</a> <a href="https://machinelearningmastery.in/category/interview-tips/" class="tag-cloud-link tag-link-181 tag-link-position-13" style="font-size: 8pt;" aria-label="Interview tips (1 item)">Interview tips</a> <a href="https://machinelearningmastery.in/category/machine-learning/" class="tag-cloud-link tag-link-11 tag-link-position-14" style="font-size: 15.647058823529pt;" aria-label="Machine Learning (39 items)">Machine Learning</a> <a href="https://machinelearningmastery.in/category/open-data-source/" class="tag-cloud-link tag-link-207 tag-link-position-15" style="font-size: 9.7647058823529pt;" aria-label="Open Data Source (3 items)">Open Data Source</a> <a href="https://machinelearningmastery.in/category/power-bi/" class="tag-cloud-link tag-link-341 tag-link-position-16" style="font-size: 9.0588235294118pt;" aria-label="Power BI (2 items)">Power BI</a> <a href="https://machinelearningmastery.in/category/project-management/" class="tag-cloud-link tag-link-409 tag-link-position-17" style="font-size: 9.7647058823529pt;" aria-label="Project Management (3 items)">Project Management</a> <a href="https://machinelearningmastery.in/category/python/" class="tag-cloud-link tag-link-2 tag-link-position-18" style="font-size: 12.117647058824pt;" aria-label="Python (9 items)">Python</a> <a href="https://machinelearningmastery.in/category/quiz-of-the-day/" class="tag-cloud-link tag-link-429 tag-link-position-19" style="font-size: 8pt;" aria-label="Quiz of the Day (1 item)">Quiz of the Day</a> <a href="https://machinelearningmastery.in/category/robotic-process-automation/" class="tag-cloud-link tag-link-159 tag-link-position-20" style="font-size: 
9.0588235294118pt;" aria-label="Robotic Process Automation (2 items)">Robotic Process Automation</a> <a href="https://machinelearningmastery.in/category/r-programming/" class="tag-cloud-link tag-link-157 tag-link-position-21" style="font-size: 9.0588235294118pt;" aria-label="R Programming (2 items)">R Programming</a> <a href="https://machinelearningmastery.in/category/sas/" class="tag-cloud-link tag-link-156 tag-link-position-22" style="font-size: 10.823529411765pt;" aria-label="SAS (5 items)">SAS</a> <a href="https://machinelearningmastery.in/category/statistics/" class="tag-cloud-link tag-link-337 tag-link-position-23" style="font-size: 10.352941176471pt;" aria-label="Statistics (4 items)">Statistics</a> <a href="https://machinelearningmastery.in/category/tableau/" class="tag-cloud-link tag-link-340 tag-link-position-24" style="font-size: 9.0588235294118pt;" aria-label="Tableau (2 items)">Tableau</a> <a href="https://machinelearningmastery.in/category/visualization/" class="tag-cloud-link tag-link-10 tag-link-position-25" style="font-size: 12.352941176471pt;" aria-label="visualization (10 items)">visualization</a></div> </section><section id="newsletterwidget-2" class="widget widget_newsletterwidget"><div class="tnp tnp-widget"><form method="post" action="https://machinelearningmastery.in/?na=s"> <input type="hidden" name="nr" value="widget"><input type="hidden" name="nlang" value=""><div class="tnp-field tnp-field-email"><label for="tnp-email">Email</label> <input class="tnp-email" type="email" name="ne" value="" required></div> <div class="tnp-field tnp-field-button"><input class="tnp-submit" type="submit" value="Subscribe" > </div> </form> </div></section> <section id="recent-posts-2" class="widget widget_recent_entries"> <h3 class="widget-title"><span>Recent Posts</span></h3> <ul> <li> <a href="https://machinelearningmastery.in/2021/09/17/paradoxes-in-data-science-kdnuggets/">Paradoxes in Data Science – KDnuggets</a> </li> <li> <a 
href="https://machinelearningmastery.in/2021/09/17/what-2-years-of-self-teaching-data-science-taught-me/">What 2 years of self-teaching data science taught me</a> </li> <li> <a href="https://machinelearningmastery.in/2021/09/17/introducing-tensorflow-similarity-kdnuggets/">Introducing TensorFlow Similarity – KDnuggets</a> </li> <li> <a href="https://machinelearningmastery.in/2021/09/16/what-is-the-real-difference-between-data-engineers-and-data-scientists/">What Is The Real Difference Between Data Engineers and Data Scientists?</a> </li> <li> <a href="https://machinelearningmastery.in/2021/09/16/adventures-in-mlops-with-github-actions-iterative-ai-label-studio-and-nbdev/">Adventures in MLOps with Github Actions, Iterative.ai, Label Studio and NBDEV</a> </li> <li> <a href="https://machinelearningmastery.in/2021/09/16/the-machine-deep-learning-compendium-open-book/">The Machine & Deep Learning Compendium Open Book</a> </li> <li> <a href="https://machinelearningmastery.in/2021/09/16/easy-sql-in-native-python/">Easy SQL in Native Python</a> </li> <li> <a href="https://machinelearningmastery.in/2021/09/15/launch-amazon-sagemaker-studio-from-external-applications-using-presigned-urls/">Launch Amazon SageMaker Studio from external applications using presigned URLs</a> </li> <li> <a href="https://machinelearningmastery.in/2021/09/15/kdnuggets-top-blogs-rewards-for-august-2021/">KDnuggets Top Blogs Rewards for August 2021</a> </li> <li> <a href="https://machinelearningmastery.in/2021/09/15/datacated-expo-oct-5-live-streamedexplore-new-ai-data-science-tech/">DATAcated Expo, Oct 5, Live-streamed,Explore new AI / Data Science Tech</a> </li> </ul> </section> </div><!-- .sidebar-wrap --> </aside><!-- #sidebar-primary --> </div><!-- .main-content-grid --> </div><!-- #main --> <footer id="footer" class="site-footer footer hgrid-stretch inline-nav" role="contentinfo" itemscope="itemscope" itemtype="https://schema.org/WPFooter"> <div class="hgrid"> <div class="hgrid-span-6 
footer-column"> <section id="hootkit-ticker-9" class="widget widget_hootkit-ticker"> <div class="ticker-widget ticker-usercontent ticker-simple ticker-userstyle ticker-withbg ticker-style1" style="background:#f1f1f1;color:#ff4530;" ><i class="fa-weixin fab ticker-icon"></i> <div class="ticker-msg-box" data-speed='0.03'> <div class="ticker-msgs"> <div class="ticker-msg"><div class="ticker-msg-inner">Subscribe for the latest news, updates, tips and more delivered right to your inbox.</div></div> </div> </div> </div></section> </div> <div class="hgrid-span-3 footer-column"> <section id="media_image-13" class="widget widget_media_image"><img width="220" height="49" src="https://i2.wp.com/machinelearningmastery.in/wp-content/uploads/2019/12/Machine-Learning-Mastery-banner.gif?fit=220%2C49&ssl=1" class="image wp-image-127 attachment-full size-full jetpack-lazy-image" alt="" loading="lazy" style="max-width: 100%; height: auto;" data-lazy-src="https://i2.wp.com/machinelearningmastery.in/wp-content/uploads/2019/12/Machine-Learning-Mastery-banner.gif?fit=220%2C49&ssl=1&is-pending-load=1" srcset="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" /></section> </div> <div class="hgrid-span-3 footer-column"> <section id="hootkit-social-icons-8" class="widget widget_hootkit-social-icons"> <div class="social-icons-widget social-icons-small"><a href="https://github.com/machinelearningmasteryindia" class=" social-icons-icon fa-github-block" target="_blank"> <i class="fa-github fab"></i> </a><a href="mailto:machinelearningmasteryindia@gmail.com" class=" social-icons-icon fa-envelope-block"> <i class="fa-envelope fas"></i> </a><a href="https://www.linkedin.com/in/machine-learning-b065081a9/" class=" social-icons-icon fa-linkedin-block" target="_blank"> <i class="fa-linkedin-in fab"></i> </a><a href="https://twitter.com/sitworld" class=" social-icons-icon fa-twitter-block" target="_blank"> <i class="fa-twitter fab"></i> </a></div></section> </div> </div> 
</footer><!-- #footer --> <div id="post-footer" class=" post-footer hgrid-stretch linkstyle"> <div class="hgrid"> <div class="hgrid-span-12"> <p class="credit small"> <a class="privacy-policy-link" href="https://machinelearningmastery.in/privacy-policy/">Privacy Policy</a> Designed using <a class="theme-link" href="https://wphoot.com/themes/unos/" title="Unos WordPress Theme">Unos</a>. Powered by <a class="wp-link" href="https://wordpress.org">WordPress</a>. </p><!-- .credit --> </div> </div> </div> </div><!-- #page-wrapper --> <!--googleoff: all--><div id="cookie-law-info-bar" data-nosnippet="true"><span>This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. <a role='button' tabindex='0' class="cli_settings_button" style="margin:5px 20px 5px 20px;" >Cookie settings</a><a role='button' tabindex='0' data-cli_action="accept" id="cookie_action_close_header" class="medium cli-plugin-button cli-plugin-main-button cookie_action_close_header cli_action_button" style="display:inline-block; margin:5px; ">ACCEPT</a></span></div><div id="cookie-law-info-again" style="display:none;" data-nosnippet="true"><span id="cookie_hdr_showagain">Privacy & Cookies Policy</span></div><div class="cli-modal" data-nosnippet="true" id="cliSettingsPopup" tabindex="-1" role="dialog" aria-labelledby="cliSettingsPopup" aria-hidden="true"> <div class="cli-modal-dialog" role="document"> <div class="cli-modal-content cli-bar-popup"> <button type="button" class="cli-modal-close" id="cliModalClose"> <svg class="" viewBox="0 0 24 24"><path d="M19 6.41l-1.41-1.41-5.59 5.59-5.59-5.59-1.41 1.41 5.59 5.59-5.59 5.59 1.41 1.41 5.59-5.59 5.59 5.59 1.41-1.41-5.59-5.59z"></path><path d="M0 0h24v24h-24z" fill="none"></path></svg> <span class="wt-cli-sr-only">Close</span> </button> <div class="cli-modal-body"> <div class="cli-container-fluid cli-tab-container"> <div class="cli-row"> <div class="cli-col-12 cli-align-items-stretch cli-px-0"> <div 
class="cli-privacy-overview"> <h4>Privacy Overview</h4> <div class="cli-privacy-content"> <div class="cli-privacy-content-text">This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.</div> </div> <a class="cli-privacy-readmore" aria-label="Show more" tabindex="0" role="button" data-readmore-text="Show more" data-readless-text="Show less"></a> </div> </div> <div class="cli-col-12 cli-align-items-stretch cli-px-0 cli-tab-section-container"> <div class="cli-tab-section"> <div class="cli-tab-header"> <a role="button" tabindex="0" class="cli-nav-link cli-settings-mobile" data-target="necessary" data-toggle="cli-toggle-tab"> Necessary </a> <div class="wt-cli-necessary-checkbox"> <input type="checkbox" class="cli-user-preference-checkbox" id="wt-cli-checkbox-necessary" data-id="checkbox-necessary" checked="checked" /> <label class="form-check-label" for="wt-cli-checkbox-necessary">Necessary</label> </div> <span class="cli-necessary-caption">Always Enabled</span> </div> <div class="cli-tab-content"> <div class="cli-tab-pane cli-fade" data-id="necessary"> <div class="wt-cli-cookie-description"> Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information. 
</div> </div> </div> </div> <div class="cli-tab-section"> <div class="cli-tab-header"> <a role="button" tabindex="0" class="cli-nav-link cli-settings-mobile" data-target="non-necessary" data-toggle="cli-toggle-tab"> Non-necessary </a> <div class="cli-switch"> <input type="checkbox" id="wt-cli-checkbox-non-necessary" class="cli-user-preference-checkbox" data-id="checkbox-non-necessary" checked='checked' /> <label for="wt-cli-checkbox-non-necessary" class="cli-slider" data-cli-enable="Enabled" data-cli-disable="Disabled"><span class="wt-cli-sr-only">Non-necessary</span></label> </div> </div> <div class="cli-tab-content"> <div class="cli-tab-pane cli-fade" data-id="non-necessary"> <div class="wt-cli-cookie-description"> Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website. 
</div> </div> </div> </div> </div> </div> </div> </div> <div class="cli-modal-footer"> <div class="wt-cli-element cli-container-fluid cli-tab-container"> <div class="cli-row"> <div class="cli-col-12 cli-align-items-stretch cli-px-0"> <div class="cli-tab-footer wt-cli-privacy-overview-actions"> <a id="wt-cli-privacy-save-btn" role="button" tabindex="0" data-cli-action="accept" class="wt-cli-privacy-btn cli_setting_save_button wt-cli-privacy-accept-btn cli-btn">SAVE & ACCEPT</a> </div> </div> </div> </div> </div> </div> </div> </div> <div class="cli-modal-backdrop cli-fade cli-settings-overlay"></div> <div class="cli-modal-backdrop cli-fade cli-popupbar-overlay"></div> <!--googleon: all--> <div id="fb-root"></div> <script async defer crossorigin="anonymous" src="https://connect.facebook.net/en_US/sdk.js#xfbml=1&version=v8.0&appId=683648729088349&autoLogAppEvents=1"> </script> <!--Start of Tawk.to Script (0.5.5)--> <script type="text/javascript"> var Tawk_API=Tawk_API||{}; var Tawk_LoadStart=new Date(); (function(){ var s1=document.createElement("script"),s0=document.getElementsByTagName("script")[0]; s1.async=true; s1.src='https://embed.tawk.to/5ec04a92967ae56c521a742a/default'; s1.charset='UTF-8'; s1.setAttribute('crossorigin','*'); s0.parentNode.insertBefore(s1,s0); })(); </script> <!--End of Tawk.to Script (0.5.5)--> <script type="text/javascript"> window.WPCOM_sharing_counts = {"https:\/\/machinelearningmastery.in\/2021\/06\/30\/selecting-the-best-number-of-clusters-at-scale-using-oml4py-elbow-method\/":1512}; </script> <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script> <div id="fb-root"></div> <script>(function(d, s, id) { var js, fjs = d.getElementsByTagName(s)[0]; if (d.getElementById(id)) return; js = 
d.createElement(s); js.id = id; js.src = 'https://connect.facebook.net/en_US/sdk.js#xfbml=1&appId=249643311490&version=v2.3'; fjs.parentNode.insertBefore(js, fjs); }(document, 'script', 'facebook-jssdk'));</script> <script> document.body.addEventListener( 'is.post-load', function() { if ( 'undefined' !== typeof FB ) { FB.XFBML.parse(); } } ); </script> <script type="text/javascript"> ( function () { var currentScript = document.currentScript; // Helper function to load an external script. function loadScript( url, cb ) { var script = document.createElement( 'script' ); var prev = currentScript || document.getElementsByTagName( 'script' )[ 0 ]; script.setAttribute( 'async', true ); script.setAttribute( 'src', url ); prev.parentNode.insertBefore( script, prev ); script.addEventListener( 'load', cb ); } function init() { loadScript( 'https://platform.linkedin.com/in.js?async=true', function () { if ( typeof IN !== 'undefined' ) { IN.init(); } } ); } if ( document.readyState === 'loading' ) { document.addEventListener( 'DOMContentLoaded', init ); } else { init(); } document.body.addEventListener( 'is.post-load', function() { if ( typeof IN !== 'undefined' ) { IN.parse(); } } ); } )(); </script> <script id="tumblr-js" type="text/javascript" src="https://assets.tumblr.com/share-button.js"></script> <script type="text/javascript"> ( function () { // Pinterest shared resources var s = document.createElement( 'script' ); s.type = 'text/javascript'; s.async = true; s.setAttribute( 'data-pin-hover', true ); s.src = window.location.protocol + '//assets.pinterest.com/js/pinit.js'; var x = document.getElementsByTagName( 'script' )[ 0 ]; x.parentNode.insertBefore(s, x); // if 'Pin it' button has 'counts' make container wider function init() { var shares = document.querySelectorAll( 'li.share-pinterest' ); for ( var i = 0; i < shares.length; i++ ) { var share = shares[ i ]; if ( share.querySelector( 'a span:visible' ) ) { share.style.width = '80px'; } } } if ( document.readyState 
!== 'complete' ) { document.addEventListener( 'load', init ); } else { init(); } } )(); </script> <script> (function(r, d, s) { r.loadSkypeWebSdkAsync = r.loadSkypeWebSdkAsync || function(p) { var js, sjs = d.getElementsByTagName(s)[0]; if (d.getElementById(p.id)) { return; } js = d.createElement(s); js.id = p.id; js.src = p.scriptToLoad; js.onload = p.callback sjs.parentNode.insertBefore(js, sjs); }; var p = { scriptToLoad: 'https://swx.cdn.skype.com/shared/v/latest/skypewebsdk.js', id: 'skype_web_sdk' }; r.loadSkypeWebSdkAsync(p); })(window, document, 'script'); </script> <div id="sharing_email" style="display: none;"> <form action="/2021/06/30/selecting-the-best-number-of-clusters-at-scale-using-oml4py-elbow-method/" method="post"> <label for="target_email">Send to Email Address</label> <input type="email" name="target_email" id="target_email" value="" /> <label for="source_name">Your Name</label> <input type="text" name="source_name" id="source_name" value="" /> <label for="source_email">Your Email Address</label> <input type="email" name="source_email" id="source_email" value="" /> <input type="text" id="jetpack-source_f_name" name="source_f_name" class="input" value="" size="25" autocomplete="off" title="This field is for validation and should not be changed" /> <img style="float: right; display: none" class="loading" src="https://machinelearningmastery.in/wp-content/plugins/jetpack/modules/sharedaddy/images/loading.gif" alt="loading" width="16" height="16" /> <input type="submit" value="Send Email" class="sharing_send" /> <a rel="nofollow" href="#cancel" class="sharing_cancel" role="button">Cancel</a> <div class="errors errors-1" style="display: none;"> Post was not sent - check your email addresses! </div> <div class="errors errors-2" style="display: none;"> Email check failed, please try again </div> <div class="errors errors-3" style="display: none;"> Sorry, your blog cannot share posts by email. 
</div> </form> </div> <script data-cfasync="false" type="text/javascript">if (window.addthis_product === undefined) { window.addthis_product = "wpp"; } if (window.wp_product_version === undefined) { window.wp_product_version = "wpp-6.2.6"; } if (window.addthis_share === undefined) { window.addthis_share = {}; } if (window.addthis_config === undefined) { window.addthis_config = {"data_track_clickback":true,"ui_atversion":"300"}; } if (window.addthis_plugin_info === undefined) { window.addthis_plugin_info = {"info_status":"enabled","cms_name":"WordPress","plugin_name":"Share Buttons by AddThis","plugin_version":"6.2.6","plugin_mode":"AddThis","anonymous_profile_id":"wp-2f16336e765908d13c2d341ff0393457","page_info":{"template":"posts","post_type":["post","page","e-landing-page"]},"sharing_enabled_on_post_via_metabox":false}; } (function() { var first_load_interval_id = setInterval(function () { if (typeof window.addthis !== 'undefined') { window.clearInterval(first_load_interval_id); if (typeof window.addthis_layers !== 'undefined' && Object.getOwnPropertyNames(window.addthis_layers).length > 0) { window.addthis.layers(window.addthis_layers); } if (Array.isArray(window.addthis_layers_tools)) { for (i = 0; i < window.addthis_layers_tools.length; i++) { window.addthis.layers(window.addthis_layers_tools[i]); } } } },1000) }()); </script><script src='https://machinelearningmastery.in/wp-content/plugins/jetpack/_inc/build/photon/photon.min.js?ver=20191001' id='jetpack-photon-js'></script> <script src='https://machinelearningmastery.in/wp-includes/js/comment-reply.min.js?ver=5.6.5' id='comment-reply-js'></script> <script id='hoverIntent-js-extra'> var hootData = {"stickySidebar":"disable","contentblockhover":"enable","contentblockhovertext":"disable"}; </script> <script src='https://machinelearningmastery.in/wp-includes/js/hoverIntent.min.js?ver=1.8.1' id='hoverIntent-js'></script> <script 
src='https://machinelearningmastery.in/wp-content/themes/unos/js/jquery.superfish.min.js?ver=1.7.5' id='jquery-superfish-js'></script> <script src='https://machinelearningmastery.in/wp-content/themes/unos/js/jquery.fitvids.min.js?ver=1.1' id='jquery-fitvids-js'></script> <script src='https://machinelearningmastery.in/wp-content/themes/unos/js/jquery.parallax.min.js?ver=1.4.2' id='jquery-parallax-js'></script> <script id='ap-frontend-js-js-extra'> var ap_form_required_message = ["This field is required","accesspress-anonymous-post"]; var ap_captcha_error_message = ["Sum is not correct.","accesspress-anonymous-post"]; </script> <script src='https://machinelearningmastery.in/wp-content/plugins/accesspress-anonymous-post/js/frontend.js?ver=2.8.1' id='ap-frontend-js-js'></script> <script src='https://machinelearningmastery.in/wp-content/plugins/hootkit/assets/jquery.lightSlider.min.js?ver=1.1.2' id='jquery-lightSlider-js'></script> <script src='https://machinelearningmastery.in/wp-content/plugins/hootkit/assets/widgets.min.js?ver=2.0.7' id='hootkit-widgets-js'></script> <script id='hootkit-miscmods-js-extra'> var hootkitMiscmodsData = {"ajaxurl":"https:\/\/machinelearningmastery.in\/wp-admin\/admin-ajax.php"}; </script> <script src='https://machinelearningmastery.in/wp-content/plugins/hootkit/assets/miscmods.min.js?ver=2.0.7' id='hootkit-miscmods-js'></script> <script src='https://machinelearningmastery.in/wp-content/plugins/page-links-to/dist/new-tab.js?ver=3.3.5' id='page-links-to-js'></script> <script src='https://s7.addthis.com/js/300/addthis_widget.js?ver=5.6.5#pubid=ra-5e0c443d44eeeb15' id='addthis_widget-js'></script> <script src='https://machinelearningmastery.in/wp-content/plugins/jetpack/vendor/automattic/jetpack-lazy-images/src/js/intersectionobserver-polyfill.min.js?ver=1.1.2' id='jetpack-lazy-images-polyfill-intersectionobserver-js'></script> <script id='jetpack-lazy-images-js-extra'> var jetpackLazyImagesL10n = {"loading_warning":"Images are still loading. 
Please cancel your print and try again."}; </script> <script src='https://machinelearningmastery.in/wp-content/plugins/jetpack/vendor/automattic/jetpack-lazy-images/src/js/lazy-images.min.js?ver=1.1.2' id='jetpack-lazy-images-js'></script> <script src='https://machinelearningmastery.in/wp-content/plugins/jetpack/_inc/build/postmessage.min.js?ver=9.8.1' id='postmessage-js'></script> <script src='https://machinelearningmastery.in/wp-content/plugins/jetpack/_inc/build/jquery.jetpack-resize.min.js?ver=9.8.1' id='jetpack_resize-js'></script> <script src='https://machinelearningmastery.in/wp-content/plugins/jetpack/_inc/build/likes/queuehandler.min.js?ver=9.8.1' id='jetpack_likes_queuehandler-js'></script> <script src='https://machinelearningmastery.in/wp-content/themes/unos/js/hoot.theme.min.js?ver=2.9.11' id='hoot-theme-js'></script> <script src='https://machinelearningmastery.in/wp-content/plugins/youtube-embed-plus/scripts/fitvids.min.js?ver=13.4.3' id='__ytprefsfitvids__-js'></script> <script id='wpgdprc.js-js-extra'> var wpgdprcData = {"ajaxURL":"https:\/\/machinelearningmastery.in\/wp-admin\/admin-ajax.php","ajaxSecurity":"5f20633e8d","isMultisite":"","path":"\/","blogId":""}; </script> <script src='https://machinelearningmastery.in/wp-content/plugins/wp-gdpr-compliance/dist/js/front.min.js?ver=1629244814' id='wpgdprc.js-js'></script> <script src='https://machinelearningmastery.in/wp-includes/js/wp-embed.min.js?ver=5.6.5' id='wp-embed-js'></script> <script id='jetpack_related-posts-js-extra'> var related_posts_js_options = {"post_heading":"h4"}; </script> <script src='https://machinelearningmastery.in/wp-content/plugins/jetpack/_inc/build/related-posts/related-posts.min.js?ver=20210604' id='jetpack_related-posts-js'></script> <script defer src='https://machinelearningmastery.in/wp-content/plugins/akismet/_inc/form.js?ver=4.1.12' id='akismet-form-js'></script> <script id='sharing-js-js-extra'> var sharing_js_options = 
{"lang":"en","counts":"1","is_stats_active":"1"}; </script> <script src='https://machinelearningmastery.in/wp-content/plugins/jetpack/_inc/build/sharedaddy/sharing.min.js?ver=9.8.1' id='sharing-js-js'></script> <script id='sharing-js-js-after'> var windowOpen; ( function () { function matches( el, sel ) { return !! ( el.matches && el.matches( sel ) || el.msMatchesSelector && el.msMatchesSelector( sel ) ); } document.body.addEventListener( 'click', function ( event ) { if ( ! event.target ) { return; } var el; if ( matches( event.target, 'a.share-facebook' ) ) { el = event.target; } else if ( event.target.parentNode && matches( event.target.parentNode, 'a.share-facebook' ) ) { el = event.target.parentNode; } if ( el ) { event.preventDefault(); // If there's another sharing window open, close it. if ( typeof windowOpen !== 'undefined' ) { windowOpen.close(); } windowOpen = window.open( el.getAttribute( 'href' ), 'wpcomfacebook', 'menubar=1,resizable=1,width=600,height=400' ); return false; } } ); } )(); var windowOpen; ( function () { function matches( el, sel ) { return !! ( el.matches && el.matches( sel ) || el.msMatchesSelector && el.msMatchesSelector( sel ) ); } document.body.addEventListener( 'click', function ( event ) { if ( ! event.target ) { return; } var el; if ( matches( event.target, 'a.share-telegram' ) ) { el = event.target; } else if ( event.target.parentNode && matches( event.target.parentNode, 'a.share-telegram' ) ) { el = event.target.parentNode; } if ( el ) { event.preventDefault(); // If there's another sharing window open, close it. 
if ( typeof windowOpen !== 'undefined' ) { windowOpen.close(); } windowOpen = window.open( el.getAttribute( 'href' ), 'wpcomtelegram', 'menubar=1,resizable=1,width=450,height=450' ); return false; } } ); } )(); </script> <iframe src='https://widgets.wp.com/likes/master.html?ver=202137#ver=202137' scrolling='no' id='likes-master' name='likes-master' style='display:none;'></iframe> <div id='likes-other-gravatars'><div class="likes-text"><span>%d</span> bloggers like this:</div><ul class="wpl-avatars sd-like-gravatars"></ul></div> <script>!function(){window.advanced_ads_ready_queue=window.advanced_ads_ready_queue||[],advanced_ads_ready_queue.push=window.advanced_ads_ready;for(var d=0,a=advanced_ads_ready_queue.length;d<a;d++)advanced_ads_ready(advanced_ads_ready_queue[d])}();</script><script src='https://stats.wp.com/e-202137.js' defer></script> <script> _stq = window._stq || []; _stq.push([ 'view', {v:'ext',j:'1:9.8.1',blog:'170785677',post:'1512',tz:'-5.5',srv:'machinelearningmastery.in'} ]); _stq.push([ 'clickTrackerInit', '170785677', '1512' ]); </script> </body> </html> <!-- Page generated by LiteSpeed Cache 4.4.1 on 2021-09-18 08:24:57 -->