Abstract:
The correlation-based context similarity coefficient (CSC) metric is a pattern-based fingerprint similarity metric that has gained interest in fingerprint database clustering operations. The performance of traditional distance-based metrics in determining fingerprint vector similarity is known to be influenced by the size of the fingerprint vector. For the correlation-based CSC metric, however, there is no comprehensive research on how fingerprint vector size affects its similarity determination performance and, in turn, clustering performance. This paper therefore examines the impact of fingerprint vector size on the similarity determination performance of the correlation-based CSC metric, with the k-medoids algorithm employed for clustering. The analysis is performed across four synthetic and two experimentally generated fingerprint databases with varying fingerprint vector sizes, with the clustering algorithm set to generate K ∈ [3, 5] clusters. The results are compared against three distance-based metrics: squared Euclidean, Manhattan, and cosine. With the silhouette score as the clustering performance metric, the simulation results show that the size of the fingerprint vector influences the similarity determination performance of the correlation-based CSC metric. The number of clusters that the clustering algorithm is set to generate also affects how the correlation-based CSC metric performs in similarity determination. In addition, the time complexity of similarity determination with the correlation-based CSC metric increases with fingerprint vector size, making efficient clustering more challenging as fingerprint vectors grow. For optimal performance, the correlation-based CSC metric is recommended for use as a similarity metric on databases with a fingerprint vector size of N ≤ 4 and with a clustering algorithm configured to generate no more than 3 clusters.