Sensitivity to distance metric of revealed patterns in democratic indices
Abstract
The choice of distance metric may affect the results when analyzing data through spatial embedding. Here, we investigate the sensitivity of three distance metrics: the Jensen-Shannon distance (JSD), the cosine distance, and the Euclidean distance. We perform this study using the Varieties of Democracy (V-Dem) data on 24 of its democracy indicators. We generate concatenated distributions that describe the indicator measurements categorized by either year or country. We perform a cross-sensitivity analysis on these concatenated distributions and study their effects on clustering. For the categorizations applied to the V-Dem dataset, our results suggest that the JSD is more sensitive to differences between entries that are more similar to each other, whereas the cosine distance is more sensitive when the entries are less similar to each other. The relative sensitivity of the distance metrics is affected by binning and sample size, and a suitable binning is important to prevent a breakaway in the sensitivity of a distance metric. Adaptive binning and weighting of the variables may be used to better highlight certain aspects of the dataset.
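As an illustration of the comparison described above, the following is a minimal sketch, not the procedure used in this study. It assumes that each indicator is rescaled to [0, 1], binned into equal-width bins, and that the per-indicator histograms are concatenated and renormalized to sum to one. It also assumes that the JSD is the square root of the base-2 Jensen-Shannon divergence. The function names (`histogram`, `concatenate`, `js_distance`, `cosine_distance`, `euclidean_distance`) and the sample values are hypothetical.

```python
import math

def histogram(values, n_bins, lo=0.0, hi=1.0):
    """Normalized equal-width histogram of values on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def concatenate(indicator_samples, n_bins):
    """Concatenate per-indicator histograms into one distribution.

    Assumption: equal weight per indicator, so each histogram is
    divided by the number of indicators and the result sums to 1.
    """
    k = len(indicator_samples)
    out = []
    for samples in indicator_samples:
        out.extend(p / k for p in histogram(samples, n_bins))
    return out

def _kl(p, m):
    """Kullback-Leibler divergence in bits, skipping empty bins of p."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)

def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    div = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return math.sqrt(max(div, 0.0))  # guard against tiny negative rounding

def cosine_distance(p, q):
    """One minus the cosine similarity of p and q."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return 1.0 - dot / (norm_p * norm_q)

def euclidean_distance(p, q):
    """Straight-line distance between p and q."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

if __name__ == "__main__":
    # Made-up values for two entries (e.g. two years), two indicators each.
    entry_a = [[0.10, 0.20, 0.25, 0.90], [0.50, 0.55, 0.60, 0.70]]
    entry_b = [[0.15, 0.30, 0.80, 0.95], [0.40, 0.45, 0.65, 0.75]]
    for n_bins in (2, 4, 8):
        p = concatenate(entry_a, n_bins)
        q = concatenate(entry_b, n_bins)
        print(n_bins, js_distance(p, q), cosine_distance(p, q),
              euclidean_distance(p, q))
```

Looping over `n_bins` gives a quick way to see how the three distances respond to the binning choice for the same pair of entries. With few samples and many bins, most bins are empty and the distances are driven by where individual samples happen to fall.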