Content

This post clusters a set of time series by their rise/fall patterns under a range of dissimilarity measures and compares the results.
Dissimilarity measures (quoted from https://www.rdocumentation.org/packages/TSclust/versions/1.2.4/topics/diss ):
  1. “ACF” Autocorrelation-based method. diss.ACF.
  2. “AR.LPC.CEPS” Linear Predictive Coding ARIMA method. This method has two value-per-series arguments, the ARIMA order, and the seasonality. diss.AR.LPC.CEPS.
  3. “AR.MAH” Model-based ARMA method. diss.AR.MAH.
  4. “AR.PIC” Model-based ARMA method. This method has a value-per-series argument, the ARIMA order. diss.AR.PIC.
  5. “CDM” Compression-based dissimilarity method. diss.CDM.
  6. “CID” Complexity-Invariant distance. diss.CID.
  7. “COR” Correlation-based method. diss.COR.
  8. “CORT” Temporal Correlation and Raw values method. diss.CORT.
  9. “DTWARP” Dynamic Time Warping method. diss.DTWARP.
  10. “DWT” Discrete wavelet transform method. diss.DWT.
  11. “EUCL” Euclidean distance. diss.EUCL. For many more conventional distances, see dist in the stats package, though you may need to transpose the dataset.
  12. “FRECHET” Frechet distance. diss.FRECHET.
  13. “INT.PER” Integrate Periodogram-based method. diss.INT.PER.
  14. “NCD” Normalized Compression Distance. diss.NCD.
  15. “PACF” Partial Autocorrelation-based method. diss.PACF.
  16. “PDC” Permutation distribution divergence. Uses the pdc package. See pdcDist for additional arguments and details. Note that series given by numeric matrices are interpreted row-wise and not column-wise, the opposite of pdcDist.
  17. “PER” Periodogram-based method. diss.PER.
  18. “PRED” Prediction Density-based method. This method has two value-per-series arguments, the logarithm and difference transform. diss.PRED.
  19. “MINDIST.SAX” Distance that lower bounds the Euclidean, based on the Symbolic Aggregate approXimation measure. diss.MINDIST.SAX.
  20. “SPEC.LLR” Spectral Density by Local-Linear Estimation method. diss.SPEC.LLR.
  21. “SPEC.GLK” Log-Spectra Generalized Likelihood Ratio test method. diss.SPEC.GLK.
  22. “SPEC.ISD” Integrated Squared Differences between Log-Spectra method. diss.SPEC.ISD.
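
To make one of these measures concrete, here is a minimal base R sketch of the idea behind “CORT”, as I understand it from the TSclust documentation: a temporal correlation coefficient computed on first differences is passed through a tuning function and used to scale the Euclidean distance on raw values. The helper names `cort_coef` and `cort_diss` are hypothetical, and the default `k = 2` is an assumption; this is an illustration, not the diss.CORT implementation.

```r
# Hypothetical helpers, base R only; not the TSclust implementation.

# Temporal correlation: cosine similarity of the first differences.
cort_coef <- function(x, y) {
  dx <- diff(x)
  dy <- diff(y)
  sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

# The tuning function 2 / (1 + exp(k * cort)) shrinks the raw distance when
# the two series move in the same direction and inflates it when they do not.
cort_diss <- function(x, y, k = 2) {
  phi <- 2 / (1 + exp(k * cort_coef(x, y)))
  phi * sqrt(sum((x - y)^2))
}

up_down <- cumsum(rep(c(1, -1), 10) * 2)  # alternating rise/fall, step 2
same    <- cumsum(rep(c(1, -1), 10) * 5)  # same directions, step 5
flipped <- cumsum(rep(c(-1, 1), 10) * 2)  # opposite direction at every step

cort_coef(up_down, same)     # close to  1: identical rise/fall pattern
cort_coef(up_down, flipped)  # close to -1: mirrored rise/fall pattern
cort_diss(up_down, same)     # smaller than the plain Euclidean distance
cort_diss(up_down, flipped)  # larger than the plain Euclidean distance
```

Because the coefficient depends only on the direction and relative size of the moves, a measure of this kind is a natural fit for data that differ mainly in their rise/fall pattern.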

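Each series has 20 observations. We generate 10 series for each of the following three patterns, 30 series in total, as sample data.
  1. Group A: up, down, up, down\(\cdots\)
  2. Group B: up, up, down, down, up, up, down, down\(\cdots\)
  3. Group C: up, down, up, up, down, up, down, up, up, down\(\cdots\)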
set.seed(20190201)
library(dplyr)
library(tidyr)
library(ggplot2)
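# Build n random series that follow a repeating rise/fall pattern.
# pattern: vector of +1 (rise) / -1 (fall); rep: number of times the pattern repeats;
# step sizes are drawn from Uniform(unif_min, unif_max); group: column-name prefix.
fun_tsdata <- function(pattern, n, rep, unif_min, unif_max, group) {
    buf <- sapply(seq(n), function(x) {
        # signed random steps, accumulated into a level series
        return(cumsum(rep(pattern, rep) * runif(n = length(pattern) * rep, min = unif_min, max = unif_max)))
    }) %>% data.frame()
    colnames(buf) <- paste0(group, seq(ncol(buf)))
    return(buf)
}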
n <- 10
unif_min <- 1
unif_max <- 10
ts01 <- fun_tsdata(pattern = c(1, -1), n = n, rep = 10, unif_min = unif_min, unif_max = unif_max, group = "A")
ts02 <- fun_tsdata(pattern = c(1, 1, -1, -1), n = n, rep = 5, unif_min = unif_min, unif_max = unif_max, group = "B")
ts03 <- fun_tsdata(pattern = c(1, -1, 1, 1, -1), n = n, rep = 4, unif_min = unif_min, unif_max = unif_max, group = "C")
tsdata <- cbind(ts01, ts02, ts03)
tsdata <- data.frame(N = seq(nrow(tsdata)), tsdata)
tsdata_tidy <- gather(data = tsdata, key = "key", value = "value", colnames(tsdata)[2:ncol(tsdata)])
tsdata_tidy$group <- substring(text = tsdata_tidy$key, 1, 1)
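
As a quick sanity check on this construction, the sketch below rebuilds a single Group B style series in base R and confirms that the signs of its first differences recover the intended pattern. The seed and variable names are arbitrary choices for this illustration and are not part of the original script.

```r
set.seed(1)                      # arbitrary seed, for this illustration only
pattern <- c(1, 1, -1, -1)       # Group B: up, up, down, down
steps   <- rep(pattern, 5) * runif(20, min = 1, max = 10)
series  <- cumsum(steps)

# Prepend the implicit starting level 0 so that diff() returns all 20 steps.
recovered <- sign(diff(c(0, series)))
all(recovered == rep(pattern, 5))  # TRUE
length(series)                     # 20
```

Since every step has magnitude of at least `unif_min = 1`, the direction of each move is unambiguous; only the step sizes vary between series in the same group.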

Cluster evaluation scores for each dissimilarity measure:
library(TSclust)
library(dendextend)
k <- length(unique(tsdata_tidy$group))
METHODs <- c("ACF", "AR.LPC.CEPS", "AR.MAH", "AR.PIC", "CDM", "CID", "COR", "CORT", "DTWARP", "DWT", "EUCL", "FRECHET", "INT.PER", "NCD", "PACF", "PDC", "PER", "PRED", "MINDIST.SAX", "SPEC.LLR", "SPEC.GLK", "SPEC.ISD")
cluster_eval <- data.frame()
hclust_method <- "complete"
cnt <- 1
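# output_image_dir is not defined in this post; set it beforehand to the directory that should receive the PNG files.
setwd(output_image_dir)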
for (mmm in seq(METHODs)) {
    diss_method <- METHODs[mmm]
    tryCatch({
        d <- diss(SERIES = tsdata[, -1], METHOD = diss_method)
        h <- hclust(d = d, method = hclust_method)
        cluster_eval[cnt, 1] <- METHODs[mmm]
        cluster_eval[cnt, 2] <- cluster.evaluation(rep(seq(k), rep(n, k)), as.factor(cutree(tree = h, k = k)))
        cluster_eval[cnt, 3] <- hclust_method
        # plot
        dendrogramData <- as.dendrogram(h)
        pngFile <- paste0("AMCC-dendrogram", formatC(cnt, width = 2, flag = "0"), ".png")
        png(file = pngFile, width = 900, height = 600)
        par(mar = c(0, 0, 1, 3))
        dendrogramData %>%
            set("branches_k_color", k = k) %>%
            set("branches_lwd", c(1)) %>%
            set("branches_lty", c(1)) %>%
            set("labels_colors", k = k) %>%
            set("labels_cex", c(1)) %>%
            plot(horiz = TRUE, nodePar = list(pch = c(1), cex = c(1), lab.cex = 0.8), type = "r", center = TRUE, dLeaf = NULL, edge.root = FALSE, frame.plot = FALSE, axes = FALSE, main = paste0(diss_method, " / Evaluation = ", cluster_eval[cnt, 2]))
        dendrogramData %>% rect.dendrogram(k = k, horiz = TRUE)
        dev.off()
        # plot
        cnt <- cnt + 1
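    }, error = function(e) {
        # A method that raises an error is skipped and gets no row in cluster_eval; report it instead of failing silently.
        message(diss_method, ": ", conditionMessage(e))
    })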
}
colnames(cluster_eval) <- c("Dissimilarity measure method", "Evaluation", "Hierarchical clustering agglomeration method")
   Dissimilarity measure method Evaluation Hierarchical clustering agglomeration method
1                           ACF  0.8374728                                     complete
2                   AR.LPC.CEPS  0.9665831                                     complete
3                        AR.PIC  0.7619048                                     complete
4                           CDM  0.5396825                                     complete
5                           CID  0.5324284                                     complete
6                           COR  0.6187739                                     complete
7                          CORT  1.0000000                                     complete
8                        DTWARP  0.5103276                                     complete
9                           DWT  0.4920635                                     complete
10                         EUCL  0.5555556                                     complete
11                      FRECHET  0.4849624                                     complete
12                      INT.PER  0.6666667                                     complete
13                          NCD  0.5313390                                     complete
14                         PACF  1.0000000                                     complete
15                          PDC  1.0000000                                     complete
16                          PER  1.0000000                                     complete
17                     SPEC.LLR  0.7701149                                     complete
18                     SPEC.GLK  0.9326599                                     complete
19                     SPEC.ISD  0.4922342                                     complete
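
The Evaluation column comes from cluster.evaluation. As I understand the TSclust documentation, it averages, over the true groups, the best value of 2|G_i ∩ A_j| / (|G_i| + |A_j|) across the clusters found. The base R sketch below reimplements that index under this assumption; `sim_index` and the example labels are hypothetical and are not part of TSclust or of the script above.

```r
# Hypothetical reimplementation of the similarity index, base R only.
sim_index <- function(truth, found) {
  tab <- table(truth, found)                           # |G_i ∩ A_j| for every pair
  size_sum <- outer(rowSums(tab), colSums(tab), "+")   # |G_i| + |A_j|
  pair_sim <- 2 * tab / size_sum
  mean(apply(pair_sim, 1, max))                        # best match per true group, averaged
}

truth <- rep(1:3, each = 10)   # 3 groups of 10 series, as in the sample data
found <- truth
found[30] <- 2                 # hypothetical: one Group C series placed in cluster 2

sim_index(truth, truth)  # 1
sim_index(truth, found)  # (1 + 20/21 + 18/19) / 3, about 0.9665831
```

The second value equals the AR.LPC.CEPS score in the table, which is consistent with that measure misassigning a single series, while a score of 1 (CORT, PACF, PDC, PER) means the three groups were recovered exactly.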