
Commit e4c946d

5-10 more algorithms
1 parent 866f9b5 commit e4c946d

File tree

7 files changed: +297 additions, −20 deletions


DESCRIPTION

Lines changed: 3 additions & 3 deletions
@@ -1,13 +1,13 @@
 Package: FCPS
 Type: Package
 Title: Fundamental Clustering Problems Suite
-Version: 1.3.2
-Date: 2023-03-18
+Version: 1.3.3
+Date: 2023-05-28
 Authors@R: c(person("Michael", "Thrun", email= "[email protected]",role=c("aut","cre","cph"), comment = c(ORCID = "0000-0001-9542-5543")),person("Peter", "Nahrgang",role=c("ctr","ctb")),person("Felix", "Pape",role=c("ctr","ctb")),person("Vasyl","Pihur", role=c("ctb")),person("Guy","Brock", role=c("ctb")),person("Susmita","Datta", role=c("ctb")),person("Somnath","Datta", role=c("ctb")),person("Luis","Winckelmann", role=c("com")),person("Alfred", "Ultsch",role=c("dtc","ctb")),person("Quirin", "Stier",role=c("ctb","rev")))
 Maintainer: Michael Thrun <[email protected]>
 Description: Over sixty clustering algorithms are provided in this package with consistent input and output, which enables the user to try out algorithms swiftly. Additionally, 26 statistical approaches for the estimation of the number of clusters as well as the mirrored density plot (MD-plot) of clusterability are implemented. The package is published in Thrun, M.C., Stier Q.: "Fundamental Clustering Algorithms Suite" (2021), SoftwareX, <DOI:10.1016/j.softx.2020.100642>. Moreover, the fundamental clustering problems suite (FCPS) offers a variety of clustering challenges any algorithm should handle when facing real world data, see Thrun, M.C., Ultsch A.: "Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems" (2020), Data in Brief, <DOI:10.1016/j.dib.2020.105501>.
 Imports: mclust, ggplot2, DataVisualizations
-Suggests: kernlab, cclust, dbscan, kohonen, MCL, ADPclust, cluster, DatabionicSwarm, orclus, subspace, flexclust, ABCanalysis, apcluster, pracma, EMCluster, pdfCluster, parallelDist, plotly, ProjectionBasedClustering, GeneralizedUmatrix, mstknnclust, densityClust, parallel, energy, R.utils, tclust, Spectrum, genie, protoclust, fastcluster, clusterability, signal, reshape2, PPCI, clustrd, smacof, rgl, prclust, CEC, dendextend, moments, prabclus, VarSelLCM, sparcl, mixtools, HDclassif, clustvarsel, yardstick, knitr, rmarkdown, igraph, leiden, clusterSim, NetworkToolbox, randomForest, ConsensusClusterPlus, RWeka
+Suggests: mlpack, kernlab, cclust, dbscan, kohonen, MCL, ADPclust, cluster, DatabionicSwarm, orclus, subspace, flexclust, ABCanalysis, apcluster, pracma, EMCluster, pdfCluster, parallelDist, plotly, ProjectionBasedClustering, GeneralizedUmatrix, mstknnclust, densityClust, parallel, energy, R.utils, tclust, Spectrum, genie, protoclust, fastcluster, clusterability, signal, reshape2, PPCI, clustrd, smacof, rgl, prclust, CEC, dendextend, moments, prabclus, VarSelLCM, sparcl, mixtools, HDclassif, clustvarsel, yardstick, knitr, rmarkdown, igraph, leiden, clustMixType, clusterSim, NetworkToolbox, randomForest, ConsensusClusterPlus, RWeka
 Depends: R (>= 3.5.0)
 License: GPL-3
 LazyData: TRUE
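The "consistent input and output" the Description advertises is the convention every wrapper touched by this commit also follows: the data matrix comes first, tuning parameters are named arguments, and the result is a list whose Cls component holds one cluster label per row. A minimal sketch of that convention, assuming FCPS and its suggested backends are installed and using the bundled Hepta benchmark data:

library(FCPS)
data("Hepta")
# Every wrapper returns a list with at least Cls (one label per row) and Object (the backend result)
out = kmeansClustering(Hepta$Data, ClusterNo = 7)
names(out)        # e.g. "Cls", "Object", "Centroids"
table(out$Cls)    # cluster sizes; label values themselves are arbitrary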

R/DBscan.R

Lines changed: 27 additions & 10 deletions
@@ -1,4 +1,4 @@
-DBSCAN = DBscan=function(Data,Radius,minPts,PlotIt=FALSE,UpperLimitRadius,...){
+DBSCAN = DBscan=function(Data,Radius,minPts,Rcpp=TRUE,PlotIt=FALSE,UpperLimitRadius,...){
 # Cls=DBSCAN(FCPS$Hepta$Data,sqrt(min(res$withinss)))
 # DBSCAN based on [Ester et al., 1996]
 #
@@ -10,6 +10,7 @@ DBSCAN = DBscan=function(Data,Radius,minPts,PlotIt=FALSE,UpperLimitRadius,...){
 # minPts            In principle minimum number of points in the unit disk, if the unit disk is within the cluster (core) [Ester et al., 1996, p. 228].
 #                   number of minimum points in the eps region (for core points).
 #                   Default is 5 points.
+# Rcpp              TRUE: uses rcpp fast version
 # PlotIt            Boolean. Decision to plot or not
 # UpperLimitRadius  Limit for radius search, experimental
 #
@@ -21,6 +22,23 @@ DBSCAN = DBscan=function(Data,Radius,minPts,PlotIt=FALSE,UpperLimitRadius,...){
 #
 # [Ester et al., 1996] Ester, M., Kriegel, H.-P., Sander, J., & Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise, Proc. Kdd, Vol. 96, pp. 226-231, 1996.
 
+  if(isTRUE(Rcpp)){
+    if (!requireNamespace('mlpack',quietly = TRUE)) {
+      message(
+        'Subordinate clustering package (mlpack) is missing. No computations are performed.
+        Please install the package which is defined in "Suggests".'
+      )
+      return(
+        list(
+          Cls = rep(1, nrow(Data)),
+          Object = "Subordinate clustering package (mlpack) is missing.
+            Please install the package which is defined in 'Suggests'."
+        )
+      )
+    }
+  }else{
+
+
 if (!requireNamespace('dbscan',quietly = TRUE)) {
   message(
     'Subordinate clustering package (dbscan) is missing. No computations are performed.
@@ -34,7 +52,7 @@ DBSCAN = DBscan=function(Data,Radius,minPts,PlotIt=FALSE,UpperLimitRadius,...){
       )
     )
   }
-
+  }
 if(is.null(nrow(Data))){# Then we get a vector
   return(cls <- rep(1,length(Data)))
 }
@@ -55,14 +73,14 @@ DBSCAN = DBscan=function(Data,Radius,minPts,PlotIt=FALSE,UpperLimitRadius,...){
 if(missing(UpperLimitRadius))
   UpperLimitRadius=1.1*Radius
 
-  liste=dbscan::dbscan(x = Data,eps = Radius,minPts = minPts,...)
-  Cls=liste$cluster
+  if(isTRUE(Rcpp)){
+    liste=mlpack::dbscan(input = Data,epsilon = Radius,min_size = minPts,...)
+    Cls=as.vector(liste$assignments)
+  }else{
+    liste=dbscan::dbscan(x = Data,eps = Radius,minPts = minPts,...)
+    Cls=liste$cluster
+  }
 ind=which(Cls==0)
-# if(length(ind)>0)
-#   Cls[ind]=999
-#Cls=NormalizeCls(Cls)$normalizedCls
-#if(length(ind)>0)
-#  Cls[ind]=NaN
 Cls[!is.finite(Cls)]=0
 # Per Definition are not clustered objects in searching for
 # distance and density based structures not allowed.
@@ -74,7 +92,6 @@ DBSCAN = DBscan=function(Data,Radius,minPts,PlotIt=FALSE,UpperLimitRadius,...){
   liste=out$DBscanObject
 }
 if(isTRUE(PlotIt)){
-
   Cls2=Cls
   Cls2[Cls2==0]=999
   p=ClusterPlotMDS(Data,Cls2)
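In practice the new Rcpp switch only changes the backend, not the interface: Rcpp=TRUE (the new default) routes the call to mlpack::dbscan, Rcpp=FALSE keeps the previous dbscan::dbscan path, and either way the wrapper returns list(Cls, Object). A minimal sketch on the Hepta data, assuming both backend packages are installed; the radius below is hand-picked for illustration:

data("Hepta")
# New default: mlpack's C++ implementation
fast = DBSCAN(Hepta$Data, Radius = 1, minPts = 5, Rcpp = TRUE)
# Previous behaviour: the dbscan package
ref  = DBSCAN(Hepta$Data, Radius = 1, minPts = 5, Rcpp = FALSE)
# Both return one label per row; compare the two partitions
table(fast$Cls, ref$Cls)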

R/MeanShiftClustering.R

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+MeanShiftClustering=function(Data,PlotIt=FALSE,...){
+  # Cls=MeanShiftClustering(Data)$Cls
+  # Clustering by mean shift
+  #
+  # INPUT
+  # Data[1:n,1:d]   Data set with n observations and d features
+  #
+  # OPTIONAL
+  # PlotIt          Boolean. Decision to plot or not
+  #
+  # OUTPUT
+  # Cls[1:n]        Clustering of data
+  # Object          Object of mlpack::mean_shift algorithm
+  #
+  # Author: MT 05/2023
+  # Cheng, Yizong (1995). "Mean Shift, Mode Seeking, and Clustering". IEEE Transactions on Pattern Analysis and Machine Intelligence. 17 (8): 790-799. CiteSeerX 10.1.1.510.1222. doi:10.1109/34.400568.
+  if (!requireNamespace('mlpack',quietly = TRUE)) {
+    message(
+      'Subordinate clustering package (mlpack) is missing. No computations are performed.
+      Please install the package which is defined in "Suggests".'
+    )
+    return(
+      list(
+        Cls = rep(1, nrow(Data)),
+        Object = "Subordinate clustering package (mlpack) is missing.
+          Please install the package which is defined in 'Suggests'."
+      )
+    )
+  }
+  res = mlpack::mean_shift(input = Data,labels_only = T, ...)
+  Cls = as.vector(res$output)+1
+
+  if (PlotIt) {
+    ClusterPlotMDS(Data , Cls)
+  }
+  Cls = ClusterRename(Cls, Data)
+
+  return(list(Cls=Cls,Object=res))
+}
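The wrapper itself is thin: mlpack::mean_shift does the work, the zero-based labels in res$output are shifted to start at 1, and ClusterRename tidies them before list(Cls, Object) is returned. A minimal usage sketch, assuming mlpack is installed and using the Hepta data as in the help page added below:

data("Hepta")
out = MeanShiftClustering(Hepta$Data, PlotIt = FALSE)
length(unique(out$Cls))        # number of modes/clusters found by mean shift
str(out$Object, max.level = 1) # raw mlpack result kept alongside the labels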

R/kmeansClustering.R

Lines changed: 166 additions & 2 deletions
@@ -1,4 +1,4 @@
-kmeansClustering <-function(DataOrDistances,ClusterNo=2,Type='LBG',RandomNo=5000,PlotIt=FALSE,Verbose=FALSE,...){
+kmeansClustering <-function(DataOrDistances,ClusterNo=2,Type='LBG',RandomNo=5000,CategoricalData,PlotIt=FALSE,Verbose=FALSE,...){
 # Cls <- kmeansClustering(DataOrDistances,ClusterNo,Verbose);
 # calls one of two common approaches for kmeans
 #
@@ -9,6 +9,7 @@ kmeansClustering <-function(DataOrDistances,ClusterNo=2,Type='LBG',RandomNo=5000
 # Type      Kind of kmeans algorithm. Choose one of the two following strings:
 #           "Hartigan": Hartigan, J. A. and Wong, M. A. A K-means clustering algorithm. Applied Statistics 28, 100-108, 1979.
 #           "LBG": Linde,Y.,Buzo,A.,Gray,R.M., An algorithm for vector quantizer design. IEEE Transactions on Communications, COM-28, 84-95, 1980
+#           'pelleg-moore', 'elkan', 'hamerly', 'dualtree', or 'dualtree-covertree'
 # RandomNo  Only for Steinley method or in case of distance matrix, number of random initializations with
 #           searching for minimal SSE, see [Steinley/Brusco, 2007]
 # PlotIt    Boolean. Decision to plot or not
@@ -22,7 +23,10 @@ kmeansClustering <-function(DataOrDistances,ClusterNo=2,Type='LBG',RandomNo=5000
 # Adaption to Mdbt and documentation standards
 if (!isSymmetric(unname(DataOrDistances))) {
   #Data = DataOrDistances
-
+  if(missing(CategoricalData)&Type=="kprototypes"){
+    warning("kmeansClustering: CategoricalData cannot be missing if Type is 'kprototypes'. Setting type to default")
+    Type="Hartigan"
+  }
   if (ClusterNo < 2) {
     warning("ClusterNo should be an integer > 2. Now, all of your data is in one cluster.")
     if (is.null(nrow(DataOrDistances))) {
@@ -83,6 +87,41 @@ kmeansClustering <-function(DataOrDistances,ClusterNo=2,Type='LBG',RandomNo=5000
       Object = res,
       Centroids = Centroids
     ))
+  },'kprototypes' = {
+    if (!requireNamespace('clustMixType',quietly = TRUE)) {
+      message(
+        'Subordinate clustering (clustMixType) package is missing. No computations are performed.
+        Please install the package which is defined in "Suggests".'
+      )
+      return(
+        list(
+          Cls = rep(1, nrow(DataOrDistances)),
+          Object = "Subordinate clustering (clustMixType) package is missing.
+            Please install the package which is defined in 'Suggests'."
+        )
+      )
+    }
+    DataOrDistancesWithFactors=as.data.frame(DataOrDistances)
+    CategoricalData=as.data.frame(CategoricalData)
+    for(i in 1:ncol(CategoricalData)){
+      CategoricalData[,i]=as.factor(CategoricalData[,i])
+    }
+    DataOrDistancesWithFactors=cbind(DataOrDistancesWithFactors,CategoricalData)
+
+    res = clustMixType::kproto(x = DataOrDistancesWithFactors, k = ClusterNo, ...)#verbose=FALSE,
+
+    Cls = as.numeric((res$cluster))
+
+    Centroids=res$centers
+    if (PlotIt) {
+      ClusterPlotMDS(DataOrDistances, Cls)
+    }
+    Cls = ClusterRename(Cls, DataOrDistances)
+    return(list(
+      Cls = Cls,
+      Object = res,
+      Centroids = Centroids
+    ))
 },
 'LBG' = {
   if (!requireNamespace('cclust',quietly = TRUE)) {
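The 'kprototypes' branch added just above is the one Type that needs two inputs: the numeric matrix in DataOrDistances plus a separate CategoricalData block, which is coerced column-wise to factors and column-bound to the numeric part before clustMixType::kproto is called. A minimal sketch with made-up mixed data (the toy variables below are illustrative only, not from the package):

set.seed(42)
num    = matrix(rnorm(200), ncol = 2)   # 100 cases, 2 numeric features
cat_df = data.frame(colour = sample(c("red", "blue"), 100, replace = TRUE),
                    shape  = sample(c("round", "square"), 100, replace = TRUE))
out = kmeansClustering(num, ClusterNo = 2, Type = "kprototypes", CategoricalData = cat_df)
table(out$Cls)
out$Centroids   # kproto prototypes: numeric means combined with categorical modes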
@@ -149,6 +188,131 @@ kmeansClustering <-function(DataOrDistances,ClusterNo=2,Type='LBG',RandomNo=5000
     ),
     Centroids = res$centers
   ))
+  },"Pelleg-moore"={
+    if (!requireNamespace('mlpack',quietly = TRUE)) {
+      message(
+        'Subordinate clustering package (mlpack) is missing. No computations are performed.
+        Please install the package which is defined in "Suggests".'
+      )
+      return(
+        list(
+          Cls = rep(1, nrow(DataOrDistances)),
+          Object = "Subordinate clustering package (mlpack) is missing.
+            Please install the package which is defined in 'Suggests'."
+        )
+      )
+    }
+    res = mlpack::kmeans(input = DataOrDistances, clusters = ClusterNo, algorithm = tolower(Type),labels_only = T, ...)
+    Cls = as.vector(res$output)+1
+    if (PlotIt) {
+      ClusterPlotMDS(DataOrDistances, Cls)
+    }
+    Cls = ClusterRename(Cls, DataOrDistances)
+    return(list(
+      Cls = Cls,
+      Object = res,
+      Centroids = res$centroid
+    ))
+  },"Elkan"={
+    if (!requireNamespace('mlpack',quietly = TRUE)) {
+      message(
+        'Subordinate clustering package (mlpack) is missing. No computations are performed.
+        Please install the package which is defined in "Suggests".'
+      )
+      return(
+        list(
+          Cls = rep(1, nrow(DataOrDistances)),
+          Object = "Subordinate clustering package (mlpack) is missing.
+            Please install the package which is defined in 'Suggests'."
+        )
+      )
+    }
+    res = mlpack::kmeans(input = DataOrDistances, clusters = ClusterNo, algorithm = tolower(Type),labels_only = T, ...)
+    Cls = as.vector(res$output)+1
+    if (PlotIt) {
+      ClusterPlotMDS(DataOrDistances, Cls)
+    }
+    Cls = ClusterRename(Cls, DataOrDistances)
+    return(list(
+      Cls = Cls,
+      Object = res,
+      Centroids = res$centroid
+    ))
+  },"Hamerly"={
+    if (!requireNamespace('mlpack',quietly = TRUE)) {
+      message(
+        'Subordinate clustering package (mlpack) is missing. No computations are performed.
+        Please install the package which is defined in "Suggests".'
+      )
+      return(
+        list(
+          Cls = rep(1, nrow(DataOrDistances)),
+          Object = "Subordinate clustering package (mlpack) is missing.
+            Please install the package which is defined in 'Suggests'."
+        )
+      )
+    }
+    res = mlpack::kmeans(input = DataOrDistances, clusters = ClusterNo, algorithm = tolower(Type),labels_only = T, ...)
+    Cls = as.vector(res$output)+1
+    if (PlotIt) {
+      ClusterPlotMDS(DataOrDistances, Cls)
+    }
+    Cls = ClusterRename(Cls, DataOrDistances)
+    return(list(
+      Cls = Cls,
+      Object = res,
+      Centroids = res$centroid
+    ))
+  },"Dualtree"={
+    if (!requireNamespace('mlpack',quietly = TRUE)) {
+      message(
+        'Subordinate clustering package (mlpack) is missing. No computations are performed.
+        Please install the package which is defined in "Suggests".'
+      )
+      return(
+        list(
+          Cls = rep(1, nrow(DataOrDistances)),
+          Object = "Subordinate clustering package (mlpack) is missing.
+            Please install the package which is defined in 'Suggests'."
+        )
+      )
+    }
+    res = mlpack::kmeans(input = DataOrDistances, clusters = ClusterNo, algorithm = tolower(Type),labels_only = T, ...)
+    Cls = as.vector(res$output)+1
+    if (PlotIt) {
+      ClusterPlotMDS(DataOrDistances, Cls)
+    }
+    Cls = ClusterRename(Cls, DataOrDistances)
+    return(list(
+      Cls = Cls,
+      Object = res,
+      Centroids = res$centroid
+    ))
+  },"Dualtree-covertree"={
+    if (!requireNamespace('mlpack',quietly = TRUE)) {
+      message(
+        'Subordinate clustering package (mlpack) is missing. No computations are performed.
+        Please install the package which is defined in "Suggests".'
+      )
+      return(
+        list(
+          Cls = rep(1, nrow(DataOrDistances)),
+          Object = "Subordinate clustering package (mlpack) is missing.
+            Please install the package which is defined in 'Suggests'."
+        )
+      )
+    }
+    res = mlpack::kmeans(input = DataOrDistances, clusters = ClusterNo, algorithm = tolower(Type),labels_only = T, ...)
+    Cls = as.vector(res$output)+1
+    if (PlotIt) {
+      ClusterPlotMDS(DataOrDistances, Cls)
+    }
+    Cls = ClusterRename(Cls, DataOrDistances)
+    return(list(
+      Cls = Cls,
+      Object = res,
+      Centroids = res$centroid
+    ))
 },
 {#lloyd, forgy, mac queen
   res = kmeans(DataOrDistances, centers = ClusterNo, algorithm = Type, ...)
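All five new branches share one body: the Type string is lower-cased and passed to mlpack::kmeans as its algorithm argument, the zero-based labels are shifted by one, and the result comes back as list(Cls, Object, Centroids). They therefore differ only in which k-means acceleration mlpack uses internally. A minimal sketch comparing two of them on Hepta, assuming mlpack is installed; since labels are arbitrary, the partitions rather than the label values are compared:

data("Hepta")
elkan   = kmeansClustering(Hepta$Data, ClusterNo = 7, Type = "Elkan")
hamerly = kmeansClustering(Hepta$Data, ClusterNo = 7, Type = "Hamerly")
# For the well-separated Hepta clusters this should be close to a permutation matrix
table(elkan$Cls, hamerly$Cls)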

man/DBscan.Rd

Lines changed: 0 additions & 1 deletion
@@ -63,7 +63,6 @@ DBSGrid <- expand.grid(
 )
 BestAcc = c()
 for (i in seq_len(nrow(DBSGrid))) {
-  print(i)
   parameters <- DBSGrid[i,]
   Cls9 = DBSCAN(
     Data,

man/MeanShiftClustering.Rd

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+\name{MeanShiftClustering}
+\alias{MeanShiftClustering}
+\title{Mean Shift Clustering}
+\description{
+Mean Shift Clustering of [Cheng, 1995]
+}
+\usage{
+MeanShiftClustering(Data,
+
+PlotIt=FALSE,...)
+}
+
+\arguments{
+\item{Data}{[1:n,1:d] matrix of dataset to be clustered. It consists of n cases of d-dimensional data points. Every case has d attributes, variables or features.}
+
+\item{PlotIt}{Default: FALSE. If TRUE, plots the first three dimensions of the dataset with colored three-dimensional data points defined by the clustering stored in \code{Cls}.}
+
+\item{\dots}{Further arguments to be set for the clustering algorithm; if not set, default arguments are used.}
+}
+
+\details{
+The radius used for the search can be specified with the "\code{radius}" parameter. The maximum number of iterations before algorithm termination is controlled with the "\code{max_iterations}" parameter.
+
+If the distance between two centroids is less than the given radius, one will be removed. A radius of 0 or less means an estimate will be calculated and used for the radius. Default value "0" (numeric).
+}
+\value{
+List of
+\item{Cls}{[1:n] numerical vector with n numbers defining the classification as the main output of the clustering algorithm. It has k unique numbers representing the arbitrary labels of the clustering.}
+\item{Object}{Object defined by clustering algorithm as the other output of this algorithm}
+}
+
+\examples{
+data('Hepta')
+out=MeanShiftClustering(Hepta$Data,PlotIt=FALSE)
+}
+\author{Michael Thrun}
+
+\references{
+[Cheng, 1995] Cheng, Yizong: Mean Shift, Mode Seeking, and Clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17 (8), pp. 790-799, doi:10.1109/34.400568, 1995.
+}
+\keyword{MeanShiftClustering}
+\keyword{Clustering}
+\concept{Large Application Clustering}
+\keyword{clara}
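As the Details section above states, the bandwidth is controlled through the backend's radius parameter (forwarded via \dots), with a radius of 0 or less meaning mlpack estimates one itself, while max_iterations bounds the mode-seeking loop. A minimal sketch, assuming mlpack is installed; the explicit radius value is hypothetical and for illustration only:

data("Hepta")
auto  = MeanShiftClustering(Hepta$Data)               # radius estimated internally (default 0)
fixed = MeanShiftClustering(Hepta$Data, radius = 2)   # hand-picked radius passed through to mlpack
c(auto = length(unique(auto$Cls)), fixed = length(unique(fixed$Cls)))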
