Validity studies among hierarchical methods of cluster analysis using cophenetic correlation coefficient

Authors

  • Priscilla Ramos Carvalho
  • Casimiro Sepúlveda Munita
  • André Luiz Lapolli

DOI:

https://doi.org/10.15392/bjrs.v7i2A.668

Keywords:

cluster analysis, cophenetic correlation coefficient, INAA.

Abstract

The literature offers many methods for partitioning a data set, and it is difficult to choose the most suitable one, since combining different methods with different dissimilarity measures can lead to different grouping patterns and to false interpretations. Nevertheless, little effort has been devoted to evaluating these methods empirically on archaeological data sets. The objective of this work is therefore to compare different cluster analysis methods and to identify the most appropriate one. The study used a data set of 45 samples of ceramic fragments analyzed by instrumental neutron activation analysis (INAA). The methods compared were single linkage, complete linkage, average linkage, centroid and Ward. Validation was carried out with the cophenetic correlation coefficient, and when these values were compared, the average linkage method obtained the best results. A script with several functions was written in the statistical software R to obtain the cophenetic correlation. These values made it possible to choose the most appropriate method for this data set.
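The validation criterion named in the abstract, the cophenetic correlation coefficient of Sokal and Rohlf, is the Pearson correlation between the original pairwise dissimilarities x_ij and the cophenetic distances t_ij. Here t_ij is the dendrogram height at which samples i and j first join the same cluster:

c = Σ(x_ij − x̄)(t_ij − t̄) / √[Σ(x_ij − x̄)² · Σ(t_ij − t̄)²], with all sums taken over pairs i < j.

A value closer to 1 means the dendrogram distorts the original dissimilarities less. The authors' R script is not reproduced here. What follows is a minimal, stdlib-only Python sketch under stated assumptions: Euclidean distance on the raw variables, a naive O(n³) agglomerative loop, and only the single, complete and average linkage updates. Centroid and Ward, which the article also compares, are omitted for brevity; their Lance-Williams updates are normally applied to squared Euclidean distances. The function names are hypothetical helpers. The toy data are invented and are not the 45 INAA ceramic samples. In R, the same coefficient is commonly obtained as cor(d, cophenetic(hclust(d, method = "average"))) for a dist object d.

```python
import math
from itertools import combinations


def euclidean_matrix(data):
    """Full pairwise Euclidean distance matrix for a list of numeric tuples."""
    n = len(data)
    return [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]


def cophenetic_matrix(dist, method="average"):
    """Naive agglomerative clustering; returns the cophenetic distance matrix.

    Hypothetical helper: supports 'single', 'complete' and 'average' (UPGMA) only.
    """
    if method not in ("single", "complete", "average"):
        raise ValueError(f"unsupported method: {method}")
    n = len(dist)
    members = {i: [i] for i in range(n)}  # active cluster id -> sample indices
    # Inter-cluster distances keyed by (a, b) with a < b.
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(members) > 1:
        # Merge the closest pair of active clusters.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        # Every sample pair split across a and b first meets at this height.
        for i in members[a]:
            for j in members[b]:
                coph[i][j] = coph[j][i] = height
        na, nb = len(members[a]), len(members[b])
        merged = members.pop(a) + members.pop(b)
        del d[(a, b)]
        # Distance from the merged cluster to each remaining cluster k.
        for k in members:
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            if method == "single":
                new = min(dak, dbk)
            elif method == "complete":
                new = max(dak, dbk)
            else:  # average linkage: size-weighted mean (UPGMA)
                new = (na * dak + nb * dbk) / (na + nb)
            d[(k, next_id)] = new  # k < next_id always holds
        members[next_id] = merged
        next_id += 1
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances (i < j)."""
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    t = [coph[i][j] for i, j in pairs]
    mx, mt = sum(x) / len(x), sum(t) / len(t)
    sxt = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    sxx = sum((a - mx) ** 2 for a in x)
    stt = sum((b - mt) ** 2 for b in t)
    return sxt / math.sqrt(sxx * stt)


if __name__ == "__main__":
    # Invented toy data: two well-separated groups (NOT the article's INAA data).
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (4.9, 5.1)]
    dist = euclidean_matrix(data)
    for method in ("single", "complete", "average"):
        r = cophenetic_correlation(dist, cophenetic_matrix(dist, method))
        print(f"{method:>8}: {r:.4f}")
```

Running the file prints one coefficient per linkage method. With real data, the method with the highest coefficient is the one whose dendrogram best preserves the original dissimilarities, which is the selection rule the abstract applies. The sketch needs at least three samples whose pairwise distances are not all equal; otherwise the denominator is zero.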

Author Biography

Priscilla Ramos Carvalho

Instituto de Pesquisas Energéticas e Nucleares (IPEN - CNEN/SP)
Av. Professor Lineu Prestes 2242, 05508-000
São Paulo, SP, Brazil

Published

2019-02-20

How to Cite

Carvalho, P. R., Munita, C. S., & Lapolli, A. L. (2019). Validity studies among hierarchical methods of cluster analysis using cophenetic correlation coefficient. Brazilian Journal of Radiation Sciences, 7(2A (Suppl.)). https://doi.org/10.15392/bjrs.v7i2A.668

Section

The Meeting on Nuclear Applications (ENAN)