Talk:HCS clustering algorithm

1. The HCS clustering algorithm article shouldn't be an orphan; it could be linked from:

http://en.wikipedia.org/wiki/Cluster_analysis

http://en.wikipedia.org/wiki/Data_mining

http://en.wikipedia.org/wiki/Bioinformatics

2. There are plenty of citations of the HCS clustering algorithm, which can be found by googling its title: "A clustering algorithm based on graph connectivity".

For example, two works that build on (and cite) the HCS clustering algorithm are:

a. ''Mining coherent dense sub-graphs across massive biological networks for functional discovery. Haiyan Hu, Xifeng Yan, Yu Huang, Jiawei Han and Xianghong Jasmine Zhou. Bioinformatics, Vol. 21 Suppl. 1, 2005, pages i213–i221. doi:10.1093/bioinformatics/bti1049''

They write in the article: "MODES is developed based on HCS (Mining Highly Connected Sub-graphs) (Hartuv and Shamir, 2000), with two new features: (1) MODES..."

b. ''MOHCS: Towards Mining Overlapping Highly Connected Subgraphs. Xiahong Lin, Lin Gao, Kefei Chen, and David K. Y. Chiu. CoRR (2008)''

They write in the article: "Among those that are most related to our work, [6] (HCS clustering algorithm) provides a definition of highly connected sub-graph that is valid and useful in practice. There, the HCS algorithm is one of the most well-known clustering algorithms and has been widely used in various domains such as gene expression analysis [7] and functional module discovery [5, 8-10]."

Please note that the original HCS paper contains several heuristics that solve many of the problems that other authors tackled and solved in their own versions of HCS. For example: removing low-degree vertices in the early stage of HCS and adding them to the clustering later. The wiki page mentions this only briefly; please refer to the articles for a complete description (also listed in the References section at the end of the article page):

1. Hartuv, Erez, and Ron Shamir. "A clustering algorithm based on graph connectivity." Information Processing Letters 76, no. 4 (2000): 175-181.

2. E Hartuv, A O Schmitt, J Lange, S Meier-Ewert, H Lehrach, R Shamir. "An algorithm for clustering cDNA fingerprints." Genomics 66, no. 3 (2000): 249-256.

These two articles cite many references and sources, on both the algorithmic and the application aspects, which can be copied to the article page:

Ahuja, R. K., Magnanti, T. A., and Orlin, J. B. (1993). “Network Flows: Theory, Algorithms and Applications,” Prentice Hall, Englewood Cliffs, NJ.

Alizadeh, F., Karp, R. M., Newberg, L. A., and Weisser, D. K. (1995). Physical mapping of chromosomes: A combinatorial problem in molecular biology. Algorithmica 13: 52–76.

Bonaldo, M. F., Lennon, G., and Soares, M. B. (1996). Normalization and subtraction: Two approaches to facilitate gene discovery. Genome Res. 6: 791–806.

Crkvenjakov, R., Drmanac G., Lennon S., Drmanac I., Labat R., and Lehrach, H. (1991). Partial sequencing by oligohybridization: Concept and applications in genome analysis. In “Proceedings of the First International Conference on Electrophoresis, Supercomputing and the Human Genome” (C. Cantor and H. Lim, Eds.), pp. 60–75, World Scientific, Singapore.

Drmanac, S., and Drmanac, R. (1994). Processing of cDNA and genomic kilobase-size clones for massive screening mapping and sequencing by hybridization. BioTechniques 17: 328–336.

Drmanac, S., Stavropoulos, N. A., Labat, I., Vonau, J., Hauser, B., Soares, M. B., and Drmanac, R. (1996). Gene-representing cDNA clusters defined by hybridization of 57,419 clones from infant brain libraries with short oligonucleotide probes. Genomics 37: 29–40.

Fodor, S. P., Rava, R. P., Huang, X. C., Pease, A. C., Holmes, C. P., and Adams, C. L. (1993). Multiplexed biochemical assays with biological chips. Nature 364: 555–556.

Garey, M. R., and Johnson, D. S. (1979). “Computers and Intractability: A Guide to the Theory of NP-Completeness,” Freeman, San Francisco.

Hansen, P., and Jaumard, B. (1997). Cluster analysis and mathematical programming. Math. Programming 79: 191–215.

Hartigan, J. A. (1975). “Clustering Algorithms,” Wiley, New York.

Hartuv, E. (1998). “Cluster Analysis by Highly Connected Subgraphs with Applications to cDNA Clustering,” M.Sc. thesis, Department of Computer Science, Tel Aviv University.

Hartuv, E., and Shamir, R. “A Clustering Algorithm Based on Graph Connectivity,” Technical report, Tel Aviv University, Department of Computer Science. Submitted for publication. [Manuscript available at http://www.math.tau.ac.il/~shamir/papers/hcs.ps]

Hartuv, E., Schmitt, A., Lange, J., Meier-Ewert, S., Lehrach, H., and Shamir, R. (1999). An algorithm for clustering cDNAs for gene expression analysis using short oligonucleotide fingerprints. In “Proceedings Third International Symposium on Computational Molecular Biology (RECOMB 99),” pp. 188–197. ACM Press, New York.

Jardine, N., and Sibson, R. (1971). “Mathematical Taxonomy,” Wiley, London.

Karger, D. R. (1996). Minimum cuts in near linear time. In “Proceedings of the 28th Annual ACM Symposium on Theory of Computing,” pp. 56–63.

Zhang, L., Zhou, W., Velculescu, V. E., Kern, S. E., Hruban, R. H., Hamilton, S. R., Vogelstein, B., and Kinzler, K. W. (1997). Gene expression profiles in normal and cancer cells. Science 276: 1268–1272.

Lennon, G. S., and Lehrach, H. (1991). Hybridization analysis of arrayed cDNA libraries. Trends Genet. 7: 60–75.

Matula, D. W. (1969). The cohesive strength of graphs. In “The Many Facets of Graph Theory,” (G. Chartrand and S. F. Kapoor, Eds.), Lecture Notes in Mathematics No. 110, pp. 215–221, Springer-Verlag, Berlin.

Matula, D. W. (1970). Cluster analysis via graph theoretic techniques. In “Proceedings of the Louisiana Conference on Combinatorics, Graph Theory and Computing,” (R. C. Mullin, K. B. Reid, and D. P. Roselle, Eds.), pp. 199–212, University of Manitoba, Winnipeg.

Matula, D. W. (1972). k-Components, clusters and slicings in graphs. SIAM J. Appl. Math. 22(3): 459–480.

Matula, D. W. (1977). Graph theoretic techniques for cluster analysis algorithms. In “Classification and Clustering” (J. Van Ryzin, Ed.), pp. 95–129, Academic Press, San Diego.

Matula, D. W. (1987). Determining edge connectivity in O(nm). In “Proceedings of the 28th IEEE Symposium on Foundations of Computer Science,” pp. 249–251. Computer Society Press of the IEEE, Washington DC.

Mayraz, G., and Shamir, R. (1999). Construction of physical maps from oligonucleotide fingerprints data. J. Comput. Biol. 6(2): 237–252.

Mehlhorn, K., and Näher, S. (1995). LEDA: A platform for combinatorial and geometric computing. Commun. ACM 38(1): 96–102.

Meier-Ewert, S., Rothe, J., Mott, R., and Lehrach, H. (1994). Establishing catalogues of expressed sequences by oligonucleotide fingerprinting of cDNA libraries. In “Identification of Transcribed Sequences” (U. Hochgeschwender, Ed.), pp. 253–260, Plenum, New York.

Meier-Ewert, S., Lange, J., Gerst, H., Herwig, R., Schmitt, A., Freund, J., Elge, T., Mott, R., Herrmann, B., and Lehrach, H. (1998). Comparative gene expression profiling by oligonucleotide fingerprinting. Nucleic Acids Res. 26(9): 2216–2223.

Michiels, F., Craig, A. G., Zehetner, G., Smith, G. P., and Lehrach, H. (1987). Molecular approaches to genome analysis: A strategy for the construction of ordered overlapping clone libraries. CABIOS 3(3): 203–210.

Milosavljevic, A., Strezoska, Z., Zeremski, M., Grujic, D., Paunesku, T., and Crkvenjakov, R. (1995). Clone clustering by hybridization. Genomics 27: 83–89.

Mirkin, B. (1996). “Mathematical Classification and Clustering,” Kluwer Academic, Dordrecht/Norwell, MA.

Nagamochi, H., and Ibaraki, T. (1992). Computing edge connectivity in multigraphs and capacitated graphs. SIAM J. Disc. Math. 5: 54–66.

Platt, D. M., and Dix, T. I. (1997). Comparison of clone-ordering algorithms used in physical mapping. Genomics 40: 490–492.

Poustka, A. J., Herwig, R., Krause, A., Hennig, S., Meier-Ewert, S., and Lehrach, H. (1999). Toward the gene catalogue of sea urchin development: The construction and analysis of an unfertilized egg cDNA library highly normalized by oligonucleotide fingerprinting. Genomics 59: 122–133.

Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). “Molecular Cloning: A Laboratory Manual,” 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P. O., and Davis, R. W. (1996). Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. USA 93: 10614–10619.

Sokal, R. R. (1977). Clustering and classification: Background and current directions. In “Classification and Clustering” (J. Van Ryzin, Ed.), pp. 1–15, Academic Press, San Diego.

Stoer, M., and Wagner, F. (1997). A simple Min-Cut algorithm. J. ACM 44(4): 585–591.

Vicentic, R., Drmanac S., Drmanac I., Labat R., Crkvenjakov A., and Gemmell, A. (1992). Sequencing by hybridization: Towards an automated sequencing of one million M13 clones arrayed on membranes. Electrophoresis 13: 566–573.

Wu, Z., and Leahy, R. (1993). An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Trans. Pattern Anal. Machine Intelligence 15(11): 1101–1113.

ErezHartuv (talk) 09:36, 18 August 2013 (UTC) ErezHartuv (talk) 09:22, 18 August 2013 (UTC) ErezHartuv (talk) 08:04, 18 August 2013 (UTC)