Abstract
The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unlabeled data into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Clustering is thus a common approach that divides an input space into several homogeneous zones, and it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM to cluster genes. FCM allows an object to belong to two or more clusters with a membership grade between zero and one, and the membership grades of each gene across all clusters sum to one. This paradigm is useful when dealing with microarray data. The total implementation time of the first model is 22.2589 s. The second model combines FCM and particle swarm optimization (PSO) to obtain better results. The hybrid algorithm, i.e., FCM–PSO, uses the Davies–Bouldin (DB) index as the objective function. The experimental results show that the proposed hybrid FCM–PSO method is effective. The total implementation time of this model is 89.6087 s. The third model combines FCM with a genetic algorithm (GA) to obtain better results. This hybrid algorithm also uses the DB index as the objective function. The experimental results show that the proposed hybrid FCM–GA method is effective. Its total implementation time is 50.8021 s. In addition, this study uses cluster validity indexes to determine the best partitioning for the underlying data. Internal validity indexes include the Jaccard, Davies–Bouldin, Dunn, Xie–Beni, and silhouette indexes. External validity indexes include the Minkowski score, the adjusted Rand index, and the percentage of correctly categorized pairings. Experiments conducted on brain tumor gene expression data demonstrate that the techniques used in this study outperform traditional models in terms of stability and biological significance.
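The FCM membership rule and the DB index described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fcm` and `davies_bouldin`, the fuzzifier `m = 2`, the random initialization, and the toy two-dimensional data are all assumptions made for illustration, and the DB index is computed here on hardened (argmax) labels using the FCM centers, which may differ from how the study evaluates it inside PSO or GA.

```python
import math
import random


def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means sketch (hypothetical helper).

    Returns (centers, u), where u[i][k] is the membership grade of
    point i in cluster k. Every row of u lies in [0, 1] and sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to one.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / sw for d in range(dim)]
            )
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        shift = 0.0
        for i, p in enumerate(points):
            dist = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


def davies_bouldin(points, centers, labels):
    """DB index on hard labels (lower is better); assumes distinct centers."""
    k = len(centers)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        # Average distance of the members to their cluster center.
        s = sum(math.dist(p, centers[c]) for p in members) / len(members) if members else 0.0
        scatter.append(s)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Toy data: two well-separated groups standing in for expression profiles.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
centers, u = fcm(data, n_clusters=2)
labels = [row.index(max(row)) for row in u]  # harden memberships
db = davies_bouldin(data, centers, labels)
```

In a hybrid scheme such as FCM–PSO or FCM–GA, a function like `davies_bouldin` could serve as the fitness that the optimizer minimizes over candidate cluster centers; the sketch only evaluates it once on the FCM result.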
Recommended Citation
anabee, Omar Al- and Sarray, Basad Al (2022) "Evaluation Algorithms Based on Fuzzy C-means for the Data Clustering of Cancer Gene Expression," Iraqi Journal for Computer Science and Mathematics: Vol. 3: Iss. 2, Article 4.
DOI: https://doi.org/10.52866/ijcsm.2022.02.01.004
Available at: https://ijcsm.researchcommons.org/ijcsm/vol3/iss2/4