Bioinformatics is a rapidly growing scientific field whose importance is rising in tandem with the comprehensive analysis of evolving biological systems. It draws on several disciplines, including computer science, mathematics, statistics, biology, and chemistry, all of which work in unison. The Human Genome Project (HGP) and the large-scale data collected by various investigations have raised the significance of computational technologies. These factors contribute to the widespread use of bioinformatics in the biological sciences. For instance, it is employed in fields as varied as agriculture, animal husbandry, and space exploration.[1]
Cancer is a serious disease caused by the interaction of numerous distinct molecules. In clinical investigations, analyzing and modeling the many features of cancer and identifying its molecular basis help to develop new therapeutic approaches as well as methods for prognosis, diagnosis, and follow-up.[2]
The Human Genome Project has raised the significance of bioinformatics in cancer research. Statistical models of cancer (at the transcriptome and signaling pathway levels) are used to predict the progression of the disease and the likelihood of a cure. Bioinformatics also contributes significantly to the discovery of pathways that could serve as therapeutic targets.[3]
BIOINFORMATICS IN CANCER RESEARCH
Bioinformatics areas used in cancer research can be categorized into seven subgroups: epigenomic analyses, proteomics, genomic sequencing and variation detection, analysis of system biology, transcriptomic analysis, clinical and demographic information on patients, and metabolomic analysis. Each of these categories can be addressed separately.
Epigenomic Analyses
Epigenomic investigations examine chemical alterations to deoxyribonucleic acid (DNA) and histones at the genomic level, such as DNA methylation and histone modification. The most widely used methods in this area are chromatin immunoprecipitation-on-chip (ChIP-chip) arrays and chromatin immunoprecipitation sequencing (ChIP-seq) studies.[4] Cancer research benefits greatly from these advances. Epigenetic methods include DNA methylation analysis, chromatin accessibility analysis, ribonucleic acid (RNA) sequencing (RNA-seq) analysis, histone modification analysis, nucleosome positioning, and genomic analyses.[5]
Numerous epigenomic projects are carried out alongside genome sequencing projects. The National Institutes of Health (NIH) Roadmap Epigenomics Program examines 30 distinct alterations in human cells in an effort to detect histone modifications.[6] A further objective of the Encyclopedia of DNA Elements (ENCODE) Project is to determine the epigenetic profiles of 50 different tissue types.[7] The International Human Epigenome Consortium holds that a global platform should be established in order to integrate non-human live cells and tissues into the research.
Epigenetic modeling during tumor growth has revealed that genes with no detectable changes in the genome can still play a significant role in oncogenic activation.[6] DNA methylation has also been found to suppress the expression of tumor suppressor genes during oncogenic transformation.[8-10] More generally, DNA methylation can halt a gene's expression. These studies demonstrate the importance of both genomic and epigenetic investigations in the study of tumorigenesis. Similarly, through the NIH Roadmap Epigenomics Program, properties of the human epigenome have been identified by contrasting numerous disease tissues, such as human cancer cells, with healthy tissues. As a result, the program has helped to establish the distinctions between cancerous and healthy tissues and is credited with advancing both cancer research and cancer therapy strategies.[6-11] The National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database is one of the most popular databases where epigenomic analysis data can be accessed.
Proteomics
Proteomic investigations provide data on protein abundance, interaction partners and networks, amino acid variations, and modifications, which help to explain the biological processes associated with a particular proteome. These investigations contribute substantially to the discovery of biomarkers in cancer research.[12] Proteomics uses a variety of techniques, including mass spectrometry and protein microarrays.
Genomic Sequencing and Variation Detection
The genome map, which is made up of base pairs, was made available online with the help of the HGP. This achievement is regarded as the most significant in biology and medicine since the discovery of the double helix structure of DNA in the 1950s. Using computational and informatics techniques, this effort found about 20,000 protein-coding genes in the human genome.[13] By comparing the genomes of people of various backgrounds, single nucleotide polymorphisms and other genetic variants were also revealed. Sequences in particular genome regions could thus be matched to diseases, which led to the identification of genetic markers for a variety of disorders.
Other initiatives, in addition to the HGP, have been completed. Among the most significant of these efforts are the Human Variome Project (HVP),[14] which seeks to understand the connections between human genetic variants and diseases, and the International Haplotype Map (HapMap) Project, which seeks to identify the human genome's haplotypes.
The Cancer Genome Atlas (TCGA) is a vast undertaking that involves the sequencing of 10,000 cancer genomes and covers 25 main cancer types. The data gathered for this project are publicly accessible. It offers a wealth of knowledge regarding cancer-related genetic mutations.[15] Accordingly, alterations in the cancer genome can be identified by comparing the genetic sequences of cancer tissue with those of healthy tissue.
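The tumor-versus-healthy comparison described above can be illustrated with a minimal Python sketch. It assumes two short sequences that are already aligned and of equal length, and it only reports substitutions. The function name `somatic_differences` and the example sequences are hypothetical and are not taken from TCGA. Real analyses work on aligned sequencing reads and use statistical variant callers, so this is only a toy illustration of the idea.

```python
from __future__ import annotations


def somatic_differences(normal: str, tumor: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, normal base, tumor base) for every mismatch.

    Assumes both sequences are already aligned and have the same length;
    insertions and deletions are not handled in this sketch.
    """
    if len(normal) != len(tumor):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (i + 1, n, t)
        for i, (n, t) in enumerate(zip(normal, tumor))
        if n != t
    ]


# Hypothetical example sequences (not real patient data).
normal_seq = "ATGGCCTGAACT"
tumor_seq = "ATGGCTTGAACA"
variants = somatic_differences(normal_seq, tumor_seq)
for pos, ref, alt in variants:
    print(f"position {pos}: {ref} -> {alt}")
```

Each reported position is a candidate alteration that is present in the tumor sequence but absent from the matched healthy sequence.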
Analysis of System Biology
Systems biology analysis is the thorough analysis of data from biochemistry, molecular biology, and biological systems that results in the mapping of cell signaling networks. It involves the quantitative observation of the molecular levels of nucleic acids, proteins, and intermediates through in vitro or in vivo studies, as well as the development of the mathematical software required for integrating biochemical data with computational biology data from high-throughput technologies. System biology research has surged recently, in tandem with the search for new drug candidate molecules with specified targets, because a rising number of tumors exhibit resistance to currently available treatments. When determining therapeutic targets, one of the most crucial aspects of this research is the need to examine potential adverse effects on both pathogenic and healthy cells. These studies are crucial, especially in identifying target signaling pathways.[16] Numerous systems-level resources on cancer have been developed. The Integrative Cancer Biology Programme (ICBP) and the Molecular Signatures Database (MSigDB) are two examples.
Transcriptomic Analysis
The expression of every gene encoded in the human genome can be examined under various conditions thanks to microarray technology, which emerged in the mid-1990s. This method has allowed researchers to learn a great deal about human diseases at the molecular and cellular levels. It has enabled cancer researchers to pinpoint the genes whose expression changes drastically between cancer cells and healthy cells, revealing cell-level pathways that are linked to the disease.[6] The RNA sequencing (RNA-seq) approach, which is more sophisticated than microarray, gives more precise details about significantly differentially expressed genes. RNA-seq data also provide information about differentially spliced variants.[16,17] Databases for RNA-seq and microarray data are widely available. GEO and the European Nucleotide Archive (ENA) are the two most frequently used.[18,19]
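The idea of genes whose expression changes drastically between cancer cells and healthy cells can be sketched in Python as a log2 fold-change filter. This is a simplified illustration under stated assumptions: the gene names and expression values are invented, the pseudocount of 1 and the threshold of 1.0 (a two-fold change) are arbitrary choices, and the function names are hypothetical. Published analyses use replicate samples and formal statistical testing rather than a bare fold-change cutoff.

```python
import math


def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """Compute log2((tumor + c) / (normal + c)) for genes present in both samples.

    The pseudocount c avoids division by zero for unexpressed genes.
    """
    shared = tumor.keys() & normal.keys()
    return {
        gene: math.log2((tumor[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in shared
    }


def differentially_expressed(tumor, normal, threshold=1.0):
    """Keep genes whose absolute log2 fold change is at least `threshold`."""
    changes = log2_fold_changes(tumor, normal)
    return {gene: lfc for gene, lfc in changes.items() if abs(lfc) >= threshold}


# Hypothetical expression values (arbitrary units, not real data).
tumor_expr = {"GENE_A": 31, "GENE_B": 7, "GENE_C": 10}
normal_expr = {"GENE_A": 7, "GENE_B": 31, "GENE_C": 10}
hits = differentially_expressed(tumor_expr, normal_expr)
for gene, lfc in sorted(hits.items()):
    direction = "up" if lfc > 0 else "down"
    print(f"{gene}: log2FC = {lfc:+.2f} ({direction} in tumor)")
```

A positive value marks a gene that is more highly expressed in the tumor sample, and a negative value marks one that is less expressed. Genes with no change fall below the threshold and are dropped.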
Clinical and Demographic Information on Patients
Comprehensive analyses of the biological systems of cancer patients have shown that many parameters, including age, gender, origin, consumption of alcohol and tobacco, blood biochemistry, and many other similar characteristics, may differ independently of the primary disease. Therefore, thorough analyses that incorporate patient-provided phenotypic data are crucial. The USA's Precision Medicine Initiative (PMI) was established in 2016 with the goal of creating patient-specific preventative and therapeutic measures.[20] Because cancer is a heterogeneous disease, data gathered from individual patients have significantly advanced cancer research.
Metabolomic Analysis
These investigations inform us about biomolecules found inside the cell, such as small secondary messenger molecules or the products or substrates of enzymes. Over 40,000 intermediates have been found to be present in human cells at any given time, according to the Human Metabolome Database (HMDB). The level of activity of a specific metabolic pathway can be assessed by examining the intermediates connected to that pathway. Nuclear magnetic resonance spectroscopy and high-resolution single-cell mass spectrometry are two examples of methods used to find intermediates in cells or tissues. When paired with transcriptomics and gene function data, these metabolomic investigations give precise information about which pathways produce which intermediates.[6] These investigations allow for the identification of metabolic differences between malignant and healthy cells.
The Role of Bioinformatics in Cancer Treatment and Diagnosis
Information regarding the structure, mutations, and function of a gene can be obtained through sequencing and gene expression analysis. Analysis of variations relative to reference genomes can also be used to estimate the risk of disease. When a disease is suspected, an earlier and more accurate diagnosis is feasible because gene sequences linked to certain illnesses have been identified.[6,21,22]
The development of individualized treatment plans involves a significant amount of bioinformatics research. Time and financial loss are minimized by choosing the treatment modalities best suited to the molecular profile of the disease. With recently evolving technology in cancer treatment, one strategy used in these areas is to identify overactive genes and tailor the approach accordingly, as well as to restore missing genes.[22] By examining the patient's genome and protein variants using bioinformatics methods, the sufficiency and effectiveness of the pharmaceuticals given may also be assessed, and the resistance status can be modified.[23]
Hanahan and Weinberg[24] assert that specific traits are shared by all cancer types. These include reprogramming of energy metabolism, sustaining proliferative signaling, evading growth suppressors, resisting cell death, enabling replicative immortality, inducing angiogenesis, avoiding immune destruction, tumor-promoting inflammation, genome instability and mutation, and activating invasion and metastasis.
Energy metabolism can be changed to promote cell growth and proliferation. Research is being done to identify therapeutic approaches that target the proteins involved in energy metabolism. Databases such as the MetaCyc metabolic pathway database and the BioCyc genome database collection are used for this purpose. Uncontrolled cell growth occurs in cancer because cancer cells continuously send out growth signals. As a result, researchers are studying cell signaling pathways and the mechanisms of cancer cells. Genomic and transcriptomic analyses, as well as data from numerous databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), can support this rich area of research. Researchers studying cancer are also interested in the fact that cancer cells can evade immune system defenses.[25] As a result, research into the use of immunotherapy for various cancer types has begun. However, not all patients respond favorably to this approach. Different indicators are required to analyze the connection between cancer and the immune system. Epigenetic alterations, gene signatures, and mutant antigen profiles in tumor and immune system cells can all be investigated thanks to high-throughput technologies.[26]
Bioinformatics for Anti-Cancer Drug Discovery
A key component of treating cancer is drug therapy.[27-30] Drug development relies heavily on the screening of chemicals in preclinical studies.[31] These procedures include selecting the most effective compounds through methods such as in vitro drug screening and in vivo animal experiments.[29] The experiments used to find new cancer treatments, which commonly focus on small compounds, are tedious, expensive, and time-consuming.[32] As a result, effective strategies are needed to enhance drug discovery. Comprehensive analytical data on biological systems have been generated in recent years as a result of the advent of high-throughput technologies like microarray and next-generation sequencing.[33,34] These data provide knowledge that can help to improve cancer treatment.
Finding new uses for currently available medications is far more cost-effective than developing novel chemicals for cancer therapy from scratch. The development of extensive analytical data on biological systems makes possible the computational prediction of anticancer medications based on drug repositioning.[35] Drug discovery has become more dependent on drug repositioning, which involves finding new clinical indications for medications that are already approved or in development.[36]
How to manage patient heterogeneity presents another difficulty in the treatment of cancer. Recent studies have shown that individuals with the same cancer type respond differently to the same medication.[37,38] To direct medication therapy and enhance clinical results, it is crucial to understand the unique genetic, epigenetic, and transcriptome profiles of each patient.[39] Obtaining thorough analysis data of biological systems for patients prior to making a therapy decision is now possible thanks to sequencing technology. The interpretation of sequencing data, the prediction of drug sensitivity, and the exact selection of drugs are all accomplished using bioinformatics methods.[40]
Furthermore, in order to enhance clinical outcomes, multiple medications are routinely combined in the treatment of cancer. There are numerous benefits to combination therapies.[41,42] They provide a greater ability to destroy cancer cells,[43] a low risk of side effects owing to low individual dosages,[44] and the ability to prevent treatment resistance by simultaneously targeting multiple pathways.[45] It is challenging to find and confirm drug combinations using experimental tests because of the huge number of drug pairings and dose combinations.[46,47] Bioinformatics methods play a significant role in assessing all potential combinations and choosing an effective multicomponent therapy. Numerous studies, particularly those based on biological network theory and techniques, have systematically investigated pairwise drug combinations and predicted effective drug combinations for the treatment of cancer.[48,49]
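The scale of the experimental search space mentioned above can be made concrete with a short Python calculation. The sketch assumes that every drug is tested at the same number of dose levels and that each combination is tested at every dose pairing. The figures of 100 drugs and 5 dose levels are illustrative assumptions, and the function name `combination_experiments` is hypothetical.

```python
from math import comb


def combination_experiments(n_drugs: int, n_doses: int, k: int = 2) -> int:
    """Count experiments needed to test every k-drug combination at every dose setting.

    comb(n_drugs, k) counts the unordered drug combinations, and each of the
    k drugs in a combination can take any of n_doses dose levels.
    """
    return comb(n_drugs, k) * n_doses ** k


# Illustrative numbers only: 100 candidate drugs, 5 dose levels each.
pairs = combination_experiments(100, 5)        # all two-drug combinations
triples = combination_experiments(100, 5, k=3)  # all three-drug combinations
print(f"pairwise experiments: {pairs:,}")
print(f"three-drug experiments: {triples:,}")
```

Even under these modest assumptions the pairwise screen requires more than one hundred thousand experiments per cell line, and the three-drug screen more than twenty million, which is why computational prioritization is attractive.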
Bioinformatics for Cancer Immunotherapy
Cancer is a disease brought on by changes in a cell's DNA, which enable the cell to multiply uncontrollably and eventually spread to other areas of the body.[50] These alterations include chromosomal rearrangements, which can join the open reading frames of two genes and produce a new gene product. Such mutations endow mutant cells with novel features that facilitate malignant behavior.[51]
The adaptive immune system includes T cells, which react to antigens from cancer and infectious illnesses. They go through a procedure referred to as negative selection.[52]
The idea of “personalized medicine” promotes multi-layered strategies for addressing patients' specific needs.[53] The creation of personalized mutanome vaccines requires intelligent vaccine design, identification of individual cancer-specific tumor mutations, neoepitope prediction, and personalized immunotherapies. These vaccines are produced on demand.[54,55]
Neoepitopes are important for the immune system as well as experimental therapies such as customized mutanome vaccines.[56] They have shown a great deal of promise in the treatment of several types of cancer by energizing the immune system to fight cancer cells. Hundreds of clinical trials are presently being conducted,[57] and these treatments have been shown to be effective for treating a variety of cancer types.[58]
A wide range of patient samples are now available due to the advancement of next-generation sequencing (NGS) technologies. Until recently, these investigations focused on identifying variations that were much more common in a patient population. With this approach, low detection sensitivity and specificity can be offset by a large number of patient cases. According to prior research, the overall false discovery rate for short read-based parallel NGS approaches and current analytic algorithms is about 50% to 70%.[59]
Designing personalized, neoantigen-based vaccinations involves a number of stages, such as the detection of mutations and subsequent prediction of possible epitopes. In addition to mutation screening of NGS data, accurate identification of neoantigens as treatment targets and biomarkers also requires predictions of the immunogenicity of mutation-derived peptides.[49]
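The step from a detected mutation to candidate epitopes can be sketched in Python as follows. Given a protein sequence and a single amino acid substitution, the function lists every peptide window of a fixed length that contains the mutated residue. The function name `mutant_peptides`, the example protein sequence, and the substitution are hypothetical, and the window length of 9 is an assumption chosen because peptides presented by MHC class I molecules are typically around 8 to 11 residues long. The sketch does not predict immunogenicity, which requires dedicated binding and immunogenicity predictors.

```python
from __future__ import annotations


def mutant_peptides(protein: str, position: int, alt: str, k: int = 9) -> list[str]:
    """List all k-residue peptides that span a single amino acid substitution.

    `position` is 1-based; `alt` is the mutant residue that replaces the
    wild-type residue at that position.
    """
    idx = position - 1
    if not 0 <= idx < len(protein):
        raise ValueError("position lies outside the protein sequence")
    mutated = protein[:idx] + alt + protein[idx + 1:]
    # Window starts are limited so that each window contains idx and stays in bounds.
    first_start = max(0, idx - k + 1)
    last_start = min(idx, len(mutated) - k)
    return [mutated[s:s + k] for s in range(first_start, last_start + 1)]


# Hypothetical 22-residue protein with an R -> W substitution at position 10.
wild_type = "MKTAYIAKQRQISFVKSHFSRQ"
peptides = mutant_peptides(wild_type, position=10, alt="W")
for peptide in peptides:
    print(peptide)
```

Each peptide produced this way would then be passed to an immunogenicity or MHC-binding predictor, and only the highest-ranking candidates would be considered as vaccine targets.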
Targeted Therapy in Cancer
The Human Genome Project allowed for the sequencing of human DNA and resulted in technological advancements that can now generate large volumes of data. In conjunction with drug development, these technologies have sped up the adoption of personalized medicine. When it comes to disease prevention, diagnosis, and therapy, personalized medicine takes into account the genetic and environmental causes of illness.[60,61] Improvements in patient care may be achieved through the use of targeted therapy and other techniques made available by translational medical advances.[62]
In conclusion, almost all fields of anti-cancer drug development, including precision cancer therapy, drug design, and drug repositioning, appear to be heavily reliant on computational approaches such as bioinformatics network analysis and integrated analysis of multi-omics data. Anti-cancer medications can be developed more cheaply and effectively thanks to the generation of enormous volumes of multi-omics data and to bioinformatics techniques. As bioinformatics advances, these omics data may change. Cancer researchers should therefore confirm the accuracy of the information provided by this field.