The evidence implicating a gene should be statistically significant and supported by ancillary data from informatics sources and functional studies. All the available evidence should be assessed collectively.
How can Sequence Support help you with data curation?
Let’s start with an example, the NF2 gene, to understand how Sequence Support helps you identify the optimal resources available.
This example is divided into three steps.
You’ve been asked to develop a genetic report that tells patients whether they have a high risk of developing an inherited cancer due to the presence of certain variants in their genome. You have to mine data related to mutations in the NF2 gene, which have been found to be associated with a hereditary cancer syndrome called Neurofibromatosis Type 2. As a first step, you have decided to collect evidence from clinical studies that have identified specific NF2 variants in patients with Neurofibromatosis Type 2.
Potential factors that can affect a data mining exercise to collect evidence in support of a gene-disease association:
- Named entity recognition (NER): The NF2 gene is also referred to as ACN, SCH, and BANF. NER has two subtasks, namely recognition and normalization (also known as identification or grounding). The former recognizes the words of interest, and the latter maps them to the correct identifiers in databases. Unlike in earlier approaches, machine learning trained on manually annotated text is now standard practice.
- A controlled vocabulary of disease: Just as the gene name NF2 has synonyms, Neurofibromatosis Type 2 is also referred to as Central type Acoustic Schwannomas, Bilateral Acoustic Neurofibromatosis, and Central neurofibromatosis, among other names. To counter the discrepancies caused by clinical groups in different countries using different versions of the International Classification of Diseases, a viable solution is to consult the Disease Ontology, which is part of the Open Biomedical Ontologies (OBO). It is cross-mapped to the Unified Medical Language System (UMLS) and has an extensive annotation of synonyms. The Disease Ontology works well for recognizing disease names mentioned in Gene Reference Into Function (GeneRIF) entries.
- Information extraction for gene-disease association: The NF2 gene has two isoforms, and the data need to be analyzed for mutations at both the gene level and the variant level, with the findings supplemented by the full range of evidence. Gene-disease associations are established through a variety of study types, including classical pedigree-based genetics studies of Mendelian and complex diseases, genome-wide association studies (GWAS), somatic mutation frequencies, transcriptomics and proteomics studies, and detailed molecular biology studies of individual proteins.
Manually curated resources like DISEASES take the above-mentioned factors into consideration and provide annotated information pertinent to the gene-disease association. Since new data are continuously produced, curation and careful analysis of the data, coupled with proper annotation, are required before drawing any conclusion that supports or negates a gene-disease association.
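The recognition and normalization subtasks described above can be illustrated with a minimal dictionary-based sketch. It uses only the gene and disease synonyms listed in this article; the function names (`build_lookup`, `recognize_and_normalize`) are hypothetical, and a production pipeline would typically draw its synonyms from a resource such as the Disease Ontology and rely on a trained model rather than plain string matching.

```python
import re

# Synonyms copied from the text above. The keys are the canonical names that
# every alias is normalized to; a real pipeline would map to database identifiers.
GENE_SYNONYMS = {"NF2": ["NF2", "ACN", "SCH", "BANF"]}
DISEASE_SYNONYMS = {
    "Neurofibromatosis Type 2": [
        "Neurofibromatosis Type 2",
        "Central type Acoustic Schwannomas",
        "Bilateral Acoustic Neurofibromatosis",
        "Central neurofibromatosis",
    ]
}


def build_lookup(synonyms):
    """Map each lower-cased alias to its canonical name."""
    return {
        alias.lower(): canonical
        for canonical, aliases in synonyms.items()
        for alias in aliases
    }


def recognize_and_normalize(text, synonyms):
    """Return (mention, canonical_name) pairs found in the text.

    Recognition: find whole-word, case-insensitive matches of any alias.
    Normalization: map each matched mention to its canonical name.
    """
    lookup = build_lookup(synonyms)
    # Try longer aliases first so a long name is preferred over a shorter one.
    aliases = sorted(lookup, key=len, reverse=True)
    pattern = "|".join(re.escape(alias) for alias in aliases)
    regex = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    return [(m.group(0), lookup[m.group(0).lower()]) for m in regex.finditer(text)]


sentence = (
    "Mutations in the SCH gene were found in patients with "
    "bilateral acoustic neurofibromatosis."
)
print(recognize_and_normalize(sentence, GENE_SYNONYMS))
print(recognize_and_normalize(sentence, DISEASE_SYNONYMS))
```

Case-insensitive matching of short gene symbols is a deliberate simplification here; it can produce false positives on ordinary abbreviations, which is one reason annotated training data is preferred in practice.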
| Authors, Year | PMID | Evidence from paper | Classification | Reasoning |
|---|---|---|---|---|
| Rouleau et al., 1993 | 8379998 | Prevalence of point and germline mutations in the SCH gene in NF2 patients and NF2-related tumors. | Gene Level: Gene disruption | The putative novel mutations of SCH identified in this study were present only in patients and NF2-related tumors. |
| Parry et al., 1996 | 8751853 | SSCP analysis to screen DNA from 32 unrelated patients. | Variant Level: Genetic association | Even with the caveat of a small sample size, the study could link mutations to age groups, time of disease onset, and diagnosis. |
| Giovannini et al., 2000 | 10887156 | Cre-mediated excision of Nf2 exon 2 in Schwann cells (mice) showed characteristics of neurofibromatosis type 2. | Variant Level: Phenotype recapitulation | Hemizygosity for the NF2 gene causes a syndromic susceptibility to schwannoma development in humans but not in mice, because of an insufficient rate of allele inactivation. This report describes how a homozygous Nf2 mutation in mice shows characteristics of neurofibromatosis type 2. |
| Scoles et al., 2000 | 10861283 | Schwannomin protein interacts with hepatocyte growth factor-regulated tyrosine kinase substrate (HRS). | Gene Level: Protein interaction | The authors verify the interaction using immunoprecipitation and in vivo and in vitro assays. The report also highlights that missense mutations in the schwannomin protein inhibit its binding to HRS. |
| Fernandez-Valle et al., 2002 | 12118253 | The molecular adaptor paxillin binds directly to schwannomin. | Gene Level: Protein interaction | The authors verify the interaction using immunoprecipitation and co-localization, identify the protein binding domains, and describe how this interaction is important for membrane localization and polarization. |
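To assess the collected evidence collectively, it can help to hold the rows of the table as structured records and group them by evidence level. The sketch below is only an illustration under that assumption: the field names and the `group_by_level` helper are hypothetical, the PMIDs and classifications are copied from the table, and the grouping is not the scoring method of DISEASES or any other resource.

```python
from collections import defaultdict

# One record per table row; "classification" is "<level>: <category>".
EVIDENCE = [
    {"authors": "Rouleau et al., 1993", "pmid": "8379998",
     "classification": "Gene Level: Gene disruption"},
    {"authors": "Parry et al., 1996", "pmid": "8751853",
     "classification": "Variant Level: Genetic association"},
    {"authors": "Giovannini et al., 2000", "pmid": "10887156",
     "classification": "Variant Level: Phenotype recapitulation"},
    {"authors": "Scoles et al., 2000", "pmid": "10861283",
     "classification": "Gene Level: Protein interaction"},
    {"authors": "Fernandez-Valle et al., 2002", "pmid": "12118253",
     "classification": "Gene Level: Protein interaction"},
]


def group_by_level(records):
    """Group (pmid, category) pairs under a lower-cased evidence level."""
    grouped = defaultdict(list)
    for record in records:
        # Split "Gene Level: Protein interaction" into level and category.
        level, _, category = record["classification"].partition(":")
        grouped[level.strip().lower()].append((record["pmid"], category.strip()))
    return dict(grouped)


for level, items in group_by_level(EVIDENCE).items():
    print(f"{level}: {len(items)} studies")
    for pmid, category in items:
        print(f"  PMID {pmid}: {category}")
```

Grouping this way makes it easy to see at a glance whether the gene-level and variant-level claims each rest on more than one independent study.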
This step includes cross-checking references and summarizing information from databases against validated information from GWAS, protein-protein interaction studies, and transcriptome regulation studies.