Post by **yy2009** » Wed Nov 21, 2012 8:52 am

Please help me find the error in the following LaTeX code.

\begin{document}
\maketitle
\section{Introduction}
Single nucleotide variants (SNVs) are the most common form of intra-species variation. Genome projects aim to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype (Durbin et al., 2010) and the migration patterns of ancient humans. SNVs are now used as markers in genome-wide association studies, which have identified over 3300 common SNVs that correlate with disease phenotypes.
Discovering the functional impact of SNVs that contribute to disease susceptibility and drug sensitivity is one of the main goals of modern genetics and genome studies.
From the biological side, SNVs can have many functional impacts. An SNV can affect the transcriptional machinery of the cell if it falls in a region of DNA that contains signals recognized by transcription factors, and it can affect splicing if it occurs at a splice site or at a site where exonic or intronic splicing enhancers or repressors bind.
SNVs that alter protein sequences can also interfere with the conformation of the tertiary structure.
Bioinformatics assesses these effects indirectly, through proxies. A biologist interested in the effect of an SNV on gene transcription might perform site-directed mutagenesis on genomic DNA, transfer the mutated DNA into cell culture, and use readouts of transcriptional activity to measure changes relative to the wild type. In contrast, bioinformatics-based approaches typically rely on computational analysis of the DNA sequence around the SNV. Most of this work has concentrated on SNVs in protein-coding regions, called cSNVs or nsSNVs, and their relation to human health.
Most SNVs found to be distributed differently between cases and controls in GWAS are not in protein-coding exons.
New methods are needed to predict which of these putative regulatory SNVs (known as rSNVs) may be consequential.


\section{Classical Approaches}
In the genetics and cancer fields there is a move to use SNV instead of SNP as a broader term, which encompasses both common and rare variants.
 
 
\section{Impact of SNVs}
 


\textit{Properties of amino acid residue substitution:}
The properties of an amino acid residue substitution can help predict the effect of an SNV, because they reflect the evolutionary distance between pairs of amino acids.
PAM matrices: this approach approximates the evolutionary distance between pairs of amino acids.
BLOSUM matrices: this approach includes distantly related species but considers only highly conserved positions of proteins.
Both approaches use raw mutation rates to calculate a score for each amino acid substitution, which reflects how likely the substitution is to persist over evolutionary time rather than arise by chance.
If the substitution is conservative, it is less likely to be disruptive; if it is not conservative, it is more likely to be deleterious.
Other approaches consider further biophysical properties of amino acids, such as change in volume, hydrophobicity, net charge, packing density and solvent accessibility; these properties correlate with the functional impact of cSNVs.
The Grantham distance rests on the hypothesis that the biophysical importance of an amino acid substitution can be quantified in 3D space and represented by a Euclidean distance.
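% Illustration added as a sketch: the Grantham distance in its usual textbook form.
% The symbols c, p, v and the weights \alpha, \beta, \gamma are notation assumed
% here; they are not used elsewhere in this document.
As a sketch of this idea, the Grantham distance between amino acids $i$ and $j$ is usually written as
\[
  D_{ij} = \Bigl[ \alpha (c_i - c_j)^2 + \beta (p_i - p_j)^2 + \gamma (v_i - v_j)^2 \Bigr]^{1/2},
\]
where $c$, $p$ and $v$ denote composition, polarity and molecular volume, and $\alpha$, $\beta$ and $\gamma$ are fixed weights that put the three properties on a comparable scale.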
But these metrics are not sufficient to predict the impact of cSNVs reliably.\\
\textbf{The evolutionary history of an amino acid position:}\\
BLOSUM and PAM scores: these estimate the evolutionary distance separating a pair of amino acids.
The composition of amino acids at equivalent positions in a protein family is considered because functionally or structurally important positions will not tolerate a variety of amino acids.
Shannon entropy: quantifies how surprised we are by the distribution of amino acids in an alignment column. Highly conserved columns present few surprises, while poorly conserved columns present more.
Relative entropy: compares the distribution of amino acids in a column with the amino acid background distribution.
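% Illustration added as a sketch; the symbols p_a and q_a are assumed notation
% and the formulas are the standard definitions, not taken from a cited source.
As a sketch, if $p_a$ is the frequency of amino acid $a$ in an alignment column and $q_a$ is its background frequency, the two quantities are commonly written as
\[
  H = -\sum_{a} p_a \log_2 p_a
  \qquad\mbox{and}\qquad
  D = \sum_{a} p_a \log_2 \frac{p_a}{q_a},
\]
with the sums running over the 20 amino acids and terms with $p_a = 0$ taken as zero. A fully conserved column has $H = 0$, and a column in which all 20 amino acids are equally frequent has $H = \log_2 20 \approx 4.32$ bits.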
Many scores incorporate both properties of the amino acid substitution and evolutionary conservation.
The SIFT score: a weighted average of the frequency with which a variant amino acid residue appears in the multiple alignment column and an estimate of unobserved frequencies via Dirichlet mixture pseudocounts (Sj\"olander et al., 1996).
The PSIC score: considers the difference between the likelihoods of the reference and variant amino acids at an alignment column, after encoding the alignment in a Position-Specific Scoring Matrix (PSSM).
A-GVGD score: a position-specific variant of the Grantham distance, where the Grantham Variation (GV) is computed by replacing each pair of components representing composition, polarity and volume with the maximum and minimum values in the alignment column.
The Grantham Deviation: measures the extent to which a variant amino acid deviates from the range of variation seen in the column, an estimate of its violation of the evolutionary constraints on the protein position.
The MAPP score: uses a statistical summary of an alignment column, obtained by constructing a phylogenetic tree and weighting each sequence by tree topology and branch lengths.
The mean and variance of amino acid physicochemical properties in the summarized column are used to estimate position-specific constraints on amino acid substitution in a biologically meaningful way.
Principal components analysis (PCA): transforms the properties into de-correlated components, which are used to generate an integrated score that measures constraint violations with respect to all of the amino acid properties.


\textbf{Sequence-function relationships:}
Bioinformatics web services such as the UniProtKB database hold information about sequence function that is useful for assessing whether a cSNV is functional. For each curated protein, UniProtKB maintains a feature table that annotates regions and specific sites of interest. These features can be used to identify whether a cSNV falls in a region that is sensitive to amino acid substitution, such as biologically important sequence motifs,
active site residues, metal-binding sites, sites of post-translational modification, and lipid-binding sites.
In some cases the results of mutational functional testing at a position are included as a feature.



\textbf{Structure-function relationships}
If a cSNV can be mapped onto an experimentally determined protein structure or a high-quality homology model, a number of properties useful for predicting its functional effect can be computed.\\
Solvent accessibility: the extent to which water can reach a residue on the surface of the protein. This property is considered one of the strongest predictors of the functional impact of a cSNV, because a substitution in the hydrophobic core of a soluble protein can disrupt thermodynamic stability.\\
Structural modeling of the mutant: used to assess whether a cSNV induces backbone strain, leads to over-packing, creates cavities, or affects key pairwise residue interactions.
X-ray crystal structures can reveal the interactions of a protein with other proteins or with small-molecule, nucleotide and peptide ligands.
Thus, determining the position of a cSNV in the protein structure makes it possible to assess, in some cases, whether the change occurs near a catalytic site or binding site, or at a domain-domain interface in a protein complex.
In other cases, electrostatic analysis of the protein surface or model can reveal highly charged patches that are disrupted by a cSNV, which may affect binding interactions.

\section{Impact of rSNVs}
SNVs can have regulatory impact in many ways, from disrupting chromatin structure to causing the loss of a post-translational modification site.\\
\textbf{Transcription}
SNVs can interfere with regulation by disrupting transcription factor binding sites (TFBSs). In the human genome, most of these sites are predicted with PSSMs from database resources such as JASPAR and TRANSFAC. Each PSSM describes a statistical profile of the sequences bound by a given transcription factor, based on experimental studies from sources such as ChIP-seq.
PSSMs allow researchers to predict TFBSs from limited sets of observations by assuming that each position in the site is independent.
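% Illustration added as a sketch; f_i(b), q(b) and the log-odds form are assumed
% notation for a generic PSSM, not taken from JASPAR or TRANSFAC documentation.
As a sketch of what this independence assumption buys, let $f_i(b)$ be the frequency of nucleotide $b$ at position $i$ of the profile and $q(b)$ its background frequency. A candidate site $s_1 \dots s_L$ is then scored by summing one log-odds term per position,
\[
  S(s) = \sum_{i=1}^{L} \log_2 \frac{f_i(s_i)}{q(s_i)},
\]
so an SNV that changes position $k$ from nucleotide $a$ to nucleotide $b$ changes the score by
\[
  \Delta S = \log_2 \frac{f_k(b)}{q(b)} - \log_2 \frac{f_k(a)}{q(a)},
\]
independently of the other positions.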
Some positions in a TFBS can be highly conserved, reflecting transcription factors that depend on a specific nucleotide for proper binding, while others are more divergent. This is captured by the relative entropy of each column in the PSSM.
An SNV that alters a conserved position is more likely to be deleterious than one that alters a divergent position. A number of transcription factors, such as CTCF, act as chromatin modifiers. In such cases SNVs can be consequential because the factor regulates not simply individual genes but large regions of chromatin.
The loss or gain of a predicted TFBS might not be consequential, for several reasons; in particular, PSSMs tend to be low-specificity predictors because transcription factor binding motifs are short and degenerate.


\textbf{Pre-mRNA splicing}
SNVs that affect splicing tend to be consequential, and diseases have been associated with single point mutations of this kind. SNVs can alter splicing by disrupting an existing splice site, by creating a new one inside an exon, or by adding or removing splicing regulatory motifs such as exonic splicing enhancers and silencers. SNVs in exons can also introduce premature termination codons, which lead to little or no protein. Analysis of the splice sites close to an SNV is therefore a key component of SNV function prediction.
PSSMs are considered an effective means of capturing the importance of each splice site position. SNVs that introduce premature termination codons have a high likelihood of being consequential.


\textbf{MicroRNA binding}
SNVs can also affect regulation by altering microRNA (miRNA) binding sites. miRNAs perform a diversity of functions, but most are implicated in mRNA silencing through translational repression or cleavage.
SNVs in the seed region of the binding site have greater impact than SNVs elsewhere in the site, but the overall impact depends on the entire binding site.



\textbf{Altering post-translational modification sites}
In eukaryotes, protein activity is regulated through post-translational modification (PTM). PTM sites are identified by:
\begin{enumerate}
\item experimental measurement
\item prediction
\end{enumerate}
Recent advances in mass spectrometry proteomics have dramatically improved the detection of PTM sites.
When evaluating PTM site annotations, the onus is still on the researcher to understand how a prediction was derived and what the limitations of the method are.




\section{Bioinformatics predictors under the hood}
\textbf{Single vs. multiple feature strategies:}
Some functional predictors of cSNVs and rSNVs depend on a single feature, but the popular approach is data integration, which combines multiple features into a single predictive score.
If two features are highly correlated, there is probably not much to be gained by using both of them, whereas uncorrelated or independent features are expected to increase predictor accuracy. It makes sense to design a predictor around a set of highly informative and independent features, but researchers generally take an empirical approach to ascertain which features are useful. The best features are those that distinguish SNVs with functional impact from neutral SNVs, which have none. Intelligent feature selection requires a collection of SNVs from both categories. If the collection is large enough, it will be sufficiently representative of this variety; if it is too small or contaminated by mislabeled SNVs, it will not be possible to ascertain whether a selected feature is useful.

\textbf{Benchmark sets:}
Researchers use two approaches to assemble collections of functional and neutral SNVs: functional assay results and data mining.
Functional SNVs associated with disease, mostly Mendelian or monogenic disease, can be collected from curated databases such as SwissProt Variant, OMIM and HGMD, or obtained from clinical or functional studies.
Early predictors took a different approach and used the results of saturation mutagenesis experiments in bacterial and viral proteins as benchmarks. These benchmark sets assume that there is a relationship between the SNV and the phenotypic effect.
In such experimental benchmarks it is assumed that the measured impact on a single molecular function is directly coupled with disease.
Given these assumptions, researchers should remain aware of the underlying uncertainties.




\textbf{Supervised learning:}
Most data-driven approaches to SNV function prediction use collections of both functional and neutral SNVs to assess which predictive features are most relevant and to train a classification algorithm; the benchmark set thus also serves as a training set.
Supervised statistical learning algorithms are the most commonly used; each SNV is represented by multiple features and a class label.
These algorithms detect patterns associated with each class and learn a decision rule, which is subsequently applied to SNVs whose class membership is unknown.
A supervised learner is considered successful if the decision rule obtained in the training phase generalizes, that is, if it can predict the class of SNVs that were not in the training set.


\section{BUYER BEWARE}
A wide range of bioinformatics methods for assessing SNVs is available via web interfaces. They are easy to use, so it makes sense to decide which of them is a good choice for a particular purpose. The problem is that they are often applied as black boxes, without solid understanding, which is a dangerous strategy from both a practical and a scientific perspective. It is recommended to study the underlying methods carefully, preferably through the scientific publications that describe them, because these explain the assumptions each method makes.
Any method that uses protein sequence alignments to evaluate cSNVs will be biased by the sequences included in the alignment.
Identifying biologically important conservation signals requires samples of both closely and distantly related sequences.
If the alignment contains sequences that are too distantly related to the SNV-containing sequence, it may contain residues that would be deleterious if substituted into the equivalent position in human.
If it contains only closely related sequences, almost all positions will appear to be conserved, whether or not they are functionally important.
The SIFT algorithm is designed to avoid this pitfall by selecting a diverse set of homologs.
If a prediction method involves feature selection or supervised machine learning, information leaks introduced during testing may yield an overly optimistic evaluation of performance.

\begin{figure}[h!]
\centering
\includegraphics[scale=0.8]{f1_large.JPG}
\caption{Flow chart for informed use of SNV function prediction tools}
\label{threadsVsSync}
\end{figure}


\section{CONCLUSION}
Prioritizing SNVs has become important in the age of personal genomics (a branch of genomics).
This task is now easy with SNV meta-servers, which are essentially black boxes that automate the execution of multiple assessments and so lead to the functional SNVs. Anyone who uses black-box tools should understand the underlying methods and their limitations, some of which are shared by all of these methods. For example, current ESS predictors are based on only one family of splicing factors,
but there is no reason to believe that these proteins represent the complete universe of ESSs.
New technologies such as ChIP-seq represent a significant advance in evaluating the impact of SNVs on DNA binding. They will facilitate research on the impact of SNVs on DNA methylation, chromatin and transcription factor binding, and RNA processing (through CLIP-seq).



\bibliographystyle{abbrv}
\bibliography{references}

\end{document}

The "references.bib" file.

@book{
  title={ A map of human genome variation from population-scale sequencing },
  author={ Durbin, R. M., Abecasis, G. R., Altshuler, D. L., Auton, A., Brooks, L. D., Durbin,R. M., Gibbs, R. A., Hurles, M. E., and McVean, G. A.},
  isbn={467(7319),1061–1073},
  url={http://www.1000genomes.org/sites/1000genomes.org/files/docs/nature09534.pdfJ},
  year={2010},
  publisher={yyy},
 
 title={ Using bioinformatics to predict the functional impact of SNVs },
  author={ Melissa Cline , Rachel Karchin},
  isbn={467(7319),1061–1073},
  url={http://bioinformatics.oxfordjournals.org/content/27/4/441.full.pdf+html},
  year={2010},
  publisher={Dr. Jonathan Wren},
}

Post by **cgnieder** » Wed Nov 21, 2012 12:55 pm

Hi and welcome to the LaTeX community!

The code you posted is missing the whole preamble (!) and \end{document}. Other than that there is nothing that should produce any errors. You're going to have to provide us with a minimal working example (if you don't know what that is please follow the link!).

Regards

Post by **yy2009** » Thu Nov 22, 2012 6:52 pm

What is the missing part? Could you state clearly which sentence is missing?

Post by **cgnieder** » Thu Nov 22, 2012 7:09 pm

There is not a sentence missing in your example but the whole document preamble! Without it there is no chance that LaTeX can compile it. If you don't know what a document preamble is you should start reading some beginner's introduction to LaTeX. If you do know then please provide a minimal working example!
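
In case it helps, here is a rough sketch of what "preamble" means: everything between `\documentclass` and `\begin{document}`. The title and author below are placeholders I made up, not taken from your document. They are there because `\maketitle`, which your code calls, stops with an error when no `\title` has been given.

```latex
% Preamble: class, packages and document-wide settings
\documentclass{article}
\usepackage[utf8]{inputenc}  % lets LaTeX read non-ASCII input such as typographic apostrophes
\usepackage{graphicx}        % provides \includegraphics, used in the posted figure

\title{Placeholder Title}    % placeholder; \maketitle needs a \title
\author{Placeholder Author}  % placeholder

% Document body
\begin{document}
\maketitle
\section{Introduction}
Text goes here.
\end{document}
```

Which packages are actually needed depends on what the body uses, so treat the list above as a starting point only.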

Regards

Post by **yy2009** » Thu Nov 22, 2012 8:22 pm

I added the following document class and packages:

\documentclass{article}
\usepackage[utf8]{inputenc}

\usepackage{graphicx}

But I still have a problem with the references: they don't appear in the output.

Post by **cgnieder** » Thu Nov 22, 2012 8:35 pm

I'll ask you a third time: please provide a minimal working example! If you're not sure what that is please follow the link. (Please focus on the minimal part -- your first example was unnecessarily long...)

Maybe this can be a starting point:

\documentclass{article}

\usepackage{filecontents}
\begin{filecontents}{example.bib}
@article{example,
  author  = {Example Author},
  title   = {Example Title},
  journal = {Example Journal},
  year    = {2012}
}
\end{filecontents}


\begin{document}

As is shown in \cite{example} ...

\bibliographystyle{plain}
\bibliography{example}

\end{document}
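
Applied to the `references.bib` posted earlier, the same pattern might look like the sketch below. It rests on a few assumptions. The file name `references-fixed.bib` and the keys `durbin` and `cline` are made up for illustration, and the separate file name is there so that your existing `references.bib` is not overwritten. The volume, issue and pages of the first entry are read off what was in its `isbn` field, and the year of the second entry is kept as posted. Three things differ from the posted file: each reference is its own entry with a citation key right after the opening brace, the authors are separated by `and` instead of commas, and the document actually cites the entries. BibTeX only lists entries that are cited with `\cite` (or `\nocite`), and the posted document contains no `\cite` at all.

```latex
\documentclass{article}

\usepackage{filecontents}
% Hypothetical file name and keys; see the notes above.
\begin{filecontents}{references-fixed.bib}
@article{durbin,
  author  = {Durbin, R. M. and Abecasis, G. R. and Altshuler, D. L. and
             Auton, A. and Brooks, L. D. and Gibbs, R. A. and
             Hurles, M. E. and McVean, G. A.},
  title   = {A map of human genome variation from population-scale sequencing},
  journal = {Nature},
  volume  = {467},
  number  = {7319},
  pages   = {1061--1073},
  year    = {2010}
}
@article{cline,
  author  = {Cline, Melissa and Karchin, Rachel},
  title   = {Using bioinformatics to predict the functional impact of {SNVs}},
  journal = {Bioinformatics},
  year    = {2010}
}
\end{filecontents}

\begin{document}

Genome projects aim to characterize human sequence variation \cite{durbin},
and bioinformatics methods predict the impact of SNVs \cite{cline}.

\bibliographystyle{abbrv}
\bibliography{references-fixed}

\end{document}
```

To get the reference list into the output, run LaTeX once, then BibTeX, then LaTeX twice more. BibTeX will probably warn about the fields missing from the second entry, which does not stop the list from being typeset.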

Regards

Problems with Code Debugging