I had a question from the next-generation sequencing paper. I'm confused about what mate-pair libraries are and what their advantage is for NGS. I understand that they are made by circularizing sheared DNA, then cutting the circles into linear fragments, which puts the two ends (that were originally far away from each other) in close proximity. However, I'm unclear on what information is gained by creating fragments in this way. I know it was mentioned that you can know how far apart the sequences originally were, but I'm not sure how that works. Hopefully briefly talking about it will help me see the piece of information that I'm missing!
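To make the mate-pair idea concrete, here is a toy sketch (my own illustrative numbers, not from the paper): because the two reads of a mate pair came from positions a known distance apart in the original molecule (the circularized fragment size, e.g. ~3 kb), mapping the two reads onto two different contigs lets you estimate the unsequenced gap between them.

```python
INSERT_SIZE = 3000  # circularized fragment size for this hypothetical library

def estimate_gap(contig_a_len, pos_a, pos_b, insert_size=INSERT_SIZE):
    """Estimate the unsequenced gap between two contigs bridged by a mate pair.

    pos_a: 5' position of read 1 on contig A (contig A precedes the gap)
    pos_b: 5' position of read 2 on contig B (contig B follows the gap)
    The two read starts were ~insert_size apart in the original molecule:
        (contig_a_len - pos_a) + gap + pos_b ~= insert_size
    """
    return insert_size - (contig_a_len - pos_a) - pos_b

# Read 1 maps 1,500 bp before the end of a 10 kb contig; read 2 maps
# 700 bp into the next contig -> the gap between the contigs is ~800 bp.
print(estimate_gap(10_000, 8_500, 700))  # -> 800
```

That distance constraint is the payoff: it lets assemblers order and orient contigs (scaffolding) and span repeats longer than any single read.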
In the "Genomic Surveillance..." paper by Gire et al., there is a mention of "root-to-tip distance" in the phylogenetic tree of EBOV samples.
What is the "root-to-tip distance" and what is its significance in this study? Why did the authors state that "rooting the phylogeny using divergence from other ebolavirus genomes is problematic" ?
Disclaimer: my phylogeny background/knowledge is virtually nonexistent.
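For what it's worth, root-to-tip distance itself is simple to compute: it is just the sum of branch lengths on the path from the root of the tree down to each sampled genome (a tip). A toy sketch with a made-up tree (branch lengths here are whole substitution counts, purely illustrative):

```python
# Toy tree: each node has a name, the branch length to its parent, and children.
tree = {
    "name": "root", "length": 0,
    "children": [
        {"name": "A", "length": 3, "children": []},
        {"name": "anc", "length": 1, "children": [
            {"name": "B", "length": 2, "children": []},
            {"name": "C", "length": 4, "children": []},
        ]},
    ],
}

def root_to_tip(node, dist=0):
    """Return {tip name: summed branch length from the root}."""
    dist += node["length"]
    if not node["children"]:               # a tip: record its distance
        return {node["name"]: dist}
    out = {}
    for child in node["children"]:
        out.update(root_to_tip(child, dist))
    return out

print(root_to_tip(tree))  # -> {'A': 3, 'B': 3, 'C': 5}
```

Plotting these distances against each sample's collection date and fitting a regression is a standard check for clock-like evolution, but it only works if the root is placed correctly, which connects to the authors' worry about rooting with distant ebolavirus outgroups.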
In the NGS paper (the review), the scientists were able to track the evolution of the virus to common ancestors. Would it be possible for them to "reverse engineer" the process and come up with predicted newly evolving strains based on highly conserved regions and highly variable regions of the viral genome? Given that they have a decent idea of how rapidly it evolves as well, would this be a functional way to design vaccines and treatments, kind of how the flu vaccine is done yearly?
In the Gire et al. paper they mention that one notable intrahost variation was found in the RNA editing site of the glycoprotein gene. What is the significance of this finding? Does variation at this site have any impact on the outcome of an infection?
The Gire et al. paper discusses the two viral lineages, separated into two clusters, that the first patients in Sierra Leone acquired, possibly at the funeral of another EVD patient that they all attended. The paper claims that the divergence of these lineages/clusters predated their emergence in Sierra Leone, but doesn't state how or why these two groups of people acquired two different strains. Was either strain the same as that of the patient whose funeral it was? Was there any genetic difference in the immune systems of the patients who acquired cluster 1 versus cluster 2?
In the Gire et al. paper, can the relative amounts of synonymous vs. nonsynonymous substitutions in the West African outbreak patients' EBOV genomes, compared to EBOV genomes from earlier outbreaks, tell us anything about how the virus will likely adapt in the future? They found that the substitution rate within the 2014 outbreak was about double that of other outbreaks, and that a bigger proportion of the mutations were nonsynonymous. In other words, does this mean that nonsynonymous substitutions are more advantageous for viral adaptation than synonymous mutations are? Nonsynonymous mutations alter the amino acid sequence encoded by the virus while synonymous mutations do not, so it would make sense that nonsynonymous mutations could introduce adaptive changes for the virus more often than synonymous mutations do.
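The synonymous/nonsynonymous distinction in the question can be made concrete with the standard genetic code. This sketch classifies a single codon change; it is a definition-level illustration, not the paper's actual substitution-counting methodology.

```python
# Build the standard genetic code table from its conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify(codon_ref, codon_alt):
    """'synonymous' if the encoded amino acid is unchanged, else 'nonsynonymous'."""
    same = CODON_TABLE[codon_ref] == CODON_TABLE[codon_alt]
    return "synonymous" if same else "nonsynonymous"

print(classify("CTT", "CTC"))  # Leu -> Leu: synonymous
print(classify("GAT", "GTT"))  # Asp -> Val: nonsynonymous
```

Note that "more nonsynonymous changes observed" does not by itself mean they are advantageous; formal dN/dS-style analyses also normalize by the number of possible synonymous and nonsynonymous sites.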
In the articles "Richness of human gut microbiome correlates with metabolic markers" and "Getting to the core of the gut microbiome," it was stated that in both obese mice and humans the gut microbiome had a "low gene count" (LGC). While the differences in number seem to have been studied, have the differences in content been examined? Meaning, can we truly attribute the difference in weight/BMI to the amount of microbes present, or is there a difference in the percentage of kinds of microbes in the microbiota as well?
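The question's distinction between the *amount* of microbes/genes and the *kinds* and proportions present is exactly the richness-vs.-composition distinction. A toy sketch with made-up abundance counts showing two summary statistics that separate the two ideas:

```python
import math

def richness(counts):
    """Richness: how many distinct taxa (or genes) are present at all."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Shannon diversity: sensitive to proportions, not just presence."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

# Hypothetical abundance profiles: same taxa, very different proportions.
even   = {"Bacteroides": 50,  "Firmicutes": 50, "Akkermansia": 50}
skewed = {"Bacteroides": 140, "Firmicutes": 5,  "Akkermansia": 5}

print(richness(even), richness(skewed))  # -> 3 3 (identical richness)
print(shannon(even) > shannon(skewed))   # -> True (composition differs)
```

Two samples can have identical richness but very different composition, which is why microbiome studies typically report diversity indices and taxonomic profiles alongside raw gene counts.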
In the main article, the authors say that the high rate of nonsynonymous substitutions found in the 2014 Ebola outbreak is consistent with incomplete purifying selection. I understand how *incomplete* purifying selection could allow a high nonsynonymous substitution rate (purifying selection eliminates deleterious mutations over time, so if it hasn't finished acting, those mutations are still visible in the data). However, it seems to me that the viruses would be undergoing positive selection as they rapidly spread. Why would it be the case that they are actually under the influence of purifying selection?
In the NGS review, they say a major advantage of NGS over Sanger sequencing is "the ability to produce an enormous volume of data cheaply". In what cases would one be better than the other? (For example, if you have less data to sequence, would Sanger be a better option, since it is the "gold standard" and is supposed to be more accurate?)
Although we're taught about both of these techniques frequently, I never quite understood how to choose which to use.
After reading the Gire et al. paper I had the following question: how much more common would it be for a virus to gain a new function through mutation than for a mutation to be deleterious? The virus mutates often, but since it encodes only a small number of functional proteins, one would think that mutations would disrupt an existing protein far more often than they would create something coding for a new function. The paper states: "the rate of nonsynonymous mutations suggests that continued progression of this epidemic could afford an opportunity for viral adaptation" (Figure 4H), which makes sense, but wouldn't there be the same number of, or more, deleterious mutations than adaptive ones?
Question after reading Gire et al.: since this paper was published, has any further functional analysis been done on the nonsynonymous mutations from the outbreak to find out whether they are adaptive or deleterious to the virus?
In "Camouflage and Misdirection: The Full-On Assault of Ebola Virus Disease," they mentioned that Ebola was able to attach to almost all types of cells, including macrophages and dendritic cells, because of proteins expressed on the surface of those cells. The only exception to this is lymphocytes. Do lymphocytes lack these plasma membrane proteins, and is that why the Ebola virus cannot infect them? Has removing or blocking these surface proteins been an area of study in order to help prevent infection?
The first question that popped into my head is about the general scope of the Gire et al. paper - what kind of predictive power does their data present, if any? They find very interesting statistics in terms of the number of synonymous versus nonsynonymous mutations, particularly that nonsynonymous mutations were more frequent during the 2014 outbreak. However, I still don't quite understand how this could be helpful in terms of preventing another outbreak. Are there enough patterns in the data that suggest what new mutations might arise next? Are any studies using this data for functional analysis to find out if these mutations were deleterious or adaptive? (You obviously might not be able to answer that second part, but I am just curious.) I find this data very compelling and probably useful, but I'm personally struggling to understand how it can be used to "refine public health strategies".
The paper "Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak" stated that "one notable intrahost variation is the RNA editing site of the glycoprotein (GP) gene." I have two questions regarding this statement - 1: what is the significance of this change? In other words, what is the selective advantage offered to the virus to have a high variation in this region? 2: at what point would variation between samples in this area become significant rather than a few random mutations? What is the statistical method in sequencing to determine if there's a significant "hotspot" for variation between genomes rather than being considered a few random mutations occurring?
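On question 2, one simple way a "hotspot" can be tested (an illustrative approach with made-up numbers, not necessarily the paper's method): under the null hypothesis that variants land uniformly along the genome, the count falling in a fixed window is binomial, so an exact tail probability says whether the observed clustering is surprising.

```python
import math

def binom_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), standard library only."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Made-up numbers: 6 of 50 intrahost variants fall in one 200 nt window
# of an ~18,960 nt genome. How surprising is that under uniform placement?
genome_len, window_len = 18_960, 200
n_variants, k_in_window = 50, 6

p = window_len / genome_len                # chance one variant hits the window
p_value = binom_tail(k_in_window, n_variants, p)
print(p_value < 0.001)  # -> True for these illustrative numbers
```

Scanning many windows across the genome requires a multiple-testing correction (e.g. Bonferroni over the number of windows) before calling any one of them a significant hotspot.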
Metzker (2010) discusses a variety of artifacts caused by sequencing technology. How many of these artifacts can be sorted out by downstream analysis? How is this typically done, and is there a standard for interpretation? How did Gire et al. (2014) approach this fairly significant source of error?
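Many platform artifacts are indeed handled downstream. As a rough illustration (not Gire et al.'s actual pipeline, and with arbitrary thresholds), variant callers typically combine base-quality thresholds, minimum depth, a minimum alternate-allele fraction, and a strand-bias check, since true variants should be supported by reads from both strands:

```python
def passes_filters(call, min_qual=30, min_depth=20, min_alt_frac=0.05,
                   max_strand_bias=0.9):
    """Return True if a candidate variant call survives basic artifact filters.

    call: dict with 'qual' (mean base quality at the site), 'depth',
          'alt_reads', and 'alt_fwd' (alt reads on the forward strand).
    """
    if call["qual"] < min_qual or call["depth"] < min_depth:
        return False
    if call["alt_reads"] / call["depth"] < min_alt_frac:
        return False
    fwd_frac = call["alt_fwd"] / call["alt_reads"]
    # Reject calls supported almost entirely by a single strand.
    return (1 - max_strand_bias) <= fwd_frac <= max_strand_bias

good   = {"qual": 35, "depth": 120, "alt_reads": 30, "alt_fwd": 16}
biased = {"qual": 35, "depth": 120, "alt_reads": 30, "alt_fwd": 29}
print(passes_filters(good), passes_filters(biased))  # -> True False
```

Thresholds like these are tuning choices rather than a universal standard, which is part of why interpretation across studies is tricky.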
Misasi and Sullivan (2014) discuss the life cycle of the Ebola virus and the protein interactions of the Ebola protein products. In your research, did you find any studies investigating potential protein-lipid interactions and their potential effects on the host cell?
What did you think about the way that they characterized the participants as lean or obese? This study made me wonder about those people that are "skinny fat", as in those who could be measured as skinny in the study, but they have a large percentage of body fat compared to muscle. Do you think they would have demonstrated high or low bacterial diversity?
In the “Sequencing technologies – the next generation” paper, an inconsistency in some of the NGS technologies shows up as an underrepresentation of AT-rich and GC-rich regions. For Illumina sequencing, it is brought up that this is most likely due to amplification bias during template preparation. Couldn't this underrepresentation be accounted for by altering the templates to target the regions thought to contain higher AT and GC base-pair content?
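Whether or not templates can be tweaked, one common downstream mitigation is simply to flag windows of extreme base composition so low coverage there is interpreted as bias rather than real absence. A minimal sketch (window size and thresholds are arbitrary choices for illustration):

```python
def gc_windows(seq, window=100, lo=0.3, hi=0.7):
    """Flag (start, gc_fraction) for windows outside the [lo, hi] GC range."""
    flagged = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        if gc < lo or gc > hi:
            flagged.append((start, round(gc, 2)))
    return flagged

# A toy sequence: an AT-only stretch followed by a GC-only stretch.
print(gc_windows("AT" * 50 + "GC" * 50, window=100))  # -> [(0, 0.0), (100, 1.0)]
```

Coverage in flagged windows can then be normalized or down-weighted, which is how GC-bias correction is usually approached in practice.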