Genomic Sequencing

Gene and Protein Sequence Lab

gen564lab1-1-30-24new.doc
File Size:	1992 kb
File Type:	doc

Download File

Overview of today's lab:
1. Find your human PROTEIN, Sequence ID/Accession code and FASTA formatted sequence.
Go to Protein>click on Orthologs, Find the table, click the carrot (on right side to dropdown isoforms), take the longest isoform, Click FASTA, hyperlink this page to your human sequences on your website.

2. Find your model organism HOMOLOGS by performing RECIPROCAL BLASTs. (Look for at LEAST 5 of them, including Arabidopsis)
Go back to the orthologs page and find isoforms for homologs. Hyperlink all to your homology page. Double check with reciprocal blasts. Also to be aware, fly, worm, zebrafish gene names can be very different so you will have to find them on ENSEMBLE or in NCBI using these names. Pubmed can help you find them in the literature.

3. Compare what you find with NCBI ORTHOLOGS & ENSEMBLE databases. Make sure you have the right protein accession codes as associated FASTA formatted seaquences.

4. Hyperlink your FASTA sequences of all MODEL organisms homologs (at least 5) on a HOMOLOGS subtab on your website under your Protein Tab. Please include as many model organisms as you can find: fly, worm, fish, bacteria (E. coli), yeasts (S. cerevisiae, S. pombe), plants, Xenopus, planaria, mouse, are the main ones.  If you find plant (Arabidopsis), Lung fishes, and ancient fish (Coelacanth) this is a plus. What does Coelacanth mean about your gene if you find a homolog?

See example: Colin Nguyen's Page: https://nguyengen564s21.weebly.com/homology-and-model-organisms.html

5. Compile all of your FASTA sequences from your protein and homologs into a .txt file (via Word). Save this for future labs. Annotate your species after the > and also do not leave a space between species. Example is below.

*HINT: First Go to Word and open a new blank file and SAVE AS: .txt file. Then paste your FASTA formatted proteins in
  this file. It has to be done in this order.

6. BIORENDER--work on your 2 intro slides on your disease. Place into a working file for your final talk. Be prepared to share next week.
See Colin's conclusion page on how he modeled normal vs. disease states: https://nguyengen564s21.weebly.com/conclusion.html

Model organism choice:
7. Think about which model organism is best to model your disease (Dietrich, et al 2020: https://www.sciencedirect.com/science/article/pii/S1369848619301165www.sciencedirect.com/science/article/pii/S1369848619301165
8. Model organism adv/disadv: https://www.bosterbio.com/blog/post/how-to-choose-a-model-organism

Sample .txt file from Abigail Jaiquish (2019)

homolog1_recent.txt
File Size:	17 kb
File Type:	txt

Download File

PubMed: http://www.ncbi.nlm.nih.gov/pubmed/
BLAST: http://blast.ncbi.nlm.nih.gov
NCBI/ORTHOLOGS: https://www.ncbi.nlm.nih.gov/datasets/gene/
ENSEMBLE: https://useast.ensembl.org/index.html

Things you should know

What is an E-value (Expect value)?
The lower the E-value, or the closer it is to zero, the more "significant" the match is
For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.

Similarity: The extent to which nucleotide or protein sequences are related.
Identity: The extent to which a nucleotide or protein sequence is identical between two BPs or AAs in different genes or homologs.