Gene and Protein Sequence Lab
gen564lab1-1-30-24new.doc |
Overview of today's lab:
1. Find your human PROTEIN, Sequence ID/Accession code and FASTA formatted sequence.
Go to Protein>click on Orthologs, Find the table, click the carrot (on right side to dropdown isoforms), take the longest isoform, Click FASTA, hyperlink this page to your human sequences on your website.
2. Find your model organism HOMOLOGS by performing RECIPROCAL BLASTs. (Look for at LEAST 5 of them, including Arabidopsis)
Go back to the orthologs page and find isoforms for homologs. Hyperlink all to your homology page. Double check with reciprocal blasts. Also to be aware, fly, worm, zebrafish gene names can be very different so you will have to find them on ENSEMBLE or in NCBI using these names. Pubmed can help you find them in the literature.
3. Compare what you find with NCBI ORTHOLOGS & ENSEMBLE databases. Make sure you have the right protein accession codes as associated FASTA formatted seaquences.
4. Hyperlink your FASTA sequences of all MODEL organisms homologs (at least 5) on a HOMOLOGS subtab on your website under your Protein Tab. Please include as many model organisms as you can find: fly, worm, fish, bacteria (E. coli), yeasts (S. cerevisiae, S. pombe), plants, Xenopus, planaria, mouse, are the main ones. If you find plant (Arabidopsis), Lung fishes, and ancient fish (Coelacanth) this is a plus. What does Coelacanth mean about your gene if you find a homolog?
See example: Colin Nguyen's Page: https://nguyengen564s21.weebly.com/homology-and-model-organisms.html
5. Compile all of your FASTA sequences from your protein and homologs into a .txt file (via Word). Save this for future labs. Annotate your species after the > and also do not leave a space between species. Example is below.
*HINT: First Go to Word and open a new blank file and SAVE AS: .txt file. Then paste your FASTA formatted proteins in
this file. It has to be done in this order.
6. BIORENDER--work on your 2 intro slides on your disease. Place into a working file for your final talk. Be prepared to share next week.
See Colin's conclusion page on how he modeled normal vs. disease states: https://nguyengen564s21.weebly.com/conclusion.html
Model organism choice:
7. Think about which model organism is best to model your disease (Dietrich, et al 2020: https://www.sciencedirect.com/science/article/pii/S1369848619301165www.sciencedirect.com/science/article/pii/S1369848619301165
8. Model organism adv/disadv: https://www.bosterbio.com/blog/post/how-to-choose-a-model-organism
1. Find your human PROTEIN, Sequence ID/Accession code and FASTA formatted sequence.
Go to Protein>click on Orthologs, Find the table, click the carrot (on right side to dropdown isoforms), take the longest isoform, Click FASTA, hyperlink this page to your human sequences on your website.
2. Find your model organism HOMOLOGS by performing RECIPROCAL BLASTs. (Look for at LEAST 5 of them, including Arabidopsis)
Go back to the orthologs page and find isoforms for homologs. Hyperlink all to your homology page. Double check with reciprocal blasts. Also to be aware, fly, worm, zebrafish gene names can be very different so you will have to find them on ENSEMBLE or in NCBI using these names. Pubmed can help you find them in the literature.
3. Compare what you find with NCBI ORTHOLOGS & ENSEMBLE databases. Make sure you have the right protein accession codes as associated FASTA formatted seaquences.
4. Hyperlink your FASTA sequences of all MODEL organisms homologs (at least 5) on a HOMOLOGS subtab on your website under your Protein Tab. Please include as many model organisms as you can find: fly, worm, fish, bacteria (E. coli), yeasts (S. cerevisiae, S. pombe), plants, Xenopus, planaria, mouse, are the main ones. If you find plant (Arabidopsis), Lung fishes, and ancient fish (Coelacanth) this is a plus. What does Coelacanth mean about your gene if you find a homolog?
See example: Colin Nguyen's Page: https://nguyengen564s21.weebly.com/homology-and-model-organisms.html
5. Compile all of your FASTA sequences from your protein and homologs into a .txt file (via Word). Save this for future labs. Annotate your species after the > and also do not leave a space between species. Example is below.
*HINT: First Go to Word and open a new blank file and SAVE AS: .txt file. Then paste your FASTA formatted proteins in
this file. It has to be done in this order.
6. BIORENDER--work on your 2 intro slides on your disease. Place into a working file for your final talk. Be prepared to share next week.
See Colin's conclusion page on how he modeled normal vs. disease states: https://nguyengen564s21.weebly.com/conclusion.html
Model organism choice:
7. Think about which model organism is best to model your disease (Dietrich, et al 2020: https://www.sciencedirect.com/science/article/pii/S1369848619301165www.sciencedirect.com/science/article/pii/S1369848619301165
8. Model organism adv/disadv: https://www.bosterbio.com/blog/post/how-to-choose-a-model-organism
Sample .txt file from Abigail Jaiquish (2019)
homolog1_recent.txt |
PubMed: http://www.ncbi.nlm.nih.gov/pubmed/
BLAST: http://blast.ncbi.nlm.nih.gov
NCBI/ORTHOLOGS: https://www.ncbi.nlm.nih.gov/datasets/gene/
ENSEMBLE: https://useast.ensembl.org/index.html
BLAST: http://blast.ncbi.nlm.nih.gov
NCBI/ORTHOLOGS: https://www.ncbi.nlm.nih.gov/datasets/gene/
ENSEMBLE: https://useast.ensembl.org/index.html
Things you should know
What is an E-value (Expect value)?
The lower the E-value, or the closer it is to zero, the more "significant" the match is
For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.
Similarity: The extent to which nucleotide or protein sequences are related.
Identity: The extent to which a nucleotide or protein sequence is identical between two BPs or AAs in different genes or homologs.
The lower the E-value, or the closer it is to zero, the more "significant" the match is
For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.
Similarity: The extent to which nucleotide or protein sequences are related.
Identity: The extent to which a nucleotide or protein sequence is identical between two BPs or AAs in different genes or homologs.