Lab 3: Protein domain discovery & DNA motifs
Example of a protein domain structure with mutations
*Start with a Tutorial for Protein Domains:
*Tutorial for Gene Ontology:
Using Pfam, SMART, InterPro, PROSITE, and GO (plus others; use EMBL-EBI or Gene Infinity to find more), determine the following. (NOTE: I always try the sample sequences first to see how the website works. I also like to compare the data I get from several websites to see whether the results agree.)
1. What protein domains are present in your protein? Provide 2 images of your protein, from at least Pfam and SMART, on your website. Be sure to label each domain and state what each domain is known to do. Does your protein have isoforms? If so, which domains are missing from or added to particular isoforms? Put this on your website and in your talk. Is your protein alternatively spliced in your disease state? Where do your mutations lie in the protein structure you found? These are questions you can ask using the data you obtained from these sites.
2. Where does your protein localize in the cell (cellular component)? What is its molecular function? What biological process is your protein involved in? Use GO or AmiGO to find this information for your protein. Gene Ontology uses 3 ontologies to describe proteins: biological_process, cellular_component, and molecular_function.
3. Do you know where the mutations lie in your protein or DNA sequence? Does this information mean anything to you given what you know so far about your gene and disease? The best way to work is in a PowerPoint or Keynote document, marking the positions of the mutations you have found by reading the literature.
4. Find an image of where your protein localizes in the cell or tissue(s). You can find this via PubMed. This is something you can use in your talk.
5. OPTIONAL: If you have time, find a few DNA motifs that are present in your gene using MEME (DREME is for DNA only: find the motif, then click Submit and select GOMO to predict the function of this motif). Provide examples of the motifs you found. About motifs: http://www.nature.com/nbt/journal/v24/n4/full/nbt0406-423.html
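For questions 1 and 3, it can help to check programmatically which domain each mutation falls in before you draw your figure. The following is a minimal sketch, not a required part of the lab. The domain names, coordinates, and mutations are hypothetical placeholders; substitute the boundaries reported by Pfam or SMART and the mutations you found in the literature.

```python
import re

# Hypothetical domain boundaries (1-based, inclusive), as read off a
# Pfam or SMART results page. Replace with the values for your protein.
domains = [("Domain A", 40, 180), ("Domain B", 220, 350)]

# Hypothetical missense mutations in the usual "R75W" notation:
# original residue, position, new residue.
mutations = ["R75W", "G200D", "L301P"]


def mutation_position(mutation):
    """Extract the residue number from a mutation such as 'R75W'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z*]", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation}")
    return int(match.group(1))


def locate(position, domains):
    """Return the name of the domain containing the position, if any."""
    for name, start, end in domains:
        if start <= position <= end:
            return name
    return "outside annotated domains"


placement = {m: locate(mutation_position(m), domains) for m in mutations}
for mutation, where in placement.items():
    print(f"{mutation}: {where}")
```

Mutations that land outside every annotated domain are still worth marking on your figure, since linker regions and unannotated segments can matter too.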
OPTIONAL but might be helpful for your project:
Gene Infinity: http://www.geneinfinity.org/index.html?dp=1
6. What is the MW and pI of your protein?
7. Are there any cis-regulatory elements? Where is your promoter located?
8. Is your gene GC or AT rich? Are there any repeats in your gene?
9. Does your protein have a Nuclear Localization Signal (NLS) or not?
10. Are there any Transcription factor binding sites?
11. Does your protein belong to a family or superfamily?
12. Is your protein post-translationally modified? For example, is your protein phosphorylated and/or sumoylated? Find this under Protein Analysis (Phosphorylation) on Gene Infinity.
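Questions 6, 8, and 9 can also be roughed out by hand, which is a useful cross-check on what the websites report. The sketch below is an illustration under stated assumptions, not what Gene Infinity or any other tool runs internally. The pKa values are one commonly used set (pI predictions differ between tools because they use different sets), and the NLS pattern is only the classic monopartite consensus K-(K/R)-X-(K/R), so a hit is a candidate, not proof of nuclear import.

```python
import re

# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Assumed pKa set; other tools use slightly different values.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def molecular_weight(protein):
    """Average molecular weight of an unmodified protein, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def net_charge(protein, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    for aa in protein:
        if aa in PKA_POSITIVE:
            positive += 1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(protein):
    """Find the pH where the net charge is zero, by bisection."""
    low, high = 0.0, 14.0
    while high - low > 0.001:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid   # still positive, so the pI is at a higher pH
        else:
            high = mid
    return round((low + high) / 2, 2)


def gc_fraction(dna):
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def find_classic_nls(protein):
    """1-based positions of K-(K/R)-X-(K/R) matches, overlaps included."""
    pattern = re.compile(r"(?=(K[KR].[KR]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]


# The SV40 large T antigen NLS (PKKKRKV) embedded in a short test peptide.
peptide = "MPKKKRKVA"
print(round(molecular_weight(peptide), 1), isoelectric_point(peptide))
print(find_classic_nls(peptide))
print(gc_fraction("ATGCGC"))
```

Expect small differences from the web tools: they may use other mass tables or pKa sets, and dedicated NLS predictors also look for bipartite signals that this pattern misses.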
Know your 3 Gene Ontologies: http://geneontology.org/page/ontology-documentation
GO consortium: http://www.geneontology.org/
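Once you have collected GO terms for your protein, sorting them by ontology makes it easy to see whether you have covered all three. This is a small sketch; the annotation list is an illustrative example for a hypothetical nuclear DNA-binding protein, so replace it with the terms you find in GO or AmiGO.

```python
from collections import defaultdict

NAMESPACES = ("biological_process", "cellular_component", "molecular_function")

# Example annotations as (GO ID, term name, ontology). Replace with your own.
annotations = [
    ("GO:0005634", "nucleus", "cellular_component"),
    ("GO:0003677", "DNA binding", "molecular_function"),
    ("GO:0006355", "regulation of DNA-templated transcription",
     "biological_process"),
]


def group_by_namespace(annotations):
    """Group GO terms under the ontology each one belongs to."""
    grouped = defaultdict(list)
    for go_id, name, namespace in annotations:
        if namespace not in NAMESPACES:
            raise ValueError(f"unknown ontology: {namespace}")
        grouped[namespace].append((go_id, name))
    return dict(grouped)


grouped = group_by_namespace(annotations)
missing = [ns for ns in NAMESPACES if ns not in grouped]

for ns in NAMESPACES:
    print(ns, grouped.get(ns, []))
print("ontologies still to fill in:", missing)
```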
Search for proteins in same species with similar domain architecture
**All domain analysis sites: http://www.ebi.ac.uk/interpro/about.html#about_08
Can you find all of the proteins that have the same domain organization in your model organism?
Can you find your HMM logo for at least one of your domains?
What clan does your protein belong to?
What Normal and Genomic mode mean in SMART (for sequence analysis or architecture analysis searches):
In Normal SMART, the database contains Swiss-Prot, SP-TrEMBL and stable Ensembl proteomes.
In Genomic SMART, only the proteomes of completely sequenced genomes are used.
**EMBL-EBI: http://www.ebi.ac.uk/services (GO terms are easy to find here after finding domains, FYI).
MEME: http://meme.nbcr.net/meme/ (TRY DREME)
This is a cool site. You can find a lot of databases to dissect info about your DNA/protein sequences