Biology: Bioinformatics

Thursday, February 19, 2009

Protein Sequence Analysis

Finding tools/databases

Finding a sequence

Entrez Protein database at NCBI: The protein entries in the Entrez search and retrieval system have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq.
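Protein records can also be retrieved programmatically through NCBI's E-utilities. The following is a minimal sketch rather than an official client: the helper names `build_efetch_url` and `fetch_fasta` are made up for illustration, and the accession in the usage comment is only an example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession):
    """Return an efetch URL that requests one protein record as FASTA text."""
    params = {
        "db": "protein",      # the Entrez Protein database
        "id": accession,      # accession or GI number of the record
        "rettype": "fasta",   # return the sequence in FASTA format
        "retmode": "text",    # plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession):
    """Download the FASTA record for one accession (needs network access)."""
    with urlopen(build_efetch_url(accession)) as response:
        return response.read().decode("utf-8")


# Example usage (not run here because it contacts NCBI):
#     print(fetch_fasta("NP_000509.1"))
```

For anything beyond a handful of requests, NCBI's usage guidelines for E-utilities (rate limits, API keys) should be checked first.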

NCBI Handbook:

UniProt – you can accomplish a lot here!

  • The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
  • In 2002, PIR, EBI and SIB were awarded a grant from NIH to create UniProt, a single worldwide database of protein sequence and function, by unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases.
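Sequences downloaded from UniProt arrive in FASTA format, with a header of the form `db|accession|entry_name description`. The sketch below shows one way to read such a file. The function names are hypothetical, and the record in the usage example is a made-up toy entry, not real UniProt data.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_uniprot_header(header):
    """Split a 'db|accession|entry_name description' header into its parts."""
    identifier, _, description = header.partition(" ")
    db, accession, entry_name = identifier.split("|")
    return {
        "db": db,                  # 'sp' (Swiss-Prot) or 'tr' (TrEMBL)
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


# Toy record for illustration only (not a real UniProt entry).
toy_record = ">sp|X99999|TOY_EXAMPLE Toy protein OS=Example organism\nMKTAY\nIAKQR\n"
records = [(split_uniprot_header(h), s) for h, s in parse_fasta(toy_record)]
```

Here `records[0]` holds the parsed header fields together with the joined sequence `MKTAYIAKQR`.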

UniProt help

Outreach and education resources

ExPASy

ExPASy (Expert Protein Analysis System)

ExPASy, the proteomics server of the Swiss Institute of Bioinformatics (SIB), is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.

  • Databases
  • Tools & Software
  • Education & Services
  • Links
  • Announcements
  • Mirror Sites
  • Job openings

ExPASy Proteomics tools

ExPASy proteomics tools are available on a variety of topics, such as:

  • Other proteomics tools
  • DNA -> Protein
  • Similarity searches
  • Pattern and profile searches
  • Post-translational modification prediction
  • Topology prediction
  • Primary structure analysis
  • Secondary structure prediction
  • Tertiary structure
  • Sequence alignment
  • Gateways
  • Phylogenetic analysis
  • Biological text analysis
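To give a feel for what the "DNA -> Protein" category does, here is a minimal sketch of translating a coding sequence with the standard genetic code. It is not the ExPASy implementation: it assumes a single forward reading frame starting at the first base, and the function name `translate` is chosen only for this illustration.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for all three codon positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 1 and stop at the first stop codon.

    Trailing bases that do not form a full codon are ignored, and codons
    containing ambiguous bases are reported as 'X'.
    """
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate("ATGGCCAAATAA")` reads the codons ATG, GCC, AAA and the stop codon TAA, giving `MAK`. A real translation tool would also report the other five reading frames.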

Other proteomics resources (info from Online Bioinformatics Resources Collection)

PEP — Predictions for Entire Proteomes

  • The database contains summaries of analyses of protein sequences (open reading frames) from a range of organisms representing all three domains of life: eukaryotes, bacteria and archaea.
  • All proteins publicly available for organisms were aligned against SWISS-PROT, TrEMBL and PDB.
  • Additionally, the following annotations are provided: secondary structure, transmembrane helices, coiled coils, regions of low complexity, signal peptides, PROSITE motifs, nuclear localization signals and classes of cellular function. Proteins that contain long regions without regular secondary structure are also identified.

ISPIDER Central – an integrated database web-server for proteomics

  • ISPIDER Central Proteomic Database search is an integration service offering novel search capabilities over leading, mature, proteomic repositories including PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine.
  • It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments from different groups, stored in different databases, and view the collated results with specialist viewers/clients.

Motif searching, predictive methods

PROSITE - Protein families and domains.

  • Database of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs. Currently contains patterns and profiles specific for >1000 protein families or domains.

List of PROSITE entries - Background information for each of these protein signatures is provided.
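PROSITE patterns use a compact syntax: elements are separated by `-`, `x` means any residue, `[...]` lists allowed residues, `{...}` lists forbidden residues, `(n)` or `(n,m)` gives a repeat count, and `<` / `>` anchor the pattern to the N- or C-terminus. The sketch below converts this syntax into a Python regular expression and scans a sequence. It is a simplified illustration, not the ScanProsite algorithm: it does not handle the rarer case of an anchor inside square brackets, and the test sequence is invented.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a regex string."""
    regex = pattern.strip().rstrip(".")       # drop the terminating period
    regex = regex.replace("-", "")            # element separators
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")   # repeat counts
    regex = regex.replace("x", ".")           # any residue
    regex = regex.replace("<", "^").replace(">", "$")   # terminus anchors
    return regex


def find_motifs(pattern, sequence):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # A lookahead with a capture group lets finditer report overlapping hits.
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# N-glycosylation site pattern (PROSITE entry PS00001).
hits = find_motifs("N-{P}-[ST]-{P}.", "ANGSANPTNVTA")
```

For this invented sequence, `hits` contains the sites starting at positions 2 (`NGSA`) and 9 (`NVTA`); the asparagine at position 6 is skipped because it is followed by a proline.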

Sequence manipulation

Sequence Manipulation Suite

  • The Sequence Manipulation Suite is a collection of JavaScript programs for generating, formatting, and analyzing short DNA and protein sequences. It is commonly used by molecular biologists, for teaching, and for program and algorithm testing.
  • See the "About the Sequence Manipulation Suite" page for more information about individual Sequence Manipulation Suite programs.
  • You can easily mirror the Sequence Manipulation Suite on your own web site, or you can use it off-line.

Sequence alignments

ClustalW2 – also available from the UniProt site…

  • A general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be viewed as cladograms or phylograms.
  • ClustalW2 FAQ includes information about supported sequence formats
  • Download Clustal to run locally
  • Help documentation
  • Multiple sequence alignment with the Clustal series of programs. Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. Nucleic Acids Res. 2003 Jul 1;31(13):3497-500.
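Once sequences are aligned, the identities mentioned above can be read off column by column. The sketch below takes sequences that are already aligned (equal length, `-` for gaps), marks fully conserved columns with `*` as Clustal output does, and computes a pairwise percent identity. It does not perform the alignment itself. The function names and toy sequences are illustrative, and percent identity is defined here over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
def conservation_line(aligned):
    """Return a string with '*' under every fully conserved, gap-free column."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # Conserved means one distinct residue and no gap in the column.
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns without a gap in either row."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment for illustration only.
alignment = ["MKT-AY", "MKTLAY", "MRT-AY"]
marks = conservation_line(alignment)
identity = percent_identity(alignment[0], alignment[2])
```

For this toy alignment, columns 1, 3, 5 and 6 are conserved, and the first and third sequences are identical at four of the five compared columns.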

Sequences

Other resources/helpful links

NCBI Mini-Course, “Making Sense of DNA and Protein Sequences”, by Medha Bhagwat and David Wheeler.

Special ExPASy features:

NCBI’s Proteome Analysis tools:

Other Proteomics Websites:

References

  • Baxevanis, A.D. and Ouellette, B.F.F., eds., Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, third edition. Wiley, 2005. ISBN 0-471-47878-4
  • Geer, R.C., Messersmith, D.J., Alpi, K., Bhagwat, M., Chattopadhyay, A., Gaedeke, N., Lyon, J., Minie, M.E., Morris, R.C., Ohles, J.A., Osterbur, D.L. and Tennant, M.R. 2002. NCBI Advanced Workshop for Bioinformatics Information Specialists. [Online] Protein Analysis. http://www.ncbi.nlm.nih.gov/Class/NAWBIS/. [date revised July 23, 2006; date cited February 15, 2009]
  • Chen, Y.B., Chattopadhyay, A., Bergen, P., Gadd, C. and Tannery, N. 2007. The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System: a one-stop gateway to online bioinformatics databases and software tools. Nucleic Acids Research 35 (Database issue): D780-D785. http://www.hsls.pitt.edu/guides/genetics/obrc [date cited February 15, 2009]
