Frequently Asked Questions:
- A: The TC system is analogous to the Enzyme Commission (EC) system for classification of enzymes, except that it incorporates both functional and phylogenetic information. Descriptions, TC numbers, and examples of over 600 families of transport proteins are provided.
Transport systems are classified on the basis of five criteria, each corresponding to one of the five numbers or letters within the TC # for a particular type of transporter. Thus a TC # normally has five components, as follows: V.W.X.Y.Z. V (a number) corresponds to the transporter class (i.e., channel, carrier (porter), primary active transporter, group translocator, or transmembrane electron flow carrier); W (a letter) corresponds to the transporter subclass, which in the case of primary active transporters refers to the energy source used to drive transport; X (a number) corresponds to the transporter family (sometimes actually a superfamily); Y (a number) corresponds to the subfamily in which a transporter is found; and Z (a number) corresponds to a specific transporter with a particular substrate or range of substrates transported.
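The V.W.X.Y.Z layout described above can be sketched in code. The following is a minimal illustration, not part of TCDB's own software; the names `TCNumber` and `parse_tc_number` are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class TCNumber(NamedTuple):
    """The five components V.W.X.Y.Z of a full TC number (illustrative type)."""
    transporter_class: int  # V: a number
    subclass: str           # W: a letter
    family: int             # X: a number (sometimes a superfamily)
    subfamily: int          # Y: a number
    transporter: int        # Z: a number identifying the specific transporter


def parse_tc_number(tc: str) -> TCNumber:
    """Split a full five-component TC number such as '2.A.1.1.1'."""
    parts = tc.strip().split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 components, got {len(parts)}: {tc!r}")
    v, w, x, y, z = parts
    if not (len(w) == 1 and w.isalpha()):
        raise ValueError(f"subclass must be a single letter: {w!r}")
    # int() raises ValueError if a numeric component is malformed.
    return TCNumber(int(v), w.upper(), int(x), int(y), int(z))
```

For example, `parse_tc_number("2.A.1.1.1").family` gives the family component of that TC number.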
Q: How can I search TCDB with a specific TC Identification number (TC ID or TC #)?
- A: You can search TCDB with a TC ID at various classification levels. For instance,
1) for a TC-Class, you can search with just the one digit representing that Class (e.g., 2),
2) for a TC-Subclass, you can search with a number followed by a letter representing the Subclass (e.g., 2.A),
3) for a TC-Family (Superfamily), you can search with the 3-component TC # representing that Family (e.g., 2.A.1),
4) for a subfamily belonging to a family (or a family belonging to a superfamily), you can search with a 4-component TC # (e.g., 2.A.1.1), and
5) for a specific transporter, you can search with a 5-component TC # (e.g., 2.A.1.1.1).
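The five search levels listed above follow directly from how many components the TC ID has. A small sketch of that idea, assuming nothing about how the TCDB search itself is implemented (`tc_level` and `is_under` are hypothetical helper names):

```python
LEVELS = ("class", "subclass", "family", "subfamily", "transporter")


def tc_level(tc_id: str) -> str:
    """Return the classification level implied by a (possibly partial) TC ID."""
    parts = tc_id.strip().split(".")
    if not 1 <= len(parts) <= 5:
        raise ValueError(f"a TC ID has 1 to 5 components: {tc_id!r}")
    for index, part in enumerate(parts):
        if index == 1:
            # The second component (subclass) is a single letter.
            ok = len(part) == 1 and part.isascii() and part.isalpha()
        else:
            # Every other component is a number.
            ok = part.isascii() and part.isdigit()
        if not ok:
            raise ValueError(f"malformed component {part!r} in {tc_id!r}")
    return LEVELS[len(parts) - 1]


def is_under(query: str, full_tc: str) -> bool:
    """True if full_tc falls under the class/subclass/family given by query."""
    q = query.strip().split(".")
    f = full_tc.strip().split(".")
    # Compare component by component so that '2.A.1' does not match '2.A.10'.
    return f[:len(q)] == q
```

Comparing component lists rather than raw string prefixes is a deliberate choice here: a plain `startswith` check would wrongly place 2.A.10.1.1 under 2.A.1.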
- A: Because of the requirements of the IUBMB for a static system, the distinction between superfamily and family is blurred. A "superfamily" is defined as a large family consisting of sequence-divergent members. Thus, when homology is established between distantly related families, superfamilies can be created. If a superfamily was recognized before the IUBMB requirement for a static system was implemented, its members are listed under a single TC family number (e.g., 2.A.1), but if it was recognized later, the families within the superfamily have different TC numbers (e.g., 2.A.2). All superfamilies can be viewed via the "SUPERFAMILIES" hyperlink.
Q: What are the other ways to search TCDB?
- A: You can also search TCDB with keywords, including protein names and abbreviations, author names, UniProtKB protein accession numbers (e.g., P02916), or Protein Data Bank (PDB) structure IDs (e.g., 1JQ1).
Q: How are TC IDs assigned to transporter proteins?
- A: Any two transport systems in the same subfamily of a transporter family that transport the same substrate(s) are given the same TC #, regardless of whether they are orthologues (i.e., arose in distinct organisms by speciation) or paralogues (i.e., arose within a single organism by gene duplication). Sequenced homologues of unknown function are not normally assigned a TC # unless they represent a unique (sub)family or are from an unrepresented organismal kingdom. If a transport system comprises multiple dissimilar subunits, all are listed under a single 5-component TC ID.
Q: Where are TC Classes 6 and 7?
- A: Classification categories 6 and 7 are reserved for future, yet-to-be-discovered classes. If and when new classes are discovered, they will receive these TC class numbers.
Q: What are TC Classes 8 and 9 for?
- A: Classification categories 8 and 9 are reserved for accessory transport proteins and incompletely characterized (putative) families of transporters, respectively.
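The class numbers discussed in this FAQ can be collected into a simple lookup table. This is an illustrative sketch only: the assignment of numbers 1 to 5 assumes that the classes are numbered in the order in which they are listed in the description of the TC # above, and `describe_class` is a hypothetical helper name.

```python
# Class descriptions as worded in this FAQ. The numbering of 1-5 is an
# assumption based on the order in which the classes are listed there.
TC_CLASSES = {
    1: "channel",
    2: "carrier (porter)",
    3: "primary active transporter",
    4: "group translocator",
    5: "transmembrane electron flow carrier",
    6: "reserved for a future class",
    7: "reserved for a future class",
    8: "accessory transport protein",
    9: "incompletely characterized (putative) transporter family",
}


def describe_class(tc_id: str) -> str:
    """Describe the class given by the first component of a TC ID."""
    first = tc_id.strip().split(".")[0]
    try:
        return TC_CLASSES[int(first)]
    except (ValueError, KeyError):
        raise ValueError(f"unknown TC class in {tc_id!r}") from None
```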
Q: What are the other databases to which TCDB is linked?
- A: TCDB is now linked to several important databases, which include UniProtKB (Universal Protein Resource Knowledgebase), PDB (Protein Data Bank), RefSeq (Reference Sequence database), Pfam (Protein Families domain databases), Entrez Gene (Searchable database of genes), KEGG (Kyoto Encyclopedia of Genes and Genomes), OMIM (Online Mendelian Inheritance in Man), GO (Gene Ontology), BioCyc (Pathway/Genome Databases), DIP (Database of Interacting Proteins), EchoBASE (Escherichia coli dataBases), EcoGene (Escherichia coli K-12 dataBases), and eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups).
Q: Is it possible to download all the TCDB protein sequences?
- A: Yes, you can go to the download link, which is on the left menu of the TCDB Home page. It is called "TCDB FastA Sequences".
Q: Are the software tools you use available for download?
- A: Yes, they are all freely available for download in the "Software Download" link on the Home page.
Q: How can I contribute my results to TCDB?
- A: On the Home page there is a "Contribute" hyperlink, which will take you to the Contribute page.
Q: How can I leave suggestions and feedback about TCDB?
- A: Please go to the "Feedback" hyperlink at the bottom left of the Home page, or contact the Saier Lab Bioinformatics Group via the "about" link.
Q: How often is TCDB updated?
- A: Usually every week, but it mainly depends on the availability of new data. However, the web page and the ID mapping files (in the download link) are updated as soon as a new entry is inserted into the database.
- A: TCDB entries come from published (or occasionally unpublished) data, evaluated by our curators. The process of screening the literature has been greatly enhanced by the introduction of Machine Learning programs that distinguish documents relevant to TCDB from irrelevant ones. The classifier ranks new, unlabeled documents and passes them on to a human expert. They are then carefully checked and analyzed before being inserted into TCDB. Since TCDB is a representative database of transporters, not all functionally characterized transport systems are included, particularly orthologues with the same function (see above).
- A: You can search against TCDB using the BLAST search (http://www.tcdb.org/progs/blast.php), either with your protein sequence in FASTA format or with a UniProt protein accession number (e.g., P48048). The BLAST search is always kept up to date, with sequences added as soon as they are entered into the database.
FASTA is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.
Here is an example:
>P0AEP1|GALP_ECOLI Galactose-proton symporter - Escherichia coli.
MPDAKKQGRSNKAMTFFVCFLAALAGLLFGLDIGVIAGALPFIADEFQITSHTQEWVVSSMMFGAAVGAVGSGWLSFKLG
RKKSLMIGAILFVAGSLFSAAAPNVEVLILSRVLLGLAVGVASYTAPLYLSEIAPEKIRGSMISMYQLMITIGILGAYLS
DTAFSYTGAWRWMLGVIIIPAILLLIGVFFLPDSPRWFAAKRRFVDAERVLLRLRDTSAEAKRELDEIRESLQVKQSGWA
LFKENSNFRRAVFLGVLLQVMQQFTGMNVIMYYAPKIFELAGYTNTTEQMWGTVIVGLTNVLATFIAIGLVDRWGRKPTL
TLGFLVMAAGMGVLGTMMHIGIHSPSAQYFAIAMLLMFIVGFAMSAGPLIWVLCSEIQPLKGRDFGITCSTATNWIANMI
VGATFLTMLNTLGNANTFWVYAALNVLFILLTLWLVPETKHVSLEHIERNLMKGRKLREIGAHD
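A reader for the FASTA format described here can be sketched in a few lines. This is a minimal illustration, assuming each record has its ">" description on a line of its own followed by one or more sequence lines; `read_fasta` is a hypothetical name, and established libraries offer more robust parsers.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Sequence lines belonging to one record are joined into a single string, so the line width used in the file does not matter.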
UniProt accession numbers consist of 6 alphanumerical characters (e.g., P48048, P0A334). An accession number is assigned to each sequence upon inclusion into UniProtKB. Since almost all the protein entries in TCDB (except the ones that are not currently available) are from UniProtKB, we also retain the accession numbers assigned to them.
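Before submitting an accession number to a search, it can be useful to check that it is well formed. The sketch below uses the accession pattern given in UniProt's own documentation rather than anything specific to TCDB; note that this pattern also accepts UniProt's 10-character accessions in addition to the 6-character form described here. The name `is_uniprot_accession` is hypothetical.

```python
import re

# Accession pattern as published in the UniProt help pages (an assumption
# of this sketch; TCDB may validate input differently).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """True if text has the shape of a UniProtKB accession number."""
    return UNIPROT_ACCESSION.fullmatch(text.strip()) is not None
```

A check like this only tests the shape of the identifier; it does not confirm that the accession exists in UniProtKB or in TCDB.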