Frequently Asked Questions:
Q: What is the TC system?
- A: The TC system is analogous to the Enzyme Commission (EC) system for classification of enzymes, except that it incorporates both functional and phylogenetic information. For a detailed definition and the ability to navigate the full TC system hierarchy click here.
Q: How can I search TCDB with a specific TC Identification number (TC ID or TC #)?
- A: You can search TCDB with a TC ID at any level of the classification. For instance,
1) for a TC-Class, you can search with the single digit representing that Class (e.g., 2),
2) for a TC-Subclass, you can search with a number followed by a letter (e.g., 2.A),
3) for a TC-Family (Superfamily), you can search with the 3 digits representing that Family (e.g., 2.A.1),
4) for a subfamily belonging to a family, you can search with a 4-digit TC # (e.g., 2.A.1.1), and
5) for a specific transporter, you can search with a 5-digit TC # (e.g., 2.A.1.1.1).
For other types of queries that you can submit to TCDB, please follow our tutorials: The "Search TCDB" Box, The Substrate Search Tool and Transporters and Human Diseases.
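The five levels listed above can be told apart by counting the dot-separated parts of a TC ID. The following Python sketch illustrates this; it is not part of TCDB, the function name tc_level and the level labels are hypothetical, and the pattern assumes the layout shown in the examples (one digit, one capital letter, then up to three numbers).

```python
import re

# Level names follow the five-level hierarchy described in the answer above.
TC_LEVELS = ["class", "subclass", "family", "subfamily", "system"]

# Assumed layout (taken from the examples 2, 2.A, 2.A.1, 2.A.1.1, 2.A.1.1.1):
# digit . capital letter . number . number . number, truncated at any level.
TC_ID_PATTERN = re.compile(r"^\d(\.[A-Z](\.\d+(\.\d+(\.\d+)?)?)?)?$")


def tc_level(tc_id: str) -> str:
    """Return the classification level that a TC ID refers to."""
    tc_id = tc_id.strip()
    if not TC_ID_PATTERN.match(tc_id):
        raise ValueError(f"not a recognizable TC ID: {tc_id!r}")
    # The number of dot-separated parts gives the depth in the hierarchy.
    return TC_LEVELS[len(tc_id.split(".")) - 1]


if __name__ == "__main__":
    for example in ["2", "2.A", "2.A.1", "2.A.1.1", "2.A.1.1.1"]:
        print(example, "->", tc_level(example))
```

A check like this can be useful before submitting many TC IDs in a batch, since a malformed ID is rejected early instead of producing an empty search result.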
Q: What is a TC Family? What's the difference between a TC Family and a TC Superfamily?
- A: Because of the requirements of the IUBMB for a static system, the distinction between superfamily and family is blurred. A "Superfamily" is defined as a large family consisting of sequence-divergent members. Thus, when homology is identified between distantly related families, a superfamily can be created. If a superfamily was recognized before the IUBMB requirement for a static system was implemented, it is listed under a single TC family number (e.g., 2.A.1), but if it was identified later, the families within the superfamily keep different TC numbers (e.g., 2.A.2). All superfamilies can be viewed on the "SUPERFAMILIES" tab on the TCDB front page.
Q: What are the other ways to search TCDB?
- A: You can also search TCDB with keywords, including protein names and abbreviations, author names, and UniProt protein accession numbers (e.g., P02916). For more details and step-by-step examples, please follow our Search box tutorial. Furthermore, TCDB has adapted the ChEBI ontology to annotate substrates. To query TCDB for all systems involved in the transport of a given substrate, follow our tutorial The Substrate search tool. You can also search for transporters involved in human diseases. For details and examples, follow our tutorial on Transporters and Human Diseases.
Q: How are TC identifiers assigned to transporter proteins?
- A: Any two homologous transport systems that belong to the same subfamily of a TC family transport the same substrate(s), regardless of whether they are orthologues (i.e., arose in distinct organisms by speciation) or paralogues (i.e., arose within a single organism by gene duplication). Close homologues of well-characterized transport systems are not added to TCDB because their sequence and function are already well represented in the database. However, some close homologues may still be added to TCDB if there are noticeable biological differences (e.g., in mode of regulation or transported substrates). Homologues of unknown function are not normally assigned a TC # unless they are distant enough to represent a unique (sub)family or are from an unrepresented organismal kingdom. If multiple subunits comprise a transport system (also referred to as a multi-component system), all subunits are listed under the same system (5-digit TC ID).
Q: Where are TC Classes 6 and 7?
- A: Classification categories 6 and 7 are reserved for future, yet to be discovered classes. If and when new classes are discovered, they will receive these TC class numbers.
Q: What are TC Classes 8 and 9 for?
- A: Classification categories 8 and 9 are reserved for accessory transport proteins and incompletely characterized (putative) families of transporters, respectively. You can see the definitions of all classes, subclasses, families and subfamilies by browsing our TC system.
Q: What are the other databases to which TCDB is linked?
- A: TCDB is now linked to several important databases, including UniProtKB (Universal Protein Resource Knowledgebase), PDB (Protein Data Bank), NCBI RefSeq (Reference Sequence database), Pfam (Protein Families domain databases), NCBI Gene (Searchable database of genes), KEGG (Kyoto Encyclopedia of Genes and Genomes), NCBI OMIM (Online Mendelian Inheritance in Man), GO (Gene Ontology), BioCyc (Pathway/Genome Databases), DIP (Database of Interacting Proteins), EchoBASE (Escherichia coli dataBases), and eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups).
Q: Is it possible to download all the TCDB protein sequences?
- A: Yes. Go to the "DOWNLOAD" link on the left-side menu bar of the TCDB Home page. Once on the DOWNLOAD page, click on the link "All proteins in TCDB (FASTA format)" to download the sequences. Alternatively, you can download the Perl script extractFamily from our software repository. This script downloads all or a subset of the sequences in TCDB based on TCIDs. Sequences are downloaded in FASTA format and can also be formatted as a BLAST database (here is the manual).
Q: Are the software tools you use available for download?
- A: Yes, they are all freely available for download in our software repository.
Q: How can I contribute my results to TCDB?
- A: Please get in touch with us by e-mail. You can also send us a message through the contact page on our lab's website.
Q: How can I leave suggestions and feedback about TCDB?
- A: Please get in touch with us by e-mail. You can also send us a message through the contact page on our lab's website.
Q: How often is TCDB updated?
- A: Usually every week, but it mainly depends on the availability of new data. Note, however, that the links provided to download data from TCDB run scripts that always extract the most current data from TCDB.
Q: From where do data for TCDB come?
- A: TCDB entries come from published (or occasionally unpublished) data, evaluated by our curators. The process of screening the literature has been greatly enhanced by the introduction of Machine Learning programs that distinguish documents relevant to TCDB from irrelevant ones. The classifier ranks new, unlabeled documents and passes them on to a human expert. They are then carefully checked and analyzed before being inserted into TCDB. Since TCDB is a representative database of transporters, not all functionally characterized transport systems are included, particularly orthologues with the same function (see the question above on how TCIDs are assigned).
Q: How can I search TCDB with my sequence?
- A: You can search for homologues of any query sequence in TCDB using TC-BLAST. Our BLAST database is updated as soon as a new system is entered into TCDB. You can follow our step-by-step TC-BLAST tutorial for more details.
Q: What is FASTA format?
- A: The FASTA format is used to represent either nucleotide or peptide sequences in plain text. Nucleotides or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first position of the line. Here is an example:
>P0AEP1|GALP_ECOLI Galactose-proton symporter - Escherichia coli.
MPDAKKQGRSNKAMTFFVCFLAALAGLLFGLDIGVIAGALPFIADEFQITSHTQEWVVSSMMFGAAVGAVGSGWLSFKLG
RKKSLMIGAILFVAGSLFSAAAPNVEVLILSRVLLGLAVGVASYTAPLYLSEIAPEKIRGSMISMYQLMITIGILGAYLS
DTAFSYTGAWRWMLGVIIIPAILLLIGVFFLPDSPRWFAAKRRFVDAERVLLRLRDTSAEAKRELDEIRESLQVKQSGWA
LFKENSNFRRAVFLGVLLQVMQQFTGMNVIMYYAPKIFELAGYTNTTEQMWGTVIVGLTNVLATFIAIGLVDRWGRKPTL
TLGFLVMAAGMGVLGTMMHIGIHSPSAQYFAIAMLLMFIVGFAMSAGPLIWVLCSEIQPLKGRDFGITCSTATNWIANMI
VGATFLTMLNTLGNANTFWVYAALNVLFILLTLWLVPETKHVSLEHIERNLMKGRKLREIGAHD
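For readers who want to process such files themselves, the rule described above (a description line starting with ">", followed by sequence lines) can be sketched in a few lines of Python. This is only an illustration and not a TCDB tool. The function name parse_fasta is hypothetical, and the sample record is cut down to the first 20 residues of the GalP sequence.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Truncated sample: only the first 20 residues of the example record.
    sample = (
        ">P0AEP1|GALP_ECOLI Galactose-proton symporter - Escherichia coli.\n"
        "MPDAKKQGRS\n"
        "NKAMTFFVCF\n"
    )
    for description, sequence in parse_fasta(sample):
        print(description, len(sequence))
```

For production work, an established library such as Biopython is usually a safer choice than a hand-written parser.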
Q: What is a UniProt accession number?
- A: UniProt accession numbers consist of 6 or 10 alphanumerical characters (e.g., P48048, P0A334, A0A023GPI8, etc.). An accession number is assigned to each sequence upon inclusion into UniProt. TCDB stores the UniProt accession numbers assigned to every protein. When a UniProt accession is not available for a protein, we use the NCBI RefSeq accession.
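The 6- and 10-character layouts can be checked with a regular expression before running a search. The sketch below uses the accession pattern published in the UniProt help pages. The helper name is_uniprot_accession is hypothetical and is not part of TCDB or UniProt. A match only shows that a string is well formed, not that the entry exists.

```python
import re

# Accession pattern as documented by UniProt: 6 characters, or 10 characters
# for the newer extended format.
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def is_uniprot_accession(text: str) -> bool:
    """Return True if text has the shape of a UniProt accession number."""
    return UNIPROT_ACCESSION.match(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ["P48048", "P0A334", "A0A023GPI8", "2.A.1.1.1"]:
        print(candidate, is_uniprot_accession(candidate))
```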