Abstract
Background The human complete proteome data in the databases of the Universal Protein Resource (UniProt) or the National Center for Biotechnology Information (NCBI) were poorly organized and difficult for an ordinary biologist to handle.
Results The HICL table enables an ordinary biologist to handle the human complete proteome of 67911 entries efficiently: to get an overview of the distribution of the physicochemical features of all proteins in the human complete proteome, to examine the detailed distribution patterns of these features in protein family members and protein variants, and to find particular proteins.
Moreover, two discoveries were made via the HICL table: (1) The amino acids Asp and Glu show symmetrical trends in their distributions versus pI, whereas the amino acids Arg and Lys show locally asymmetrical trends in their distributions versus pI in the human complete proteome. (2) Protein sequence, besides amino acid properties, can in theory influence the modal distribution of protein isoelectric points.
Conclusion I have created the HICL table as a robust tool for managing the 67911 proteins of the human complete proteome in an orderly manner by their physicochemical features, names and sequences. Any protein with particular physicochemical features can be screened out of the human complete proteome via the HICL table. In addition, the unbalanced distribution of the amino acids Arg and Lys in high-pI proteins of the human complete proteome and the effect of protein sequence on the modal distribution of protein isoelectric points have been discovered through the HICL table.
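As a rough illustration of screening proteins by sequence-derived features, the following minimal Python sketch derives the Met-truncated sequence (MTS), the sequence length (SL) and the fractions of acidic (Asp, Glu) and basic (Arg, Lys) residues, then filters on the basic fraction. It is not the HICL implementation: the function names, the toy sequences and the 0.5 threshold are hypothetical, and computing the features on the MTS rather than on the full sequence is an assumption made here for illustration only.

```python
from collections import Counter


def met_truncated(seq: str) -> str:
    """Return the Met-truncated sequence: drop an initial methionine if present."""
    return seq[1:] if seq.startswith("M") else seq


def features(seq: str) -> dict:
    """Compute sequence length and acidic/basic residue fractions on the MTS.

    Assumption: features are taken from the MTS; the source does not say
    which sequence form the HICL table uses for each feature.
    """
    mts = met_truncated(seq)
    n = len(mts)
    counts = Counter(mts)
    if n == 0:
        return {"SL": 0, "acidic": 0.0, "basic": 0.0}
    return {
        "SL": n,
        "acidic": (counts["D"] + counts["E"]) / n,  # Asp + Glu
        "basic": (counts["R"] + counts["K"]) / n,   # Arg + Lys
    }


def screen(proteins: dict, min_basic: float) -> list:
    """Return names of proteins whose basic-residue fraction is >= min_basic."""
    return [name for name, seq in proteins.items()
            if features(seq)["basic"] >= min_basic]


# Toy sequences (hypothetical, not taken from the human proteome).
proteins = {"toy_basic": "MKRKRA", "toy_acidic": "MDEDEA"}
hits = screen(proteins, 0.5)
print(hits)
```

The same pattern would extend to other features such as MW, pI or hydrophobicity, provided a scale or pK set is chosen and stated explicitly.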
Abbreviations
- 2D-PAGE: two-dimensional polyacrylamide gel electrophoresis
- AAC: Amino acid composition
- AAs: amino acids
- Ala: Alanine
- Annot1: Annotation1
- Annot2: Annotation2
- Arg: Arginine
- Asp: Aspartic acid
- Asn: Asparagine
- Cys: Cysteine
- DDB1: damage-specific DNA binding protein 1
- DNA: deoxyribonucleic acid
- F-box: a protein structural motif of about 50 amino acids that mediates protein–protein interactions
- Gln: Glutamine
- Glu: Glutamic acid
- Gly: Glycine
- His: Histidine
- HP: Hydrophobicity
- ID: identification
- Ile: Isoleucine
- Leu: Leucine
- Lys: Lysine
- Met: Methionine
- MS: Mass spectrometry
- MTS: Met-truncated sequence (derived from the full protein sequence by eliminating the initial methionine)
- MW: Molecular weight
- NCBI: National Center for Biotechnology Information
- NO: Number
- PDZ: a common structural domain of 80-90 amino acids found in the signaling proteins of bacteria, yeast, plants, viruses and animals
- Phe: Phenylalanine
- pI: Isoelectric point
- Pfam: a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
- Pro: Proline
- Ser: Serine
- SL: Sequence length
- sORF: short open reading frame
- Thr: Threonine
- Trp: Tryptophan
- Tyr: Tyrosine
- UniProt: Universal Protein Resource
- Val: Valine