
On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences

2018-07-26
Pablo Turjanski, Diego U. Ferreiro

Abstract

All known terrestrial proteins are coded as continuous strings over an alphabet of ~20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describe the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein sequences using a mathematically precise definition for ‘repetition’, an efficient algorithmic implementation and a robust scoring system with no adjustable parameters. We show that the sequence patterns can be well-separated into disjoint classes according to their recurrence in nested structures. The statistics of pattern occurrences indicate that short repetitions are enough to account for the differences between natural families and randomized groups by more than 10 standard deviations, while patterns shorter than 5 residues are effectively random. A small subset of patterns is sufficient to account for a robust “familiarity” definition of arbitrary sets of sequences.
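
The comparison between a natural family and randomized groups can be illustrated with a small sketch. It is not the authors' method: it assumes fixed-length substrings (k-mers) as a stand-in for the paper's precise definition of repetition, and it assumes per-sequence residue shuffling as the randomization. The function names `repeated_kmers`, `shuffle_residues` and `repeat_z_score` are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def repeated_kmers(seqs, k):
    """Return the length-k substrings that occur at least twice in the group."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= 2}


def shuffle_residues(seq, rng):
    """Permute the residues of one sequence, preserving its composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def repeat_z_score(seqs, k, n_shuffles=200, seed=0):
    """Z-score of the number of repeated k-mers against shuffled groups.

    Assumption: the null model shuffles each sequence independently;
    the paper's own randomization may differ.
    """
    rng = random.Random(seed)
    observed = len(repeated_kmers(seqs, k))
    null = [
        len(repeated_kmers([shuffle_residues(s, rng) for s in seqs], k))
        for _ in range(n_shuffles)
    ]
    mu, sigma = mean(null), pstdev(null)
    if sigma == 0:
        # Degenerate null distribution: no spread to normalize by.
        return float("inf") if observed > mu else 0.0
    return (observed - mu) / sigma
```

Calling `repeat_z_score(family, k)` for several values of `k` gives a rough feel for how the separation from the shuffled groups depends on pattern length. The shuffle count and seed are the only knobs here, and they affect the precision of the estimate rather than the definition of the score.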

URL

https://arxiv.org/abs/1807.10394

PDF

https://arxiv.org/pdf/1807.10394

