Hi,
Can you tell me how I could sort lists of sequences alphabetically? My files look like the list below. I want to sort by the .S--- suffix on the accessions. Can I do this in Textpad?
Thanks, Joey
>YP_139939.Sthe
MTDELKNLEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>NP_687647.Saga
MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>YP_002996415.Sdys
MIEENKQVEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTA
>NP_721651.Smut
MTEENKNLDQLAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>ZP_06611811.Soralis
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ABV56769.Soli
LSTQLDVRVHKNGKIYYQEYHRGNVVADLEVVGDTDKTGTT
>NP_802640.Spyo
MIEENKHFEKKMQEYDASQIQVLEGLEAVRMRPGMYIGSTA
>ZP_07458916.Soral
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ZP_06199258.SM143
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ZP_06611811.Sora
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ABV56734.Sora
LSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDKTGTT
>ABV56733.Smit
LSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDKTGTT
>BAE97486.Sora
GGGYKVSGGLHGVGSSVVNALSTQLDVHVHKNGKIHYQEYR
>ABV56794.Sinf
LSTQLDVRVHKNGKIHYQEYRRGHVVADLEVIGDTDKTGTI
Can you help me in sorting sequences alphabetically?
Your post is missing a lot of the information I would need to help you properly. Is the data in a file? Is this Windows command prompt output? Below is what I thought you meant. I sorted by the "S..." text following the dot "." on each line beginning with >, and assumed that each long uppercase code belongs to the > line directly above it.
.SM143 is on top because of the ANSI/ASCII character order (uppercase letters sort before lowercase ones); it is followed by .Saga
>ZP_06199258.SM143
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>NP_687647.Saga
MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>YP_002996415.Sdys
MIEENKQVEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTA
>ABV56794.Sinf
LSTQLDVRVHKNGKIHYQEYRRGHVVADLEVIGDTDKTGTI
>ABV56733.Smit
LSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDKTGTT
>NP_721651.Smut
MTEENKNLDQLAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>ABV56769.Soli
LSTQLDVRVHKNGKIYYQEYHRGNVVADLEVVGDTDKTGTT
>BAE97486.Sora
GGGYKVSGGLHGVGSSVVNALSTQLDVHVHKNGKIHYQEYR
>ABV56734.Sora
LSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDKTGTT
>ZP_06611811.Sora
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ZP_07458916.Soral
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ZP_06611811.Soralis
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>NP_802640.Spyo
MIEENKHFEKKMQEYDASQIQVLEGLEAVRMRPGMYIGSTA
>YP_139939.Sthe
MTDELKNLEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
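To illustrate why .SM143 lands first, here is a small Python sketch (Python is my choice for illustration, not something used in the thread). It sorts a few of the suffixes from the list above with the default code-point comparison, then with a case-insensitive key in case SM143 should instead sort among the lowercase names. The variable names are made up for this example.

```python
# A few suffixes taken from the headers in the list above.
suffixes = ["Sthe", "Saga", "SM143", "Sora", "Soralis", "Soral"]

# Default comparison goes by character code: "M" (77) comes before "a" (97),
# so SM143 sorts ahead of every suffix whose second letter is lowercase.
ascii_order = sorted(suffixes)

# Optional alternative: compare case-insensitively.
folded_order = sorted(suffixes, key=str.lower)

print(ascii_order)
print(folded_order)
```

With the case-insensitive key, SM143 moves to just after Saga.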
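joeytogo wrote:
Thanks,
Yes, this is what I wanted to do. How did you do this?
Find what and replace with what?
Joey Togo

I turned every header/sequence pair into a single line. The dot [.] appears only once in each header, and [\n>] (a line break followed by >) only occurs where a new record starts, so those two patterns let me place a tab (\t) after the dot and between each header and its sequence. This created 3 columns, which I sorted by the 2nd column. Then I reversed the process. After the first step, the first two records look like this: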
>ZP_06199258. {TAB} SM143 {TAB} MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>NP_687647. {TAB} Saga {TAB} MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
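If a script is an option instead of find/replace in the editor, the same three steps (join each pair into tab-separated columns, sort on the 2nd column, split back into pairs) can be sketched in Python. This is only a sketch under assumptions: every record is exactly one > header line followed by one sequence line, and the first dot in the header is the one that precedes the S suffix. The function name `sort_fasta_by_suffix` is made up for this example.

```python
def sort_fasta_by_suffix(text):
    """Sort two-line records by the text after the first dot in the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # Step 1: join each header/sequence pair into three tab-separated columns.
    rows = []
    for header, seq in zip(lines[0::2], lines[1::2]):
        accession, _, suffix = header.partition(".")
        rows.append(f"{accession}.\t{suffix}\t{seq}")

    # Step 2: sort on the 2nd column (plain ASCII order, so SM143 comes first).
    rows.sort(key=lambda row: row.split("\t")[1])

    # Step 3: reverse the process and restore the two-line layout.
    out = []
    for row in rows:
        accession, suffix, seq = row.split("\t")
        out.append(accession + suffix)
        out.append(seq)
    return "\n".join(out)


sample = """>YP_139939.Sthe
MTDELKNLEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>NP_687647.Saga
MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>ZP_06199258.SM143
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE"""

result = sort_fasta_by_suffix(sample)
print(result)
```

Python's sort is stable, so records with the same suffix (the three .Sora entries, for instance) stay in their input order, which may differ from the order an editor's sort produces.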