Can you help me with sorting sequences alphabetically?

General questions about using TextPad


joeytogo
Posts: 10
Joined: Thu Oct 14, 2010 2:43 am
Location: US


Post by joeytogo »

Hi,
Can you tell me how I could sort lists of sequences alphabetically? My files look like the list below. I want to sort by the .S--- suffix of the accession on each > line. Can I do this in TextPad?
Thanks, Joey

>YP_139939.Sthe
MTDELKNLEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>NP_687647.Saga
MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>YP_002996415.Sdys
MIEENKQVEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTA
>NP_721651.Smut
MTEENKNLDQLAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>ZP_06611811.Soralis
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ABV56769.Soli
LSTQLDVRVHKNGKIYYQEYHRGNVVADLEVVGDTDKTGTT
>NP_802640.Spyo
MIEENKHFEKKMQEYDASQIQVLEGLEAVRMRPGMYIGSTA
>ZP_07458916.Soral
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE

>ZP_06199258.SM143
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE

>ZP_06611811.Sora
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE

>ABV56734.Sora
LSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDKTGTT

>ABV56733.Smit
LSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDKTGTT

>BAE97486.Sora
GGGYKVSGGLHGVGSSVVNALSTQLDVHVHKNGKIHYQEYR

>ABV56794.Sinf
LSTQLDVRVHKNGKIHYQEYRRGHVVADLEVIGDTDKTGTI
ineuw
Posts: 191
Joined: Sun Mar 18, 2007 3:23 pm

Post by ineuw »

Your post leaves out some information I'd need to help you properly. Is the data in a file? Is this output from a Windows command prompt? Below is what I thought you meant. I sorted by the text starting with "S" after the dot "." on each > line, and assumed that each long uppercase sequence belongs to the > line directly above it.

.SM143 comes first because of the ANSI/ASCII character order: uppercase letters sort before lowercase ones, so "M" comes before "a" and .SM143 is followed by .Saga.
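A quick way to see this ordering: a plain string sort compares character codes, and uppercase "M" (77) comes before lowercase "a" (97). The Python snippet below is only an illustration of that rule, not a description of how TextPad sorts internally; passing key=str.lower is one way to get a case-insensitive order instead.

```python
# Suffixes taken from the accessions in this thread.
suffixes = ["Saga", "SM143", "Sdys", "Sinf"]

# Default sort compares character codes, so uppercase "M" (77)
# sorts before lowercase "a" (97) and SM143 comes first.
print(sorted(suffixes))  # ['SM143', 'Saga', 'Sdys', 'Sinf']

# Case-insensitive sort: compare lowercased copies instead.
print(sorted(suffixes, key=str.lower))  # ['Saga', 'Sdys', 'Sinf', 'SM143']
```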

>ZP_06199258.SM143
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>NP_687647.Saga
MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>YP_002996415.Sdys
MIEENKQVEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTA
>ABV56794.Sinf
LSTQLDVRVHKNGKIHYQEYRRGHVVADLEVIGDTDKTGTI
>ABV56733.Smit
LSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDKTGTT
>NP_721651.Smut
MTEENKNLDQLAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
>ABV56769.Soli
LSTQLDVRVHKNGKIYYQEYHRGNVVADLEVVGDTDKTGTT
>BAE97486.Sora
GGGYKVSGGLHGVGSSVVNALSTQLDVHVHKNGKIHYQEYR
>ABV56734.Sora
LSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDKTGTT
>ZP_06611811.Sora
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ZP_07458916.Soral
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>ZP_06611811.Soralis
MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>NP_802640.Spyo
MIEENKHFEKKMQEYDASQIQVLEGLEAVRMRPGMYIGSTA
>YP_139939.Sthe
MTDELKNLEEKAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
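If you would rather script this than do it in TextPad, here is a minimal Python sketch of the sort above. It assumes each > header line is followed by its sequence (possibly wrapped over several lines) and that the sort key is the text after the last "." in the header; the function names are hypothetical. Python's sort is stable, so records with the same suffix (such as the three .Sora entries) keep their input order, which is not the order shown in the listing above.

```python
def parse_records(text):
    """Pair each '>' header line with the sequence line(s) below it.

    Blank lines, like the ones between some records in the post, are skipped,
    and a sequence wrapped over several lines is joined back together.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            records.append([line, ""])  # start a new record
        elif records:
            records[-1][1] += line  # append sequence text to the current record
    return [tuple(record) for record in records]


def suffix(header):
    """Return the text after the last '.' in a header, e.g. 'Saga'."""
    return header.rsplit(".", 1)[-1]


def sort_by_suffix(text):
    """Sort records by header suffix and return them as two-line records."""
    # sorted() is stable, so records with equal suffixes keep their input order.
    records = sorted(parse_records(text), key=lambda record: suffix(record[0]))
    return "\n".join(f"{header}\n{sequence}" for header, sequence in records)


sample = ">YP_139939.Sthe\nMTDEL\n>NP_687647.Saga\nMTEET\n\n>ZP_06199258.SM143\nMTEEI\n"
print(sort_by_suffix(sample))
```

To sort a whole file you could call sort_by_suffix(open("seqs.txt").read()), where seqs.txt is a placeholder file name.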

Post by joeytogo »

Thanks,
Yes, this is what I wanted to do. How did you do this?
Find what and replace with what?
Joey Togo

Post by ineuw »

joeytogo wrote:
Thanks,
Yes, this is what I wanted to do. How did you do this?
Find what and replace with what?
Joey Togo
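I turned each header/sequence pair into a single line, relying on the fact that the "." appears only in the > header lines and that every new record begins with a line break followed by ">". I put a tab after the "." and replaced the line break between each header and its sequence with a tab (\t). This created 3 columns, which I sorted by the 2nd column. Then I reversed the replacements to get back the original two-line records. After the first step, the first two records of the sorted list looked like this ({TAB} marks a tab character):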

>ZP_06199258. {TAB} SM143 {TAB} MTEEIKNQQAQDYDASQIQVLEGLEAVRMRPGMYIGSTSKE
>NP_687647. {TAB} Saga {TAB} MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGMYIGSTS
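The same three steps (join each pair into one tab-separated line, sort on the 2nd column, split back) can be sketched with Python's re module. This is an illustration under stated assumptions, not ineuw's actual Find what / Replace with strings, and TextPad's own regular-expression syntax may differ. It assumes each sequence sits on a single line (as in this thread), that every header contains a ".", and that lines end in "\n"; the function names flatten, sort_lines and restore are hypothetical.

```python
import re


def flatten(text):
    """Step 1: one line per record: '>ACC.' TAB suffix TAB sequence."""
    # Drop blank lines between records so every header is followed by its sequence.
    text = re.sub(r"\n\s*\n", "\n", text.strip())
    # Insert a tab after the first '.' on each header line.
    text = re.sub(r"^(>[^.\n]*\.)", r"\1\t", text, flags=re.MULTILINE)
    # A line break NOT followed by '>' separates header and sequence: make it a tab.
    return re.sub(r"\n(?!>)", "\t", text)


def sort_lines(flat):
    """Step 2: sort the flattened lines by their 2nd tab-separated column."""
    lines = flat.split("\n")
    lines.sort(key=lambda line: line.split("\t")[1])
    return "\n".join(lines)


def restore(flat):
    """Step 3: undo step 1, giving two-line records again."""
    # Remove the tab that was inserted after the '.' in each header.
    text = re.sub(r"^(>[^\t\n]*)\t", r"\1", flat, flags=re.MULTILINE)
    # The remaining tab was the header/sequence line break.
    return text.replace("\t", "\n")


sample = ">YP_139939.Sthe\nMTDEL\n>NP_687647.Saga\nMTEET\n\n>ZP_06199258.SM143\nMTEEI\n"
print(restore(sort_lines(flatten(sample))))
```

Keeping the steps separate mirrors the manual workflow: you can inspect the output of flatten, which has the same three-column layout as the example lines above.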