kisterae

Tuesday, April 25, 2006

structural alignment services


1. topofit (example, paper): it seems this one will consider side chain atoms.
2. multiprot (paper): MultiProt finds the common geometrical cores between input molecules.
3. POSA (example; Bioinformatics, 2005 21(10):2362-2369): can tell you the core, and also tell you what's in common and what's different between groups.
4. MASS (example): align by secondary structure
4. SSM (as Dr. Shibata shows you; papers): align by secondary structure

Monday, April 24, 2006

web service for test pattern's accuracy

http://www.cs.nyu.edu/~ysc212/bioinfo/scop/scoptest/scoptest.html

input1: a list of scop ids (ex:88535)
input2: a list of PDB id and chain id (ex:1cd0a)

You'll get the True Positive and False Positive rates.

Sunday, April 23, 2006

Test pattern's accuracy

These are the PDB codes (PDB id plus chain id) hit by our pattern:
1aqkl 1b2wl 1bjma 1bjmb 1cd0a 1cd0b 1fe8l 1fe8m 1fe8n 1fnsl 1hkfa 1jgll 1jhkl 1jvka 1jvkb 1ku4l 1lgva 1lgvb 1lhza 1lhzb 1oakl 1pewa 1pewb 1pw3a 1pw3b 1rzfl 1t04a 1t04c 1t3fa 1tjgl 1tjhl 1tjil 1u8ha 1u8ia 1u8ja 1u8ka 1u8la 1u8ma 1u8na 1u8oa 1u8pa 1u8qa 1u91a 1u92a 1u93a 1u95a 1zvoa 1zvob 2cd0a 2cd0b 2f5al 2f5bl 2fb4l 2fl5a 2fl5c 2fl5e 2fl5l 2ig2l 2loia 2loib 2rhe 3bjla 3bjlb 4bjla 4bjlb 7fabl 8faba 8fabc

These are the SCOP ids of our representatives for motif 3A, family 1.1.1:
88535 88536 88537 88538 88539 88540 88541 88542 48733 89177 89179 49180 49181 81955 49189 63664 74844

the accuracy is 29/39(+29)
the false positive rate is 10/39

The false positives occur only in children of these two nodes:
88533 88531
which are actually very similar to 3A (they have the same core) but have some extra strands compared with 3A.
FYI: 88533 is 2imn and 88531 is 1mqk-l; they both belong to motif 1A.

After adding these two nodes, the false positive rate is 0.
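
The rates above are plain ratios over the classified hits. Here is a minimal sketch of that bookkeeping, assuming each hit has already been mapped to the SCOP node ids on its path. The input mapping, the function name and the toy data are all hypothetical, and the scoptest service may well do this differently.

```python
def pattern_rates(hit_to_scop_path, positive_nodes):
    """Return (true_positive_rate, false_positive_rate) over classified hits.

    hit_to_scop_path: dict mapping a hit id to the set of SCOP node ids on
        its path (hypothetical input; unclassified hits are left out).
    positive_nodes: SCOP ids accepted as belonging to the motif.
    """
    positive_nodes = set(positive_nodes)
    total = len(hit_to_scop_path)
    if total == 0:
        raise ValueError("no classified hits")
    # A hit counts as a true positive if any node on its path is accepted.
    true_pos = sum(1 for path in hit_to_scop_path.values()
                   if positive_nodes & set(path))
    return true_pos / total, (total - true_pos) / total


# Toy data only: these are NOT the real SCOP paths of any chain.
toy_hits = {
    "hitA": {88535},
    "hitB": {88536},
    "hitC": {88533},
    "hitD": {88531},
}
print(pattern_rates(toy_hits, {88535, 88536}))                # (0.5, 0.5)
print(pattern_rates(toy_hits, {88535, 88536, 88533, 88531}))  # (1.0, 0.0)
```

The second call mirrors what happened above: once the two extra nodes are accepted, the false positive rate drops to 0.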

Wednesday, April 19, 2006

testing rmsd

1aqk 10,18,20,22,37,49,65,75,77,87,88,90,106,107,109,110;

1ie5 original
21,31,33,35,47,54,64,72,74,84,85,87,101,102,104,105;
1.56A

moved one residue in the same direction in space (some are already in a loop)
20,30,32,34,46,55,65,71,73,85,86,88,100,101,103,104;
3.85A

moved one residue in the same direction in sequence
20,30,32,34,46,53,63,71,73,83,84,86,100,101,103,104;
4.38A

moved only one strand
21,30,32,34,47,54,64,72,74,84,85,87,101,102,104,105;
2.41A

-------------another case--------------
original 1aqk vs 1kv3
$S1KV3 = select in "1KV3" num 31,39,41,43,59,77,91,102,104,116,117,119,134,135,137,138;
2.14A

shift 1kv3 by 2 residues
$S1KV3 = select in "1KV3" num 33,37,39,41,61,75,89,104,106,114,115,117,136,137,139,140;
2.17A

Shibata's
$S1KV3 = select in "1KV3" num 33,37,39,41,59,76,89,104,106,115,116,118,136,137,139,140;
2.29A
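
For reference, the kind of number reported above (an RMSD in Å between two equally sized residue selections) can be sketched as below. This assumes the RMSD is taken after optimal rigid superposition with the Kabsch algorithm, and that the coordinates of the selected residues (for example the C-alpha atoms) have already been extracted into two N x 3 arrays in matching order. The tool actually used for the figures above may compute it differently, and the function name is my own.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """RMSD (in input units) after optimal rigid superposition (Kabsch)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove the translation by centering both point sets.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip one axis if needed so the result is a proper rotation,
    # not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rot.T - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))


# Toy check: b is a rotated (90 degrees about z) and shifted copy of a,
# so the superposed RMSD should be essentially zero.
a = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
b = [[5, -2, 1], [5, -1, 1], [4, -2, 1], [5, -2, 2]]
print(superposed_rmsd(a, b) < 1e-9)  # True
```

Shifting the selection by one residue, as in the tests above, just means passing a different set of rows for one of the two structures.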

pattern in 3A

sequence for 3A in fasta format

>1
NFMLNQPHSVSESPGKTVTISCTRSSGNIDSNYVQWYQQRPGSAPITVIYEDNQRPSGVPDRFAGSIDRSSNSASLTISGLKTEDEADYYCQSYDARNVVFGGGTRLTVLG
>2
ESVLTQPPSASGTPGQRVTISCTGSATDIGSNSVIWYQQVPGKAPKLLIYYNDLLPSGVSDRFSASKSGTSASLAISGLESEDEADYYCAAWNDSLDEPGFGGGTKLTVLGQPK
>3
NVLTQPPSVSGAPGQRVTISCTGSNSNIGAGFTVHWYQHLPGTAPKLLIFANTNRPSGVPDRFSGSKSGTSASLAITGLQAEDEADYYCQSYDSSLSARFGGGTRLTVLG
>4
ASVLTQPPSVSGAPGQRVTISCTGSSSNIGAGHNVKWYQQLPGTAPKLLIFHNNARFSVSKSGTSATLAITGLQAEDEADYYCQSYDRSLRVFGGGTKLTVLR
>5
TALTQPASVSGSPGQSITVSCTGVSSIVGSYNLVSWYQQHPGKAPKLLTYEVNKRPSGVSDRFSGSKSGNSASLTISGLQAEDEADYYCSSYDGSSTSVVFGGGTKLTVLG
>6
ELTQPPSVSVSPGQTARITCSANALPNQYAYWYQQKPGRAPVMVIYKDTQRPSGIPQRFSSSTSGTTVTLTISGVQAEDEADYYCQAWDNSASIFGGGTKLTV
>7
YELIQPSSASVTVGETVKITCSGDQLPKNFAYWFQQKSDKNILLLIYMDNKRPSGIPERFSGSTSGTTATLTISGAQPEDEAAYYCLSSYGDNNDLVFGSGTQLTVLRGPKSSPKVTVFPPSPEELRTNKATLVCLVNDFYPGSATVTWKANGATINDGVKTTKPSKQGQNYMTSSYLSLTADQWKSHNRVSCQVTHEGETVEKSLSPAECL
>8
TWGVSSPKNVQGLSGSCLLIPCIFSYPADVPVGITAIWYYDYSGKRQVVIHSGDPKLVDKRFRGRAELMGNMDHKVCNLLLKDLKPEDSGTYNFRFEISSNRWLDVKGTTVTVTT
>9
SNRKDYSLTMQSSVTVQEGMCVHVRCSFSYPVDSDTDSDPVHGYWFRAWKAPVATNNPAWAVQEETRDRFHLLGDPQTKNCTLSIRDARMSDAGRYFFRMEKGNIKWNYKYDQLSVNVTALT
>10
SKAQVLQSVAGQTLTVRCQYPPTGSLYEKKGWCKEASALVCIRLVTSSKPRTMAWTSRFTIWDDPDAGFFTVTMTDLREEDSGHYWCRIYRPSDNSVSKSVRFYLVVS
>11
GKDIQVIVNVPPSVRARQSTMNATANLSQSVTLACDADGFPEPTMTWTKDGEPIEQEDNEEKYSFNYDGSELIIKKVDKSDEAEYICIAENKAGEQDATIHLKVFAK
>12
MPVAPYWTSPEKMEKKLHAVPAAKTVKFKCPSSGTPQPTLRWLKNGKEFKPDHRIGGYKVRYATWSIIMDSVVPSDKGNYTCIVENEYGSINHTYQLDVVERSPHRPILQAGLPANKTVALGSNVEFMCKVYSDPQPHIQWLKHIEVNGSKIGPDNLPYVQILKTAGVNTTDKEMEVLHLRNVSFEDAGEYTCLAGNSIGLSHHSAWLTVL
>13
NKRAPYWTNTEKMEKRLHAVPAANTVKFRCPAGGNPMPTMRWLKNGKEFKQEHRIGGYKVRNQHWSLIMESVVPSDKGNYTCVVENEYGSINHTYHLDVVERSPHRPILQAGLPANASDVEFVCKVYSDAQPHIQWIKHVPYLKVLKAAGVNTTDKEIEVLYIRNVTFEDAGEYTCLAGNSIGISFHSAWLTVL
>14
KRAPYWTNTEKEKRLHAVPAANTVKFRCPAGGNP#VALUE!T#VALUE!WLKNGKEFKQEHRIGGYKVRNQHWSLI#VALUE!SVVPSDKGNYTCVVENEYGSINHTYHLDVVERSPHRPILQAGLPANASTVVGGDVEFVCKVYSDAQPHIQWIKHVEKNGSKYGPDGLPYLKVLKHSGINSSNAEVLALFNVTEADAGEYICKVSNYIGQANQSAWLTVLP
>15
GRPFVEMYSEIPEIIHMTEGRELVIPCRVTSPNITVTLKKFPLDTLIPDGKRIIWDSRKGFIISNATYKEIGLLTCEATVNGHLYKTNYLTHRQT





In Pratt, use these parameters:
-PN 60 -PX 10 -FL 100 -FP 100

let it match about 12 sequences

you'll get this pattern:
L-X-[IL]-X(2)-[ALV]-X(2)-[AES]-D-[AE]-[AG]-[ADE]-Y-X-C-X-[ASV]

play with the parameters, you'll get others.
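
To check a pattern like this locally against the FASTA sequences, one option is to translate the PROSITE-style syntax into a regular expression. This is a minimal sketch covering only the elements used in these notes (single residues, X or x wildcards, [..] sets, {..} exclusions and (n) or (n,m) repeats). The function name is mine, and it is not a replacement for ScanProsite.

```python
import re

# One pattern element: a residue, a wildcard, a [..] set or a {..} exclusion,
# optionally followed by a repeat count such as (2) or (6,9).
_ELEMENT = re.compile(
    r"^(?P<core>[A-Za-z]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?$"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().split("-"):
        m = _ELEMENT.match(element)
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core = m.group("core")
        if core in ("x", "X"):
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            core = core.upper()
        lo, hi = m.group("lo"), m.group("hi")
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)


pratt_pattern = "L-X-[IL]-X(2)-[ALV]-X(2)-[AES]-D-[AE]-[AG]-[ADE]-Y-X-C-X-[ASV]"
regex = prosite_to_regex(pratt_pattern)
print(regex)  # L.[IL].{2}[ALV].{2}[AES]D[AE][AG][ADE]Y.C.[ASV]

# Sequence >1 from the FASTA list above.
seq1 = ("NFMLNQPHSVSESPGKTVTISCTRSSGNIDSNYVQWYQQRPGSAPITVIYEDNQRPSGVPDRFAGSIDRS"
        "SNSASLTISGLKTEDEADYYCQSYDARNVVFGGGTRLTVLG")
print(re.search(regex, seq1) is not None)  # True
```

Counting how many of the 15 sequences match is then a simple loop over the FASTA records.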

meeting 060420


1. alignment of all proteins in one motif:
   1. align by sequence
   2. align by structure
   3. align by h-bonds
   4. align by pattern
2. distances between conserved positions
   1. previous
   2. next
3. scan for patterns
   1. scanprosite
4. advantage
   1. supermotif
   2. prove: more conserved positions
5.

6. it depends on what they are doing; what is their database about?
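
Item 2 (distances between conserved positions, to the previous and the next one) can be sketched as below. I am reading "distance" here as separation along the sequence, which is an assumption; a spatial distance would need coordinates instead. The function name is hypothetical, and the positions are the 1aqk list from the RMSD test post.

```python
def neighbor_gaps(positions):
    """For each conserved position, return (position, gap_to_previous, gap_to_next).

    Gaps are differences in residue number; None marks a missing neighbour
    at either end of the list.
    """
    pos = sorted(positions)
    rows = []
    for i, p in enumerate(pos):
        prev_gap = p - pos[i - 1] if i > 0 else None
        next_gap = pos[i + 1] - p if i < len(pos) - 1 else None
        rows.append((p, prev_gap, next_gap))
    return rows


# Conserved positions used for 1aqk in the RMSD test.
aqk = [10, 18, 20, 22, 37, 49, 65, 75, 77, 87, 88, 90, 106, 107, 109, 110]
for row in neighbor_gaps(aqk)[:3]:
    print(row)
# (10, None, 8)
# (18, 8, 2)
# (20, 2, 2)
```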

pattern for 3A from H-bond alignment

After adding the missing K+3 strand,
I got this pattern for 3A:

[AHLMV]-X(6,9)-[AFILV]-x-[FILV]-x-[CL]-X(11,16)-[FGLW]-X(6,17)-[FILNV]-X(8,17)-[AEIKLST]-X(7,11)-[FILV]-x-[DILM]-X(9,11)-[ADEFHLNRT]-[FLY]-x-[CFLM]-X(13,16)-[HKLNQRVY]-[FLTV]-x-[ALRTV]-[FLQSTV]

When run on ScanProsite, I got 169 hits, which is much better than last week,
although there are still some false positives such as 1EIY-B, 1E6E-C, 1JJC-B, etc.
We should have a tool that tests positive rates given a list of PDB IDs.

Tuesday, April 11, 2006

don't combine them

Last time I couldn't find an example.
Actually, I don't suggest combining these two:
I II< III<
I> II< III<
1. Because if one day you see I< II< III<, you won't know which one I II< III< should belong to.
2. Actually, I II<> II> might be combined with I> III II, I> III> II and I> III> II<, but you'll see I II<> III> II< as a different one.

Sunday, April 09, 2006

pdbsum subset of residues which can be aligned by using structure alignment


1 9 8 7 3 4
0' 2 6 5

1aqk
Strand  Start  End  Sheet  No. resid.  Edge  Sequence
1       9      12   A      4           Yes   SVSG
3       36     40   A      5           No    HWYQH
4       47     50   A      4           Yes   KLLI
7       86     93   A      8           No    ADYYCQSY
8       99     101  A      3           Yes   ARF
9       105    109  A      5           No    TRLTV
2       18     23   B      6           Yes   VTISCT
5       64     69   B      6           Yes   FSGSKS
6       72     77   B      6           No    SASLAI

2 10 9 5 6
1 4 3 8 7

1ie5
Strand  Start  End  Sheet  No. resid.  Edge  Sequence
1       10     16   A      7           Yes   VPPSVRA
4       35     40   A      6           Yes   CDADGF
2       20     24   B      5           Yes   TMNAT
5       44     49   B      6           No    TMTWTK
6       52     53   B      2           Yes   EP
9       83     90   B      8           No    AEYICIAE
10      95     105  B      11          No    EQDATIHLKVF
3       29     33   C      5           Yes   QSVTL
7       63     64   C      2           Yes   YS
8       72     77   C      6           No    LIIKKV

0'   1     2    3     4   5   6    7       8    9
LTQ  SVSG  VTI  HWYQ  KL  FS  LAI  ADYYCQSYARF  TRLTV
VRA  TMNA  VTL  TWTK  EP  YS  LII  AEYICIAEEQD  H LKV
1    2     3    5     6   7   8    9       10