kisterae: July 2005

idea

To plot hits on two trees, we can use a microarray-style graph,
e.g. one tree for SCOP and another tree for motifs.
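A minimal Python sketch of the microarray-style layout, assuming hits are available as (SCOP node, motif) pairs. Rows follow one tree's leaf order, columns follow the other's, and each cell counts hits. The input format, `sccs_key` ordering and character shading are illustrative assumptions, not an existing tool.

```python
from collections import Counter

def sccs_key(sccs):
    """Sort key for SCOP sccs strings, so 'b.1.2' comes before 'b.1.10'."""
    return tuple(int(p) if p.isdigit() else p for p in sccs.split("."))

def hit_matrix(hits, row_order, col_order):
    """Count (row, col) hits; row_order/col_order are leaf orders of the two trees."""
    counts = Counter(hits)
    return [[counts[(r, c)] for c in col_order] for r in row_order]

def render(matrix, shades=" .:*#"):
    """Text 'heat map': a darker character means more hits (scaled to the maximum)."""
    top = max((v for row in matrix for v in row), default=0) or 1
    steps = len(shades) - 1
    return "\n".join(
        "".join(shades[v * steps // top] for v in row) for row in matrix
    )

# Toy input for illustration only.
hits = [("b.1.1.1", "2C"), ("b.1.1.1", "2C"), ("b.1.18.2", "3B")]
rows = sorted({r for r, _ in hits}, key=sccs_key)
cols = ["2C", "3B"]
print(render(hit_matrix(hits, rows, cols)))
```

A real plot would draw the two dendrograms along the margins, as in microarray figures; the matrix ordering is the part that ties the cells to the trees.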

software for my algorithm

I might write my algorithm as a Strap plug-in.

h-bond alignment

This service can align protein sequences
using structural information
(alpha-helix, beta-sheet, H-bond and disulphide-bond info):
CE-MC: Multiple Protein Structure Alignment Server

Parallel info is not in one column, so the data cannot be retrieved automatically.
two SM #16
SM#P
wrong SM#23 #19
Parallel info not finished (about 1/2 left).
Since 4, statistics are not accurate.

scop tree structure on yed

It should be possible to modify this to plot our results onto the SCOP tree.
Data are in temp/db/scop/.
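A minimal Python sketch for getting the SCOP tree into yEd, assuming the data in temp/db/scop/ can be reduced to a list of sccs strings (the file format there is not described). It builds parent-to-child edges from the dotted levels and writes plain GraphML, which yEd can open; mapping the per-node `hits` value to colour or size would still be done inside yEd, which I have not verified. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def scop_tree_edges(sccs_ids):
    """Parent -> child edges implied by sccs strings.

    'b.1.1.1' gives b -> b.1 -> b.1.1 -> b.1.1.1 (class, fold, superfamily, family).
    """
    edges = set()
    for sccs in sccs_ids:
        parts = sccs.split(".")
        for depth in range(1, len(parts)):
            edges.add((".".join(parts[:depth]), ".".join(parts[:depth + 1])))
    return sorted(edges)

def to_graphml(edges, hits=None):
    """Plain GraphML with an integer 'hits' attribute on every node."""
    hits = hits or {}
    root = ET.Element("graphml", {"xmlns": "http://graphml.graphdrawing.org/xmlns"})
    ET.SubElement(root, "key", {"id": "hits", "for": "node",
                                "attr.name": "hits", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", {"id": "scop", "edgedefault": "directed"})
    for node_id in sorted({n for edge in edges for n in edge}):
        node = ET.SubElement(graph, "node", {"id": node_id})
        ET.SubElement(node, "data", {"key": "hits"}).text = str(hits.get(node_id, 0))
    for i, (src, dst) in enumerate(edges):
        ET.SubElement(graph, "edge", {"id": f"e{i}", "source": src, "target": dst})
    return ET.tostring(root, encoding="unicode")

# Usage (output file name is a placeholder):
xml_text = to_graphml(scop_tree_edges(["b.1.1.1", "b.1.1.2"]), {"b.1.1.1": 3})
# with open("scop_tree.graphml", "w") as fh:
#     fh.write(xml_text)
```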

pattern for 1.1.1: another approach

It is hard to find a pattern using POSA or SSM,
so I used our own method:
get the interlock strands using getinterlockstr.pl,
retrieve the sequences using getinterlocksq.pl,
align S1 using GeneDoc, and adjust outliers manually by conserved sequences/properties.
Then I got this pattern:

[FILMVY]-[KLPRSTVYG]-[CV]-X(2,30)-{CEIMPVW}-[VW]-{AGHRDNST}-X(20,60)-[FILMVT]-{CFINPSWY}-[FHILM]-X(2,30)-{CFKPQW}-Y-{ACDGKMPRV}-[CLMRVD]

Try these on ScanProsite:

1bww-a
1cd0-a
1h5b-a
1akj-d
1vge-h
1kxq-e
1neu
1pko-a
1hnf
1dr9-a
1ncn-a
1eaj-a
1cdy
1ccz-a
1qfo-a
1nko-a
1f97-a
1hkf-a
1l6z-a
1i8l-c
1jma-a
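To check a pattern locally before (or alongside) ScanProsite, a PROSITE-style pattern can be translated into a Python regular expression. This is a minimal sketch covering only the syntax used in these notes ([..] allowed set, {..} excluded set, X, and repeat counts such as X(2,30) or [ILPRSTY](0,1)). It ignores PROSITE anchors (< and >), and `finditer` reports non-overlapping matches only, so hit counts may differ from ScanProsite. Function names are hypothetical.

```python
import re

# One pattern element: [ABC], {ABC} or a single letter, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(
    r"(\[[A-Za-z]+\]|\{[A-Za-z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into an equivalent Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = m.groups()
        if core in ("X", "x"):
            regex = "."                                # any residue
        elif core.startswith("["):
            regex = "[" + core[1:-1].upper() + "]"     # one of these residues
        elif core.startswith("{"):
            regex = "[^" + core[1:-1].upper() + "]"    # any residue except these
        else:
            regex = core.upper()                       # a single fixed residue
        if low is not None:                            # repeat count, e.g. X(2,30)
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(regex)
    return "".join(parts)

def find_matches(pattern, sequence):
    """Non-overlapping matches as (1-based start, end, matched text)."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in rx.finditer(sequence.upper())]
```

Usage: `find_matches(pattern_1_1_1, sequence)` for each chain sequence in the list above, where the sequences would have to be fetched separately (e.g. from the PDB).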

Next time, I'll try hydrogen-bond alignment.

motif translation graph for 1.1.1

Use structure superposition (POSA)
to find the common part of all motifs in each supermotif (6 main supermotifs),

then to find the common core of all supermotifs.

Sometimes A and B share a common part, B and C share a common part, and C and A share a common part, BUT if you try to find the common part of A, B and C together, you'll fail, because the three have no part in common.
(Just an idea, not sure yet.)
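The worry above is the familiar fact that pairwise overlaps do not guarantee a common intersection. A toy Python illustration, treating each motif as a plain set of hypothetical labels (real structural cores come from superposition, not set membership, so this is only an analogy):

```python
from itertools import combinations

def pairwise_and_common(parts):
    """parts: name -> set. Returns ({(a, b): overlap}, overlap of all parts)."""
    pairwise = {(a, b): parts[a] & parts[b] for a, b in combinations(sorted(parts), 2)}
    common = set.intersection(*parts.values()) if parts else set()
    return pairwise, common

# Hypothetical labels: every pair overlaps, but nothing is shared by all three.
parts = {"A": {"s1", "s2"}, "B": {"s2", "s3"}, "C": {"s3", "s1"}}
pairwise, common = pairwise_and_common(parts)
print(pairwise)  # {('A', 'B'): {'s2'}, ('A', 'C'): {'s1'}, ('B', 'C'): {'s3'}}
print(common)    # set()
```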

Use statistics to find what is common and what differs between and within strands.
Ex:
1265
8734
Is there anything in common between 1-2, 3-4, 5-6, 7-8?
Is there anything in common between 2-6, 3-7?
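One way to make these two questions systematic, assuming each line of the layout lists one sheet's strands in spatial order (so neighbours on a line are adjacent strands): enumerate the neighbour pairs, then split them into sequence-consecutive pairs (like 1-2, 3-4) and the rest (like 2-6, 3-7). A small Python sketch; strand numbers are single digits, as in the notes.

```python
def sheet_neighbours(layout):
    """Adjacent strand pairs within each sheet, each pair sorted (low, high)."""
    pairs = []
    for row in layout:
        strands = [int(c) for c in row]
        pairs += [tuple(sorted(p)) for p in zip(strands, strands[1:])]
    return pairs

def split_by_sequence(pairs):
    """Separate pairs that are consecutive in sequence from all other pairs."""
    consecutive = [p for p in pairs if p[1] - p[0] == 1]
    other = [p for p in pairs if p[1] - p[0] != 1]
    return consecutive, other

consecutive, other = split_by_sequence(sheet_neighbours(["1265", "8734"]))
print(consecutive)  # [(1, 2), (5, 6), (7, 8), (3, 4)]
print(other)        # [(2, 6), (3, 7)]
```

The statistics for each group can then be collected per pair across all proteins of the motif.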

About Scop's New Version

I just found a parseable version of SCOP:
http://scop.mrc-lmb.cam.ac.uk/scop/parse/index.html
For how to use, search and link to it, here are the documents:
http://scop.mrc-lmb.cam.ac.uk/scop/release-notes.html

It seems those 5 proteins are now important to us,
because they ARE re-classified, but they are neither sandwich nor barrel.
BUT it seems the new version introduces about 10,000 new proteins, and also new branches in the SCOP tree.
FYI: the new proteins amount to about 1/5 of the old version.
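A minimal Python sketch for reading the parseable classification file and listing entries that are new relative to an older release. It assumes the tab-separated layout of the `dir.cla` files (domain sid, PDB id, chain/residue range, sccs, px sunid, then a comma-separated `cl=...,cf=...,sf=...,fa=...,dm=...,sp=...,px=...` field, with `#` comment lines); check this against the release notes linked above before relying on it. Note that it counts domains, not proteins. Function names and file paths are hypothetical.

```python
def parse_cla(lines):
    """Yield one dict per domain from dir.cla-style lines (assumed layout)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        sid, pdb, residues, sccs, px, hierarchy = line.rstrip("\n").split("\t")
        ids = {key: int(value) for key, value in
               (item.split("=") for item in hierarchy.split(","))}
        yield {"sid": sid, "pdb": pdb, "residues": residues,
               "sccs": sccs, "px": int(px), **ids}

def new_domains(old_rows, new_rows):
    """Domains (by sid) present in the new release but not in the old one."""
    old_sids = {row["sid"] for row in old_rows}
    return [row for row in new_rows if row["sid"] not in old_sids]

# Usage (file names are placeholders):
# with open("old/dir.cla.txt") as f_old, open("new/dir.cla.txt") as f_new:
#     added = new_domains(list(parse_cla(f_old)), list(parse_cla(f_new)))
#     print(len(added))
```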

PLAN

Draw a graph of the transformations between motifs and between supermotifs.

3H

WEATA
[AIV]-[DEKNTVY]-[FL]-[DHLNPST]-[CFIV]

AVSQFNAR
QFNA=[FGILMV]-[EFHKLRTY]-[LW]-[EKLQRY] in 3B

KIYFDV
[AFGLSW]-[DGHPST]-[IL]-[IKNQRTV]-[FILV]-[HKLPSTY]

SPTIVAM
[AEIKNQST]-[ADG]-[EFILST]-[FGY]-[EIKLNQTY]-[CLMV]-[ADEIKMSTV]

3G

VKVKV

YIELY

FVLKT

GKLYA

Pattern for 3E

i
v-t-l-t-c
compare to 3B:[AIV]-[DEKNTVY]-[FL]-[DHLNPST]-[CFIV]-

i+1
t-v-h-w-v
compare to 3B:[ADSQERV]-[FGILMV]-[EFHKLRTY]-[LW]-[EKLQRY]

k
R-R-L-L-L-R-S-V-Q
compare to 3B:[AFGLSW]-[DGHPST]-[IL]-[IKNQRTV]-[FILV]-[HKLPSTY]-[AHNPQS]-[ALV]-[DKQRST]

k+1
S-G-N-Y-S-C-Y
compare to 3B:[AEIKNQST]-[ADG]-[EFILST]-[FGY]-[EIKLNQTY]-[CLMV]-[ADEIKMSTV]
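To make the "compare to 3B" lines quantitative, one can count, position by position, how many 3E residues fall inside the corresponding 3B residue class. A Python sketch, assuming the 3E segment and the 3B pattern are already aligned one-to-one (no variable X(n,m) gaps); function names are hypothetical.

```python
def compare_positions(segment, pattern):
    """Per-position check of a segment against a fixed-length class pattern.

    Only plain elements are handled ('[ABC]' or a single residue), no X(n,m).
    """
    residues = segment.replace("-", "").upper()
    classes = [e.strip("[]").upper() for e in pattern.strip("-").split("-")]
    if len(residues) != len(classes):
        raise ValueError("segment and pattern have different lengths")
    return [(res, cls, res in cls) for res, cls in zip(residues, classes)]

def score(segment, pattern):
    """Number of positions where the 3E residue falls in the 3B class."""
    return sum(ok for _, _, ok in compare_positions(segment, pattern))

print(score("v-t-l-t-c", "[AIV]-[DEKNTVY]-[FL]-[DHLNPST]-[CFIV]-"))      # 5
print(score("t-v-h-w-v", "[ADSQERV]-[FGILMV]-[EFHKLRTY]-[LW]-[EKLQRY]"))  # 3
```

With this count, i fits the 3B classes at 5 of 5 positions, i+1 at 3 of 5, k at 5 of 9 and k+1 at 4 of 7.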

pattern for 3S*

pattern not in interlock
1:[CA]-[SR]-[IVL]-[ASVIW]-[YCA]

pattern not in interlock
2:[ISVN]-[VMLWY]-[TLW]-[VF]-[DG]

pattern not in interlock
3:[KR]-[GH]-[VF]-[QRH]-[LI]-[EYLF]

i
[VLA]-[WFYL]-[VALI]-[REN]-[CD]-[LIH]-S

i+1
[DTG]-[HSN]-[ASG]-[VIT]-[FW]-[VL]-[QN]-[SGT]

k
[YNSR]-[ILNR]-[KQP]-[VIL]-[FLN]-[DNS]

k+1
[RE]-[MI]-[STA]-[FVL]-[VGS]

pattern not in interlock
4:[WSV]-[ILF]-[EV]-[ILF]-[HEF]-[LID]

1,2,3,4 form another lock, which seems different from our interlock.

3A not including 1kv3

i
[VIAL]-[TRLHEV]-[IVLF]-[STPRAMV]-C-[TSIQDKR]-[GAFYV]

i+1
[IHKSYGTQ]-[WL]-[YFCTLIK]-[QYRK]-[QHDAEF]

k
[AVCFSEK]-[STNEVG]-[LVF]-[ATLSIHY]-[IML]-[STKRF]-[GDNK]-[LVA]-[EQKRDST]

k+1
[DE]-[ESAI]-[AG]-[DTRHEL]-[YL]-[YNFWIT]-[FC]-[AQSRILKE]

pattern not in interlock
[GQRTSN]-[TLFIAY]-[KRTSYHWL]-[LVT]-[TNVKH]

pattern for 3B

i, i+1
[AIV]-[DEKNTVY]-[FL]-[DHLNPST]-[CFIV]-X(8,19)-[FGILMV]-[EFHKLRTY]-[LW]-[EKLQRY]

k
[ADNPS]-[AFGLSW]-[DGHPST]-[IL]-[IKNQRTV]-[FILV]-[HKLPSTY]-[AHNPQS]-[ALV]-[DKQRST]-[ILPRSTY](0,1)-[DEKQS]-[DFN]
k+1
[AEIKNQST]-[ADG]-[EFILST]-[FGY]-[EIKLNQTY]-[CLMV]-[ADEIKMSTV]

PS: [ILPRSTY](0,1)-[DEKQS]-[DFN] in k is in the loop.

why last strand close to first strand

Maybe it's because sandwiches are similar to barrels, but a little bit different.

check

check 69208
Why does strand 5 connect to 1?

PLAN 070105

check if the antiparallel strands are inside the interlock
what non-standard interlocks we have (one parallel, two parallel in interlock, etc.)
supermotifs? 6 SM->90%
How many SMs?
What transformations lead from non-popular to popular SMs?
check if parallel info is consistent within the same motif
calculate how many "?", check them in CATH
update link for 98235, 63700, 49767
check new structures in new SCOP (about 5 structures)
check if motifs are duplicated.

pattern for 1A

strand i,i+1
[VALM]-[TSKRQED]-[VILM]-X-C-X(8,26)-[LWMIVC]-X-W-[YLVFI]
strand k, k+1
[YFALVGI]-X-[FILM]-X-[FILM]-X(8)-[AGS]-X-Y-[YSTFH]-C

I've tried...

This can locate two strands in a sandwich, sometimes i,k and sometimes i,k+1 (compare 1im3 and 1bww):
[VATYFILM]-X-[CFWP]-X-[VATYFILM]

Because of 1eeq, I extended the pattern to:
[VACTYFILM]-X-[CFIWP]-X-[VACTYFILM]

Because of 1c5c:
[VACTYFILM]-X-[VACTYFILMW]-X-[WLCTGQ]
[VACTYFILM]-X-[CFIWP]-X-[VACTYFILM]

It seems that if these two patterns cover i, i+1, k, k+1, then...
[VACTYFILM]-X-[CFIWP]-X-[VACTYFILM]-x(1,5)-[VACTYFILM]-X-[VACTYFILMW]-X-[WLCTGQ]
[VACTYFILM]-X-[VACTYFILMW]-X-[WLCTGQ]-x(1,5)-[VACTYFILM]-X-[CFIWP]-X-[VACTYFILM]

Because of 1iak,
I extended the pattern to:
[VACTYFILM]-X-[CFIWP]-X-[VACTYFILM]-x(2,10)-[VACTYFILM]-X-[VACTYFILMW]-X-[WLCTGQ]
[VACTYFILM]-X-[VACTYFILMW]-X-[WLCTGQ]-x(2,10)-[VACTYFILM]-X-[CFIWP]-X-[VACTYFILM]
[WLCTGQ]-X-[VACTYFILM]-X-[VACTYFILMW]-x(2,10)-[VACTYFILM]-X-[CFIWP]-X-[VACTYFILM]
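The first two variants above follow a simple recipe: two five-position strand patterns joined by a variable gap, tried in both orders. A Python sketch that generates both orders for a given gap; it reproduces the x(1,5) pair and the first two x(2,10) patterns listed above. The third x(2,10) variant, which starts with [WLCTGQ], is not a plain reversal of either pattern, so it is not generated here. The function name is hypothetical.

```python
def both_orders(p, q, gap):
    """Return p-gap-q and q-gap-p as PROSITE-style pattern strings."""
    return [f"{p}-{gap}-{q}", f"{q}-{gap}-{p}"]

P = "[VACTYFILM]-X-[CFIWP]-X-[VACTYFILM]"
Q = "[VACTYFILM]-X-[VACTYFILMW]-X-[WLCTGQ]"
for pattern in both_orders(P, Q, "x(2,10)"):
    print(pattern)
```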

[VACTYFILMWQ]-X-[CFIWP]-X-[VACTYFILMWQ]

VATYFILM

This one can locate s2-s7 for 1im3:
{KREQDNS}-X-X-{KREQDNS}-X(10,35)-[KREQDNS]-X-X-[YFW]-X(1,5)-[LVAT]-X-[LVFIM]-x(1,25)-[YVFIL]-X-[CFWP]-X-[VAM]

Then I combined the last five positions into:
[VATYFILM]-X-[CFWP]-X-[VATYFILM]
which can locate two of the four strands in the interlock.
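Combining positions like this amounts to taking the union of their residue classes. As an observation (the notes do not say which classes were merged): the union of [YVFIL] and [VAM], the outer classes of the last five positions, lacks T, while adding the strand classes [LVAT] and [LVFIM] from the same 1im3 pattern gives exactly the letters of VATYFILM. A Python sketch of the union step; the function name is hypothetical.

```python
def union_class(*classes):
    """Merge residue classes such as '[LVAT]' into one class (set union)."""
    letters = set()
    for cls in classes:
        letters.update(cls.strip("[]").upper())
    return "[" + "".join(sorted(letters)) + "]"

print(union_class("[YVFIL]", "[VAM]"))                      # [AFILMVY]
print(union_class("[LVAT]", "[LVFIM]", "[YVFIL]", "[VAM]"))  # [AFILMTVY]
```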

2A

P-X-[VLITAM]-X-[ILPTAV]-X(0,16)-[VLI]-X-C-X-[LMIAV]-X-X-[FILY]-X-P-X-X-[IAVL]-X-[VFTLIM]-X-[WLMF]-X(0,25)-[DNGST]-X-X-[YFW]-X(1,5)-[LVAT]-X-[LVFIM]-X(1,25)-[YVFIL]-X-C-X-[VAM]-X-[FH]-X(1,27)-[FLVIWAY]

pattern for 2F

P-X-[VLITAM]-X-[ILPTAV]-X(0,16)-[VLI]-X-C-X-[LMIAV]-X-X-[FILY]-X-P-X-X-[IAVL]-X-[VFTLIM]-X-[WLMF]-X(0,25)-[DNGST]-X-X-[YFW]-X(1,5)-[LVAT]-X-[LVFIM]-X(1,25)-[YVFIL]-X-C-X-[VAM]-X-[FH]-X(1,27)-[FLVIWAY]

It has to be shortened too.
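Before shortening, it helps to know how long a stretch of sequence the pattern can cover. A Python sketch that adds up the minimum and maximum residue count of each element (a plain element counts 1, X(a,b) counts a to b, X(n) counts n), assuming the simple PROSITE subset used in these notes. By this count the 2A/2F pattern spans between 40 and 135 residues; the 2C pattern, with its narrower gaps, spans 84 to 125.

```python
import re

# One pattern element: [ABC], {ABC} or a single letter, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Za-z]+\]|\{[A-Za-z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")

def pattern_span(pattern):
    """(min, max) number of residues a PROSITE-style pattern can match."""
    low_total = high_total = 0
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        _, low, high = m.groups()
        low = int(low) if low is not None else 1       # plain element: one residue
        high = int(high) if high is not None else low  # X(n): exactly n residues
        low_total += low
        high_total += high
    return low_total, high_total

pattern_2a = ("P-X-[VLITAM]-X-[ILPTAV]-X(0,16)-[VLI]-X-C-X-[LMIAV]-X-X-[FILY]-X-P-X-X-"
              "[IAVL]-X-[VFTLIM]-X-[WLMF]-X(0,25)-[DNGST]-X-X-[YFW]-X(1,5)-[LVAT]-X-"
              "[LVFIM]-X(1,25)-[YVFIL]-X-C-X-[VAM]-X-[FH]-X(1,27)-[FLVIWAY]")
print(pattern_span(pattern_2a))  # (40, 135)
```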

pattern for 2D

2D
(1 2) 5
6 (3 4)
It's like deleting 5 and 8 in 2C:
(1 2) (6 5)
(8 7) (3 4)
So I modified the pattern to:
P-X-[VLITAM]-X-[ILPTAV]-X(0,16)-[VLI]-X-C-X-[LMIAV]-X-X-[FILY]-X-P-X-X-[IAVL]-X-[VFTLIM]-X-[WLMF]-X(0,25)-[DNGST]-X-X-[YFW]-X(5)-[LVAT]-X-[LVFIM]-X(0,15)-[YVFIL]-X-C-X-[VAM]-X-[FH]-X(0,27)-[FLVIWAY]
They all contain this pattern.
PS: I only shortened the X lengths in the pattern for 2C.
I think I can confirm this by using SSE on 2C+2D.
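The 2C-to-2D relation can be checked mechanically: drop strands from the 2C layout and renumber the remaining strands in sequence order. A Python sketch, assuming each sheet is listed as a row of strand numbers in spatial order and that renumbering simply closes the gaps left by deleted strands (function name hypothetical). Deleting 5 and 8 gives the 2D layout (1 2) 5 / 6 (3 4), and deleting strand 4 gives the 1254 / 763 layout noted for 2E.

```python
def delete_strands(layout, removed):
    """Remove strands from a sheet layout and renumber the rest 1..n in sequence order."""
    kept = [[s for s in sheet if s not in removed] for sheet in layout]
    order = sorted(s for sheet in kept for s in sheet)
    renumber = {old: new for new, old in enumerate(order, start=1)}
    return [[renumber[s] for s in sheet] for sheet in kept]

MOTIF_2C = [[1, 2, 6, 5], [8, 7, 3, 4]]   # (1 2) (6 5) / (8 7) (3 4)
print(delete_strands(MOTIF_2C, {5, 8}))   # [[1, 2, 5], [6, 3, 4]]
print(delete_strands(MOTIF_2C, {4}))      # [[1, 2, 5, 4], [7, 6, 3]]
```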

pattern for 2E

Proteins in 2E also contain the pattern:
P-X-[VLITAM]-X-[ILPTAV]-X(10,16)-[VLI]-X-C-X-[LMIAV]-X-X-[FILY]-X-P-X-X-[IAVL]-X-[VFTLIM]-X-[WLMF]-X(17,25)-[DNGST]-X-X-[YFW]-X(5)-[LVAT]-X-[LVFIM]-X(7,15)-[YVFIL]-X-C-X-[VAM]-X-[FH]-X(8,27)-[FLVIWAY]

It seems like 2C => 2E:
1265
8734

1254
763

2E only deletes strand 4 from 2C, and it's not an important strand in the sandwich.

PS: proteins in 2C but in superfamily 2 don't contain that pattern.

pattern for motif 2C

I used SSE to align the 2C proteins in our statistics file,
loaded the FASTA-format result into GeneDoc,
then used the physicochemical (H) view to get the alignment.
After discarding 1dr9 (meaning I only used fold1-superfamily1-family2),
I got the following pattern:

P-X-[VLITAM]-X-[ILPTAV]-X(10,16)-[VLI]-X-C-X-[LMIAV]-X-X-[FILY]-X-P-X-X-[IAVL]-X-[VFTLIM]-X-[WLMF]-X(17,25)-[DNGST]-X-X-[YFW]-X(5)-[LVAT]-X-[LVFIM]-X(7,15)-[YVFIL]-X-C-X-[VAM]-X-[FH]-X(8,27)-[FLVIWAY]

I used ScanProsite to search for the pattern,

and used SuperFamily to check whether the results are sandwich-like proteins (click Immunoglobulin in the Superfamily section).

It seems that all proteins matching this pattern are sandwich-like proteins.
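The GeneDoc step above is manual; a rough automatic counterpart is to turn each column of a gap-free aligned block into a residue class, and each variable-length stretch between blocks into an X(min,max) gap from the observed lengths. A Python sketch under those assumptions; it does not reproduce GeneDoc's physicochemical grouping or the manual removal of outliers such as 1dr9, and the example sequences are made up. Function names are hypothetical.

```python
def column_class(column):
    """One alignment column -> single residue if conserved, else [class]; gaps ignored."""
    residues = sorted(set(column.upper()) - {"-", "."})
    if not residues:
        return "X"
    if len(residues) == 1:
        return residues[0]
    return "[" + "".join(residues) + "]"

def block_pattern(aligned):
    """Equal-length aligned strings (one conserved block) -> PROSITE-style pattern."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "-".join(column_class("".join(col)) for col in zip(*aligned))

def gap_element(lengths):
    """Observed loop lengths between two blocks -> X, X(n) or X(min,max)."""
    low, high = min(lengths), max(lengths)
    if low == high:
        return "X" if low == 1 else f"X({low})"
    return f"X({low},{high})"

print(block_pattern(["VTLTC", "ATFSC", "IDLNV"]))  # [AIV]-[DT]-[FL]-[NST]-[CV]
print(gap_element([10, 16, 12]))                    # X(10,16)
```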

kisterae

Sunday, July 31, 2005

idea

Saturday, July 30, 2005

software for my algorithm

h-bond alignment

Friday, July 29, 2005

Thursday, July 28, 2005

scop tree structure on yed

pattern for 1.1.1: another approach

Wednesday, July 27, 2005

motif translation graph for 1.1.1

Wednesday, July 20, 2005

Friday, July 15, 2005