
My name is 江奕賢

Wednesday, July 19, 2006

predicting interlock strands using hydrophobicity data

After today's meeting, I think data mining can help with this idea.
This way we can reach roughly 86%~94% accuracy just by looking at a single strand's hydrophobicity data (the 94% figure comes from NBTree, evaluated on its own training data).
(Of course, we can imagine that adding other information, such as beta-turn values and consecutive-strand info, might give a higher accuracy than this.)
Here are the results:

Relation: hydrotest3-weka.filters.unsupervised.attribute.Discretize-D-B2-M-1.0-R3,18-weka.filters.unsupervised.attribute.Remove-R1,7-8,10-17,19-22
Instances: 51
Attributes: 7
Grand average of hydropathicity
H
seqlen
freq
count
max
interlock

weka.classifiers.rules.Ridor -F 4 -S 10 -N 2.0
score 88.2%

interlock = '(-inf-0.5]' (51.0/33.0)
Except (max > 2.5) and (seqlen > 7.5) => interlock = '(0.5-inf)' (12.0/0.0) [3.0/0.0]
Except (max > 2.5) and (freq <= 0.380952) => interlock = '(0.5-inf)' (10.0/1.0) [3.0/0.0]
Except (freq > 0.154762) and (Grand average of hydropathicity <= 0.585714) => interlock = '(0.5-inf)' (8.0/4.0) [1.0/0.0]
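
To make the Ridor output above easier to read, here is a minimal sketch that applies the default rule and its three exceptions to one strand. It assumes a flat reading of the printed rules (the default class holds unless any one exception matches); Ridor can nest exceptions, so this is only an illustration. The function and parameter names are my own, with `gravy` standing for "Grand average of hydropathicity".

```python
NO, YES = "(-inf-0.5]", "(0.5-inf)"  # the two discretized interlock classes


def ridor_predict(max_value, seqlen, freq, gravy):
    """Flat reading of the first Ridor rule set (assumption: no nested exceptions)."""
    if max_value > 2.5 and seqlen > 7.5:
        return YES
    if max_value > 2.5 and freq <= 0.380952:
        return YES
    if freq > 0.154762 and gravy <= 0.585714:
        return YES
    return NO  # default rule: interlock = '(-inf-0.5]'


# Hypothetical strands, not taken from the real data set.
print(ridor_predict(max_value=3, seqlen=8, freq=0.5, gravy=1.0))
print(ridor_predict(max_value=2, seqlen=5, freq=0.1, gravy=1.0))
```

The first call matches the first exception, and the second call falls through to the default class.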

weka.classifiers.rules.Ridor -F 3 -S 1 -N 2.0
score 86.2745 %

interlock = '(-inf-0.5]' (51.0/33.0)
Except (max > 2.5) => interlock = '(0.5-inf)' (20.0/0.0) [10.0/2.0]
Except (Grand average of hydropathicity <= 0.535714) and (freq > 0.154762) => interlock = '(0.5-inf)' (6.0/3.0) [2.0/1.0]

weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
score 86.2745 %

(max <= 2) => interlock='(-inf-0.5]' (21.0/5.0)
=> interlock='(0.5-inf)' (30.0/2.0)

weka.classifiers.rules.DecisionTable -X 1 -S 5 -R
score 86.2745 %
Rules:
==================================
max             interlock
==================================
'(-inf-2.5]'    '(-inf-0.5]'
'(2.5-inf)'     '(0.5-inf)'
==================================
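
JRip and DecisionTable both reduce to a single cut on `max`, with slightly different thresholds (2 versus 2.5). The sketch below writes out both rules and checks where they agree. It assumes `max` only takes whole-number values, which the output does not confirm; the function names are hypothetical.

```python
NO, YES = "(-inf-0.5]", "(0.5-inf)"  # the two discretized interlock classes


def jrip_rule(max_value):
    """JRip: (max <= 2) => '(-inf-0.5]', otherwise '(0.5-inf)'."""
    return NO if max_value <= 2 else YES


def decision_table_rule(max_value):
    """DecisionTable: max in '(-inf-2.5]' => '(-inf-0.5]', otherwise '(0.5-inf)'."""
    return NO if max_value <= 2.5 else YES


# If max is integer-valued (an assumption), the two rules never disagree.
agree_on_integers = all(jrip_rule(m) == decision_table_rule(m) for m in range(0, 11))
print(agree_on_integers)

# A non-integer value between the two thresholds is where they would differ.
print(jrip_rule(2.3), decision_table_rule(2.3))
```

If the two rules really are equivalent on this data, that would explain why both classifiers report the same 86.2745 % score.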

=== Run information ===

Scheme: weka.classifiers.trees.NBTree
Relation: hydrotest3-weka.filters.unsupervised.attribute.Discretize-D-B2-M-1.0-R3,18
Instances: 51
Attributes: 22
str#
Grand average of hydropathicity
H
seqlen
freq
count
odd
even
max
min
mpm
mmm
odl
edl
madl
midl
mmdl
interlock
id
maxs
mpms
mmms
Test mode: evaluate on training data

=== Classifier model (full training set) ===

NBTree
------------------

freq <= 0.211111: NB 1
freq > 0.211111
| freq <= 0.366667: NB 3
| freq > 0.366667: NB 4

Leaf number: 1 Naive Bayes Classifier

Class '(-inf-0.5]': Prior probability = 0.71

str#: Discrete Estimator. Counts = 15 (Total = 15)
Grand average of hydropathicity: Discrete Estimator. Counts = 15 (Total = 15)
H: Discrete Estimator. Counts = 7 9 (Total = 16)
seqlen: Discrete Estimator. Counts = 15 (Total = 15)
freq: Discrete Estimator. Counts = 15 (Total = 15)
count: Discrete Estimator. Counts = 15 (Total = 15)
odd: Discrete Estimator. Counts = 15 (Total = 15)
even: Discrete Estimator. Counts = 15 (Total = 15)
max: Discrete Estimator. Counts = 15 (Total = 15)
min: Discrete Estimator. Counts = 15 (Total = 15)
mpm: Discrete Estimator. Counts = 15 (Total = 15)
mmm: Discrete Estimator. Counts = 15 (Total = 15)
odl: Discrete Estimator. Counts = 15 (Total = 15)
edl: Discrete Estimator. Counts = 15 (Total = 15)
madl: Discrete Estimator. Counts = 15 (Total = 15)
midl: Discrete Estimator. Counts = 15 (Total = 15)
mmdl: Discrete Estimator. Counts = 15 (Total = 15)
id: Discrete Estimator. Counts = 2 4 3 2 2 3 2 2 3 (Total = 23)
maxs: Discrete Estimator. Counts = 15 (Total = 15)
mpms: Discrete Estimator. Counts = 15 (Total = 15)
mmms: Discrete Estimator. Counts = 1 15 (Total = 16)


Class '(0.5-inf)': Prior probability = 0.29

str#: Discrete Estimator. Counts = 6 (Total = 6)
Grand average of hydropathicity: Discrete Estimator. Counts = 6 (Total = 6)
H: Discrete Estimator. Counts = 1 6 (Total = 7)
seqlen: Discrete Estimator. Counts = 6 (Total = 6)
freq: Discrete Estimator. Counts = 6 (Total = 6)
count: Discrete Estimator. Counts = 6 (Total = 6)
odd: Discrete Estimator. Counts = 6 (Total = 6)
even: Discrete Estimator. Counts = 6 (Total = 6)
max: Discrete Estimator. Counts = 6 (Total = 6)
min: Discrete Estimator. Counts = 6 (Total = 6)
mpm: Discrete Estimator. Counts = 6 (Total = 6)
mmm: Discrete Estimator. Counts = 6 (Total = 6)
odl: Discrete Estimator. Counts = 6 (Total = 6)
edl: Discrete Estimator. Counts = 6 (Total = 6)
madl: Discrete Estimator. Counts = 6 (Total = 6)
midl: Discrete Estimator. Counts = 6 (Total = 6)
mmdl: Discrete Estimator. Counts = 6 (Total = 6)
id: Discrete Estimator. Counts = 2 1 1 1 2 2 2 1 2 (Total = 14)
maxs: Discrete Estimator. Counts = 6 (Total = 6)
mpms: Discrete Estimator. Counts = 6 (Total = 6)
mmms: Discrete Estimator. Counts = 4 3 (Total = 7)


Leaf number: 3 Naive Bayes Classifier

Class '(-inf-0.5]': Prior probability = 0.08

str#: Discrete Estimator. Counts = 2 (Total = 2)
Grand average of hydropathicity: Discrete Estimator. Counts = 2 (Total = 2)
H: Discrete Estimator. Counts = 1 2 (Total = 3)
seqlen: Discrete Estimator. Counts = 2 (Total = 2)
freq: Discrete Estimator. Counts = 2 (Total = 2)
count: Discrete Estimator. Counts = 2 (Total = 2)
odd: Discrete Estimator. Counts = 2 (Total = 2)
even: Discrete Estimator. Counts = 2 (Total = 2)
max: Discrete Estimator. Counts = 2 (Total = 2)
min: Discrete Estimator. Counts = 2 (Total = 2)
mpm: Discrete Estimator. Counts = 2 (Total = 2)
mmm: Discrete Estimator. Counts = 2 (Total = 2)
odl: Discrete Estimator. Counts = 2 (Total = 2)
edl: Discrete Estimator. Counts = 2 (Total = 2)
madl: Discrete Estimator. Counts = 2 (Total = 2)
midl: Discrete Estimator. Counts = 2 (Total = 2)
mmdl: Discrete Estimator. Counts = 2 (Total = 2)
id: Discrete Estimator. Counts = 1 1 1 2 1 1 1 1 1 (Total = 10)
maxs: Discrete Estimator. Counts = 2 (Total = 2)
mpms: Discrete Estimator. Counts = 2 (Total = 2)
mmms: Discrete Estimator. Counts = 2 (Total = 2)


Class '(0.5-inf)': Prior probability = 0.92

str#: Discrete Estimator. Counts = 23 (Total = 23)
Grand average of hydropathicity: Discrete Estimator. Counts = 23 (Total = 23)
H: Discrete Estimator. Counts = 1 23 (Total = 24)
seqlen: Discrete Estimator. Counts = 23 (Total = 23)
freq: Discrete Estimator. Counts = 23 (Total = 23)
count: Discrete Estimator. Counts = 23 (Total = 23)
odd: Discrete Estimator. Counts = 23 (Total = 23)
even: Discrete Estimator. Counts = 23 (Total = 23)
max: Discrete Estimator. Counts = 23 (Total = 23)
min: Discrete Estimator. Counts = 23 (Total = 23)
mpm: Discrete Estimator. Counts = 23 (Total = 23)
mmm: Discrete Estimator. Counts = 23 (Total = 23)
odl: Discrete Estimator. Counts = 23 (Total = 23)
edl: Discrete Estimator. Counts = 23 (Total = 23)
madl: Discrete Estimator. Counts = 23 (Total = 23)
midl: Discrete Estimator. Counts = 23 (Total = 23)
mmdl: Discrete Estimator. Counts = 23 (Total = 23)
id: Discrete Estimator. Counts = 3 4 4 2 3 3 4 4 4 (Total = 31)
maxs: Discrete Estimator. Counts = 23 (Total = 23)
mpms: Discrete Estimator. Counts = 23 (Total = 23)
mmms: Discrete Estimator. Counts = 23 (Total = 23)


Leaf number: 4 Naive Bayes Classifier

Class '(-inf-0.5]': Prior probability = 0.36

str#: Discrete Estimator. Counts = 4 (Total = 4)
Grand average of hydropathicity: Discrete Estimator. Counts = 1 4 (Total = 5)
H: Discrete Estimator. Counts = 1 4 (Total = 5)
seqlen: Discrete Estimator. Counts = 4 (Total = 4)
freq: Discrete Estimator. Counts = 4 (Total = 4)
count: Discrete Estimator. Counts = 4 (Total = 4)
odd: Discrete Estimator. Counts = 4 (Total = 4)
even: Discrete Estimator. Counts = 4 (Total = 4)
max: Discrete Estimator. Counts = 4 (Total = 4)
min: Discrete Estimator. Counts = 4 (Total = 4)
mpm: Discrete Estimator. Counts = 4 (Total = 4)
mmm: Discrete Estimator. Counts = 4 (Total = 4)
odl: Discrete Estimator. Counts = 4 (Total = 4)
edl: Discrete Estimator. Counts = 4 (Total = 4)
madl: Discrete Estimator. Counts = 4 (Total = 4)
midl: Discrete Estimator. Counts = 4 (Total = 4)
mmdl: Discrete Estimator. Counts = 4 (Total = 4)
id: Discrete Estimator. Counts = 2 1 1 2 1 1 1 2 1 (Total = 12)
maxs: Discrete Estimator. Counts = 4 (Total = 4)
mpms: Discrete Estimator. Counts = 4 (Total = 4)
mmms: Discrete Estimator. Counts = 4 (Total = 4)


Class '(0.5-inf)': Prior probability = 0.64

str#: Discrete Estimator. Counts = 7 (Total = 7)
Grand average of hydropathicity: Discrete Estimator. Counts = 7 1 (Total = 8)
H: Discrete Estimator. Counts = 1 7 (Total = 8)
seqlen: Discrete Estimator. Counts = 7 (Total = 7)
freq: Discrete Estimator. Counts = 7 (Total = 7)
count: Discrete Estimator. Counts = 7 (Total = 7)
odd: Discrete Estimator. Counts = 7 (Total = 7)
even: Discrete Estimator. Counts = 7 (Total = 7)
max: Discrete Estimator. Counts = 7 (Total = 7)
min: Discrete Estimator. Counts = 7 (Total = 7)
mpm: Discrete Estimator. Counts = 7 (Total = 7)
mmm: Discrete Estimator. Counts = 7 (Total = 7)
odl: Discrete Estimator. Counts = 7 (Total = 7)
edl: Discrete Estimator. Counts = 7 (Total = 7)
madl: Discrete Estimator. Counts = 7 (Total = 7)
midl: Discrete Estimator. Counts = 7 (Total = 7)
mmdl: Discrete Estimator. Counts = 7 (Total = 7)
id: Discrete Estimator. Counts = 2 2 2 2 1 2 1 2 1 (Total = 15)
maxs: Discrete Estimator. Counts = 7 (Total = 7)
mpms: Discrete Estimator. Counts = 7 (Total = 7)
mmms: Discrete Estimator. Counts = 7 (Total = 7)



Number of Leaves : 3

Size of the tree : 5
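
The prior probabilities printed for each leaf can be reproduced from the class counts shown on the `str#` lines. The sketch below assumes those counts already include a Laplace correction of +1 per class, which is my reading of the output rather than something it states; under that assumption the raw instance counts in the three leaves add up to the 51 instances in the data set.

```python
# Class counts as printed per leaf: ('(-inf-0.5]', '(0.5-inf)').
leaf_counts = {1: (15, 6), 3: (2, 23), 4: (4, 7)}


def priors(counts):
    """Normalize the printed counts into class prior probabilities."""
    total = sum(counts)
    return tuple(round(c / total, 2) for c in counts)


leaf_priors = {leaf: priors(counts) for leaf, counts in leaf_counts.items()}

# Assumption: each printed count is (raw instances + 1), so subtract 1 per class.
raw_instances = sum(c - 1 for counts in leaf_counts.values() for c in counts)

for leaf, p in leaf_priors.items():
    print(f"leaf {leaf}: priors {p}")
print("raw instances across leaves:", raw_instances)
```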


Time taken to build model: 0.61 seconds

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances          48               94.1176 %
Incorrectly Classified Instances         3                5.8824 %
Kappa statistic                          0.8728
Mean absolute error                      0.1352
Root mean squared error                  0.2139
Relative absolute error                 29.5009 %
Root relative squared error             44.7645 %
Total Number of Instances               51

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.944     0.061     0.895       0.944    0.919       '(-inf-0.5]'
0.939     0.056     0.969       0.939    0.954       '(0.5-inf)'

=== Confusion Matrix ===

  a  b   <-- classified as
 17  1 |  a = '(-inf-0.5]'
  2 31 |  b = '(0.5-inf)'
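
As a sanity check, the summary numbers above can be recomputed from the confusion matrix alone. This is a small sketch using the standard definitions of accuracy, Cohen's kappa, precision, recall and F-measure; the function name is my own.

```python
def summarize(matrix):
    """Compute accuracy, Cohen's kappa and per-class metrics from a confusion matrix.

    Rows are the true classes, columns are the predicted classes.
    """
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    accuracy = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    per_class = []
    for i in range(len(matrix)):
        recall = matrix[i][i] / row_totals[i]
        precision = matrix[i][i] / col_totals[i]
        f_measure = 2 * precision * recall / (precision + recall)
        per_class.append((precision, recall, f_measure))
    return accuracy, kappa, per_class


confusion = [[17, 1],   # a = '(-inf-0.5]'
             [2, 31]]   # b = '(0.5-inf)'
accuracy, kappa, per_class = summarize(confusion)
print(f"accuracy = {accuracy * 100:.4f} %, kappa = {kappa:.4f}")
for precision, recall, f_measure in per_class:
    print(f"precision = {precision:.3f}, recall = {recall:.3f}, F = {f_measure:.3f}")
```

Keep in mind that this run was evaluated on the training data, so these figures are likely optimistic compared with cross-validation.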
