Machine learning framework #8
base: ml
Conversation
@@ -196,6 +198,7 @@ func PreConfig() {
	flag.BoolVar(&config.DEBUG, "d", config.DEBUG, "debug=true|false")
	flag.BoolVar(&config.ValidateConfig, "validateconfig", config.ValidateConfig, "true|false. Error out on invalid config if true, else Reset to default sane values.")
	flag.IntVar(&config.srand, "srand", config.srand, "random seed, use 0 (zero) for random seed selection")
	flag.StringVar(&config.learner, "learner", config.learner, "Macine learning module to be used for the model")
s/Macine learning module/ML algorithm/
@@ -175,7 +184,8 @@ func (l *LatencyParabola) Latency(params DiskLatencyParams) time.Duration {

func (d *Disk) scheduleWrite(sizebytes int) time.Duration {
	at := sizeToDuration(sizebytes, "B", int64(d.MBps), "MB")
It probably makes sense to subclass the Disk type for the mb model and future models. Older models could then keep using the basic Disk with its basic lastIOdone, which would be safer, regression-wise. The changes in scheduleWrite() and the need to maintain an array of latencies would warrant this.
The only additional change that would probably still make sense, on top of the above, is to convert the disk.lastIOdone member variable into a disk.lastIOdone() method, overloaded in the Disk implementation used by the mb model. This method would always return the properly recomputed latency based on the disk's current state. You then may not need any disk-related logic inside NowIsDone(), which would be much cleaner.
	log(LogVVV, fmt.Sprintf("LinearRegression.Fit() failed: %v", err))
	return nil, err
}
Once fitting is done (via lr.Fit() above), we effectively have an a*x^2 + b*x + c polynomial expression, with a, b, and c computed by the golearn framework. There are then two types of error to look at: the delta between the computed polynomial and the real throughput values for the training set, and the same delta for the testing set, which must be kept separate from training.
Both deltas must be fairly close to zero for us to believe that we can safely use this a*x^2 + b*x + c expression to calculate the maximum throughput.
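A minimal sketch of computing the two deltas, assuming the coefficients a, b, c have already been fitted. Plain Go is used here rather than the golearn API, and the coefficients and data points are made-up illustrative values:

```go
package main

import (
	"fmt"
	"math"
)

// predict evaluates the fitted polynomial a*x^2 + b*x + c.
func predict(a, b, c, x float64) float64 { return a*x*x + b*x + c }

// meanAbsError returns the average |predicted - observed| over one data set.
func meanAbsError(a, b, c float64, xs, ys []float64) float64 {
	var sum float64
	for i, x := range xs {
		sum += math.Abs(predict(a, b, c, x) - ys[i])
	}
	return sum / float64(len(xs))
}

func main() {
	// Hypothetical coefficients, as a fit might produce them.
	a, b, c := -0.5, 8.0, 1.0

	// Training set and a held-out testing set (illustrative values).
	trainX := []float64{1, 2, 3, 4}
	trainY := []float64{8.5, 15.0, 20.5, 25.0}
	testX := []float64{5, 6}
	testY := []float64{28.5, 31.0}

	fmt.Printf("train delta: %.2f\n", meanAbsError(a, b, c, trainX, trainY))
	fmt.Printf("test delta:  %.2f\n", meanAbsError(a, b, c, testX, testY))
}
```

Only when both printed deltas are near zero would it be safe to trust the polynomial's maximum as the throughput optimum.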
Added comments
The two changesets implement the machine learning framework. The first commit implements the generic framework, and the second implements a linear regression module based on golearn:
https://github.com/sjwhitworth/golearn
To successfully run surge, you now also need golearn:
$ go get github.com/sjwhitworth/golearn
The framework adds a new command line option: --learner
Two learners are implemented:
"simple-learner": remembers all throughput values for each command window size and then selects the window size with the maximum throughput
"lr-learner": linear regression based on golearn
The default is simple-learner.
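The simple-learner behavior described above can be sketched roughly as follows; the type and method names (SimpleLearner, Observe, Best) are hypothetical, not taken from the changesets:

```go
package main

import "fmt"

// SimpleLearner remembers the best throughput observed for each
// command window size and picks the window size that did best.
type SimpleLearner struct {
	best map[int]float64 // window size -> max throughput seen so far
}

func NewSimpleLearner() *SimpleLearner {
	return &SimpleLearner{best: make(map[int]float64)}
}

// Observe records one throughput sample for a given window size,
// keeping only the maximum per window size.
func (s *SimpleLearner) Observe(windowSize int, throughput float64) {
	if throughput > s.best[windowSize] {
		s.best[windowSize] = throughput
	}
}

// Best returns the window size with the maximum recorded throughput.
func (s *SimpleLearner) Best() (winSize int, winTput float64) {
	for sz, tp := range s.best {
		if tp > winTput {
			winSize, winTput = sz, tp
		}
	}
	return
}

func main() {
	l := NewSimpleLearner()
	l.Observe(64, 120.0)
	l.Observe(128, 150.0)
	l.Observe(256, 140.0)
	sz, tp := l.Best()
	fmt.Printf("best window %d at throughput %.1f\n", sz, tp)
}
```

The lr-learner replaces this exhaustive max-selection with the fitted a*x^2 + b*x + c curve discussed in the review comments above.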
Sample command line:
$ go run cmd/ck.go -m b -vvv -servers 1 -gateways 1 -ttr 5ms -cmdwindowsz 128 --disklatencysim latency-parabola-h64-p10-k0 --learner lr-learner