Machine learning framework #8
base: ml
Conversation
@@ -196,6 +198,7 @@ func PreConfig() {
	flag.BoolVar(&config.DEBUG, "d", config.DEBUG, "debug=true|false")
	flag.BoolVar(&config.ValidateConfig, "validateconfig", config.ValidateConfig, "true|false. Error out on invalid config if true, else Reset to default sane values.")
	flag.IntVar(&config.srand, "srand", config.srand, "random seed, use 0 (zero) for random seed selection")
	flag.StringVar(&config.learner, "learner", config.learner, "Macine learning module to be used for the model")
s/Macine learning module/ML algorithm/
@@ -175,7 +184,8 @@ func (l *LatencyParabola) Latency(params DiskLatencyParams) time.Duration {

func (d *Disk) scheduleWrite(sizebytes int) time.Duration {
	at := sizeToDuration(sizebytes, "B", int64(d.MBps), "MB")
It probably makes sense to subclass the Disk type for the mb model and future models. Older models could then keep using the basic Disk with its basic lastIOdone, which would be safer, regression-wise. The changes in scheduleWrite() and the need to maintain an array of latencies would warrant this.
The only additional change that would probably still make sense, on top of the above, is to convert the disk.lastIOdone member variable into a disk.lastIOdone() method, overloaded in the Disk implementation used by the mb model. This method would always return the properly recomputed latency based on the disk's current state. You then may not need any disk-related logic inside NowIsDone(), which would be much cleaner.
	log(LogVVV, fmt.Sprintf("LinearRegression.Fit() failed: %v", err))
	return nil, err
}
Once fitting is done (via lr.Fit() above), we effectively have an a*x^2 + b*x + c polynomial expression, with a, b, and c computed by the golearn framework. There are then two types of error to look at: the delta between the computed polynomial and the real throughput values for the training set, and the same delta for the testing set, which must be kept separate from training.
Both deltas must be fairly close to zero for us to believe that we can safely use this a*x^2 + b*x + c expression to calculate the maximum throughput.
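A minimal sketch of computing the two deltas, assuming the coefficients a, b, c have already been fitted. Plain Go is used here rather than the golearn API, and the coefficients and data points are made-up illustrative values:

```go
package main

import (
	"fmt"
	"math"
)

// predict evaluates the fitted polynomial a*x^2 + b*x + c.
func predict(a, b, c, x float64) float64 { return a*x*x + b*x + c }

// meanAbsError returns the average |predicted - observed| over one data set.
func meanAbsError(a, b, c float64, xs, ys []float64) float64 {
	var sum float64
	for i, x := range xs {
		sum += math.Abs(predict(a, b, c, x) - ys[i])
	}
	return sum / float64(len(xs))
}

func main() {
	// Hypothetical coefficients, as a fit might produce them.
	a, b, c := -0.5, 8.0, 1.0

	// Training set and a held-out testing set (illustrative values).
	trainX := []float64{1, 2, 3, 4}
	trainY := []float64{8.5, 15.0, 20.5, 25.0}
	testX := []float64{5, 6}
	testY := []float64{28.5, 31.0}

	fmt.Printf("train delta: %.2f\n", meanAbsError(a, b, c, trainX, trainY))
	fmt.Printf("test delta:  %.2f\n", meanAbsError(a, b, c, testX, testY))
}
```

Only when both printed deltas are near zero would it be safe to trust the polynomial's maximum as the throughput optimum.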
Added comments
The two changesets implement the machine learning framework. The first commit implements the generic framework, and the second implements a linear regression module based on golearn:
https://github.com/sjwhitworth/golearn
To successfully run surge, you now also need golearn:
$ go get github.com/sjwhitworth/golearn
The framework adds a new command line option: --learner
Two learners are implemented:
"simple-learner": remembers all throughput values for each command window size and then selects the window size with the maximum throughput
"lr-learner": linear regression based on golearn
The default is simple-learner.
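The simple-learner behavior described above can be sketched roughly as follows; the type and method names (SimpleLearner, Observe, Best) are hypothetical, not taken from the changesets:

```go
package main

import "fmt"

// SimpleLearner remembers the best throughput observed for each
// command window size and picks the window size that did best.
type SimpleLearner struct {
	best map[int]float64 // window size -> max throughput seen so far
}

func NewSimpleLearner() *SimpleLearner {
	return &SimpleLearner{best: make(map[int]float64)}
}

// Observe records one throughput sample for a given window size,
// keeping only the maximum per window size.
func (s *SimpleLearner) Observe(windowSize int, throughput float64) {
	if throughput > s.best[windowSize] {
		s.best[windowSize] = throughput
	}
}

// Best returns the window size with the maximum recorded throughput.
func (s *SimpleLearner) Best() (winSize int, winTput float64) {
	for sz, tp := range s.best {
		if tp > winTput {
			winSize, winTput = sz, tp
		}
	}
	return
}

func main() {
	l := NewSimpleLearner()
	l.Observe(64, 120.0)
	l.Observe(128, 150.0)
	l.Observe(256, 140.0)
	sz, tp := l.Best()
	fmt.Printf("best window %d at throughput %.1f\n", sz, tp)
}
```

The lr-learner replaces this exhaustive max-selection with the fitted a*x^2 + b*x + c curve discussed in the review comments above.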
Sample command line:
$ go run cmd/ck.go -m b -vvv -servers 1 -gateways 1 -ttr 5ms -cmdwindowsz 128 --disklatencysim latency-parabola-h64-p10-k0 --learner lr-learner