This repository was archived by the owner on Jun 7, 2020. It is now read-only.

Machine learning framework #8

Open
wants to merge 2 commits into base: ml
Conversation

vineethrp (Contributor) commented:

The two changesets implement the machine learning framework. The first commit implements the generic framework, and the second implements a linear regression module based on golearn:
https://github.com/sjwhitworth/golearn

To run surge successfully you now also need golearn:
$ go get github.com/sjwhitworth/golearn

The framework adds a new command-line option: --learner. Two learners are implemented (a rough sketch of the abstraction follows the sample command line below):

"simple-learner": remembers all throughput values for each command window size and then selects the window size with maximum throughput

"lr-learner": linear regression based on golearn

The default is simple-learner.

Sample command line:
$ go run cmd/ck.go -m b -vvv -servers 1 -gateways 1 -ttr 5ms -cmdwindowsz 128 --disklatencysim latency-parabola-h64-p10-k0 --learner lr-learner
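
A minimal sketch of what the pluggable learner abstraction might look like, for orientation only; the Learner interface, its method names, and the NewLearner dispatcher below are assumptions made for illustration, not the actual types in this PR.

package learner

import "fmt"

// Learner is a hypothetical interface for the module selected via --learner.
type Learner interface {
    // Observe records one measured throughput sample for a command window size.
    Observe(windowSize int, throughput float64)
    // BestWindow returns the window size expected to maximize throughput.
    BestWindow() int
}

// simpleLearner mirrors the described "simple-learner": it remembers the best
// throughput seen for every command window size and picks the maximum.
type simpleLearner struct {
    best map[int]float64
}

func (s *simpleLearner) Observe(windowSize int, throughput float64) {
    if s.best == nil {
        s.best = make(map[int]float64)
    }
    if throughput > s.best[windowSize] {
        s.best[windowSize] = throughput
    }
}

func (s *simpleLearner) BestWindow() int {
    bestWin, bestTput := 0, -1.0
    for win, tput := range s.best {
        if tput > bestTput {
            bestWin, bestTput = win, tput
        }
    }
    return bestWin
}

// NewLearner dispatches on the --learner flag value.
func NewLearner(name string) (Learner, error) {
    switch name {
    case "", "simple-learner":
        return &simpleLearner{}, nil
    case "lr-learner":
        // the golearn-backed linear regression module would be constructed here
        return nil, fmt.Errorf("lr-learner omitted from this sketch")
    default:
        return nil, fmt.Errorf("unknown learner %q", name)
    }
}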

@@ -196,6 +198,7 @@ func PreConfig() {
flag.BoolVar(&config.DEBUG, "d", config.DEBUG, "debug=true|false")
flag.BoolVar(&config.ValidateConfig, "validateconfig", config.ValidateConfig, "true|false. Error out on invalid config if true, else Reset to default sane values.")
flag.IntVar(&config.srand, "srand", config.srand, "random seed, use 0 (zero) for random seed selection")
flag.StringVar(&config.learner, "learner", config.learner, "Macine learning module to be used for the model")

alex-aizman (Member) commented:

s/Macine learning module/ML algorithm/

@@ -175,7 +184,8 @@ func (l *LatencyParabola) Latency(params DiskLatencyParams) time.Duration {

func (d *Disk) scheduleWrite(sizebytes int) time.Duration {
at := sizeToDuration(sizebytes, "B", int64(d.MBps), "MB")
alex-aizman (Member) commented on Nov 29, 2016:

Probably makes sense to subclass the Disk type for the mb and future models. Older models could then keep using the basic Disk with its basic lastIOdone. This would be safer, regression-wise. The changes in scheduleWrite() and the need to maintain an array of latencies would warrant this.

The only additional change that would probably still make sense on top of the above is to convert the disk.lastIOdone member variable into a disk.lastIOdone() method, overloaded in the disk implementation that we want for the mb model. This method would always return the properly recomputed latency based on the disk's current state. Then you may not need any disk-related logic inside NowIsDone(), which would be much cleaner.
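
A rough sketch of that suggestion, assuming Go struct embedding stands in for subclassing; the names below (diskModel, baseDisk, mbDisk, pending) are made up for illustration and do not come from this PR.

package surge

import "time"

// diskModel is what callers such as NowIsDone() would program against,
// so they never need any disk-related logic of their own.
type diskModel interface {
    lastIOdone() time.Time
}

// baseDisk keeps the existing behavior: lastIOdone is a plain timestamp.
type baseDisk struct {
    lastIOdoneTime time.Time
}

func (d *baseDisk) lastIOdone() time.Time { return d.lastIOdoneTime }

// mbDisk embeds baseDisk and recomputes lastIOdone() from its own state,
// here the array of per-request latencies that its scheduleWrite() maintains.
type mbDisk struct {
    baseDisk
    pending []time.Duration
}

func (d *mbDisk) lastIOdone() time.Time {
    done := d.baseDisk.lastIOdone()
    for _, lat := range d.pending {
        done = done.Add(lat)
    }
    return done
}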

log(LogVVV, fmt.Sprintf("LinearRegression.Fit() failed: %v", err))
return nil, err
}

alex-aizman (Member) commented:

Once fitting is done (via lr.Fit() above), we effectively have an a*x^2 + b*x + c polynomial expression, with a, b, and c computed by the golearn framework. So there are two types of errors to look at: the delta between the computed polynomial expression and the real throughput values for the training set, and the same delta for the testing set, which must be kept separate from the training set.

Both deltas must be fairly close to zero for us to believe that we can safely use this a*x^2 + b*x + c expression to calculate the maximum throughput.
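
One simple way to quantify those two deltas is a root-mean-square error over each set. The helper below is only a sketch: it assumes the measured and predicted throughput values have already been pulled out of the golearn data grids (e.g. from the grid returned by lr.Predict() on the held-out set) into plain slices.

package learner

import "math"

// rmse measures how far the fitted a*x^2 + b*x + c predictions deviate from
// the measured throughput values. Run it twice: once on the training samples
// and once on the held-out test samples; both results should be close to zero
// before the fit is trusted to pick the window size with maximum throughput.
func rmse(measured, predicted []float64) float64 {
    if len(measured) == 0 || len(measured) != len(predicted) {
        return math.NaN()
    }
    var sum float64
    for i := range measured {
        d := measured[i] - predicted[i]
        sum += d * d
    }
    return math.Sqrt(sum / float64(len(measured)))
}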

alex-aizman (Member) left a review:

Added comments
