Improving README with more details #16

Open
wants to merge 1 commit into master
1 change: 1 addition & 0 deletions README.md
@@ -14,6 +14,7 @@ luarocks install optnet

It goes over the network and verifies which buffers can be reused.
It supports both evaluation mode and training mode.
More details can be found in [details.md](details.md).

### Evaluation mode

65 changes: 65 additions & 0 deletions details.md
@@ -0,0 +1,65 @@
# How does it work?

`OptNet` performs four levels of memory optimization internally:
* Addition of in-place operations
* Reuse of internal module temporary buffers
* Removal of `gradWeights` and `gradBias` for inference mode
* `output`/`gradInput` reuse once they are not needed anymore.

The following sections explain each of these optimizations in more detail.
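
For concreteness, applying all of these optimizations at once typically looks like the sketch below; `optnet.optimizeMemory` and the option names follow the project README, and `create_model` is a placeholder for your own network:

```lua
local optnet = require 'optnet'

local net = create_model()                 -- placeholder: any nn network
local input = torch.rand(1, 3, 224, 224)   -- a sample input of the right shape

-- 'mode' selects between inference and training optimizations;
-- see the README for the full list of options
local opts = {inplace = true, mode = 'inference'}
optnet.optimizeMemory(net, input, opts)
```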

### In-place
This is fairly basic functionality at the moment: it simply runs over the network and sets every module that supports in-place operations to `inplace` mode.
Since it does not yet analyse the flow of computations to check that chaining in-place operations still produces correct gradients, it might be better to turn this option off for now and manually define the in-place modules that you want to use.
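
A minimal sketch of the idea, assuming a hand-written whitelist of in-place-capable module types (the real list lives inside `optnet`):

```lua
-- modules whose updateOutput honours an `inplace` flag (illustrative subset)
local inplaceable = {
  ['nn.ReLU']    = true,
  ['nn.Dropout'] = true,
}

net:apply(function(m)
  if inplaceable[torch.typename(m)] then
    m.inplace = true   -- the module now writes its output over its input
  end
end)
```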

### Internal buffers
Several modules, like the convolutions and max pooling from `nn`, keep internal buffers to store intermediate information, such as the unfolded image or the indices of the maximum values. We use a simple heuristic for buffer sharing: we manually annotated the most commonly used modules, marking which of their buffers are and are not needed to compute correct gradients. The buffers that are only needed for the forward pass or only for the backward pass are then shared among the other instances of the same module type in the network.
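
As an illustration only (not `optnet`'s actual annotation table), in inference mode the `finput` unfolding buffer of every `nn.SpatialConvolution` could be pointed at a single shared tensor, since it is not needed for gradient computation there:

```lua
-- one tensor reused as the unfolding buffer by every convolution (inference only)
local sharedUnfold = torch.Tensor()

net:apply(function(m)
  if torch.typename(m) == 'nn.SpatialConvolution' then
    m.finput = sharedUnfold
  end
end)
```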

### Removal of gradients
When using networks only for inference, one does not need the gradients with respect to the weights and biases, which are allocated by default by `nn`. This function removes all of these gradients from the model, keeping track of parameter sharing so that they can be exactly reconstructed when needed.
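
Stripped of the bookkeeping needed to restore them later, the idea amounts to something like this sketch:

```lua
-- replace gradient buffers with empty tensors of the same type;
-- optnet additionally records sizes and sharing so they can be rebuilt
net:apply(function(m)
  if m.gradWeight then m.gradWeight = m.gradWeight.new() end
  if m.gradBias   then m.gradBias   = m.gradBias.new()   end
end)
```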

### Output/gradInput reuse
This is the main part of the `optnet` package: it reuses the `output`s (or the `gradInput`s in training mode) by keeping track of the lifetime of the storage behind each `output`. Once a specific `output` is no longer needed, its storage is added to a list of storages that can be reused.
The discussion that follows focuses on `inference` mode, but a similar reasoning applies to `training` mode.

#### Obtaining the life span of each output

The first step is to find the first time each output is defined and the last time it is used.
Given that `nn` is very flexible and only enforces that `self.output` is defined after the forward pass and is the returned value,
the way `optnet` handles new or generic modules is to infer the structure of the network by running a forward pass.
If we can analyse the flow of each storage during that forward pass, we can infer the structure of the network
as well as the life cycle of each `output`.

The question is then how to track when each output is defined and last used without having to write specific code for each module.
We solve this by temporarily overwriting the `updateOutput` function.
By using upvalues, we can keep track of the input fed to each module, as well as the output each module generates.
Here is an example snippet that illustrates the idea:
```lua
local inputs = {}
net:apply(
  function(x)
    -- this is the original forward function
    local orig_updateOutput = x.updateOutput
    -- let's overwrite it to do some more things before/after
    -- the original function is executed
    x.updateOutput = function(self, input)
      -- inputs is an upvalue, and we do not need to change the
      -- function signature, so we can actually inspect
      -- each module during the forward call
      table.insert(inputs, input)
      print('hello from '..torch.typename(self))
      return orig_updateOutput(self, input)
    end
  end
)
```

With this in place, we can extract, for each output, the first time it is defined and the last time it is used.
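
A simplified sketch of that bookkeeping, assuming single-tensor inputs and outputs (real networks may pass tables of tensors, which `optnet` has to handle as well):

```lua
-- liveness[storagePointer] = {first = step it is defined, last = step it is read}
local liveness = {}
local step = 0

net:apply(function(m)
  local orig_updateOutput = m.updateOutput
  m.updateOutput = function(self, input)
    step = step + 1
    -- the input storage is *used* at this step
    local inKey = torch.pointer(input:storage())
    if liveness[inKey] then liveness[inKey].last = step end
    local out = orig_updateOutput(self, input)
    -- the output storage is *defined* here the first time we see it
    local outKey = torch.pointer(out:storage())
    liveness[outKey] = liveness[outKey] or {first = step, last = step}
    return out
  end
end)
```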

#### Analysing possible storage assignments

Once we have the life span of each storage, the memory-sharing step tries to find the minimum number of sets of storages that do not overlap in time, and assigns a single storage to each of these sets.
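
One simple way to compute such an assignment is a greedy scan over the intervals, sketched below (this is an illustration of the idea, not `optnet`'s actual code):

```lua
-- intervals: array of {first = ..., last = ...}, one per storage
local function assignPools(intervals)
  table.sort(intervals, function(a, b) return a.first < b.first end)
  local poolBusyUntil = {}            -- poolBusyUntil[k] = last step pool k is alive
  for _, itv in ipairs(intervals) do
    local chosen
    for k, busyUntil in ipairs(poolBusyUntil) do
      if busyUntil < itv.first then chosen = k; break end
    end
    chosen = chosen or (#poolBusyUntil + 1)
    poolBusyUntil[chosen] = itv.last
    itv.pool = chosen                 -- intervals with the same pool share one storage
  end
  return #poolBusyUntil               -- number of distinct storages needed
end
```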

#### Assigning shared storages
With the assignments from the previous section in hand, we can then change the storage of each output to match the assignments that were previously found.
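
Concretely, re-pointing an output at its assigned storage can be done with `Tensor:set`, roughly as in this sketch (the shared storage is grown on demand; `useSharedStorage` is a hypothetical helper name):

```lua
-- make a module's output view into the shared storage chosen for its pool
local function useSharedStorage(m, sharedStorage)
  if sharedStorage:size() < m.output:nElement() then
    sharedStorage:resize(m.output:nElement())
  end
  m.output:set(sharedStorage, 1, m.output:size())
end
```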