SVM connector training fails  #764

@YaYaB

Configuration

  • Version of DeepDetect:
    • Locally compiled on:
      • Ubuntu 14.04 LTS
      • Mac OSX
      • Other:
    • Docker
    • Amazon AMI
  • Commit (shown by the server when starting):
    073e9a1

Your question / the problem you're facing:

This issue is related to #761, and more precisely to the fix #762.
That fix resolved the inference issue, but now I can no longer train a model using the svm connector.
I have attached some random data (bug_svm_training.zip) in case you want to reproduce the problem.
To reproduce, just download and extract the archive.
In what follows, PATH_MODEL denotes the path where the model is stored.
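If the attachment is not at hand, comparable random data can be generated locally. The sketch below assumes that the svm connector reads the usual libsvm text format (`<label> <index>:<value> ...`, 1-based feature indices) with integer class labels 0 and 1 for `nclasses: 2`; the function name, the feature count and the row counts are hypothetical and are not taken from the attached archive.

```python
import random
from pathlib import Path

def write_random_svm(path, n_rows, n_features=10, n_classes=2, seed=0):
    """Write n_rows random samples to path in libsvm text format.

    Each line is '<label> <index>:<value> ...' with 1-based feature indices
    and an integer label drawn from [0, n_classes).
    Hypothetical helper: the feature count is not taken from the attached data.
    """
    rng = random.Random(seed)  # seeded so the files are reproducible
    lines = []
    for _ in range(n_rows):
        label = rng.randrange(n_classes)
        features = " ".join(
            f"{i}:{rng.uniform(-1.0, 1.0):.6f}" for i in range(1, n_features + 1)
        )
        lines.append(f"{label} {features}")
    Path(path).write_text("\n".join(lines) + "\n")
```

For example, `write_random_svm("PATH_MODEL/data/train.svm", 1000, seed=0)` and `write_random_svm("PATH_MODEL/data/test.svm", 200, seed=1)` would produce the two files referenced by the training calls below (after substituting the real path for PATH_MODEL and creating the `data` directory).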

Error message (if any) / steps to reproduce the problem:

Let us first create the service we will use to train:

  • list of API calls:
    curl -X PUT "http://localhost:8082/services/svm_test" -d '{
                "sname": "svm_test",
                "description": "classification model",
                "mllib": "caffe",
                "type": "supervised",
                "parameters": {
                        "input": {
                                "connector": "svm"
                        },
                        "mllib": {
                                "gpu": true,
                                "gpuid": 1,
                                "template": "mlp",
                                "nclasses": 2,
                                "ntargets": null,
                                "layers": [128,64,32],
                                "activation": "relu",
                                "dropout": 0.5,
                                "regression": false,
                                "finetuning": false,
                                "db": true
                        },
                        "output":{}
                },
                "model": {
                        "repository": "PATH_MODEL/bug_svm_prediction",
                        "templates": "../templates/caffe",
                        "weights": null
                }
        }'
  • Server log output:
DeepDetect [ commit 073e9a1f5cf5a565ee91c5a9b46a6b8b3afc19f3 ]
[2020-07-29 00:15:30.030] [api] [info] Running DeepDetect HTTP server on localhost:8082
[2020-07-29 00:16:02.615] [svm_test] [info] Using GPU 1
ETC.
ETC.
[2020-07-29 00:16:03.168] [svm_test] [info] instantiating model template mlp
[2020-07-29 00:16:03.168] [svm_test] [info] source=../templates/caffe/mlp/
[2020-07-29 00:16:03.168] [svm_test] [info] dest=PATH_MODEL/mlp.prototxt
[2020-07-29 00:16:03.170] [api] [info] 127.0.0.1 "PUT /services/svm_test" 201 556
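For scripted reproduction, the same service definition can be sent from Python. This is a minimal sketch using only the standard library; `service_payload` and `create_service` are hypothetical names, and `base_url` would be `http://localhost:8082` as in the curl call above. Like `curl -d`, it sends the JSON body without an explicit Content-Type header.

```python
import json
import urllib.request

def service_payload(path_model):
    """Return the body of the PUT /services/svm_test call above as a dict.

    path_model plays the role of PATH_MODEL; None/True/False are
    serialized by json.dumps as null/true/false.
    """
    return {
        "sname": "svm_test",
        "description": "classification model",
        "mllib": "caffe",
        "type": "supervised",
        "parameters": {
            "input": {"connector": "svm"},
            "mllib": {
                "gpu": True,
                "gpuid": 1,
                "template": "mlp",
                "nclasses": 2,
                "ntargets": None,
                "layers": [128, 64, 32],
                "activation": "relu",
                "dropout": 0.5,
                "regression": False,
                "finetuning": False,
                "db": True,
            },
            "output": {},
        },
        "model": {
            "repository": f"{path_model}/bug_svm_prediction",
            "templates": "../templates/caffe",
            "weights": None,
        },
    }

def create_service(base_url, path_model):
    """PUT the service definition and return the decoded JSON reply.

    Requires a running DeepDetect server at base_url.
    """
    req = urllib.request.Request(
        f"{base_url}/services/svm_test",
        data=json.dumps(service_payload(path_model)).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```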

Now let us launch a training run with an older version of DD (commit caaeb78).
With this version, the training starts as expected.

  • list of API calls:
curl -X POST "http://127.0.0.1:8082/train" -d '{
                "service": "svm_test",
                "async": false,
                "data": [
                        "PATH_MODEL/data/train.svm",
                        "PATH_MODEL/data/test.svm"
                ],
                "parameters":{
                        "input": {
                                "db": true
                        },
                        "mllib": {
                                "gpu": true,
                                "resume": false,
                                "ignore_label": null,
                                "solver": {
                                        "iterations": 1000,
                                        "snapshot": 500,
                                        "snapshot_prefix": null,
                                        "solver_type": "ADAM",
                                        "test_interval": 100,
                                        "test_initialization": false,
                                        "lr_policy": "step",
                                        "base_lr": 0.001,
                                        "gamma": 0.1,
                                        "stepsize": 100,
                                        "momentum": 0.9,
                                        "weight_decay": 0.00001,
                                        "power": null,
                                        "iter_size": 1
                                },
                                "net": {
                                        "batch_size": 1,
                                        "test_batch_size": 1
                                }
                        },
                        "output": {
                                "best": 2,
                                "measure": ["accp", "mcll", "f1", "mcc"]
                        }
                }
        }'
  • Server log output:
[2020-07-29 00:25:17.238] [svm_test] [info] Net total flops=10560 / total params=10560
[2020-07-29 00:25:17.238] [svm_test] [info] detected network type is classification
[2020-07-29 00:25:17.238] [caffe] [info] Opened lmdb PATH_MODEL/bug_svm_prediction/test.lmdb
[2020-07-29 00:25:17.244] [api] [info] 127.0.0.1 "POST /train" 201 1297
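Since the training body is identical for both DD versions, a small script can replay it against each server and report the HTTP status. This is a sketch under the same assumptions as any local replay (hypothetical function names, a server reachable at `base_url`); a 4xx/5xx reply makes `urllib` raise `HTTPError`, so the error body is decoded from the exception instead.

```python
import json
import urllib.error
import urllib.request

def train_payload(path_model):
    """Return the body of the POST /train call above (same for both DD commits)."""
    return {
        "service": "svm_test",
        "async": False,
        "data": [f"{path_model}/data/train.svm", f"{path_model}/data/test.svm"],
        "parameters": {
            "input": {"db": True},
            "mllib": {
                "gpu": True,
                "resume": False,
                "ignore_label": None,
                "solver": {
                    "iterations": 1000, "snapshot": 500, "snapshot_prefix": None,
                    "solver_type": "ADAM", "test_interval": 100,
                    "test_initialization": False, "lr_policy": "step",
                    "base_lr": 0.001, "gamma": 0.1, "stepsize": 100,
                    "momentum": 0.9, "weight_decay": 0.00001, "power": None,
                    "iter_size": 1,
                },
                "net": {"batch_size": 1, "test_batch_size": 1},
            },
            "output": {"best": 2, "measure": ["accp", "mcll", "f1", "mcc"]},
        },
    }

def post_train(base_url, payload):
    """POST to /train and return (HTTP status, decoded JSON body).

    urllib raises HTTPError on a 4xx/5xx reply; the server still sends a
    JSON error body in that case, so it is decoded rather than re-raised.
    """
    req = urllib.request.Request(
        f"{base_url}/train",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read().decode("utf-8"))
```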

However, with the new version (commit 073e9a1), the same training call fails almost immediately.

  • list of API calls:
curl -X POST "http://127.0.0.1:8082/train" -d '{
                "service": "svm_test",
                "async": false,
                "data": [
                        "PATH_MODEL/data/train.svm",
                        "PATH_MODEL/data/test.svm"
                ],
                "parameters":{
                        "input": {
                                "db": true
                        },
                        "mllib": {
                                "gpu": true,
                                "resume": false,
                                "ignore_label": null,
                                "solver": {
                                        "iterations": 1000,
                                        "snapshot": 500,
                                        "snapshot_prefix": null,
                                        "solver_type": "ADAM",
                                        "test_interval": 100,
                                        "test_initialization": false,
                                        "lr_policy": "step",
                                        "base_lr": 0.001,
                                        "gamma": 0.1,
                                        "stepsize": 100,
                                        "momentum": 0.9,
                                        "weight_decay": 0.00001,
                                        "power": null,
                                        "iter_size": 1
                                },
                                "net": {
                                        "batch_size": 1,
                                        "test_batch_size": 1
                                }
                        },
                        "output": {
                                "best": 2,
                                "measure": ["accp", "mcll", "f1", "mcc"]
                        }
                }
        }'
  • API response:
{"status":{"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"./include/caffe/util/db_lmdb.hpp:15 / Check failed (custom): (mdb_status) == (0)"}}

  • Server log output:
[2020-07-29 00:20:37.494] [svm_test] [info] detected network type is classification
[2020-07-29 00:20:37.505] [svm_test] [info] Iteration 0, lr = 0.001, smoothed_loss=0.523027
[2020-07-29 00:20:37.562] [caffe] [info] Ignoring source layer prob
[2020-07-29 00:20:37.562] [svm_test] [error] Error while filling up network for testing
[2020-07-29 00:20:37.565] [svm_test] [error] training call failed
[2020-07-29 00:20:37.565] [api] [error] 127.0.0.1 "POST /train" 500 814
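The `dd_msg` above points at Caffe's `db_lmdb.hpp`, i.e. an LMDB call returned a non-zero status while the test network was being filled. One possible way to narrow this down is to check what the failed call left in the model repository. The sketch below is a diagnostic under stated assumptions: the helper names are hypothetical, and it assumes the training set is written as `train.lmdb` next to the `test.lmdb` seen in the successful log, each as a directory-mode LMDB environment whose records live in `data.mdb`.

```python
import json
from pathlib import Path

def dd_error_summary(reply_text):
    """Return (HTTP code, dd_code, dd_msg) from a DeepDetect JSON error reply."""
    status = json.loads(reply_text)["status"]
    return status["code"], status.get("dd_code"), status.get("dd_msg")

def inspect_lmdb_dirs(repository, names=("train.lmdb", "test.lmdb")):
    """Map each expected LMDB directory to the size of its data.mdb, or None.

    Assumption: a directory-mode LMDB environment keeps its records in
    data.mdb, so a missing or zero-byte file would be one plausible reason
    for a failing mdb_* call.
    """
    report = {}
    for name in names:
        data_file = Path(repository) / name / "data.mdb"
        report[name] = data_file.stat().st_size if data_file.is_file() else None
    return report
```

Running `inspect_lmdb_dirs("PATH_MODEL/bug_svm_prediction")` after the failing call on 073e9a1, and again after the working call on caaeb78, would show whether the two versions leave different LMDB files behind.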

bug_svm_training.zip
