How should this loss function be written? #3568

Nightbringers · 2023-12-17T04:58:17Z

This is the model. It has two encoders and one decoder. `__call__` is only used for init; it is not used in inference:

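import jax.numpy as jnp
from flax import linen as nn

class model(nn.Module):
    dtype: jnp.dtype = jnp.float32

    def setup(self):
        self.encoder1 = encoder1(dtype=self.dtype)
        self.encoder2 = encoder2(dtype=self.dtype)
        self.decoder = decoder(dtype=self.dtype)

    def __call__(self, input1, input2, train: bool = True):
        # Used only for init: calls every submodule so all parameters are created.
        s = self.encoder1(input1)
        s2 = self.encoder2(input2)  # not decoded here; called so encoder2 is initialized
        out = self.decoder(s, train)
        return out

    def encode1(self, input):
        return self.encoder1(input)

    def encode2(self, input):
        return self.encoder2(input)

    # Named `decode` rather than `decoder` so it does not clash with the
    # `self.decoder` submodule assigned in setup().
    def decode(self, input, train: bool = True):
        return self.decoder(input, train)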

This is how I create the TrainState. If it has any issues, please tell me:

from typing import Any

import optax
from flax.training import train_state

class TrainState(train_state.TrainState):
    batch_stats: Any


def create_train_state(rng, learning_rate, input1, input2):
    # input1 / input2 are example inputs, used only so init can infer shapes.
    agent = model(dtype=jnp.bfloat16)
    variables = agent.init(rng, input1, input2)
    params = variables['params']
    batch_stats = variables['batch_stats']
    tx = optax.adam(learning_rate=learning_rate)
    return TrainState.create(
        apply_fn=agent.apply, params=params, batch_stats=batch_stats, tx=tx)

I don't know how to write loss_fn. It has two inputs and two labels, and it should work like this:
encode1(input1) -> s1 -> decoder(s1) -> predict1 -> (predict1,label1)
encode2(input2) -> s2 -> decoder(s2) -> predict2 -> (predict2,label2)

How should this train_step be written? Please give me an example, thanks.

An example like this:

@jax.jit
def train_step(state, batch):
  """Train for a single step."""
  def loss_fn(params):
    logits, updates = state.apply_fn(
      {'params': params, 'batch_stats': state.batch_stats},
      x=batch['image'], train=True, mutable=['batch_stats'])
    # Average over the batch: jax.value_and_grad needs a scalar loss.
    loss = optax.softmax_cross_entropy_with_integer_labels(
      logits=logits, labels=batch['label']).mean()
    return loss, (logits, updates)
  grad_fn = jax.value_and_grad(loss_fn, has_aux=True)
  (loss, (logits, updates)), grads = grad_fn(state.params)
  state = state.apply_gradients(grads=grads)
  state = state.replace(batch_stats=updates['batch_stats'])
  metrics = {
    'loss': loss,
    'accuracy': jnp.mean(jnp.argmax(logits, -1) == batch['label']),
  }
  return state, metrics

Assume the batch contains batch['input1'], batch['input2'], batch['label1'], and batch['label2'].

encode1(input1) -> s1 -> decoder(s1) -> predict1 -> (predict1,label1)
encode2(input2) -> s2 -> decoder(s2) -> predict2 -> (predict2,label2)

This is the loss:
mse_loss1 = optax.l2_loss(predict1, batch['label1']).mean()  # reduce to a scalar
mse_loss2 = optax.l2_loss(predict2, batch['label2']).mean()  # reduce to a scalar
loss = mse_loss1 + mse_loss2

I need the rest of the full train_step.


Replies: 0 comments
