Must value be linearly transformed in multi-head attention? #170

@GuoYL36

Description

In the paper Attention Is All You Need, the queries, keys, and values are linearly transformed in multi-head attention:
```python
Q = tf.layers.dense(queries, d_model, use_bias=True)  # (N, T_q, d_model)
K = tf.layers.dense(keys, d_model, use_bias=True)     # (N, T_k, d_model)
V = tf.layers.dense(values, d_model, use_bias=True)   # (N, T_k, d_model)
```
I want to know: must the values be linearly transformed in multi-head attention, or can the raw inputs be used as V directly?
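For context, here is a minimal NumPy sketch of scaled dot-product attention comparing a projected V against using the raw input as V. The dimensions and the weight matrix `W_v` are hypothetical, not taken from this repo; the point is only that skipping the projection changes what each head can read out, since the raw features are then passed through unmixed.

```python
import numpy as np

# Hypothetical toy dimensions: batch N=1, T=4 time steps, d_model=8.
np.random.seed(0)
N, T, d_model = 1, 4, 8
x = np.random.randn(N, T, d_model)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# With a value projection, the model can learn to remix features of V.
W_v = np.random.randn(d_model, d_model)  # hypothetical learned weights
out_proj = attention(x, x, x @ W_v)

# Without it, the output is only a convex combination of the raw inputs.
out_raw = attention(x, x, x)

print(out_proj.shape, out_raw.shape)  # both (1, 4, 8)
```

Both variants produce output of the same shape, so the projection is not mechanically required; in the paper it is what gives each of the h heads its own learned value subspace (W_i^V).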
