Generated GPT_OSS model files through porter script. #2384
base: master
Conversation
/gemini review
Code Review
This pull request introduces a new GPT-OSS model, seemingly generated by a script. While the overall structure aligns with the repository's conventions, there are several critical issues that need to be addressed. The weight conversion scripts in both utils/transformers and tools/checkpoint_conversion are non-functional due to incorrect layer and weight names, likely a result of being copied from another model's implementation without proper adaptation. Additionally, there's a critical code duplication issue with the GptOssCausalLMPreprocessor class being defined in two separate files. Several style guide violations were also found, including the use of type hints in function signatures, incorrect docstring formatting, and missing docstrings or examples in several new components. These issues must be resolved to ensure the model is correct, maintainable, and usable.
/gemini review
Code Review
This pull request introduces the GPT_OSS model, including its backbone, causal language model task, preprocessor, and tokenizer. The overall structure aligns well with the project's conventions. However, there are several areas that need attention.
Most critically, the Hugging Face weight conversion script in convert_gpt_oss.py is incorrect and incompatible with the implemented model architecture, particularly for the Mixture-of-Experts (MoE) layers. This will prevent loading pre-trained weights correctly. I've provided a corrected implementation for the convert_weights function.
Additionally, there are multiple instances of incorrect docstring formatting across the new files, which violates the project's style guide. I've pointed these out with suggestions for correction. A minor but important clarification is also needed in a test file comment to improve maintainability.
/gemini review
Code Review
This pull request introduces a new GptOss model, including its backbone, causal language model, preprocessor, and associated tests and conversion scripts. The code is script-generated, and this review focuses on its quality, correctness, and adherence to the project's style guide.
The model definition and tests are generally well-structured. However, there are a few issues in the GptOssBackbone implementation regarding initialization and configuration that need to be addressed. The most critical issue lies in the Hugging Face conversion script (convert_gpt_oss.py), which appears to contain significant copy-paste errors from another model's converter. This will prevent correct weight loading and needs a substantial revision. My detailed comments provide specific suggestions to fix these issues.
@laxmareddyp does it work? Looks like it probably doesn't even output valid code yet, right? Going by the test output. The overall code style looks OK, so this will come down to how accurate the code is. I would view scripts like this as a developer tool rather than an automated workflow. So maybe the thing to do here is to get this code working end to end. Then you'll have some first-hand experience of how useful this was at saving time. If there are tons of hard-to-find errors in the code, this might be slower than doing it yourself. If it's pretty accurate, maybe it is saving time and worth putting forward as a tool for contributors to use.
/gemini review
Code Review
This pull request introduces a new GptOss model, which is a significant contribution. The code is mostly well-structured and follows the project's conventions. However, I've found a few critical issues, primarily related to the tokenizer usage in tests and a bug in the weight conversion logic for the MoE layers. There are also some inconsistencies in the checkpoint conversion script and a minor point of confusion in a test file's comments. Addressing these points will be crucial for the model's correctness and maintainability.
keras_hub/src/models/gpt_oss/gpt_oss_causal_lm_preprocessor_test.py
/gemini review
Code Review
This pull request introduces a new GPT-OSS model, which is a GPT-style transformer with Mixture of Experts, generated via a script. The changes are comprehensive, including the backbone, causal LM task, preprocessor, tokenizer, and a conversion script from Hugging Face format. Additionally, the RotaryEmbedding layer has been enhanced to support YaRN scaling.
My review confirms that the new model implementation generally follows the repository's established patterns and style guide. However, I've identified a few areas for improvement. Specifically, I've noted a potential issue in the RotaryEmbedding layer's state management and opportunities to improve code clarity. I also found that several docstrings are missing required usage examples as mandated by the repository's style guide. The rest of the implementation, including the new model architecture and tests, appears solid.
```python
# Store original axis values for validation
self._original_sequence_axis = sequence_axis
self._original_feature_axis = feature_axis
```
The `self.built = True` statement was removed from `__init__`. For layers that have no weights to create in a `build()` method, it's important to set `self.built = True` at the end of `__init__` to indicate to the framework that the layer is already built. Please add it back to ensure correct layer state.
```python
self._original_feature_axis = feature_axis
self.built = True
```
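To illustrate the pattern the comment describes, a minimal sketch with a hypothetical weightless layer (not code from this PR):

```python
import keras

class ScaleByTwo(keras.layers.Layer):
    """Hypothetical layer that owns no weights."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Nothing to create in build(), so mark the layer as already
        # built to keep its state consistent for the framework.
        self.built = True

    def call(self, inputs):
        return inputs * 2.0
```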
```python
# Find correction dimensions for beta_fast and beta_slow
def find_correction_dim_tensor(
    num_rotations, dim, base_val, max_pos
):
    return (
        dim
        * ops.log(max_pos / (num_rotations * 2 * 3.141592653589793))
    ) / (2 * ops.log(base_val))
```
The value of pi is hardcoded here. It's better to use `math.pi` for precision and readability; you'll need to add `import math` at the top of the file.
Additionally, the nested function `find_correction_dim_tensor` does not depend on any instance state and could be defined as a static method on the class, or as a helper function outside the class, for better code organization and to avoid potential JIT compilation issues.
```python
def find_correction_dim_tensor(
    num_rotations, dim, base_val, max_pos
):
    return (
        dim
        * ops.log(max_pos / (num_rotations * 2 * math.pi))
    ) / (2 * ops.log(base_val))
```

```python
"""A cached attention layer with sliding window and sink tokens.

This layer implements the attention mechanism described in the GPT-OSS
paper. It includes grouped-query attention, rotary position embeddings,
sliding window attention, and sink tokens for improved performance on
long sequences.

Args:
    num_query_heads (int): The number of query attention heads.
    num_key_value_heads (int): The number of key and value attention
        heads.
    rope_max_wavelength (int, optional): The maximum wavelength for the
        rotary position embedding. Defaults to 10000.
    rope_scaling_factor (float, optional): The scaling factor for the
        rotary position embedding. Defaults to 1.0.
    kernel_initializer (str, optional): The initializer for the kernel
        weights. Defaults to "glorot_uniform".
    sliding_window (int, optional): The size of the sliding window.
        Defaults to 4096.
    dropout (float, optional): The dropout rate. Defaults to 0.
    head_dim (int, optional): Head dimension for attention. If None,
        calculated as hidden_dim // num_query_heads. Defaults to None.
"""
```
| """An end-to-end GptOss model for causal language modeling. | ||
| A causal language model (LM) predicts the next token based on previous | ||
| tokens. This task setup can be used to train the model unsupervised on | ||
| plain text input, or to autoregressively generate plain text similar to | ||
| the data used for training. This task can be used for pre-training or | ||
| fine-tuning a GptOss model, simply by calling `fit()`. | ||
| This model has a `generate()` method, which generates text based on a | ||
| prompt. The generation strategy used is controlled by an additional | ||
| `sampler` argument on `compile()`. You can recompile the model with | ||
| different `keras_hub.samplers` objects to control the generation. By | ||
| default, `"top_k"` sampling will be used. | ||
| Args: | ||
| backbone: A `keras_hub.models.GptOssBackbone` instance. | ||
| preprocessor: A `keras_hub.models.GptOssCausalLMPreprocessor` or | ||
| `None`. If `None`, this model will not apply preprocessing, and | ||
| inputs should be preprocessed before calling the model. | ||
| """ |
| """A GptOss tokenizer using BytePair encoding. | ||
| Tokenizer is a subclass of `keras_hub.tokenizers.BytePairTokenizer`. | ||
| It uses a BytePair encoding model to tokenize strings. It also adds special | ||
| tokens for the start and end of a sequence. | ||
| Args: | ||
| vocabulary: string or dict, maps token to integer ids. If it is a | ||
| string, it should be the file path to a json file. | ||
| merges: string or list, contains the merge rule. If it is a string, | ||
| it should be the file path to merge rules. | ||
| """ |
```diff
@@ -0,0 +1,351 @@
# Copyright 2024 The KerasHub Authors
```
remove copyright banner
```python
long sequences.

Args:
    num_query_heads (int): The number of query attention heads.
```
Follow the type hint format used in other files in the repo:
`arg_name: type_hint. Description.`
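That is, something like the following (names are illustrative):

```python
"""
Args:
    num_query_heads: int. The number of query attention heads.
    dropout: float. The dropout rate. Defaults to `0.0`.
"""
```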
```python
# b = batch size
# q = query length
# k = key/value length
# m = model dim
```
what is model dim?
```python
key = self.rotary_embedding_layer(key, start_index=start_index)
return key, value

if cache is not None:
```
Cache logic for KerasHub is located in the causal_lm file. Example:

```python
def _build_cache(self, token_ids):
```
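A hedged sketch of that pattern, with `ops` meaning `keras.ops` and the backbone attribute names assumed from similar KerasHub models rather than verified against this PR:

```python
def _build_cache(self, token_ids):
    """Build an empty cache and seed it with one pass over the prompt."""
    batch_size = ops.shape(token_ids)[0]
    max_length = ops.shape(token_ids)[1]
    num_layers = self.backbone.num_layers
    num_key_value_heads = self.backbone.num_key_value_heads
    head_dim = self.backbone.hidden_dim // self.backbone.num_query_heads
    shape = [
        batch_size, num_layers, 2, max_length, num_key_value_heads, head_dim
    ]
    cache = ops.zeros(shape, dtype=self.compute_dtype)
    # Seed the cache by running the full prompt through the backbone.
    hidden_states, cache = self.call_with_cache(token_ids, cache, 0)
    return hidden_states, cache
```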
```diff
@@ -0,0 +1,230 @@
# Copyright 2024 The KerasHub Authors
```
remove banner here and everywhere
```python
constructor.

Args:
    vocabulary_size (int): The size of the token vocabulary.
```
fix the type hint format, refer to the comment above
```python
    calculation of roatary embedding. Defaults to `1.0`.
layer_norm_epsilon (float, optional): Epsilon for the layer
    normalization layers in the transformer decoder. Defaults to `1e-6`.
sliding_window (int, optional): The sliding window for the attention
```
the order of args listed in the docstring should match `__init__`
````python
Examples:
```python
import tensorflow as tf
````
why tf?
```python
class GptOssCausalLMTest(TestCase):
    def setUp(self):
        # Define vocabulary and merges inline like GPT-2 tests
        self.vocab = ["!", "air", "Ġair", "plane", "Ġat", "port"]
```
use test vocab file like - Link
From the PR description, I don't see the generate output matching. @laxmareddyp do you need to update the result?
I have looked at only part of the code so far and made some comments.
Based on my observations, below are some common points that need addressing:
- Remove all the copyright text at the start of each file.
- Remove obvious comments on each piece of code logic; only add comments where necessary, such as for complex logic or a workaround.
- Avoid adding magic numbers; handle them better.
Will add more comments after I review the other files.
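On the magic-numbers point, an illustrative before/after using the `1e-6` literal from the YaRN snippet reviewed below:

```python
# Before: the threshold is an unexplained inline literal.
s_safe = ops.maximum(scaling_factor, ops.cast(1e-6, self.compute_dtype))

# After: a named constant documents the intent.
MIN_SCALING_FACTOR = 1e-6
s_safe = ops.maximum(
    scaling_factor, ops.cast(MIN_SCALING_FACTOR, self.compute_dtype)
)
```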
| """GPT-OSS preset configurations.""" | ||
|
|
||
| backbone_presets = { | ||
| "gpt_oss_8_7b_en": { | ||
| "metadata": { | ||
| "description": ( | ||
| "32-layer GPT-OSS MoE model with 7 billion " | ||
| "active parameters and 8 experts per MoE layer." | ||
| ), | ||
| "params": 46702792704, | ||
| "path": "gpt_oss", | ||
| }, | ||
| "kaggle_handle": "kaggle://keras/gpt_oss/keras/gpt_oss_8_7b_en/1", | ||
| }, | ||
| "gpt_oss_instruct_8_7b_en": { | ||
| "metadata": { | ||
| "description": ( | ||
| "Instruction fine-tuned 32-layer GPT-OSS MoE model " | ||
| "with 7 billion active parameters and 8 experts per MoE layer." | ||
| ), | ||
| "params": 46702792704, | ||
| "path": "gpt_oss", | ||
| }, | ||
| "kaggle_handle": ( | ||
| "kaggle://keras/gpt_oss/keras/gpt_oss_instruct_8_7b_en/1" | ||
| ), | ||
| }, | ||
| } |
Comment this part out until we have the checkpoints uploaded.
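Once uploaded, these presets would presumably be loadable like:

```python
import keras_hub

backbone = keras_hub.models.GptOssBackbone.from_preset("gpt_oss_8_7b_en")
```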
```python
# Store original axis values for validation
self._original_sequence_axis = sequence_axis
self._original_feature_axis = feature_axis
```
Change this back to

```python
self.sequence_axis = sequence_axis
self.feature_axis = feature_axis
```

to avoid confusion with the previous implementation, and add `self.built = True`.
```python
self.original_max_position_embeddings = original_max_position_embeddings
self.truncate = truncate

# Store original axis values for validation
```
Remove this comment
```python
if feature_axis < 0:
    feature_axis += rank

# Validate axis indices
```
remove this comment
```python
)

def call(self, inputs, start_index=0, positions=None):
    # Normalize and validate axes
```
Remove this comment
```python
input_shape = ops.shape(inputs)
sequence_axis, feature_axis = self._normalize_axes(input_shape)

# Validate rotary dimension
```
Remove this comment
```python
rotary_dim = input_shape[feature_axis]
self._validate_rotary_dimension(rotary_dim)

# Take care of unbatched `positions`.
```
Remove this comment
```python
return sequence_axis, feature_axis

def _validate_rotary_dimension(self, rotary_dim):
    """Validate that rotary dimension is even and handle odd dimensions."""
```
remove this comment
```python
# rotary_dim should be half of the last
# feature axis (HF-style: rotate pairs)
rotary_dim = ops.shape(inputs)[feature_axis]
# Validate evenness
try:
    # best-effort check when running eagerly;
    # if unavailable this will be a no-op
    if int(rotary_dim) % 2 != 0:
        raise ValueError(
            "Rotary embedding requires even feature "
            "dimension (last axis)."
        )
except Exception:
    pass

# Get inverse frequencies using the appropriate
# scaling method (linear, dynamic, yarn, etc.)
```
Let's not use a try/except block; can't we just use the `_validate_rotary_dimension` function?
Also, remove the comments.
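A sketch of the suggested direction (assuming the feature dimension is statically known at trace time):

```python
def _validate_rotary_dimension(self, rotary_dim):
    """Raise if the rotary feature dimension is odd."""
    if rotary_dim is not None and rotary_dim % 2 != 0:
        raise ValueError(
            "Rotary embedding requires an even feature dimension "
            f"(last axis). Received: rotary_dim={rotary_dim}"
        )
```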
```python
# YaRN temperature scaling: implement in tensor ops
if self.rope_type == "yarn":
    # t = (0.1 * ln(s) + 1)^2
    # make sure s > 0
    small = ops.cast(1e-6, self.compute_dtype)
    s_safe = ops.maximum(
        ops.cast(self.scaling_factor, self.compute_dtype), small
    )
    t = ops.square(
        ops.add(
            ops.multiply(
                ops.cast(0.1, self.compute_dtype), ops.log(s_safe)
            ),
            ops.cast(1.0, self.compute_dtype),
        )
    )
    sqrt_t = ops.sqrt(t)

    # HF/YaRN descriptions indicate a temperature
    # scaling applied to cos/sin embeddings, equivalently
    # scaling the logits. We implement the sqrt scaling on cos/sin.
    cos_emb = cos_emb * sqrt_t
    sin_emb = sin_emb * sqrt_t
```
Avoid variable names like `t`, `small`, etc.; give meaningful variable names.
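For instance, the block above could read like this (same math, just renamed; a sketch):

```python
epsilon = ops.cast(1e-6, self.compute_dtype)
safe_scaling_factor = ops.maximum(
    ops.cast(self.scaling_factor, self.compute_dtype), epsilon
)
# attention_temperature = (0.1 * ln(s) + 1) ** 2
attention_temperature = ops.square(
    0.1 * ops.log(safe_scaling_factor) + 1.0
)
temperature_scale = ops.sqrt(attention_temperature)
cos_emb = cos_emb * temperature_scale
sin_emb = sin_emb * temperature_scale
```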
@divyashreepathihalli @mattdangerw @abheesht17 Could you please check and provide your feedback on the quality of this code generated through the script?
I estimate that 80-85% of the code matches: the backbone files import successfully, and it's possible to instantiate a backbone model. There were still some errors, which might be alleviated with a stronger model.
The converter and weight conversion scripts are still in development. Generating a workable solution is complex because it requires giving the model a comprehensive understanding of the entire architectural layout, so it can handle the intricate dependencies between the model's layers and weights.
Output match:
Generated response (gpt_oss_safeguard_20b_en):
Generated response (gpt_oss_20b_en):
Checklist