Codellama generates weird tokens with TGI 0.0.24 #704
Comments
@pinak-p I can reproduce your issue, both on SageMaker and locally with a 0.0.24 image. I verified that deploying the model with neuronx-tgi 0.0.23 leads to meaningful results, so this seems to affect only that version.
@pinak-p this is not only a TGI issue: I also get gibberish with
@pinak-p could you check with version 0.0.25?
What's the URL for 0.0.25? I don't see it here https://github.com/aws/deep-learning-containers/blob/master/available_images.md ... nor does the SageMaker SDK have the version.
@pinak-p it is still being deployed, but you can use the neuronx-tgi docker image on an EC2 instance: https://github.com/huggingface/optimum-neuron/pkgs/container/neuronx-tgi. Alternatively, you can use directly
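If the alternative being suggested is to run the model through optimum-neuron's Python API rather than the TGI container, a minimal sketch would look like the following; the model id and the compilation parameters (batch size, sequence length, number of cores, cast type) are illustrative assumptions, not values confirmed in this issue.

```python
# Hypothetical sketch: load and run CodeLlama directly with optimum-neuron,
# bypassing the TGI container. All parameters below are illustrative.
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed checkpoint

# export=True compiles the checkpoint for Neuron cores on first load
model = NeuronModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    batch_size=1,
    sequence_length=2048,
    num_cores=2,
    auto_cast_type="fp16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate a short completion to check whether the output is coherent
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```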
System Info
Who can help?
@dacorvo
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction (minimal, reproducible, runnable)
I'm using the configuration below to deploy the model on SageMaker.
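For context, a typical deployment of this kind with the TGI Neuronx DLC through the SageMaker Python SDK looks roughly like the sketch below; the model id, instance type, and environment values are assumptions for illustration, not the exact configuration used in this report.

```python
# Illustrative sketch of deploying a Neuron-compiled CodeLlama endpoint with the
# SageMaker Python SDK. Values are assumptions, not the reporter's actual settings.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# 0.0.24 is the container version under discussion in this issue
image_uri = get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.24")

env = {
    "HF_MODEL_ID": "codellama/CodeLlama-7b-hf",  # assumed checkpoint
    "HF_NUM_CORES": "2",
    "HF_BATCH_SIZE": "1",
    "HF_SEQUENCE_LENGTH": "2048",
    "HF_AUTO_CAST_TYPE": "fp16",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "2048",
}

model = HuggingFaceModel(role=role, image_uri=image_uri, env=env)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
    container_startup_health_check_timeout=1800,
    volume_size=256,
)
```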
Text Generation:
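A request against the deployed endpoint would be sent along these lines; the prompt and generation parameters are placeholders rather than the ones used in the report.

```python
# Hypothetical text-generation request against the endpoint deployed above
prompt = "def fibonacci(n):"  # placeholder prompt

response = predictor.predict(
    {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 128, "temperature": 0.2, "do_sample": True},
    }
)
print(response[0]["generated_text"])
```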
Output:
Expected behavior
The expectation is to get text that is coherent and makes sense, rather than weird tokens.