
Rope_scaling not implemented. Issue using deepseek-ai/deepseek-coder-6.7b-instruct #439

Open
michaelfeil opened this issue Jan 24, 2024 · 12 comments

michaelfeil commented Jan 24, 2024

I am using the newest AMI image from yesterday, with optimum-neuron 0.0.17 (https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). I have not tried another image yet.

I am evaluating AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct"). On torch CPU I get the output below; the Neuron equivalent returns only newlines ("\n" * 512).

outputs_decoded=Sure, here is a simple implementation of the Quick Sort algorithm in Python:

```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        less_than_pivot = [x for x in arr[1:] if x <= pivot]
        greater_than_pivot = [x for x in arr[1:] if x > pivot]
        return quick_sort(less_than_pivot) + [pivot] + quick_sort(greater
```

Reproduction script:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

def run_generation_with_model(model_fn):
    model: AutoModelForCausalLM = model_fn()
    
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")
    messages=[
        { 'role': 'user', 'content': "write a quick sort algorithm in python."}
    ]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    # 32021 is the id of <|EOT|> token
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, num_return_sequences=1, eos_token_id=32021)
    outputs_decoded = tokenizer.decode(outputs[0][len(inputs[0]):])
    print(f"model.cls={model.__class__} and inputs {tokenizer.decode(inputs[0])} outputs_decoded={outputs_decoded}")
    
    
def model_torch():
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")
    return model

def model_neuron():
    from optimum.neuron import NeuronModelForCausalLM
    compiler_args = {"num_cores": 2, "auto_cast_type": "f16"}
    input_shapes = {
        "batch_size": 1,
        "sequence_length": 1024,
    }
    model = NeuronModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", export=True, **compiler_args, **input_shapes)
    return model

if __name__ == "__main__":
    run_generation_with_model(model_neuron)
    run_generation_with_model(model_torch)
```

Output:

2024-01-24 23:41:57.000553:  96357  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-24 23:41:57.000953:  96357  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.68.0+4480452af/MODULE_c96948172adcf9c8b465+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-24 23:41:58.000084:  96357  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-24 23:41:58.000228:  96362  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-24 23:41:58.000529:  96357  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.68.0+4480452af/MODULE_efe464d909d639de6c33+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-24 23:41:58.000650:  96357  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-24 23:41:58.000684:  96362  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.68.0+4480452af/MODULE_db8524dccbb520a8be0e+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-24 23:41:58.000790:  96362  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-24 23:41:58.000876:  96365  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-24 23:41:59.000368:  96362  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.68.0+4480452af/MODULE_dbb9375e305a952606f0+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-24 23:41:59.000368:  96357  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.68.0+4480452af/MODULE_8fd6c96184408083ace9+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-24 23:41:59.000375:  96365  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.68.0+4480452af/MODULE_82d24899460a51628ab8+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-24 23:41:59.000467:  96362  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-24 23:41:59.000506:  96357  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-24 23:41:59.000812:  96362  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.68.0+4480452af/MODULE_b028d4f002a8634d6f7c+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-24 23:41:59.000940:  96357  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.68.0+4480452af/MODULE_8a228e3b0a1ed4cce775+2c2d707e/model.neff. Exiting with a successfully compiled graph.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Both `max_new_tokens` (=512) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Setting `pad_token_id` to `eos_token_id`:32021 for open-end generation.
2024-Jan-24 23:42:22.0848 95923:96572 [1] nccl_net_ofi_init:1415 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
2024-Jan-24 23:42:22.0848 95923:96572 [1] init.cc:137 CCOM WARN OFI plugin initNet() failed is EFA enabled?
model.cls=<class 'optimum.neuron.modeling.NeuronModelForCausalLM'> and inputs <|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer
### Instruction:
write a quick sort algorithm in python.
### Response:
 outputs_decoded=



(@michaelfeil: many \n omitted)

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.46it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:32021 for open-end generation.
model.cls=<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'> and inputs <|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer
### Instruction:
write a quick sort algorithm in python.
### Response:
 outputs_decoded=Sure, here is a simple implementation of the Quick Sort algorithm in Python:

```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        less_than_pivot = [x for x in arr[1:] if x <= pivot]
        greater_than_pivot = [x for x in arr[1:] if x > pivot]
        return quick_sort(less_than_pivot) + [pivot] + quick_sort(greater_than_pivot)

# Test the function
arr = [10, 7, 8, 9, 1, 5]
print("Original array:", arr)
print("Sorted array:", quick_sort(arr))
```

This code works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are then recursively sorted.
<|EOT|>

jimburtoft commented Jan 30, 2024

We have seen similar behavior with togethercomputer/LLaMA-2-7B-32K.

You can also replicate the example above with this code:

```python
from optimum.neuron import NeuronModelForCausalLM, pipeline
from transformers import AutoTokenizer

# num_cores should be changed based on the instance. An inf2.24xlarge has 6 Neuron devices (two cores each), so 12 cores total.
# Larger models will need more cores. You can make your model smaller by changing fp16 to f8. Some models may require num_cores to be a power of 2.
compiler_args = {"num_cores": 2, "auto_cast_type": 'fp16'}
input_shapes = {"batch_size": 1, "sequence_length": 2048}

model_to_test = "deepseek-ai/deepseek-coder-6.7b-instruct"

model = NeuronModelForCausalLM.from_pretrained(model_to_test, export=True, **compiler_args, **input_shapes)

tokenizer = AutoTokenizer.from_pretrained(model_to_test)

p = pipeline('text-generation', model, tokenizer)
p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
```

Output:

Setting 'pad_token_id' to 'eos_token_id':32021 for open-end generation.
2024-Jan-30 15:04:44.0384 58491:62420 [0] nccl_net_ofi_init:1415 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
2024-Jan-30 15:04:44.0384 58491:62420 [0] init.cc:137 CCOM WARN OFI plugin initNet() failed is EFA enabled?
[{'generated_text': 'My favorite place on earth is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is is iis is is is is is is is is isis is is is is'}] 

@michaelfeil (Author)

Could this be an effect of RoPE scaling?
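
A quick way to check whether the checkpoint actually declares a RoPE-scaling configuration is to inspect its config (a minimal sketch; the printed values are whatever the Hub config contains, nothing is hard-coded here):

```python
from transformers import AutoConfig

# Inspect the checkpoint's config for a rope_scaling entry.
config = AutoConfig.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")
print(config.rope_scaling)                  # None if no scaling is requested, otherwise a dict like {"type": ..., "factor": ...}
print(getattr(config, "rope_theta", None))  # base rotary frequency, if the config defines one
```

If rope_scaling is set but the Neuron export path ignores it, the compiled model would run with plain RoPE, which would be consistent with the degraded output above.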

michaelfeil changed the title from "Issue using deepseek-ai/deepseek-coder-6.7b-instruct" to "Rope_scaling not implemented. Issue using deepseek-ai/deepseek-coder-6.7b-instruct" on Jan 30, 2024
@HuggingFaceDocBuilderDev

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!

1 similar comment

@cszhz
Copy link

cszhz commented May 20, 2024

I am running into the same issue. Is there any update?

@HuggingFaceDocBuilderDev

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!

5 similar comments

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label on Nov 14, 2024