Description
Hello,
I am learning to use nanotron. I ran into a few issues while running the example training with tiny Llama, which I was able to resolve with some modifications to nanotron's source code.
For context, I am running the same installation scripts as provided, and I slightly modified the tiny Llama config so that it runs on 1 GPU by reducing parameters such as the number of heads and layers. Everything else is as suggested in the quick start guide.
- There is an issue with `llama.py` on the following line (`nanotron/src/nanotron/models/llama.py`, line 1095 at c737f00):

```python
parametrizator = parametrizator_cls(config=config.model)
```
It should instead be as in `qwen.py` (`nanotron/src/nanotron/models/qwen.py`, line 899 at c737f00):

```python
parametrizator = parametrizator_cls(config=config)
```
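To illustrate why this matters, here is a minimal, self-contained sketch with hypothetical stand-in classes (`ModelArgs`, `Config`, `Parametrizator`, and the `init_method_std` field are assumptions for illustration, not nanotron's actual definitions): a parametrizator that reads a field living on the top-level config raises `AttributeError` when handed only the `config.model` sub-config.

```python
from dataclasses import dataclass, field

@dataclass
class ModelArgs:
    """Stand-in for the model sub-config (hypothetical)."""
    hidden_size: int = 64

@dataclass
class Config:
    """Stand-in for the top-level config (hypothetical)."""
    model: ModelArgs = field(default_factory=ModelArgs)
    init_method_std: float = 0.02  # assumed to live on the top-level config

class Parametrizator:
    """Reads a top-level field, so it needs the full config object."""
    def __init__(self, config):
        self.std = config.init_method_std

config = Config()

# llama.py style: passing the sub-config fails
try:
    Parametrizator(config=config.model)
    raised = False
except AttributeError:
    raised = True
assert raised

# qwen.py style: passing the full config works
assert Parametrizator(config=config).std == 0.02
```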
- The following code snippet in `trainer.py` doesn't work with huggingface datasets (`nanotron/src/nanotron/trainer.py`, lines 574 to 590 at c737f00):

```python
if hasattr(self.current_base_dl, "dataset"):
    self.current_base_dl.dataset.update_consumption_metrics(
        start_idx=(self.iteration_step - 1) * self.global_batch_size,  # assumes we start from iteration_step=1
        end_idx=self.iteration_step * self.global_batch_size,
        sequence_length=self.sequence_length,
    )

# Training Logs
# Track consumed tokens for all dataset folders in current stage
if hasattr(self.current_base_dl, "dataset"):
    consumption_stats = self.current_base_dl.dataset.get_consumption_stats()
    current_stage = self.metadata.data_stages[self.metadata.last_stage_idx]
    # Update consumed tokens for all folders in the consumption stats
    for folder_path, stats in consumption_stats.items():
        current_stage.consumed_tokens_per_dataset_folder[folder_path] = stats["tokens"]
```
It assumes the `update_consumption_metrics()` method exists, but as far as I know it only exists for Nanosets.
I had to comment out that part plus the following:
`nanotron/src/nanotron/trainer.py`, lines 878 to 885 at c737f00:

```python
# Log consumption statistics
if hasattr(self.current_base_dl, "dataset"):
    for dataset_name, stats in self.current_base_dl.dataset.get_consumption_stats().items():
        basic_log_entries.extend(
            [
                LogItem(f"dataloader/consumed_tokens/{dataset_name}", stats["tokens"], "human_format"),
            ]
        )
```
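Instead of commenting the code out, one possible fix is to check for the method itself rather than only for a `.dataset` attribute, so datasets that lack consumption tracking are skipped gracefully. Below is a minimal, self-contained sketch of that pattern; `HFDatasetStub`, `Nanoset`, and `maybe_update_consumption` are hypothetical stand-ins, not nanotron's actual code.

```python
class HFDatasetStub:
    """Stands in for a huggingface dataset with no consumption tracking."""

class Nanoset:
    """Stands in for a Nanoset-like dataset that does track consumption."""
    def __init__(self):
        self.calls = []
    def update_consumption_metrics(self, start_idx, end_idx, sequence_length):
        self.calls.append((start_idx, end_idx, sequence_length))

def maybe_update_consumption(dataset, start_idx, end_idx, sequence_length):
    # getattr + callable check instead of assuming the method exists
    update = getattr(dataset, "update_consumption_metrics", None)
    if callable(update):
        update(start_idx=start_idx, end_idx=end_idx, sequence_length=sequence_length)
        return True
    return False

# HF dataset: skipped without raising AttributeError
assert maybe_update_consumption(HFDatasetStub(), 0, 8, 128) is False

# Nanoset: metrics recorded as before
ns = Nanoset()
assert maybe_update_consumption(ns, 0, 8, 128) is True
assert ns.calls == [(0, 8, 128)]
```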
- The example .yaml configs set `lighteval: null`. However, this causes an issue in `trainer.py`:
`nanotron/src/nanotron/trainer.py`, line 1178 at c737f00:

```python
eval_interval_file = self.config.lighteval.eval_interval_file
```
There should be a null check for `lighteval` before that line, similar to the existing one (`nanotron/src/nanotron/trainer.py`, line 320 at c737f00):

```python
if self.config.lighteval is not None:
```
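The guard can be sketched in isolation as follows; `LightEvalArgs`, `Config`, and `get_eval_interval_file` are hypothetical stand-ins used only to demonstrate the null-check pattern against a `lighteval: null` config.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LightEvalArgs:
    """Stand-in for the lighteval section of the config (hypothetical)."""
    eval_interval_file: Optional[str] = None

@dataclass
class Config:
    """Stand-in for the parsed YAML config; lighteval may be null."""
    lighteval: Optional[LightEvalArgs] = None

def get_eval_interval_file(config: Config) -> Optional[str]:
    # Guard against "lighteval: null" before dereferencing the field
    if config.lighteval is not None:
        return config.lighteval.eval_interval_file
    return None

# lighteval: null -> no AttributeError, just None
assert get_eval_interval_file(Config()) is None
# lighteval section present -> field is read normally
assert get_eval_interval_file(Config(lighteval=LightEvalArgs("intervals.txt"))) == "intervals.txt"
```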
With the changes above, I was able to run a small debug training run end to end.
While some of these three issues may stem from incorrect configuration or script arguments on my part, as I am still learning how to use Nanotron, I believe other people in a similar position could run into the same problems.