
Repeated Sequence Output with llama3.cuda #7

@dewijones92

Description


I'm encountering an issue with the llama3.cuda repository: instead of coherent text, the generated output degenerates into the same token repeated over and over.

Steps to Reproduce:

  1. Cloned the repository and built it using make.
  2. Ran the following command:
./runcuda "i have a dream"

Actual Output:

i have a dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream dream
Token count: 50, elapsed: 0.047000s, 1021 tokens/s

Expected Output (from README):

$ ./runcuda "I have a dream"
"""
I have a dream. He dreams of a big, beautiful garden full of flowers and trees. He dreams of playing with his friends and eating yummy snacks.
One day, he was walking in the garden when he saw
Token count: 50, elapsed: 0.017000s, 2823 tokens/s
"""

Commit Information:

commit a05278f03b0aa9ae61baeea23c33067230463ca9 (HEAD -> master, origin/master, origin/HEAD)
Author: Sang Park <sang.park@dnotitia.com>
Date:   Tue Jun 4 02:20:05 2024 +0000

    Refactor conditional statements in llama3.cu

    The if/else chain in the llama3.cu file has been refactored to a switch statement. This change makes the code easier to read and understand.

GPU Information:

  • GPU Name: NVIDIA RTX 2000 Ada Generation
  • Driver Version: 552.74
  • CUDA Version: 12.4
  • Power Usage: 16W
  • Total Memory: 8188MiB

Any help in resolving this issue would be greatly appreciated. Thanks!
