The next chapter of Karpathy's tutorial explains how to reproduce a model closely resembling #OpenAI's original #GPT2 (the smallest, 124M-parameter version).
...but I'm *NOT* trying this on a desktop with a single GPU. According to the README, this training run takes about 4 days on a beefy node with 8× A100 40GB GPUs. Nope!
https://github.com/karpathy/nanoGPT?tab=readme-ov-file#reproducing-gpt-2
#AI #LLM #GPT