How to learn LLMs - a subjective roadmap
In this article, I explore a pragmatic approach to getting into the field of Large Language Models from the perspective of an ML practitioner whose prior experience is mostly in tabular data. My ultimate objective was to train a useful model that predicts a salary from a job description, using the state-of-the-art technology available. Below, I present a collection of materials (ordered according to my thought flow) that I gathered during my journey.
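To make that objective concrete, here is a minimal sketch of the setup it implies: a pretrained transformer fine-tuned as a text regressor. Everything in it is illustrative, as the base model, the example text, and the salary figure are placeholders rather than my actual configuration.

```python
# A minimal, illustrative sketch of text-regression fine-tuning with Hugging Face
# transformers. The model name, example text, and salary target are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"  # placeholder; any strong encoder works
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=1 with problem_type="regression" attaches a single-output MSE head
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1, problem_type="regression"
)

texts = ["Senior ML engineer, remote, equity package"]  # job descriptions
salaries = torch.tensor([[150_000.0]])                  # target salaries (placeholder)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=salaries)
outputs.loss.backward()  # from here, plug into any standard training loop or Trainer
```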
Summary - the current GPT SOTA & Roadmap
- State of GPT - a talk by Karpathy, May 2023
Winning solutions' writeups in Kaggle NLP competitions 2022-2023
The free market - let the best model win. Thousands of programmers compete for fame on reasonable-scale infrastructure, and the models that win here are the true state of the art (within the limits of a single A100 GPU).
- feedback-prize-2021
- us-patent-phrase-to-phrase-matching
- feedback-prize-effectiveness
- feedback-prize-english-language-learning
- chaii-hindi-and-tamil-question-answering
Papers on the most successful models on Kaggle (the "free market") - working & cheap
Understanding the Foundations of Transformers
- The classic, foundational paper - Attention Is All You Need (a minimal code sketch of its core equation follows this list)
- Intuition behind the Transformer architecture explained by Rachel Thomas and Jeremy Howard
- Key advantages of the Transformer architecture explained in 8 minutes by Karpathy in conversation with Lex Fridman
- An intro to transformers in the form of an illustrative Kaggle notebook
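If the paper feels dense, it helps to see that its central equation, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, fits in a few lines of NumPy. The following is a toy single-head sketch with arbitrary shapes, not an excerpt from any of the materials above:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the core operation of the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))  # 4 tokens, d_k = 8
print(scaled_dot_product_attention(Q, K, V).shape)         # -> (4, 8)
```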
Understanding LLMs
- ChatGPT
- Llama - an open-source model from Meta, free for commercial applications (see the loading sketch after this list)
- Bard - Google's response to ChatGPT, powered by LaMDA
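Because the Llama weights are openly distributed, the model can be run locally. Below is a minimal sketch using the Hugging Face transformers pipeline; the model id is my assumption (the checkpoint is gated behind Meta's license form), and any causal LM id such as "gpt2" can be substituted to test the plumbing:

```python
# A minimal sketch of running an open-weights LLM locally with Hugging Face
# transformers. The model id is an assumption (gated checkpoint); substitute
# any causal LM id such as "gpt2" if you just want to test the code path.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # assumed id; requires accepted license
    device_map="auto",                      # needs the accelerate package installed
)

prompt = "Estimate a fair salary range for a senior data engineer in Berlin."
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```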
Other materials
- Experiments with GPT-4 - Sparks of AGI
- Karpathy - how to recreate ChatGPT
- From ML to autonomous intelligence by LeCun
- Limitations of LLMs cited by Yann LeCun
- Llama 2 available on Azure
- Reddit discussion on the SOTA in LLMs, accompanying the Llama 2 release