If you are already wondering why we keep talking about "predicting just one next word" when that same ChatGPT answers with entire long reads, don't rack your brains. Language models do generate long texts, but they do it one word at a time. Again, greatly simplifying: after generating each new word, the model re-runs all the previous text, together with the word it has just written, through itself and predicts the next word, and the result is a coherent text.
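To make this loop concrete, here is a minimal sketch of word-by-word (strictly speaking, token-by-token) generation using the publicly available GPT-2 model from the `transformers` library. The prompt, the number of generated tokens, and the greedy "pick the most likely word" strategy are arbitrary choices for illustration; real chatbots use fancier sampling, but the overall loop is the same.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "Language models generate text"
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                    # generate 20 more tokens, one at a time
        logits = model(input_ids).logits   # scores for every possible next token
        next_id = logits[0, -1].argmax()   # greedily pick the most likely one
        # append the new token and feed the whole sequence back in on the next step
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Note that on every pass the model sees the entire sequence so far, which is exactly why the continuation stays coherent with everything written before it.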
GPT-2 was released in 2019 and surpassed GPT-1 roughly tenfold both in the amount of training text and in the size of the model itself. This quantitative growth led to the model suddenly picking up qualitatively new skills on its own: from writing long essays to solving tricky problems that require building a picture of the world.