The Chinchilla scaling law, proposed by researchers at DeepMind (Hoffmann et al., 2022), states that for compute-optimal training of the [[transformer]] architecture, the number of model parameters and the number of training tokens should be scaled in roughly equal proportion, which works out to on the order of 20 training tokens per parameter.
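
As a sketch of where this proportionality comes from: the Chinchilla paper fits training loss as a function of parameter count $N$ and token count $D$ (the constants below are the approximate fitted values reported there, and the compute estimate $C \approx 6ND$ is a standard rule of thumb, not something stated in this note):

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad E \approx 1.69,\ A \approx 406.4,\ B \approx 410.7,\ \alpha \approx 0.34,\ \beta \approx 0.28
$$

Minimizing $L$ subject to a fixed compute budget $C \approx 6ND$ gives $N_{\mathrm{opt}} \propto C^{a}$ and $D_{\mathrm{opt}} \propto C^{b}$ with $a \approx b \approx 0.5$, so the compute-optimal token count grows linearly with the parameter count, yielding the roughly 20 tokens-per-parameter rule of thumb.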