Generative space refers to the set of all possible outputs a generative model can create. Consider a generative model that creates a 4-pixel image in which each pixel takes one of three color values: R, G, or B. There are 4 slots and a "vocabulary" of 3, resulting in $3^4 = 81$ possibilities. Similarly, a 4-word sentence drawn from a vocabulary of 5 words has $5^4 = 625$ possibilities. In general, the number of possibilities is the vocabulary size $V$ raised to the number of dimensions $D$, that is, $V^D$.

For multimodal outputs, such as generating an image together with a caption, calculate the generative space for each modality and multiply the results. For example, a 32-pixel image with 3 channels of 8-bit color (256 levels per channel) has $D = 32 \times 3 = 96$ slots with $V = 256$, and a 20-word caption with a 10,000-word vocabulary has $10000^{20}$ possibilities, so the combined generative space is $(256^{3})^{32} \times 10000^{20} = 256^{96} \times 10000^{20} \approx 10^{311}$. As you can imagine, the generative space grows very quickly for applications like language and image generation!
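The counting rule can be checked with a short Python sketch. The helper names `generative_space_size` and `multimodal_space_size` are hypothetical, introduced only for illustration, and the sketch assumes every slot (and every modality) is chosen independently, so the counts simply multiply. It brute-forces the tiny 4-pixel example to confirm $3^4 = 81$, then computes the image-plus-caption example with exact integer arithmetic.

```python
from itertools import product
from math import log10, prod


def generative_space_size(vocab_size: int, num_dims: int) -> int:
    """Count outputs with vocab_size independent choices in each of num_dims slots (V**D)."""
    return vocab_size ** num_dims


def multimodal_space_size(*modalities: tuple[int, int]) -> int:
    """Multiply the generative spaces of independent modalities, each given as (V, D)."""
    return prod(generative_space_size(v, d) for v, d in modalities)


# Brute-force check: enumerate every 4-pixel image whose pixels are R, G, or B.
tiny_images = list(product("RGB", repeat=4))
assert len(tiny_images) == generative_space_size(3, 4) == 81

# 4-word sentences over a 5-word vocabulary.
sentence_space = generative_space_size(5, 4)  # 625

# 32-pixel image, 3 channels x 256 levels each, plus a 20-word caption over 10,000 words.
image = (256, 32 * 3)    # V = 256 intensity levels, D = 96 channel values
caption = (10_000, 20)   # V = 10,000 words, D = 20 word positions
total = multimodal_space_size(image, caption)

# Python ints are arbitrary precision, and math.log10 accepts them directly.
print(f"about 10^{log10(total):.1f} possibilities")  # about 10^311.2 possibilities
```

Exact integers are used instead of floats because $256^{96} \times 10000^{20}$ is far beyond the range of a 64-bit float; only the final `log10` summarizes its magnitude.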