Generative space refers to the set of all possible outputs a generative model can produce.
Consider a generative model that creates a 4-pixel image in which each pixel takes one of three colors: R, G, or B. There are 4 slots and a "vocabulary" of 3 values per slot, resulting in $3^4 = 81$ possible images.
A 4-word sentence with a vocabulary of 5 words provides $5^4 = 625$ possibilities.
In general, the number of possibilities is the vocabulary size $V$ raised to the number of dimensions $D$:
$V^D$
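The two small examples above are tiny enough to check by brute force. The following is a minimal sketch, not a definitive implementation; the helper names `generative_space_size` and `enumerate_outputs` are hypothetical and chosen only for illustration.

```python
from itertools import product


def generative_space_size(vocab_size: int, num_slots: int) -> int:
    """Count the possible outputs using the closed form V^D."""
    return vocab_size ** num_slots


def enumerate_outputs(vocabulary, num_slots):
    """List every possible output explicitly (only feasible for tiny spaces)."""
    return list(product(vocabulary, repeat=num_slots))


# 4-pixel image, each pixel is one of R, G, B
images = enumerate_outputs(["R", "G", "B"], 4)
print(len(images), generative_space_size(3, 4))

# 4-word sentence from a 5-word vocabulary (placeholder words)
sentences = enumerate_outputs(["a", "b", "c", "d", "e"], 4)
print(len(sentences), generative_space_size(5, 4))
```

Explicit enumeration only works for toy cases like these; for realistic vocabularies and dimensions, the closed form is the only practical way to count.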
For multi-modal outputs, such as generating an image together with a caption, calculate the generative space for each modality and multiply the results. For example, consider a 32-pixel image with 3 color channels and 256 levels (8 bits) per channel, combined with a 20-word caption drawn from a 10,000-word vocabulary. Each pixel can take $256^3$ values, so the combined generative space is
$(256^3)^{32} \times 10000^{20} = 256^{96} \times 10^{80} \approx 10^{311}$
As you can imagine, the generative space grows exponentially with the number of dimensions, so it becomes enormous very quickly for applications like language and image generation!
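To get a feel for the scale of a multi-modal space, the product can be computed exactly with Python's arbitrary-precision integers. This sketch assumes a 32-pixel image where each pixel has 3 channels of 256 levels, paired with a 20-word caption from a 10,000-word vocabulary; the function name `joint_space_size` is hypothetical.

```python
import math


def joint_space_size(modalities):
    """Multiply V^D over a list of (vocab_size, num_slots) pairs."""
    return math.prod(vocab_size ** num_slots for vocab_size, num_slots in modalities)


# Assumed setup: each pixel is one of 256^3 colors, and there are 32 pixels;
# the caption has 20 word slots, each filled from 10,000 words.
modalities = [
    (256 ** 3, 32),  # image: per-pixel vocabulary, number of pixels
    (10_000, 20),    # caption: word vocabulary, number of words
]

total = joint_space_size(modalities)
num_digits = len(str(total))
print(f"The joint generative space has {num_digits} decimal digits.")
```

Under these assumptions the total has 312 decimal digits, which is far too large to enumerate even though the image is only 32 pixels.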