
Paella: A Simple and Friendly Text-to-Image Model
Paella is a new text-to-image model that, like Stable Diffusion and MUSE, works in a compressed latent space rather than directly on pixels. What sets Paella apart is its convolutional neural network (CNN) architecture, which makes it fast and lets it handle larger inputs.
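The sketch below shows why a convolutional design is not tied to one input size. It is a minimal, hypothetical token predictor built only from convolutions. `TinyTokenPredictor`, its layer sizes, and the codebook size of 1024 are assumptions chosen for illustration. It is not Paella's actual architecture, and it omits the timestep and text conditioning a real model needs.

```python
import torch
from torch import nn


class TinyTokenPredictor(nn.Module):
    """Hypothetical toy model: token grid (B, H, W) -> logits (B, num_labels, H, W)."""

    def __init__(self, num_labels=1024, dim=64):
        super().__init__()
        self.num_labels = num_labels
        # Map each discrete token index to a learned vector.
        self.embed = nn.Embedding(num_labels, dim)
        # Only convolutions: no layer depends on the grid's height or width.
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, num_labels, kernel_size=1),
        )

    def forward(self, tokens):
        # (B, H, W) -> (B, H, W, dim) -> (B, dim, H, W), the layout Conv2d expects.
        x = self.embed(tokens).permute(0, 3, 1, 2)
        return self.net(x)


model = TinyTokenPredictor()
for size in (16, 32):
    tokens = torch.randint(0, model.num_labels, (2, size, size))
    print(size, tuple(model(tokens).shape))
# prints: 16 (2, 1024, 16, 16) then 32 (2, 1024, 32, 32)
```

Every layer slides over the grid, so the same weights accept a 16×16 token grid and a 32×32 token grid without modification.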
Key Features
Paella encodes each image into a grid of discrete visual tokens. Each token is an index into a learned codebook. During training, a portion of these tokens is replaced with randomly chosen codebook entries, and the model learns to predict the original tokens. A simple noise schedule controls how many tokens are replaced.
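The training-time corruption can be sketched in plain Python as follows. This sketch assumes a linear schedule: a noise level `t` is drawn uniformly from [0, 1), and each token is then replaced by a random codebook index with probability `t`. The helper `noise_tokens` and the codebook size of 1024 are hypothetical, and Paella's own implementation may differ in detail.

```python
import random


def noise_tokens(tokens, t, codebook_size, rng):
    """Hypothetical helper: replace each token with a random codebook index with probability t.

    Returns the noised tokens and a mask marking which positions were resampled.
    A resampled position can, by chance, receive its original index again.
    """
    noised, mask = [], []
    for tok in tokens:
        replace = rng.random() < t
        noised.append(rng.randrange(codebook_size) if replace else tok)
        mask.append(replace)
    return noised, mask


rng = random.Random(0)
tokens = [rng.randrange(1024) for _ in range(16)]  # stand-in for one image's tokens
t = rng.random()  # noise level: the expected fraction of tokens that get replaced
noised, mask = noise_tokens(tokens, t, codebook_size=1024, rng=rng)
```

Under this schedule, `t = 0` leaves the tokens untouched and `t` close to 1 replaces almost all of them. The model is trained to recover `tokens` from `noised`.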
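Sampling in Paella is straightforward. It starts from a grid of random tokens and refines it over a series of steps, re-noising less at each step. At every step, the model predicts a probability distribution over the codebook for each token position, and one token is sampled from that distribution. Repeating this yields the tokens of the final image, which are then decoded back into pixels.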
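For a single token position, the "pick one token" step can be sketched in plain Python. It mirrors what `logits.div(temperatures[i]).softmax(dim=1)` followed by `torch.multinomial` does for the whole grid in the sampling function. The logits here are made-up numbers for a hypothetical 4-entry codebook.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    # Dividing by a temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    probs = softmax_with_temperature(logits, temperature)
    # Draw one codebook index according to the predicted probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-entry codebook
hot = softmax_with_temperature(logits, 0.7)   # early-step temperature (default start)
cold = softmax_with_temperature(logits, 0.3)  # late-step temperature (default end)
token = sample_token(logits, 0.3, random.Random(0))
```

By default, the temperature is annealed from 0.7 down to 0.3. As a result, early steps explore more and later steps commit more strongly to the most likely tokens.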
The whole sampling loop fits in a short function:
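```python
import torch


def sample(model, model_inputs, latent_shape, unconditional_inputs,
           steps=12, renoise_steps=11, temperature=(0.7, 0.3), cfg=8.0):
    with torch.inference_mode():
        # Start from uniformly random token indices ("pure noise").
        sampled = torch.randint(0, model.num_labels, size=latent_shape)
        initial_noise = sampled.clone()
        # Noise level falls from 1.0 to 0.0; temperature is annealed alongside it.
        timesteps = torch.linspace(1.0, 0.0, steps + 1)
        temperatures = torch.linspace(temperature[0], temperature[1], steps)
        for i, t in enumerate(timesteps[:steps]):
            t = torch.ones(latent_shape[0]) * t
            # Logits over the codebook for every position: (B, num_labels, H, W).
            logits = model(sampled, t, **model_inputs)
            if cfg:
                # Classifier-free guidance: blend conditional and unconditional logits.
                logits = logits * cfg + model(sampled, t, **unconditional_inputs) * (1 - cfg)
            # Temperature-scaled softmax, flattened to one distribution per position.
            probs = logits.div(temperatures[i]).softmax(dim=1).permute(0, 2, 3, 1).reshape(-1, logits.size(1))
            # Draw one token per position and restore the (B, H, W) layout.
            sampled = torch.multinomial(probs, 1)[:, 0].view(logits.size(0), *logits.shape[2:])
            if i < renoise_steps:
                # Re-noise the prediction down to the next, lower noise level.
                t_next = torch.ones(latent_shape[0]) * timesteps[i + 1]
                sampled = model.add_noise(sampled, t_next, random_x=initial_noise)[0]
    return sampled
```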
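The guidance line `logits * cfg + ... * (1 - cfg)` can be rearranged into the usual classifier-free guidance form. Let $c$ stand for `cfg`, $\ell_{\text{cond}}$ for the logits computed with `model_inputs`, and $\ell_{\text{uncond}}$ for those computed with `unconditional_inputs`. Then:

$$
\ell = c\,\ell_{\text{cond}} + (1 - c)\,\ell_{\text{uncond}} = \ell_{\text{uncond}} + c\,(\ell_{\text{cond}} - \ell_{\text{uncond}})
$$

Setting `cfg = 1` therefore returns the plain conditional logits, and larger values push the prediction further away from the unconditional one. Because of the `if cfg:` check, `cfg = 0` skips guidance and keeps the conditional logits, whereas the formula itself would give $\ell_{\text{uncond}}$.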
Benefits
Paella offers several advantages.
It is fast and efficient, thanks to its CNN-based design. It is simple and easy to understand, making it great for beginners. Paella can also do tasks like image inpainting and structural editing.
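The listing above does not say how inpainting is implemented. The sketch below shows one common approach for token-based samplers, offered as an assumption rather than as Paella's confirmed method. After every step, the tokens outside the region being edited are overwritten with the original image's tokens, so only the masked region ever changes. `inpaint` and the `predict` stand-in for the model are hypothetical, and the re-noising step of the real sampler is left out for brevity.

```python
import random


def inpaint(original, keep, predict, steps, codebook_size, rng):
    """Hypothetical inpainting loop over a flat list of token indices.

    original: the encoded tokens of the source image
    keep:     True where a token must stay unchanged, False where it is regenerated
    predict:  stand-in for the model; maps a token list to a new token list
    """
    # Start the editable region from random codebook entries.
    tokens = [o if k else rng.randrange(codebook_size) for o, k in zip(original, keep)]
    for _ in range(steps):
        proposal = predict(tokens)
        # Re-impose the protected tokens so only the masked region changes.
        tokens = [o if k else p for o, p, k in zip(original, proposal, keep)]
    return tokens


# Toy run: a "model" that always proposes token 7.
result = inpaint([1, 2, 3, 4], [True, False, True, False],
                 lambda toks: [7] * len(toks), steps=3, codebook_size=16,
                 rng=random.Random(0))
```

In this toy run, positions 0 and 2 keep their original tokens, and positions 1 and 2's neighbors at indices 1 and 3 take the model's proposal.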
Use Cases
Paella is great for generating images quickly and efficiently. It can handle tasks like image inpainting and structural editing. It is also a good starting point for those new to generative AI due to its simple and understandable design.
Cost/Price
The cost or price of Paella is not mentioned.
Funding
Funding details for Paella are not provided.