
PAELLADOC

Pricing: No Info
AI, Machine Learning, Image Generation, Beginner Friendly, CNN

Paella: A Simple and Friendly Text-to-Image Model

Paella is a text-to-image model that operates in a compressed latent space, as Stable Diffusion and MUSE do. What sets Paella apart is its convolutional neural network (CNN) architecture, a choice that makes it fast and lets it handle larger inputs.

Key Features

Paella encodes images into a grid of discrete visual tokens. During training, some of these tokens are replaced with other tokens from a codebook, and the model learns to predict the original tokens. The fraction of tokens replaced (the noise level) follows a simple schedule.
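
The token-replacement step described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not Paella's actual training code: the function name add_noise mirrors the model.add_noise call in the sampling code further down, but its body, the num_labels argument, and the token grid shape (batch, height, width) are assumptions made for the example.

```python
import torch

def add_noise(tokens, t, num_labels, random_x=None):
    """Replace roughly a fraction t of the tokens with random codebook indices.

    Hypothetical helper: tokens is a (batch, height, width) grid of integer
    codebook indices, t is a (batch,) tensor of noise levels in [0, 1].
    """
    # Each position is replaced independently with probability t.
    mask = torch.rand(tokens.shape) < t[:, None, None]
    if random_x is None:
        # Replacement tokens are drawn uniformly from the codebook.
        random_x = torch.randint(0, num_labels, tokens.shape)
    noised = torch.where(mask, random_x, tokens)
    return noised, mask

# Two 4x4 token grids: the first is left untouched (t=0),
# the second is fully replaced (t=1).
tokens = torch.zeros(2, 4, 4, dtype=torch.long)
noised, mask = add_noise(tokens, torch.tensor([0.0, 1.0]), num_labels=8192)
```

During training, the model would be asked to predict tokens from noised, typically with a cross-entropy loss over the codebook entries.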

Sampling in Paella is simple. It starts from a grid of random tokens and refines it step by step, re-noising the result a little less each time. At each step the model outputs a probability distribution over the possible tokens for every position, and one token is sampled from it. Repeating this produces the final image.
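
The "pick one token from a range of possibilities" step can be illustrated on its own. The sketch below assumes logits of shape (batch, num_labels, height, width); the helper name sample_tokens is hypothetical and only isolates the temperature-scaled sampling idea.

```python
import torch

def sample_tokens(logits, temperature=1.0):
    """Draw one token index per grid position from temperature-scaled logits."""
    b, k, h, w = logits.shape
    # Lower temperature sharpens the distribution toward the most likely token.
    probs = (logits / temperature).softmax(dim=1)
    # torch.multinomial expects a 2D (rows, categories) tensor, so flatten the grid.
    flat = probs.permute(0, 2, 3, 1).reshape(-1, k)
    picked = torch.multinomial(flat, 1)[:, 0]
    return picked.view(b, h, w)

# Random logits for a batch of one 4x4 grid over a 16-entry codebook.
tokens = sample_tokens(torch.randn(1, 16, 4, 4), temperature=0.7)
```

A higher temperature early in sampling keeps the output varied, while a lower one near the end commits to the most probable tokens.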

The sampling process can be written as a single short PyTorch function:

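import torch

# `model` is assumed to be defined in the enclosing scope and to expose
# `num_labels` (codebook size) and `add_noise`.
def sample(model_inputs, latent_shape, unconditional_inputs, steps=12, renoise_steps=11, temperature=(0.7, 0.3), cfg=8.0):
    with torch.inference_mode():
        # Start from a grid of random codebook indices.
        sampled = torch.randint(0, model.num_labels, size=latent_shape)
        initial_noise = sampled.clone()
        # Noise level runs from 1.0 down to 0.0; temperature is annealed alongside it.
        timesteps = torch.linspace(1.0, 0.0, steps+1)
        temperatures = torch.linspace(temperature[0], temperature[1], steps)
        for i, t in enumerate(timesteps[:steps]):
            t = torch.ones(latent_shape[0]) * t
            logits = model(sampled, t, **model_inputs)
            if cfg:
                # Classifier-free guidance: mix conditional and unconditional logits.
                logits = logits * cfg + model(sampled, t, **unconditional_inputs) * (1-cfg)
            # Sample one token per position from the temperature-scaled distribution.
            sampled = logits.div(temperatures[i]).softmax(dim=1).permute(0, 2, 3, 1).reshape(-1, logits.size(1))
            sampled = torch.multinomial(sampled, 1)[:, 0].view(logits.size(0), *logits.shape[2:])
            if i < renoise_steps:
                # Re-noise to the next (lower) noise level, reusing the initial random tokens.
                t_next = torch.ones(latent_shape[0]) * timesteps[i+1]
                sampled = model.add_noise(sampled, t_next, random_x=initial_noise)[0]
    return sampled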
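
The guidance line in the sampling code, logits * cfg + unconditional * (1-cfg), is a form of classifier-free guidance. It can be easier to read as "start from the unconditional prediction and move cfg times the distance toward the conditional one". The small check below, using made-up logit values and a hypothetical helper name guide, shows that the two forms agree.

```python
import torch

def guide(cond_logits, uncond_logits, cfg):
    """Combine conditional and unconditional logits as in the sampling code."""
    return cond_logits * cfg + uncond_logits * (1 - cfg)

# Made-up logits for two tokens, for illustration only.
cond = torch.tensor([2.0, 0.0])
uncond = torch.tensor([1.0, 1.0])

guided = guide(cond, uncond, cfg=8.0)
# Equivalent form: extrapolate from the unconditional logits toward the conditional ones.
extrapolated = uncond + 8.0 * (cond - uncond)
same = torch.allclose(guided, extrapolated)
```

With cfg=1.0 the result is just the conditional logits; values above 1 push further in the direction the prompt suggests. In the sampling code, a falsy cfg (0 or None) skips the second model call entirely.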

Benefits

Paella offers several advantages.

It is fast and efficient, thanks to its CNN-based design. It is simple and easy to understand, making it great for beginners. Paella can also do tasks like image inpainting and structural editing.

Use Cases

Paella suits quick, efficient image generation from text prompts, as well as editing tasks such as image inpainting and structural editing. Its simple, understandable design also makes it a good starting point for people new to generative AI.

Cost/Price

The cost or price of Paella is not mentioned.

Funding

Funding details for Paella are not provided.
