Liquid AI
Liquid Foundation Models
Liquid AI, a new MIT spin-off based in Boston, has announced its foundation models. The release consists of three models with 1.3B, 3B and 40B parameters, the last of which is a Mixture of Experts (MoE) model. Liquid claims that its models outperform most comparable models, including the open-source Llama models.
The models can be tested in the Playground on their site.
Luma AI
Dream Machine 1.6 is the new version of Luma AI's text-to-video generation model. I had previously tried an earlier version of the model and the results were not very impressive.
The examples on Luma’s page are good, but the prompts must have been meticulously engineered. I tried my hand at it and was not impressed by the result, as shown below. (Just check the weird motion of the soldiers at the very end. The video also goes blurry when instructions like “zoom out” are used.)
Meta
Meta Movie Gen
Meta has released a text-to-video generation model named Movie Gen Video, which can generate 1080p videos from text prompts. The tool is based on a 30B parameter transformer model, and generated videos are limited to 16 seconds at 16 frames per second. It is also possible to generate sound effects and/or soundtracks for the generated video using Movie Gen Audio, a 13B parameter model that produces 48kHz audio.
A video can also be based on a given reference image.
Meta has published a research paper to provide details about the model.
Black Forest Labs
Flux 1.1 Pro
Black Forest Labs have announced Flux 1.1, the latest version of their text-to-image generation model. The claim is that the new version generates images six times faster and with better image quality compared to the previous version, Flux 1.
They have also announced a beta version of their BFL API.
I tried the beta model through together.ai and the results were not bad.
Pika
Pika 1.5
Pika released a new version of their text-to-video generation model, Pika 1.5.
Pika lets the user use Pikaffects, such as melting, with impressive results.
I had used the 1.0 version, but the new one seems to produce much more impressive videos.
Apple
MM 1.5
Apple has released MM1.5, a new version of their family of small multi-modal models.
Apple Intelligence - Second Batch
Apple has released beta versions of new Apple Intelligence features as part of iOS 18.2 and macOS 15.2. These include ChatGPT integration (with Siri and Writing Tools everywhere possible), Image Playground (an image generation tool producing animation- and illustration-style images), Genmoji (a tool to create custom emojis) and Visual Intelligence (which analyses what the camera sees to identify and describe objects and scenes).
See this post for an in-depth analysis of Apple Intelligence features.
Nvidia
NVLM
Nvidia has released NVLM, a family of open-weights multimodal LLMs.
OpenAI
ChatGPT
OpenAI has updated ChatGPT so that it starts up with a prompt interface that looks suspiciously similar to the default Search view of Google or the recently popular AI-powered search engine Perplexity. Canvas is also a more prominent feature.
Mistral
Ministral models
Mistral has released two small models, Ministral 8B and Ministral 3B. Mistral published benchmark results implying that these small models perform as well as some of their competitors’ larger models.
Anthropic
New models
Anthropic has released Claude 3.5 Sonnet New and Claude 3.5 Haiku. The interesting claim is that Claude 3.5 Sonnet New outperforms even OpenAI’s o1 model.
Computer Use
The latest Claude release introduces Computer Use, which allows Claude to operate your computer. This is demonstrated in a simple website-building demo, in which Claude builds a website and fixes its own coding problems. Although experimental, this shows enormous potential for future AI coding tasks.
Midjourney
Midjourney now has an Image Editor that can take an existing image and edit it based on prompts. Available to paid members, the editor can also re-texture existing images.
Adobe
Firefly Video Model
Adobe has released its previously announced video generation model to users on its waitlist. One important claim by Adobe is that its models have been trained only on material whose legal rights it has acquired.


