OpenAI
GPT Mentions
OpenAI released a new feature that lets the user type an @ within a chat line to talk to one or more applications (GPTs) and combine their functionality. Although available only to a limited set of users at the moment, this could be a game changer.
SORA
OpenAI has announced a text-to-video model called SORA. It can produce much longer videos than comparable models. It is currently undergoing red-team testing and is not yet open to the public.
Memory
OpenAI has introduced memory in ChatGPT. Retaining information across chats is now the default, though users can opt out of using memory.
RWKV
This is an open-source initiative under the Linux Foundation. Their new model, Eagle, seems to match or outperform most multilingual LLMs. The distinctive feature of the RWKV models is that they are a hybrid between Recurrent Neural Networks and Transformer models, which yields longer context windows and faster performance. RWKV stands for Receptance Weighted Key Value: it combines the efficiently parallelizable training of transformers with the efficient inference of RNNs. You can find the theoretical paper here, and the group has a blog here.
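To illustrate why this hybrid gives RNN-like inference costs, here is a toy, single-channel sketch of a receptance-gated weighted key-value recurrence. This is an illustrative simplification, not the actual RWKV formulation (the real model uses learned per-channel time-decay, token-shift, and a parallel training form); the function name and decay constant are invented for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wkv_recurrence(keys, values, receptances, decay=0.9):
    """Toy WKV step for one channel: the state is a running, exponentially
    decayed weighted average of past values, so inference needs only O(1)
    state per token, like an RNN, instead of attending over all past tokens."""
    num = 0.0    # decayed running sum of exp(k) * v
    den = 1e-9   # decayed running sum of exp(k) (tiny epsilon avoids 0/0)
    outputs = []
    for k, v, r in zip(keys, values, receptances):
        num = decay * num + np.exp(k) * v
        den = decay * den + np.exp(k)
        outputs.append(sigmoid(r) * num / den)  # receptance gates the read-out
    return outputs

outs = wkv_recurrence(keys=[0.1, 0.2], values=[1.0, 2.0], receptances=[0.0, 0.5])
```

Because the state is a fixed-size pair of running sums, generating each new token costs the same regardless of how long the context already is, which is the practical payoff the paragraph above describes.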
Google
Bard → Gemini
Google has now re-branded its AI front-end Bard as Gemini. The Gemini models were released recently, but the most powerful Ultra model has just been released in a version called Gemini Advanced, which Google is offering through a paid subscription.
ImageFX
Google has released an image tool called ImageFX, which uses the Imagen 2.0 image model underneath. It is available through the Google AI Test Kitchen, but only to users in the U.S., Kenya, New Zealand, and Australia.
Gemini 1.5 Pro
Google launched Gemini 1.5 Pro, offering quality comparable to Gemini 1.0 Ultra while using less compute. It has a 1 million-token context window and improved understanding across all modalities, including video. It appears to be a Mixture-of-Experts model and is currently available in limited preview.
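The Mixture-of-Experts idea mentioned above is what lets quality scale without a proportional compute increase: a router activates only a few "expert" sub-networks per input. The sketch below is a minimal toy version under that general description, not Gemini's actual architecture; all names and sizes here are invented for illustration.

```python
import numpy as np

def moe_forward(x, expert_weights, router_weights, top_k=2):
    """Toy Mixture-of-Experts layer: a router scores every expert, but only
    the top-k experts actually run, so per-token compute stays roughly
    constant while total parameter count (capacity) grows with the number
    of experts."""
    logits = router_weights @ x                      # one routing score per expert
    top = np.argsort(logits)[-top_k:]                # indices of the k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over winners
    # weighted sum of only the selected experts' outputs
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                     # a single token embedding
experts = rng.normal(size=(8, 4, 4))       # 8 experts, each a 4x4 linear map
router = rng.normal(size=(8, 4))           # router producing 8 scores
y = moe_forward(x, experts, router)
```

With 8 experts but `top_k=2`, only a quarter of the expert parameters touch any given token, which is the sense in which such models deliver large-model quality at reduced compute.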
Gemma
Google has released two open-source models sized to run on laptops (2B parameters) and desktops (7B parameters). Fine-tuned variants, trained on human textual interactions, are also available.
Apple
MGIE
Apple has collaborated with UC Santa Barbara to develop a model named MLLM-Guided Image Editing (MGIE), where MLLM stands for Multimodal Large Language Model. The theoretical paper can be found here. Initial results indicate that starting with a multimodal LLM makes the instructions passed to the diffusion model that generates the image more expressive, so edits can be described precisely and images can be edited accurately from simple instructions.
Meta
Code Llama 70B
Meta has released an LLM version specialized for coding. The model comes in a base version, a Python-specific version, and an instruction-following version for code.
Mistral
Mistral has released the next version of their open LLM, called Mistral Next. It can be accessed from here, and first impressions are that it is good at math problems.
Stability AI
Stable Diffusion 3
Stability AI has released an early preview of its Stable Diffusion 3 model for image generation. It is a suite of models with parameter counts ranging from 800M to 8B. It uses a Diffusion Transformer architecture (not all diffusion models use transformers).
Stability Video
Stability AI has released a website for the use of its previously released Stable Video Diffusion model.
Midjourney
Midjourney v6 has added a Style option for applying different styles in image generation.
Leonardo AI
Leonardo has released a new version of their photorealistic image generation model named Lightning XL. They have also released an anime-specific model.