AI Updates-August 2024

Generative AI

Aug 31, 2024

Midjourney

Midjourney 6.1

Midjourney released their updated version 6.1. According to Midjourney, the new version has more coherent images; much better image quality; more precise, detailed, and correct small image features; new upscalers, faster speed (25% faster), improved text accuracy; a new personalization model with improved nuance, surprise, and accuracy and a new mode which takes 25% longer to (sometimes) add more texture at the cost of reduced image coherence.

Web Generation

Midjourney now allows users to create images from their website and give them 25 free images. The created images are very detailed and impressive.

Generated by Midjourney 6.1 from a prompt by the author

Google

Character.ai

Google has gone into a licensing deal with the founders of character.ai, a company running chatbots representing real and imaginary characters. Some employees will also join Google, but the company will exist in a somewhat revised structure.

Gemini Live

Google has announced that Gemini Live, which is basically a Gemini chatbot with voice capabilities similar to OpenAI’s GPT-4o is available on Android phones to Gemini Advanced subscribers and will be available soon for iOS users. It has 10 different voices to choose from. Gemini Live is going to be more integrated to the phones in future updates.

Gemini Gems and Updated Image model

Google has announced Gems, customized chatbots that can be trained on specific information. There are pre-defined Gems for specific tasks already in usable form.

They also released a new version of Imagen 3, their image generation model.

OpenAI

Advanced Voice Mode

OpenAI has enabled the Advanced Voice Mode for some ChatGPT Plus users. All Plus users will have this feature later this year.

Changes in Management

Several OpenAI top staff have left. OpenAI co-founder John Schulman has joined rival Antrhopic. Co-founder Greg Brockman has taken a sabbatical and Product Leader Peter Deng has left as well. This is the last of several management changes in the last year and the turmoil in OpenAI does not seem to end, especially after reports and analysis showing that OpenAI might be heading to bankruptcy given the high costs for its models and limited commercial return up to now.

Structured Output

OpenAI API can now return structured (JSON) output files to enable clients to get output in the JSON format they have supplied. This is possible with the new gpt-4o-2024-08-06 model OpenAI has just released. The new model ‘s also 50% cheaper in input tokens and 33% cheaper in output tokens as compared to the original GPT-4o.

Free DALL-E 3 Use

OpenAI now allows free tier users to generate two images per day with DALL-E 3. Previously, one had to have a Pro membership in order to use DALL-E 3.

GitHub

GitHub has announced GitHub Models, which allows users to develop using different AI models such as Llama 3.1, GPT-4o or Mistral Large 2 in a playground.

Stability AI

Stability AI has announced Stable Fast 3D, a model that can generate 3D asset from images. Unlike similar models that required minutes(Stability’s own model SV3D required 10 minutes), this model requires around 0.5 seconds to generate high-quality assets. It can be used for games and VR applications.

Stability has provided a figure showing how the model works.

Description of how Stability.AI’s Stable Fast 3D Works

Black Forest Labs - Flux

Black Forest Lab is a new European (German?) AI company which has just released Flux.1 in three variants (Schnell, Dev and Pro) and they claim to get better benchmarks than Stable diffusion. I tested their image generator and the results seem fine, but difficult to compare at this moment.

They also tease their next project, which is going to be text-to-video.

The Flux.1 open-weight model has 12B parameters and the claim is that it beats DALL-E 3 and Midjourney v. 6.

Image generated with flux.dev with a prompt from the author

xAI

Grok-2

xAI has released a new version of their Grok LLM named Grok-2. The model has two variants, Grok-2 and Grok-2 Mini, which are both available as Beta on the X platform.

Anthropic

Anthropic has now made Claude Artifacts available to all users, also on mobile. Artifacts are products produced during Claude chats, which the user may keep and further work on.

Back to Software Development

Discussion about this post