Midjourney
Midjourney 6.1
Midjourney released their updated version 6.1. According to Midjourney, the new version has more coherent images; much better image quality; more precise, detailed, and correct small image features; new upscalers, faster speed (25% faster), improved text accuracy; a new personalization model with improved nuance, surprise, and accuracy and a new mode which takes 25% longer to (sometimes) add more texture at the cost of reduced image coherence.
Web Generation
Midjourney now allows users to create images from their website and give them 25 free images. The created images are very detailed and impressive.
Google
Character.ai
Google has gone into a licensing deal with the founders of character.ai, a company running chatbots representing real and imaginary characters. Some employees will also join Google, but the company will exist in a somewhat revised structure.
Gemini Live
Google has announced that Gemini Live, which is basically a Gemini chatbot with voice capabilities similar to OpenAI’s GPT-4o is available on Android phones to Gemini Advanced subscribers and will be available soon for iOS users. It has 10 different voices to choose from. Gemini Live is going to be more integrated to the phones in future updates.
Gemini Gems and Updated Image model
Google has announced Gems, customized chatbots that can be trained on specific information. There are pre-defined Gems for specific tasks already in usable form.
They also released a new version of Imagen 3, their image generation model.
OpenAI
Advanced Voice Mode
OpenAI has enabled the Advanced Voice Mode for some ChatGPT Plus users. All Plus users will have this feature later this year.
Changes in Management
Several OpenAI top staff have left. OpenAI co-founder John Schulman has joined rival Antrhopic. Co-founder Greg Brockman has taken a sabbatical and Product Leader Peter Deng has left as well. This is the last of several management changes in the last year and the turmoil in OpenAI does not seem to end, especially after reports and analysis showing that OpenAI might be heading to bankruptcy given the high costs for its models and limited commercial return up to now.
Structured Output
OpenAI API can now return structured (JSON) output files to enable clients to get output in the JSON format they have supplied. This is possible with the new gpt-4o-2024-08-06 model OpenAI has just released. The new model ‘s also 50% cheaper in input tokens and 33% cheaper in output tokens as compared to the original GPT-4o.
Free DALL-E 3 Use
OpenAI now allows free tier users to generate two images per day with DALL-E 3. Previously, one had to have a Pro membership in order to use DALL-E 3.
GitHub
GitHub has announced GitHub Models, which allows users to develop using different AI models such as Llama 3.1, GPT-4o or Mistral Large 2 in a playground.
Stability AI
Stability AI has announced Stable Fast 3D, a model that can generate 3D asset from images. Unlike similar models that required minutes(Stability’s own model SV3D required 10 minutes), this model requires around 0.5 seconds to generate high-quality assets. It can be used for games and VR applications.
Stability has provided a figure showing how the model works.
Black Forest Labs - Flux
Black Forest Lab is a new European (German?) AI company which has just released Flux.1 in three variants (Schnell, Dev and Pro) and they claim to get better benchmarks than Stable diffusion. I tested their image generator and the results seem fine, but difficult to compare at this moment.
They also tease their next project, which is going to be text-to-video.
The Flux.1 open-weight model has 12B parameters and the claim is that it beats DALL-E 3 and Midjourney v. 6.
xAI
Grok-2
xAI has released a new version of their Grok LLM named Grok-2. The model has two variants, Grok-2 and Grok-2 Mini, which are both available as Beta on the X platform.
Anthropic
Anthropic has now made Claude Artifacts available to all users, also on mobile. Artifacts are products produced during Claude chats, which the user may keep and further work on.