AIVA
AIVA has released a new version of its model that supports “layer and section in-painting”, which lets users edit previously generated AIVA music in new ways. A section of the music can be selected and new music generated for each layer to extend the composition. It is also now possible to add layers that were not part of the original generation (e.g. adding percussion to the composition), with the notes generated by the model.
Kling
Kling is a new text-to-video generator from the Chinese company Kuaishou. It is positioned as a competitor to OpenAI’s upcoming Sora model and can produce 120-second videos at 1080p resolution.
Apple
Apple Intelligence
Apple announced a new AI capability named Apple Intelligence at its 2024 Worldwide Developers Conference (WWDC). It combines on-device foundation models with LLMs reachable over the Internet. Apple uses a semantic index on its devices to analyse and categorise tasks, which it then apportions to different AI-related workflows. If a task can be accomplished on the device, it is executed there (typically through an enhanced Siri or new capabilities in individual applications). If the task is too complicated for the on-device model, Siri asks whether it can send the task to ChatGPT. Apple and OpenAI have reached an agreement that allows Apple to use the free GPT-4o model for these tasks. Apple is also building a facility called Private Cloud Compute, so that information is sent to secure servers that offer far more security features than OpenAI’s servers, which are publicly available to all users.
Most of the features were not available when the first Developer Beta was released, but they will be rolled out gradually through subsequent beta versions. Apple Intelligence will require an iPhone 15 Pro or newer phone, or an iPad or Mac with an M1 or later processor, to take advantage of the Neural Engine on these chips.
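The routing between the on-device model and ChatGPT described in this section can be illustrated with a small sketch. This is not Apple's implementation: the task names, the `Request` type and the `route` function are all hypothetical, and the sketch leaves out the Private Cloud facility because the announcement does not make clear which tasks are assigned to it.

```python
from dataclasses import dataclass

# Hypothetical list of tasks the on-device model is assumed to handle.
ON_DEVICE_TASKS = {"summarize", "proofread", "rewrite", "search_photos"}


@dataclass
class Request:
    task: str  # assumed category assigned to the user's request
    text: str  # the user's input


def route(request: Request, user_allows_chatgpt: bool) -> str:
    """Return where a request would run under the assumed policy."""
    if request.task in ON_DEVICE_TASKS:
        return "on-device"
    # Too complex for the local model: nothing leaves the device
    # unless the user has explicitly agreed to send it to ChatGPT.
    if user_allows_chatgpt:
        return "chatgpt"
    return "declined"


print(route(Request("proofread", "Teh quick brown fox"), user_allows_chatgpt=False))
print(route(Request("plan_trip", "Three days in Lisbon"), user_allows_chatgpt=True))
```

The point of the sketch is the order of the checks: the local path is always tried first, and the external model is reached only after explicit user consent.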
Generative AI
Apple will include text generation and image generation in many of its applications, and will even give third-party developers access to the same functionality through its APIs. It will also release a standalone image generation application similar to OpenAI’s DALL-E.
Apple also announced that it will be able to use other generative AI models in the future, hinting at Gemini Pro.
Writing Tools
Users can rewrite, proofread, and summarize text almost everywhere in the system. This will most likely kill a few third-party tools such as Grammarly.
Transcription
Apple will be able to record and transcribe calls, although the other party will be informed immediately when this is happening.
Photos
Photos will support searching for photos by text description (this is already available in the beta version of macOS 15, but it is not clear whether it will use Apple Intelligence once that becomes available in later betas).
Google
NotebookLM
Google has released an AI-powered note-taking application named NotebookLM, powered by Google’s Gemini language model. Users can upload documents and take notes, and the application uses the underlying Gemini LLM to provide summaries and suggest questions.
V2A
Google has announced its V2A (video-to-audio) technology, which can generate audio tracks based on an existing video and textual prompts. This technology is paired with the previously announced video-generating system Veo.
V2A can generate an unlimited number of soundtracks for any video input. Optionally, a ‘positive prompt’ can be defined to guide the generated output toward desired sounds, or a ‘negative prompt’ to guide it away from undesired sounds.
Luma Labs
Luma Labs has released its text-to-video engine, Dream Machine, which can generate 120 frames in 120 seconds. The free tier allows up to 5 video generations per day, while paid memberships get priority and have no daily limits.
It has received positive reviews, especially for the smooth transitions between frames when the original video is extended through additional text prompts.
I generated a short video from a text prompt and extended it with an additional text prompt to check how well extension and continuity work.
Prompt: A young girl approaching a security gate with lots of guards, with a worried expression on her face, you can see posters depicting security threats. the girl sees a young woman walking towards the gate, runs and holds her hand. The guards approach the family and use some electronic gadgets to check whether they pose any security risks.
Well, the result is debatable. First of all, the girl in the second part of the video is a different girl. There are also some awkward movements from the young woman: she stumbles as she tries to hold the girl’s hand and then hits the wall in an odd motion. Suddenly the woman morphs into a guard, a second guard materialises and the man disappears, so there are some continuity problems, but I guess there will be improvements…
Runway ML
Runway has announced the third generation of its video generation tools (Gen-3 Alpha). The results are very impressive, reminiscent of recent tools such as Sora or Kling. Runway is also introducing a set of models it calls General World Models. It describes the concept as a model that builds an internal representation of an environment and uses it to simulate future events in that environment.
Gen-3 Alpha is not yet available and as such I could not try it out, but the samples on Runway’s web page are impressive.
NVIDIA
NVIDIA has released the Nemotron-4 340B family of LLMs. They appear to perform close to OpenAI’s GPT-4o and similar competitors with fewer parameters and faster training. The models use Grouped Query Attention (GQA), in which groups of query heads share key and value heads; this shrinks the key-value cache and can have a considerable effect on inference performance. Here is a paper explaining their model.
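To make the cache saving from Grouped Query Attention concrete, here is a minimal sketch of the arithmetic. The layer count, head counts and sequence length are illustrative assumptions, not Nemotron-4's actual configuration, and the function names are made up for this example.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


def kv_head_for(query_head, num_query_heads, num_kv_heads):
    """Index of the shared key/value head used by a given query head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


# Illustrative sizes only (assumed for this sketch).
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 8, 128, 4096

# Standard multi-head attention: every query head has its own K/V head.
mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, SEQ_LEN)
# GQA: each group of 4 query heads shares one K/V head.
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"MHA cache: {mha / 2**20:.0f} MiB, GQA cache: {gqa / 2**20:.0f} MiB")
print("query head 5 reads KV head", kv_head_for(5, Q_HEADS, KV_HEADS))
```

With these assumed sizes the cache shrinks by the ratio of query heads to key-value heads, which is why the choice of group size matters for long sequences and large batches.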
Together.ai
Researchers at Together.ai have used a Mixture-of-Agents architecture, in which several models answer a prompt and further models refine and combine those answers, to achieve GPT-4-like performance using less capable models. Here is a paper explaining their approach.
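The general shape of a Mixture-of-Agents pipeline can be sketched as follows. This is a simplified illustration rather than Together.ai's code: `Agent` is a hypothetical stand-in for a call to an LLM, the prompt template is invented, and the toy agents only exist so that the example runs without any model.

```python
from typing import Callable, List

# Hypothetical stand-in for "send this text to a model, get text back".
Agent = Callable[[str], str]


def build_input(prompt: str, previous: List[str]) -> str:
    """Combine the user's prompt with the previous layer's answers."""
    if not previous:
        return prompt
    refs = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(previous))
    return f"{prompt}\n\nResponses from other models:\n{refs}"


def mixture_of_agents(prompt: str, layers: List[List[Agent]], aggregator: Agent) -> str:
    """Run the prompt through layers of agents, then aggregate."""
    previous: List[str] = []
    for layer in layers:
        layer_input = build_input(prompt, previous)
        # Every agent in a layer sees the prompt plus all earlier answers.
        previous = [agent(layer_input) for agent in layer]
    # A final model synthesises the last layer's answers into one reply.
    return aggregator(build_input(prompt, previous))


def make_toy_agent(name: str) -> Agent:
    """Toy agent that reports how much context it received."""
    def agent(text: str) -> str:
        return f"{name} saw {len(text.splitlines())} line(s)"
    return agent


layers = [[make_toy_agent("a"), make_toy_agent("b")], [make_toy_agent("c")]]
print(mixture_of_agents("What is GQA?", layers, aggregator=lambda text: text))
```

In a real setup each agent would wrap a different model, and the aggregator would be prompted to merge and correct the candidate answers rather than echo them.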
Anthropic
Anthropic has released its newest model, Claude 3.5 Sonnet, reported to be on par with GPT-4o and Gemini 1.5 Pro. One distinctive feature of Claude is that it creates so-called Artifacts, which can then be used in a sort of collaborative environment.
Safe Superintelligence Inc.
Former OpenAI Chief Scientist Ilya Sutskever has established a new AI lab named Safe Superintelligence Inc. Not much is known about the new lab, but they are looking for people.
Lawsuits
Music Industry
The Associated Press reported that Sony, Universal and Warner have filed lawsuits against Suno AI and Uncharted Labs (developer of Udio) for allegedly infringing the copyright of their artists’ work.
OpenAI
OpenAI has released the ChatGPT desktop application for macOS, available to all users.
Imbue
Imbue has released a 70B-parameter model that reportedly outperforms OpenAI’s GPT-4o in zero-shot tasks. Imbue is an American company co-founded by Kanjun Qiu, a former chief of staff at Dropbox, and Josh Albrecht.