Amazon
Amazon is updating its Alexa voice assistant using Anthropic’s LLM. The new version will be a paid service, while the classic Alexa will stay free.
xAI
xAI’s new supercomputer for training AI models is now online and is named Colossus. Colossus consists of 100,000 Nvidia H100 GPUs.
xAI announced that a bigger supercomputer is now being built.
SSI
Safe Superintelligence (SSI), the new venture of OpenAI co-founder Ilya Sutskever, has raised $1 billion in funding to develop safe AI.
Qwen
Qwen, the AI model team of Chinese tech company Alibaba, released Qwen2-VL, a vision-language model that can understand long videos. It comes in three sizes, with 2B, 7B and 72B parameters.
OpenAI
OpenAI o1
OpenAI has released a new model that is especially good at reasoning. The OpenAI blog claims that the model performs well on certain reasoning tasks, although it lacks some of the features the GPT-4 models have, such as searching the web for specific information. Although OpenAI has not said so, there are indications that this is the so-called “Strawberry” model that OpenAI, and Sam Altman in particular, has been hinting at. The new model comes in two versions, Preview and Mini, both currently available to ChatGPT Plus users. The Mini version will be available to the free tier soon.
Administrative Changes
OpenAI is reportedly being restructured into a for-profit company, a change from its original status as a non-profit organization. In the meantime, CTO Mira Murati has left the company, along with two other senior leaders.
Advanced Voice Mode
Advanced Voice Mode is now being rolled out to the Plus and Team tiers of paying customers. The mode now has five new voices, a memory function (ChatGPT remembering earlier chats) and improved accents.
Mistral
Pixtral 12B
Mistral has released its first multimodal model. It has a context window of 128K tokens and can handle 1024x1024 pixel images. According to Mistral, the model is well suited to optical character recognition (OCR) and information extraction.
Google
NotebookLM
Now, this is an outright scary new AI feature by Google.
You can test this feature on the NotebookLM page. If you upload one or more documents, Google can create a podcast based on them with two AI podcast hosts, and the result is amazingly realistic. I tested this with one of my recent Substack posts, which described one of my newsletters and its contents on the first anniversary of its launch.
There were a few small inaccuracies in the podcast, but other than that, the “deep dive” was spot on and it made a lot of sense. You can listen to the podcast below.
The feature is called “Audio Overview” and uses the multimodal Gemini 1.5 model to generate the podcast (presumably by first generating a script from the provided source documents).
Gemini 1.5 Pro
Google has updated the Gemini 1.5 Pro model with lower prices and higher rate limits.
Microsoft
Copilot 2
Microsoft has announced several new Copilot features as part of Copilot 2.
Copilot Pages is a new dynamic canvas for multiplayer collaboration, where a team can work with Copilot to collect artifacts. It acts as a hub where both Copilot and human team members can create and share artifacts.
Copilot's functionality has been improved across the Office 365 products. Outlook can summarize emails; Excel can now process both text and numbers and has Python embedded; PowerPoint can create presentations from prompts; Teams can transcribe calls as they happen; Word can generate text from prompts; and so on.
Copilot Agents are AI assistants designed to automate and execute business processes. Agents can be defined through the agent builder.
Kuaishou
Kling AI 1.5
Kuaishou has released version 1.5 of its text-to-video generation tool Kling AI. It can now produce 1080p videos and can generate four videos simultaneously, letting the user choose among them.
Meta
Llama 3.2
Meta has announced a new set of versions of its open-source Llama models. Llama 3.2 comes in four sizes. The 11B and 90B models are small and medium-sized multimodal vision models, whereas the 1B and 3B models are lightweight and text-only.