Meta
Emu Video
Meta has released their text-to-video model Emu Video, following in the footsteps of their text-to-image model Emu released earlier in 2023. Emu Video first generates an image conditioned on a text prompt and then generates a video conditioned on both the text and the generated image. This “factorised”, or split, approach lets Meta train video generation models efficiently. The system uses two diffusion models and was trained on 35 million text-video pairs.
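Meta has not released Emu Video as code, so the snippet below is only a toy sketch of the factorised data flow described above; the two stage functions are placeholder stubs I wrote for illustration, not Meta's models.

```python
# Illustrative sketch of a factorised text-to-video pipeline:
# stage 1 makes a keyframe from text, stage 2 makes a video from text + keyframe.
# Both "models" here are placeholder stubs, not Emu Video itself.
import numpy as np

def text_to_image(prompt: str) -> np.ndarray:
    """Stage 1 stand-in for a text-conditioned image diffusion model."""
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return rng.random((512, 512, 3))  # placeholder "generated" image

def image_and_text_to_video(image: np.ndarray, prompt: str, frames: int = 16) -> np.ndarray:
    """Stage 2 stand-in for a diffusion model conditioned on the prompt and the image."""
    return np.stack([image] * frames)  # placeholder: repeat the keyframe

prompt = "a dog surfing a wave at sunset"
keyframe = text_to_image(prompt)                   # first diffusion model
video = image_and_text_to_video(keyframe, prompt)  # second diffusion model
print(video.shape)                                 # (frames, height, width, 3)
```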
Emu Edit
Meta also announced Emu Edit, which takes an image and edits it based on text instructions. Emu Edit uses few-shot learning to adapt to editing tasks it has not seen before.
Pika Labs
Pika Labs released their text-to-video model Pika. At this point there is not much public detail about the model, and Pika Labs is running a waitlist for anyone who wants to try it.
Google
Gemini
Google and Google DeepMind announced their new model Gemini, which comes in Ultra, Pro and Nano sizes. Google claims that Gemini Ultra slightly outperforms GPT-4 on most benchmarks. Gemini is natively multimodal, so a single prompt can mix text, images, audio and video.
Google simultaneously announced AlphaCode 2, a new version of their code generation model built on Gemini.
Google has deployed Gemini Pro in their Bard chatbot, and the Gemini Nano model will run on-device on their Pixel phones.
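For anyone who wants to experiment with Gemini Pro directly, here is a rough sketch using Google's generative AI Python SDK. The model names and exact calls are my assumptions based on the SDK shipped alongside Gemini, so double-check them against the official documentation.

```python
# Sketch of text and multimodal prompting via the google-generativeai SDK
# (pip install google-generativeai); model names are assumptions, verify in the docs.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

# Text-only prompt against Gemini Pro.
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Summarise the differences between Gemini Ultra, Pro and Nano.")
print(response.text)

# Multimodal prompt: an image plus a text question.
vision = genai.GenerativeModel("gemini-pro-vision")
reply = vision.generate_content([Image.open("chart.png"), "What does this chart show?"])
print(reply.text)
```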
Apple
MLX
Apple has released MLX, a machine learning framework with Python and C++ APIs, as an open-source library on GitHub. Although Apple does not seem to have built their own AI models, MLX is built for Apple hardware and can run popular models like LLaMA, Mistral or Stable Diffusion.
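Here is a minimal sketch of what MLX's Python API looks like, assuming the mlx package is installed (pip install mlx) on an Apple silicon machine. The point is that arrays live in unified memory and computation is lazy until you ask for it.

```python
# Minimal MLX sketch: arrays in unified memory, lazy evaluation.
import mlx.core as mx

a = mx.random.normal((1024, 1024))      # array allocated in unified memory
b = a @ mx.transpose(a) + 1.0           # builds a lazy compute graph, nothing runs yet
mx.eval(b)                              # force evaluation of the graph
print(b.shape, b.dtype)
```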
Mistral
Mixtral
The open-source LLM developer has released Mixtral, a sparse mixture-of-experts (SMoE) model with eight experts per layer and roughly 47 billion total parameters, of which about 13B are active for any given token at inference. Sparse mixture-of-experts models decouple model size from inference cost by activating only a small subset of the parameters for each input token (see this paper for more details on SMoE). In an MoE layer, a small router network scores the expert feed-forward blocks and sends each token to only the top few of them, so a token touches far fewer parameters than it would in a dense network of the same total size.
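To make the routing idea concrete, here is a toy NumPy sketch of top-2 expert routing. It is purely illustrative and not Mixtral's actual implementation.

```python
# Toy sparse MoE feed-forward layer: route each token to its top-2 experts.
import numpy as np

def top2_moe_layer(x, gate_w, experts):
    logits = x @ gate_w                       # router scores, shape (tokens, n_experts)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top2 = np.argsort(logits[t])[-2:]     # indices of the 2 highest-scoring experts
        weights = np.exp(logits[t][top2])
        weights /= weights.sum()              # softmax over just the selected experts
        for w, e in zip(weights, top2):
            out[t] += w * experts[e](x[t])    # weighted sum of the chosen experts' outputs
    return out

# Tiny demo: 4 tokens, model width 8, 4 experts implemented as random linear maps.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda v, W=rng.standard_normal((d, d)): v @ W for _ in range(n_experts)]
x = rng.standard_normal((4, d))
gate_w = rng.standard_normal((d, n_experts))
print(top2_moe_layer(x, gate_w, experts).shape)   # -> (4, 8)
```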
Benchmark results show Mixtral performing on a par with GPT-3.5 and Llama 2 70B.
OpenHermes 2.5
I’ve also come across a Mistral-based model named OpenHermes 2.5, released in October 2023. It is a fine-tune of Mistral 7B, trained on 1,000,000 entries of primarily GPT-4 generated data as well as other high-quality open datasets, plus an additional 100K examples for code generation. It performs well on many benchmarks, notably code generation.
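If you want to try it, something like the following should work with the transformers library. The Hugging Face model ID (teknium/OpenHermes-2.5-Mistral-7B) and its chat template support are assumptions on my part, so check the model card on the hub.

```python
# Sketch of running OpenHermes 2.5 locally with transformers;
# the model ID is an assumption, verify it on the Hugging Face hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "teknium/OpenHermes-2.5-Mistral-7B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(inputs, max_new_tokens=128)
print(tok.decode(output[0], skip_special_tokens=True))
```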
Distillery
Distillery is a new open-source image generation service that has been compared to Midjourney. It was built by a company called FollowFox, is based on Stable Diffusion 1.5, and its latest model, called Cosmopolitan, was trained on Midjourney data. The alpha version can be tried through a Discord invite and gives you 10 image generations per day. The results are decent, though I have not tried detailed prompts yet.
Writesonic
Although this was released in December 2022, I only just discovered it. Writesonic is an AI company based in India. It has released Chatsonic, an AI chatbot built on top of ChatGPT that is also connected to Google search, so it can pull in up-to-date information while generating text and images. The free tier gives the user 10,000 “premium words” and uses GPT-3.5, while the “Superior” plan, aimed at small teams, uses GPT-4.