Meta AI: How the social media giant uses your public posts to train its virtual assistant


Meta, the company formerly known as Facebook, has recently launched a new virtual assistant called Meta AI, which can help users create digital stickers, edit photos, and chat with AI personalities. But how does Meta AI learn to do these tasks? According to Meta’s president of global affairs, Nick Clegg, the answer is: by using your public Instagram photos and Facebook posts.

Meta AI: A new assistant powered by public data

Meta AI is a new assistant that users can interact with like a person, available on WhatsApp, Messenger, Instagram, and coming soon to Ray-Ban Meta smart glasses and Quest 3. It is powered by a custom model that leverages technology from Llama 2 and Emu, Meta’s latest large language and text-to-image models. In text-based chats, Meta AI has access to real-time information through its search partnership with Bing and offers a tool for image generation.

Meta AI can also help users create their own digital stickers based on text prompts, edit photos with text instructions, and chat with AI personalities like the rapper Snoop Dogg acting as a “Dungeons & Dragons” Dungeon Master. These features are meant to showcase the potential of Meta AI as a creative and conversational assistant that can enhance users’ online experiences.

How Meta AI uses public posts to learn

To train Meta AI, the company used public Facebook and Instagram posts that include both text and photos, Clegg told Reuters in an interview. He said that the company tried to exclude datasets that have a heavy preponderance of personal information, and that it did not use private posts shared only with friends and family, or private messages on its platforms, for training.

Clegg explained that the public photos were used to train Emu, the text-to-image model that can generate realistic images from text descriptions. For example, if a user asks Meta AI to create a sticker of a unicorn wearing sunglasses, Emu draws on what it learned during training from those public photos about how unicorns and sunglasses look, and combines them into a new image.

The public text posts were used to train Llama 2, the large language model that can understand and generate natural language. For example, if a user asks Meta AI to chat with Snoop Dogg, the model draws on the language patterns it learned during training to imitate his style and vocabulary, and generates responses that sound like him.

The privacy and legal implications of using public posts

While Meta claims that it respects users’ privacy and does not use private posts or messages to train its AI, some users may still feel uncomfortable knowing that their public posts are being used for this purpose. They may wonder if Meta has their consent to use their posts, and if they have any control over how their posts are used.

Moreover, there may be legal challenges to using public posts to train AI, especially if the posts contain copyrighted content. Clegg acknowledged that he expects “a fair amount of litigation” to determine whether using copyrighted materials to train AI is protected by fair use. He said that Meta thinks it is, but he strongly suspects that this will play out in court.

Meta’s terms of service state that users own the content that they post as long as it does not infringe on someone else’s intellectual property rights. However, the terms also grant Meta a license to use, copy, distribute, and modify the content for its services. This may include using the content to train its AI, but it is unclear whether users are aware of this or have agreed to it.

