Last week underscored the growing scrutiny of major players in the AI sector, particularly over copyright. Two significant legal cases, involving Meta and OpenAI respectively, raised pointed questions about the implications of training AI models on copyrighted material.
In the case against Meta, recently unsealed court documents reportedly indicate that employees sought to strip copyright information from various materials, including e-books obtained from the piracy site Library Genesis (LibGen). The filings suggest Meta officials knew pirated content was being used to train the Llama models: one internal communication recommended removing lines containing terms such as “ISBN,” “copyright,” and “all rights reserved.” Other internal discussions showed a competitive streak, with employees talking of surpassing rivals such as OpenAI’s GPT-4 and dismissing French competitor Mistral as “peanuts.”
In a December deposition, CEO Mark Zuckerberg addressed these practices, saying that while broad characterisations of the use of pirated content might look bad, he believed there were nuanced considerations involved. Asked whether Meta would engage with companies that openly acknowledge illegal activity, he said such admissions would give him significant pause. He noted that even platforms like YouTube host some content that may infringe copyright, but argued that a blanket ban on their use by Meta employees wasn’t feasible.
The documents also show that Meta knew Llama’s training data included material from LibGen and sources such as Common Crawl. Despite acknowledging possible repercussions under the EU AI Act for using such datasets, some internal communications proposed risk-mitigation strategies, including ‘red-teaming’ the datasets to filter out potentially harmful information.
Meanwhile, a lawsuit brought by The New York Times against OpenAI has sharpened the focus on copyright concerns tied to AI model training. Lawyers for the plaintiffs, among them the New York Daily News (which is pursuing a separate case against OpenAI and Microsoft), warned that the practices at issue could enable extensive copyright infringement, likening them to disabling a home security alarm system.
Amid these legal developments, two notable partnerships were announced in the AI media space. Axios said it would work with OpenAI to fund new local newsrooms in cities such as Pittsburgh and Kansas City, and to give Axios staff access to OpenAI’s enterprise technology. In a similar vein, the Associated Press (AP) announced a partnership with Google under which it will supply real-time information for Google’s Gemini app, with the aim of improving the accuracy and timeliness of news delivered to audiences worldwide.
Elsewhere in the AI news landscape, Apple suspended its AI-generated news alerts following backlash over inaccurate summarised notifications. Separately, a DoubleVerify report detailed a network of more than 200 websites posing as legitimate publishers while producing inaccurate AI-generated content.
On the innovation front, Anthrologic, a startup co-founded by former MediaMonks executives, launched with a focus on building AI agents for brands, while Adobe introduced a generative AI tool within its Firefly platform that gives retailers enhanced capabilities for creating personalised content.
Regulatory scrutiny is also intensifying. The U.S. Supreme Court upheld a law requiring TikTok to be sold to a domestic owner or face a ban, while the U.K.’s Competition and Markets Authority opened an investigation into Google’s search and advertising business, examining among other things whether AI startups can compete fairly. Meanwhile, the Federal Trade Commission referred its investigation into Snapchat’s My AI chatbot to the Justice Department, citing potential risks to young users.
These developments signal a dynamic period for AI automation, with ongoing legal challenges and industry partnerships reshaping the business and media landscape.
Source: Noah Wire Services