Barnard | Law Firm
© Barnard Incorporated. All Rights Reserved.

AI, Copyright, and Training Data: the Legal Questions that Remain

By Viteshen Naidoo

DeepSeek, Meta and OpenAI

Copyright issues surrounding AI training data remain as complex as ever. Recent developments involving DeepSeek, OpenAI and Meta have reignited concerns over how generative AI models are trained and whether their methods infringe existing copyright protections.

OpenAI currently faces multiple legal challenges over its alleged use of copyrighted materials to train the models behind ChatGPT. DeepSeek, in turn, has come under scrutiny for its approach to data acquisition and use, ironically from OpenAI itself, which has alleged that DeepSeek used outputs from OpenAI's models to train its own.

Most recently, Meta (formerly Facebook) has drawn attention over allegations that it deliberately sourced material from pirated libraries and other sources to train its models without attribution, without licensing, and without regard for copyright law.

This scenario raises significant legal and ethical concerns. From a legal standpoint, it further muddies the waters around liability and enforcement in the use of AI training data: it becomes difficult to identify which works were used, which in turn complicates the enforcement of copyright protections. The opacity of training data exacerbates these concerns, as companies often claim that their datasets are proprietary or were obtained through ambiguous means. This lack of transparency not only hinders copyright holders from asserting their rights but also raises broader questions about fair compensation for original creators.

The Lack of Transparency in AI Training Data

One of the most pressing challenges is the lack of transparency in how AI models acquire and use data, which makes it difficult for rights holders to identify infringements or assert their claims. The EU AI Act offers one possible solution by imposing specific requirements on training data, particularly for high-risk AI systems. Under the Act, developers of such systems must subject their training datasets to appropriate data governance, including documentation of where the data came from and examination for possible biases. Providers of general-purpose AI models must, in addition, put in place a policy to comply with EU copyright law and publish a sufficiently detailed summary of the content used for training.

This, however, is not the universal approach. The United States has notably taken a different stance, favouring lighter regulation to encourage AI innovation and competitiveness. Recent comments by U.S. Vice President JD Vance at the AI Action Summit in Paris in February 2025, warning Europe against "excessive" regulation of the AI industry, reinforce this divide. Vance cautioned that stringent regulatory frameworks like the EU AI Act could stifle technological progress, deter investment, and place European AI developers at a disadvantage compared to their U.S. and Chinese counterparts.

Given this perspective, it is highly unlikely that similar provisions, such as strict training data transparency requirements or punitive fines, will be incorporated into U.S. AI policy in the near future.

The Need for Legal and Policy Direction

The growing body of litigation surrounding AI training data signals an urgent need for judicial clarity and legislative reform. Should the European and U.S. approaches continue to diverge, the landscape for publishing and distributing creative works is likely to change drastically.

This may take the form of increasingly complex technological protection measures (TPMs), which would raise the cost of works and restrict access to them, or simply wider adoption of closed-access policies, further reducing the availability of information that is currently freely accessible.

The continuing divergence in approach raises important questions about how works are currently published and distributed. Will publishing develop into a closed, tightly controlled ecosystem in which content is locked behind proprietary barriers, or will it embrace a more open, yet legally accountable, model that balances innovation with copyright protection?

The decisions made by relevant role-players in the coming years will have drastic consequences on the accessibility, ownership, and control of digital works, ultimately shaping the future of creative expression and knowledge dissemination.

Viteshen Naidoo, Associate
28th February 2025
