Part 1 – Open Source AI Models: How Open Are They Really?, Legaltech News

May 19, 2025

Although most generative AI models like ChatGPT and GitHub Copilot are proprietary, some AI providers have started to release AI models under an open source or similar licensing arrangement. Meta released its Llama AI model in early 2023, Google released Gemma in 2024, DeepSeek released R1 in January 2025, and there are several others. Using an open AI model can provide significant advantages, including avoidance of licensing fees and greater control over your data. However, adoption of open source AI is not as straightforward as using conventional open source software (OSS) because an AI model has more components than just the software. Building, understanding and modifying an AI model requires training data and the internal weights and parameters that are used for calculations during operation, and that is where open source AI starts to diverge from traditional OSS licensing.

Open Source Software Licensing

The OSS movement began back in the 1980s and 1990s when Richard Stallman of MIT founded the Free Software Foundation and published the GPL 2.0 open source license. The Open Source Initiative (OSI) was formed in 1998 with the objective, among others, of making OSS more accessible to a range of constituencies. Over the next few decades, the use of OSS increased substantially as programmers and businesses became more comfortable with the terms of the most popular OSS licenses. These include permissive licenses such as MIT, BSD and Apache 2.0, and copyleft licenses such as GPL 2.0, GPL 3.0 and Affero GPL. Permissive licenses allow users to copy, modify and distribute the source code while generally imposing only a few obligations, such as the requirement to include a copyright notice, attribution to the author, a copy of the open source license and a disclaimer of liability. Copyleft licenses such as GPL 2.0 and Affero GPL also allow users to use, modify and distribute the OSS, but they include more onerous requirements such as the obligation to distribute source code and apply the same terms to a modified or combined work, which can potentially ensnare proprietary software that is combined or linked with the OSS. These issues can be complex, but they only relate to one element: the software.

Open Source AI

Open source AI introduces more variables to the equation. In addition to source code to run the AI model, a developer who wants to reproduce or improve the AI model needs training data or at least detailed information on the training data, source code to train the model, internal weights and parameters, and documentation to describe the model architecture. Releasing only the software, as in the OSS paradigm, does not enable a developer to fully understand, reproduce and modify the AI model.

Recognizing the differences between OSS and open source AI, the OSI released an open source AI definition (OSAID) in October 2024. The objectives of the OSI in releasing the OSAID were to enable any user to use, study, modify and share the AI model for any purpose without having to request permission. See https://opensource.org/ai/open-source-ai-definition. The OSAID explains that the preferred form for making modifications to machine learning systems must include data information, code and parameters. Data information refers to detailed information about the training data that would enable a skilled person to build a substantially equivalent system. Code refers to the complete source code used to train and run the AI system. Parameters are the weights and parameters that are refined during training of the AI model and used in operation when the AI model makes its predictions.
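The three OSAID components described above can be pictured as a simple checklist. The following Python sketch is illustrative only: the `Release` structure, its field names and the `missing_components` helper are hypothetical and are not part of the OSAID or any OSI tool. It also ignores the licensing terms that a real compliance review would have to consider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """What an AI provider has published for a model (hypothetical structure)."""
    name: str
    data_information: bool  # detailed information about the training data
    code: bool              # complete source code used to train and run the system
    parameters: bool        # weights and parameters refined during training


def missing_components(release: Release) -> list[str]:
    """Return the checklist items that the release does not include."""
    required = {
        "data information": release.data_information,
        "code": release.code,
        "parameters": release.parameters,
    }
    return [label for label, present in required.items() if not present]


# A made-up release that publishes only the trained parameters.
example = Release("example-model", data_information=False, code=False, parameters=True)
print(missing_components(example))  # ['data information', 'code']
```

An empty list means only that all three components were published. It does not establish that a model satisfies the OSAID.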

The OSI’s objectives in enabling any user to use, study, modify and share an open source AI model will likely play an important role in accelerating the development of high quality, community generated AI models, like the OSS development process. In addition, the OSAID may facilitate greater visibility into “open washing,” i.e., the practice of releasing only one component of an AI model while claiming it to be fully open source.

The OSAID, however, may be considered a relatively high standard to meet due to its requirement to release all of the components of the AI system. Some AI providers aim to facilitate use of their AI models but have decided not to release all components at this relatively early stage in large language model (LLM) development. These considerations have led to a middle ground known as “open weights AI” which is starting to take hold.

Open Weights AI

In an open weights AI release, the AI provider releases the weights and parameters that are needed to run the AI model, but typically will not release the training data, detailed information on the training data, or the training algorithms. The rationale for an open weights release is that it provides a number of significant benefits to the user without requiring the AI provider to publicly disclose all of its trade secrets and know-how in the AI model. In particular, an open weights release enables a user to fine tune the AI model using its own additional training data to modify or add internal weights and parameters. This avoids the significant cost of training the AI model from scratch and facilitates rapid deployment of a customized AI model. These benefits may be desirable for companies needing a customized AI model that do not have the resources or expertise to build one from scratch.
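To make the fine tuning idea concrete, here is a deliberately tiny Python sketch. It assumes a toy model with only two weights (a slope and an intercept) and made-up data; a real LLM has billions of parameters and is fine tuned with specialized frameworks. The principle is the same, though: the user starts from the released weights and adjusts them with its own examples, without needing the original training data.

```python
def predict(weights, x):
    """Toy linear model: weights = (slope, intercept)."""
    slope, intercept = weights
    return slope * x + intercept


def mean_squared_error(weights, data):
    """Average squared gap between predictions and the expected outputs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(weights, data, learning_rate=0.05, steps=200):
    """Adjust the released weights with plain gradient descent on new data."""
    slope, intercept = weights
    n = len(data)
    for _ in range(steps):
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in data) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in data) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical stand-in for weights published by an AI provider.
released_weights = (2.0, 0.0)
# The user's own additional training examples as (input, expected output) pairs.
own_data = [(0.0, 1.0), (1.0, 3.5), (2.0, 6.0), (3.0, 8.5)]

tuned_weights = fine_tune(released_weights, own_data)
print("error before fine tuning:", mean_squared_error(released_weights, own_data))
print("error after fine tuning: ", mean_squared_error(tuned_weights, own_data))
```

The error on the user's data drops after fine tuning, even though the code never sees how the released weights were originally produced.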

On the other hand, an open weights AI release does not enable the user to fully understand, reproduce or adapt the underlying AI model, including any inherent biases, because the user does not have access to the training data or training algorithms. These limitations may be unacceptable for certain types of users such as academic researchers or government regulators who need to understand, reproduce and/or improve the model and identify any potential biases in the training data. See https://opensource.org/ai/open-weights. These limitations may also make it impossible for a developer to know whether the training process used copyrighted works without permission.

Openness of AI Models

So, what is the current landscape for some of the well-known AI models that are open source or open weights? DeepSeek released its R1 model in January 2025 under the permissive MIT license, along with a technical paper explaining its Mixture of Experts architecture. See https://arxiv.org/abs/2501.12948. The release caused significant disruption in the tech industry based on claims that R1 performs on the same level as well-known LLMs but was trained for approximately $6 million, a fraction of the cost of training similarly performing LLMs. DeepSeek R1 is considered an open weights model because DeepSeek released the weights and parameters, but not the underlying training data. Other AI models that may be considered open weights include Mistral AI’s Pixtral Large, xAI’s Grok-1 and Alibaba’s Qwen 3.

The list of true open source AI models according to the OSI is relatively short. The OSI reports the following models as being validated for compliance with the OSAID: Pythia (Eleuther AI), OLMo (AI2), Amber and CrystalCoder (LLM360) and T5 (Google). The OSI also indicates a few others would probably pass if they changed their licenses and legal terms. See https://opensource.org/ai/faq.

Evolving Strategies

It hasn’t been long since OpenAI amazed the world with ChatGPT in November 2022, and the introduction of open source AI models and open weights AI models is even more recent. Both AI providers and AI users are looking for the most advantageous ways to deploy these resources, and in the current environment, the open weights AI models seem to strike a balance that works for some AI providers and AI users. The AI industry is at an early stage in terms of settling on standard practices for openness in releasing and licensing the various components of AI models. For those businesses and users who have the necessary resources and technical expertise, the open source and open weights AI models are a welcome development following the positive trend of open source software. For others, use of proprietary AI models makes more sense.

In addition to the technical challenges of building, fine tuning and operating open source or open weights AI models, there are other important considerations relating to the legal risks and legal terms governing use of open models vs. proprietary models. This is the subject of Part 2 in this series, which will address those legal and practical challenges and advantages of open vs. proprietary AI models.


Reprinted with permission from the May 19, 2025 issue of Legaltech News. Further duplication without permission is prohibited. All rights reserved.
