A comprehensive comparison of AI service licenses and tools for coding and business building.

Are you a business owner developing software with AI tools? Have you considered whether, by submitting data to an AI model or service, you are handing sensitive business data to a third party – and consenting to its further processing? Or perhaps the software house you work with uses such tools – is it feeding your code to AI models that can learn about your business without your consent?

These are just examples – there are many scenarios in which you create software that supports your business, or even serves as its foundation, using artificial intelligence. The legal status of AI is currently unresolved worldwide. There are many open issues, not only around copyright in generated content (code, sounds, images), but also – something the world seems to have forgotten lately – around how models are trained (who remembers that Meta trained its models on books from illegal torrents?). People also forget that AI has access to the data you share with it (just think of Google’s push to give Gemini access to your email and calendar). The topic is very broad, so in this article we will focus solely on the business aspects of programming: copyright in code, and the processing of data (here: everything provided in the prompts) by models for training purposes.

Problematic cases of AI use in business

Case 1 – authorship of the generated code

The first issue I want to address is AI-generated code. Let’s look at it from the perspective of a developer who can feed various data sets into an AI model:

  • A prompt containing a verbal description of the required functionality
  • A prompt containing pseudocode to be rewritten into functional code
  • A prompt containing part or all of the repository (so that the AI has context and a better understanding of what to write) with instructions to add some functionality
  • A prompt containing a description of the business and its strategic elements (so that the AI understands the context) with a request to write a function

Regardless of the fact that the quality of the AI’s response will vary in each case, each of these queries will yield ready-made functionality. But a question arises: who is the author of the generated code? The business owner, the programmer, or the AI? This question is crucial, because only the holder of the copyright can transfer it to another party. If we assume that the programmer is not the author, then they have no right to transfer such code to the client.

Case 2 – what do we pass to the AI?

The previous example hides another aspect: what happens to the data sent to the AI model – effectively a third-party company? While some of these prompts may not raise any concerns, others can prove disastrous. Consider the following types of queries:

  • “Create a function that will calculate the power of the number given in the parameter”
  • “I’m developing software with integration with a healthcare system. Create a function that will redirect the user from the payment panel to the prescription refill panel.”
  • “I have an idea for a product: [product description]. Your assignment is to create a code prototype for this business.”
  • “I’m sending you my current code. Please check it for security.”
  • “I’m sending you my user database, which is running slowly. Suggest optimizations that will speed up queries.”

I think, as a careful reader, you already know where I’m heading. While in the first example I’m passing on publicly available knowledge, with each subsequent example I’m handing the model increasingly sensitive information, up to and including user data (which, by implication, contains so-called PII – Personally Identifiable Information – and therefore falls under the GDPR). Where should we draw the boundary of what can be passed to the model? The short answer is unsatisfying: it depends – and requires further analysis, which I will, of course, conduct later in this article.
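A policy of “what not to send to the model” can be partially enforced in tooling. Below is a minimal, illustrative sketch of redacting obvious personal data from a prompt before it leaves your infrastructure. The patterns are my own assumptions, not a complete PII detector – production use would require a dedicated tool and a human review step.

```python
import re

# Illustrative patterns only -- real PII detection needs far more than this.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(prompt: str) -> str:
    """Replace recognizable personal data with placeholder tokens."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

print(redact("Contact jan.kowalski@example.com or +48 123 456 789"))
# → Contact [EMAIL] or [PHONE]
```

A redaction step like this can sit in a proxy between developers and the AI provider, so the policy does not depend on each programmer remembering it.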

Case 3 – who is responsible for errors?

Let’s fast forward for a moment and consider the third element – liability. Let’s say we’ve resolved the first two aspects and the code has been legally deployed to a production server. But then a security flaw surfaces, data leaks, the business is sued, and the blame game begins:

  • The business claims that the software house is to blame for the error.
  • The software house places responsibility on the developer, who – perhaps without authorization – used AI to create the dangerous code.
  • The developer claims the error is not his fault: it is either the tester’s, who failed to detect it, or the AI provider’s, which “produced” the code from his query.

Regardless of the court’s verdict, can we determine where responsibility actually lies? That’s why it’s so important for software houses to have standards and procedures in place for working with AI tools, just like at Sailing Byte, where developers know what they can use and to what extent.

Beyond liability itself, there are other security aspects. A model may respond with outdated (and potentially dangerous) design patterns. A model may also produce code that is incompatible with the security model of the architecture into which it is deployed.

Case 4 – license risk

Currently, AI models can search for information on the internet (this is a mental shortcut, but the practical effect of the mechanism is what matters). Various information sources and source code are available online under various licenses:

  • GPL/AGPL licensed repositories – which require open source code for derivatives
  • Apache/MIT/BSD licensed repositories – which in most cases can be safely used in commercial projects
  • Creative Commons licensed materials – which require, for example, attribution to the author
  • Code leaks or code fragments of commercial projects that are fully copyrighted

AI can leverage “internal” knowledge and “external” sources to produce developer-ready code. However, the developer may not even be aware that some code fragments may violate licensing terms. The long-term consequences for businesses are easily imagined.
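A first line of defense can be automated: scanning generated code for verbatim license markers before it enters the repository. The sketch below is a rough smoke test under the assumption that problematic fragments carry recognizable license phrases – it will not catch rewritten GPL code and is no substitute for legal review or a dedicated scanner.

```python
# Marker phrases for a few license families; illustrative, not exhaustive.
LICENSE_MARKERS = {
    "GPL": ["GNU General Public License", "GPL-2.0", "GPL-3.0"],
    "AGPL": ["GNU Affero", "AGPL-3.0"],
    "Apache": ["Apache License", "Apache-2.0"],
    "CC": ["Creative Commons", "CC BY"],
}

def flag_licenses(generated_code: str) -> list[str]:
    """Return the license families whose markers appear verbatim in the code."""
    found = []
    for family, markers in LICENSE_MARKERS.items():
        if any(marker in generated_code for marker in markers):
            found.append(family)
    return found

snippet = "# Licensed under the GNU General Public License v3\ndef f(): ..."
print(flag_licenses(snippet))  # → ['GPL']
```

A check like this can run in CI, so a pasted GPL header at least raises a flag before the code ships.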

Similar examples could be multiplied, but what is crucial at this point is to analyze what can be done about the problem.

Case Summary

Below is a brief summary of these four cases in tabular form.

| Problem axis | Main risks for the business | How to regulate it between business and software house | Regulated in the AI model license? |
|---|---|---|---|
| Who is the author and who can transfer rights to the code? | Inability to effectively transfer rights, authorship disputes, limited IP protection | Contract provisions on the use of AI, authorship, and the transfer of copyright | Yes (declarations regarding output rights) |
| What data can be sent to an external model and where should the boundaries be drawn? | Disclosure of trade secrets, GDPR violations, leakage of strategic information | Policies on what must not go into AI prompts, data masking | Partially (processing rules, no training on data) |
| Who is responsible for damage caused by incorrect code? | Contractual and tort liability for errors, security incidents, customer and regulatory claims | QA and security procedures, AI usage logging, liability clauses in contracts | Partially (limitation of AI provider liability) |
| Where does AI output “come from” and what licenses may be attached to it? | Unintentional license violations, code redistribution, copyright disputes | License control processes, tagging of AI-generated code, licensing-risk and indemnity clauses in contracts | Partially (declarations and indemnification, but not full elimination of risk) |

As this summary shows, two aspects emerge that business owners and software house owners should consider: first, what the contract with the client should say about the use of AI in the project; and second, what licensing provisions an AI service must contain for a software house to be able to use it. The first element is strictly a matter between the software house and the business and would be difficult to analyze here, but we can tackle the second: comparing AI licenses and what can and cannot be done under them, depending on the provider used.

Providers of models and development tools

The Role of HuggingFace in Model Licensing Analysis

HuggingFace is a leading platform that provides access to, and hosting for, numerous open-source models. In addition to base models, it hosts a multitude of so-called “fine-tuned” models and their quantizations (which allow models to run on far less powerful hardware with minimal loss of quality). However, other aspects of HuggingFace aside, what interests us here are the licenses found on the portal – in order of preference, these are:

  • Apache 2.0 (e.g. Flux2)
  • MIT (e.g. GLM or DeepSeek-OCR)
  • OpenRAIL (e.g. Supertonic-2 TTS)
  • CC-BY-NC or CC-BY-SA (for example NLLB-200)
  • Llama (in various versions, of course Llama models and derivatives)
  • Gemma (Gemma models and derivatives)

When analyzing a specific model that builds on other models – for example an MoE (Mixture of Experts) or a fine-tuned model – it is worth checking not only the license of the model itself but also the licenses of the models it is based on: it may turn out that the model’s license was chosen incorrectly.
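That chain check can be mechanized. The sketch below uses entirely hypothetical model names and license assignments (in practice you would read them from the actual model cards on HuggingFace) to show why checking only the top-level license is not enough:

```python
# Hypothetical derivation graph and license labels -- always verify against
# the real model cards; these names are made up for illustration.
BASE_OF = {
    "my-finetune": ["base-model-a", "base-model-b"],
    "base-model-a": [],
    "base-model-b": ["foundation-model"],
    "foundation-model": [],
}
LICENSE_OF = {
    "my-finetune": "apache-2.0",
    "base-model-a": "mit",
    "base-model-b": "apache-2.0",
    "foundation-model": "cc-by-nc-4.0",  # non-commercial!
}

def license_chain(model: str) -> dict[str, str]:
    """Collect the licenses of a model and everything it is derived from."""
    chain = {model: LICENSE_OF[model]}
    for base in BASE_OF.get(model, []):
        chain.update(license_chain(base))
    return chain

print(license_chain("my-finetune"))
```

In this made-up example the apache-2.0 label on the fine-tune hides a cc-by-nc-4.0 ancestor – exactly the “license selected incorrectly” situation described above.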

Hugging Face has the advantage of also providing so-called “uncensored” models, which may be important in some business applications.

Generic AI Providers

The first category to analyze are the major AI vendors we know of, for example:

  • ChatGPT by OpenAI
  • Gemini by Google
  • Claude by Anthropic

Note, however, that the terms of use may differ depending on whether we use the model “directly” (for example, via chat on the ChatGPT website) or via an API key (where there may be additional settings – for example, for the European region!). Where the differences are significant for the scope examined here, a separate entry appears in the “AI API Providers” section.

AI API Providers – Model as a Service

This category covers providers that offer APIs for model use, but only where the conditions differ from the consumer offering. For example, an entry for the OpenAI API appears here, but not for Perplexity, as the terms of use for its API are practically the same as for a “regular” client.

It is worth noting that while most providers use industry standards (such as MCP or OpenAI API), there are exceptions that are not always compatible (such as Perplexity).
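Because so many providers expose OpenAI-compatible endpoints, switching vendors is often just a matter of changing the base URL and API key. A minimal sketch of building such a request – the self-hosted URL is a placeholder for something like a LiteLLM proxy, not a real endpoint:

```python
import json

# Placeholder base URLs -- check each provider's documentation for the
# actual endpoints and authentication requirements.
PROVIDERS = {
    "openai": "https://api.openai.com/v1",
    "self-hosted": "http://localhost:8000/v1",  # e.g. a LiteLLM proxy
}

def chat_request(provider: str, model: str, prompt: str) -> tuple[str, str]:
    """Build the endpoint URL and JSON body for an OpenAI-style chat call."""
    url = f"{PROVIDERS[provider]}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

url, body = chat_request("self-hosted", "my-local-model", "Hello")
print(url)  # → http://localhost:8000/v1/chat/completions
```

This compatibility is precisely what makes it feasible to swap a US-hosted provider for an EU-hosted or self-hosted one without rewriting application code.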

AI Assistants for Developers and Code Analysis

Of course, besides general-purpose models (which can be used not only for coding but also for business analysis within a company), there are also dedicated tools. One might instinctively expect the terms of use for such tools to be tailored to code processing – but we’ll verify that. This group includes, among others:

  • Github Copilot
  • Claude Code
  • Cursor

Alongside these, there are tools that analyze code rather than act as a programmer’s “assistant”, for example:

  • Sourcegraph Cody
  • Greptile

General Rules for Using Downloadable AI Models

Let’s consider how downloadable models can be used – that is, all the models you can find on HuggingFace and install “somewhere.” Models come in different sizes; the general rule is that the larger the model, the better, but also the better the hardware it requires. Not every model can run on home hardware, and the cloud isn’t always suitable for business. So:

  • Relatively small models can be run on a high-powered home PC, so technically, the programmer could have them on their own computer. In this case, most of the issues I’m addressing in this article don’t arise.
    • I described the “homemade” LLM tools in this article
  • For larger models there are three main options:
    • renting processing power (for example, graphics cards) by the hour from a service provider (such as OVH) – which, however, requires the ability to install such models yourself
    • model hosting services (such as those provided by HuggingFace) – which, however, requires analyzing the data processing conditions of the given service
    • using intermediaries that host LLMs (such as OVH Public Cloud AI Endpoints or LiteLLM)
  • There is also an option to purchase hardware (for example, Nvidia computing graphics cards) and host it yourself – but this is an extreme solution for specific cases that I will not consider here

Such solutions, of course, provide the greatest possible privacy. But will the cost of hosting, maintaining, and managing such tools yourself pay off? This is something each business must calculate for itself, as it depends heavily on scale and usage.
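As a starting point for that calculation, here is a back-of-the-envelope break-even sketch. Every number below is a hypothetical assumption – substitute real quotes from your providers before drawing any conclusions:

```python
# All figures are assumed placeholders, not real prices.
GPU_RENTAL_PER_HOUR = 2.50       # assumed hourly GPU rental (USD)
HOURS_PER_MONTH = 730            # a GPU rented around the clock
API_COST_PER_1M_TOKENS = 10.00   # assumed blended API price (USD)

def breakeven_tokens_per_month() -> float:
    """Monthly token volume above which renting a GPU beats paying per token."""
    monthly_gpu_cost = GPU_RENTAL_PER_HOUR * HOURS_PER_MONTH
    return monthly_gpu_cost / API_COST_PER_1M_TOKENS * 1_000_000

print(f"{breakeven_tokens_per_month():,.0f} tokens/month")
```

Under these assumed numbers, self-hosting only pays off above roughly 180 million tokens a month – which is why the answer really does depend on scale and usage, and why staff time for installation and maintenance (not modeled here) must be added on top.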

Read which tool is best for hosting local models at: https://sailingbyte.com/blog/the-ultimate-comparison-of-free-desktop-tools-for-running-local-llms/ .

Because these models are downloadable, I’ve dedicated a separate section to analyzing the licensing of the models themselves. An alternative, however, is to host such models in the cloud – for example, OVH AI Endpoints or Google Vertex AI – and those services are compared alongside the others mentioned here.

A brief analysis of the licenses of some of the available models

It’s important to understand that when using an AI tool we must comply with several licensing “layers”: one layer is the provider (compared later in this article), and the second is the model itself. While it makes little sense to analyze model licenses for things like GDPR compliance, it does make sense to analyze input and output rights. I conducted such an analysis, and here are the results:

| License / model | Output rights | Legal liability | Additional comments |
|---|---|---|---|
| Apache 2.0 (Flux.2) | Full output rights | “AS IS”, no warranty | Commercial use OK if the license is retained |
| MIT (DeepSeek-OCR) | Full output rights | “AS IS” | The most permissive |
| OpenRAIL-M | User retains rights | User is liable | “Responsible use” restrictions must be passed on |
| CC-BY-NC (NLLB-200) | Non-commercial only | Breach terminates the license | No SaaS/API use |
| Llama License | Commercial use OK; training competing models prohibited | “AS IS”; >700M users/month requires Meta’s consent | Requires “Built with Llama” attribution |
| Gemma License | Output OK; restrictions on model derivatives | “AS IS” | Restrictions must be passed on |
| Commercial (Gemini 3, GPT-5.2, Claude 4.5, Grok 4.1) | Output rights (API only) | Vendor disclaims liability | No weight modifications |

Example scenarios for these licenses:

| License or model | Fine-tuning for SaaS clients | Selling an API based on the model | Training your own LLM on outputs | Redistributing a modified model |
|---|---|---|---|---|
| Apache 2.0 (Flux2) | ✅ | ✅ | ✅ | ✅ (with NOTICE) |
| MIT (GLM, DeepSeek-OCR) | ✅ | ✅ | ✅ | ✅ (with copyright notice) |
| OpenRAIL (Supertonic-2) | ⚠️ (pass on the clauses) | ⚠️ (pass on the clauses) | ⚠️ (responsible use) | ⚠️ (pass on the clauses) |
| CC-BY-NC/SA (NLLB-200) | ❌ (NC blocks commercial use) | ❌ (NC blocks commercial use) | ⚠️ (share-alike on data?) | ❌ (NC blocks commercial use) |
| Llama (Llama 3+) | ✅ (up to 700M users/month) | ✅ (up to 700M users/month) | ❌ (only for Llama derivatives) | ⚠️ (with “Built with Llama”) |
| Gemma (Gemma 2+) | ⚠️ | ⚠️ (pass on the restrictions) | ❌ (distillation ban) | ⚠️ (pass on the restrictions) |
| Gemini 3 (Google) | (via API only) | (via API only) | (via API only) | (via API only) |
| GPT-5.2 (OpenAI) | (via API only) | (via API only) | (via API only) | (via API only) |
| Claude Sonnet 4.5 (Anthropic) | (via API only) | (via API only) | (via API only) | (via API only) |
| Grok 4.1 (xAI) | (only via API/xAI) | (only via API/xAI) | (only via API/xAI) | (only via API/xAI) |

As you can see, the models most open to commercial use are those under the Apache and MIT licenses and, in some cases, the Llama and Gemma 2 licenses. When using “fully commercial” models, however, the best option may be an intermediary that connects to (or provides) the API of the given model.

Evaluation of suppliers for problematic aspects

For selected providers, I collected data regarding data processing location, authorship, and liability. A question that is practically impossible to answer simply is “licensing risk” – every model and every solution that draws on internet resources is subject to it. In my opinion, there is a business niche here that could be filled by a tool that verifies code legality (I’m sure such tools exist, but that’s beyond the scope of this article).

Moreover, each provider had to be considered from several perspectives: a regular user, an enterprise user, and API access (where available, of course) – hence the resulting table is so large that I had to divide it into sections to make it easier to read.

General Note on Enterprise Plans

Virtually every vendor analyzed has some form of “enterprise” option that is “negotiable,” but few specify up front what to expect. If you’re looking for an “enterprise” or “team” solution, you will therefore most likely need to contact the vendor directly for a specific, detailed offer. Personally, I believe a vendor’s paid tier can serve as a benchmark: whatever approach a vendor takes with “smaller” customers, a similar approach can be expected with “larger” ones. I mention this because every “enterprise” item in the tables must ultimately be verified in an actual, negotiated agreement rather than taken from marketing generalities – virtually every element analyzed here is negotiable with the vendor at the enterprise level (I also note this here to avoid repeating it throughout the tables).

Place of data processing and level of compliance with GDPR

While local models naturally process data on the user’s own computer, “cloud” providers differ in their processing locations. On top of that comes each provider’s declaration of GDPR compliance (or lack thereof). The situation is more complicated with general AI providers, where the data processing region cannot be controlled (as it typically can with APIs and LLM hosting).

Generic AI Providers

Most popular models (ChatGPT, Gemini, Claude) in consumer plans process data globally or in the US, without guaranteeing EU residency, forcing reliance solely on general privacy policies and standard contractual clauses regarding GDPR. The situation changes dramatically in enterprise plans, where giants like OpenAI and Google offer a choice of processing locations (including European regions) and the signing of dedicated data processing agreements (DPAs). The exception is the European Mistral, which by default hosts data in the EU for both consumers and businesses, ensuring “native” GDPR compliance without the need for complex configuration.

AI API Providers

For API services (OpenAI, Claude), the standard for entry-level plans is data processing on servers in the US, with OpenAI API offering the ability to select data residency (US or Europe) for qualifying business customers, a key advantage for entities with stringent compliance requirements. In the context of GDPR, both companies offer commercial customers a data processing amendment (DPA) with SCC clauses, but OpenAI provides a more flexible approach to data localization “at rest” in Europe within its API platform.

Developer Assistants

The coding assistant market is highly polarized: free cloud solutions (GitHub Copilot, Codeium) typically transfer data to the US, while Enterprise plans offer the option to select an EU region (e.g., GitHub Enterprise Cloud, Codeium in Frankfurt). From a GDPR perspective, the safest option is a tool that allows for “local-first” or “self-hosted” operation (Tabnine, Continue, OpenClaw), where code never leaves the company’s infrastructure, eliminating the issue of data transfer. However, considering the balance between cloud usability and compliance, GitHub Copilot Enterprise offers the most comprehensive legal framework (EU Data Boundary) and technical support.

AI infrastructure

Infrastructure providers are divided into global giants (Azure, Google Cloud), which offer EU regions and advanced data residency mechanisms in their Enterprise plans, and European providers. While all major players ensure GDPR compliance through a DPA, OVHcloud stands out as a “sovereign” provider, subject to French jurisdiction and offering SecNumCloud standards, making it unrivaled in terms of legal data protection from access by foreign jurisdictions (e.g., the US CLOUD Act).

AI Code Analysis

Cloud code analysis tools (e.g., Sourcegraph Cody) process data in the US by default, offering GDPR compliance primarily through privacy policies and DPAs in higher-tier plans. For companies requiring strict control, the availability of a “self-hosted” or “VPC” option (available from Sourcegraph Enterprise or Greptile) is crucial, allowing data to be contained within your own infrastructure, ensuring full compliance with internal security regulations and GDPR.

Input and output rights and copyright transfer

Generic AI Providers

In the category of general AI providers such as ChatGPT, Claude, or Gemini, consumer users typically retain rights to the input and gain full or partial rights to the output, although providers like OpenAI and Anthropic offer opt-outs from the “service improvement” license. In enterprise plans, output rights are fully transferred to the client, who also retains ownership of the input. Training policies in free plans allow data usage with an opt-out option for most providers (e.g., Claude, Perplexity), while enterprise plans guarantee no training on client data, except for feedback.

AI API Providers

API services like OpenAI API and Claude API ensure user ownership of input and output across both plans, with full ownership of output transferred to the client in the enterprise, without vendor recourse. For model training, API data is not used for training by default in either plan unless explicitly opted in or provided feedback, with Zero Data Retention options for qualifying clients.

Developer Assistants

In the developer assistant category (e.g., GitHub Copilot, Tabnine, Cursor IDE), input rights remain with the user, while output/suggestions are typically owned by the user upon acceptance, with minimal vendor claims in pro/enterprise plans. Model training on free inputs is possible with an opt-out (e.g., Copilot Free), but enterprise plans like GitHub Copilot Business completely prohibit training on customer data. Tabnine has a no-train, no-retain policy on user code and full rights to suggestions.

AI infrastructure

Infrastructure platforms like Microsoft Azure AI, OVHcloud (absolutely no training or data retention), and Together AI guarantee full retention of input rights and transfer of output rights to the user, often with additional IP protection in enterprise plans (e.g., the Customer Copyright Commitment in Azure). Training policies are restrictive: no training on customer data without consent, with Zero Retention policies in OVHcloud and Together AI even in baseline plans.

AI Code Analysis

For code analyzers like Sourcegraph Cody or Greptile, the user retains full rights to input and output across all plans, with clear declarations of code and results ownership. Model training is prohibited in enterprise plans (e.g., Cody Enterprise with ZDR) and limited or opt-out in free plans for analytical purposes. Sourcegraph Cody explicitly leaves ownership of all inputs and outputs with the user, with zero training in enterprise.

Responsibility for errors

Generic AI Providers

Services like OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude are provided “as is” with no guarantee of accuracy and with liability capped at 6–12 months of fees or $100; the user bears the risk of errors and must verify the output. Enterprise plans have similar caps, though with exceptions for IP indemnity (e.g., Anthropic’s indemnity is uncapped), shifting most of the risk to the client.

AI API Providers

APIs such as the OpenAI API and Claude API are provided “as is” without warranty, with liability limits of 12 months’ fees and exclusion of consequential damages; no specific guarantees, even for enterprise users. Users accept the risk of output errors, with a focus on verification.

Developer Assistants

Assistants like GitHub Copilot, Tabnine, and Amazon Q provide output “as is”, with full user responsibility for errors, licensing, and verification; liability caps are $100 or 6–12 months of fees. Enterprise plans often include IP indemnity (e.g., the Copilot Copyright Commitment, Amazon Q), but without guaranteeing substantive accuracy.

AI infrastructure

Infrastructure platforms (HuggingFace, Vertex AI, Azure AI) offer “as is” terms with liability caps of 12 months of fees or $50–100 USD, with the user verifying output independently. Enterprise plans add SLA credits and negotiable indemnities (e.g., Vertex AI’s two-stage indemnity, Together AI up to $1M USD).

AI Code Analysis

In analyzers like Sourcegraph Cody or Greptile, full responsibility for errors rests with the user, with “as is” terms and warranty exclusions. Enterprise plans offer uncapped indemnity for IP in outputs (e.g., Sourcegraph).

Summary – download file

The source table is relatively large and includes a division into enterprise and “standard” commercial plans, as well as additional notes. Abbreviations are used in places; I list them below. A download link for the full table can be found further down – it’s simply too large to include directly in the article.

Glossary of abbreviations

  • GDPR: General Data Protection Regulation, the EU law regulating the processing of personal data, defining requirements for data processing in both consumer and enterprise services.
  • DPA: Data Processing Addendum, an agreement regulating the roles of the data processor and controller in the context of GDPR compliance for enterprise plans.
  • ZDR: Zero Data Retention, a mechanism to ensure that input and output data is not stored after processing, available in some enterprise API plans.
  • SCC: Standard Contractual Clauses, a legal mechanism enabling the transfer of data outside the EEA while complying with the GDPR.
  • SOC: System and Organization Controls (e.g., SOC 2 Type II), a standard for security certification and internal controls, confirming compliance of AI services with audit requirements.
  • CCPA: California Consumer Privacy Act, a US consumer privacy law mentioned in the context of compliance with US consumer services.
  • Input: Input data that a user provides to an AI model, such as prompts or content subject to copyright and training prohibitions.
  • Output: content generated by an AI model, such as responses or code suggestions, ownership of which service providers often transfer to the user.

Link to the full table: https://docs.google.com/spreadsheets/d/1s0FFlQ0FAJwEY0Ls7M0czfwdMN17k2dm/edit?usp=sharing&ouid=105290996233915394675&rtpof=true&sd=true

Using AI and the cost of software development

Currently, there is a significant discrepancy, both in research results and in subjective perceptions, regarding how AI affects developer efficiency. Those with a vested interest (either because their tool is built around AI or because they run a company heavily invested in AI) invariably praise the solutions and talk about the “end of development” and “useless programmers.” On the other hand, we already know of cases where AI-generated code is fit only to be thrown away, or where programmers waste more time reviewing and fixing pointless code than if they had written it themselves. Sailing Byte believes that efficiency will increase for a smart, informed programmer familiar with prompt engineering, but an engineer who relies too heavily on generated code will be severely disappointed. At the same time, any efficiency gain comes at a cost, because good AI tools and company infrastructure are not cheap. In summary, a business customer can try to create code on their own, risking stability and security (which may be justified, for example, when building prototypes), but when using the services of a conscientious software house, they should not expect a price reduction at this stage due to “better efficiency.”

Summary

Let me start with a very important note – although I have made every effort to ensure the information presented here is as accurate as possible, it may contain errors, and I am not a lawyer. Furthermore, the regulations and data processing policies of each provider may change. This list is indicative. Always remember to check the current privacy policies and terms and conditions of the services in question before making any business decisions or using any software.

Author

Łukasz Pawłowski

CEO of Sailing Byte

Sailing Byte CEO and former PHP developer. Founder of a software house specializing in a partnership-driven approach, with expertise in Laravel, React.js, and Flutter. My objective is to deliver scalable SaaS solutions through Agile methodologies—offering clients a blend of experience, knowledge, and the right set of collaborative tools. To achieve this, I am committed to sharing my expertise on this blog with clients and readers across Europe, the UK, and the USA, empowering their businesses to flourish.