Copyright and Machine Learning
ChatGPT has taken the world by storm. According to various analysts, ChatGPT reached 100 million users in just two months, making it the fastest-growing service in history. The issue, however, is that OpenAI, the company behind ChatGPT, has started charging for its services, while its machine learning models were trained on data scraped from the internet, likely without the permission of the creators of that data.
I am not a lawyer, and I am not sure how “fair use” rules will be interpreted when it comes to using copyrighted data to train machine learning models. To my knowledge, there is no case law that we could rely on. However, I am quite sure that no judge or court will be able to stop the technological progress brought by the emergence of artificial intelligence.
Gene Friedman used to be known as New York’s Taxi King. He operated more than 3000 taxis in New York City. To operate a taxi in NYC and be able to pick up people hailing a car on the street, one needs a taxi shield, known as a medallion. At the peak of the prices, such a medallion was worth more than 1 million dollars. Gene Friedman financed the acquisition of these medallions with loans from credit unions. Then Uber appeared. The value of medallions started dropping fast, and a medallion is no longer worth even 10% of what it was at the peak. As a consequence, the credit unions decided to sue Uber. In the case Credit Unions vs Uber, the judge ruled against the credit unions. What caught my eye were the following statements in the judgment.
“In this day and age, even with public utilities, investors must always be wary of new forms of competition arising from technological developments…. It is not the court’s function to adjust the competing political and economic interests disturbed by the introduction of Uber-type apps… Any expectation that the medallion would function as a shield against the rapid technological advances of the modern world would not have been reasonable.”
However, I am neither a credit union nor Uber. I am not a competitor of OpenAI. OpenAI has started charging for its services. These services are likely made possible by using data created by other people, maybe even the text, images and videos available on this website. However, I have not given any permission for such use, and from now on, as you can see in the footnote in red, I am explicitly stating that such permission is needed in writing from the copyright holder. Let’s see how the world develops and whether some new DMCA will be introduced to adjust for the emergence of new AI platforms. Until we as a society define clear rules for using other people’s intellectual contributions to train machine learning algorithms that generate profit for large corporations, I will keep this copyright statement here.