AI and Theft: The Issue That Has Gone Missing

A couple of years ago, I noted that if various broligarchs were to steal hard-won (and expensive) biomedical data, then use it to build a tool that solves a medical problem, say, turning a clinical bacterial genome sequence into a list of antibiotics that could treat that infection, everyone would recognize it as an obvious act of theft. I concluded:

If LLMs were actually as valuable as everyone claims, then it would be worthwhile to pay authors. Obviously, if you get the data for free, then the expected benefits of LLMs don't have to be very high. But if (hopefully, when) one factors in the cost of data generation (which is to say, writing), then the gains from LLMs have to be much higher than currently envisioned.

This is an unusual situation for our techbro overlords: it's not the coders who create most of the value; it's the data generators who provide the real value and are the real cost. Unlike much of the data Silicon Valley deals with (consumer information provided for free by customers, or bought very cheaply), this kind of data acquisition is expensive. And if your potential product's gains can't cover the costs of the data generators, that's a bad business model.

On the other hand, having a bunch of LLMs that sound like those nineteenth-century "forsooth" and "verily" reply-guy assholes would be kind of hilarious, so maybe the actually available free data does have some utility…

As the relentless push to include AI in everything continues, it’s worth noting that the theft issue has not really been resolved decisively one way or the other. Not only is that a potential (large) liability for these companies*, but it also bears on the discussions about whether AI will be used responsibly.

Could it be developed and used responsibly? Of course. Will it be developed and used responsibly? Given that these same companies potentially stole billions of dollars of property to train these models, the answer very well could be no.

*Of course, transferring billions of dollars from Silicon Valley motherfuckers to artists, photographers, and writers would be an amazing thing for the arts and humanities in the U.S.
