@alexjc on July 26, 2023

#ai ethics   #plagiarism   #behind the scenes   #tweets  

If you're angry about companies crawling the net to steal your text and images and train their machines on them, this one's for you! Like everyone else, I hate Terms of Service, but here's a counterpoint to the "we need an opt-out mechanism for data collection" argument: 85% of the top domains in the LAION2B-en dataset already opt out through their TOS.

(LAION is a series of datasets of image + caption pairs that are used to train models.)