Google reportedly allowed OpenAI to scrape data from YouTube videos for GPT-4 training
The AI company used a million hours of videos from YouTube
Published on April 7, 2024
According to the latest report from The New York Times, OpenAI scraped data from YouTube videos to train its most advanced large language model (LLM), GPT-4. The AI company reportedly used a million hours of YouTube videos for GPT-4 training.
Interestingly, some people at Google, which owns YouTube, reportedly knew about OpenAI's practice of transcribing YouTube videos.
Google allegedly takes the same approach, so it lets OpenAI scrape data from YouTube videos for GPT-4 training
The report suggests that OpenAI developed a new model, the Whisper audio transcription model, which helped the AI company transcribe the audio from YouTube videos. It is worth noting that the company was well aware that it might come under scrutiny from government bodies. However, it went ahead with the practice, believing it was fair use.
The NY Times claimed that OpenAI scraped data from YouTube videos and podcasts to train two of its AI models. The report further mentions the involvement of OpenAI president Greg Brockman in the company's shady approach to GPT-4 training.
The newspaper further reported that Google does the same to train its Gemini AI, which it says would violate creators' copyrights. However, Google said that it scrapes data from YouTube videos only when the original creator consents to it.
The NY Times also reported that Google tweaked its privacy policy last year. On the reason for the change, it wrote:
One motivation for the change, according to members of the company’s privacy team and an internal message viewed by The Times, was to allow Google to be able to tap publicly available Google Docs, restaurant reviews on Google Maps, and other online material for more of its A.I. products.
Previously, OpenAI CTO Mira Murati confirmed that the company's new AI video model, Sora, was trained on publicly available video data.
YouTube is aware of OpenAI’s practice but seemingly hesitates to interfere
In a recent interview with Bloomberg, YouTube CEO Neal Mohan said such practices are a clear violation of the platform's terms of service. He added:
One of those expectations is that the terms of service are going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.
When asked about OpenAI using data from YouTube videos for GPT-4 training, he gave a noncommittal answer. Mohan said that he was aware of the reports and added that OpenAI may or may not have used data from YouTube videos.
Lastly, it is nothing new for AI companies like OpenAI and Google to use publicly available data for AI training. That said, these companies are well aware that regulators may scrutinize them for it.
Do you think companies using user data to train AI is a shady tactic? What’s your take on this? Share your views in the comments below.
Vlad Turiceanu
Windows Editor
Passionate about technology, Windows, and everything that has a power button, he spends most of his time developing new skills and learning more about the tech world.
Coming from a solid background in PC building and software development, with expertise in touch-based devices, he is constantly keeping an eye out for the latest and greatest!