
By Davey Alba and Rachel Metz
A massive public dataset used to build popular artificial intelligence image generators contains at least 1,008 instances of child sexual abuse material, a new report from the Stanford Internet Observatory found.
LAION-5B, which contains more than 5 billion images and related captions from the internet, may also include thousands of additional pieces of suspected child sexual abuse material, or CSAM, according to the report. The inclusion of CSAM in the dataset could enable AI products built on this data — including image generation tools like Stable Diffusion — to create new, and potentially realistic, child abuse content, the report warned.
The rise of increasingly powerful AI tools has raised alarms in part because these services are built with troves of online data — including public datasets such as LAION-5B — that can contain copyrighted or harmful content. AI image generators, in particular, rely on datasets that include pairs of images and text descriptions to determine a wide range of concepts and create pictures in response to prompts from users.
In a statement, a spokesperson for LAION, the Germany-based nonprofit behind the dataset, said the group has a “zero tolerance policy” for illegal content and was temporarily removing LAION datasets from the internet “to ensure they are safe before republishing them.” Prior to releasing its datasets, LAION created and published filters for spotting and removing illegal content from them, the spokesperson said.
Christoph Schuhmann, LAION’s founder, previously told Bloomberg News that he was unaware of any child nudity in the dataset, though he acknowledged he did not review the data in great depth. If notified about such content, he said, he would remove links to it immediately.
A spokesperson for Stability AI, the British AI startup that funded and popularized Stable Diffusion, said the company is committed to preventing the misuse of AI and prohibits the use of its image models for unlawful activity, including attempts to edit or create CSAM. “This report focuses on the LAION-5B dataset as a whole,” the spokesperson said in a statement. “Stability AI models were trained on a filtered subset of that dataset. In addition, we fine-tuned these models to mitigate residual behaviors.”
LAION-5B, or subsets of it, have been used to build multiple versions of Stable Diffusion. A more recent version of the software, Stable Diffusion 2.0, was trained on data that substantially filtered out “unsafe” materials in the dataset, making it much more difficult for users to generate explicit images. But Stable Diffusion 1.5 does generate sexually explicit content and is still in use in some corners of the internet. The spokesperson said Stable Diffusion 1.5 was not released by Stability AI, but by Runway, an AI video startup that helped create the original version of Stable Diffusion. Runway said it was released in collaboration with Stability AI.
“We have implemented filters to intercept unsafe prompts or unsafe outputs when users interact with models on our platform,” the Stability AI spokesperson added. “We have also invested in content labeling features to help identify images generated on our platform. These layers of mitigation make it harder for bad actors to misuse AI.”
LAION-5B was released in 2022 and relies on raw HTML code collected by a California nonprofit to locate images around the web and associate them with descriptive text. For months, rumors that the dataset contained illegal images have circulated in discussion forums and on social media.
“As far as we know, this is the first attempt to actually quantify and validate concerns,” David Thiel, chief technologist of the Stanford Internet Observatory, said in an interview with Bloomberg News.
For their report, Stanford Internet Observatory researchers detected the CSAM by looking for different kinds of hashes, or digital fingerprints, of such images. The researchers then validated them using APIs dedicated to finding and removing known images of child exploitation, as well as by searching for similar images in the dataset.
Much of the suspected CSAM that the Stanford Internet Observatory found was validated by third parties like the Canadian Centre for Child Protection and through a tool called PhotoDNA, developed by Microsoft Corp., according to the report. Given that the Stanford Internet Observatory researchers could only work with a limited portion of high-risk content, additional abusive content likely exists in the dataset, the report said.
While the amount of CSAM present in the dataset doesn’t indicate that the illicit material “drastically” influences the images churned out by AI tools, Thiel said it does likely still have an impact. “These models are really good at being able to learn concepts from a small number of images,” he said. “And we know that some of these images are repeated, potentially dozens of times in the dataset.”
Stanford Internet Observatory’s work previously found that generative AI image models can produce CSAM, but that work assumed the AI systems were able to do so by combining two “concepts,” such as children and sexual activity. Thiel said the new research suggests these models might generate such illicit images because of some of the underlying data on which they were built. The report recommends that models based on Stable Diffusion 1.5 “should be deprecated and distribution ceased wherever feasible.”
– With assistance from Marissa Newman and Aggi Cantrill.
More stories like this are available on bloomberg.com
©2023 Bloomberg L.P.