On AI Slop vs OSS Security
Author's Note
I have spent the better part of a decade in the bug bounty industry, and that experience shapes my perspective on it. The first five years were spent as a bug hunter and vulnerability researcher, where I developed an intimate understanding of what it takes to find, verify, and responsibly disclose security vulnerabilities. The last five have been spent at HackerOne (November 2020 to present), starting as a vulnerability triager where I personally reviewed tens of thousands of submissions, and now as a Team Lead overseeing technical services with a focus on triage operations.
This combination of experience on both sides of the vulnerability reporting ecosystem, from the researcher's desk to the bug bounty platform's operations, gives me a particular vantage point on how the industry is fracturing under the weight of AI-generated noise. I've reviewed the borderline cases where it's genuinely unclear whether a report is a hallucination or a real finding. I've felt the frustration of maintainers whose inboxes are drowning. I've also seen the pressure that platforms face trying to maintain quality while scaling to handle the deluge.
However, I want to be explicit: these are my personal views and observations, informed by my professional experience but not representative of my employer in any way. HackerOne, as an organization, has its own perspectives, strategies, and positions on these issues. My analysis here reflects my own thinking about the systemic problems I see and potential solutions, not company doctrine or strategy.
The Anatomy of the Problem
There are fundamental issues with how AI has infiltrated vulnerability reporting, and they mirror the social dynamics that plague any feedback system.
First, the typical AI-powered reporter, especially one just pasting GPT output into a submission form, neither knows enough about the actual codebase being examined nor understands the security implications well enough to provide insight that projects need. The AI doesn't read code; it pattern-matches. It sees functions that look similar to vulnerable patterns and invents scenarios where they might be exploited, regardless of whether those scenarios are even possible in the actual implementation.
Second, some actors with misaligned incentives interpret high submission volume as achievement. By flooding bug bounty programs with AI-generated reports, they feel productive and entrepreneurial. Some genuinely believe the AI has found something real. Others know it's questionable but figure they'll let the maintainers sort it out. The incentive is to submit as many reports as possible and see what sticks, because even a 5% hit rate on a hundred submissions is better than the effort of manually verifying five findings.
The result? Daniel Stenberg, who maintains curl, now sees about 20% of all security submissions as AI-generated slop, while the rate of genuine vulnerabilities has dropped to approximately 5%. Think about that ratio. For every real vulnerability, there are now four fake ones. And every fake one consumes hours of expert time to disprove.
The Human Cost Nobody Talks About
A security report lands in your inbox. It claims there's a buffer overflow in a specific function. The report is well-formatted, includes CVE-style nomenclature, and uses appropriate technical language. As a responsible maintainer, you can't just dismiss it. You alert your security team: volunteers, by the way, who have day jobs and families and maybe three hours a week for this work.
Three people read the report. One person tries to reproduce the issue using the steps provided. They can't, because the steps reference test cases that don't exist. Another person examines the source code. The function mentioned in the report doesn't exist in that form. A third person checks whether there's any similar functionality that might be vulnerable in the way described. There isn't.
After an hour and a half apiece from three people, 4.5 person-hours in total, you've confirmed what you suspected: this report is garbage. Probably AI-generated garbage, based on the telltale signs of hallucinated function names and impossible attack vectors.
You close the report. You don't get those hours back. And tomorrow, two more reports just like it will arrive.
The curl project has seven people on its security team. They collaborate on every submission, with three to four members typically engaging with each report. In early July 2025, they were receiving approximately two security reports per week. The math is brutal: at roughly 4.5 person-hours per bogus report, two reports a week is nine person-hours, the entire weekly budget of three volunteers who can each spare three hours. If you have three hours per week to contribute to an open source project you love, and a single false report consumes all of it, you've contributed nothing that week except proving someone's AI hallucinated a vulnerability.
The emotional toll compounds. Stenberg describes it as "mind-numbing stupidities" that the team must process. It's not just frustration; it's the specific demoralization that comes from having your expertise and goodwill systematically exploited by people who couldn't be bothered to verify their submissions before wasting your time.
The Broader Burnout Crisis
According to Intel's annual open source community survey, 45% of respondents identified maintainer burnout as their top challenge. The Tidelift State of the Open Source Maintainer Survey is even more stark: 58% of maintainers have either quit their projects entirely (22%) or seriously considered quitting (36%).
Why are they quitting? The top reason, cited by 54% of maintainers, is that other things in their life and work took priority over open source contributions. Over half (51%) reported losing interest in the work. And 44% explicitly identified experiencing burnout.
But here's the gut punch: the percentage of maintainers who said they weren't getting paid enough to make maintenance work worthwhile rose from 32% to 38% between survey periods. These are people maintaining infrastructure that powers billions of dollars of commercial activity, and they're getting nothing. Or maybe they get $500 a year from GitHub Sponsors while companies make millions off their work.
The maintenance work itself is rarely rewarding. You're not building exciting new features. You're addressing technical debt, responding to user demands, managing security issues, and now—increasingly—sorting through AI-generated garbage to find the occasional legitimate report. It's like being a security guard who has to investigate every single alarm, knowing that 95% of them are false, but unable to ignore any because that one real threat could be catastrophic.
When you're volunteering out of love in a market society, you're setting yourself up to be exploited.
And the exploitation is getting worse. Toxic communities, hyper-responsibility for critical infrastructure, and now the weaponization of AI to automate the creation of work for maintainers—it all adds up to an unsustainable situation.
One Kubernetes contributor put it simply: "If your maintainers are burned out, they can't be protecting the code base like they're going to need to be." This transforms maintainer wellbeing from a human resources concern into a security imperative. Burned-out maintainers miss things. They make mistakes. They eventually quit, leaving projects unmaintained or understaffed.
What AI Slop Actually Looks Like
A typical AI slop report will reference function names that don't exist in the codebase. The AI has seen similar function names in its training data and invents plausible-sounding variations. It will describe memory operations that would indeed be problematic if they existed as described, but which bear no relationship to how the code actually works.
One report to curl claimed an HTTP/3 vulnerability and included fake function calls and behaviors that appeared nowhere in the actual codebase. Stenberg has publicly shared a list of AI-generated security submissions received through HackerOne, and they all follow similar patterns: professional formatting, appropriate jargon, and completely fabricated technical details.
The sophistication varies. Some reports are obviously generated by someone who just pasted a repository URL into ChatGPT and asked it to find vulnerabilities. Others show more effort—the submitter may have fed actual code snippets to the AI and then submitted its analysis without verification. Both are equally useless to maintainers, but the latter takes longer to disprove because the code snippets are real even if the vulnerability analysis is hallucinated.
Here's why language models fail so catastrophically at this task: they're designed to be helpful and provide positive responses. When you prompt an LLM to generate a vulnerability report, it will generate one regardless of whether a vulnerability exists. The model has no concept of truth—only of plausibility. It assembles technical terminology into patterns that resemble security reports it has seen during training, but it cannot verify whether the specific claims it's making are accurate.
This is the fundamental problem: AI can generate the form of security research without the substance.
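That gap also suggests a cheap mechanical check, because the most common slop signal, references to symbols that exist nowhere in the tree, can be tested for. The sketch below is an illustration under assumptions (the regex and file extensions are mine, not any project's actual tooling); real symbol extraction would use ctags or the project's own index, and the absence of this signal proves nothing on its own.

```python
import re
from pathlib import Path

# Crude "identifier followed by an open paren" pattern; an assumption for illustration.
IDENT = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]{3,})\s*\(")

def referenced_symbols(report_text: str) -> set[str]:
    """Collect function-like names the report claims to be talking about."""
    return set(IDENT.findall(report_text))

def missing_from_tree(symbols: set[str], repo: Path, exts=(".c", ".h")) -> set[str]:
    """Return the claimed symbols that never appear anywhere in the source tree."""
    source = "".join(
        p.read_text(errors="ignore")
        for p in repo.rglob("*")
        if p.is_file() and p.suffix in exts
    )
    return {s for s in symbols if s not in source}

# Usage sketch: symbols a report names that the tree has never heard of are a strong,
# though not conclusive, hint that the analysis was hallucinated.
# missing = missing_from_tree(referenced_symbols(report_text), Path("path/to/repo"))
```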
The CVE System is Also Collapsing
While AI slop floods individual project inboxes, the broader CVE infrastructure faces its own existential crisis. And these crises compound each other in dangerous ways.
In April 2025, MITRE Corporation announced that its contract to maintain the Common Vulnerabilities and Exposures program would expire. The Department of Homeland Security failed to renew the long-term contract, creating a funding lapse that affects everything: national vulnerability databases, advisories, tool vendors, and incident response operations.
The National Vulnerability Database experienced catastrophic problems throughout 2024. CVE submissions jumped 32%, creating massive processing delays. By March 2025, NVD had analyzed fewer than 300 CVEs, leaving more than 30,000 vulnerabilities backlogged. Approximately 42% of CVEs lack essential metadata like severity scores and product information.
Now layer AI slop onto this already-stressed system. Invalid CVEs are being assigned at scale. A 2023 analysis by former insiders suggested that only around 20% of CVEs were valid, with the remainder being duplicates, invalid, or inflated. The issues include multiple CVEs being assigned for the same bug, CNAs (CVE Numbering Authorities) siding with reporters over project developers even when there's no genuine dispute, and reporters receiving CVEs based on test cases rather than actual distinct vulnerabilities.
The result is that the vulnerability tracking system everyone relies on is becoming less trustworthy exactly when we need it most. Security teams can't rely on CVE assignments to prioritize their work. Developers don't trust vulnerability scanners because false positive rates are through the roof. The signal-to-noise ratio has deteriorated so badly that the entire system risks becoming useless.
What Doesn't Work
Banning submitters doesn't work at scale. You can ban an account, but creating new accounts is trivial. HackerOne implements reputation scoring where points are gained or lost based on report validity, but this hasn't stemmed the tide because the cost of creating throwaway accounts is essentially zero.
Asking people to "please verify before submitting" doesn't work. The incentive structure rewards volume, and people either genuinely believe their AI-generated reports are valid or don't care enough to verify. Polite requests assume good faith, but much of the slop comes from actors who have no stake in the community norms.
Trying to educate submitters about how AI works doesn't scale. For every person you educate, ten new ones appear with fresh GPT accounts. The problem isn't knowledge—it's incentives.
Simply closing inboxes or shutting down bug bounty programs "works" in the sense that it stops the slop, but it also stops legitimate security research. Several projects have done this, and now they're less secure because they've lost a channel for responsible disclosure.
None of the easy answers work because this isn't an easy problem.
What Might Actually Work
Disclosure Requirements represent the first line of defense. Both curl and Django now require submitters to disclose whether AI was used in generating reports. Curl's approach is particularly direct: disclose AI usage upfront and ensure complete accuracy before submission. If AI usage is disclosed, expect extensive follow-up questions demanding proof that the bug is genuine before the team invests time in verification. This works psychologically. It forces submitters to acknowledge they're using AI, which makes them more conscious of their responsibility to verify. It also gives maintainers grounds to reject slop immediately if AI usage was undisclosed but becomes obvious during review. Django goes further with a section titled "Note for AI Tools" that directly addresses language models themselves, reiterating that the project expects no hallucinated content, no fictitious vulnerabilities, and a requirement to independently verify that reports describe reproducible security issues.
Proof-of-Concept Requirements raise the bar significantly. Requiring technical evidence such as screencasts showing reproducibility, integration or unit tests demonstrating the fault, or complete reproduction steps with logs and source code makes it much harder to submit slop. AI can generate a description of a vulnerability, but it cannot generate working exploit code for a vulnerability that doesn't exist. Requiring proof forces the submitter to actually verify their claim. If they can't reproduce it, they can't prove it, and you don't waste time investigating. Projects are choosing to make it harder to submit in order to filter out the garbage, betting that real researchers will clear the bar while slop submitters won't.
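As an illustration of what "demonstrating the fault" can mean in practice, here is a self-contained sketch in the style of a failing unit test. `parse_content_length` is a toy stand-in with a deliberate flaw, not code from any real project; the pattern, a concrete input plus a concrete observed failure, is the point.

```python
# A sketch of the "show, don't tell" standard: the report ships with a test or script
# that actually triggers the fault. The function below is a deliberately flawed toy
# stand-in so the demonstration has something to demonstrate.

def parse_content_length(header: str) -> int:
    # Toy stand-in: trusts the header blindly, so absurd values sail through.
    return int(header.split(":", 1)[1])

def test_rejects_absurd_content_length():
    """The proof of concept: today this test fails (the bogus value is accepted);
    after a real fix it should pass."""
    value = parse_content_length("Content-Length: 999999999999999999")
    assert 0 <= value <= 2**31 - 1, f"absurd length accepted: {value}"

if __name__ == "__main__":
    try:
        test_rejects_absurd_content_length()
        print("no fault observed; nothing to report")
    except AssertionError as exc:
        print(f"fault demonstrated: {exc}")
```

A report that ships with something like this is falsifiable in minutes; a report that cannot produce it has, by definition, not been verified by its own submitter.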
Reputation and Trust Systems offer a social mechanism for filtering. Only users with a history of validated submissions get unrestricted reporting privileges or monetary bounties. New reporters could be required to have established community members vouch for them, creating a web-of-trust model. This mirrors how the world worked before bug bounty platforms commodified security research. You built reputation over time through consistent, high-quality contributions. The downside is that it makes it harder for new researchers to enter the field, and it risks creating an insider club. But the upside is that it filters out low-effort actors who won't invest in building reputation.
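A sketch of how that gating might be expressed follows; the thresholds and field names are illustrative assumptions, not any platform's actual policy.

```python
from dataclasses import dataclass, field

@dataclass
class Reporter:
    valid_reports: int = 0          # past submissions confirmed as real issues
    invalid_reports: int = 0        # past submissions closed as not applicable
    vouched_by: set[str] = field(default_factory=set)  # established members who vouch

def can_submit_unrestricted(r: Reporter, min_valid: int = 3, min_vouches: int = 2) -> bool:
    """Unrestricted reporting requires either a track record of validated findings
    or vouches from established community members (the web-of-trust path)."""
    has_track_record = r.valid_reports >= min_valid and r.valid_reports > r.invalid_reports
    has_vouches = len(r.vouched_by) >= min_vouches
    return has_track_record or has_vouches
```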
Economic Friction fundamentally alters the incentive structure. Charge a nominal refundable fee—say $50—for each submission from new or unproven users. If the report is valid, they get the fee back plus the bounty. If it's invalid, you keep the fee. This immediately makes mass AI submission uneconomical. If someone's submitting 50 AI-generated reports hoping one sticks, that's now $2,500 at risk. But for a legitimate researcher submitting one carefully verified finding, $50 is a trivial barrier that gets refunded anyway. Some projects are considering dropping monetary rewards entirely. The logic is that if there's no money involved, there's no incentive for speculative submissions. But this risks losing legitimate researchers who rely on bounties as income. It's a scorched earth approach that solves the slop problem by eliminating the entire ecosystem.
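A back-of-envelope comparison shows how the deposit flips the economics. The $50 fee and the 5% hit rate are the figures used above; the average bounty is an assumed number purely for illustration.

```python
def expected_net(reports: int, hit_rate: float, deposit: float, avg_bounty: float) -> float:
    """Expected payout under the refundable-deposit model."""
    valid = reports * hit_rate
    invalid = reports - valid
    # Valid reports: deposit refunded plus bounty. Invalid reports: deposit forfeited.
    return valid * avg_bounty - invalid * deposit

# Mass AI submitter: 50 unverified reports at a 5% hit rate.
print(expected_net(reports=50, hit_rate=0.05, deposit=50, avg_bounty=500))  # -1125.0
# Careful researcher: one verified finding, near-certain validity.
print(expected_net(reports=1, hit_rate=1.0, deposit=50, avg_bounty=500))    # 500.0
```

Under these assumptions the mass submitter expects to lose money on every batch, while the careful researcher's deposit is a rounding error.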
AI-Assisted Triage represents fighting fire with fire. Use AI tools trained specifically to identify AI-generated slop and flag it for immediate rejection. HackerOne's Hai Triage system embodies this approach, using AI agents to cut through noise before human analysts validate findings. The risk is obvious: what if your AI filter rejects legitimate reports? What if it's biased against certain communication styles or methodologies? You've just automated discrimination. But the counterargument is that human maintainers are already overwhelmed, and imperfect filtering is better than drowning. The key is transparency and appeals. If an AI filter rejects a report, there should be a clear mechanism for the submitter to contest the decision and get human review.
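In sketch form, the shape being argued for looks something like the following, with the classifier deliberately left abstract (it could be a model or a plain heuristic). This is an illustration of the "filter plus appeal" flow, not how any particular platform's pipeline actually works.

```python
from enum import Enum, auto
from typing import Callable

class Verdict(Enum):
    FORWARD_TO_HUMAN = auto()   # passes the filter, goes to the security team
    AUTO_REJECTED = auto()      # flagged as probable slop
    UNDER_APPEAL = auto()       # submitter contested; queued for human review

def triage(report: str, looks_like_slop: Callable[[str], bool]) -> Verdict:
    # The classifier is whatever the program trusts; it is not specified here.
    return Verdict.AUTO_REJECTED if looks_like_slop(report) else Verdict.FORWARD_TO_HUMAN

def appeal(current: Verdict) -> Verdict:
    # Automated rejections must always be contestable; a human makes the final call.
    return Verdict.UNDER_APPEAL if current is Verdict.AUTO_REJECTED else current

# Example: a rejected report that the submitter contests ends up in front of a person.
verdict = triage("report text", looks_like_slop=lambda r: True)
print(appeal(verdict))  # Verdict.UNDER_APPEAL
```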
Transparency and Public Accountability leverage community norms. Curl recently formalized that all submitted security reports will be made public once reviewed and deemed non-sensitive. This means that fabricated or misleading reports won't just be rejected; they'll be exposed to public scrutiny. This works as both deterrent and educational tool. If you know your slop report will be publicly documented with your name attached, you might think twice. And when other researchers see examples of what doesn't constitute a valid report, they learn what standards they need to meet. The downside is that public shaming can be toxic and might discourage good-faith submissions from inexperienced researchers. Projects implementing this approach need to be careful about tone and focus on the technical content rather than attacking submitters personally.
Sustainability is Hard
Every hour spent evaluating slop reports is an hour not spent on features, documentation, or actual security improvements. And maintainers are already working for free, maintaining infrastructure that generates billions in commercial value. When 38% of maintainers cite not getting paid enough as a reason for quitting, and 97% of open source maintainers are unpaid despite massive commercial exploitation of their work, the system is already broken.
AI slop is just the latest exploitation vector. It's the most visible one right now, but it's not the root cause. The root cause is that we've built a global technology infrastructure on the volunteer labor of people who get nothing in return except burnout and harassment.
So what does sustainability actually look like?
First, it looks like money. Real money. Not GitHub Sponsors donations that average $500 a year. Not swag and conference tickets. Actual salaries commensurate with the value being created. Companies that build products on open source infrastructure need to fund the maintainers of that infrastructure. This could happen through direct employment, foundation grants, or the Open Source Pledge model where companies commit percentages of revenue.
Second, it looks like better tooling and automation that genuinely reduces workload rather than creating new forms of work. Automated dependency management, continuous security scanning integrated into development workflows, and sophisticated triage assistance that actually works. The goal is to make maintenance less time-consuming so burnout becomes less likely.
Third, it looks like shared workload and team building. No single volunteer should be a single point of failure. Building teams with checks and balances where members keep each other from taking on too much creates sustainability. Finding additional contributors willing to share the burden rather than expecting heroic individual effort acknowledges that most people have limited time available for unpaid work.
Fourth, it looks like culture change. Fostering empathy in interactions, starting communications with gratitude even when rejecting contributions, and publicly acknowledging the critical work maintainers perform reduces emotional toll. Demonstrating clear processes for handling security issues gives confidence rather than trying to hide problems.
Fifth, it looks like advocacy and policy at organizational and governmental levels. Recognition that maintainer burnout represents an existential threat to technology infrastructure. Development of regulations requiring companies benefiting from open source to contribute resources. Establishment of security standards that account for the realities of volunteer-run projects.
Without addressing these fundamentals, no amount of technical sophistication will prevent collapse.
The Arms Race Ahead
The CVE slop crisis is just the beginning. We're entering an arms race between AI-assisted attackers or abusers and AI-assisted defenders, and nobody knows how it ends.
HackerOne's research indicates that 70% of security researchers now use AI tools in their workflow. AI-powered testing is becoming the industry standard. The emergence of fully autonomous hackbots—AI systems that submitted over 560 valid reports in the first half of 2025—signals both opportunity and threat.
The divergence will be between researchers who use AI as a tool to enhance genuinely skilled work versus those who use it to automate low-effort spam. The former represents the promise of democratizing security research and scaling our ability to find vulnerabilities. The latter represents the threat of making the signal-to-noise problem completely unmanageable.
The challenge is developing mechanisms that encourage the first group while defending against the second.
This probably means moving toward more exclusive models. Invite-only programs. Dramatically higher standards for participation. Reputation systems that take years to build. New models for coordinated vulnerability disclosure that assume AI-assisted research as the baseline and require proof beyond "here's what the AI told me."
It might mean the end of open bug bounty programs as we know them. Maybe that's necessary. Maybe the experiment of "anyone can submit anything" was only viable when the cost of submitting was high enough to ensure some minimum quality. Now that AI has reduced that cost to near zero, the experiment may not survive unless something changes.
So, net-net, here's where we are:
When it comes to vulnerability reports, what matters is who submits them and whether they've actually verified their claims. Accepting reports from everyone indiscriminately is backfiring catastrophically: projects are forced to take every plausible-sounding submission seriously, even though the cumulative evidence says most of them are noise.
You want to receive reports from someone who has actually verified their claims, understands the architecture of what they're reporting on, and isn't trying to game the bounty system or offload verification work onto maintainers.
Such people exist, but they're becoming harder to find amidst the deluge of AI-generated content. That's why projects have to be selective about which reports they investigate and which submitters they trust.
Remember: not all vulnerability reports are legitimate. Not all feedback is worthwhile. It matters who is doing the reporting and what their incentives are.
The CVE slop crisis shows the fragility of open source security. Volunteer maintainers, already operating at burnout levels, face an explosion of AI-generated false reports that consume their limited time and emotional energy. The systems designed to track and manage vulnerabilities struggle under the dual burden of structural underfunding and slop inundation.
The path forward requires holistic solutions combining technical filtering with fundamental changes to how we support and compensate open source labor. AI can be part of the solution through better triage, but it cannot substitute for adequate resources, reasonable workloads, and human judgment.
Ultimately, the sustainability of open source security depends on recognizing that people who maintain critical infrastructure deserve more than exploitation.
They deserve compensation, support, reasonable expectations, and protection from abuse. Without addressing these fundamentals, no amount of technical sophistication will prevent the slow collapse of the collaborative model that has produced so much of the digital infrastructure modern life depends on.
The CVE slop crisis isn't merely about bad vulnerability reports. It's about whether we'll choose to sustain the human foundation of technological progress, or whether we'll let it burn out under the weight of automated exploitation.
That's the choice we're facing. And right now, we're choosing wrong.