The Digital Probes You Never See: How Researchers Scan the Web and What It Means for You
In an age where digital visibility is as critical as physical security, internet-wide scanning has become a standard tool for researchers, cybersecurity analysts, and unfortunately, malicious actors too. But what happens when researchers — often with good intentions — scan the internet? Are their methods ethical? Could they be dangerous? A blog post by Johannes B. Ullrich, Ph.D., Dean of Research at SANS.edu, dives into this increasingly vital conversation. As online scanning becomes more common and automated, understanding the purpose, intent, and consequences of these actions is essential.
This deep dive explores how organizations track scanning behavior, what protocols researchers should follow, and how these activities might affect your network. It also looks into recommendations from RFC 9511, a recently published guideline that encourages transparency and attribution in scanning operations. Whether you’re a system admin, a researcher, or just a curious observer of internet behavior, this article offers insight into a topic that’s often invisible but ever-present in the background of our digital lives.
Here’s What’s Happening with Internet Scanners Today:
Researchers have been analyzing global internet activity for years, monitoring groups that perform systematic scans of IP addresses across the web. Right now, 36 such groups have been identified, with over 33,000 IPs involved in scanning activities. These aren’t necessarily bad actors — many are academic or commercial researchers trying to map or secure digital infrastructure.
However, it is not always obvious who is behind a given scan or why. RFC 9511, titled “Attribution of Internet Probes,” was recently highlighted as a potential answer. It recommends transparency from scanners: embedding a URL in probe packets, publishing a probe description at an accessible well-known location (such as /.well-known/probing.txt), and using source IPs that are traceable to real organizations or people. These measures allow system admins to identify who is probing their networks, and why.
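For a defender, checking whether a scanner seen in the logs follows this convention can be as simple as a reverse DNS lookup followed by a fetch of the well-known file. Here is a minimal Python sketch, assuming the scanner publishes its probe description on the host named in its PTR record (the function name and the IP shown are placeholders, not part of RFC 9511 itself):

```python
import socket
import requests

def fetch_probe_description(ip: str, timeout: int = 5) -> str | None:
    """Try to retrieve RFC 9511-style attribution for a scanning IP.

    Reverse-resolves the IP, then requests /.well-known/probing.txt
    from the resulting hostname. Returns the file body, or None if the
    scanner is not attributable this way.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # PTR lookup
    except OSError:
        return None  # no reverse DNS entry

    try:
        resp = requests.get(
            f"https://{hostname}/.well-known/probing.txt", timeout=timeout
        )
        if resp.ok:
            return resp.text
    except requests.RequestException:
        pass
    return None

# Example (placeholder IP, not a real research scanner):
# print(fetch_probe_description("192.0.2.10"))
```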
Scans, even when well-meaning, can sometimes crash systems. A past example involved Cisco routers that crashed when hit with empty UDP packets. It is a reminder that some devices are more fragile than expected and that responsible scanning practices are needed.
Ullrich discusses how his team has considered removing ethical scanners from its blocklist, especially when they are transparent and identifiable. He also suggests that, rather than launching new scans, researchers could often rely on existing data from services like Shodan or Censys, which already collect extensive scanning data.
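For those who want to follow that advice, the official Shodan Python client makes it straightforward to look up what has already been indexed about a host instead of probing it again. A minimal sketch, assuming you have a Shodan API key (the key and IP below are placeholders):

```python
import shodan

API_KEY = "YOUR_SHODAN_API_KEY"  # placeholder, supply your own key
api = shodan.Shodan(API_KEY)

try:
    # Query Shodan's existing index for a single host
    # instead of sending your own probes.
    host = api.host("192.0.2.10")  # placeholder IP
    for service in host.get("data", []):
        print(service["port"], service.get("product", "unknown"))
except shodan.APIError as exc:
    print(f"Shodan lookup failed: {exc}")
```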
The article closes by noting that researchers are generally taken at their word unless clear malicious intent is discovered. New scanners are always emerging, and the SANS team remains vigilant, constantly updating their tracking feeds.
What Undercode Say:
The conversation around internet-wide scanning taps into broader cybersecurity debates about privacy, ethics, and trust. While scanning can be a crucial tool in identifying vulnerabilities and ensuring robust security, it also raises questions about consent and risk.
First, let’s address the ethics. The internet is a shared space, but that does not make it open to any kind of traffic. Ethical scanning must be built on transparency and non-intrusiveness, and that is where RFC 9511 plays a key role. By calling on scanners to identify themselves, it promotes accountability, a foundational principle in ethical cybersecurity. This kind of attribution turns faceless traffic into a traceable, analyzable source of data, which helps defenders understand intent.
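On the researcher’s side, that attribution can be as concrete as a URL embedded in the probe itself. Below is a minimal Scapy sketch of what an attribution-carrying probe might look like; the target address, ports, and URL are placeholders, the wording is not mandated by RFC 9511, and any real scan would need authorization, rate limiting, and root privileges to send raw packets.

```python
from scapy.all import IP, UDP, Raw, send

# The payload carries a human-readable pointer to the probe description,
# in the spirit of RFC 9511's in-band attribution recommendation.
ATTRIBUTION = (
    b"Probe by example-lab.example; "
    b"details: https://example-lab.example/.well-known/probing.txt"
)

pkt = (
    IP(dst="198.51.100.25")        # placeholder target
    / UDP(sport=51234, dport=443)  # placeholder ports
    / Raw(load=ATTRIBUTION)
)
send(pkt, verbose=False)  # requires raw-socket (root) privileges
```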
The article makes a strong case for using pre-existing data before launching new scans. Services like Shodan and Censys already provide rich datasets that researchers can tap into without further burdening the internet or risking system instability. This is not only efficient, but also more ethical. It minimizes potential disruptions and helps researchers focus on analysis rather than data collection.
There’s also the issue of harm. As illustrated by the Cisco router incident, even a well-meaning probe can crash systems. This is a stark reminder that the internet is fragile in unexpected ways. Without proper coordination or caution, scans can unintentionally cause outages, slowdowns, or even security misfires if systems react defensively.
Ullrich’s discussion on blocklists touches on a practical dilemma for network defenders. Should you block all unknown scans? Probably not. While blocking may offer temporary relief, it also removes opportunities for learning. Recognizing patterns, identifying sources, and classifying scan behavior can help security teams gain better visibility into their attack surface and prioritize defense mechanisms.
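One lightweight way to act on that is to classify a scan source before deciding whether to block it, for example by checking its reverse DNS against domains used by known research scanners and routing matches to “log and watch” instead of “drop.” The sketch below is illustrative only: the domain suffixes are assumptions, and a production check should also forward-confirm the PTR record, since reverse DNS on its own can be spoofed.

```python
import socket

# Illustrative list only: a real deployment should pull this from a
# maintained feed of known research scanners rather than hard-coding it.
KNOWN_RESEARCH_SCANNER_SUFFIXES = (
    ".shodan.io",
    ".censys-scanner.com",
)

def classify_source(ip: str) -> str:
    """Return 'research-scanner', 'unknown-scanner', or 'unresolvable'."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return "unresolvable"
    if hostname.endswith(KNOWN_RESEARCH_SCANNER_SUFFIXES):
        return "research-scanner"  # candidate for logging instead of blocking
    return "unknown-scanner"

# print(classify_source("203.0.113.7"))  # placeholder IP
```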
Trust, ultimately, is the linchpin. Researchers scanning the web must earn and maintain that trust through transparency, consistency, and responsible behavior. If they fail to clearly attribute their scans, or if they behave like attackers (e.g., exploiting bugs or masking their identity), then they should expect to be treated as threats.
With new organizations popping up almost weekly, staying ahead of scanning trends is no longer optional. Network defenders need up-to-date threat feeds, but they also need context. Not every scan is an attack, but every scan could lead to one — especially if it’s harvesting data for malicious actors down the line.
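That context can often be pulled on demand from public reputation sources. The sketch below queries the SANS ISC (DShield) IP lookup API for report data on a given address; the endpoint and field names are taken from the public ISC API documentation as best understood here, so treat them as assumptions and verify against the current docs before relying on them.

```python
import requests

def isc_ip_context(ip: str, timeout: int = 10) -> dict:
    """Fetch SANS ISC / DShield reputation data for an IP address."""
    url = f"https://isc.sans.edu/api/ip/{ip}?json"
    resp = requests.get(
        url, timeout=timeout, headers={"User-Agent": "scan-context-demo"}
    )
    resp.raise_for_status()
    return resp.json().get("ip", {})

# ctx = isc_ip_context("203.0.113.7")            # placeholder IP
# print(ctx.get("count"), ctx.get("attacks"))    # field names per ISC docs; verify
```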
So, where does this leave us? In a world where automated scanning is a norm, organizations need to embrace a model of “assumed visibility.” You’re always being watched, so focus on how to interpret the watchers — their motives, their tools, and their ethics.
Fact Checker Results: ✅
🔍 The blog post is credible and comes from a respected cybersecurity expert.
🌐 RFC 9511 is a legitimate, publicly available document outlining scanning attribution best practices.
💡 The data and examples (like the Cisco bug) are historically accurate and widely recognized in the field.
Prediction:
As scanning becomes increasingly automated and commercialized, we predict an eventual regulatory push requiring researchers to follow standards like RFC 9511. Expect broader adoption of attribution practices and perhaps even centralized registries for research-based scans. Over time, this transparency will become a litmus test for distinguishing ethical researchers from cybercriminal reconnaissance.
References:
Reported By: isc.sans.edu