The Persistent Attacker: Mapping Out the Multi-Week Reconnaissance Phase of a Targeted Attack


Modern threat actors don’t smash through the front door. They watch, probe, and map your infrastructure over days—sometimes weeks—before they ever trigger an alert. By the time your SOC detects the anomaly, the early reconnaissance signals that would have revealed their tooling, origin, and intent are long gone.

On most plans, Cloudflare retains your traffic logs for only seven days. Sophisticated attackers know this. You need to know it too.

The Anatomy of a Multi-Week Recon Campaign

Targeted intrusions against SaaS platforms and developer infrastructure rarely begin with a loud exploit. They begin with low-and-slow enumeration: HTTP fingerprinting, subdomain brute-forcing over rotating IPs, and probing API error codes for stack traces or version disclosures.

A well-resourced attacker distributes this activity across three to four weeks at a request rate indistinguishable from organic noise. Each individual request looks benign. The pattern is the payload—and the pattern only becomes visible when you can query the full longitudinal dataset.

Your Cloudflare logs contain the entire story. The rotating User-Agent strings, the sequential 404 cadences against your /api/v1/ namespace, the subtle shift from one ASN to another on day 12. Without persistent log storage, that forensic narrative evaporates on day eight.
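As a rough illustration of why the pattern only shows up across the full dataset, here is a minimal Python sketch that flags ASNs whose 404s against /api/v1/ are spread thinly over many days. The record keys (datetime, clientAsn, clientRequestPath, edgeResponseStatus) and both thresholds are assumptions picked for the example, not a fixed Cloudflare schema or a tuned detection rule.

```python
from collections import defaultdict
from datetime import datetime


def slow_enumerators(records, path_prefix="/api/v1/", min_days=10, max_per_day=20):
    """Return {asn: active_days} for ASNs with low-rate 404s spread over many days.

    `records` is an iterable of dicts with assumed keys: datetime (ISO 8601),
    clientAsn, clientRequestPath, edgeResponseStatus.
    """
    daily = defaultdict(lambda: defaultdict(int))  # asn -> calendar day -> 404 count
    for r in records:
        if r["edgeResponseStatus"] == 404 and r["clientRequestPath"].startswith(path_prefix):
            day = datetime.fromisoformat(r["datetime"]).date()
            daily[r["clientAsn"]][day] += 1

    flagged = {}
    for asn, days in daily.items():
        # Many distinct days, but never loud on any single day.
        if len(days) >= min_days and max(days.values()) <= max_per_day:
            flagged[asn] = len(days)
    return flagged
```

With only a week of records in hand, a ten-day threshold can never be met, so a check like this has nothing to find.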

The Forensic Crash: When Cloudflare Deletes the Evidence You Need

Picture this: your monitoring system surfaces a spike in 403 responses and malformed Authorization headers at 2:47 AM on a Tuesday. Your on-call engineer opens the incident channel and starts pulling threads.

The anomaly is clearly ongoing, but the traffic you need to establish a behavioral baseline dates back 18 days, to when the activity began. You navigate to Cloudflare’s dashboard. You filter by clientRequestHTTPMethodName, the specific bot User-Agent header your WAF flagged, and the target path. The query executes.

Forensic Fact: Cloudflare Logs Retention defaults to 7 days on most plans. Any traffic event older than that window is permanently purged—no archive, no recovery, no appeal. In a multi-week recon scenario, you have already lost the most incriminating data before the incident is even declared.

The results come back empty for anything older than seven days. Not sparse. Empty. The payload headers that would have fingerprinted the attacker’s toolchain—gone. The sequential ASN pivots that proved coordinated infrastructure—gone. You’re now investigating a confirmed breach with no forensic record of how it began.

You file the incident with incomplete attribution. The post-mortem is inconclusive. The attacker returns in 60 days, better informed.
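The arithmetic of the scenario above is simple enough to sketch. The snippet below assumes a 7-day retention window and a hypothetical incident date; it returns how many days of the campaign have already been purged and the oldest day still queryable.

```python
from datetime import date, timedelta


def evidence_gap(recon_start, incident_day, retention_days=7):
    """Return (days_lost, oldest_day_still_available) for a rolling retention window."""
    oldest_available = incident_day - timedelta(days=retention_days)
    days_lost = max(0, (oldest_available - recon_start).days)
    return days_lost, oldest_available


incident = date(2024, 3, 19)                  # hypothetical Tuesday
recon_start = incident - timedelta(days=18)   # activity began 18 days earlier
print(evidence_gap(recon_start, incident))    # 11 of the 18 days are already gone
```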

The Pipeline Headache: Why “Just Use S3” Is an Engineering Sprint, Not a Solution

The standard DevOps retort is: “Set up a Logpush job to S3 and query it with Athena.” In theory, correct. In practice, this is a multi-sprint infrastructure commitment that most lean engineering teams cannot justify for infrequent security incidents.

Here is what “just use S3” actually requires: provision an S3 bucket with appropriate lifecycle policies, configure IAM roles and bucket policies to scope Cloudflare’s Logpush permissions without over-granting, validate that the Logpush job is streaming correctly without silent failures, and then—when an incident occurs—write ad-hoc Athena SQL against schema-inconsistent JSON blobs while under operational pressure.

The Athena query layer alone is a trap. Cloudflare log fields are not uniformly populated. Your edgeResponseStatus may be null on certain request types. Your SQL joins break silently on malformed records. You spend 40 minutes writing TRY_CAST wrappers during an active incident, not investigating the attacker.
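To make the null-field problem concrete, here is a small Python analogue of the TRY_CAST pattern (not Athena SQL itself): it reads newline-delimited JSON, tolerates malformed lines, and coerces edgeResponseStatus to an integer or None. The input format and the helper names are assumptions for illustration.

```python
import json


def parse_status(value):
    """Rough equivalent of TRY_CAST(value AS INTEGER): None instead of an error."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def load_records(lines):
    """Parse NDJSON log lines; return (records, skipped_count)."""
    good, skipped = [], 0
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1          # malformed record: count it, don't crash
            continue
        if not isinstance(rec, dict):
            skipped += 1          # valid JSON, but not a log object
            continue
        rec["edgeResponseStatus"] = parse_status(rec.get("edgeResponseStatus"))
        good.append(rec)
    return good, skipped
```

Counting skipped lines instead of silently dropping them is the design choice that matters here: during an incident you want to know how much of the record you could not read.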

Then there’s the cost modeling. Athena charges per terabyte scanned. A high-traffic production domain generating 50 GB of compressed logs per month accumulates roughly 300 GB in six months, so your first forensic query on that unpartitioned data scans all of it, and every ad-hoc rerun scans it again. None of this is fast. None of this is simple. And none of it should be the responsibility of an engineer who is supposed to be tracing an attacker.
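A back-of-the-envelope model of that scan cost might look like the following. The per-terabyte price is a placeholder assumption (check current Athena pricing for your region), the conversion uses decimal terabytes as an approximation, and it assumes each query scans the full unpartitioned dataset.

```python
def athena_scan_cost_usd(gb_per_month, months, queries=1, price_per_tb_usd=5.0):
    """Estimate scan charges when every query reads all retained data.

    price_per_tb_usd is an assumed placeholder, not a quoted price.
    """
    scanned_tb = gb_per_month * months / 1000  # decimal TB, approximation
    return scanned_tb * queries * price_per_tb_usd


# Six months of 50 GB/month, with a handful of ad-hoc reruns during one incident.
print(athena_scan_cost_usd(50, 6, queries=20))
```

The cost grows with both the months retained and the number of reruns, which is why partitioning tends to become part of the pipeline work.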

The Three-Step Solution: Token → Domain → Forever

This is the problem a properly designed log retention product solves. Not “configure your own pipeline”—paste a token and go.

Step 1: Paste Your Cloudflare API Token

Generate a scoped Cloudflare API token with Logs:Read and Zone:Read permissions. Paste it into the dashboard. The integration validates your token permissions instantly and surfaces every zone associated with your account. No IAM, no bucket policies, no SDK configuration.
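For a sense of what token validation can involve, here is a hedged Python sketch against Cloudflare's token verification endpoint. It only checks that a token is active; it does not confirm the Logs:Read or Zone:Read scopes, and it is not a description of how any particular product performs its check. The function names are hypothetical.

```python
import json
import urllib.request

VERIFY_URL = "https://api.cloudflare.com/client/v4/user/tokens/verify"


def build_verify_request(token):
    """Build (but do not send) the GET request that asks Cloudflare to verify a token."""
    return urllib.request.Request(
        VERIFY_URL,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_is_active(payload):
    """Interpret the JSON body, assuming the usual success/result envelope."""
    result = payload.get("result") or {}
    return bool(payload.get("success")) and result.get("status") == "active"


def verify(token, timeout=10):
    """Send the request. An invalid token typically raises urllib.error.HTTPError."""
    with urllib.request.urlopen(build_verify_request(token), timeout=timeout) as resp:
        return token_is_active(json.load(resp))
```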

Step 2: Select Your Domain

Choose the Cloudflare zone you want to retain. Multi-domain accounts can retain all zones independently. The integration begins backfilling immediately from the current retention window—so you don’t lose the last seven days while setup completes. Every new HTTP request, firewall event, rate-limit trigger, and WAF match is captured from this point forward, permanently.

Step 3: Query It Forever

Your historical log data is indexed and searchable in a structured dashboard—no SQL, no schema debugging, no Athena cold-start latency. Filter by IP, ASN, User-Agent, HTTP method, response code, URI path, or any Cloudflare log field. Run a timeline query across 90 days. Cross-reference the bot probe from week one against the credential stuffing attempt in week three.
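The cross-referencing described above boils down to a set intersection over two time windows. The Python sketch below shows the idea on plain dict records; the field names, the /login path, and the status codes used to label "probe" and "credential stuffing" traffic are all assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


def actors(records, start, end, matches):
    """Set of (ASN, User-Agent) pairs with matching requests in [start, end)."""
    return {
        (r["clientAsn"], r["userAgent"])
        for r in records
        if matches(r) and start <= datetime.fromisoformat(r["datetime"]) < end
    }


def is_probe(r):
    # Assumed signature of enumeration: 404s under the API namespace.
    return r["edgeResponseStatus"] == 404 and r["clientRequestPath"].startswith("/api/v1/")


def is_stuffing(r):
    # Assumed signature of credential stuffing: auth failures on a hypothetical /login path.
    return r["edgeResponseStatus"] in (401, 403) and r["clientRequestPath"] == "/login"


def repeat_actors(records, campaign_start):
    """Actors that probed in week one and attempted logins in week three."""
    week = timedelta(days=7)
    week_one = actors(records, campaign_start, campaign_start + week, is_probe)
    week_three = actors(records, campaign_start + 2 * week, campaign_start + 3 * week, is_stuffing)
    return week_one & week_three


start = datetime(2024, 3, 1, tzinfo=timezone.utc)  # hypothetical campaign start
```

Matching on the exact (ASN, User-Agent) pair is a simplification. An attacker who rotates both would need a looser key, such as the ASN alone or a header fingerprint.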

Forensic Fact: Attackers conducting multi-stage recon rely on the assumption that you cannot correlate their week-one activity with their week-four exploitation attempt. Indefinite log retention invalidates that assumption, and turns your Cloudflare edge into a permanent forensic record rather than a rolling 7-day window.

When your next incident fires, your engineer pulls up 14 months of traffic history in under 30 seconds. Attribution is immediate. The post-mortem is complete. The attacker’s infrastructure is mapped from first probe to final payload.

Why This Matters More for Startups Than Enterprises

Enterprise security teams have dedicated SIEM infrastructure, Splunk licenses, and full-time threat intelligence staff. You probably don’t. As a startup founder or a two-person DevOps team, your security posture lives or dies on the tooling you chose to configure before the incident happened.

The gap isn’t technical sophistication; it’s forensic preparedness. The enterprise doesn’t win because it’s smarter. It wins because it has the log data. Every security decision made after a breach is only as good as the evidence available to make it.

Cloudflare’s 7-day window is a rolling buffer, not an incident-response archive. Treating it as your security data layer is an architectural decision you will regret at the worst possible moment.

Permanent Readiness Is a One-Time Decision

You will not get a warning before a targeted attacker begins their reconnaissance. You will not receive a notification when Cloudflare purges the logs that would have proven their identity. The gap in your forensic record will only become visible when you need that data most—during an active incident, under pressure, with incomplete information.

The setup takes minutes. The protection is permanent. And unlike an S3+Athena pipeline, there is nothing to break, maintain, or debug during an incident.

Don’t fly blind during your next incident. Start retaining your logs today, before they roll over.

→ Start retaining your Cloudflare logs — no infrastructure required

Tagged: cloudflare-log-retention, incident-response, devops-security, forensic-logging, startup-security, siem-alternative