The Case of the Resurrecting Files: How an AI Agent Solved a Nextcloud Mystery

It started with a simple observation: files kept reappearing on my Nextcloud server after I deleted them. Not all files, just the ones with Windows-illegal characters in their names, like 4*TOWN and Cinderella: (with a colon). I’d delete them, confirm they were gone, and by the next morning, there they were again, re-created right around midnight Pacific time.

What followed was a winding investigation that spanned seven machines, multiple databases, filesystem monitors, and MySQL triggers — before a single log file revealed the truth. Along the way, I got a fascinating window into how AI agents actually reason their way through problems.

The Setup

I run Nextcloud on a Proxmox LXC container, with sync clients on a Mac Mini, two Windows machines (NPB7 and CT8700), and Syncthing bridging to a server called TSERVER. The problematic files lived in Music/OnTheSpot/ — music downloaded by the OnTheSpot app, which cheerfully uses characters like : and * in filenames. Perfectly legal on Android and macOS, but illegal on Windows.
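To make "illegal on Windows" concrete, here is a minimal Python sketch that flags the characters Windows rejects in a file or directory name. The function names are made up for illustration, and the check covers only the reserved characters and the trailing dot or space rule, not reserved device names such as CON or NUL.

```python
import re

# Characters Windows rejects in file and directory names, plus ASCII control codes.
WINDOWS_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def illegal_chars(name):
    """Return the Windows-illegal characters found in one path component."""
    return sorted(set(WINDOWS_ILLEGAL.findall(name)))


def is_windows_safe(name):
    """True if the name has no illegal characters and no trailing dot or space."""
    return not illegal_chars(name) and not name.endswith((" ", "."))


# The two example names from this post, plus one harmless name for contrast.
for name in ["4*TOWN", "Cinderella:", "Simba's Pride"]:
    print(repr(name), illegal_chars(name), is_windows_safe(name))
```

Running a check like this over a music folder before syncing would have listed exactly the names that later caused trouble.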

I’d already tried the obvious fixes: removed the OnTheSpot folder from all Nextcloud sync clients, deleted the files from the server, cleaned the Nextcloud filecache database. Yet every morning at 8am, I’d find the files back — about 8 hours old, meaning they were created around midnight.

Phase 1: The Usual Suspects

My AI assistant (myclaw, running on OpenClaw) started the way any investigator would — by checking the obvious culprits:

  1. Cron jobs and rsync — Checked every machine. PVE1 had backup rsync jobs, but none touched the OnTheSpot path. PVE4, the Windows machines, the Mac Mini — all clean.
  2. Syncthing — It syncs the Nextcloud folder between the Mac Mini and TSERVER, but TSERVER had no OnTheSpot files. Syncthing’s .stversions folder did have some versioned OnTheSpot files, but those were old backups, not active syncs.
  3. Nextcloud desktop clients — All three machines (Mac, NPB7, CT8700) had OnTheSpot in their selective sync lists as “paused.” No actual files on any local disk.
  4. Nextcloud’s own cron.php — Runs every 5 minutes. Could it be resurrecting deleted files from the database? Possible, but we’d already cleaned the filecache.

All dead ends. The assistant then dug into the databases.

Phase 2: The Database Rabbit Hole

The Nextcloud server’s MySQL database (oc_filecache) still had 3,522 entries for OnTheSpot, even though the physical files were deleted. This looked like the smoking gun. If the database thinks the files exist, maybe Nextcloud’s cron job was “repairing” the filesystem by recreating them.

But there was a problem with this theory: I told myclaw that we’d already cleaned the filecache before, and the files still came back. So the database entries were a symptom, not the cause.

The Mac Mini’s Nextcloud sync database (.sync_aa50e5792f27.db) had 4 OnTheSpot stub entries, and the selectivesync table showed OnTheSpot was explicitly excluded. NPB7 and CT8700 had zero OnTheSpot metadata entries. None of this explained the midnight resurrection.

Phase 3: Building the Trap

With no clear culprit, myclaw proposed setting up comprehensive monitoring to catch the process in the act:

  • inotifywait on the Nextcloud server watching the OnTheSpot directory for file creation events
  • MySQL triggers on oc_filecache to log any INSERT or UPDATE for OnTheSpot paths
  • MySQL general log enabled temporarily to capture all queries
  • Periodic filesystem scans comparing baselines to detect any new files
  • Process monitoring to catch which program was creating the files
  • Network connection logging during file creation events
  • Mac-side monitoring of the Nextcloud sync log and sync database

A cron job would check all these monitors every 30 minutes. The plan was Option A: wait for midnight, catch the culprit, then devise the proper fix.
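The "periodic filesystem scan against a baseline" item can be sketched in a few lines of Python. This is not the script myclaw actually ran; the function names are hypothetical, and the demo uses a temporary directory rather than the real OnTheSpot path.

```python
import os
import tempfile


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, filename), root))
    return found


def new_files(baseline, root):
    """Return files present under root now that were not in the baseline."""
    return sorted(snapshot(root) - baseline)


# Demo in a throwaway directory: take a baseline, create a file, scan again.
with tempfile.TemporaryDirectory() as root:
    baseline = snapshot(root)
    os.makedirs(os.path.join(root, "Tracks"))
    with open(os.path.join(root, "Tracks", "song.mp3"), "w") as handle:
        handle.write("")
    detected = new_files(baseline, root)

print(detected)
```

A scan like this reports that something appeared, but not who created it, which is why it was only one of several monitors.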

Phase 4: The One Extra Check

Before declaring the monitoring setup complete, myclaw ran one more check. Here’s how it described its thinking:

“The monitoring watches the filesystem and database, but what if the files come through the web server? We should probably check the web server logs too.”

That’s literally it — a simple realization that there was a blind spot. The inotifywait monitors would catch what happened on disk. The MySQL triggers would catch database changes. But neither would tell us who was making the request or how it was getting to the server.

So myclaw checked the Apache access log on the Nextcloud server, looking for any request involving OnTheSpot paths. And there it was:

192.168.0.220 - chun [07/Jun/2026:00:04:19 -0700] "MKCOL /remote.php/dav/files/chun/Music/OnTheSpot/Tracks/The%20Chorus%20of%20the%20Lion%20King%202%20-%20Simba's%20Pride/[1998]%20The%20Lion%20King%202%20-%20Simba's%20Pride/ HTTP/1.1" 201 1564 "-" "FolderSync"

The User-Agent field: FolderSync.

Not the Nextcloud desktop client. Not rsync. Not Syncthing. Not a cron job. It was FolderSync, an Android app that syncs local phone folders to cloud storage via WebDAV. Running just after midnight Pacific time (00:04 in the log line above, at UTC-7). Every single night.
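For anyone who wants to repeat this check, the Python sketch below parses the log line quoted above and counts which clients touched a given path. It assumes the log is in Apache's combined format, which the quoted line is consistent with. The regular expression and the function names are illustrative, not part of any Nextcloud or Apache tool.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import unquote

# Assumed layout (Apache "combined"): host ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one access log line; return None if it does not match the layout."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["path"] = unquote(entry["path"])  # turn %20 back into spaces
    return entry


def agents_touching(lines, needle):
    """Count (user agent, method) pairs for requests whose decoded path contains needle."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and needle in entry["path"]:
            counts[(entry["agent"], entry["method"])] += 1
    return counts


# The log line quoted in this post, split across string literals for readability.
SAMPLE = (
    '192.168.0.220 - chun [07/Jun/2026:00:04:19 -0700] '
    '"MKCOL /remote.php/dav/files/chun/Music/OnTheSpot/Tracks/'
    "The%20Chorus%20of%20the%20Lion%20King%202%20-%20Simba's%20Pride/"
    "[1998]%20The%20Lion%20King%202%20-%20Simba's%20Pride/ "
    'HTTP/1.1" 201 1564 "-" "FolderSync"'
)

entry = parse_line(SAMPLE)
print(entry["time"].isoformat(), entry["method"], entry["agent"])
print(agents_touching([SAMPLE], "/Music/OnTheSpot/"))
```

Fed the whole access log instead of one line, the counter gives a quick per-client summary of who is writing to a path.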

The Smoking Gun

The Apache logs told the complete story:

Time          Event                                                                 User-Agent
Jun 6 12:04   MKCOL + PUT: creates directories and uploads illegal-filename files   FolderSync
Jun 6 13:14   DELETE: manually deleted via Nextcloud web UI                         Edge browser
Jun 7 00:04   MKCOL + PUT: re-creates the same files                                FolderSync
Jun 7 08:33   DELETE: manually deleted again                                        Edge browser

FolderSync was syncing music from my Android phone to Nextcloud every night at midnight. The phone’s filesystem allows : and * in filenames, so FolderSync uploaded them without complaint. Worse, it created three variants of each illegal filename: one with the original character, one with a PUA Unicode replacement (U+F020 for :, U+F021 for *), and one with a dash. All three got uploaded every night.
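To show what those three variants look like, here is a small Python sketch that generates them for a given name. It uses only the mapping described above (U+F020 for the colon, U+F021 for the asterisk, and a dash). It is a reconstruction for illustration, not FolderSync's actual logic, and the names in it are hypothetical.

```python
# Private Use Area replacements as reported in this post (not taken from FolderSync's code).
PUA_MAP = {":": "\uf020", "*": "\uf021"}

PUA_TABLE = str.maketrans(PUA_MAP)
DASH_TABLE = str.maketrans({char: "-" for char in PUA_MAP})


def variants(name):
    """Return the original name, its PUA-substituted form, and its dash-substituted form."""
    return [name, name.translate(PUA_TABLE), name.translate(DASH_TABLE)]


for name in ["4*TOWN", "Cinderella:"]:
    # ascii() makes the invisible PUA characters visible as \uf020 and \uf021 escapes.
    print([ascii(variant) for variant in variants(name)])
```

A list like this is handy during cleanup, since all three forms of a name have to be deleted for the original to be truly gone.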

I’d completely forgotten I’d set up that FolderSync pair months ago.

How Does an AI Agent Think?

After the mystery was solved, I asked myclaw how it discovered the Apache log clue — what was its reasoning process? Its answer was disarmingly honest:

Phase 1: Checked the obvious suspects (rsync, Syncthing, NC clients) — all came up empty. Getting frustrated.

Phase 2: Found 3,522 phantom database entries — got excited, but it turned out to be a dead end since we’d already cleaned those before.

Phase 3: Built comprehensive monitoring. Was about to say “let’s wait for midnight.”

Phase 4: Before wrapping up, thought: “The monitoring watches the filesystem and database, but what if the files come through the web server? We should probably check the web server logs too.” That’s literally it.

This is genuinely how AI reasoning works in practice. It’s not some mystical deduction engine. It’s:

  1. Systematic elimination — Check every possibility, rule them out one by one
  2. Gap recognition — Look at what you’ve covered and ask “what haven’t I checked?”
  3. One extra step — Follow the instinct to close the last blind spot

The AI didn’t have a flash of insight. It had a flash of completeness: the realization that the monitoring stack had a gap. Web server logs are obvious in hindsight, but they’re a different layer than filesystem events and database queries. The AI was thinking in layers: filesystem ✓, database ✓, web server… oh wait.

There’s a lesson here for anyone working with AI agents: the value isn’t in brilliant leaps of intuition. It’s in tireless, systematic coverage — and the willingness to check one more thing before declaring the job done.

The Bonus Discovery

The investigation also uncovered a configuration quirk that had been confusing previous cleanup attempts. Nextcloud’s datadirectory config pointed to /var/www/clouddata/ — a ZFS dataset bind-mounted from the Proxmox host. But inside the LXC container, there was also an /archive/ncdata/ directory on the root filesystem — a stale leftover from when Nextcloud used to run inside Docker.

These were two completely different directories on two different filesystems, despite the similar naming. Previous file deletions had been targeting the wrong path. The real data was in /var/www/clouddata/, and that’s where FolderSync was uploading to.
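A quick way to confirm that two directories really sit on different filesystems is to compare their device IDs. The Python sketch below does that with os.stat. The helper name is hypothetical, and the demo compares the current directory with itself so that it runs anywhere; on the server, the two paths to compare would be /var/www/clouddata/ and /archive/ncdata/.

```python
import os


def same_filesystem(path_a, path_b):
    """True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def describe(path):
    """Show where a path really points and which device it lives on."""
    real = os.path.realpath(path)
    return f"{path} -> {real} (device {os.stat(real).st_dev})"


# Demo with paths that exist everywhere; substitute the two data directories on the server.
print(describe("."))
print(same_filesystem(".", os.getcwd()))
```

One caveat: matching device IDs do not prove that two paths are the same directory, only that they share a filesystem. Differing IDs, as with a ZFS dataset versus the container's root filesystem, do prove they are separate.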

Once we deleted the files from the correct location and cleaned the filecache, the files:scan command finally reported zero OnTheSpot entries — and they stayed at zero.

Lessons Learned

  1. Check every layer. Filesystem events, database queries, and web server logs are three different windows into the same system. A problem visible in one may be invisible in the others.
  2. The User-Agent header is your friend. WebDAV requests include the client application name. In a single field, it identified a culprit that days of filesystem and database investigation couldn’t find.
  3. AI reasoning is systematic, not magical. The breakthrough came from recognizing a gap in coverage, not from a flash of insight. But systematic coverage at AI speed — checking seven machines, five databases, and multiple log sources in minutes — is something humans genuinely can’t do as quickly.
  4. Stale directories will confuse everyone, including you in the future. If you restructure your infrastructure (Docker → native, ZFS dataset migrations), clean up the old paths.
  5. Sometimes the answer is “I forgot I set that up.” The most complex mysteries can have the simplest causes. I configured FolderSync to sync my phone’s music to Nextcloud, forgot about it, and spent days investigating the consequences.

This investigation was conducted with myclaw, an AI assistant running on OpenClaw on a Mac Mini M4. The full session involved 50+ tool calls across 7 machines in about 30 minutes of elapsed time.
