How a Ghost VM Blocked My GKE Cluster Deletion and How It Was Fixed

Overview

When working with Google Kubernetes Engine (GKE), cleaning up resources isn’t always as straightforward as clicking “Delete.”

I ran into a situation recently where trying to delete a Kubernetes cluster led to a frustrating error.

The system insisted a virtual machine still existed, even though it had been deleted.

Despite the error referencing a VM, that VM did not show up in the Compute Engine dashboard or in the output of gcloud compute instances list. It simply didn't exist, or so I thought.

In this post, I’ll walk you through a strange but insightful issue I encountered while deleting a GKE cluster and how a ghost VM almost haunted my entire setup.


Error

    ❌ Delete Kubernetes Engine Cluster: `my-cluster`

    **Error**:  
    Resource is in use by other resources:  
    The subnetwork resource  
    `projects/<project-id>/regions/us-central1/subnetworks/default`  
    is already being used by  
    `projects/<project-id>/zones/us-central1-c/instances/<ghost-vm-name>`


Initial approach:

Why not simply delete the ghost VM that the error references?

The Mystery of the Ghost VM

Where was this VM?

    gcloud compute instances list

The virtual machine (VM) referenced in the error was not in the list.

In fact, it could not be found at all.

After some investigation, I found that:

  • The VM no longer existed as an active instance
  • But it was still referenced in an unmanaged instance group
  • That instance group was the last thing tying the cluster’s networking resources together

This is what’s known as a ghost reference — a VM is gone, but something still thinks it's alive.


What was learnt from this error

1. 🔍 List all unmanaged instance groups:

    gcloud compute instance-groups unmanaged list

However, although the listing showed this particular instance group as having 1 instance, it actually had no VMs attached.
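
One way to cross-check that count is to ask the group for its members and then test whether each one still exists as an instance. The sketch below is an illustration under assumptions, not the exact steps from this incident: `find_ghost_members` is a hypothetical helper name, and the group name and zone are placeholders you pass in. If the group reports no members at all, it simply prints nothing.

```shell
# Hypothetical helper: print members of an unmanaged instance group that
# no longer exist as Compute Engine instances.
# Usage: find_ghost_members <group-name> <zone>
find_ghost_members() {
  group="$1"
  zone="$2"

  # list-instances returns one full instance URL per member.
  gcloud compute instance-groups unmanaged list-instances "$group" \
    --zone="$zone" --format="value(instance)" |
  while read -r url; do
    [ -n "$url" ] || continue
    name="${url##*/}"   # keep only the last path segment (the VM name)

    # describe exits non-zero when the instance cannot be found
    # (or on any other error, such as missing permissions).
    if ! gcloud compute instances describe "$name" --zone="$zone" \
        </dev/null >/dev/null 2>&1; then
      echo "$name"
    fi
  done
}
```

For example, `find_ghost_members k8s-ig--7xxxxxxxxx us-central1-c` would print any member names that the group still lists but Compute Engine cannot describe.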

What Really Happened

Unmanaged Instance Group (UIG)

There was an unmanaged instance group (k8s-ig--7xxxxxxxxx) that referenced a ghost VM (one that no longer existed).

Backend Service

That instance group was in use by a backend service
→ Think of the backend service as the thing that tells Google where to route traffic.

Load Balancers

The backend service was still being used by one or more load balancers — even though they weren’t actively serving anything.

Cluster Deletion Blocked

Because the backend service still depended on the instance group (and the instance group thought it had a VM), Google Cloud said:

“Hey, I can’t delete your GKE Autopilot cluster, because something’s still using its subnetwork.”

So the full dependency chain was:

    Load Balancers
    Backend Service
    Unmanaged Instance Group
    Ghost VM (doesn’t exist anymore)
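
To trace that chain from the bottom up, one option is to list every backend service alongside the instance groups it points at and keep the ones that mention the group in question. This is a sketch under assumptions: `backend_services_using_group` is a hypothetical helper, and the `backends[].group` projection reflects my understanding of the backend service resource, so check the output against your own project before relying on it.

```shell
# Hypothetical helper: print backend services whose backends reference
# the given instance group.
# Usage: backend_services_using_group <group-name>
backend_services_using_group() {
  # Each output line is the service name followed by its backend group URLs.
  gcloud compute backend-services list \
    --format="value(name,backends[].group)" |
  # Substring match on the group URL; a longer group name that shares this
  # prefix would also match, so review the results by eye.
  awk -v g="$1" 'index($0, "/instanceGroups/" g) { print $1 }'
}
```

From there, `gcloud compute url-maps list` and `gcloud compute forwarding-rules list` show which load balancer pieces still point at that backend service.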

Resolution

To successfully delete the Autopilot cluster, we had to:

  1. Delete the load balancers.
  2. Delete the backend service.
  3. Remove the ghost VM reference from the instance group.
  4. Finally, delete the instance group and then the cluster.
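
The order above can be sketched as a small script. It is only an illustration under stated assumptions: the load balancer is assumed to be a global HTTP one (forwarding rule, target HTTP proxy, URL map), which this post does not confirm, and every variable name is a hypothetical placeholder you would set yourself. The `run` wrapper prints each command by default so the sequence can be reviewed before anything is deleted.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical teardown in dependency order. Every name comes from an
# environment variable (placeholder); ":?" aborts if one is unset.
teardown_cluster() {
  # 1. Load balancer pieces (ASSUMPTION: global HTTP load balancer).
  run gcloud compute forwarding-rules delete "${FORWARDING_RULE:?}" --global
  run gcloud compute target-http-proxies delete "${TARGET_PROXY:?}" --global
  run gcloud compute url-maps delete "${URL_MAP:?}" --global
  # 2. Backend service that referenced the instance group.
  run gcloud compute backend-services delete "${BACKEND_SERVICE:?}" --global
  # 3. Drop the ghost VM reference from the instance group.
  run gcloud compute instance-groups unmanaged remove-instances "${GROUP:?}" \
    --instances="${GHOST_VM:?}" --zone="${ZONE:?}"
  # 4. Delete the instance group, then the cluster.
  run gcloud compute instance-groups unmanaged delete "${GROUP:?}" --zone="${ZONE:?}"
  run gcloud container clusters delete "${CLUSTER:?}" --region="${REGION:?}"
}
```

Calling `teardown_cluster` with the default dry run prints the seven commands in order; running it again with `DRY_RUN=0` executes them, with gcloud still asking for confirmation on each delete.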