Deployment Failures

When a deployment fails, Clank marks it with a red Failed status and stops the pipeline. This page covers the most common failure modes, how to identify them from the deployment logs, and how to fix each one.

Every deployment has a Logs tab in the dashboard that shows the build output and container logs. Start there when diagnosing any failure.

Build failed

What you see: The deployment fails during the build phase. The logs show Docker build output ending in an error.

Possible causes (most to least likely):

Missing dependency or package — Your package.json, requirements.txt, or equivalent references a package that cannot be resolved.
- Fix: Check the error message for the specific package name. Make sure it is published and spelled correctly. Run the install command locally to reproduce the issue.
Dockerfile syntax error — A malformed instruction in your Dockerfile.
- Fix: If you are using a custom Dockerfile, validate the syntax. If Clank auto-generated the Dockerfile, check that your project structure matches what the detector expects (e.g., a package.json at the repo root for Node.js projects).
Wrong base image — The specified base image does not exist or the tag is invalid.
- Fix: Verify that the image and tag exist on Docker Hub or your registry, and that the tag is published for your server's CPU architecture. A common mistake is using a tag that is built only for amd64 on an arm64 server.
Build command failure — The build step itself (npm run build, pip install, etc.) exits with a non-zero code.
- Fix: Read the build output in the deployment logs. The error is usually a compilation error, type error, or missing environment variable needed at build time.
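Long build output can bury the line that matters. The following is a minimal sketch for scanning a saved copy of the build log, assuming you have copied the log text out of the Logs tab. The function name find_error_lines and the marker patterns are illustrative assumptions, not part of Clank; extend the pattern for your own toolchain.

```python
import re

# Assumed error markers: the word "error" (any case) and npm's "npm ERR!" prefix.
# These are heuristics, so expect both false positives and misses.
ERROR_PATTERN = re.compile(r"\berror\b|npm ERR!", re.IGNORECASE)


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines that look like errors."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_errors.py < build.log
    for number, line in find_error_lines(sys.stdin.read()):
        print(f"{number}: {line}")
```

The first match is usually the most useful one, because later errors are often consequences of it.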

Health check timed out

What you see: The build succeeds and the container starts, but the deployment fails with a health check timeout.

Possible causes:

App not listening on the configured port — Clank sends HTTP requests to the port specified in your service settings. If your app listens on a different port, the health check never gets a response.
- Fix: Check which port your app listens on and update the service port setting in the dashboard to match. Common defaults and conventions: Node.js/Express (3000), Python/Flask (5000), Python/Uvicorn (8000), Go (8080).
Wrong health check path — The health check hits a path that does not exist or returns a non-2xx status.
- Fix: Set the health check path to a route your app actually serves. A simple GET / or GET /health that returns 200 is sufficient.
App takes too long to start — Some applications (especially JVM-based ones) need more time to initialize than the default timeout allows.
- Fix: Increase the health check timeout in the service settings. If your app takes more than 60 seconds to start, investigate startup performance.
App crashes on startup — The container starts but the process exits immediately. See Container crashed below.
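The port and path causes above can be ruled out with a dedicated health route. The following is a minimal sketch using only the Python standard library. The PORT variable name, the 8000 fallback, and the /health path are assumptions for illustration; use whatever port and path your service settings specify.

```python
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer the health check paths with 200; everything else gets 404.
        if self.path in ("/", "/health"):
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Keep health check requests out of the container logs.
        pass


def make_server(port=None):
    # PORT is an assumed variable name, not something Clank is known to set.
    if port is None:
        port = int(os.environ.get("PORT", "8000"))
    # Bind to 0.0.0.0: a server bound to 127.0.0.1 inside a container
    # is not reachable from outside it.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


def probe(port, path="/health", timeout=5.0):
    # Return the HTTP status code of a local health check request.
    url = f"http://127.0.0.1:{port}{path}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    make_server().serve_forever()
```

Running probe against your own app locally, with the same port and path you configured in the service settings, shows what the health check would see before you redeploy.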

Container crashed

What you see: The deployment logs show the container exiting shortly after start. The status may show “Container exited” or the health check times out because the process is no longer running.

Possible causes:

Missing or incorrect environment variable — The app tries to read a required variable that is not set, and exits with an error.
- Fix: Check the container logs in the deployment’s Logs tab. Look for messages like “missing required config” or “undefined variable”. Add the missing variable in the Environment tab and redeploy.
Permission error — The container process cannot access a file, directory, or port it needs.
- Fix: Check whether your Dockerfile switches to a non-root user that lacks access to the files or directories the app needs. If the app binds to a port below 1024, either switch to port 1024 or higher (preferred) or run as root.
OOM killed — The container exceeded its memory limit and was killed by the kernel.
- Fix: Check the logs for “Killed” or “OOMKilled”. A process killed this way typically exits with code 137. Increase the memory limit in the service resource settings, or reduce your application’s memory usage.
Entrypoint or CMD error — The Dockerfile’s entrypoint or command is wrong.
- Fix: Verify the CMD or ENTRYPOINT instruction in your Dockerfile. A common mistake is referencing a script that is not executable or does not exist at the expected path.
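For the missing environment variable case, a startup check that names every missing variable makes the container logs much easier to read. The following is a minimal sketch. The variable names DATABASE_URL and SECRET_KEY are hypothetical placeholders, and the message text simply mirrors the “missing required config” example above.

```python
import os
import sys

# Hypothetical variable names; replace them with the ones your app reads.
REQUIRED_VARS = ("DATABASE_URL", "SECRET_KEY")


def missing_vars(required, environ=None):
    # Unset and empty values are both reported as missing.
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def check_environment(required=REQUIRED_VARS, environ=None):
    missing = missing_vars(required, environ)
    if missing:
        # Write to stderr so the message shows up in the container logs,
        # then exit non-zero so the failure is visible as a crash.
        print("missing required config: " + ", ".join(missing), file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    check_environment()
```

Calling check_environment() before the app starts its server reports all missing variables in one pass instead of one per redeploy.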

Image pull failed

What you see: The deployment fails early with a message about failing to pull the image.

Possible causes:

Invalid image name or tag — The image reference has a typo or the tag does not exist.
- Fix: Double-check the image name and tag. Search for the image on Docker Hub or your registry to confirm it exists.
Private registry without credentials — The image is in a private registry and Clank does not have credentials to pull it.
- Fix: Configure registry credentials in your service settings. For GitHub Container Registry, you need a personal access token with the read:packages scope.
Rate limiting — Docker Hub enforces pull rate limits for unauthenticated and free-tier users.
- Fix: Add Docker Hub credentials to avoid anonymous rate limits, or wait and retry.
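Typos in an image reference are easier to spot once the reference is split into its parts. The following is a rough sketch of such a check. The function names are illustrative assumptions, and the parsing is deliberately simplified compared with the full Docker reference grammar. It does not contact any registry, so it cannot confirm that a tag exists.

```python
def split_image_reference(ref):
    """Split an image reference into (name, tag, digest)."""
    # A digest, if present, follows "@", as in "name@sha256:...".
    name, _, digest = ref.partition("@")
    # A tag is introduced by a colon after the last slash. A colon before
    # the last slash belongs to a registry port such as "localhost:5000".
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        return name[:colon], name[colon + 1:], digest
    return name, "", digest


def lint_image_reference(ref):
    """Return a list of warnings for common mistakes in an image reference."""
    warnings = []
    if ref != ref.strip() or " " in ref:
        warnings.append("reference contains whitespace")
    name, tag, digest = split_image_reference(ref.strip())
    # Only the final path component is checked, since it is never a hostname
    # when the reference contains a slash.
    repository = name.rsplit("/", 1)[-1]
    if repository != repository.lower():
        warnings.append("repository name must be lowercase")
    if not tag and not digest:
        warnings.append("no tag given; Docker will default to 'latest'")
    return warnings
```

An empty list from lint_image_reference only means the reference is plausibly well formed. You still need to confirm on the registry that the image and tag exist.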

Resource exhaustion

What you see: Deployments fail intermittently, or succeed but the container is killed shortly after starting.

Possible causes:

Disk full — The server has no space left for pulling images or writing container layers.
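- Fix: SSH into the server and check available disk space with df -h. Run docker system prune to remove stopped containers, dangling images, and unused build cache; add -a to also remove every image that is not used by a container. Clank runs periodic cleanup automatically, but large images can fill the disk between runs.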
Out of memory (server-level) — The server does not have enough RAM to run the container alongside existing workloads.
- Fix: Check memory usage on the server with free -h. Stop unused services, reduce memory limits on other containers, or move workloads to a server with more RAM.
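If disk pressure keeps coming back, a small scripted check can complement df -h. The following is a minimal sketch using the Python standard library. The 10% threshold and the function name disk_report are arbitrary assumptions, and /var/lib/docker is Docker's default data directory on Linux, which may differ on your server.

```python
import shutil


def disk_report(path="/", min_free_fraction=0.10):
    """Summarize disk usage for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "total_gib": round(usage.total / 2**30, 1),
        "free_gib": round(usage.free / 2**30, 1),
        "free_fraction": round(free_fraction, 3),
        # The threshold is an assumption; pick one that fits your image sizes.
        "low": free_fraction < min_free_fraction,
    }


if __name__ == "__main__":
    # Assumes Docker's default data directory on Linux.
    print(disk_report("/var/lib/docker"))
```

Checking the filesystem that holds Docker's data directory matters more than checking the root filesystem when the two are on separate volumes.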

Escalation

If none of the above resolves your issue:

Check the deployment logs thoroughly. The root cause is almost always visible in the build output or container logs.
Try deploying locally. Build and run the Docker image on your local machine to rule out platform-specific issues.
Check server health. Run clank-agent doctor on the server to verify Docker, network, and disk are healthy.
Contact support with your deployment ID and the relevant log output. You can find the deployment ID in the URL when viewing a deployment in the dashboard.