Apple container vs Colima: local datastore benchmarks on an M4 Mac

A practical benchmark comparing Apple's container runtime and Colima across Redis, Postgres, ClickHouse, DuckDB, and Neo4j on one M4 Mac.

  • containers
  • colima
  • cloudflare
  • databases
  • benchmarks

Apple’s new container CLI made me curious about something practical: should I keep using Colima for local infrastructure, or is Apple’s runtime already good enough to use for real development services?

I did not want to answer that with a hello-world container. Most of my local container usage is not hello-world. It is databases, cache services, graph stores, and small analytical jobs. So I benchmarked Apple container against Colima across a few datastore shapes:

  • Redis for cache / key-value
  • Postgres for OLTP-style relational work
  • ClickHouse for OLAP server work
  • DuckDB for embedded analytics
  • Neo4j for graph queries

This is a personal benchmark on one machine, not a universal claim about container runtimes. Still, the results were useful because they were not one-dimensional.

The short version:

  • Apple container was faster for the long-running services I tested over localhost TCP.
  • Colima was easier operationally and much faster for the short-lived DuckDB workload.
  • The interesting difference was not just throughput. It was also startup behavior, volume behavior, and how much runtime-specific setup each image needed.

My takeaway:

Apple container is worth testing seriously for long-running local services on Apple Silicon. Colima is still the smoother Docker-compatible baseline. The right answer depends on the workload.

What I tested

Test machine: the local machine and runtime versions used for this run.

| Item | Value |
| --- | --- |
| Mac | Apple M4 |
| Memory | 16 GiB |
| OS | macOS 26.5.1, build 25F80 |
| Apple container | 1.0.0 |
| Colima | 0.10.3 |
| Docker (via Colima) | client 29.6.0, server 29.2.1 |

Docker Desktop was not installed and was not part of this test.

For Redis, Postgres, ClickHouse, and DuckDB, both runtimes were capped at:

  • 2 CPUs
  • 4 GiB memory

I used 2 CPUs because my Colima VM was capped at 2 CPUs. The first broad run failed when I tried to use 4 CPUs, so I normalized both runtimes to the lower available cap.

Neo4j was the first benchmark I ran, before the later 2 CPU normalization. I kept it in the article because it is a useful graph-database data point, but I treat it separately from the normalized Redis/Postgres/ClickHouse/DuckDB suite.

Workloads

Workload matrix. Each datastore represented a different local development shape.

| Store | Shape | Image | Workload |
| --- | --- | --- | --- |
| Redis | KV/cache | redis:7.4-alpine | Mixed GET, SET, INCR, MGET, MSET over 50k keys |
| Postgres | OLTP | postgres:16-alpine | 20k accounts, 200k events, mixed reads/writes/updates/aggregates |
| ClickHouse | OLAP server | clickhouse/clickhouse-server:latest | 1M-row MergeTree, mixed count, rollup, filtered aggregate, top-N queries |
| DuckDB | Embedded OLAP | duckdb/duckdb:latest | 1M-row DuckDB table plus Parquet scan in short-lived containers |
| Neo4j | Graph | neo4j:5-community | 9,485 graph nodes, 35,332 relationships, mixed graph read queries |

Redis, Postgres, ClickHouse, and Neo4j ran as services. The host benchmark client connected over localhost TCP.

DuckDB is different. I did not run DuckDB as a service. I ran the DuckDB CLI inside short-lived containers against a mounted workspace. That matters, because DuckDB ended up showing the opposite result from the long-running services.
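The benchmark runner itself is not included in this article. Below is a minimal Python sketch of the service measurement loop described above. It calls one operation repeatedly from N threads for a fixed duration, then reports ops/s, p50/p95 latency, and the error count. The names `run_level` and `percentile`, the nearest-rank percentile method, and the thread-based design are my assumptions, not the original runner. `op` stands in for a real client call such as a Redis GET or a Postgres query.

```python
import math
import threading
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile; NaN when there are no samples."""
    if not samples:
        return float("nan")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def run_level(op: Callable[[], None], concurrency: int, duration_s: float) -> Dict[str, float]:
    """Call `op` from `concurrency` threads until `duration_s` seconds elapse."""
    latencies: List[float] = []
    errors = 0
    lock = threading.Lock()
    deadline = time.perf_counter() + duration_s

    def worker() -> None:
        nonlocal errors
        local: List[float] = []
        local_errors = 0
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            try:
                op()
            except Exception:
                local_errors += 1  # count the failure and keep going
                continue
            local.append(time.perf_counter() - start)
        with lock:  # merge per-thread results once, at the end
            latencies.extend(local)
            errors += local_errors

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    started = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - started

    return {
        "ops_per_s": len(latencies) / elapsed,
        "p50_ms": percentile(latencies, 50) * 1000,
        "p95_ms": percentile(latencies, 95) * 1000,
        "errors": float(errors),
    }


if __name__ == "__main__":
    # Stand-in op: a 1 ms sleep instead of a real Redis/Postgres call.
    for c in (1, 4, 8):
        print(c, run_level(lambda: time.sleep(0.001), concurrency=c, duration_s=0.5))
```

Threads are a reasonable fit for network-bound clients, because the GIL is released while a thread waits on a socket. A CPU-heavy client would need processes instead.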

Headline results

At concurrency 4, Apple container was faster for every long-running service workload I tested. That includes Neo4j, which came from the earlier graph run.

Headline service results at concurrency 4. These are long-running services, and the host client connected over localhost TCP.

| Store | Metric | Colima | Apple container | Faster |
| --- | --- | --- | --- | --- |
| Redis | ops/s | 15,378.75 | 27,434.62 | Apple container |
| Postgres | ops/s | 15,141.07 | 26,371.10 | Apple container |
| ClickHouse | ops/s | 200.90 | 230.34 | Apple container |
| Neo4j | ops/s | 372.09 | 627.69 | Apple container |

DuckDB went the other direction.

DuckDB short-lived command results. DuckDB ran as a short-lived CLI workload against a mounted workspace.

| DuckDB metric | Colima | Apple container | Faster |
| --- | --- | --- | --- |
| Setup | 0.342 s | 0.883 s | Colima |
| Query batch p50 | 0.180 s | 0.806 s | Colima |
| Query batch p95 | 0.188 s | 0.841 s | Colima |
| Full workload command | 1.690 s | 6.571 s | Colima |

That split is the main point of the benchmark. If I had only tested services, Apple would look like the clear answer. If I had only tested DuckDB, Colima would look like the clear answer. Testing both made the result more useful.
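To put the two tables on one scale, the small Python sketch below turns them into ratios. The numbers are copied from the tables, and the helper names are hypothetical. For ops/s, where higher is better, the ratio is Apple divided by Colima. For DuckDB seconds, where lower is better, the same division shows how many times longer Apple took.

```python
# Throughput at concurrency 4, copied from the headline table: (Colima, Apple) ops/s.
SERVICE_OPS_C4 = {
    "Redis": (15_378.75, 27_434.62),
    "Postgres": (15_141.07, 26_371.10),
    "ClickHouse": (200.90, 230.34),
    "Neo4j": (372.09, 627.69),
}

# DuckDB timings copied from the DuckDB table: (Colima, Apple) seconds.
DUCKDB_SECONDS = {
    "Setup": (0.342, 0.883),
    "Query batch p50": (0.180, 0.806),
    "Query batch p95": (0.188, 0.841),
    "Full workload command": (1.690, 6.571),
}


def apple_speedup(colima_ops: float, apple_ops: float) -> float:
    """Higher-is-better metric: how many times more ops/s Apple delivered."""
    return apple_ops / colima_ops


def apple_slowdown(colima_s: float, apple_s: float) -> float:
    """Lower-is-better metric: how many times longer Apple took."""
    return apple_s / colima_s


if __name__ == "__main__":
    for store, (colima, apple) in SERVICE_OPS_C4.items():
        print(f"{store:<12} Apple {apple_speedup(colima, apple):.2f}x the ops/s")
    for metric, (colima, apple) in DUCKDB_SECONDS.items():
        print(f"{metric:<22} Apple took {apple_slowdown(colima, apple):.2f}x as long")
```

Rounded, Redis, Postgres, and Neo4j land at about 1.7 to 1.8x under Apple container, and ClickHouse at about 1.15x. DuckDB's full workload command took about 3.9x as long under Apple container.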

Service concurrency detail

Each service workload ran at concurrency 1, 4, and 8. Each level ran for 10 seconds.

| Store | Runtime | c=1 ops/s | c=4 ops/s | c=8 ops/s | c=4 p50 | c=4 p95 |
| --- | --- | --- | --- | --- | --- | --- |
| Redis | Colima | 5,062.47 | 15,378.75 | 19,687.21 | 0.253 ms | 0.341 ms |
| Redis | Apple container | 10,555.34 | 27,434.62 | 22,310.45 | 0.139 ms | 0.213 ms |
| Postgres | Colima | 5,386.89 | 15,141.07 | 19,190.96 | 0.243 ms | 0.409 ms |
| Postgres | Apple container | 11,370.94 | 26,371.10 | 27,644.91 | 0.120 ms | 0.287 ms |
| ClickHouse | Colima | 129.92 | 200.90 | 221.68 | 17.942 ms | 42.390 ms |
| ClickHouse | Apple container | 135.80 | 230.34 | 238.54 | 14.030 ms | 42.035 ms |
| Neo4j | Colima | 194.62 | 372.09 | 401.92 | 7.787 ms | 26.957 ms |
| Neo4j | Apple container | 251.75 | 627.69 | 499.03 | 4.623 ms | 13.996 ms |

All final workload summaries reported zero operation errors.

Redis and Postgres showed the strongest Apple wins in the normalized suite. ClickHouse also favored Apple, but by a smaller margin. Neo4j favored Apple as well, though again, it came from the earlier graph run with a different resource cap.
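The detail table also shows how each runtime scaled between concurrency levels. The Python sketch below computes the c=1 to c=4 and c=4 to c=8 ratios from the copied numbers and flags any case where throughput fell at c=8. It only does arithmetic on the table and says nothing about the cause. The helper name `scaling` is hypothetical.

```python
from typing import Dict, List, Tuple

# ops/s at c=1, c=4, c=8 per (store, runtime), copied from the detail table.
OPS: Dict[Tuple[str, str], Tuple[float, float, float]] = {
    ("Redis", "Colima"): (5_062.47, 15_378.75, 19_687.21),
    ("Redis", "Apple"): (10_555.34, 27_434.62, 22_310.45),
    ("Postgres", "Colima"): (5_386.89, 15_141.07, 19_190.96),
    ("Postgres", "Apple"): (11_370.94, 26_371.10, 27_644.91),
    ("ClickHouse", "Colima"): (129.92, 200.90, 221.68),
    ("ClickHouse", "Apple"): (135.80, 230.34, 238.54),
    ("Neo4j", "Colima"): (194.62, 372.09, 401.92),
    ("Neo4j", "Apple"): (251.75, 627.69, 499.03),
}


def scaling(c1: float, c4: float, c8: float) -> Tuple[float, float]:
    """Return (c=4 gain over c=1, c=8 gain over c=4) as ratios."""
    return c4 / c1, c8 / c4


regressions: List[Tuple[str, str]] = []
for (store, runtime), (c1, c4, c8) in OPS.items():
    gain_4, gain_8 = scaling(c1, c4, c8)
    print(f"{store:<11} {runtime:<7} c1->c4 {gain_4:.2f}x  c4->c8 {gain_8:.2f}x")
    if gain_8 < 1.0:  # throughput went down when concurrency doubled
        regressions.append((store, runtime))

print("throughput dropped from c=4 to c=8:", regressions)
```

By this arithmetic, Apple container's Redis and Neo4j throughput dropped from c=4 to c=8, while every Colima workload kept rising. The Apple container result at c=4 is therefore its best point in this data, not a ceiling that holds at higher concurrency.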

Startup and readiness

Throughput was not the only thing I measured. I also tracked image pull time, detached start-command time, service readiness, and full workload command duration.

Startup and readiness timings, collected alongside the workload runs.

| Store | Runtime | Pull | Start command | Ready | Workload command |
| --- | --- | --- | --- | --- | --- |
| Redis | Colima | 2.601 s | 0.157 s | 0.007 s | 30.948 s |
| Redis | Apple container | 1.068 s | 0.795 s | 0.007 s | 31.084 s |
| Postgres | Colima | 1.945 s | 0.145 s | 1.056 s | 31.610 s |
| Postgres | Apple container | 1.130 s | 0.688 s | 1.031 s | 31.713 s |
| ClickHouse | Colima | 1.937 s | 0.125 s | 4.096 s | 30.346 s |
| ClickHouse | Apple container | 87.784 s (slow pull) | 0.761 s | 4.554 s | 30.398 s |
| DuckDB | Colima | 1.240 s | n/a | n/a | 1.690 s |
| DuckDB | Apple container | 11.199 s | n/a | n/a | 6.571 s |

I do not treat pull time as the headline result. Some images were already warm, or partially warm, because I reran parts of the suite while fixing benchmark issues. The clearest cold-ish Apple pull/unpack measurement in this session was ClickHouse, which took 87.784 seconds.

The more consistent lifecycle observation was this:

  • Colima's detached docker run -d returned faster, in roughly 0.12 to 0.16 seconds.
  • Apple's container run -d took roughly 0.7 to 0.8 seconds for these service containers.
  • Once the service process had started, readiness was similar for Redis, Postgres, and ClickHouse.
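The article does not say exactly how readiness was detected. A minimal Python sketch of one plausible approach polls until the published port accepts a TCP connection and reports how long that took. "Ready means the port accepts a connection" is an assumption. A forwarded port can sometimes accept connections before the service behind it is listening, so a stricter runner might send a protocol-level request, such as a Redis PING, instead. The function name `wait_for_tcp` is hypothetical.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.05) -> float:
    """Poll until a TCP connection to host:port succeeds; return seconds waited.

    Raises TimeoutError if the port does not accept a connection in time.
    """
    start = time.perf_counter()
    deadline = start + timeout_s
    while True:
        try:
            # Connection refused and connect timeouts are both OSError subclasses.
            with socket.create_connection((host, port), timeout=interval_s):
                return time.perf_counter() - start
        except OSError:
            if time.perf_counter() >= deadline:
                raise TimeoutError(f"{host}:{port} not ready after {timeout_s} s")
            time.sleep(interval_s)
```

In a runner, you would wrap the detached start command (for example subprocess.run on the docker or container argv) in time.perf_counter() calls to get the start-command column. Then call something like wait_for_tcp("127.0.0.1", 6379) for a Redis container published on its default port.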

What broke

The failures were useful. They showed the operational differences more clearly than the happy path.

CPU caps

The first broad run requested 4 CPUs and failed under Colima:

range of CPUs is from 0.01 to 2.00, as there are only 2 CPUs available

I changed the suite to use 2 CPUs for both runtimes. That made the comparison fairer.

Apple container CPU argument parsing

Apple container rejected --cpus 2.0:

The value '2.0' is invalid for '--cpus <cpus>'

Passing the integer form, --cpus 2, fixed it.
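Both CPU failures come down to one normalized --cpus value that each CLI accepts. A minimal Python sketch: clamp the request to the Colima VM's CPU count, then render it as a whole number. I only saw "2.0" rejected, so limiting the value to whole CPUs is a conservative assumption, not a documented rule of Apple's CLI. The function name `cpu_flag_value` is hypothetical.

```python
import math


def cpu_flag_value(requested: float, available: int) -> str:
    """Clamp a CPU request to the Colima VM's CPU count and render it as an integer string.

    Docker in a 2-CPU Colima VM rejected 4 CPUs, and Apple container rejected
    "2.0" but accepted "2", so this caps the value and drops the decimal.
    """
    capped = min(requested, available)
    if capped < 1 or capped != math.floor(capped):
        raise ValueError(f"expected a whole CPU count >= 1, got {capped}")
    return str(int(capped))


if __name__ == "__main__":
    cpus = cpu_flag_value(requested=4, available=2)  # the first broad run asked for 4
    for cli in ("docker", "container"):  # Colima's Docker CLI, then Apple's CLI
        print([cli, "run", "-d", "--cpus", cpus])
```

Computing the value once and passing it to both runtimes also keeps Apple container from quietly getting a higher cap than Colima.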

Neo4j bind mounts

Colima worked with direct host bind mounts for Neo4j data, logs, import, and plugins.

Apple container did not work with the same Neo4j bind-mounted setup. The official Neo4j image tried to change ownership of /logs and got:

Operation not permitted

The working Apple path used named volumes instead.
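A minimal Python sketch of choosing the mount style per runtime: host bind mounts under Colima and named volumes under Apple container. The container paths /data, /logs, /import, and /plugins are the Neo4j image's usual mount points and are an assumption here. The volume names such as "neo4j-logs" and the function name are hypothetical. Depending on the Apple container version, named volumes may need to be created first with container volume create.

```python
import os
from typing import List

# Neo4j data, logs, import, and plugins directories inside the container (assumed paths).
NEO4J_PATHS = ["/data", "/logs", "/import", "/plugins"]


def neo4j_volume_flags(runtime: str, host_root: str = "neo4j") -> List[str]:
    """Build -v flags for Neo4j: bind mounts on Colima, named volumes on Apple container."""
    flags: List[str] = []
    for path in NEO4J_PATHS:
        name = path.strip("/")
        if runtime == "colima":
            # Bind mount from the host; use an absolute path for docker -v.
            source = os.path.join(os.path.abspath(host_root), name)
        elif runtime == "apple":
            # Named volume (hypothetical name). Bind mounts failed when the
            # image tried to chown /logs ("Operation not permitted").
            source = f"neo4j-{name}"
        else:
            raise ValueError(f"unknown runtime: {runtime!r}")
        flags += ["-v", f"{source}:{path}"]
    return flags


if __name__ == "__main__":
    print(neo4j_volume_flags("colima"))
    print(neo4j_volume_flags("apple"))
```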

Postgres named volumes

Postgres initially failed under Apple container:

initdb: error: directory "/var/lib/postgresql/data" exists but is not empty
initdb: detail: It contains a lost+found directory, perhaps due to it being a mount point.

Apple named volumes are ext4 filesystem images, so the root of each volume contains a lost+found directory. Postgres refuses to initialize directly into a non-empty data directory.

The fix was to point PGDATA at a subdirectory of the volume:

PGDATA=/var/lib/postgresql/data/pgdata
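The failure and the fix can be modeled in a few lines of Python. This is a simplified model of initdb's rule: the target directory must be missing or empty. It shows why the volume root fails and the subdirectory works. The helper names and the "bench" password are hypothetical. POSTGRES_PASSWORD and PGDATA are the official Postgres image's environment variables.

```python
from pathlib import PurePosixPath
from typing import Dict, Set

VOLUME_MOUNT = "/var/lib/postgresql/data"


def initdb_accepts(data_dir: str, existing: Dict[str, Set[str]]) -> bool:
    """Simplified initdb check: the data directory must be missing or empty.

    `existing` maps directory paths to their entries; a missing key means the
    directory does not exist yet, so initdb would create it.
    """
    return not existing.get(data_dir)


def postgres_env(password: str) -> Dict[str, str]:
    """Env for the Postgres container with PGDATA one level below the mount point."""
    return {
        "POSTGRES_PASSWORD": password,
        "PGDATA": str(PurePosixPath(VOLUME_MOUNT) / "pgdata"),
    }


if __name__ == "__main__":
    # A fresh ext4-backed named volume has lost+found at its root.
    fresh_volume = {VOLUME_MOUNT: {"lost+found"}}
    print(initdb_accepts(VOLUME_MOUNT, fresh_volume))                     # False
    print(initdb_accepts(postgres_env("bench")["PGDATA"], fresh_volume))  # True
```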

ClickHouse auth

ClickHouse required explicit credentials in this image. I set:

CLICKHOUSE_DB=bench
CLICKHOUSE_USER=bench
CLICKHOUSE_PASSWORD=bench
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
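With those variables set, the client has to send the same credentials. The article does not say which ClickHouse protocol the runner used. This minimal Python sketch assumes ClickHouse's HTTP interface published on its default port 8123, using the X-ClickHouse-User and X-ClickHouse-Key headers and a database URL parameter. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the container publishes ClickHouse's default HTTP port to localhost.
CLICKHOUSE_URL = "http://127.0.0.1:8123/"


def clickhouse_request(sql: str, user: str = "bench", password: str = "bench",
                       database: str = "bench") -> urllib.request.Request:
    """Build an authenticated POST for ClickHouse's HTTP interface (query in the body)."""
    params = urllib.parse.urlencode({"database": database})
    return urllib.request.Request(
        f"{CLICKHOUSE_URL}?{params}",
        data=sql.encode("utf-8"),
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        method="POST",
    )


def clickhouse_query(sql: str) -> str:
    """Send the query and return the raw response body; needs a running server."""
    with urllib.request.urlopen(clickhouse_request(sql), timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(clickhouse_query("SELECT 1"))
```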

DuckDB invocation

The DuckDB image needed the binary invoked explicitly:

duckdb /workspace/bench.duckdb

That was a benchmark-runner fix, not really a runtime finding.
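A minimal Python sketch of the short-lived DuckDB pattern: build one run --rm command per query batch, mount the workspace, name the duckdb binary explicitly, and time the whole command. It assumes docker (Colima) and container (Apple) both accept run --rm and -v host:container as used here, and it uses the DuckDB CLI's -c option to run one command and exit. The helper names are hypothetical. The timed span covers container startup, which is part of what the DuckDB table measures.

```python
import os
import subprocess
import time
from typing import List

IMAGE = "duckdb/duckdb:latest"


def duckdb_argv(runtime_cli: str, workspace: str, sql: str) -> List[str]:
    """argv for one short-lived DuckDB CLI run against a mounted workspace.

    runtime_cli is "docker" (Colima) or "container" (Apple container).
    """
    return [
        runtime_cli, "run", "--rm",
        "-v", f"{os.path.abspath(workspace)}:/workspace",
        IMAGE,
        "duckdb", "/workspace/bench.duckdb",  # invoke the binary explicitly
        "-c", sql,
    ]


def timed_run(argv: List[str]) -> float:
    """Run argv to completion and return wall-clock seconds, startup included."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, capture_output=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(duckdb_argv("container", "./workspace", "SELECT 42"))
```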

What Apple container did well

Apple’s strongest result was service throughput.

Redis, Postgres, ClickHouse, and Neo4j all had higher throughput under Apple container in this run. The advantage was largest for Redis, Postgres, and Neo4j. ClickHouse was closer but still favored Apple.

That makes Apple container interesting for local development workflows where:

  • the service runs for a while
  • the client connects over localhost TCP
  • the image works cleanly with Apple’s volume model
  • Docker CLI compatibility is not the main requirement

What Colima did well

Colima was smoother.

It used the normal Docker CLI. It was easier to script. Detached service starts returned faster. Bind mounts behaved more like I expected from Docker-shaped workflows.

And DuckDB was not close: Colima was much faster for the short-lived embedded analytics job.

That makes Colima still attractive for:

  • Docker-compatible local workflows
  • tools that expect Docker behavior
  • scripts built around docker run
  • mounted-file workloads
  • short-lived containerized commands

Colima being boring is a feature.

What surprised me

The interesting result is not that one runtime is faster than the other. The interesting result is that workload shape changed the answer.

If I had only tested Redis and Postgres, Apple container would look like the obvious choice.

If I had only tested DuckDB, Colima would look like the obvious choice.

If I had only tested Neo4j, I would have seen Apple’s graph-query throughput and missed the volume-model friction.

Testing several datastore shapes made the split clearer:

  • long-running services favored Apple
  • short-lived file-backed analytics favored Colima
  • operational simplicity favored Colima
  • service throughput favored Apple

That is the result I trust most from this session.

How I would use this today

For my own local development, this does not replace Colima outright.

Colima remains my default compatibility runtime because it maps cleanly to Docker workflows and existing tooling.

Apple container is the runtime I would test selectively for long-running local services where:

  • the image is known to work
  • the volume setup is understood
  • the service benefits from the throughput profile
  • Docker CLI compatibility is not required

For the graph-database case specifically, both paths make sense:

  • Colima for the smoother Docker-compatible path
  • Apple container for faster query throughput, using named volumes by default

Caveats

These are personal local tests on one Apple M4 machine with 16 GiB RAM.

This is not a production benchmark.

This is not a Docker Desktop benchmark.

This does not measure multi-day reliability, Compose workflows, Kubernetes behavior, backup/restore, memory pressure under larger datasets, or production durability.

Some pull timings were warm or partially warmed by reruns, so pull time is not the main result.

ClickHouse and DuckDB used :latest image tags in this pass. That is fine for this personal test, but not ideal for a fully reproducible benchmark suite.

The Neo4j result was carried forward from the earlier graph-database run and used a different resource cap than the later Redis/Postgres/ClickHouse/DuckDB suite.

FAQ

Is this Docker Desktop vs Apple container?

No. Docker Desktop was not installed on this machine and was not part of the benchmark.

Did Apple container win?

For long-running service workloads in this run, yes, Apple container had higher throughput.

For the short-lived DuckDB embedded workload, no. Colima was much faster.

For operational simplicity, Colima was smoother.

That is why I do not reduce the result to a single winner.

Why use 2 CPUs?

Because my Colima VM was capped at 2 CPUs. The first broad-suite run failed when the runner requested 4 CPUs. I changed the suite to use 2 CPUs for both runtimes so Apple would not get a higher CPU cap than Colima.

Why is Neo4j treated differently?

Neo4j was the original benchmark that started the session. It was run before the later 2 CPU normalization. I kept it in the article because it is a useful graph-database data point, but I keep it separate from the normalized datastore suite.

Why was DuckDB so different?

DuckDB was measured as an embedded CLI workload inside short-lived containers using a mounted workspace. Redis, Postgres, ClickHouse, and Neo4j were long-running services over localhost TCP. Different shape, different result.

Would larger datasets change the result?

Possibly. Larger datasets, longer runs, heavier write pressure, different volume modes, and memory pressure could all change the shape.

Final read

Apple container looks genuinely strong for long-running local datastore services on Apple Silicon. In my tests, Redis, Postgres, ClickHouse, and Neo4j all had better throughput under Apple container.

Colima remains the easier operational baseline. It is Docker-compatible, predictable, and better for the short-lived DuckDB embedded workload I tested.

So I do not frame this as a replacement story.

I frame it this way:

Apple container is now worth testing seriously for local services. Colima is still the compatibility baseline. The right answer depends on the workload.

That is a more useful result than a single winner.


Disclosure

This article was written with assistance from ChatGPT/Codex based on my local benchmark session, commands, outputs, and review direction.