Database engineering
The situation
The client's production database server was a FreeBSD system running hardware RAID 10 across two arrays — 4TB of total usable space. It had served the application well, but the combination of aging hardware and a dataset that had grown steadily over the years had forced a decision: new hardware was needed, and it needed more disk. The question was how to get there without taking the application offline.
This was a revenue-critical production database. The application ran continuously, write operations happened around the clock, and the client had no maintenance window they were willing to accept. "We'll take it down for a few hours" wasn't on the table. The migration had to be invisible to the application and to the users depending on it.
Before any hardware was ordered, the first step was understanding the database in detail — not just its size, but its behavior under load.
Phase 1 — Analysis before action
mytop — a real-time MySQL/MariaDB monitor in the tradition of Unix top — was used to observe the database under normal production load: which tables were being written most frequently, which queries were running longest, what the thread state distribution looked like at different times of day. This wasn't just due diligence — it directly informed the migration plan. The most-written tables were the ones where replication lag would matter most and where the verification step after seeding would need the most attention.
The analysis also confirmed that the lowest-traffic window was consistently around 2am. That became the target cutover time — not arbitrary, but derived from the actual behavior of the system.
The replacement server was specced with headroom for several years of dataset growth — more spindles, larger arrays, hardware RAID controller with battery-backed write cache. Once the hardware was built, it was provisioned via Puppet: MariaDB installed, configured, and brought to a known-good state without manual intervention. The configuration was derived from the existing server — same buffer pool sizes, same character sets, same SQL mode settings — so the new server's behavior would be identical to the old one. Starting from Puppet meant the configuration was documented, version-controlled, and reproducible from day one.
Phase 2 — Seeding the replica without mysqldump
The conventional approach to seeding a new MySQL replica — mysqldump followed by import — does not scale to multi-terabyte datasets. A full logical dump of several terabytes takes many hours to export and many more to import, during which the new server is building indexes from scratch on every table. For a dataset of this size, that approach would have introduced unacceptable delay and risk.
Instead, the migration used a filesystem-level copy of the live data files — a technique that requires precision but is orders of magnitude faster for large datasets.
On the primary, FLUSH TABLES WITH READ LOCK was issued. This global read lock blocks new writes and flushes table state to disk; the data files on disk are then crash-consistent, and InnoDB's redo log brings them fully consistent the first time the copy is started. With the lock held, a UFS snapshot was taken of the data filesystem — a point-in-time consistent image of the database files at the filesystem level. The binary log filename and position were recorded from SHOW MASTER STATUS. This position is the anchor: it tells the replica exactly where in the binary log stream to begin replicating once the data files are in place. The lock was then released. Total lock duration: seconds, not minutes.
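Sketched as a runbook, with all paths illustrative. The SQL runs in a client session that must stay open for the duration, because closing the session releases the lock:

```shell
# In a mysql client session (keep it open until UNLOCK TABLES;
# closing the session silently drops the global read lock):
#
#   FLUSH TABLES WITH READ LOCK;
#   SHOW MASTER STATUS;    -- record File and Position for CHANGE MASTER TO later
#
# In another shell, while the lock is held, snapshot the data filesystem.
# On FreeBSD UFS this is mksnap_ffs (or mount -u -o snapshot):
mksnap_ffs /data /data/.snap/mysql-mig

# Back in the mysql session:
#
#   UNLOCK TABLES;
```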
The UFS snapshot was mounted as a memory disk using mdconfig, giving a consistent, read-only view of the data directory at the exact moment the lock was taken (mounted, for illustration, at /mnt/snap). The data was then transferred from the snapshot mount to the new server using a tarpipe:
tar -cf - -C /mnt/snap . | ssh newserver "tar -xf - -C /data/mysql"
A tarpipe streams a tar archive directly over SSH without writing an intermediate file on either end. For multi-terabyte datasets this matters significantly: there is no staging file consuming disk on either server, no second pass to transfer a dump file, and tar's sequential read pattern on large files is highly efficient on spinning disk. The transfer rate is bounded by the network link and the disk read speed — not by the overhead of mysqldump's row-by-row logical serialization or the subsequent index rebuild on import. Files that are tens or hundreds of gigabytes transfer at near-wire speed.
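The mechanics can be exercised locally. A minimal stand-in with the ssh hop removed and throwaway paths (on the real systems, the second tar runs on the destination host via ssh):

```shell
# Minimal local stand-in for the tarpipe.
mkdir -p /tmp/tarpipe-demo/src /tmp/tarpipe-demo/dst
echo "table data" > /tmp/tarpipe-demo/src/ibdata1

# -cf - writes the archive to stdout; -xf - reads it from stdin.
# -C with a relative path keeps absolute paths out of the archive, so the
# tree lands directly under the destination instead of nested inside it.
tar -cf - -C /tmp/tarpipe-demo/src . | tar -xf - -C /tmp/tarpipe-demo/dst

cat /tmp/tarpipe-demo/dst/ibdata1   # -> table data
```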
File ownership and permissions were corrected on the destination after transfer — MySQL's data files need to be owned by the mysql user on the receiving server, and a filesystem-level copy preserves the source ownership which may not match.
With the data files in place, MariaDB was started on the new server. CHANGE MASTER TO was issued with the binary log filename and position recorded at the time of the snapshot. Replication started and the new server began consuming the binary log from that exact point — replaying every write that had occurred on the primary since the snapshot was taken, catching up to real time. For a busy database this can take minutes or hours depending on how much write traffic accumulated during the transfer, but it requires no lock on the primary and is entirely transparent to the application.
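As a sketch, with host, user, and binary log coordinates all illustrative — the real values are whatever SHOW MASTER STATUS reported under the lock:

```shell
# On the new server: point replication at the coordinates recorded while
# the read lock was held, then start the replication threads.
mysql <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='old-primary.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='...',
  MASTER_LOG_FILE='mysql-bin.004217',
  MASTER_LOG_POS=73415002;
START SLAVE;
SQL
```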
Replication lag was monitored via SHOW SLAVE STATUS until Seconds_Behind_Master reached zero and held there. At that point the new server was fully current.
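The gate is worth being strict about: a single zero reading can be a transient, so the condition is lag holding at zero across consecutive polls. A sketch of that logic against simulated readings (in production each value would come from polling SHOW SLAVE STATUS on the replica):

```shell
# Simulated Seconds_Behind_Master readings; in production each would come
# from: mysql -e "SHOW SLAVE STATUS\G" | awk '/Seconds_Behind_Master/ {print $2}'
set -- 120 43 7 0 0 0

consecutive_zero=0
for lag in "$@"; do
  if [ "$lag" -eq 0 ]; then
    consecutive_zero=$((consecutive_zero + 1))
  else
    consecutive_zero=0          # any nonzero reading resets the streak
  fi
  if [ "$consecutive_zero" -ge 3 ]; then
    echo "replica current"      # lag has held at zero: safe to proceed
    break
  fi
done
```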
Phase 3 — Verification before trust
Replication being current is a necessary condition for cutover, but not a sufficient one. Binary log replication faithfully reproduces writes — but it also faithfully reproduces any data anomalies, corruption, or gaps that existed before replication started. Before promoting the new server, the data had to be verified against the source at the row level.
pt-table-checksum from the Percona Toolkit was run against the primary. It checksums every table in chunks, and the checksum statements themselves replicate: the replica recomputes each chunk and stores its result alongside the primary's, so the two can be compared row-range by row-range. In a clean replication setup, there should be no differences; any that do appear can be repaired with pt-table-sync, which generates the statements to bring differing rows back into agreement. Any discrepancies found before cutover could be resolved; any found after would be a production data integrity problem.
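As a command sketch, with host and credentials illustrative:

```shell
# Checksum every table in chunks on the primary; the checksum statements
# replicate, so the replica recomputes them and stores its own results
# next to the primary's in the percona.checksums table.
pt-table-checksum --replicate=percona.checksums h=old-primary,u=checksum,p=...

# Re-check the stored results and report only the chunks that differ:
pt-table-checksum --replicate=percona.checksums --replicate-check-only \
    h=old-primary,u=checksum,p=...

# If anything differs, print (not execute) the statements that would fix it:
pt-table-sync --replicate=percona.checksums --print h=old-primary,u=checksum,p=...
```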
The verification run confirmed the replica was consistent with the primary across the full dataset — every table, every row. No gaps, no divergence.
Phase 4 — Building the forward-looking topology
With a verified, current replica in place, the topology was extended before the cutover. A second new server — specced identically to the first — was provisioned via Puppet and seeded using the same snapshot/tarpipe procedure, this time from the new replica rather than the aging primary. This server would become the replica of the new primary after cutover, providing read distribution and a hot standby from day one on the new hardware.
Bringing the second server up before the cutover meant that when the cutover happened, the production topology was immediately restored: new primary with a replica already attached and current, rather than running exposed on a single server while the second one caught up.
Phase 5 — The 2am cutover
The cutover was scheduled for 2am — the low-traffic window identified during the analysis phase. It was executed as a code push: the application's database configuration was updated to point at the new primary's address, deployed via the existing deployment pipeline. There was no manual intervention during the cutover itself, no ad-hoc commands run under pressure at 2am.
The sequence was:
1. The new server's replication link to the old primary was cleanly decoupled: STOP SLAVE on the new server, then RESET SLAVE ALL to remove the master configuration.
2. The new server was confirmed to be in read-write mode.
3. The application configuration was deployed pointing at the new server.
4. The old primary was left running in read-only mode as a fallback for a brief observation window, then decommissioned once the application had been confirmed stable on the new hardware.
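The database-side steps, as a sketch. Statements are illustrative; in the actual migration the application repoint was a deployment, not a manual edit:

```shell
# On the new server: detach from the old primary and accept writes.
mysql <<'SQL'
STOP SLAVE;
RESET SLAVE ALL;               -- forget the old primary's coordinates entirely
SET GLOBAL read_only = OFF;    -- confirm/ensure read-write mode
SQL

# On the old primary: fence it so the fallback window cannot split writes.
# (Note: read_only does not restrict users with the SUPER privilege.)
mysql -h old-primary <<'SQL'
SET GLOBAL read_only = ON;
SQL
```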
The application experienced no interruption. No queries were lost. The 2am traffic — lower than daytime but not zero — flowed without incident onto the new server. By the time the client's business day started, the migration was complete and had been running on new hardware for several hours.
The entire procedure — from the FLUSH TABLES lock to the application pointing at new hardware — was designed so that every step was either reversible or already verified before it was taken. There was no moment where the only path was forward.
What changed — and what the client got
The client gained two-plus years of additional operational life on infrastructure that handled their continued growth without further emergency intervention. The new hardware had headroom. The new topology — primary plus replica, both on current hardware, both provisioned via Puppet — was documented, reproducible, and monitored.
The same procedure has been applied in multiple engagements. The specific tools change slightly depending on the filesystem and environment — UFS snapshot on FreeBSD, ZFS snapshot on newer systems, mdconfig or the ZFS equivalent for the memory disk mount — but the structure is the same: analyze first, seed from a filesystem snapshot rather than a logical dump, verify with Percona tools before trusting the replica, build the full topology before cutover, execute the switch as a code deployment rather than a manual operation.
Application downtime
Zero. Cutover executed as a code push at 2am. No queries lost, no maintenance window.
Transfer method
tarpipe from UFS snapshot — near-wire-speed transfer of multi-TB dataset with no intermediate files and no mysqldump overhead.
Data integrity
pt-table-checksum + pt-table-sync confirmed row-level consistency across the full dataset before cutover.
Topology after cutover
New primary + replica already attached and current — full HA topology restored immediately, not retroactively.
Outcome
2+ years additional operational life. New hardware specced with room for continued dataset growth.
The lesson
The difference between a zero-downtime database migration and a stressful maintenance window is almost entirely in the preparation. By the time the 2am cutover happened, every unknown had been resolved: the data was verified identical, the new topology was already running, the cutover was a code deployment rather than a sequence of manual commands, and the old server was standing by as a fallback. There was nothing left to improvise. The procedure works because it separates the risky work — data transfer, verification, topology setup — from the cutover itself, and because it treats the cutover as the final step of a process that's already complete, not the beginning of one.
For databases where mysqldump is impractical at scale, the filesystem snapshot + tarpipe approach closes the gap between "we need to move this" and "we can move this without downtime." The lock duration on the primary is measured in seconds. The transfer runs at wire speed. The binary log replication catches up the delta. The Percona tools verify the result. Each step is independently verifiable before the next begins.
Remote-first. Dallas-based. Available until 2am CT.