Online Data Migration from HBase to TiDB with Zero Downtime | by Pinterest Engineering | Pinterest Engineering Blog

Ankita Girish Wagh | Senior Software Engineer, Storage and Caching


At Pinterest, HBase is one of the most critical storage backends, powering many online storage services like Zen (graph database), UMS (wide column datastore), and Ixia (near real-time secondary indexing service). Although the HBase ecosystem has many advantages, such as strong row-level consistency under high request volumes, flexible schema, low-latency access to data, and Hadoop integration, it cannot serve the needs of our clients for the next 3–5 years. This is due to high operational cost, excessive complexity, and missing functionality such as secondary indexes and support for transactions.

After evaluating 10+ different storage backends and benchmarking three shortlisted backends with shadow traffic (asynchronously copying production traffic to a non-production environment) and in-depth performance evaluation, we decided on TiDB as the final candidate for the Unified Storage Service.

The adoption of the Unified Storage Service powered by TiDB is a major, challenging project spanning multiple quarters. It involves data migration from HBase to TiDB, design and implementation of the Unified Storage Service, API migration from Ixia/Zen/UMS to the Unified Storage Service, and migration of offline jobs from the HBase/Hadoop ecosystem to the TiSpark ecosystem, all while maintaining our availability and latency SLAs.

In this blog post, we will first walk through the various approaches considered for data migration along with their trade-offs. We will then do a deep dive into how the data migration was carried out from HBase to TiDB for one of the first use cases, a 4 TB table serving 14k read qps and 400 write qps, with zero downtime. Finally, we will describe how verification was performed to achieve 99.999% data consistency and how data consistency was measured between the two tables.

In general, a simplified strategy for data migration with zero downtime consists of the following steps (a minimal sketch of the flow appears after the list):

  1. Assuming you have database A and you want to migrate the data to database B, you first start double writing to database A and database B.
  2. Import the dump of database A into database B while resolving conflicts with live writes.
  3. Validate both data sets.
  4. Stop writing to database A.
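
To make the flow concrete, here is a minimal, runnable sketch of these four steps using in-memory dicts as stand-ins for database A and database B. All names here are illustrative; this is not the actual migration tooling.

```python
# Minimal, runnable sketch of the zero-downtime flow using in-memory dicts
# as stand-ins for database A and database B. A real migration would use the
# actual client libraries; every name here is illustrative only.

db_a = {"k1": "v1", "k2": "v2"}   # existing source database
db_b = {}                          # empty target database
double_writes_enabled = False

def write(key, value):
    """Step 1: once double writes are on, every write goes to both stores."""
    db_a[key] = value
    if double_writes_enabled:
        db_b[key] = value

double_writes_enabled = True
write("k3", "v3")                  # a live write landing in both databases

# Step 2: import the snapshot of A, never overwriting newer live writes in B.
snapshot = dict(db_a)
for key, value in snapshot.items():
    db_b.setdefault(key, value)

# Step 3: validate both data sets.
mismatches = {k for k in db_a if db_a[k] != db_b.get(k)}
assert not mismatches, f"rows differ: {mismatches}"

# Step 4: stop writing to database A (reads and writes move to B).
```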

Every use case is different and can present its own set of unique challenges.

We considered various approaches for the data migration and finalized the methodology based on several trade-offs:

  1. Do double writes (writing to two sources of truth in sync/async fashion) from the service to both tables (HBase and TiDB), and use the TiDB backend mode in Lightning for data ingestion.
    This strategy is the simplest and easiest to implement. However, the speed offered by TiDB backend mode is 50 GB/hour, so it is only useful for data migration of smaller tables.
  2. Take a snapshot dump of the HBase table and stream live writes from HBase CDC (change data capture) to a Kafka topic, then ingest that dump using local mode in the Lightning tool. Later, start double writes from the service layer and apply all updates from the Kafka topic.
    This strategy was difficult to implement because of the complicated conflict resolution required when applying the CDC updates. Additionally, our home-grown tools for capturing HBase CDC only store the key, so some development effort was also needed.
  3. An alternative to the above strategy is to read the keys from CDC and store them in another data store. Later, after starting the double writes to both tables, we read their latest value from the source of truth (HBase) and write it to TiDB. This strategy was implemented but carried the risk of losing updates if the async path of storing keys via CDC had availability issues. (A minimal sketch of this catch-up step follows the list.)
After evaluating the trade-offs of all strategies, we decided to take the approach described in the rest of this blog post.

Term Definitions:

Client: A downstream service/library which talks to the thrift service

Service: The thrift service which serves online traffic; for the purposes of this migration, it is Ixia

MR Job: An application which runs on the MapReduce framework

Async write: The service returns an OK response to the client without waiting for a response from the database

Sync write: The service returns a response to the client only after receiving a response from the database

Double write: The service writes to both underlying tables in either sync or async fashion
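
To make these terms concrete, here is a minimal sketch of how a service might perform sync and async double writes. The thread pool and the two write callables are hypothetical; the real write path lives inside the thrift service (Ixia), not in a snippet like this.

```python
# Minimal sketch of sync vs. async double writes from the service layer.
# hbase_put and tidb_upsert are hypothetical stand-ins for the real clients.

from concurrent.futures import ThreadPoolExecutor

_background = ThreadPoolExecutor(max_workers=4)

def double_write(key, value, hbase_put, tidb_upsert, tidb_is_primary: bool):
    """Write to both tables: sync to the primary, async (best effort) to the other."""
    if tidb_is_primary:
        tidb_upsert(key, value)                      # sync write: the client waits for this
        _background.submit(hbase_put, key, value)    # async write: fire and forget
    else:
        hbase_put(key, value)                        # sync write to HBase (source of truth)
        _background.submit(tidb_upsert, key, value)  # async shadow write to TiDB
    return "OK"                                      # returned without waiting for the async write
```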

[Figure: Data migration flow — briefly describes the steps explained in the implementation section: initial state, migration phase, validation and reconciliation phase.]
[Figure: Data migration flow, steps 4–7 — reconcile the delta; enable double sync writes to TiDB and HBase; enable sync write to TiDB and async write to HBase and enable reads from TiDB; final state.]

Implementation Details

Since HBase is schemaless and TiDB uses a strict schema, a schema with the correct data types and indexes had to be designed before this migration could begin. For this 4 TB table, there is a 1:1 mapping between the HBase and TiDB schemas. The TiDB schema was designed by using a MapReduce job to analyze all columns in an HBase row and their maximum size; the queries were then analyzed to create the correct indexes. A simplified sketch of this column-size analysis follows.
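
The analysis itself ran as a MapReduce job over the table; the snippet below is only a local, single-process approximation of the same aggregation (maximum observed value size per column qualifier), with an illustrative in-memory sample standing in for the HBase scan.

```python
# Local approximation of the column-analysis job: for every column qualifier
# seen across HBase rows, track the maximum value size, which then informs the
# TiDB column types. The sample data below is purely illustrative; the real
# analysis ran as a MapReduce job over a table snapshot.

from collections import defaultdict

def max_column_sizes(rows):
    """rows: iterable of dicts mapping column qualifier -> raw value bytes."""
    max_sizes = defaultdict(int)
    for row in rows:
        for qualifier, value in row.items():
            max_sizes[qualifier] = max(max_sizes[qualifier], len(value))
    return dict(max_sizes)

# Example: two rows with differing columns and value sizes.
sample = [
    {b"meta:title": b"pin title", b"meta:url": b"https://example.com/a"},
    {b"meta:title": b"a much longer pin title than before"},
]
print(max_column_sizes(sample))  # prints the largest observed size per qualifier
```

With the schema designed, the migration proceeded through the following steps: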

  1. We used hbasesnapshotmanager to take an HBase snapshot and store it as a CSV dump in S3. We stored the CSV rows Base64 encoded to work around special character limitations. Then we used TiDB Lightning in local mode to ingest this CSV dump, doing Base64 decoding before storing each row in TiDB. Once ingestion is done and the TiDB table is online, we start async dual writes to TiDB. The async dual writes ensure that the TiDB SLA does not impact the service SLA. Although we had a monitoring/paging setup for TiDB, we kept TiDB in shadow mode at this time.
  2. Snapshot the HBase and TiDB tables using a MapReduce job. The rows were first converted into a common object and stored as SequenceFiles in S3. We developed a custom TiDB Snapshot Manager using the MR Connector and used hbasesnapshotmanager for HBase.
  3. Read these sequence files using a MapReduce job that writes the unmatched rows back to S3 (a sketch of this comparison appears after this list).
  4. Read these unmatched rows from S3, read their latest value from the service (backed by HBase), and write the value to the secondary database (TiDB).
  5. Enable double sync writes so writes go to both HBase and TiDB. Run the reconcile job (the snapshotting, comparison, and write-back described in the previous steps) daily to compare data parity between TiDB and HBase, get stats on data mismatches, and reconcile them. The double sync write mechanism did not have a rollback in case the write to one database failed; hence the reconciliation jobs have to run periodically to ensure there is no inconsistency.
  6. Keep the sync write to TiDB and enable async writes to HBase. Enable reads from TiDB. At this stage, the service SLA depends entirely on the availability of TiDB. We keep async writes to HBase as a best effort to maintain data consistency in case we need to roll back again.
  7. Stop writing to HBase entirely and deprecate the HBase table.
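
As a rough illustration of the comparison in step 3, the sketch below diffs the two snapshots and collects the unmatched rows. The production job read SequenceFiles from S3 with a MapReduce job, whereas here the snapshots are modeled as plain dicts.

```python
# Local approximation of the comparison step (step 3): diff the two snapshots
# and collect the rows that do not match, which the real job wrote back to S3.

def find_unmatched_rows(hbase_snapshot: dict, tidb_snapshot: dict) -> dict:
    """Return row_key -> (hbase_value, tidb_value) for every row that differs."""
    unmatched = {}
    for key in set(hbase_snapshot) | set(tidb_snapshot):
        hbase_value = hbase_snapshot.get(key)
        tidb_value = tidb_snapshot.get(key)
        if hbase_value != tidb_value:
            unmatched[key] = (hbase_value, tidb_value)
    return unmatched

# Rows missing on either side, or present with different values, all surface
# here; step 4 then re-reads their latest value from the service (backed by
# HBase) and writes it to TiDB.
```
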
Data Consistency Challenges

  1. Inconsistency scenarios due to backend unavailability
    The double writes framework built at the Ixia service layer does not roll back writes if they partially fail due to unavailability of either database. This kind of scenario is handled by running reconciliation jobs periodically to keep both the HBase and TiDB tables in sync. The primary database, HBase, is considered the source of truth when fixing such inconsistencies. In practice, this means that if a write failed in HBase but succeeded in TiDB, it will be deleted from TiDB during the reconciliation process.
  2. Inconsistency scenarios due to a race condition between double writes and reconciliation
    There is a possibility of writing stale data to TiDB if events happen in the following sequence: (1) the reconciliation job reads from HBase; (2) a live write is written to HBase synchronously and to TiDB asynchronously; (3) the reconciliation job writes the previously read value to TiDB.
    This class of issues is also resolved by running reconciliation jobs multiple times, because after every run the number of such inconsistencies decreases significantly. In practice, the reconciliation job only needed to be run once to achieve 99.999% consistency between HBase and TiDB for the 4 TB table serving 400 write QPS. This was verified by taking a dump of both the HBase and TiDB tables a second time and comparing their values. During the row comparison, we observed 99.999% consistency between the tables. A sketch of the reconciliation rule, with HBase as the source of truth, follows.
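
Here is a minimal sketch of that reconciliation rule under the assumptions above: HBase is the source of truth, so each unmatched row is either overwritten in TiDB with the latest HBase value or deleted from TiDB when HBase has no such row. The three callables are hypothetical stand-ins for the real client calls.

```python
# Illustrative sketch of the reconciliation rule: HBase is the source of truth.

from typing import Callable, Iterable, Optional

def reconcile(
    unmatched_keys: Iterable[bytes],
    read_from_hbase: Callable[[bytes], Optional[dict]],
    upsert_into_tidb: Callable[[bytes, dict], None],
    delete_from_tidb: Callable[[bytes], None],
) -> None:
    for key in unmatched_keys:
        latest = read_from_hbase(key)          # the HBase value wins
        if latest is None:
            delete_from_tidb(key)              # write succeeded only in TiDB: remove it
        else:
            upsert_into_tidb(key, latest)      # overwrite a stale or missing TiDB value

# Because a live write can race with this job (read from HBase, then a newer
# write lands, then the stale value is written to TiDB), each run can itself
# leave a small number of stale rows; re-running the job shrinks that set, and
# in practice a single run reached 99.999% consistency for this table.
```
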
Results

  1. We observed a 3x–5x p99 latency reduction for reads. The p99 query latency went down from 500 ms to 60 ms for this use case.
  2. Read-after-write consistency was achieved, which is one of our goals for migrating use cases, especially from Ixia.
  3. Simpler architecture in terms of the number of components involved once the migration is complete. This will greatly help when debugging production issues.

In-house TiDB Deployment

Deploying TiDB within the Pinterest infrastructure has been a great learning experience for us, since we are not using TiUP (TiDB's one-stop deployment tool). This is because many of TiUP's responsibilities overlap with internal Pinterest systems (for example deployment systems, operational tooling automation service, metrics pipelines, TLS certificate management, and so on) and the cost of bridging the gap between the two outweighed the benefits.

Hence we maintain our own code repo of TiDB releases and have our own build, release, and deployment pipelines. There are many nuances to safe cluster management, which we had to learn the hard way, since TiUP otherwise takes care of them.

We now have our own TiDB platform built on top of Pinterest's AWS infrastructure where we can do version upgrades, instance type upgrades, and cluster scaling operations with no downtime.

Data Ingestion

We ran into a few issues when doing the data ingestion and reconciliation, listed below. Please note that we received full support from PingCAP at every step. We have also contributed some patches to the TiDB codebase which have been merged upstream.

  1. TiDB Lightning version 5.3.0 did not support automatic TLS certificate refresh, which was a hard problem to debug due to a lack of relevant logs. Pinterest's internal certificate management service refreshes certificates every 12 hours, so we had to go through some failed ingestion jobs and work with PingCAP to get it resolved. The feature has since been released in TiDB version 5.4.0.
  2. The local mode of Lightning consumes a lot of resources and impacted online traffic on a separate table served from the same cluster during the data ingestion phase. PingCAP worked with us to provide short-term and long-term remediation via Placement Rules so that the replica serving online traffic does not get impacted by local mode.
  3. The TiDB MR Connector needed some scalability fixes to be able to snapshot a 4 TB table in a reasonable time. The MR Connector also needed some TLS improvements, which have since been contributed and merged.

After tunings and fixes, we were able to ingest 4 TB of data in ~8 hours, and running one round of reconciliation and verification took around seven hours.

Ixia

The table we migrated as part of this exercise is served by Ixia. We ran into a few reliability issues with the async/sync double writes and query pattern changes. The issues at the thrift service (Ixia) became harder to debug because of the complex distributed systems architecture of Ixia itself. Please read more about it in our other blog post.

We would like to thank all past and present members of the Storage and Caching team at Pinterest for their contributions in helping us introduce one of the newest NewSQL technologies into Pinterest's storage stack.

We would like to thank the PingCAP team for their continued support, joint investigations, and RCAs of the complicated issues.

Finally, we would like to thank our clients for having immense patience and showing tremendous support as we carry out this huge migration, one table at a time.

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To explore life at Pinterest, visit our Careers page.