Database sharding with Active Record

26th February 2019

What is Database Sharding?

Database Sharding is a highly scalable approach for improving the throughput of large database-centric business applications. It is a type of database partitioning that separates very large databases into smaller, faster, easier-managed parts called data shards. Shard means “a small part of a whole”.

There are two ways of database partitioning: vertical and horizontal.

Vertical: “Involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.”

Horizontal: “Involves putting different rows into different tables. For example, customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.”

Unlike horizontal partitioning, database sharding partitions the problematic tables in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that the database load for the large partitioned table can now be split across multiple either logical or physical servers and not just multiple indexes on the same logical server.

Reasons to implement database sharding.

Scale application growth in database transactions and large volumes of data.
Allow easier data migration
Implement horizontal scalability for multi-tenant applications.

Why did I implement Database Sharding?

We needed an API where each client can access their database information. What does it mean? We have some enterprise clients and each client has its own database where they are saving all their information. There is an API that needs to consume that information but, as you can see, there are multiple database instances. After doing some research, I found that database sharding is a good approach for this case where the representation of each enterprise client is as a Company and each Company has its own information (database shard). LET’S DO IT!.

How to implement database sharding on Active Record.

First, take a look at the ar-octopus gem.

Octopus is a better way to do Database Sharding in ActiveRecord. Sharding allows using multiple databases in the same Rails application.

The API is designed to be as simple as possible. Octopus focuses on the end user by giving the power of multiple databases but with reliable code and flexibility. Octopus is compatible with Rails 4 and Rails 5.

Octopus supports:

Sharding (with multiple shards, and grouped shards).
Replication (Master/slave support, with multiple slaves).
Moving data between shards with migrations.

There are two ways of declaring your databases shards in your application:

Declare your database instances inside of config/shards.yml
Dynamic shard configuration. Octopus allows us to set shards using ruby code.

Advantages of dynamic shard configuration:

Configures the open connections on demand. NOTE: those connections will be handled by ActiveRecord connection pool.
It is flexible to setup with Ruby code.

Setting Up

Install the gem in your Gemfile.

Gemfile

source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

ruby '2.5.3'

gem 'rails', '~> 5.2.1'
gem 'pg', '>= 0.18', '< 2.0'
gem 'ar-octopus'

Add a migration to create Databases table to save shards information.
The API will have the databases credentials on its database and, in order to secure the information, I used this gem attr_encrypted to encrypt the credentials.

To generate the migration, run: *

bin/rails generate migration CreateDatabases

class CreateDatabases < ActiveRecord::Migration[5.2]
  def change
    create_table :databases do |t|
      t.string :encrypted_host
      t.string :encrypted_host_iv
      t.string :encrypted_password
      t.string :encrypted_password_iv
      t.string :encrypted_username
      t.string :encrypted_username_iv
      t.string :encrypted_name
      t.string :encrypted_name_iv

      t.timestamps
    end
  end
end

We need to add the Database model and this is how it looks like:

app/models/database.rb

# frozen_string_literal: true

class Database < ApplicationRecord
  attr_encrypted :username, 
    key: CredentialsReader.credentials[:api_token_encrypted]
  attr_encrypted :name, 
    key: CredentialsReader.credentials[:api_token_encrypted]
  attr_encrypted :password, 
    key: CredentialsReader.credentials[:api_token_encrypted]
  attr_encrypted :host, 
    key: CredentialsReader.credentials[:api_token_encrypted]
end

On this table, we are going to have all our shards credentials. Then, when we need to create a new shard connection, we are going to fetch the credentials from here.

To save the company information we need to add the Company model:

app/models/company.rb

class Company < ApplicationRecord
  has_one :database

  validates :name, uniqueness: true
  validates :name, presence: true
  validates :api_token, presence: true
end

This table is going to have both the company name and a relationship with a database.

The service to create the shards configuration dynamically is the next:

# frozen_string_literal: true

module Databases
  class ConnectionService
    attr_reader :database, :company_name

    def initialize(company)
      @company_name = company.name
      @database = company.database
    end

    def call
      setup_database_connection unless shard_setup?
    end

    private

    def shard_setup?
      environment_config = Octopus.config[Rails.env.to_sym]
      environment_config.present? && environment_config.key?(company_name)
    end

    def setup_database_connection
      Octopus.setup do |config|
        config.environments = [Rails.env.to_sym]

        config.shards = {
          company_name.downcase.to_sym => company_database_configs
        }
      end
    rescue StandardError => exception
      exception.message
    end

    def company_database_configs
      {
        adapter: 'postgresql',
        database: database.name,
        username: database.username,
        password: database.password,
        host: database.host,
        encoding: 'unicode'
      }
    end
  end
end

First, we need to pass the company object that is requesting information in the constructor and we are going to get their databases credentials. On the call method, we set up the connection in case it does not exist yet.

The connection has an identifier and, in this case, is the name of the company. With this identifier, we can ask for an existing connection or create a new one.

Now we can create our connection:

Screen-Shot-2019-02-26-at-4.07.25-PM

I previously added information to the Company table.

Now that we have the connection established, we can get the information that we want. In my case, I needed the Members information.

I added a Member model: this is important because ActiveRecord needs to manage that information through the model.

The member model class is next:

app/models/member.rb

class Member < ApplicationRecord
end

To access the correct shard, you need to use the established connection by the company key.

Screen-Shot-2019-02-26-at-4.22.27-PM

Now, we can create as many connections as we need and get information from different shards. In case you still do not have shards hosted on a server and you want to test it locally, there is a way to create more than one database in your Rails app.

In conclusion, database sharding is an option for horizontal scalability and Octopus is a complete, well-documented gem to implement database sharding.

About michelada.io

Tags

Database sharding with Active Record

Karen Ventura

View Comments

Karen Ventura

Share Article

About michelada.io