Configuring the RavenDB SQL replication bundle

As part of a data visualisation project for a software application, I've had to work recently on extracting unstructured NoSQL data from RavenDB and building an integrated HTML dashboard to provide history and statistics to the user base.

Initially I'd built an SSIS package that used HTML requests to page through JSON served up by a custom set of indexes I'd created, but in the run up to an upgrade to RavenDB 2.5 (previous production version had ropey replication support at best), I've been able to start working with the SQL replication bundle to provide a far more scalable solution, and due to the lack of up-to-date documentation I wanted to put some of my experiences out there to help anyone else going through the same process.

For the purposes of this article, I've put together a simplistic walkthrough.  In production we'd obviously want to analyse our documents and design & build the SQL schema prior to configuring RavenDB, but these steps should take you through setting this up in a lab environment and build a base to work from.

First - there are two ways to enable replication for a database in RavenDB.  The first is to create a database with SQL replication selected as an option (either on a new database or by backing up, deleting and then restoring/recreating) - shown below.  The second, by adding ";SqlReplication" to the "Raven/ActiveBundles" key in the database definition document in the system database.

Enable RavenDB SQL replication

Once you've got replication enabled for your database, we're going to setup a couple of simple documents to base our SQL schema and replication scripts on.  When replicating a JSON document to SQL tables, the root JSON elements go into a parent table (here we're mapping the users collection to the Users table in SQL) with any nested JSON arrays being replicated out to their own tables.

Creating a demo document

Switching over to SQL Server Management Studio, we're going to create a test database and build a users table to take the top level JSON.  It's important here to make sure you define a primary key on any table you're replicating RavenDB data to.  On a few occasions, I've restarted the Raven service and without a primary key, documents can be duplicated instead of being overwritten.

Creating replication table

Next, we need to create a table to replicate our nested JSON array to.  Again, we need to make sure we have a unique primary key on the table but it's very important to make sure you don't create a foreign key relationship.  After monitoring the RavenDB logs and SQL Profiler, I noticed when a change is made to a document Raven doesn't issue an update statement - it deletes the row from the parent SQL table and attempts to recreate it, resulting in a constraint failure.

Creating nested table

Once we've created our SQL tables, the next step is to setup a replication configuration.  This is done by going to the settings menu while in your database context and selecting the SQL replication option, which will bring you to the screen below.

First, choose a name for the replication configuration and then add your connection string, select the source collection you want to replicate and add the SQL table - document key mapping for the parent table to the SQL replication tables.

Configuring replication

Next, you need to build the replication script which specifies which data we want to replicate.  First we're defining a variable to hold the elements of each document and within the variable, we're defining the SQL column to JSON attribute mappings in the format "{SQL column}:{RavenDB attribute}".  We're using the documentId (maps to the document key we defined previously) as our SQL primary key, followed by mappings for each subsequent attribute.

Once the variable has been correctly defined, we're using "replicateTo{SQL Table}({variable});" to replicate the document to SQL Server.  It's important to note that on versions of RavenDB with the SQL replication bundle prior to 2.5, the SqlReplicate function was used to replicate the data to SQL but this has changed for 2.5 to using the replicateTo function.

Replication script

Once we've defined the replication script for the top level JSON, we need to deal with splitting out the nested array into our child table.  

To accomplish this, we need to add another SQL table to document key mapping for our child table and add the second half of the script to step through our JSON array and replicate each member along with the documentId to our table.  

Nested JSON replication

Once we've completed and saved the configuration, that's it (well almost).  SQL replication is now active however, RavenDB will only replicate changes to the existing documents.  In order to force a replication of the entire collection, open the Raven/SqlReplication/Status document from in the system documents collection for your database, and select any members of the "LastReplicatedEtags" array and delete them.  Once the array is empty and the document saved, RavenDB will replicate the entire collection to your SQL tables.

Forcing replication with LastReplicatedEtags

Head over to management studio and you can see the replicated data from our users collection in our SQL tables.

Replicated data from RavenDB in SQL Server

To prove our replication is fully working, we can open one of our documents and make a change.

Browse contents

And the change is instantly replicated to our SQL database!

Browse contents

Although you can accomplish a degree of data visualisation by connecting dashboard objects to JSON data generated by a RavenDB index, a relational data warehouse is essential in order to build proper reporting content.  By using the SQL replication bundle, we can keep our staging tables constantly updated while avoiding any replication from indexes or overly complex custom ETL tasks.

I hope this guide will be helpful to anyone else setting this up.  Please feel free to comment below if you have any questions - Andy Knight.

Ask Us a Question