Migrate from MongoDB to HarperDB





Update, 10/06/2020: HarperDB has built a simple, easy-to-use MongoDB migration tool within the Studio that you can access directly here.

MongoDB is probably the most prolific NoSQL database in existence. It set the stage for a data revolution, and we at HarperDB are eternally grateful for that. Yet there are certain cases where MongoDB has outlived its purpose, or you’ve simply decided it’s time to try something new. Maybe you saw our benchmark and wanted to give us a try! This blog will walk you through a few different ways of quickly getting your data out of MongoDB and into HarperDB.

If you run into any issues, I recommend joining our community Slack channel! We have a great community of users and HarperDB employees who are always around to help. You can join here.

Let’s get our data moving!

The mongoexport Tool

MongoDB comes equipped with a CLI (command-line interface) tool for exporting data. It’s great when a database makes it as easy to pull data out as it is to load data in. We have the export_local operation that offers similar functionality, but I digress. Mongo’s tool is an easy and effective way to get your data out and ready for import into HarperDB.

Their docs are pretty substantial, so if you’re trying to get fancy I highly recommend you check them out here: mongoexport docs. I’ve distilled it down to the simplest options below. The mongoexport tool allows you to export in JSON or CSV format; both are valid options for importing into HarperDB, and I’ll cover each one.

In these examples, I ran everything on my local MongoDB instance with a database called example_db and a collection called example_collection. You’ll see those used throughout. If you are using a cloud instance, you can specify the --host parameter.
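
For example, a remote export might look something like the following; the hostname and credentials here are placeholders, and you’d add whichever export options from the sections below you need:

mongoexport --host="cluster0.example.com:27017" --username="my_user" --password="my_password" --authenticationDatabase=admin --db=example_db --collection=example_collection --out=output.json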

JSON Export

HarperDB speaks JSON natively, so it makes sense to start here. The following command exports my collection to a JSON file:

mongoexport --db=example_db --collection=example_collection --out=output.json --jsonFormat=relaxed --jsonArray --pretty

Not too bad, but if you’re using MongoDB ObjectIDs you end up with some funky formatting. For example, here’s a sample of a person object that I exported from MongoDB using this command:

{
    "_id": {
        "$oid": "5ed00fda3b0e2a0fe9e8ceeb"
    }, 
    "first": "Ella",
    "last": "Davies",
    "birthdate": "1990-12-16T03:37:22.469Z",
    "age": 29
}

Now, if you don’t care about persisting the ObjectIDs, you can import this into HarperDB as-is; as long as the table’s unique hash_attribute isn’t included in the objects, HarperDB will generate a unique GUID for each row. I’d prefer to keep my IDs around, so I wrote a little Python script to simplify them. All it does is flatten each ObjectID into a single field value called _id.

import json
#This script assumes all files are in the same directory
with open('input.json') as json_file:       #Specify input file path
    data = json.load(json_file)
    for p in data:
        p["_id"] = p["_id"]["$oid"]         #Change if not using ObjectIDs
with open('output.json', 'w') as outfile:   #Specify output file path
    json.dump(data, outfile, indent=4)

Now that we have that file, we can wrap the output JSON array from the Python script in the records attribute of the HarperDB insert operation:

{
    "operation": "insert",
    "schema": "example_schema",
    "table": "example_table",
    "records": [
        {
            "_id": "5ed00fda3b0e2a0fe9e8ceeb",
            "first": "Ella",
            "last": "Davies",
            "birthdate": "1990-12-16T03:37:22.469Z",
            "age": 29
        }
    ]
}

Make the API call to HarperDB and we’re done!
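
If you’d rather script that last step too, here’s a minimal sketch using Python’s requests library. It assumes a local HarperDB instance listening on the default operations port 9925 and uses placeholder credentials, so adjust both for your setup:

import json
import requests

HDB_URL = "http://localhost:9925"               # Default HarperDB operations API
HDB_AUTH = ("HDB_ADMIN", "password")            # Replace with your credentials

with open('output.json') as json_file:          # File produced by the script above
    records = json.load(json_file)

payload = {
    "operation": "insert",
    "schema": "example_schema",
    "table": "example_table",
    "records": records
}

# HarperDB accepts operations as JSON POSTs with HTTP Basic auth
response = requests.post(HDB_URL, json=payload, auth=HDB_AUTH)
print(response.status_code, response.text)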

CSV Export

The mongoexport tool also allows for CSV exports. This may be a better choice if your data is more tabular in shape, or if it isn’t cooperating the way you’d prefer with the JSON export. The CSV export requires you to specify the fields you would like exported. The following command exports my collection to a CSV file:

mongoexport --db=example_db --collection=example_collection --out=output.csv --type=csv --fields="_id,first,last,birthdate,age"

This CSV can be directly imported into HarperDB with the csv_file_load operation:

{
    "operation": "csv_file_load",
    "schema": "example_schema",
    "table": "example_table",
    "file_path": "/<your_path>/output.csv"
}

The csv_file_load operation will return a job_id since HarperDB runs the import in the background. Depending on how much data you’re importing, you’ll want to check on the status of the job using the get_job operation. Once the job is complete, the response will include a message attribute with something like this: successfully loaded 1000 of 1000 records.
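
For reference, checking on the job is just another call to the operations API; the request body looks something like this, where the id is the job_id returned by csv_file_load:

{
    "operation": "get_job",
    "id": "<your_job_id>"
}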

The Node-RED Approach

If you’re looking for a more visual approach to migrating data, you can throw together a quick Node-RED flow. For those of you unfamiliar with Node-RED, check out an old blog of mine here: https://harperdb.io/blog/were-big-fans-of-node-red/. I use Node-RED for all sorts of data migrations, formatting, tests, etc. For this you’ll need to install Node-RED as well as the MongoDB and HarperDB nodes; you can get all of them at the following links.

Node-RED: https://nodered.org/docs/getting-started/

MongoDB Node: https://flows.nodered.org/node/node-red-node-mongodb

HarperDB Node: https://flows.nodered.org/node/node-red-contrib-harperdb

The flow itself is pretty simple: pull data from MongoDB, put it in HarperDB. It’s as easy as configuring the database connections. If you’d like to do some data mapping or formatting, you can do that in between the two nodes. Here’s a look at the flow: 

Once you have everything installed, you can import this flow yourself with the following JSON. Use Import from Clipboard and just paste it in. You’ll need to configure your MongoDB and HarperDB credentials, but that’s it! Deploy, activate the inject node at the beginning, and let your data flow!

MongoDB to HarperDB Node-RED Flow

[{"id":"75e01793.e86b58","type":"mongodb in","z":"80a3f60.f8ca208","mongodb":"ef5aad6a.795af","name":"","collection":"example_collection","operation":"find","x":390,"y":180,"wires":[["a55b6bc8.e30318","65b37b8b.98a584"]]},{"id":"a55b6bc8.e30318","type":"debug","z":"80a3f60.f8ca208","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":410,"y":120,"wires":[]},{"id":"8f939a3.281ae68","type":"inject","z":"80a3f60.f8ca208","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":120,"y":180,"wires":[["75e01793.e86b58"]]},{"id":"65b37b8b.98a584","type":"harperdb","z":"80a3f60.f8ca208","harperdb":"95971046.b5c78","name":"HarperDB_Connection","schema":"example_schema","table":"example_table","hash_attribute":"","hash_values":"","search_attribute":"","search_value":"","get_attributes":"*","operation":"insert","sql":"","fixed_statement":"","x":740,"y":180,"wires":[["8ee0e430.eee088"]]},{"id":"8ee0e430.eee088","type":"debug","z":"80a3f60.f8ca208","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":730,"y":120,"wires":[]},{"id":"ef5aad6a.795af","type":"mongodb","z":"","hostname":"127.0.0.1","port":"27017","db":"example_db","name":"MongoDB"},{"id":"95971046.b5c78","type":"harperdb setup","z":"","hostname":"localhost","port":"9925","name":"localhost"}]