How to import a set of JSON files into RethinkDB

RethinkDB ships with utilities for doing imports and exports. These serve two purposes: database backup and restore, and the import of new data.

Importing new data is probably the more interesting challenge, since you have to shape your import process to match what RethinkDB expects.

If you haven’t done this before, the import utility requires the RethinkDB Python driver:

apt-get install -y python-pip
pip install rethinkdb

Once you do this, you’ll need to create a database in RethinkDB. My example is a series of JSON exports from the Watson API, so I’ve called this “Watson”.

This will let you import a single JSON file:

rethinkdb import -f \
  ./watson/transcript_s_TextGetEmotion_1608.json \
  --table Watson.transcript_s_TextGetEmotion

When you run this, it creates the table automatically. It seems to treat the file as a single row (possibly because mine contains one object). If you import the file again, you will need to use “--force”, because otherwise the tool doesn’t know how to reconcile the file with the existing table. The “--force” option will put the new data in as new rows.
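Since the shape of the file seems to matter, it can help to check whether a file starts with an object or an array before importing it. This is only a rough sketch: “json_top_level” is a made-up helper name, the sample file is invented for the demonstration, and it looks only at the first non-whitespace character rather than parsing the JSON.

```shell
# json_top_level is a hypothetical helper, not part of RethinkDB.
# It prints "object", "array" or "other" based only on the first
# non-whitespace character of the file (a heuristic, not a JSON parser).
json_top_level() (
  first=$(tr -d ' \t\r\n' < "$1" | head -c 1)
  case $first in
    '{') echo object ;;
    '[') echo array ;;
    *)   echo other ;;
  esac
)

# Demonstration on a made-up single-object file.
sample=$(mktemp)
printf '{"id": 1}\n' > "$sample"
json_top_level "$sample"
rm -f "$sample"
```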

In my case I have a folder that has all the JSON files, named based on the originating ID and the API they are exporting.

Thus, to import an entire folder, I can do this:

cd watson

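for f in *.json
do
  # Strip the trailing _<id>.json to get the table name, e.g.
  # transcript_s_TextGetEmotion_1608.json -> transcript_s_TextGetEmotion
  table=$(echo "$f" | sed 's/\(.*\)_[0-9][0-9]*\.json$/\1/')
  # Table names can't contain "-", so swap it for "_"
  table=$(echo "$table" | sed 's/-/_/g')

  rethinkdb import -f "$f" --table "Watson.$table" --force
done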

cd ..
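A dry run is a cheap way to check the file-to-table mapping before touching the database. The sketch below prints the import command for each file instead of running it. “plan_imports” and its two arguments are made-up names for illustration, and it assumes the files end in an underscore, some digits, and “.json”, as in the example above.

```shell
# plan_imports is a hypothetical helper: it prints one "rethinkdb import"
# command per JSON file in a directory, without running anything.
# Usage: plan_imports <directory> <database>
plan_imports() (
  dir=$1
  db=$2
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue     # the glob matched nothing
    base=${f##*/}               # strip the directory part
    # Drop the trailing _<digits>.json, then swap "-" for "_".
    table=$(printf '%s\n' "$base" | sed -e 's/_[0-9][0-9]*\.json$//' -e 's/-/_/g')
    printf 'rethinkdb import -f %s --table %s.%s --force\n' "$f" "$db" "$table"
  done
)

plan_imports ./watson Watson
```

If the printed table names look right, the same commands can be run for real. A file that does not match the naming pattern keeps its full name, extension included, so it will stand out in the output.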

Note that you can’t use “-” in a RethinkDB table name, so you’ll want to replace those with underscores if you have them in your source file names.
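To catch bad names before an import fails on them, a small check like the one below can help. It is a sketch under a stated assumption: it treats only letters, digits and underscores as safe, which is stricter than the one rule mentioned above, so adjust it if your RethinkDB version accepts more. “is_safe_table_name” is a made-up helper name.

```shell
# is_safe_table_name is a hypothetical helper, not part of RethinkDB.
# Assumption: only letters, digits and underscores are safe in a table name.
# Returns 0 if the name is safe, 1 if it is empty or has any other character.
is_safe_table_name() {
  case $1 in
    ''|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

for name in transcript_s_TextGetEmotion transcript-s-TextGetEmotion; do
  if is_safe_table_name "$name"; then
    echo "ok: $name"
  else
    # Suggest a replacement with every unsafe character turned into "_".
    echo "rename: $name -> $(printf '%s' "$name" | tr -c 'A-Za-z0-9_' '_')"
  fi
done
```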
