Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.

We set ourselves the following criteria for the optimal tool that would do this job:

The data replication must be near real-time, yet it should NOT impact the production database.
The data replication must be horizontally scalable (based on the load), asynchronous and crash-resilient.

Based on the above criteria, we selected the following tools to perform the end-to-end data replication:

We chose MongoDB Stitch for picking up the changes in the source database. It is the serverless platform from MongoDB. One of the services offered by MongoDB Stitch is Stitch Triggers. Using Stitch Triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.

We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB Stitch offers integration with AWS services.

In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.

Next we wrote a minimal micro-service in Python to listen to the message events on SQS, pick up the data payload and mirror the DB changes on to the target data warehouse. We implemented the source data to target data translation by modelling the target table structures through SQLAlchemy.
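To make the mirroring step of the Python micro-service concrete, here is a minimal sketch of how one change message could be applied to a target table. It is not our production code, and it rests on several assumptions: the message body is a JSON string shaped like a MongoDB change event (`operationType`, `documentKey`, `fullDocument`), the trigger is configured so that updates carry the full document, and `_id` is serialized as a plain string. The `customers` table, its columns and the `apply_change` function are hypothetical names. To keep the example runnable without dependencies, it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and SQLAlchemy.

```python
import json
import sqlite3

# Hypothetical target table. The real service models its tables with SQLAlchemy;
# plain SQL on SQLite is used here only so the sketch runs with the stdlib.
SCHEMA = "CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT, city TEXT)"
COLUMNS = ("name", "city")


def apply_change(conn, message_body):
    """Mirror one change event (a JSON string) onto the target table."""
    event = json.loads(message_body)
    op = event["operationType"]
    # Assumption: _id arrives as a plain string in the message.
    doc_id = str(event["documentKey"]["_id"])

    if op == "delete":
        conn.execute("DELETE FROM customers WHERE id = ?", (doc_id,))
    elif op in ("insert", "update", "replace"):
        # Assumption: the event includes the full document, also for updates.
        doc = event["fullDocument"]
        values = [doc.get(col) for col in COLUMNS]
        # Writing the whole row means a redelivered message has no extra effect.
        conn.execute(
            "INSERT OR REPLACE INTO customers (id, name, city) VALUES (?, ?, ?)",
            (doc_id, *values),
        )
    else:
        raise ValueError(f"unsupported operationType: {op}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    insert_event = {
        "operationType": "insert",
        "documentKey": {"_id": "a1"},
        "fullDocument": {"_id": "a1", "name": "Ada", "city": "London"},
    }
    apply_change(conn, json.dumps(insert_event))
    print(conn.execute("SELECT id, name, city FROM customers").fetchall())
```

Writing the whole row on insert, update and replace keeps the handler idempotent, which matters because a queue consumer may see the same message more than once. In a real deployment the message body would come from the SQS consumer loop rather than from a hard-coded dictionary.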