Our most popular (and controversial!) article to date on the Uber Engineering blog in 3+ years. In essence, our move away from Postgres was due to a variety of limitations of Postgres at the time. Fun fact: earlier in Uber's history we'd actually moved from MySQL to Postgres before switching back for good, and though we published the article in Summer 2016, we haven't looked back since: The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence.

We've been using PostgreSQL since the very early days of Zulip, but we actually didn't use it from the beginning. Zulip started out as a MySQL project back in 2012, because we'd heard it was a good choice for a startup with a wide community. However, we found that even though we were using the Django ORM for most of our database access, we spent a lot of time fighting with MySQL. Issues ranged from bad collation defaults to bad query plans that required a lot of manual query tweaks. We ended up getting so frustrated that we tried out PostgreSQL, and the results were fantastic. We didn't have to do any real customization (just some tuning settings for how big a server we had), and all of our most important queries were faster out of the box. As a result, we were able to delete a bunch of custom queries escaping the ORM that we'd written to make the MySQL query planner happy (because Postgres just did the right thing automatically).

And then after that, we've just gotten a ton of value out of Postgres. We use its excellent built-in full-text search, which has helped us avoid needing to bring in a tool like Elasticsearch, and we've really enjoyed features like its partial indexes, which saved us a lot of work adding unnecessary extra tables to get good performance for things like our "unread messages" and "starred messages" indexes.
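To make the partial-index idea from the Zulip account concrete, here is a minimal sketch of an "unread messages" index. It is not Zulip's actual schema: the table, column, and index names are hypothetical, and it uses Python's built-in sqlite3 module as a stand-in so it runs without a server. PostgreSQL offers the same CREATE INDEX ... WHERE form, though a real schema there would more likely use a boolean column.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, is_read INTEGER, body TEXT)"
    )
    # Partial index: only rows matching the WHERE clause are indexed, so the
    # index stays small even when most messages have already been read.
    conn.execute(
        "CREATE INDEX idx_unread ON messages (user_id) WHERE is_read = 0"
    )
    conn.executemany(
        "INSERT INTO messages (user_id, is_read, body) VALUES (?, ?, ?)",
        [(1, 0, "hello"), (1, 1, "old news"), (2, 0, "ping"), (1, 0, "lunch?")],
    )
    return conn


def unread_ids(conn, user_id):
    """Return ids of unread messages for one user, oldest first."""
    # The query repeats the index predicate (is_read = 0), which is what
    # lets a planner consider the partial index for this lookup.
    rows = conn.execute(
        "SELECT id FROM messages WHERE user_id = ? AND is_read = 0 ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The point of the design is that no separate "unread" table is needed: the base table stays the single source of truth, and the index covers only the rows the hot query cares about.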
Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence. We set ourselves the following criteria for the optimal tool that would do this job: The data replication must be near real-time, yet it should NOT impact the production database. The data replication must be horizontally scalable (based on the load), asynchronous and crash-resilient.

Based on the above criteria, we selected the following tools to perform the end-to-end data replication: We chose MongoDB Stitch for picking up the changes in the source database. It is the serverless platform from MongoDB. One of the services offered by MongoDB Stitch is Stitch Triggers. Using Stitch Triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue. We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB Stitch offers integration with AWS services.

In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS. Next we wrote a minimal micro-service in Python to listen to the message events on SQS, pick up the data payload and mirror the DB changes on to the target data warehouse. We implemented source data to target data translation by modelling target table structures through SQLAlchemy. We deployed this micro-service as AWS Lambda with Zappa. With Zappa, deploying your services as an event-driven and horizontally scalable Lambda service is dumb-easy.

In the end, we got to implement a highly scalable near real-time Change Data Replication service that "works" and deployed it to production in a matter of a few days!
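The consumer side of the pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the `customers` table and the function names are hypothetical, each SQS message body is assumed to be a JSON change event carrying `operationType`, `documentKey` and `fullDocument`, and the built-in sqlite3 module stands in for both PostgreSQL and the SQLAlchemy models the original service used.

```python
import json
import sqlite3


def make_warehouse():
    """Create a hypothetical target table in an in-memory stand-in database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, city TEXT)"
    )
    return conn


def apply_change(conn, change):
    """Mirror one change event onto the target table."""
    op = change["operationType"]
    doc_id = str(change["documentKey"]["_id"])
    if op == "delete":
        conn.execute("DELETE FROM customers WHERE id = ?", (doc_id,))
    elif op in ("insert", "update", "replace"):
        # Assumption: the trigger is configured to send the full document,
        # so every write can be treated as an idempotent upsert.
        doc = change["fullDocument"]
        conn.execute(
            "INSERT OR REPLACE INTO customers (id, name, city) VALUES (?, ?, ?)",
            (doc_id, doc.get("name"), doc.get("city")),
        )
    else:
        raise ValueError("unsupported operation: " + op)


def handler(event, conn):
    """Process a batch shaped like a Lambda SQS event; return the count applied."""
    applied = 0
    for record in event["Records"]:
        apply_change(conn, json.loads(record["body"]))
        applied += 1
    conn.commit()
    return applied
```

Treating inserts, updates and replaces as the same upsert keeps the handler safe to retry, which matters because a queue may deliver the same message more than once.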