Uber’s recent blog post, “Why Uber Engineering Switched from Postgres to MySQL”, has become a major topic in the PostgreSQL community and in other developer communities as well. The post explains in detail why Uber moved its storage system from PostgreSQL to MySQL. Well-known PG contributors and bloggers have been sharing their opinions, developers on Hacker News and reddit are discussing it, and it has generated many threads on the PG mailing lists.
It is valuable when a world-famous company such as Uber shares its technical details. A migration from PostgreSQL to MySQL could be seen as bad news for the PG community, but its reaction has been calm. Much of the discussion, however, centers on the fact that the comparison in Uber’s post is based on PostgreSQL 9.2.
Points of confusion and questions
Because Uber’s decision is highly influential, the post can leave readers with the inaccurate impression that MySQL is simply better than PostgreSQL. Both, however, are databases with a long history, and their differences stem from different design philosophies.
Every database is a bundle of tradeoffs, so what matters is not a head-to-head comparison but the purpose the database serves and how well it fits the workload. The argument became heated because Uber’s post reads less like a balanced technical comparison and more like a claim that one database is better than the other. Some readers have even asked whether PG is really that poor after reading it. In other words, the tone of the post obscured the tradeoffs of each database, so readers may wrongly conclude that one database is simply better than the other.
PostgreSQL 9.6 right now
We are not saying that Uber’s post is wrong. When Uber used PostgreSQL 9.2, its performance was constrained by features that release did not yet support. But Uber did not describe what optimizations it tried, so it is not reasonable to conclude from the post alone that PostgreSQL is inefficient. PostgreSQL is now at version 9.6, which has improved a great deal, and it continues to evolve.
For example, Uber’s post describes how PostgreSQL updates tuples and how its index structure turns a single update into a relatively large number of writes. The PG community itself acknowledges that the Multi-Version Concurrency Control (MVCC) model increases write volume. However, every architecture has its tradeoffs. For reads through a secondary index, MySQL’s architecture is at a disadvantage compared with PostgreSQL’s. Uber said that this is not a big overhead, but the difference in READ operations can matter depending on the situation.
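The read-side tradeoff can be illustrated with a toy model. This is a minimal sketch, not either engine’s actual implementation: Python dictionaries stand in for B-tree indexes, and the table contents, column names, and function names are hypothetical. It assumes, as Uber’s post describes, that a PostgreSQL index entry points at the tuple’s physical location (its ctid), while an InnoDB secondary index entry holds the primary key, which must then be looked up in the clustered primary-key index. Visibility checks and caching are ignored.

```python
# Toy model: dicts stand in for B-tree indexes (hypothetical data and names).

# PostgreSQL-style: the heap is addressed by ctid = (page, offset),
# and every index maps a key directly to a ctid.
pg_heap = {
    (0, 1): {"id": 1, "first": "Ada"},
    (0, 2): {"id": 2, "first": "Bob"},
}
pg_first_idx = {"Ada": (0, 1), "Bob": (0, 2)}

# InnoDB-style: rows live in the clustered primary-key index,
# and a secondary index maps a key to the primary key value.
innodb_clustered = {
    1: {"id": 1, "first": "Ada"},
    2: {"id": 2, "first": "Bob"},
}
innodb_first_idx = {"Ada": 1, "Bob": 2}


def pg_lookup(first):
    """One index traversal, then a direct heap fetch by physical location."""
    ctid = pg_first_idx[first]
    return pg_heap[ctid], 1  # (row, number of index traversals)


def innodb_lookup(first):
    """Secondary index traversal, then a second traversal of the clustered index."""
    pk = innodb_first_idx[first]
    return innodb_clustered[pk], 2  # (row, number of index traversals)
```

Both functions return the same row; the second element of the result only counts how many index structures were searched, which is the extra step Uber described as a small overhead.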
In addition, Uber did not mention heap-only tuples (HOT), which reduce the writes caused by updates, even though it described the on-disk tuple layout in detail. Presumably Uber’s workload was so update-heavy that updates were a problem even with HOT, but for workloads lighter than Uber’s, update performance is unlikely to be an issue.
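The effect of HOT on write volume can be sketched with a simplified counting model. The assumptions are ours, not Uber’s or PostgreSQL’s code: in PostgreSQL an update that changes no indexed column and fits on the same page is a HOT update and adds no index entries, while any other update adds a new entry to every index on the table; in InnoDB only the secondary indexes whose columns changed are touched, assuming the primary key itself is not updated. The index definitions and column names below are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Index name -> columns it covers (hypothetical example schema).
Indexes = Dict[str, Tuple[str, ...]]


def postgres_new_index_entries(indexes: Indexes,
                               changed: Iterable[str],
                               page_has_room: bool = True) -> int:
    """New index entries written by one UPDATE in the simplified PostgreSQL model."""
    changed_cols = set(changed)
    touches_index = any(changed_cols & set(cols) for cols in indexes.values())
    if not touches_index and page_has_room:
        return 0  # HOT update: the new tuple version needs no index entries
    return len(indexes)  # non-HOT update: every index points at the new tuple


def innodb_secondary_index_updates(secondary_indexes: Indexes,
                                   changed: Iterable[str]) -> int:
    """Secondary indexes modified by one UPDATE (primary key assumed unchanged)."""
    changed_cols = set(changed)
    return sum(1 for cols in secondary_indexes.values() if changed_cols & set(cols))


pg_indexes = {
    "pk": ("id",),
    "name_idx": ("first", "last"),
    "birth_idx": ("birth_year",),
}
innodb_secondary = {k: v for k, v in pg_indexes.items() if k != "pk"}
```

In this model, updating a non-indexed column costs PostgreSQL nothing in index writes when HOT applies, but updating `birth_year` touches all three PostgreSQL indexes and only one InnoDB secondary index. How often a real workload lands in each case depends on the schema and the update pattern.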
Points raised about replicas
The points raised about replicas also follow from this increase in writes. As write volume grows, so do the WAL and the bandwidth it consumes, so replication becomes harder where network bandwidth is limited, as with replicas across data centers. Uber said that it used several indexes, and the WAL shipped to replicas grows with the number of indexes.
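A back-of-the-envelope estimate shows why WAL volume tracks the number of indexes. This is only a sketch under stated assumptions: each update produces one heap record, each non-HOT update additionally produces one record per index, and full-page images, checkpoints, and compression are ignored. The function names and all record sizes are hypothetical placeholders, not measured PostgreSQL values.

```python
def estimated_wal_bytes(updates: int,
                        heap_record_bytes: int,
                        index_record_bytes: int,
                        n_indexes: int,
                        hot_fraction: float = 0.0) -> float:
    """Rough WAL volume for a batch of updates under the simplified model."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    heap_bytes = updates * heap_record_bytes
    # Only non-HOT updates write a record for every index.
    index_bytes = updates * (1.0 - hot_fraction) * n_indexes * index_record_bytes
    return heap_bytes + index_bytes


def transfer_seconds(wal_bytes: float, link_bits_per_second: float) -> float:
    """Time to ship that WAL over a link of the given bandwidth."""
    return wal_bytes * 8 / link_bits_per_second
```

With placeholder sizes, `estimated_wal_bytes(1000, 100, 50, 4)` and `estimated_wal_bytes(1000, 100, 50, 8)` differ only in the index term, which doubles. Raising `hot_fraction` shrinks that term, which is the mitigation the HOT discussion points to.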
As stated above, most of the issues Uber points out arise from the particular workload of Uber’s data processing architecture. It is therefore wrong to judge a database without regard to the specific situation. Uber regarded PostgreSQL’s architecture as inefficient because it was using several indexes per table and replicating between data centers. Of course, had it applied PostgreSQL’s optimizations, the inefficiency would have been mitigated to some degree.
Uber’s stance and Schemaless
The system Uber’s post is really about is the back-end storage of a system called Schemaless, which uses MySQL/InnoDB. Uber needed a system that could scale with the pace of its business, and it realized that in Schemaless. So, as Markus Winand points out, the post should have argued that MySQL/InnoDB was the better choice as the back-end storage for Schemaless. This is in the same vein as the emergence of NoSQL, which was driven by the scalability limits of existing relational databases.
As mentioned earlier, the PG community’s reaction has been quite temperate. It acknowledges PostgreSQL’s shortcomings and is discussing new features, so there has been little finger-pointing at Uber’s post and little heated dispute. Hacker News users have praised this response, and working out the direction of development in this way will make PostgreSQL a better database. If Uber had shared its situation and consulted with the PG community before switching to MySQL, it might have found a more workable solution within PostgreSQL as well.
Why Uber Engineering Switched from Postgres to MySQL, https://eng.uber.com/mysql-migration/
Robert Haas (EnterpriseDB), http://rhaas.blogspot.com/2016/08/ubers-move-away-from-postgresql.html
Markus Winand, http://use-the-index-luke.com/blog/2016-07-29/on-ubers-choice-of-databases
Simon Riggs (2ndQuadrant), http://blog.2ndquadrant.com/thoughts-on-ubers-list-of-postgres-limitations/
Hacker News, https://news.ycombinator.com/item?id=12201353
Hacker News, https://news.ycombinator.com/item?id=12166585
Hacker News, https://news.ycombinator.com/item?id=12216680
Reddit, https://www.reddit.com/r/programming/comments/4uph84/why_uber_engineering_switched_from_postgres_to/
PG hackers, http://postgresql.nabble.com/Why-we-lost-Uber-as-a-user-td5913417.html
PG advocacy, http://postgresql.nabble.com/Uber-moving-towards-MySQL-td5913472.html
Uber Schemaless, https://eng.uber.com/schemaless-part-one/
BITNINE GLOBAL INC., THE COMPANY SPECIALIZING IN GRAPH DATABASE