Stuff The Internet Says On Scalability For November 9th, 2018

Wake up! It's HighScalability time:

@b0rk

Do you like this sort of Stuff? Please support me on Patreon. I'd really appreciate it. Know anyone looking for a simple book explaining the cloud? Then please recommend my well reviewed (30 reviews on Amazon and 72 on Goodreads!) book: Explain the Cloud Like I'm 10. They'll love it and you'll be their hero forever.

  • $3 billion: Tesla's yearly spend on gigafactories; 18,000: GDPR data breach notifications since May; 30: happy birthday Morris Worm!; 1/3: not opting for new Java; 10x: MySQL TPS improvement in 9 years; 1,300: childhood photos posted by parents by the time they're 13; 100TB: hard drives by 2025; 1000x: faster graphics performance than the original iPad released eight years ago; 13.28B: transistors in the world's first 7nm GPU; 15 million: daily Uber trips; $725 million: opening weekend for Red Dead Redemption 2

  • Quotable Quotes:
    • @kylecyuan: 10/ From computational bio, cloud bio, and digital therapeutics, AI has put Bio on the Moore's Law curve, not Eroom's Law. From wet lab problems to dry lab ones. @vijaypande 's "When Software Eats Bio." This industrializes and 10x existing processes and creates new ones.
    • @davidgerard: It really won't, because it can't possibly scale. This is the key  problem with every musical blockchain initiative I've ever seen.  Proposals whose "blockchain" would need to add 1GB/hour, that sort of thing.
    • @matei_zaharia: Since we opened #DAWNBench deep learning benchmark rolling submissions, there have been some cool entries. You can now train CIFAR10 for just $0.06 and ImageNet for $12.60 (4x less than in April!): https://dawn.cs.stanford.edu/benchmark/ . MLPerf's deadline is also soon (http://mlperf.org ).
    • Paul Alcorn~ The 7nm Rome CPUs come with 64 physical Zen 2 cores, which equates to 128 threads per processor, double that of the first-gen Naples chips. In a two socket server, that equates to 128 physical cores and 256 threads in a single box. Rome is also the first PCIe 4.0 CPU, which offers double the bandwidth per channel. AMD has improved the Infinity Fabric. AMD is now using the second-gen Infinity Fabric to connect a multi-chip design with a 14nm I/O die serving as the linchpin of the design. That central chip ties together the 7nm CPU chiplets, creating a massively scalable architecture. Amazon Web Services, one of the world's largest cloud service providers, announced that, beginning today, it is offering new EPYC-powered cloud instances. The R5a, M5a and T3a instances purportedly will offer a 10% price-to-performance advantage over AWS's other cloud instances.
    • @tony_goodfellow: He's World of Warcraft's first and only conscientious objector—a neutral pandaren shaman who spends his time picking flowers and mining rocks instead of fighting and killing. ...after roughly 8000 .... Doubleagent has reached the maximum level of 110."
    • Brent Ozar: Tons of improvements for Azure SQL Data Warehouse: row level security, Accelerated Database Recovery, virtual networks. Sponsored comparisons say Redshift is more expensive and slower, and Rohan says the secret is local caching on SSDs. John Macintyre coming onstage to demo it. Processing over 1 trillion rows per second.
    • @vllry: #QConSF “Yes, I Test In Production (And So Do You)” with @mipsytipsy  The real world isn’t a strict binary between released and unreleased code. When we release code (quickly or not), we’re testing if it works for our real users.
    • @AlexSteffen: Wow. Seville, Spain eliminated 5,000 on-street parking spaces and built a "lightning" 80-kilometer protected bike grid, for just €32 million, in just 18 months. 70,000 trips a day, now. Rapid urbanism works.
    • Turing Institute: The academic name of this work, 'Chronotopic Cartographies', is based upon Russian theorist Mikhail Bakhtin's idea of the 'chronotope'; how configurations of time and space are represented in language and literature.
    • PayPal: When new requirements come along, developers face a choice: Should we create a new endpoint and have our clients make another request for that data? Or should we overload an existing endpoint with more data? Developers often choose the 2nd option because it’s easier to add another field and prevent a client-to-server round trip. Over time, this causes your API to become heavy, kludgy, and serve more than a single responsibility.
    • Stratechery: Now Apple is arguing that unit sales is the wrong way to understand its business, but refuses to provide the numbers that underlie the story it wants to tell. It is very fair for investors to be skeptical: both as to whether Apple can ever really be valued independently from device sales, and also whether the company, for all its fine rhetoric and stage presentations, is truly prioritizing what drives the revenue and profit instead of revenue and profit themselves. I do think the answer is the former; I just wish Apple would show it with its reporting.
    • Eric Budish: From a computer security perspective, the key thing to note ... is that the security of the blockchain is linear in the amount of expenditure on mining power ... In contrast, in many other contexts investments in computer security yield convex returns (e.g., traditional uses of cryptography) — analogously to how a lock on a door increases the security of a house by more than the cost of the lock.
    • Andrew Diamond: Clojure, on the other hand, trusts the developer entirely and merely asks him to express his intent. This allows good developers to do really good work. Of course, it also allows bad developers [to] really make a mess. So as an organization using Clojure, you have to make a different choice. Instead of hiring potentially mediocre programmers and throwing them into an environment that polices them, you hire really good developers and trust them.
    • Dan Goodin: The domestic US traffic, in particular, “becomes an even more extreme example,” he told Ars. “When it gets to US-to-US traffic traveling through mainland China, it becomes a question of is this a malicious incident or is it accidental? It’s definitely concerning. I think people will be surprised to see that US-to-US traffic was sent through China Telecom for days.”
    • eastsideski: Used to work for Google, it's insane how difficult some of their codebase is to develop with. Google does amazing work on cutting-edge technology, but most of their products have the same problems that you'll find at any tech company.
    • @mweagle: “When we take the time to interact with our stakeholders, and are deliberate with our build vs. buy decisions, we’ve found that our outcomes have been very positive and generally well received by the teams we support.” 
    • vince~ The irrational push for performance over security has led to pretty much every high-end chip on the market to have horrible intractable security vulnerabilities. Computer architecture research is currently in a dark age that I hope some day it might escape from. It probably won't with the current leadership in place.
    • Memory Guy: It’s earnings call season, and we have heard of a slowing DRAM market and NAND flash price declines from Micron, SK hynix, Intel, and now Samsung. DRAM prices have stopped increasing, and that can be viewed as a precursor to a price decline. Samsung’s 31 October, 2018 3Q18 earnings call vindicated Objective Analysis‘ forecast for a 2H18 downturn in memories that will take the rest of the semiconductor market with it.
    • David Rosenthal: Clearly, the increase in the supply of flash bits resulting from new fabs coming on-line and all the major suppliers transitioning to 96-layer 3D has coincided with a significant reduction in the rate of growth of demand for flash bits. To balance supply and demand, the price of flash bits has fallen, reducing the return on the investment in the new fabs and the 96-layer technology. This less-than-insatiable demand for bits of storage isn't confined to flash. 
    • Ted Kaminski: In the end, I think the async/await/promise approach gets us the same benefits as green threads, with a different mix of drawbacks that I think comes out in favor of the explicit asynchronous model. Pedagogically, I’m also in favor of actually exposing programmers to asynchrony. The confusion that arises from a wholly synchronous world-view is real. Especially since everything is actually asynchronous.
    • @asymco: The computing race was getting to 1 billion users in 20 years. Microsoft won. The mobile race was getting to 1 billion users in less than 10 years. Apple and Google won. The micromobility race is getting to 1 trillion rides in 5 years. Who will win?
    • Streak: And we’ve been very happy with how easy Cloud Datastore is to maintain. Between Google App Engine and Cloud Datastore, we’ve never had to have an explicit infrastructure on-call rotation
    • frew: yeah, no built-ins for graph operations in Spanner other than relational SQL for standard joins * the key thing that make this work are the ability to easily construct global indexes that aren't sharded by the primary key and reasonably fast joins between them * it's also helpful that Spanner does a reasonable job of parallelizing queries (e.g. a lot of times we'll get a 15x increase in speed vs. a sequential plan) * we then do the fan-out across the graph in our Java Spanner client - each distributed SQL index read takes ~10 ms so we can do multiple round trips of graph traversal in the client
    • ryankearney: It's insanely trivial to detect if someone is on Google Maps even if you can't see the hostname. Google maps loads dozens to hundreds of small map tiles. No other Google property exhibits this behavior. A simple analysis of packet size, traffic direction, and duration of traffic spikes would clue you in on what it was. Hell, virtually every "Next Generation" Firewall can detect Google Maps.
    • Ed Sperling: The real issue is how to keep those accelerators and processors primed and fully employed. Idle time costs power and area, and rapid startup and shutdown causes premature aging of digital circuitry, particularly at advanced nodes. Turning devices on and off seems like a good idea at 90 and 65nm, when scaling made multiple cores a necessity. But that rush of current is hard on digital circuitry, particularly at 16/14/10/7nm, where the dielectrics are getting thinner.
    • Andrew Moore: What you really need to be doing is working with a problem your customers have or your workers have. Just write down the solution you’d like to have; then work backwards and figure out what kind of automation might support this goal; then work back to whether there’s the data you need, and how you collect it.
    • biophysboy: As somebody who's been working in this field for a bit, both sequencing and microscopy are advancing in very exciting ways. So it's weird to see an article pitting one against the other. I do wonder how easy/costly it is to compare differences in single cells via sequencing. I realize there are methods for cell isolation using stuff like microfluidics. But a lot of these methods use elements of microscopy. My sense has always been that next-gen sequencing coupled with superres microscopy is the way forward.
    • Jeremy Daly: “The lesson for people who are new to lambda, is not to underestimate the coupling and downstream effects of unlimited scale.” The problem with the inherent scalability of serverless functions, is that most services they connect to are not as scalable. Whether you’re connecting to a database, managed service, or third-party API, there will be a bottleneck somewhere. This needs to be planned for and properly managed.
    • Peter: [Robert Pepperell] rejects the idea that the brain is essentially about information processing, suggesting instead that it processes energy. He rightly points to the differing ways in which the word ‘information’ is used, but if I understand correctly his chief objection is that information is abstract, whereas the processing of the brain deals in actuality; in the actualised difference of energy, in fact.
    • Netflix: Viewing data storage architecture has come a long way over the last few years. We evolved to using a pattern of live and compressed data with parallel reads for viewing data storage and have re-used that pattern for other time-series data storage needs within the team. Recently, we sharded our storage clusters to satisfy the unique needs of different use cases and have used the live and compressed data pattern for some of the clusters. We extended the live and compressed data movement pattern to move data between the age-sharded clusters.
    • tylerjwilk00: I may get hate for this but watching JavaScript development mature makes me feel like I am taking crazy pills. So much shit was figured out decades ago but it's like everyone is just learning this shit for the first time. I don't know if it is because the standard library is so lacking or that it's historically been a client-side-only language, but it makes me feel completely disillusioned with programming. It's like we're not progressing at all, merely learning that same shit over and over again each generation in a series of cycles. Or maybe I'm just too old and need to get off my own lawn.

  • PayPal added GraphQL to their stack. What do they think about it? GraphQL: A success story for PayPal Checkout
    • "GraphQL has been a complete game changer to the way we think about data, fetch data and build applications." Why? REST creates too many roundtrips. When every roundtrip costs 700ms they add up fast. You can create a JSON API to return a higher granularity of data, but that coupled the client to your server. You can batch REST calls together, but the request structure is difficult so developers don't want to use it. 
    • Why is this important? This nails it: "When we took a closer look, we found that UI developers were spending less than 1/3 of their time actually building UI. The rest of that time spent was figuring out where and how to fetch data, filtering/mapping over that data and orchestrating many API calls. Sprinkle in some build/deploy overhead. Now, building UI is a nice-to-have or an afterthought." With GraphQL you only get the data you ask for and the fields you ask for (see the sketch after this list). While there are no stats, they say the new GraphQL + React stack is much faster.
    • A good contrast is this extensive thread: @kellabyte: GraphQL exists because JavaScript developers finally realized HTTP API’s were too limiting so they reinvented SQL over JSON because JavaScript developers are obsessed with reinventing everything into JSON API’s. @lilactown_: "SQL over JSON" is missing qualities of GraphQL: basically the "graph" part. Some domains are so much more easily expressible as a nested, recursive data structure that GraphQL makes a lot of sense. @Daniel15: It didn't start with JavaScript at all though. The GraphQL server implementation at Facebook is written in Hack, not JavaScript. Also the first client was the Facebook iOS app.
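To make the round-trip point concrete, here's a minimal sketch of a client asking a GraphQL endpoint for exactly the fields a checkout screen renders, in one request. The endpoint URL, schema, and field names are hypothetical, not PayPal's actual API.

```python
# Hypothetical GraphQL query: the client names exactly the fields it needs,
# so one request replaces several REST round trips.
import requests

QUERY = """
query CheckoutSummary($cartId: ID!) {
  cart(id: $cartId) {
    total          # only the fields the UI actually renders...
    currency
    items {
      name         # ...instead of the full item records a REST
      quantity     # endpoint would return
    }
  }
}
"""

def fetch_checkout_summary(cart_id: str) -> dict:
    """Single round trip; the server returns exactly the selected fields."""
    response = requests.post(
        "https://example.com/graphql",   # hypothetical endpoint
        json={"query": QUERY, "variables": {"cartId": cart_id}},
        timeout=5,
    )
    response.raise_for_status()
    return response.json()["data"]["cart"]
```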

  • Videos from React Conf 2018 are now available

  • AWS reacts to the cries of the unbearable cost of API Gateway. They didn't lower prices overall; they introduced tiered pricing. At the highest tier you pay $1.51 per million requests. Seem high? Still, it might slow down the exodus to EC2. 
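For a rough sense of what tiered pricing does to a bill, here's a back-of-the-envelope calculator. Only the $1.51-per-million top-tier rate comes from the announcement; the tier boundaries and other rates below are illustrative placeholders, not AWS's published schedule.

```python
# Toy tiered-pricing calculator. Tier sizes/rates are illustrative placeholders
# except the $1.51/million top-tier figure mentioned above.
ILLUSTRATIVE_TIERS = [
    (300_000_000, 3.50),        # first N requests/month at $/million
    (700_000_000, 2.80),
    (19_000_000_000, 2.38),
    (float("inf"), 1.51),       # everything beyond the last boundary
]

def monthly_cost(requests_per_month: int, tiers=ILLUSTRATIVE_TIERS) -> float:
    remaining, cost = requests_per_month, 0.0
    for tier_size, rate_per_million in tiers:
        in_tier = min(remaining, tier_size)
        cost += in_tier / 1_000_000 * rate_per_million
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost

# e.g. 1 billion requests/month under these illustrative tiers:
print(f"${monthly_cost(1_000_000_000):,.2f}")
```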

  • A good model of how to adopt ML. Scaling Machine Learning at Uber with Michelangelo. Uber started using ML in 2015. They now use it to: rank suggested restaurants and menu items; estimate meal arrival times; predict where rider demand and driver-partner availability will be at various places and times; automate or speed up large parts of responding to and resolving customer support issues; detect possible crashes; estimate arrival times; power one-click chat; and support self-driving cars. They use a four-layer organization: product engineering teams own the models they build and deploy in production; when product engineering teams encounter ML problems that stretch their abilities or resources, they can turn to an internal team of specialists for help; specialists and product engineering teams often engage with Uber’s AI research group, AI Labs, to collaborate on problems and help guide the direction of future research; and the Michelangelo Platform team builds and operates a general-purpose ML workflow and toolset. 

  • The fastest code is code that is never written. The cheapest compute resources are those not running. Especially as teams grow larger and products grow more complex, it's hard to track everything. cloud-nuke: how we reduced our AWS bill by ~85%. Cloud-nuke is a tool that periodically goes through an AWS account and deletes all idle resources. They created separate AWS accounts for manual testing and automated testing, then run cloud-nuke every 3 hours in the automated testing account and every 24 hours in the manual testing account. This cut monthly spending in half.

  • A Netflix Web Performance Case Study: Loading and Time-to-Interactive decreased by 50% (for the logged-out desktop homepage at Netflix.com); JavaScript bundle size reduced by 200kB by switching from React and other client-side libraries to vanilla JavaScript. React was still used server-side; Prefetching HTML, CSS and JavaScript (React) reduced Time-to-Interactive by 30% for future navigations. 

  • 40 billion social media posts, 750 Elasticsearch nodes, 50,000 shards. How do you distribute search and indexing workloads as evenly as possible? Using linear optimization modeling. Optimal Shard Placement in a Petabyte Scale Elasticsearch Cluster: Linear optimization (also called linear programming) is a method to achieve the best outcome, such as maximum profit or lowest cost, in a mathematical model whose requirements are represented by linear relationships. The optimization technique is based on a system of linear variables, some constraints that must be met, and an objective function that defines what success looks like. The goal of linear optimization is to find the values of the variables that minimize the objective function while still respecting the constraints...Our cost function weighs together a number of different factors. For example, we want to: minimize the variance in index and search workload, to reduce problems with hotspotting; keep the disk utilization variance as small as possible, to achieve stable system operations; and minimize the number of shard movements, in order to prevent shard relocation storms as explained above...In conclusion, all this allows our LP solver to produce good solutions within a few minutes, even for a cluster state as huge as ours, and thus iteratively improve the cluster state towards optimality. And best of all, the workload variance and disk utilization converged as expected, and this near-optimal state has been maintained through the many intentional and unforeseen cluster state changes we’ve had since then!
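If linear optimization is unfamiliar, here's a toy version of the same idea using scipy: spread shards of known sizes across nodes so the busiest node is as lightly loaded as possible. It's a deliberately simplified stand-in for their real cost function (workload variance, disk-utilization variance, shard-movement penalties), with made-up shard sizes and node count.

```python
# Toy linear program: place shards on nodes so the most-loaded node is as
# light as possible. Shard sizes and node count are hypothetical.
import numpy as np
from scipy.optimize import linprog

shard_sizes = np.array([120.0, 80.0, 60.0, 30.0, 30.0, 10.0])  # GB, made up
num_shards, num_nodes = len(shard_sizes), 3

# Variables: x[s, n] = fraction of shard s placed on node n (LP relaxation of
# a 0/1 assignment), plus one extra variable z = load of the busiest node.
num_x = num_shards * num_nodes
c = np.zeros(num_x + 1)
c[-1] = 1.0                                   # objective: minimize z

# Each shard must be fully assigned: sum_n x[s, n] == 1
A_eq = np.zeros((num_shards, num_x + 1))
for s in range(num_shards):
    A_eq[s, s * num_nodes:(s + 1) * num_nodes] = 1.0
b_eq = np.ones(num_shards)

# Each node's load must stay below z: sum_s size[s] * x[s, n] - z <= 0
A_ub = np.zeros((num_nodes, num_x + 1))
for n in range(num_nodes):
    for s in range(num_shards):
        A_ub[n, s * num_nodes + n] = shard_sizes[s]
    A_ub[n, -1] = -1.0
b_ub = np.zeros(num_nodes)

result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                 bounds=[(0, 1)] * num_x + [(0, None)])
print("max node load:", result.x[-1])
print("placement fractions:\n",
      result.x[:-1].reshape(num_shards, num_nodes).round(2))
```

Their real objective also penalizes moving shards away from their current nodes, and a fractional LP solution still has to be turned into a concrete shard-to-node assignment; this sketch only shows the modeling flavor.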

  • Here are some Takeaways from ServerlessNYC 2018. Some videos are available.

  • Apple isn't alone in not reporting unit sales anymore. According to David Rosenthal: Seagate has stopped reporting their declining unit shipment numbers. They now only report Exabyte shipment numbers. For Apple, according to Horace Dediu, the reporting change signals a change from a device business to a services and customer retention business. As a customer you enter their ecosystem and Apple lands and expands into your life. 

  • Some videos from SmashingConf New York 2018 are now available. There's also a set of gorgeous Conference Sketchnotes. 

  • Cut your bill by up to 80%. Embracing failures and cutting infrastructure costs: Spot instances in Kubernetes: Kubernetes was designed to abstract the size of nodes and to seamlessly move components between nodes. This makes it the perfect candidate to work with spot instances. A cluster built on top of spot instances will scarcely be less reliable than a cluster built on reserved virtual machines. When shopping for nodes for your Kubernetes cluster, reliability should not be your primary concern. You should focus on cheap memory and CPU! This echoes one of the fundamental principles at Google: You don’t need reliable hardware with good enough software!

  • Sharding Cash. Square did an odd thing: when it came time to shard their MySQL database, they didn't build their own system. They turned to Vitess: "a distributed database running on top of multiple MySQL instances. If you have a single MySQL database, you can slide Vitess in between your app and MySQL and then split up your database while maintaining an illusion that it’s a single database. You speak SQL to Vitess, and Vitess routes that SQL to the correct MySQL shard." Since consistency is key for a money app, they didn't want cross-shard transactions, so they turned to entity groups, an idea from Google App Engine. All members of an entity group stay together across shard splits. They were able to complete a shard split with less than a second of downtime.
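A toy illustration of the entity-group idea: every row carries the key of its group's root entity, the router hashes only that key, so all of a group's rows map to the same shard and move together when the keyspace is split. The names and the hashing scheme below are hypothetical, not how Vitess or Cash App actually configure sharding.

```python
# Hypothetical entity-group routing: hash only the group's root key (here a
# customer token) so every row in the group lands on the same shard.
import bisect
import hashlib

def group_hash(customer_token: str) -> int:
    """Stable 64-bit hash of the entity group's root key."""
    return int.from_bytes(hashlib.sha256(customer_token.encode()).digest()[:8], "big")

class Router:
    def __init__(self, split_points):
        # split_points partition the hash space; shard i owns
        # [split_points[i-1], split_points[i]).
        self.split_points = sorted(split_points)

    def shard_for(self, customer_token: str) -> int:
        return bisect.bisect_right(self.split_points, group_hash(customer_token))

router = Router(split_points=[2**63])          # two shards
# Payments, cards, and balances for one customer all share that customer's
# token, so they always resolve to the same shard:
rows = [("payment:42", "cust_abc"), ("card:7", "cust_abc"), ("balance", "cust_abc")]
print({row: router.shard_for(token) for row, token in rows})
```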

  • Here's a MICRO 2018 Summary

  • DDoS Mitigation Strategies. A new thing to worry about: DDoS attacks from within your corporate network. Bad guys get inside your network and take over devices, which they then use to generate internal traffic. How do you stop it? Everyone has invested mitigation resources at the ISP level, not within their own networks. You can't rely on bandwidth or throughput as a defensive measure because the attacker can always marshal more resources to fill up the pipe.

  • Loved this. Shredding Banksy's the Girl and Balloon - The Director’s Cut. In rehearsals the picture shredded completely every time, but it only went halfway through during the demo. Typical.

  • Quality is in the details. Backblaze’s Custom Data Center PDU

  • Interesting conversation about a surprisingly subtle topic. Why has Google moved from maps.google.com to google.com/maps? My assumption was because it's easier to change a web server than your DNS, but others are far more clever. RandyHoward: Is it possible they are trying to centralize the domain so that all their services can share data with each other? If their services share data across different domains it could be seen in the eyes of the law differently than if the services were sharing data on a single domain. 

  • Good overview of what was Overheard at Scylla Summit 2018. No videos, but lots of decks.

  • One experience that binds us as an industry: interview horror stories. My Amazon Interview Horror Story

  • pytorch/FBGEMM (article): FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. The library provides efficient low-precision general matrix multiplication for small batch sizes and support for accuracy-loss-minimizing techniques such as row-wise quantization and outlier-aware quantization. FBGEMM also exploits fusion opportunities in order to overcome the unique challenges of matrix multiplication at lower precision with bandwidth-bound operations. 
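As a rough illustration of what row-wise quantization means, here's a small numpy sketch: each row of the weight matrix gets its own int8 scale, which loses less accuracy than one scale for the whole matrix. This shows the general technique only; it is not FBGEMM's kernels or API.

```python
# Row-wise int8 quantization sketch (general technique, not FBGEMM's API).
import numpy as np

def quantize_rowwise(w: np.ndarray):
    """Symmetric int8 quantization with one scale per row."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                    # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def quantized_matmul(x: np.ndarray, q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """x @ w.T approximated with int8 weights; accumulate in int32."""
    acc = x @ q.astype(np.int32).T               # int8 weights, wider accumulator
    return acc * scales.T                        # undo the per-row scales

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))                      # 4 output rows, 8 input features
x = rng.normal(size=(2, 8))                      # batch of 2 activations
q, scales = quantize_rowwise(w)
print(np.max(np.abs(x @ w.T - quantized_matmul(x, q, scales))))  # small error
```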

  • SDPaxos: Building efficient semi-decentralized geo-replicated state machines: The motivation stems from the following observation. The single-leader Paxos approach has a centralized leader and runs into performance bottlenecks. On the other hand, the leaderless (or opportunistic multi-leader) approach is fully decentralized but suffers from the conflicting-command problem. Taking a hybrid approach to capture the best of both worlds, SDPaxos makes the command-leaders decentralized (the closest replica can lead the command), while the ordering-leader (i.e., the sequencer) remains centralized/unique in the system.
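A toy, single-process illustration of just the division of roles described above: any replica can act as command-leader for nearby clients while one centralized sequencer hands out the global order. Replication, quorums, and failure handling, which are the actual substance of the protocol, are omitted entirely.

```python
# Toy sketch of the role split only: decentralized command-leaders, one
# centralized sequencer. Not the SDPaxos protocol itself.
import itertools

class Sequencer:
    """The one centralized role: hands out global sequence numbers."""
    def __init__(self):
        self._counter = itertools.count()

    def next_slot(self) -> int:
        return next(self._counter)

class Replica:
    """Command-leader role: the replica closest to the client accepts the
    command, then asks the sequencer where it goes in the global order."""
    def __init__(self, name: str, sequencer: Sequencer, log: dict):
        self.name, self.sequencer, self.log = name, sequencer, log

    def lead_command(self, command: str) -> int:
        slot = self.sequencer.next_slot()
        self.log[slot] = (self.name, command)   # shared dict stands in for replication
        return slot

log = {}
sequencer = Sequencer()
east, west = Replica("east", sequencer, log), Replica("west", sequencer, log)
east.lead_command("PUT x=1")
west.lead_command("PUT y=2")
print([log[slot] for slot in sorted(log)])      # commands in global order
```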

  • Prio: Private, Robust, and Scalable Computation of Aggregate Statistics: This paper presents Prio, a privacy-preserving system for the collection of aggregate statistics. Each Prio client holds a private data value (e.g., its current location), and a small set of servers compute statistical functions over the values of all clients (e.g., the most popular location). As long as at least one server is honest, the Prio servers learn nearly nothing about the clients’ private data, except what they can infer from the aggregate statistics that the system computes. To protect functionality in the face of faulty or malicious clients, Prio uses secret-shared non-interactive proofs (SNIPs), a new cryptographic technique that yields a hundred-fold performance improvement over conventional zero-knowledge approaches. 
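For intuition about how servers can compute an aggregate while learning nothing individually, here's a sketch of the additive secret sharing such systems build on: each client splits its value into random-looking shares, one per server, and only the combination of all servers' totals reveals the sum. Prio's actual contribution, the SNIPs that let servers reject malformed submissions without learning anything, is not shown here.

```python
# Additive secret sharing sketch: the backbone idea, not Prio's SNIPs.
import secrets

MODULUS = 2**61 - 1          # arithmetic is done modulo a public prime

def share(value: int, num_servers: int) -> list:
    """Split a private value into additive shares, one per server."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_servers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def aggregate(client_values: list, num_servers: int = 3) -> int:
    per_server_totals = [0] * num_servers
    for value in client_values:
        for server, s in enumerate(share(value, num_servers)):
            per_server_totals[server] = (per_server_totals[server] + s) % MODULUS
    # Each total on its own is uniformly random; only the sum of all of them
    # (published at the end) equals the true aggregate.
    return sum(per_server_totals) % MODULUS

print(aggregate([1, 0, 1, 1, 0]))   # e.g. counting how many clients reported "1"
```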

  • Fusing Modeling and Programming into Language-Oriented Programming: Our Experiences with MPS: The paper discusses and illustrates language-oriented programming, the approach to {modeling | programming} we have successfully used over the last 7 years to build a range of innovative systems in domains such as insurance, healthcare, tax, engineering and consumer electronics. It relies on domain-specific languages, modular language extension, mixed notations, and in particular, the JetBrains MPS language workbench.

  • The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition: This book describes warehouse-scale computers (WSCs), the computing platforms that power cloud computing and all the great web services we use every day. It discusses how these new systems treat the datacenter itself as one massive computer designed at warehouse scale, with hardware and software working in concert to deliver good levels of internet service performance. The book details the architecture of WSCs and covers the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. Each chapter contains multiple real-world examples, including detailed case studies and previously unpublished details of the infrastructure used to power Google's online services. Targeted at the architects and programmers of today's WSCs, this book provides a great foundation for those looking to innovate in this fascinating and important area, but the material will also be broadly interesting to those who just want to understand the infrastructure powering the internet.