Stuff The Internet Says On Scalability For June 29th, 2018

Hey, it's HighScalability time:

Rockets. They're big. You won't believe how really really big they are. (Corridor Crew)

Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you know anyone looking for a simple book that uses lots of pictures and lots of examples to explain the cloud, then please recommend my new book: Explain the Cloud Like I'm 10. They'll love you even more.

  • 200TB: GitLab Git data; $100 Billion: Instagram; ~250k: Walmart peak events per second; 10x: data from upgraded Large Hadron Collider; 0.3mm: smallest computer; 9.9 million: spam or automated accounts identified by Twitter per week; 1 million: facial image training set; 1/3: industrial robots installed in China; 24%: never backup; 7 billion: BuzzFeed monthly page views

  • Quotable Quotes:
    • @jason: would love to do a https://www.founder.university/  for Immigrants -- but we might need to do it in Canada or Mexico, so that, umm.... potential immigrants can actually attend! #america 
    • @kellabyte: LOL at racking up an AWS bill of $140,000 in 4 hours of compute time.
    • @kellabyte: Recently I got to work on a project that really stressed Amazon AWS scalability. You want to talk scale? We spun up a cluster of 100,000 AWS instances multiple times. 2+ million CPU cores. I got to work on something so big a cloud provider learned about their scale bottlenecks
    • jedberg: Despite being a strong advocate for AWS, this is where I will say Google completely outshines Amazon. Google's approach to pricing is, "do it as efficiently and quickly as possible, and we'll make sure that's the cheapest option". AWS's approach is more, "help us do capacity planning and we'll let you get a price break for it.". Google applies bulk discounts after the fact, AWS makes you ask for them ahead of time.
    • Mark Lapedus: Costs of developing a complex chip could run as high as $1.5B, while power/performance benefits are likely to decrease.
    • Quirky: Thus there is a tradeoff: separateness enables inventors to create heterodox ideas, but strong cohesive networks are likely to be better for getting them implemented.
    • karmakaze: The key difference is that NoSQL was being developed and used to solve specific issues with using traditional databases. They were implementations of solutions to problems. Blockchain is a solution in search of problems.
    • slivym: FPGAs in industry are used for a very small number of specific applications: Smart NICs, Early stages of wireless networks (5G whilst the standards are being hammered out), military (where you need high performance with no consideration of cost), and embedded, Prof Video (where the custom I/O is essential). Generally, unless you're doing something that fits those applications well, the FPGA will not look good, and there are the same mistakes made in research time after time. For data centre these are twice as bad.
    • @troyhunt: We [Cloudflare] peaked at 44M requests to @reporturi in an hour yesterday. The busiest single minute I saw was 949k requests or an *average* of 16k requests per second for that minute. 
    • NSA: Right now, almost all NSA’s mission is being done in [IC GovCloud], and the productivity gains and the speed at which our analysts are able to put together insights and work higher-level problems has been really amazing.
    • Robert Graham: This is the most pressing lesson organizations need to learn, the one they are ignoring. They need to do more to prevent desktops from infecting each other, such as through port-isolation/microsegmentation. They need to control the spread of administrative credentials within the organization. A lot of organizations put the same local admin account on every workstation which makes the spread of NotPetya style worms trivial. They need to reevaluate trust relationships between domains, so that the admin of one can't infect the others.
    • smidgie82: This article totally ignores the devops side of Docker. Sure, you can cobble together Docker-like process isolation using namespaces and cgroups, and you can run the process using a custom set of libraries using chroot -- though I definitely don't agree that for the average developer that approach is anywhere near as easy as Docker.
    • Valeriy Kravchuk: Partitioning bugs do not get proper attention from Oracle engineers. We see bugs with wrong status and even a bug with a clear test case and a duplicate that is "Open" for 4 years. Some typical use cases are affected badly, and still no fixes (even though since 5.7 we have native partitioning in InnoDB and changing implementation gave good chance to review and either fix or re-check these bugs).
    • Werner Vogels: Just as they are no longer writing monolithic applications, developers also are no longer using a single database for all use cases in an application—they are using many databases. Though the relational database remains alive and well, and is still well suited for many use cases, purpose-built databases for key-value, document, graph, in-memory, and search use cases can help you optimize for functionality, performance, and scale and—more importantly—your customers' experience.
    • Steven Sinofsky: Intel was always focused on getting the ecosystem to rally around what would make the most money, but just as important was managing what chips could be used where. Ultrabooks, Tablets, Netbooks — these categories are specifically designed around price points of chips.
    • @JoeEmison: I’ve written three profitable (all <$1M ARR) fully serverless applications; now on my fourth.
    • Burt Helm: So how has HelloFresh managed to defy the fate of its competitors? Richter explains the startup's strategy as he would one of its recipes: His team finds the target customer, busy families (by examining data); fine-tunes the efficiency of the marketing efforts (by collecting lots of data); improves the quality of the recipes (using insights drawn from, yep, more data); expands the range of the offerings to cater to even more customer segments (thanks to even more insights and even more data). But in reality, HelloFresh's history is far more complicated than his just-add-data-and-stir formula. It involves hundreds of millions of venture capital dollars, aggressive marketing tactics, health department complaints, threats of violence, hard drugs, dumb money, and, at its center, founders who have been so relentlessly focused on growth, they've barely stopped to consider what happens when they actually win.
    • Broad Band: As the cultural theorist Sadie Plant writes so elegantly, “When computers were vast systems of transistors and valves which needed to be coaxed into action, it was women who turned them on. When computers became the miniaturized circuits of silicon chips, it was women who assembled them . . . when computers were virtually real machines, women wrote the software on which they ran. And when computer was a term applied to flesh and blood workers, the bodies which composed them were female.”
    • Weston Twigg: We project NAND industry contract pricing to decline roughly 8% sequentially in the C2Q, yet Micron noted a mid- to upper single-digit percentage increase in its F3Q NAND ASP. Supporting the uptick was the shift to higher-valued products, such as SSDs and mobile managed NAND. DRAM remains the main driver for MU. DRAM accounted for 71% of revenue in the F3Q, with DRAM non-GAAP GM hitting 69%, the highest we can ever remember. We project DRAM industry bit supply growth of just 21% in both 2018 and 2019; if annual demand growth remains in the 20-25% range, pricing should remain relatively robust
    • UofM: One of the big challenges in making a computer about 1/10th the size of IBM’s was how to run at very low power when the system packaging had to be transparent. The light from the base station—and from the device’s own transmission LED—can induce currents in its tiny circuits. In addition to the RAM and photovoltaics, the new computing devices have processors and wireless transmitters and receivers. Because they are too small to have conventional radio antennae, they receive and transmit data with visible light. A base station provides light for power and programming, and it receives the data.
    • Geoff Huston: A lot of the Internet today looks much the same as the Internet of a decade ago. Much of the Internet’s infrastructure has stubbornly resisted various efforts to engender change. We are still in the middle of the process to transition the Internet to IPv6, which was the case a decade ago. We are still trying to improve the resilience of the Internet to various attack vectors, which was the case a decade ago. We are still grappling with various efforts to provide defined quality of service in the network, which was the case a decade ago. It seems that the rapid pace of technical change in the 1990’s and early 2000’s has simply run out of momentum and it seems that the dominant activity on the Internet over the past decade was consolidation rather than continued technical evolution.

  • Software 2.0 is not code based, it's dataset based. The problem is that the toolchain for software 2.0 doesn't exist yet. Building the Software 2.0 Stack by Andrej Karpathy (Tesla).
    • "In the new paradigm, much of the attention of a developer shifts from designing an explicit algorithm to curating large, varied, and clean datasets, which indirectly influence the code."
    • Programmers don't write code anymore, labelers write the code. Code is implied by the optimization which is based on the dataset. 
    • Your goal in life as a software 2.0 programmer is to accumulate a very large, clean dataset with lots of edge cases for all the different tasks you want to do. Massaging data is really where most of the action is. 
    • What does programming the 2.0 stack look like? You need to find lots of training data and label it, especially negative examples, like going through a tunnel so your windshield wipers know not to turn on in a tunnel. So, programming is still scutwork. 
    • What might a 2.0 IDE do? Show a full inventory/stats of the current dataset; create/edit annotation layers for any datapoint; Flag, escalate & resolve discrepancies in multiple labels; Flag & escalate datapoints likely to be mislabeled; Display predictions on an arbitrary set of test datapoints; Autosuggest dataset points that should be labeled.
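
    A minimal sketch (my own, not tooling from Karpathy's talk) of one of those IDE features, flagging datapoints likely to be mislabeled, assuming you already have a trained model that emits class probabilities:

```python
# Hypothetical sketch of one "Software 2.0 IDE" feature: flag datapoints whose
# labels disagree most strongly with the current model's predictions.
import numpy as np

def flag_likely_mislabeled(probs, labels, top_k=100):
    """probs: (n_samples, n_classes) predicted probabilities.
    labels: (n_samples,) integer labels assigned by human labelers.
    Returns indices of the top_k datapoints most likely mislabeled,
    for escalation to human review."""
    # Probability the model assigns to the human-provided label.
    label_prob = probs[np.arange(len(labels)), labels]
    # Low probability for the given label means strong disagreement.
    suspicion = 1.0 - label_prob
    return np.argsort(suspicion)[::-1][:top_k]

# Example: 3 datapoints, 2 classes; the second label looks wrong.
probs = np.array([[0.9, 0.1], [0.95, 0.05], [0.2, 0.8]])
labels = np.array([0, 1, 1])
print(flag_likely_mislabeled(probs, labels, top_k=1))  # -> [1]
```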

  • How much faster is bare metal compared to virtualization? ScyllaDB: AWS has recently made available a new instance type, i3.metal, that provides direct access to the bare metal without paying the price of the virtualization layer. Since Scylla can scale up linearly as the boxes grow larger, it is perfectly capable of using the extra resources made available to users of i3.metal. We showed in this article that despite having only 12% more CPUs, i3.metal can sustain a 31% higher write throughput and up to 8x lower read latencies than i3.16xlarge, confirming our expectation that removing the hypervisor from the picture can also improve the efficiency of the resources. With its increased efficiency, i3.metal offers the best hardware on AWS for I/O intensive applications.

  • Upgrading to Node v8 has significantly reduced our operating costs: We’ve bet on well supported open source projects like Google’s V8. Following an upgrade from Node.js v6 to v8, this bet has paid off. Our latencies are more consistent and our global infrastructure server costs have gone down by almost 40%.

  • Cloud adoption is mature enough that we're starting to see not just moves to the cloud, but moves from one cloud to another as experience reveals requirements and alliances form and reform. No, it's not about Microsoft acquiring GitHub. It's all about them thinking Kubernetes is the future. GitLab + Google Cloud Platform = simplified, scalable deployment: With increasing adoption of cloud native practices, the use of microservices and containers has become critical to modern software development. Kubernetes has emerged as the first choice for container orchestration, allowing apps to scale elastically from a couple of users to millions. Today, we're happy to announce we've been collaborating with Google to make Kubernetes easy to set up on GitLab. andrewl-hn: The plan to move to Google Cloud was in motion for many months, way before talks about a potential GitHub acquisition started. They adopted Kubernetes relatively early, and as they progressed their reliance on Azure-specific services went down. At that point the move to another cloud was a purely financial decision. The move to k8s was not strictly a means to allow cloud migration. GitLab sells their Enterprise product, and the Kubernetes-based deployment helps customers with product trial and adoption. Cloud migration is a welcome side-effect of that initiative.

  • How much faster is a hash join compared to a nested loop join? A lot. How We Made Joins 23 Thousand Times Faster, Part One. And execution time increases linearly instead of quadratically.
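
    A toy sketch (not the article's implementation) of why the hash join wins: build a hash table on one side and probe it with the other, turning an O(n*m) pairwise scan into roughly O(n+m):

```python
# Toy illustration of hash join vs. nested loop join on lists of dict "rows".
def nested_loop_join(left, right, key):
    # O(len(left) * len(right)): compare every pair of rows.
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def hash_join(left, right, key):
    # O(len(left) + len(right)): build a hash table on one side, probe with the other.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]

left = [{"id": i, "name": f"user{i}"} for i in range(1000)]
right = [{"id": i, "order": i * 10} for i in range(1000)]
# Both produce the same result; only the growth rate of the work differs.
assert sorted(map(str, nested_loop_join(left, right, "id"))) == \
       sorted(map(str, hash_join(left, right, "id")))
```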

  • Facebook with their 2018 Networking @Scale recap. Topics include: Networking Across 3 Decades, Scaling the Facebook backbone through Zero Touch Provisioning, Optics Scaling Challenges, Load Balancing at Hyperscale, Edge Fabric, Network Challenges in Gaming. 

  • Phil Zimmermann (PGP) on Triangulation. The sandboxing tech on your phone may be your most secure execution environment. The threat model has changed. Now it's easier to steal keys. It's hard to protect against exfiltration. You have to solve a problem. Solve that problem and that product can take over. Skype solved NAT traversal. Signal solved the problem of what happens when Alice and Bob aren't online at the same time. He wants to solve the ease-of-use problem.

  • AWS has a lot of quality documentation available on Kindle. And it's free.

  • An almost unavoidable habit of being a programmer is seeing programming as a metaphor for pretty much everything. This time it's cars and car production. Asymco Episode 44: The view from Tokyo with Bertel Schmitt
    • Tesla isn't disruptive because it's not entering at a disruptively low price point. All tech revolutions start with a product at a vastly lower price. In contrast, India's $4000 car is nearing disruptive status. I kind of expected Horace to bring up Apple, which is disruptive from the top, by giving consumers a user experience they can't get anywhere else. Apple seems to be doing OK. 
    • In China a car plant costs about $2 billion. A BMW car plant in South Carolina cost $9 billion and took 8 years to build. Individual datacenters are cheaper, but you need a complex of multiple datacenters in a region, so the costs aren't all that dissimilar. 
    • Cars are produced on 1km long production lines. They snake around and through buildings. Software is often organized into pipelines. Data flows through the pipeline, is transformed at each station, and at the end you have your shiny new widget. 
    • A line has a production capacity. It can produce only so many cars per hour. A standard rate might be something like 45 cars per hour. Some produce a car per minute. Software pipelines also have a limited capacity.
    • Toyota has a small, lightweight plant in Miyagi, Japan capable of producing 250,000 cars per year. Smaller plants, in contrast to Tesla's giant plants, are considered the future of car production. The interesting thing about the line in this plant is that it can be scaled up and down—it's elastic. Need more capacity? Add more platforms. It takes a weekend. Need less capacity? Remove platforms. This way you don't have idle capacity. Sounds a bit like cloud computing.
    • Fewer robots is a trend in the industry. The 80s were the height of robotization. It's not working anymore. Robots can be used for certain things, like welding. This is one place the metaphor breaks down: software is always driving relentlessly towards more and more automation.
    • If you're smart, as many as 8 car brands can be built on the same line. Same with software: a cluster can run all kinds of jobs.
    • A car plant takes 4, 5, sometimes 6 years to build. A modern datacenter takes six to nine months to build. This, BTW, is why Bertel thinks Tesla won't be able to meet their production goals. They can't make existing lines produce radically more cars and Tesla hasn't been building new car plants.

  • Capacity planning in the cloud? Yes, if you want to use reserved instances. Stripe shows how they do it. Effectively using AWS Reserved Instances
    • At Stripe, we typically use no-upfront convertible reserved instances with a three-year term. We think this offers the right trade-off between price efficiency and flexibility. To adopt reserved instances, you first need to estimate your cluster's total compute requirements. This is the hardest part of capacity planning. Take a snapshot of your fleet using the AWS cost and usage report. Add up the total compute power for each instance family. Pick a standard instance size. Divide the total compute capacity by its scaling factor. The result is the number of reserved instances you'll purchase (see the sketch after this list).
    • Included is the SQL query they use to analyze and suggest purchases over time as the fleet dynamically scales up and down in compute requirements. Reserved instances are purchased once a month. 
    • Good discussion on Hacker News exploring the different pricing policies between GCP and AWS. Many like GCP better because Google will give you discounts based on what you actually use. Amazon makes you decide up front, but AWS discounts are bigger because the user is taking all the risk of correctly planning their capacity.
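
    A back-of-the-envelope sketch of the sizing arithmetic Stripe describes. The normalization factors follow AWS's published instance size factors; the fleet snapshot is invented for illustration, not Stripe's data:

```python
# Sketch of the reserved-instance sizing arithmetic described above.
# Size factors follow AWS's published normalization factors; the fleet
# snapshot is a made-up example.
NORMALIZATION = {"large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}

# Hypothetical snapshot of running instances in one family (e.g. c5).
fleet = {"c5.large": 40, "c5.xlarge": 25, "c5.4xlarge": 6}

def reserved_instance_count(fleet, standard_size="xlarge"):
    # Add up the total compute power for the family in normalized units.
    total_units = sum(
        count * NORMALIZATION[itype.split(".")[1]]
        for itype, count in fleet.items()
    )
    # Divide by the chosen standard size's factor to get the purchase count.
    return total_units / NORMALIZATION[standard_size]

print(reserved_instance_count(fleet))  # (40*4 + 25*8 + 6*32) / 8 = 69.0
```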

  • At which point does connection pooling improve performance? Scaling PostgreSQL with PgBouncer: When running sysbench-tpcc with only 56 concurrent clients the use of direct connections to PostgreSQL provided a throughput 2.5 times higher than that obtained when using PgBouncer. The use of a connection pooler in this case was extremely detrimental to performance...When running the benchmark with 150 concurrent clients, however, we start seeing the benefits of employing a connection pooler... throughput becomes comparable to when not using a connection pooler once the number of concurrent threads is greater than the number of available CPUs. Also, Gracefully Scaling to 10k PostgreSQL Connections for $35/mo, Part One.

  • Traditionally, writing a technical book is a great way to make pennies per hour. After an ungodly amount of writing and revising, you're lucky to sell 2,000 copies. The Economics of Writing a Technical Book captures the glory of the process quite well. On the other hand, jashmenn talks about making $400k in revenue for an Angular 2 book. Pick a hot topic, market well, and have your own email list—you can succeed. But that's not common. Lots of tech authors can be found commiserating in the comments. I bring this up because John Resig has a new book out called The GraphQL Guide (which I've bought) that has a very interesting business model. John is not an author virgin. Ten long years ago he wrote Programming Book Profits. He's had some time to figure out how to make this work. What did he come up with? The GraphQL Guide has 5 pricing levels. Like Patreon and Kickstarter, you get more at higher pricing tiers. For $39 you get a reduced version of the completed book. For $89 you get the full book with extra chapters, free updates for 2 years, interactive exercises in-browser, and a few videos. For $289 add git repositories with source code, tech support, access to a Slack community, and a t-shirt. For $749 add training with advanced topics like SSR. For $1000 you get a team license for 5 seats. This is smart. Segment your readership so you can extract as much money as possible. You aren't selling a book. You're selling the development of expertise in a subject. That can take many forms and involve many services. The downside is I'm kind of butt hurt over not getting source code. Unbundling code verges on bad faith IMHO. I'm also puzzling over the question of what a book even is anymore. How can you be getting a book for $39 when there are more chapters on the next tier? Requiring GitHub authentication is an extra useless step that I don't care for. But I think this is an interesting way of trying to monetize the immense amount of work that goes into creating quality technical content. And I like how the money goes to the content producers, not middlemen who add little value. I'd imagine the next step is some sort of subscription bundle. For $300/yr you get everything we produce and we promise to produce this, that, and the other thing. The book is just one part of a bundle. For good or ill, none of this kind of stuff was possible before the internet.

  • 8th-Gen Intel Core i7 CPUs Are Quite a Bit Faster. Go for the article, stay for the always entertaining AMD vs. Intel throwdown in the comments. 

  • JavaScript PCI nightmare: Ticketmaster, Inbenta and the canary in the coal mine: "According to Inbenta, back in February 2018 somebody altered the Ticketmaster JavaScript maliciously." Tesla also blamed its problems on someone hacking their software. You have to wonder about the software release processes. Isn't code reviewed before going into production? Isn't code signed so it can be verified that the right code is always running? Aren't servers scanned for problems? Aren't tests run to check for malicious behaviour?
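
    One mitigation those questions point at is Subresource Integrity, where the page pins a hash of the third-party script it expects so a silently altered copy no longer loads. A minimal sketch of computing an SRI value (the idea, not Ticketmaster's actual process):

```python
# Compute a Subresource Integrity (SRI) value for a script file. The value
# goes into the script tag's integrity attribute, so a tampered copy of the
# script fails the browser's hash check and is refused.
import base64
import hashlib

def sri_hash(path, algo="sha384"):
    digest = hashlib.new(algo, open(path, "rb").read()).digest()
    return f"{algo}-{base64.b64encode(digest).decode()}"

# Demo on this very file; in practice you'd run it on the vendored copy of
# the third-party widget you intend to serve.
print(sri_hash(__file__))
```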

  • Another Perl monolith bites the dust. How BuzzFeed Migrated from a Perl Monolith to 500 Go and Python Microservices. Why move off Perl? Scaling was becoming hard. And apparently Perl programmers are dying off and it's hard to find them anymore. They also, of course, want to iterate faster. At a high level the new architecture: a CDN (Fastly) that points to a routing service (NGINX) which sits inside AWS using the ECS containerised service. New microservices are developed using Python as the main language with Go for the more performance sensitive components.

  • Need an algorithm? Take a look in The Arcane Algorithm Archive.

  • Facebook replaced the storage system for more than a billion Messenger users, which improved system resiliency, reduced latency, and decreased storage consumption by 90 percent. Migrating Messenger storage to optimize performance. What did they do? Redesigned and simplified the data schema, created a new source-of-truth index from existing data, and made consistent invariants to ensure that all data is formatted correctly. Moved from HBase to MyRocks. Moved from storing the database on spinning disks to flash on Lightning Server SKU (A flexible NVMe JBOF). Applied Zstandard, a state-of-the-art lossless data compression algorithm. No longer bound on I/O in HBase, read latency is now 50 times lower than in the previous system. 
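
    For the compression piece, a minimal sketch using the third-party Python zstandard bindings; illustrative only, nothing here is Facebook's code:

```python
# Zstandard round trip with the "zstandard" package (pip install zstandard).
import zstandard as zstd

message = b'{"thread": 42, "text": "hello"}' * 100
compressed = zstd.ZstdCompressor(level=3).compress(message)
restored = zstd.ZstdDecompressor().decompress(compressed)
assert restored == message
print(f"{len(message)} bytes -> {len(compressed)} bytes")
```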

  • Filed in one of the non-obvious ideas of all time category. Adaptive Backoff Algorithms for Multiple Access: A History: The use of random access to share a broadcast channel was first proposed by Norman Abramson for the ALOHA System (AFIPS Conf. Proceedings, 1970). He carried out his analysis under the assumption that both new and retransmitted packets arrive according to a Poisson process.  This assumption abstracted away the need for a backoff algorithm.
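
    The backoff idea that assumption abstracted away, in its most familiar modern form: binary exponential backoff with jitter. A generic sketch, not the algorithm from any particular paper:

```python
# Generic binary exponential backoff with jitter for a shared channel or
# flaky remote call: double the retry window after each failure, pick a
# random wait within it, and cap the wait.
import random
import time

def send_with_backoff(send, max_retries=8, base=0.05, cap=5.0):
    for attempt in range(max_retries):
        if send():  # True = success / no collision
            return True
        time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return False

# Example: a "channel" that succeeds 30% of the time.
print(send_with_backoff(lambda: random.random() < 0.3))
```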

  • Quick-to-scale architecture for task-based processing: The approach we ended up taking makes use of the streaming platform, AWS Kinesis Streams, as a queue for the top level tasks to be computed. The tasks themselves scale up with one level of Lambda Cascade. The use of the streams for queuing tasks has several important advantages: All of the needed tasks can be generated right at the beginning without needing to wait for available computational capacity (e.g., due to Lambda scaling limits). Each Kinesis shard has one top-level Lambda function active at a time; as soon as that function is complete, the next task from the queue in this shard can be picked up. Tasks can be load-balanced (albeit statically) by manually sharding them. For collecting the output of tasks we also use Kinesis Data Streams, allowing reliable, highly scalable collection of data. The data can easily then be permanently stored in a database, without a need for managing any EC2 servers. Furthermore, Kinesis Data Analytics can be used to collate and present the results in real-time; this is often a key requirement in quick-to-scale architectures as the end-user is waiting on those results in order to make time-critical business decisions.
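
    A hedged sketch of the "Kinesis stream as a task queue" idea using boto3. The stream name, shard keys, and task payloads are made up for illustration:

```python
# Enqueue top-level tasks onto a Kinesis stream; a Lambda consumer per shard
# would then pick them up one at a time. Stream name and payloads are
# hypothetical.
import json
import boto3

kinesis = boto3.client("kinesis")

def enqueue_tasks(tasks, stream_name="top-level-tasks"):
    # Static load balancing: the partition key chosen per task decides its shard.
    records = [
        {"Data": json.dumps(task).encode(), "PartitionKey": str(task["shard"])}
        for task in tasks
    ]
    return kinesis.put_records(StreamName=stream_name, Records=records)

enqueue_tasks([{"shard": i % 4, "job": f"chunk-{i}"} for i in range(100)])
```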

  • How would you design a new database system optimized for the hardware we have today? Daniel Lemire summarizes a conference he went to on just that subject. Data processing on modern hardware: You can try to offload some of the computation to the graphics processor (GPU)...The problem, in general, with heterogeneous data processing systems is that you must, somehow, somewhen, move the data from one place to the other...There is expensive fast storage and cheaper and slower storage. How do you decide where to invest your money?...There is much talk about FPGAs: programmable hardware that can be more power efficient than generic processors at some tasks...There is talk of using “blockchains” for distributed databases...Cloud databases are a big deal...Google has fancy tensor processors...People want more specialized silicon, deeper pipelines, more memory requests in-flight. It is unclear whether vendors like Intel are willing to provide any of it. There was some talk about going toward Risc-V.

  • According to DigitalOcean containers are reaching a tipping point. 49% of developers use containers now and 78% plan to use them. Serverless? Don't worry about that. Serverless computing is in a much earlier stage of adoption, with nearly half of developers reporting they don’t have a clear understanding of what it even is. Only a third of developers who are familiar with serverless have actually deployed an application in a serverless environment. So, don't worry about serverless. These are not the droids you are looking for.

  • Walmart analyzes a click stream averaging ~70k events per second using a data pipeline built on the Lambda Architecture with Spark/Spark Streaming: Building the data pipeline for A/B testing with the lambda architecture using Spark helped us get a quick view of the data/metrics generated by the streaming job...Using Spark/Spark Streaming helped us write the business logic functions once...The performance of the applications was improved by tuning Spark's serialization, memory parameters, increasing the number of cores and parallelism iteratively.
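
    A minimal PySpark Streaming sketch of the "write the business logic once" idea, counting click events per experiment variant in each micro-batch. The socket source and field names are stand-ins, not Walmart's actual feeds:

```python
# Count A/B test click events per (experiment, variant) in 10-second batches.
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="ab-test-clicks")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

events = ssc.socketTextStream("localhost", 9999)  # one JSON event per line
counts = (events.map(json.loads)
                .map(lambda e: ((e["experiment"], e["variant"]), 1))
                .reduceByKey(lambda a, b: a + b))
counts.pprint()  # a real pipeline would write these metrics to a store

ssc.start()
ssc.awaitTermination()
```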

  • biokoda/actordb: ActorDB is a distributed SQL database...with the scalability of a KV store, while keeping the query capabilities of a relational database. ActorDB is ideal as a server side database for apps. Think of running a large mail service, dropbox, evernote, etc.

  • michael-kehoe/awesome-sre-cheatsheets: A curated list of cheatsheets for SRE.

  • Accelerating Machine Learning Inference with Probabilistic Predicates: For an expert in database systems, the key innovation is in extending a cost-based query optimizer to explore many possible necessary conditions of the true query predicate. Moreover, since our probabilistic predicates are tunable to explore different performance points on the precision-recall curve, the query optimizer extension also chooses the parameters of individual PPs such that the PP combination meets the desired objective. For example, the PP combination can be tuned to maximize benefit (ratio of fraction of input dropped to the cost of the PP combination) while meeting an accuracy threshold.
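
    A sketch of the probabilistic predicate idea: run a cheap, tunable filter ahead of the expensive model and drop rows it is confident won't match. The models and threshold below are placeholders, not the paper's system:

```python
# Cheap filter first, expensive predicate only on survivors. Lowering the
# threshold raises recall (fewer wrongly dropped rows) but discards less input.
def query_with_pp(rows, cheap_score, expensive_predicate, threshold=0.1):
    results = []
    for row in rows:
        if cheap_score(row) < threshold:
            continue  # dropped without paying for the expensive inference
        if expensive_predicate(row):
            results.append(row)
    return results

rows = [{"id": i, "blur": i / 10} for i in range(10)]
cheap = lambda r: r["blur"]            # stand-in for a tiny classifier
expensive = lambda r: r["blur"] > 0.5  # stand-in for the full model predicate
print(query_with_pp(rows, cheap, expensive, threshold=0.3))
```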

  • Microsoft Research Open Data: A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain specific sciences. Download or copy directly to a cloud-based Data Science Virtual Machine for a seamless development experience.

  • The Software Heritage archive: Our long term goal is to collect all publicly available software in source code form together with its development history, replicate it massively to ensure its preservation, and share it with everyone who needs it. The Software Heritage archive is growing over time as we crawl new source code from software projects and development forges.

  • ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications: In practice, database transactions frequently execute under weak isolation that exposes programs to a range of concurrency anomalies, and programmers may fail to correctly employ transactions. While low transaction volumes mask many potential concurrency-related errors under normal operation, determined adversaries can exploit them programmatically for fun and profit.
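
    A minimal sketch of the kind of check-then-act anomaly the paper exploits: two concurrent requests both see a sufficient gift card balance and both spend it. Table and column names are made up; the FOR UPDATE remedy noted at the end is one standard fix:

```python
# Vulnerable redemption logic: under weak isolation another transaction can
# interleave between the balance read and the update, allowing a double spend.
import psycopg2

def redeem(conn, card_id, amount):
    with conn, conn.cursor() as cur:
        cur.execute("SELECT balance FROM gift_cards WHERE id = %s", (card_id,))
        (balance,) = cur.fetchone()
        if balance >= amount:
            cur.execute(
                "UPDATE gift_cards SET balance = balance - %s WHERE id = %s",
                (amount, card_id),
            )
            return True
        return False

# One remedy: make the read lock the row, e.g.
#   SELECT balance FROM gift_cards WHERE id = %s FOR UPDATE
# so concurrent redemptions serialize instead of both passing the check.
```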