Stuff The Internet Says On Scalability For July 27th, 2018

Hey, it's HighScalability time:

Startup opportunity? Space Garbage Collection Service. 18,000+ known Near-Earth Objects. (NASA)

Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you know anyone looking for a simple book that uses lots of pictures and lots of examples to explain the cloud, then please recommend my new book: Explain the Cloud Like I'm 10. They'll love you even more.

  • 143 billion: daily words Google Translated; 73%: less face-to-face interaction in open offices; 10 billion: Uber trips; 131M: data breach by Exactis; $123 billion: Facebook value loss is 4 Twitters and 7 Snapchats; $9.1B: spent on digital gaming across all platforms; 20-km: width of lake on Mars; 1 billion: Google Drive users; $32.7 billion: Alphabet revenues; $110bn: Microsoft total revenue; $1.9 million: buy the Brady Bunch house; 5.623 trillion: Amazon Sable requests handled on Prime Day; 91%: Facebook's advertising revenue on mobile; $18 billion: deep learning market by 2024

  • Quotable Quotes:
    • Ryan Cash: The App Store has changed the world so drastically it’s hard to even imagine sometimes. In some ways, the world feels kind of the same as it did 10 years ago. But only for a second. In almost every single way, with almost every single thing we do, the iPhone and the App Store have changed how we live as humans. It’s changed how we communicate, how we share, and how we express ourselves artistically. It’s changed how we travel, do business, and how we eat. It’s made us healthier, wiser, and a little goofier. It truly kickstarted a massive global revolution that, while even 10 years in, feels like just the beginning.
    • yaypie~ I've often wondered what the computing world would look like today if Apple had bought Be. Somewhere out there is a parallel universe where BeOS, rather than OpenStep, became the basis for Apple's new OS. Would it have been able to compete with Windows? Without macOS's BSD underpinnings, would it have been as popular with developers as Mac OS X was? I wonder. 
    • Peter: DARPA foresee a third one in which context-based programs are able to explain and justify their own reasoning.
    • @chrismunns: "hey Ops, we're launching next week, can we make sure we can handle 1000 #serverless function invocations?" Ops: "Sorry, 3-5 month lead time on DC hardware and our switches are near capacity" - coming soon to an on-prem "serverless" project near you.
    • Aaron Frank: In the world of real estate, as Brad Inman puts it, “the company has gone viral.” Incredibly, this growth is largely the result of eXp Realty’s use of an online virtual world similar to Second Life. That means every employee, contractor, and the thousands of agents who work at the company show up to work—team meetings, training seminars, onboarding sessions—all inside a virtual reality campus.
    • Lance Gutteridge: The reason why almost no one encrypts their databases is one of the dirty secrets of IT.
    • @AndreaPessino: It's finally happening - after >30 years of pro use, 20 of which quite reluctantly, I am officially DONE with C/C++. Only maintenance from now on, everything new will be in @rustlang. THANKS Rust team for refining modern concepts into such a practical, elegant system. I love it.
    • Peter J. Denning: These analyses show that the conditions exist at all three levels [chip, system, and adopting community] of the computing ecosystem to sustain exponential growth. They support the optimism of many engineers that many additional years of exponential growth are likely. Moore's Law was sustained for five decades. Exponential growth is likely to be sustained for many more.
    • George: I propose that there is one problem chief among them, an impetus for bad software from which almost all of the others take root: imaginary problems.
    • SteveNuts: For every internet action there's an equal and opposite overreaction.
    • mailsharath: Its difficult for the average phone user to be convinced to spend money on a device that would cost considerably less if purchased outside the country. For example, someone was telling me (I haven't personally validated) that its cheaper to take a flight to Bangkok, Thailand and pickup a Macbook Pro 15' 2018 and get it back than buy it in a local store. The same thing holds true for iPhones as well. A lot of people get phones from friends / family traveling to other countries as its cheaper to buy there.
    • @asynchio: Trading choice for speed is scary to me.
    • @dovyp: Twitter is migrating 300 Petabytes of data to GCP. Holy crap. #GoogleNext18
    • @copyconstruct: to scale infinitely you need to avoid coordination - meaning each individual unit of the system should be able to make decisions based on local knowledge. It follows that you cant do distributed transactions - which is a form of coordination. - @SeanTAllen at @papers_we_love
    • @copyconstruct: At a previous job, we decided against doing K8s. A disgruntled engineer spent the entirety of their tenure being grumpy about this decision. It takes well over a million dollars just in engineer salary to get K8s up and running from scratch. And you still might not get there.
    • @bitfield: The software to create the black hole in the movie 'Interstellar' is a full implementation of Einstein's equations in 40,000 lines of C++, and rendered thousands of 23-megapixel IMAX frames on a 32,000-core render farm at about 20 core-hours per frame 
    • @elazarl: QOTD by @ncweaver: "Shoving garbage into append-only ledger, doesn't solve your problems!" (I looked at some blockchain-based solutions to a problem, and never understood how does the blockchain even helps a bit, read a few use cases)
    • spullara: I once asked Gordon Moore what the software equivalent of Moore's law was, he responded without pause: "the number of bugs doubles every 18 months".
    • ebay: Our results are very promising: an important ML task that took more than 40 days to run on our in-house systems completed in just four days on a fraction of a TPUv2 Pod, a 10X reduction in training time. This is a game changer—the dramatic increase in training speed not only allows us to iterate faster but also allows us to avoid large up-front capital expenditures.
    • evv: Recently I've been fascinated at gap between server providers these days. It seems each provider is either: - Enterprise public cloud, AWS/GCP/Azure, expensive but scalable and enterprise friendly - Developer public cloud, Linode/DO, cheap and easy to use. Although I say that AWS/GCP/etc are expensive, they obviously have negotiable prices for large customers. I doubt the smaller providers do that. But it makes me wonder why people use AWS/GCP when the other providers are so much cheaper. How do Linode/DO offer such good prices? Would they kick me off if I actually maximized the server capacity they offer, like a shared cPanel host would do, back in the day?
    • Tim Wagner (AWS serverless chief): If you have traded any stocks, or had any stocks traded in your behalf, FINRA processes those stock trades at the end of the closing day using Lambda, so there’s a big chance here that the trade you made was evaluated and validated by FINRA using Lambda. Thomson Reuters does four thousand transactions every second with it, Fannie Mae runs its 20 million mortgage calculations through there.
    • Stewart Brand: Information wants to be free... is a paradox that keeps driving people mad.
    • Tim Wagner (AWS serverless chief): There’s [a serverless] adoption pattern we often see, an initial DevOps adoption, maybe people will begin running serverless cron [scheduled task] jobs, and then it will move into more back-office situation, and then eventually the more mission-critical systems.
    • Lyft: In our old worldview, models were simply extensions of business rules. As our models grew larger and features more complicated, model and feature definition inconsistencies between offline prototypes in Jupyter Notebooks and the production service occurred more frequently
    • @dmeconis: I love how TV shows about the supernatural treat Latin like it's a deeply occult ancient language. Instead of a language so pervasive and easily learnable that it was used for the dreariest clerical data. Imagine summoning a demon by reading formulas off an Excel spreadsheet.
    • Yu Liu: Our work suggests that there is no reason to believe that ϕ or any other specific number will characterise self-replicating systems.
    • @AnimeshSingh: OpenWhisk + Istio + Kubernetes = KNative #Serverless. Get going with Serverless platforms on Kubernetes!
    • Mustachioed Copy Cat: Okay kids, here's an easy rule for when you want to build a desktop. I know there are a lot of different choices, and Intel has made things super confusing, possibly because they need to match every exotic manufacturing failure with a particular SKU.
    • Memory Guy: The table below, explained in another Memory Guy blog post, gives estimates of best-case endurance for the cells in the XPoint memory in Optane SSDs.  In other words, with a sophisticated enough controller, good DRAM buffering, and overprovisioning, all of which are techniques commonly used to extend the life of the media in a NAND flash SSD, the cell lifetime could be significantly lower than that shown in the last column of the table and the SSD would still provide the specified endurance. 
    • Alex Stamos (Facebook's chief security officer): We need to change the metrics we measure and the goals we shoot for. We need to adjust PSC to reward not shipping when that is the wiser decision. We need to think adversarially in every process, product and engineering decision we make. We need to build a user experience that conveys honesty and respect, not one optimized to get people to click yes to giving us more access. We need to intentionally not collect data where possible, and to keep it only as long as we are using it to serve people. We need to find and stop adversaries who will be copying the playbook they saw in 2016. We need to listen to people (including internally) when they tell us a feature is creepy or point out a negative impact we are having in the world. We need to deprioritize short-term growth and revenue and to explain to Wall Street why that is ok. We need to be willing to pick sides when there are clear moral or humanitarian issues. And we need to be open, honest and transparent about challenges and what we are doing to fix them.
    • vaxboy: I totally agree with Steve about Gassée, convinced there was an Amiga 1000 in the basement of One Infinite Loop fueling all of that period's innovations (GfxBase->QuickDrawGX, ARexx->AppleScript, Speech->PlainTalk, etc..) sustaining an unbelievable amount of politics much to Bill Gates' delight.
    • @BenedictEvans: Everything bad that the internet did to media companies is going to happen to retail.
    • @kellabyte: IMO NoSQL didn’t disrupt SQL databases much. After a long journey we reminded ourselves what SQL databases are good at and started to move some things back. New persistent memory technologies however will disrupt their designs significantly and some won’t adapt well.
    • @ben11kehoe: #k8s is such a red herring. It's penny wise, pound foolish—doing a very good job of fixing pain points in today's paradigm, missing the bigger picture of shifting to a service-full architecture model
    • Backblaze~ The overall failure rate for all hard drives in service is 1.80%. This is the lowest we have ever achieved, besting the previous low of 1.84% from Q1 2018. When you constrain for drive count and average age, the AFR (annualized failure rate) of the enterprise drive is consistently below that of the consumer drive for these two drive models — albeit not by much.
    • Smalldatum: In summary, the potential problems with index+log are bad cache-amp (index must fit in memory), bad CPU write-amp (too many compares/insert) and bad IO read-amp for range queries. Note that HashKV might avoid the first two problems. But all index structures have potential problems and for some workloads index+log is a great choice.
    • @jgargis: Did you know Target is running on Google Cloud?!? Bullseye!! #GoogleNEXT18
    • @dzimine: I think we see a fork on the road (or split in the cloud?) #GCP takes their #k8s strategy to the end - private, enterprise, paying lip service to #serverless and #PaaS. AWS betting on  #serverless and PaaS, paying lip service to #k8s
    • @jwittich: Big things at #GoogleNext18 for @intel and @googlecloud Strategic Alliance. Building off 15 year partnership to bring Intel Optane DC Persistent Memory to GCP. Expanded VM resource sizing to drive key biz results via @SAP HANA. #iamintel
    • @krishnan: AWS used to be the company not playing nice with partners. GKE On-Prem shows that Google is taking the same path. Google just threw their partnership with Red Hat, Pivotal, etc. under the bus.
    • CockroachDB: We think it’s safe to say that on the TPC-C 1k warehouse benchmark, CockroachDB matches over 99% of Aurora’s throughput, and has only 10 to 20% of the price.
    • tedinski: risk = sum([x.value * x.cost_factor  for x in system])
    • @swardley: It feels like I am watching the MBA / Enterprise IT transformation of Google into IBM / VMWare. It lacks the killer instinct and the focus on the future that I see within Amazon. Why the timid approach to serverless? Why not even mention it in the keynotes? I feel genuinely sad.
    • @GordyPls: I legit just saw an 8 year old at the school get their phone confiscated and they waited until the coast was clear, pulled an iPad mini from a schoolbag pocket, retrieved a sim from a ziplock bag, installed it, then resumed their conversation. 
    • SatvikBeri: According to Adam D'Angelo (first CTO of Facebook, obviously a biased source) Facebook was able to implement features much faster than MySpace, which was a major competitive advantage for them. "I remember talking to an engineer who worked at MySpace who told me about how they had this huge list of regular expressions to try to prevent cross site scripting attacks, and whenever there was a new one they would make a new regex to try to fix it, rather than sanitizing html the right way."
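The MySpace anecdote in the last quote contrasts a growing list of regexes with sanitizing HTML properly. Here is a minimal Python sketch of that contrast, assuming "the right way" means escaping untrusted text on output (the quote doesn't say which approach Facebook actually used). The function names, the single blacklist pattern, and the sample payload are all hypothetical.

```python
import html
import re

# Hypothetical blacklist: one regex per known attack, as in the anecdote.
SCRIPT_TAG = re.compile(r"<script.*?>.*?</script>", re.IGNORECASE | re.DOTALL)

def blacklist_filter(untrusted: str) -> str:
    """Strip only the attack patterns we already know about."""
    return SCRIPT_TAG.sub("", untrusted)

def escape_for_html(untrusted: str) -> str:
    """Escape every HTML metacharacter so the text is displayed, never parsed as markup."""
    return html.escape(untrusted, quote=True)

# A payload the blacklist has no pattern for yet.
payload = '<img src=x onerror="alert(1)">'

print(blacklist_filter(payload))   # slips through unchanged
print(escape_for_html(payload))    # rendered as inert text
```

The blacklist needs a new regex for every new trick. Escaping doesn't care what the trick is, because no markup character survives.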

  • Fascinating story of how decisions are made at the top. It's about the tech, but not in the way you might think. It's more about tech bro relationships than the actual tech. The secret call to Andy Grove that may have helped Apple buy NeXT: I always wonder. Was it really about the CEOs? Or were there technical superstars we don’t know about who figured out how to make processors faster without overheating? That’s something I loved about Steve Jobs: he would ask who on planet earth is the best person to pull off something impossible and he would do anything to hire them.

  • It may seem like a slam dunk now, but the decision to create Apple stores was controversial. Everyone was against it, yet Steve Jobs pushed it through. He wanted the stores to drive demand, a problem he had experienced at NeXT.
    • cmacaskill: My opinion after working for [Steve Jobs] (and writing this story) is he couldn't see obvious things everyone else could see, but he could see things no one else could. I fought with him over stores as did virtually everyone on the board of Apple, and it turned out he was right. Thank God he was stubborn enough to go forward with them. We all said it drove Gateway out of business, yada. 
    • Steve Jobs (NeXT Computer Corp) - Sloan Distinguished Speaker Series: There are some things I can't talk about here. In addition to that, if you look at how we sell our computers right now, we have a sales force in the US of about 130 professionals in the field out selling NeXT computers. They spend 90% of their time selling NeXTSTEP software, and then 10% of their time selling the hardware. In other words, if they can get the customer to buy into NeXTSTEP, then they're going to sell the hardware, because right now we have the only hardware it runs on. So they are out there selling NeXTSTEP right now. And this is what is required to launch a new innovative product. The current distribution channels for the computer industry over the last several years have lost their ability to create demand. They can fulfill demand, but they can't create it. If a new product comes out, you're lucky if you can find somebody at the computer store that even knows how to demo it. So the more innovative the product is, the more revolutionary it is and not just an incremental improvement, the more you're stuck. Because the existing channel is only fulfilling demand. Matter of fact, it's getting so bad, that it's getting wiped out, because there are more efficient channels to fulfill demand, like the telephone and Federal Express. So we're seeing the channel become condensed on its way to I think just telebusiness. So how does one bring innovation to the marketplace? We believe the only way we know how to do it right now is with the direct sales force, out there in front of customers showing them the products in the environment of their own problems, and discussing how those problems can be mated with these solutions.

  • With cloud-based edge computing we've entered a kind of weird mushy mixed centralized/decentralized architecture phase. Amazon lets you put EC2 instances at the edge. Microsoft has Azure IoT Edge. Google has Cloud IoT Edge and GKE On-Prem and Edge TPU. The general idea is you pay cloud providers to put their machines on your premises and let them manage what they can. You aren't paying other people to manage your own equipment; the equipment isn't even yours. Outsourcing with a twist. Since the prefix de- means "away from" and centr- means "middle", maybe postcentralization, as in after the middle, would be a good term for it? There are more things in heaven and earth, Developer, than are dreamt of in your cloud.

  • Even Amazon can fail when they enter new uncharted levels of scale. Internal documents show how Amazon scrambled to fix Prime Day glitches. Auto-scaling failed early on Prime Day. So Amazon used load shedding strategies like installing a lightweight landing page and cutting off international traffic. They added servers manually to meet traffic demand. When the front page was restored they only let 25% of traffic through. And they had a 300-person emergency conference call to figure out how to respond. Love to be an Echo on that wall.
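Letting only 25% of traffic through is a classic load shedding move. Here's a rough Python sketch of one way to do percentage-based admission. It is not a description of what Amazon actually ran; the hashing scheme, the `admit` and `handle` functions, and the session IDs are all assumptions made for illustration.

```python
import hashlib

def admit(request_id: str, admit_percent: int) -> bool:
    """Return True if this request should be served rather than shed.

    Hashing the ID makes the decision sticky: the same ID always gets
    the same answer for a given percentage.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100   # bucket in 0..99
    return bucket < admit_percent

def handle(request_id: str, admit_percent: int = 25) -> str:
    """Serve the full page to admitted requests, a cheap fallback to the rest."""
    if admit(request_id, admit_percent):
        return "full page"
    return "lightweight fallback page"

# Roughly a quarter of these hypothetical sessions get the full page.
served = sum(admit(f"session-{i}", 25) for i in range(10_000))
print(f"admitted {served} of 10000 requests")
```

Hashing rather than rolling a random number per request means a given visitor is consistently in or out, and raising `admit_percent` only ever adds visitors, so traffic can be ramped back up gradually.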

  • When Facebook designed their new configuration system they didn't want to use ZooKeeper again. Why? They wanted to support 100MB sized configuration items and ZooKeeper couples a data store with its distribution framework. They wanted to separate the data storage and distribution components so they could size and scale each independently. How? Proxies are organized into a distribution tree, which is essentially a well-organized peer-to-peer network. Rather than relying on serving the entire data tree as part of one distribution tree, they split it into smaller distribution trees, each of which serves some part of the data tree. Result? Having a separate control and data flow plane enables each distributor to handle around 40,000 subscribers; ZooKeeper handled around 2,500 subscribers. Location-Aware Distribution: Configuring servers at scale
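To make the distribution tree idea a bit more concrete, here's a toy Python sketch. It assumes each slice of the data tree is identified by a path prefix and served by its own tree of proxies. The class names, the prefix routing, and the config paths are hypothetical and not taken from Facebook's system.

```python
class Proxy:
    """One node in a distribution tree; relays updates to its children."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.received = {}   # config path -> latest value seen by this node

    def add_child(self, child):
        self.children.append(child)
        return child

    def push(self, path, value):
        # Store locally, then fan out; each node only talks to its own children.
        self.received[path] = value
        for child in self.children:
            child.push(path, value)


class Distributor:
    """Maps slices of the data tree (path prefixes) to separate distribution trees."""

    def __init__(self):
        self.roots = {}   # path prefix -> root Proxy of the tree serving it

    def add_tree(self, prefix, root):
        self.roots[prefix] = root

    def publish(self, path, value):
        # The longest matching prefix picks the tree that serves this path.
        matches = [p for p in self.roots if path.startswith(p)]
        if not matches:
            raise KeyError(f"no distribution tree serves {path}")
        self.roots[max(matches, key=len)].push(path, value)


# Two small trees, each serving a different part of the data tree.
web_root = Proxy("web-root")
web_leaf = web_root.add_child(Proxy("web-mid")).add_child(Proxy("web-leaf"))
db_root = Proxy("db-root")
db_leaf = db_root.add_child(Proxy("db-leaf"))

dist = Distributor()
dist.add_tree("/config/web/", web_root)
dist.add_tree("/config/db/", db_root)
dist.publish("/config/web/timeout", "30s")

print(web_leaf.received)   # the web tree saw the update
print(db_leaf.received)    # the db tree was never touched
```

Because an update only travels down the tree that owns its prefix, each tree can be sized for its own subscriber count, which is the independent scaling the paragraph describes.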

  • Vacation Tracker ended up making their internal project idea a product, but they first thought of using serverless as a way to make a vacation tracking tool just for themselves. All those project tools you need and would like to build, but don't have the bandwidth to create, would be a lot easier to build on serverless. The architecture would also be there when employees leave. That's always the problem with tools. Usually a tool is championed by one or two people and when they leave the tool goes entropic. Maybe tools have a longer shelf-life in serverless? Maybe? How your startup can benefit from serverless. Also, Why Do the Biggest Companies Keep Getting Bigger? It’s How They Spend on Tech.

  • People might be surprised how much hardware-sounding stuff is actually controlled by software hidden deep down in the stack. Apple releases software fix for MacBook Pro slowdown: "Apple on Tuesday acknowledged that the slowdowns exist, and that they’re caused by a bug in the thermal management software of all the 2018 MacBook Pro models." Apple's new firmware algorithm smooths out clock speeds and CPU temperatures. It's likely that in the future, rather than shipping hardcoded firmware, we'll see machine learning personalize these algorithms. Given every unit is slightly different due to the manufacturing process and every user has their own CPU/GPU/memory/networking/disk usage profile, it would make sense to optimize system performance by personalizing software-controlled processes, like thermal management, through machine learning.

  • Gil Tene dug up some good Intel data: As many of you may know, the behavior of CPU speed on various Intel processors varies with workload. This often makes questions like "which chip is fastest for my workload" and "does AVX-512 help or hurt performance" not a clear thing to answer. It can also make measuring behavior on one configuration (e.g. AWS c5 or m5 instances or equivalent Azure or GCE instances, all of which are easy to get your hands on) and projecting from that to what other setups will see (e.g. a Xeon Gold 6146 or Xeon Silver 4116) mind-bogglingly hard...After digging around quite a bit to try and find data on how frequencies react to instruction mixes and active thread counts on various Skylake models, I found the following VERY handy document, which I figured others on this list may find useful: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf. An example interesting conclusion is that while the higher frequency parts like the Xeon Gold 6146 (165W TDP 3.2GHz base with 4.2GHz Max Turbo boost) may be the fastest when no AVX/AVX2/AVX-512 instructions are involved, the highest end Platinum 8180 (with its 205W TDP, 2.5GHz base, 3.8GHz Max Turbo Boost) is actually same-or-faster across the board when AVX/AVX2/AVX512 instructions are involved. And its larger L3 caches won't hurt to have around either. For people asking the "which processor is the fastest I can get?" question, this can change the answer quite a bit, since at least some form of AVX instructions is often/typically interleaved in most workloads (they are built into most optimized memcpy implementations, in java object allocation, etc.). Of course, the price point is a bit different too (e.g. https://ark.intel.com/compare/120495,120481,120496,124942), but if speed is what you really care about...

  • Thoughtful Go advice. Go for Industrial Programming: I’ve been doing this best practice series for six years now, and while a few tips have come and gone, especially in response to emerging idioms and patterns, what’s remarkable is really how little the foundational knowledge required to be an effective Go programmer has changed in that time. By and large, we aren’t chasing design trends. We have a language and ecosystem that’s been remarkably stable, and I’m sure I don’t just speak for myself when I say I really appreciate that.

  • Choice rules the day. With the cloud still too expensive, no convergence is in sight. What tech stacks are indie hackers using for their apps, and why? 1996: Python for the code, Postgres for the backend, Markdown for the frontend. Because I am too lazy to reinvent the wheel, and most of that is good enough. The servers are on Debian VMs on Azure. nickjj: I'm writing it in Elixir and Phoenix (Elixir's go-to web framework) with PostgreSQL to back my data. For the front end, I'm using good old fashioned server side rendered templates and Turbolinks with sprinkles of Javascript where needed. jimmy1: Go and, honestly, sqlite with backups written to S3. It's the absolute cheapest. I can run multiple apps on a t2.nano (t2.micro if I am feeling fancy). My apps cost something like $1.50 to run a month, and they can easily handle medium-sized traffic, plus Go is just so dead simple to deploy. Klonoar: I actually wound up building most of my current project in Rust, on top of actix-web. raarts: My stack: Gitlab CI/CD, deploying using Ansible to Docker Swarm, running Keycloak for auth, RabbitMQ for messaging, Postgres, Elixir/Phoenix for the API server (GraphQL), Apollo + React Native for frontend and mobile apps. Why? For me it's best of breed vs simplicity.

  • Nice accessible explanation of homomorphic encryption. bo1024: You encrypt some data and send it to Bob. Bob does some computations on the encrypted data and sends you the (still-encrypted) results. You decrypt the results to get the answer of your computation. Bob never learns what your data is or what the results are. The term "homomorphic" roughly refers to the fact that the encrypt/decrypt functions go "outside" the computation. That is, if Bob is applying the function f, we have f(Encrypt(data)) = Encrypt(f(data)). The left side is what Bob does, the right side is what you want to get (because you can decrypt it).
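The identity f(Encrypt(data)) = Encrypt(f(data)) is easy to try for yourself. Here's a small Python sketch using textbook RSA, which happens to be homomorphic for multiplication. The tiny primes, the lack of padding, and the `bob_multiply` helper are illustrative assumptions; nothing here is secure, and it is not the scheme bo1024 had in mind, just the simplest one that shows the property.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

def bob_multiply(c1: int, c2: int) -> int:
    """Bob's computation: multiply two ciphertexts without ever seeing the plaintexts."""
    return (c1 * c2) % n

a, b = 6, 7
encrypted_result = bob_multiply(encrypt(a), encrypt(b))

# Bob's side of the identity matches encrypting the product directly.
print(encrypted_result == encrypt(a * b))   # True
print(decrypt(encrypted_result))            # 42
```

This only covers multiplication, so it is partially homomorphic. Fully homomorphic schemes let Bob run arbitrary computations on the encrypted data, which is much harder to do efficiently.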

  • It's not really free, but it is a good overview of pricing for various AWS services and what you can expect to pay for a very limited application. The Free Stack - Running your application for free on AWS. As you might expect, a chorus of "why use the cloud instead of this or that cheaper service" broke out on HN. ibudiallo: Only last month my two $5 droplets handled 5 million web requests from a viral post. cutler: For 5 Euros per month a cloud VPS at Hetzner.com comes with 2 CPUs and 4GB RAM. That's 8 or maybe 16 times the RAM that Facebook probably launched with plus extra CPU and SSD disk speed. All these cobbled-together free tiers may be slightly cheaper but the added complexity I can do without. It just seems like tech for tech's sake. malchow: I read this, and I think: wow, that's extraordinarily expensive. The stack he designs is $20/mo for 1,000 user sessions/day. Back in the days of cPanel/LAMP shared hosting, you'd have similar capability for $5/mo. mr_toad: The cheapest AWS EC2 reserved instance is about the same $/month as the cheapest DO instance, both can be used for a small website. You’ll pay more for on-demand pricing. swebs: I wouldn't consider VPS providers part of "the cloud". You simply rent a (virtual) server. With AWS and the like, you pay for their automation of services. You don't need to manually deploy load balancing, CDNs, DDoS mitigation, security hardening, and the like. The big pitch is that you're paying a bit more in order to phase out your IT team. kayoone: I run a small Golang API on DO for $5, using DynamoDB and some Cloudflare Caching (free) and it can handle quite a lot of traffic. simon_weber: I recently went through this, though with the (arbitrary) goal of running for free indefinitely. This is tougher, since you can't use API Gateway and need to throttle your dynamo operations. SQS ended up being the key to making this work: everything is asynchronous, and the js SDK is used to enqueue messages directly from the frontend.

  • Lyft driver fraud is a thing and they use neural net driven fingerprinting to track it down with 40% better results. Fingerprinting fraudulent behavior: Our behavior fingerprinting neural network is implemented as a stack of the embedding layer, ConvNet, and RNN in that order on Tensorflow through the Keras interface. We concatenate the RNN’s output with the structured features and pass it through fully-connected layers that returns a softmax multi-class output that determines the probability assigned to each possible fraud user segment.

  • Optimizing database queries is often at the heart of performance tuning. Do I Have A Query Problem Or An Index Problem? Think QTIP. Query Plan. Text of the query. Indexes used. Parameters used.

  • Awesome deep dive on how to optimize a puzzle solver with way too many states to fit in memory. Optimizing a breadth-first search: That concludes the things I learned from this project that seem generally applicable to other brute force search problems. These tricks combined to get the hardest puzzles of the game from an effective memory footprint of 50-100GB to 500MB, and degrading gracefully if the problem exceeds available memory and spills to disk. It is also 50% faster than a naive hash table based state deduplication even for puzzles that fit into memory.
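For reference, the "naive hash table based state deduplication" the article measures against looks roughly like the Python sketch below. This is only the baseline, not the author's optimized solver, and the toy number puzzle is a hypothetical stand-in for the game's puzzles.

```python
from collections import deque

def bfs_depth(start, neighbors, is_goal):
    """Breadth-first search; returns the fewest moves to a goal state, or None."""
    seen = {start}                    # hash-based dedup: every visited state stays in memory
    frontier = deque([(start, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if is_goal(state):
            return depth
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None

# Toy puzzle (hypothetical): start at 1, each move adds 1 or doubles; reach 10.
moves = bfs_depth(
    1,
    lambda s: [s + 1, s * 2] if s < 10 else [],   # stop expanding once at or past the target
    lambda s: s == 10,
)
print(moves)   # 4
```

The `seen` set is what blows up on big puzzles: it grows with every distinct state, which is why the article's tricks focus on shrinking and spilling that structure.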

  • Google has released a new book in their SRE series: The Site Reliability Workbook. It's free until August 23rd.

  • tensorflow/models/astronet: This directory contains TensorFlow models and data processing code for identifying exoplanets in astrophysical light curves. 

  • google/go-cloud (article): A library and tools for open cloud development in Go. The Go Cloud Project is an initiative that will allow application developers to seamlessly deploy cloud applications on any combination of cloud providers. It does this by providing stable, idiomatic interfaces for common uses like storage and databases. Think database/sql for cloud products. cflewis: The main goal of Go Cloud is that the abstraction is not leaky and that you don't get access to specific features in the way you are describing in the application itself.

  • primaryobjects/knowledgebase: This project is an example of building an expert system, using a knowledge-base constructed with logic-based artificial intelligence, also called symbolic AI [in javascript].

  • rtr7/router7: a pure-Go implementation of a small home internet router. It comes with all the services required to make a fiber7 internet connection work (DHCPv4, DHCPv6, DNS, etc.).

  • TabulaROSA: Tabular Operating System Architecture for Massively Parallel Heterogeneous Compute Engines:  The resulting operating system equations provide a mathematical specification for a Tabular Operating System Architecture (TabulaROSA) that can be implemented on any platform. Simulations of forking in TabularROSA are performed using an associative array implementation and compared to Linux on a 32,000+ core supercomputer. Using over 262,000 forkers managing over 68,000,000,000 processes, the simulations show that TabulaROSA has the potential to perform operating system functions on a massively parallel scale. The TabulaROSA simulations show 20x higher performance as compared to Linux while managing 2000x more processes in fully searchable tables.

  • BOLT: A Practical Binary Optimizer for Data Centers and Beyond: In this paper, we present BOLT, a post-link optimizer built on top of the LLVM framework. Utilizing sample-based profiling, BOLT boosts the performance of real-world x86 64-bit ELF applications even for highly optimized binaries built with both feedback-driven optimizations (FDO) and link-time optimizations (LTO). We demonstrate that post-link performance improvements are complementary to conventional compiler optimizations, even when the latter are done at a whole-program level and in the presence of profile information. BOLT has been deployed inside Facebook for multiple data-center workloads. For data-center applications, BOLT achieves up to 8.0% performance speedups on top of profile-guided function reordering and LTO. We have also applied BOLT to GCC and Clang binaries, and our evaluation shows that BOLT speeds up these binaries by up to 15.3% on top of FDO and LTO, and up to 35.5% if the binaries are built without FDO and LTO.

  • CRAM: Efficient Hardware-Based Memory Compression for Bandwidth Enhancement: Our evaluations, over a diverse set of 27 workloads, show that CRAM provides a speedup of up to 73% (average 6%) without causing slowdown for any of the workloads, and consuming a storage overhead of less than 300 bytes at the memory controller.

  • Chaos Engineering: Members of the Netflix team that developed Chaos Engineering explain how to apply these principles to your own system. 

  • Serverless Streaming Architectures and Best Practices: In this whitepaper we will explore three stream processing patterns using a serverless approach. For each pattern, we’ll describe how it applies to a real-world use-case, the best practices and considerations for implementation, and cost estimates. Each pattern also includes a template which enables you to easily and quickly deploy these patterns in your AWS accounts.