Stuff The Internet Says On Scalability For August 24th, 2018

Hey, it's HighScalability time:

Images from a far-flung galaxy? Nope. It's the mind-blowing swirling beauty of ink in motion.

Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you know anyone looking for a simple book that uses lots of pictures and lots of examples to explain the cloud, then please recommend my new book: Explain the Cloud Like I'm 10. They'll love you even more.

  • $10 billion: damages in the world's largest cyberattack; 0.5%: bitcoin's share of all the electricity on earth; 1/200th: the speed Verizon throttled California firefighters to, for leverage; 4.6%: YC companies reaching a $100M+ valuation; 45: average age of a successful startup founder; $250,000: monthly take from browser-based Monero mining; 300+: 3D-digitized Greek and Roman sculptures; 80: employees chipped at one company; 100k: bike graveyard from failed startups; 70%: executives who think they are blockchain experts; $7 billion: Slack valuation; 120: AWS instance types; 27.6 petabytes: storage in Microsoft's undersea data center, which streams a webcam of swimming fish; 42%: startups that fail because of their product; $334bn: tech and telecom M&A market; 6x: Nvidia's new GPU; 50%: share of Cisco's revenue that is now subscription-based.

  • Quotable Quotes:
    • @hichaelmart: The biggest problem I have with all the "best practices" assertions recently are that almost none of them seem to be accompanied with any data. We do *many* of these "non-best-practices" at large scale at @bustle The "connections take time" one especially irks me. HTTP anyone?
    • @JoeEmison: The history of web-scale companies says: at scale you build all your own stuff. Not at scale, you should leverage the best of what you can pay-by-use for.
    • @RealSexyCyborg: You don't give lessons on the internet in how to avoid Chinese law enforcement.
    • @rsingel: The fight against net neutrality is a fight to monetize scarcity. ISPs have no interest in a world of fast, cheap, ubiquitous connections. That road leads to being a commodity
    • chx: Is Amazon AWS the new reality distortion field? Gosh, just rent two dedicated boxes, one master, one slave, switch over to slave manually in the extreme rare case if the master fails and be done. This entire article screams "right tool for the job and this is not the right tool".
    • Liz Pelly: The result of this experiment: I found Spotify’s most popular and visible playlists to be staggeringly male-dominated. Not only this, I approached the project by listening from a brand new account in order to confirm that gender bias would be reproduced by way of algorithmic recommendations—that when a user listens to mostly male-dominated playlists, what is produced are yet more male-dominated playlists.
    • @dhh: "Wherever per-person Facebook use rose to one standard deviation above the national average, attacks on refugees increased by about 50 percent", Facebook usage is directly correlated with hate crimes in bombshell study from Germany. 
    • some_account: I was looking into it a bit before. The smallest RDS instance was like 11 dollars per month and can't use spot instances (naturally). If you do use spot instance with a local postgres on EBS, it comes down to about 4 dollars per month all together. With instance store, that should be around 3 dollars. These are for t2 micro. If you use bigger instances, the money difference will be huge. RDS is just way too expensive for small projects.
    • @kellabyte: Whenever an open source database like Redis goes closed source on features and holds features for hostage the community starts to decay. Commercializing infrastructure is hard and there’s always a temptation to start holding features hostage because services struggle to sell.
    • Eric Meyer: The drive to force every site on the web to HTTPS has pushed the web further away from the next billion users—not to mention a whole lot of the previous half-billion.  I saw a piece that claimed, “Investing in HTTPS makes it faster, cheaper, and easier for everyone.”  If you define “everyone” as people with gigabit fiber access, sure.  Maybe it’s even true for most of those whose last mile is copper.  But for people beyond the reach of glass and wire, every word of that claim was wrong.
    • NASA: The orbiter computer data bus network consists of a group of twisted, shielded wire pairs (data buses) that support the transfer of serial digital commands from the GPCs to vehicle hardware and vehicle systems data to the GPCs. The computer data bus network is divided into specific groups that perform specific functions.
    • @mjpt777: I've been observing the progressive decline in performance this year as patches get applied. The era of continuous improvements is over. [re: Intel's spectre patches]
    • @WhatTheFFacts: PornHub reported a significant drop in visitors the day that Fallout 4 was released.
    • @mipsytipsy: You all do realize that part of what's driving this uprising for event oriented instrumentation is straight up Moore's Law, right? The categories of logging, monitoring/metrics and APM only exist because they were optimizing for disks being fucking expensive.  ONLY that.
    • @cloud_opinion: Google changing the way they price GCE instances - its very smart, simple and awesomeness!. Never thought price by instance type made much sense in a cloud world, so glad to see Google getting rid of it.
    • @paulrnash: Ah, I see. Yeah, independent of committed use, the challenge we have is that Intel put the same number of RAM controller channels on the chip regardless of core count. So with high core count CPUs, you can't get really high RAM ratios without stranding the cores...On prem, it's easy to get 16:1 kinds of RAM ratios if you stuff the DIMMs and install lower core (8, 10, 12) parts. But... not very cost efficient in cloud (yet).
    • @aallan: "A standard water tower can be emptied in an hour using a botnet of 1,355 sprinklers," Nassi et al. wrote in their whitepaper. "A flood water reservoir can be emptied overnight using a botnet of 23,866 sprinklers." #IoT #DDoS #Botnet
    • @weetwo: Having sold software in a variety of forms all the way back to floppies, I can say with confidence that app developers complaining about a 30% app store cut don't know how good they have it. Global distribution and currency conversion, people! Quit whining!
    • @stevesi: Complaining that the fees to app stores feel like “taxes”? Then a fair thing to do is not charge for a product since that feels like a tax to your users. 
      Distribution is a cost of doing business. App stores are modern (and rather effective) distribution. 1/3
    • @ruthmalan: “Uncertainty is not a license to guess. It is a directive to decouple."  — @sandimetz 
    • Andrei Frumusanu: The last slide that is notable to talk about is the performance projections for Deimos and Hercules. Here Arm’s taking a direct stab at Intel’s lack of significant progress over the last few years and reiterating its confidence in the company’s ability in sustaining high CAGR (compound annual growth rate) performance figures for the next generations.
    • John Quain: “They think we need 300 teraflops of computing power,” [for cars] said Willard Tu, a senior director at chip supplier Xilinx. A teraflop is a trillion operations per second, which means that every vehicle would be a rolling supercomputer. “So the type of pipe and the diameter of that pipe” that is going to connect all these components, Mr. Tu said, “has to be very flexible.” Estimates for how big that pipe needs to be range from 25 to 40 or more gigabytes per second.
    • @taotetek: Distributed systems tip: Write your system without any queues first. You might find you don't need queues. If you end up needing queues, the retry and reliability code you wrote in order to function without queues will still make your system more reliable.
    • @mattblaze: You cannot make insecure software systems more secure by being encouraging and not saying mean things about them. It’s not that kind of insecurity.
    • @micahjay1: Some of our fastest growing companies were founded by individuals that a) got rejected by YC, b) didn't go to a top tier university, c) are over 40, d) are in obscure geographies. It's a good reminder that we're in the business of exceptions.
    • John Ousterhout: Use your intuition to ask questions, not to answer them
    • @kelseyhightower: Why settle for monolithic applications when you can have monolithic functions?
    • @jrhunt: We [AWS] recently launched the ability to connect to your Kinesis Data Streams over HTTP/2. This works with a new feature called Enhanced Fan-out that allows devs to register different consumers and get an additional 2MB/s / shard output from their stream.
    • @dsballantyne: Done very quickly, so please test your own clients + API. Running a 120s load test comparing HTTP/1.1 vs HTTP/2. HTTP/1.1 gets 71982 reqs and 599 req/s, HTTP/2 gets 881385 reqs and 7344 req/s. Pretty significant perf increase! #h2load #APIGateway
    • Joel Hruska: If you’ve been planning a new SSD purchase, it might be best to hold off for a while. There are signs that the market could be headed for a significant downturn, with some experts predicting NAND prices could collapse as low as $0.08 per GB during 2019. That would put NAND flash within striking distance of HDD pricing — certainly far closer than the two have come before.
    • iambateman: But once you get to that organizational size, there are 3 project managers and a VP with an agenda and designers and design managers and a QA department which is backed up “for a bit” and a PR team who wants to make sure we get a good Q2’19 “win with the developer crowd”. So it takes a year, and $2.5M, to do what should take 3 months and $300k. * not saying this is the case at Slack, it’s just often the case at Everywhere. ;)
    • clarkdave: We’ve been running Postgres on i3 instances with their attached SSDs. Performance is solid and it’s cheaper too. Having up to date replicas becomes crucial, along with incremental backups (we use wal-e for that).
    • ruffrey: Teams is unique in how it "ticks all the boxes" but has enough small flaws and unfinished implementations that it makes for a really, really awful tool. There are fundamental problems with the software.
    • Aristos Georgiou: Researchers from Australia’s Garvan Institute of Medical Research identified a structure that “remembers” past infections and vaccinations—and is filled with immune cells of many kinds which respond to pathogens that the body has encountered before. The structure is strategically positioned on the outside of lymph glands to detect infections early, according to researchers. It was discovered when the team used an advanced imaging technique, known as "3D microscopy", to essentially create “movies” of the immune system in action inside living animals.
    • Mark LaPedus: the 3D NAND market is expected to “collapse at the end of this year,” said Jim Handy, an analyst at Objective Analysis. “Already, we are seeing some price declines. Spot market prices have been going down all year. We are on the verge of oversupply. The issue is that people are getting more efficient in making 3D NAND. It’s supply-driven. There is no shortage of demand.”
    • @jeffblankenburg: My daughter started 9th grade yesterday. Her first class of the day was Computer Science. A class of 19 boys and her. The teacher thought she was lost, and asked which room she was looking for. And we wonder why this industry has a diversity problem.
    • Daniel Carroll: The researchers found node betweenness is actually a greater attractor and driver for the formation of social ties than node degree or other measures of centrality. Instead of examining only the amount of connections a single node has, WBPA places more emphasis on community formation and the quality of node connections. CMU's Radu Marculescu says, "The new model builds on the idea that humans are better at observing qualitative aspects than quantitative ones, which is why people typically favor investing in fewer qualitative social ties rather than numerous lower quality ties."
    • @halvarflake: People speak about the "security poverty line", but the harsh truth is that there is an "engineering poverty line" in tech, and many large, world-famous companies fall below it. Good security is normally a result of healthy IT engineering culture & competence; reality is that most organisations really struggle to keep their stuff running, or inventoried, or approximately clean. Many famous brands fail at doing "simple" things like having a working payment workflow on their website; discussing proper security engineering when they can't get ...
    • Bent Flyvbjerg: "Tera" is the next unit up, as the measurement for a trillion (a thousand billion). Recent developments in the size of the very largest projects and programs indicate we may presently be entering the "tera era" of large-scale project management. If we consider as projects the stimulus packages that were launched by the United States, Europe, and China to mitigate the effects of the 2008 financial and economic crises, then we may speak of trillion-dollar projects and thus of "teraprojects." Similarly, if the major acquisition program portfolio of the United States Department of Defense – which was valued at 1.6 trillion dollars in 2013 – is considered a large-scale project, then this, again, would be a teraproject (United States Government Accountability Office, 2013: 2). Projects of this size compare with the GDP of the world's top 20 nations, similar in size to the national economies of for example Australia or Canada. There is no indication that the relentless drive to scale is abating in megaproject development.
    • Rubik's Code: Self-organizing maps are one very fun concept and very different from the rest of the neural network world. They use the unsupervised learning to create a map or a mask for the input data. They provide an elegant solution for large or difficult to interpret data sets. Because of this high adaptivity, they found application in many fields and are in general mostly used for classification. Initially, Kohonen used them for speech recognition, but today they are also used in Bibliographic classification, Image browsing systems and Image classification, Medical Diagnosis, Data compression and so on.
    • Clive Thompson: One of the things that are so interesting about those early games is how they wrestled with the great geopolitical anxieties of the times: not just global thermonuclear war, but also the idea that society was robotizing. Space Invaders was about these gibbering creatures from the id. You had games that literally made the subtext, text—like Missile Command. Games were responding to these floating senses of technological threat that were in the air. There is really something dreamlike about them. They were the poetry of the age.
    • Carlo Rovelli: It is important not to confuse “time” and “change.” We tend to confuse these two important notions because in our experience we can merge them: we can order all the change we experience along a universal one-dimensional oriented line that we call “time.” But change is far more general than time. We can have “change,” namely “happenings,” without any possibility of ordering sequences of these happenings along a single time variable. 
    • Alexandra Robbins: Geeks, loners, punks, floaters, dorks, freaks, nerds, gamers, weirdos, emos, indies, scenes—whether they choose to alter their labels or ignore them entirely, they are free to self-catalog as an identity of one. Identifying as an “I” rather than as an “us” means that there are no rules. Unshackled by strict yet arbitrary, misguided norms, outcasts can be, look, act, and associate however they want to. And in this ever conformist, cookie-cutter, magazine-celebrity-worshipping, creativity-stifling society, the innovation, courage, and differences of the cafeteria fringe are vital to America’s culture and progress. Which is why we must celebrate them.
    • Ed Sperling: Money. Possibly the least understood impact [of tariffs] because it is often private is investment capital. China’s semiconductor investment fund at last count was somewhere in the neighborhood of $300 billion. That money initially was targeted for acquisitions, but more recently it has gone into funding startups inside of China. Normally, it takes five to seven years for startups to begin paying back investments, and it can take up to a decade with very complicated technology. What the impact of this will be is hard to assess in the early stages, but it will become more noticeable as these companies mature. It’s hard to put a price tag on the free flow of information in a startup market, but a trade war is bad for business and that information exchange.
    • russnewcomer: I worked for a company that has been in the tech side of this space since 2004 until acquisition last year. You'll notice that the discussion in this article (which seems to be submarine PR for Cargill) talks about Cargill and ADM's relationship with the farmers. There are a lot of smaller elevators and coops out there that are working to help farmers capitalize on the increased flow of information while maintaining their ability to stay open against corporate competition from the ABC giants (ADM, Bunge, Cargill). The big thing in the overall space is that the software side of ag lags behind profitability in the industry. The company I worked for was 6 people (fte 3 devs), and we were running on a .NET WebForms platform originally written in 2005 that we didn't have the time or resources to bring nearly 250kloc to something more modern. (That includes web, backend processes, api, our mobile app platform was another 40kloc) 

  • API Gateway is the outbound bandwidth of services: a usage charge that quietly balloons with scale. From $erverless to Elixir: It’s a lot cheaper for us. Mind that we already have an ops team and we already have a Kubernetes cluster running. Our additional costs are the fractions of EC2 instances that the Elixir nodes are consuming...What everyone should do is think about where your service is going, and can you afford those costs when you get there. If you don’t have a team of ops people and you aren’t familiar with serverful stuff, spending $30k/mo on HTTP requests might be cheaper than an ops team.
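To get a feel for what "$30k/mo on HTTP requests" means, here is a back-of-the-envelope cost model. It is a sketch under loud assumptions: a flat $3.50 per million requests (roughly API Gateway's first-tier REST price around 2018; real pricing is tiered, varies by region, and excludes data transfer) and a 30-day month.

```python
# Back-of-the-envelope API Gateway request cost.
# ASSUMPTION: flat $3.50 per million requests, used for illustration only;
# real pricing is tiered by volume and region and excludes data transfer.
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def monthly_request_cost(requests_per_second: float,
                         usd_per_million: float = 3.50) -> float:
    """Dollars per month for a steady request rate."""
    requests = requests_per_second * SECONDS_PER_MONTH
    return requests / 1_000_000 * usd_per_million


def sustained_rate_for_budget(monthly_budget_usd: float,
                              usd_per_million: float = 3.50) -> float:
    """Steady requests/second that a monthly budget buys."""
    requests = monthly_budget_usd / usd_per_million * 1_000_000
    return requests / SECONDS_PER_MONTH


if __name__ == "__main__":
    print(f"1,000 req/s costs ${monthly_request_cost(1000):,.0f}/month")
    print(f"$30k/month buys ~{sustained_rate_for_budget(30_000):,.0f} req/s")
```

Under these assumptions, $30k/month corresponds to a steady load of only about 3,300 requests per second, a rate a handful of Elixir nodes can plausibly serve, which is the article's point.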

  • Excellent list of Case studies of AWS serverless apps in production.

  • Soon, combining robots and deep learning will radically change how work gets done out in the real world. The Robohub podcast episode Robotic Weeding and Harvesting covers some really interesting robotics work being done in Australia. The first robot performs weed management. Not weed killing; the researcher really didn't like the phrase "killing weeds." The robot localizes using real-time kinematic (RTK) GPS, an expensive and highly accurate positioning system, along with a camera and deep learning to identify weeds in real time. Once identified, weeds are "managed" chemically or pulled. The motivation is that herbicide resistance is a big problem; with robots you can just pick the weeds as a human might. The system replaces a tractor that would spray herbicide before planting. It can also scout a field to figure out what's growing there on its own; it doesn't have to be told. If it doesn't recognize a plant, the farmer can tell it, so it can work on any field. The second robot harvests sweet peppers. The problem it addresses is labor smoothing: labor is only needed periodically, which isn't practical for people. The robot is a cheaper, more consistent source of labor, though humans are faster pickers. We also have to think about how, in the years to come, we'll be able to double food production as population increases. Both robots are designed to be easily generalizable platforms. The goal is deployment in a few years. I expect we'll soon see fruits and veggies engineered to be easier for robots to "manage".

  • Good classes on Nonlinear Dynamics and Chaos - Steven Strogatz, Cornell University.

  • An awesome cautionary tale, well told. Automatic software updates are a chokepoint: once breached, marauders are free to rape and pillage the castle. The destructive amplification these attacks provide is unmatched. As usual, our greatest strength is also our greatest weakness. Oh, and the lesson: patch your stuff. The Untold Story of NotPetya, the Most Devastating Cyberattack in History: In the spring of 2017, unbeknownst to anyone at Linkos Group, Russian military hackers hijacked the company’s update servers to allow them a hidden back door into the thousands of PCs around the country and the world that have M.E.Doc installed. Then, in June 2017, the saboteurs used that back door to release a piece of malware called NotPetya, their most vicious cyberweapon yet...NotPetya was propelled by two powerful hacker exploits working in tandem: One was a penetration tool known as EternalBlue, created by the US National Security Agency...On a national scale, NotPetya was eating Ukraine’s computers alive. It would hit at least four hospitals in Kiev alone, six power companies, two airports, more than 22 Ukrainian banks, ATMs and card payment systems in retailers and transport, and practically every federal agency. “The government was dead,” summarizes Ukrainian minister of infrastructure Volodymyr Omelyan. According to ISSP, at least 300 companies were hit, and one senior Ukrainian government official estimated that 10 percent of all computers in the country were wiped. The attack even shut down the computers used by scientists at the Chernobyl cleanup site, 60 miles north of Kiev...NotPetya’s architects combined that digital skeleton key with an older invention known as Mimikatz, created as a proof of concept by French security researcher Benjamin Delpy in 2011. 

  • An excellent explanation of Go Memory Management.

  • You may not know it, but Silicon Valley has a world-class Computer History Museum. One of the cool things they do is record oral histories. You can access their entire Oral History Collection online. There are almost 1,000 entries. You might like the John Backus oral history, Don Knuth's early programs, or the Mitch Kapor oral history.

  • Redis, the secret caching sauce behind a lot of websites on the internet, has gone to an open-core model with paid-for modules in an attempt to make a decent living off software a lot of people use but don't pay for. People have to make a living. It has worked for Postgres and Nginx, why not Redis? The world is in tumult: here, here, here, here. @antirez: "Please note that the Redis license remains BSD. A few people misunderstood the @RedisLabs blog post. It applies only to modules developed at Redis Labs such as RediSearch. Modules developed by myself will be AGPL (that is, Disque). Redis core BSD as usually." But the cloud adds a level of indirection. @antirez: "AGPL is the only way to stay inside the OSS realm. But I believe it's not going to completely fix the fundamental problem of OSS in the cloud era, that is, people producing the software are not always the same that will make most money selling the service."

  • How confident are we that algorithms of tomorrow are a good fit for existing semiconductor chips or new computational fabrics under development? Algorithms Outpace Moore’s Law for AI: Professor Martin Groetschel observed that a linear programming problem that would take 82 years to solve in 1988 could be solved in one minute in 2003. Hardware accounted for 1,000 times speedup, while algorithmic advance accounted for 43,000 times. Similarly, MIT professor Dimitris Bertsimas showed that the algorithm speedup between 1991 and 2013 for mixed integer solvers was 580,000 times, while the hardware speedup of peak supercomputers increased only a meager 320,000 times.  Similar results are rumored to take place in other classes of constrained optimization problems and prime number factorization.
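The Groetschel numbers hang together. Assuming a 365.25-day year, 82 years is about 43 million minutes, and the hardware and algorithmic speedups multiply to the same factor:

```latex
\begin{aligned}
82\ \text{yr} \times 525{,}960\ \tfrac{\text{min}}{\text{yr}} &\approx 4.31 \times 10^{7}\ \text{min} \\
\underbrace{10^{3}}_{\text{hardware}} \times \underbrace{4.3 \times 10^{4}}_{\text{algorithms}} &= 4.3 \times 10^{7}
\end{aligned}
```

Taken at face value and assuming the two factors compound the same way, Bertsimas' figures give a combined speedup of about $5.8 \times 10^{5} \times 3.2 \times 10^{5} \approx 1.9 \times 10^{11}$.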

  • It's nice to see HighScalability has helped someone. Snyk.io reached out to say an article written in 2011—Google Pro Tip: Use Back-Of-The-Envelope-Calculations To Choose The Best Design—helped them uncover an email parser denial-of-service (DoS) vulnerability in some of the most popular Node.js parsers. You can read all about it in How to crash an email server with a single email. Awesome job!

  • Multi-cloud: snark or real beast? Cloud Wars: How The Rivalry Between Amazon, Microsoft, and Google Has Enabled The Rise Of Multi-Cloud Strategies: Companies like Snap have discussed their multi-cloud strategy on earnings calls, according to CB Insights’ Earnings Transcripts tool. The former CFO Drew Vollero highlighted how Snap’s multi-cloud strategy has saved the company money:“We’ve been able to moderate user cost growth through the successful execution of our multi-cloud strategy. Specifically, hosting costs per user dropped from $0.72 a year ago to $0.70 in the quarter. That’s great progress in a year when our sales have more than doubled and engagement metrics have grown substantially.” Also, A Portable Cloud Experiment: SFTP Cloud Storage Sync

  • Comprehensive Threadripper tests - memory vs cpu freq at capped power: The test is important because Threadripper only has 4 memory channels. Consumer Intel and AMD parts have 2 memory channels, AMD's Threadripper has 4, Intel's Xeons have 6, and AMD's EPYC has 8. Since Threadripper only has 4, memory bandwidth effectively caps performance on memory-intensive workloads. It's interesting just how low in frequency we can go with only a minimal impact on the compile workload. Memory speed trades off against CPU frequency, so if you don't want to pull 330W at the wall in the stock configuration, buying 3000 MHz memory for the Threadripper is not necessarily the best choice.
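Why channel count matters so much: theoretical peak DRAM bandwidth is channels × bytes per transfer × transfer rate. The sketch below assumes DDR4's 64-bit (8-byte) channel and, for comparison only, holds every platform at DDR4-3000 even though not all of them officially support that speed. Sustained real-world bandwidth is always lower than these peaks.

```python
# Theoretical peak DRAM bandwidth: channels x 8 bytes per transfer x MT/s.
# Assumes a 64-bit DDR4 channel; real sustained bandwidth is lower.
BYTES_PER_TRANSFER = 8  # 64-bit DDR4 channel


def peak_bandwidth_gbs(channels: int, mega_transfers_per_sec: int) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) for a channel count and speed."""
    mb_per_sec = channels * BYTES_PER_TRANSFER * mega_transfers_per_sec
    return mb_per_sec / 1000


if __name__ == "__main__":
    # Same DDR4-3000 speed everywhere, purely to compare channel counts.
    for name, channels in [("consumer desktop", 2), ("Threadripper", 4),
                           ("Xeon", 6), ("EPYC", 8)]:
        print(f"{name:16} {peak_bandwidth_gbs(channels, 3000):6.1f} GB/s")
```

Four channels at DDR4-3000 top out around 96 GB/s shared by every core, while EPYC's eight channels double that, which is why a memory-bound workload on Threadripper stops scaling with core clock.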

  • Curious about how software on the space shuttle works? Here it is: Avionics. Browse for a while. It's a good time.

  • Scalable multi-node deep learning training using GPUs in the AWS Cloud: The results of this work demonstrate that AWS can be used for rapid training of deep learning networks, using a performant, flexible, and scalable architecture. The implementation described in this blog post has room for further optimization. A single Amazon EC2 P3 instance with 8 NVIDIA V100 GPUs can train ResNet50 with ImageNet data in about three hours (NVIDIA, Fast.AI) using SuperConvergence and other advanced optimization techniques. @jrhunt: Scaling learning rate for the 1st 10 epochs then scaling it down for the final 80. Using a separate parameter server cluster. It came out to ~$30 an hour without using spot.
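The schedule @jrhunt describes (ramp the learning rate up over the first 10 epochs, then bring it down over the final 80) is commonly called warmup plus decay. The exact shape used isn't stated, so this sketch assumes a linear ramp from `initial_lr` to `peak_lr` followed by a linear decay to zero; `peak_lr` and `initial_lr` are hypothetical parameters.

```python
def learning_rate(epoch: float, peak_lr: float, warmup_epochs: int = 10,
                  total_epochs: int = 90, initial_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at total_epochs.

    ASSUMPTION: both phases are linear; the original setup may have used
    a different ramp or a step decay.
    """
    if epoch < warmup_epochs:
        # Warmup: ramp from initial_lr up to peak_lr.
        return initial_lr + (peak_lr - initial_lr) * epoch / warmup_epochs
    # Decay: from peak_lr at the end of warmup down to 0 at total_epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return peak_lr * max(0.0, 1.0 - progress)


if __name__ == "__main__":
    for e in (0, 5, 10, 50, 90):
        print(e, learning_rate(e, peak_lr=1.0))
```

Warmup matters most in large-batch, multi-node training like this: with many GPUs the effective batch size is huge, and starting at the full learning rate tends to destabilize the first epochs.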

  • Nice trick. Built for Speed: Custom Parser for Regex at Scale: we captured the huge query latency reduction enabled by Bloom filters with a custom-built regex parser...Our own FSM proved to be 3–4 times faster! Thanks to these hand-coded FSMs, we’ve seen a substantial improvement in our ingestion pipeline, which brings more speed for our customers.
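The article doesn't show its FSMs, so here is a minimal sketch of the general idea: a regex compiled by hand into a deterministic state machine that walks the input once with no backtracking. The pattern (`[A-Z]{3}-[0-9]+`, ticket IDs like "ABC-123") is hypothetical. In CPython this loop won't beat the C-implemented `re` module; the reported 3–4x win comes from the authors' own implementation, presumably in a compiled language.

```python
import re

# Hypothetical pattern for illustration: ticket IDs like "ABC-123",
# i.e. the regex [A-Z]{3}-[0-9]+ matched against the whole string.
REFERENCE = re.compile(r"[A-Z]{3}-[0-9]+")


def matches_ticket_id(s: str) -> bool:
    """Hand-coded DFA equivalent to REFERENCE.fullmatch(s) is not None.

    States: 0-2 = uppercase letters seen so far, 3 = expecting '-',
    4 = expecting first digit, 5 = one or more digits seen (accepting).
    """
    state = 0
    for ch in s:
        if state < 3:
            if "A" <= ch <= "Z":
                state += 1
            else:
                return False
        elif state == 3:
            if ch == "-":
                state = 4
            else:
                return False
        elif "0" <= ch <= "9":  # states 4 and 5 both accept a digit
            state = 5
        else:
            return False
    return state == 5


if __name__ == "__main__":
    for sample in ["ABC-123", "AB-1", "ABC-", "ABCD-1", "abc-1", "ABC-12a"]:
        assert matches_ticket_id(sample) == (REFERENCE.fullmatch(sample) is not None)
        print(sample, matches_ticket_id(sample))
```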

  • Basically, learn how distributed systems work. Serverless Best Practices: Each function should do only one thing; Functions don’t call other functions; Use as few libraries in your functions as possible (preferably zero); Avoid using connection based services e.g. RDBMS; One function per route (if using HTTP); Learn to use messages and queues (async FTW); Data flows, not data lakes; Just coding for scale is a mistake, you have to consider how it scales. 
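Two of those practices, "functions don't call other functions" and "learn to use messages and queues", fit together: one function does its job and emits a message, and the platform invokes the next function from the queue. A minimal local sketch, where an in-process `queue.Queue` stands in (as an assumption) for a managed queue such as SQS, and the handler names are hypothetical:

```python
import json
import queue

# Stand-in for a managed queue (e.g. SQS); an assumption for local illustration.
events: "queue.Queue[str]" = queue.Queue()


def handle_upload(request: dict) -> dict:
    """Function 1: validate the upload, enqueue an event, return immediately.

    It never calls function 2 directly, so a slow or failing consumer
    can't stall this request path.
    """
    if "key" not in request:
        return {"status": 400}
    events.put(json.dumps({"type": "uploaded", "key": request["key"]}))
    return {"status": 202}  # accepted; the work happens asynchronously


def handle_uploaded_event(message: str) -> str:
    """Function 2: triggered by a queued message, does exactly one thing."""
    event = json.loads(message)
    return f"thumbnail requested for {event['key']}"


def drain() -> list[str]:
    """Simulates the platform invoking function 2 once per queued message."""
    results = []
    while not events.empty():
        results.append(handle_uploaded_event(events.get()))
    return results
```

Calling `handle_upload({"key": "cat.jpg"})` returns `{"status": 202}`, and a later `drain()` yields `["thumbnail requested for cat.jpg"]`. In production the queue also gives you retries and a dead-letter path for free.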

  • Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost: 1) Using EC2 i3 instances with local NVMe storage can increase performance and save money. There are some limitations: local storage is ephemeral and will disappear if the instance is stopped (a reboot is fine). 2) ZFS filesystem with compression enabled can decrease the storage requirements so that a MySQL instance will fit into local storage. Another option for compression could be to use InnoDB compression (row_format=compressed).

  • AnandTech live blogged a number of sessions from Hot Chips 2018. For example, the enigmatically titled SMIV DNN SoC for IoT

  • Mikael Ronstrom has made available for free a chapter—Use cases for MySQL Cluster—of his book MySQL Cluster 7.5 inside and out

  • It's a long commoditize-your-complement play, but we're seeing the next level of abstractions being created for a viable, portable, private multi-cloud: Knative, Envoy + Istio. Microservice Meshes With Istio And Envoy on Datanauts and Istio service mesh and microservices on Changelog are both good episodes to learn more about what's developing. Also: Envoy Service Mesh Case Study: Mitigating Cascading Failure at Lyft; Multi-cluster Kubernetes load balancing in AWS with Yggdrasil; Envoy vs NGINX vs HAProxy: Why the open source Ambassador API Gateway chose Envoy; Hybrid and Open Services with GCP, Envoy and Istio: A Talk with Google and Lyft (Cloud Next '18).
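One of the ways a proxy like Envoy mitigates cascading failure is concurrency-based circuit breaking: per upstream, it caps things like connections, pending requests, and in-flight requests, and fails fast once a cap is hit instead of piling more work on a struggling service. This is a minimal sketch of that idea, not Envoy's code; the cap value and the "503" string are assumptions for illustration.

```python
import threading
from typing import Callable


class ConcurrencyBreaker:
    """Caps in-flight requests to one upstream and fails fast beyond the cap."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()  # callers may be on different threads

    def try_acquire(self) -> bool:
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False  # over the cap: shed load instead of queueing
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def call(self, upstream: Callable[[], str]) -> str:
        if not self.try_acquire():
            return "503 upstream overloaded"  # fail fast, don't pile on
        try:
            return upstream()
        finally:
            self.release()  # always free the slot, even if upstream raises
```

Failing fast is the point: when a slow upstream's slots are all taken, callers get an immediate error they can handle or retry elsewhere, rather than tying up their own threads and passing the slowdown further up the call chain.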

  • Murat continues his deep dive from his all-expenses-paid vacation at Microsoft. Logical index organization in Cosmos DB: Logical indexing is a specialized database topic. Does understanding this help me become a better distributed systems researcher? I would argue yes.  First of all, developing expertise in multiple branches, being a Pi-shaped academician, provides advantages. Aside from that, learning new things stretches your brain and makes it easier to learn other things.
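The core idea behind schema-agnostic logical indexing is to treat every JSON document as a tree and index each root-to-leaf path together with its value, so any field can be queried without declaring a schema. This toy sketch renders that idea as an inverted index from (path, value) terms to document ids. It is an illustration only, not Cosmos DB's actual path encoding or index structure.

```python
from collections import defaultdict
from typing import Any, Iterator


def index_terms(node: Any, path: str = "") -> Iterator[tuple[str, Any]]:
    """Walk a JSON document as a tree, yielding (path, leaf value) terms."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from index_terms(child, f"{path}/{key}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from index_terms(child, f"{path}/{i}")
    else:
        yield (path, node)  # JSON leaves (str, number, bool, None) are hashable


def build_index(docs: dict[str, Any]) -> dict[tuple[str, Any], set[str]]:
    """Inverted index from (path, value) term to the ids of matching documents."""
    index: dict[tuple[str, Any], set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in index_terms(doc):
            index[term].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {"d1": {"city": "Seattle", "tags": ["db", "cloud"]},
            "d2": {"city": "Redmond", "tags": ["db"]}}
    idx = build_index(docs)
    print(idx[("/tags/0", "db")])  # both documents share this term
```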

  • Speculation isn't just for instructions; Intel even applies it to the contents of its virtual memory page tables. Because processors are so much faster than RAM, several levels of cache sit between the processor and RAM, and those caches hold page table data too. The Foreshadow Flaw. Security Now episode 677. The result? More patches that slow everything down.
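To see what is being speculated on, recall how a virtual address is translated. With x86-64 4-level paging and 4 KiB pages (the assumption here; huge pages and 5-level paging are ignored), a 48-bit virtual address splits into four 9-bit table indexes plus a 12-bit page offset. Foreshadow (L1 Terminal Fault) targets the last step: when the final page-table entry is marked not-present, the CPU could still speculatively use the physical address bits in it to read data sitting in the L1 cache.

```python
def split_virtual_address(va: int) -> dict[str, int]:
    """Split a 48-bit x86-64 virtual address into 4-level page-walk indexes.

    Each table has 512 entries (9 bits); the low 12 bits are the offset
    within a 4 KiB page. Huge pages and 5-level paging are not modeled.
    """
    return {
        "pml4": (va >> 39) & 0x1FF,  # bits 47..39: top-level table index
        "pdpt": (va >> 30) & 0x1FF,  # bits 38..30: page-directory-pointer index
        "pd": (va >> 21) & 0x1FF,    # bits 29..21: page-directory index
        "pt": (va >> 12) & 0x1FF,    # bits 20..12: page-table index (leaf entry)
        "offset": va & 0xFFF,        # bits 11..0: byte offset within the page
    }


if __name__ == "__main__":
    print(split_virtual_address(0x00007F12_3456_789A))
```

Every translation is a chain of four dependent memory reads, which is exactly why the hardware caches page-table entries and why it is tempting to speculate on them.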

  • Eric Hammond with a lot of good tips on how to set up billing alerts on AWS. If you've ever faced the horror of paying a pretty penny for resources you forgot to close down (and who hasn't?), this is a very good thing to do.
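Eric Hammond's post has its own recommendations. Separately, and not necessarily his method, here is a minimal sketch of the most basic version: a CloudWatch alarm on the `AWS/Billing` `EstimatedCharges` metric, created with boto3's `put_metric_alarm`. It assumes billing alerts are enabled in your account's billing preferences (billing metrics live in us-east-1) and that you already have an SNS topic to notify. The topic ARN and alarm name below are hypothetical.

```python
def billing_alarm_params(threshold_usd: float, sns_topic_arn: str) -> dict:
    """Arguments for CloudWatch put_metric_alarm on estimated AWS charges."""
    return {
        "AlarmName": f"billing-over-{threshold_usd:g}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; billing data only updates a few times a day
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic that emails you
    }


def create_billing_alarm(threshold_usd: float, sns_topic_arn: str) -> None:
    """Create the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so billing_alarm_params works without boto3

    # Billing metrics are only published in us-east-1.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))


if __name__ == "__main__":
    # Hypothetical topic ARN; replace with your own before calling create_billing_alarm.
    print(billing_alarm_params(50, "arn:aws:sns:us-east-1:123456789012:billing-alerts"))
```

Because `EstimatedCharges` is month-to-date, a few alarms at increasing thresholds give you an early warning well before the bill arrives.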

  • facebookexperimental/FBHALE (article): FBHALE was developed to aid in the conceptual design of High Altitude Long Endurance (HALE) aircraft. By leveraging first order physical models for the various tightly coupled disciplines that drive HALE aircraft design, FBHALE allows for quick and accurate design space exploration. 

  • What You Should Know About Megaprojects and Why: An Overview: Sixth, it is shown how megaprojects are systematically subject to "survival of the unfittest," explaining why the worst projects get built instead of the best. Finally, it is argued that the conventional way of managing megaprojects has reached a "tension point," where tradition is challenged and reform is emerging.

  • PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database:  a distributed file system with ultra-low latency and high availability, designed for the POLARDB database service, which is now available on the Alibaba Cloud. PolarFS utilizes a lightweight network stack and I/O stack in user-space, taking full advantage of the emerging techniques like RDMA, NVMe, and SPDK. In this way, the end-to-end latency of PolarFS has been reduced drastically and our experiments show that the write latency of PolarFS is quite close to that of local file system on SSD.

  • Architecting Persistent Memory Systems: The imminent release of 3D XPoint memory by Intel and Micron looks set to end the long wait for affordable persistent memory. Persistent memories combine the persistence of disk with DRAM-like performance, blurring the traditional divide between a byte-addressable, volatile main memory and a block-addressable, persistent storage (e.g., SSDs). One of the most disruptive potential use cases for persistent memories is to host in-memory recoverable data structures. These recoverable data structures may be directly modified by programmers using user-level processor load and store instructions, rather than relying on performance sapping software intermediaries like the operating and file systems. Ensuring the recoverability of these data structures requires programmers to have the ability to control the order of updates to persistent memory.
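The ordering problem is easiest to see with a commit flag: write the data, make it durable, and only then set the flag that says the data is valid. If the flag can reach persistence before the data, a crash leaves a "valid" record full of garbage. On real persistent memory this ordering is enforced with cache-line write-back and fence instructions (on x86, e.g. CLWB followed by SFENCE). This sketch only mimics the pattern with a memory-mapped file, using `mmap.flush()` (msync) as a much coarser stand-in for those barriers; the header layout is an assumption for illustration.

```python
import mmap
import os
import struct
import tempfile
from typing import Optional

HEADER_SIZE = 8  # little-endian u64 "committed length"; 0 means no valid record


def persist_record(path: str, payload: bytes) -> None:
    """Write payload, make it durable, and only then publish it via the header."""
    size = HEADER_SIZE + len(payload)
    with open(path, "w+b") as f:
        f.truncate(size)  # zero-filled, so the header starts as "not committed"
        with mmap.mmap(f.fileno(), size) as m:
            m[HEADER_SIZE:size] = payload  # 1. data first
            m.flush()                      # barrier: data durable before the flag
            m[0:HEADER_SIZE] = struct.pack("<Q", len(payload))  # 2. commit flag
            m.flush()                      # barrier: flag durable


def recover(path: str) -> Optional[bytes]:
    """Return the committed payload, or None if nothing was committed."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < HEADER_SIZE:
        return None
    (length,) = struct.unpack("<Q", data[:HEADER_SIZE])
    if length == 0 or HEADER_SIZE + length > len(data):
        return None  # never committed, or torn
    return data[HEADER_SIZE:HEADER_SIZE + length]


def demo() -> Optional[bytes]:
    """Round-trip a record through a temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        persist_record(path, b"hello, persistent world")
        return recover(path)
    finally:
        os.remove(path)
```

A crash between the two flushes leaves the header at zero, so `recover` reports no record rather than a half-written one. Persistent memory brings this same reasoning down to individual load and store instructions.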