Facebook engineers are understandably obsessed with reliability and performance.
When it comes to infrastructure-intensive global enterprises, few companies occupy the extreme data-request processing category of Facebook. Imagine ensuring glitch-free, split-second responsiveness to 1.09 billion active daily users. Complicating the task of making sure that one of the world’s most popular mobile apps can reliably pull over a billion requests a second is the scattering of Facebook users throughout virtually all 195 nations on the planet.
The stakes are enormous — tremendous revenue, not to mention user satisfaction, are at risk.
That’s why Facebook’s crew of software engineers is tasked with the mission-critical responsibility of ensuring that all apps perform reliably and scale. It is also why they increasingly monitor the performance of Facebook’s entire Family of Apps on actual hardware: an estimated 2,000 real phones and tablets, not emulated ones, all hosted in the company’s mobile device laboratory.
Facebook’s App Dream Team
As the social networking behemoth continues its winning ways (Q1 2016 results smashed the street’s estimates with $5.38 billion in revenue and a tidy operating profit of $1.5 billion), Facebook is increasingly focused on two all-important objectives:
- Sustained market penetration on mobile devices through its brilliantly successful Family of Apps strategy
- Recruiting its next 1 billion users in developing nations (“rest of world”)
Both are fundamental to growing Average Revenue per User (ARPU), the metric that fuels Facebook’s bottom line.
Unlike many high-valuation acquisitions by the industry’s largest players (Microsoft, Yahoo!, eBay and HP, we’re talking about you), Facebook’s strategy of building an independent but complementary portfolio of apps is paying enormous short-term dividends, which seem likely to continue well into the future.
Facebook’s Mobile-First Growth Strategy
In August 2011, Facebook spun out its home-grown Facebook Messenger (600 million users) as a standalone app to elbow its way into the mobile chat sector. Then, leveraging its solid social networking status and gargantuan $300 billion market cap, Facebook bought up two best-of-breed apps to boost its mobile footprint: Instagram in April 2012 (300 million users) and WhatsApp in October 2014 (700 million users).
Collectively, this strategy has more than doubled Facebook’s aggregated number of daily active users, placing it second in mobile advertising revenue behind Google. Not bad for a company celebrating its 13th birthday next February 4th.
The payoff has been obvious. Facebook’s highly captive and very mobile audience has transformed the social network giant into an advertising cash cow. Some 894 million monthly active users engage with Facebook only on their smartphones, up from 723 million in 2015. During the same period, 82% of its advertising revenue was generated on mobile, where ad rates command a premium over those served to desktops. Overall advertising revenue rose in the first quarter to $5.2 billion, up from $3.3 billion a year earlier. User engagement continues to soar as well, with time spent on Facebook, Instagram and Messenger averaging more than 50 minutes a day.
Maximizing Mobile: Facebook’s present and future growth is almost entirely based on expanding mobile use of its Family of Apps and capturing users in developing countries. (Graph courtesy of Facebook, Inc.)
The stakes are enormous and Facebook is leaving nothing to chance.
Facebook’s Production Engineering Team: Delivering Mobile App Perfection
Prominent among the technology development and support groups within Facebook’s roughly 13,000-strong workforce is Production Engineering. This team is charged with the bulk of testing and analyzing app performance on mobile devices. Describing themselves as a “hybrid between software and systems engineering,” these developers and engineers write code and debug software for core infrastructure components and front-end services.
Production Engineering is tasked with the critical mission of ensuring Facebook apps perform well on virtually every model and configuration of device imaginable. Their mantra – guaranteeing all “services are reliable and scalable” – increasingly means monitoring and predicting the performance of each of the Family of Apps on actual hardware.
Getting Real: Testing on 2,000 Mobile Devices
Prior to building out the Mobile Device Testing Lab, engineers relied on a service called CT-Scan, which ran app code on a single generic hardware device sitting on their desks. Although this approach showed the general implications of specific code changes, CT-Scan didn’t scale and was deemed too narrow a window into real-world performance to be useful.
“We needed to be able to run tests on more than 2,000 mobile devices to account for all the combinations of device hardware, operating systems, and network connections that people use to connect on Facebook,” explains Antoine Reversat, a Production Engineer at the Prineville, Oregon, Data Center.
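To see why the device count climbs so quickly, consider a minimal sketch of the test matrix Reversat describes. The category values below are hypothetical placeholders, not Facebook’s actual device list, and the sketch assumes every device can be paired with every operating system and network, which real hardware does not allow.

```python
from itertools import product

# Hypothetical example values; the article does not list the real ones.
DEVICES = ["high_end_phone", "typical_phone", "tablet"]
OS_VERSIONS = ["os_v1", "os_v2"]
NETWORKS = ["wifi", "3g", "2g"]


def build_test_matrix(devices, os_versions, networks):
    """Return every (device, OS, network) combination as a list of dicts."""
    return [
        {"device": d, "os": o, "network": n}
        for d, o, n in product(devices, os_versions, networks)
    ]


matrix = build_test_matrix(DEVICES, OS_VERSIONS, NETWORKS)
print(len(matrix))  # 3 devices * 2 OS versions * 3 networks = 18
```

Even three short lists yield 18 configurations; each new device model or OS version multiplies the total rather than adding to it.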
Intent on meeting this challenge, engineers first arranged hundreds of iPhones and Android phones on wall panels at the company’s Menlo Park headquarters. After much trial and error, the fledgling program was relocated to the Cold Storage Building in Prineville in March of last year. Today, the Mobile Device Lab is buzzing with roughly 2,000 devices in 60 specially designed racks, each holding up to 32 smartphones, tablets and cellphones ranging from one year to more than five years old. To avoid interference with the Wi-Fi signal that delivers test data to the devices from the PCs and Mac Minis located underneath, the racks are enclosed in electromagnetic interference (EMI) cabinets.
Soon, cameras positioned directly above the devices will oversee app behavior on each screen to record rendering problems, performance issues and other anomalies. Until this capability is out of beta, performance data is non-visual and streamed from device to central servers for analysis.
Really Sold on Real-Time Device Testing: After experimenting with code performance monitoring software, mobile emulators, and generic production testing hardware, Facebook made the move to a device-based testing model similar to the services offered by Mobile1st. (Photo courtesy of Facebook, Inc.)
Facebook Says “No” to Mobile Emulators
Conducting mobile performance testing of this magnitude might seem a perfect use case for mobile emulators configured to imitate the behavior of Facebook’s Family of Apps. But the benefits of testing on real hardware go well beyond simple troubleshooting, and they proved too compelling to pass up.
“We need to understand the performance implications of a code change on both high-end and typical devices, as well as on a variety of operating systems. We have thousands of changes each week, and given the code intricacies of the Facebook app, we could inadvertently introduce regressions [or bugs] that take up more data, memory, or battery usage. We wouldn’t be able to track down a one-percent performance regression in a simulator. So we opted for on-device testing,” Reversat says.
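The kind of check Reversat alludes to can be sketched as a comparison between metric samples collected before and after a code change. This is only an illustration under stated assumptions: the function names and the sample numbers are hypothetical, the 1% threshold comes from the quote, and the article does not describe how Facebook actually performs the comparison. Real on-device measurements are noisy, so a production system would likely need many more samples and a proper statistical test.

```python
from statistics import mean


def relative_change(baseline, candidate):
    """Fractional change of the candidate mean relative to the baseline mean."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


def is_regression(baseline, candidate, threshold=0.01):
    """Flag a regression when the metric grows by at least `threshold` (1%)."""
    return relative_change(baseline, candidate) >= threshold


# Hypothetical memory readings (MB) from the same device, before and after a change.
before = [100.0, 102.0, 98.0]
after = [101.5, 103.0, 100.0]

print(is_regression(before, after))
```

Here a higher value is assumed to be worse (more memory, data, or battery); a metric where higher is better would need the sign flipped.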
Facebook, just like the many QA and Web Dev teams employing Mobilizer, has embraced mobile device labs to test and monitor its apps’ performance on a broad range of actual devices. Even if you’re not rolling out hundreds of thousands of lines of production code on a weekly basis for apps used by over 1.6 billion people, the bottom-line and peace-of-mind benefits are the same.
“When a developer makes a change to one of the mobile applications, we take that change, we build the app with the change, and then we install it on one of the devices that are here and we run the app while collecting metrics,” adds Reversat.
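The four steps in that quote (take the change, build, install, run while collecting metrics) map naturally onto a small pipeline. The sketch below is an assumption-laden illustration, not Facebook’s tooling: every name in it is hypothetical, and the build, install, and run steps are passed in as plain functions so that stand-ins can replace real build systems and devices.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RunResult:
    change_id: str
    device_id: str
    metrics: Dict[str, float]


def run_change_on_device(
    change_id: str,
    device_id: str,
    build: Callable[[str], str],
    install: Callable[[str, str], None],
    run_and_collect: Callable[[str], Dict[str, float]],
) -> RunResult:
    """Build the app with the change, install it, run it, and collect metrics."""
    app_binary = build(change_id)          # step 1: build the app with the change
    install(device_id, app_binary)         # step 2: install it on a lab device
    metrics = run_and_collect(device_id)   # step 3: run the app while collecting metrics
    return RunResult(change_id, device_id, metrics)


# Stand-in steps so the sketch runs without real hardware (all hypothetical).
installed = {}


def fake_build(change_id):
    return f"app-{change_id}.bin"


def fake_install(device_id, binary):
    installed[device_id] = binary


def fake_run(device_id):
    return {"startup_ms": 850.0, "data_kb": 120.0, "memory_mb": 95.0}


result = run_change_on_device("change-123", "device-07", fake_build, fake_install, fake_run)
print(result.metrics["startup_ms"])
```

Passing the steps in as arguments keeps the orchestration logic testable on a laptop while the real implementations talk to actual devices.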
Facebook’s Four Domains of QA Testing
- Change Management: When rolling out thousands of changes each week on each mobile platform, it’s necessary to thoroughly test the impact of each line of code on a full range of mobile devices.
- Troubleshooting: Given the complexity of Facebook’s apps, it is relatively easy for a seemingly innocuous change to slow down an interaction, or use more data, memory, or battery.
- Performance Monitoring and Management: We care about app speed, data usage, and battery efficiency.
- Optimize Workflow: We iterate at a very fast pace. We want to build a system to maintain or improve our development speed while minimizing the number of regressions and glitches in such dimensions of performance as speed, data usage, battery consumption, and memory footprint.
At the heart of Facebook’s quest for engineering excellence is the unending pursuit of ever-higher levels of productivity and precision.
“The whole point is to go fast; we have to get better quickly,” says Ken Patchett, Facebook’s Director of Western Data Center Operations.
Don’t have a few hundred million lying around to build your own device testing lab?
Increase revenue and customer engagement with Mobile1st. Easily identify display issues, monitor mobile performance metrics, and optimize the customer experience.
Top image courtesy of Flickr and Jeremy Keith