Facebook has been an industry leader in building its Internet infrastructure for scalability. That includes the scalability of the people who work in the company’s data centers.
With more than 900 million active users, Facebook is the busiest site on the Internet and has built an extensive infrastructure to support this rapid growth. The social networking site launched in February 2004 from founder Mark Zuckerberg’s dorm room at Harvard University, running on a single server. The company’s web servers and storage units are now housed in data centers around the country.
Each data center houses thousands of computer servers, which are networked together and linked to the outside world through fiber optic cables. Every time you share information on Facebook, the servers in these data centers receive the information and distribute it to your network of friends.
How Big is Facebook’s Internet Infrastructure?
Facebook requires massive storage infrastructure to house its enormous stockpile of photos, which grows steadily as users add 300 million new photos every day. In addition, the company’s infrastructure must support platform services for more than 1 million web sites and 550,000 applications using the Facebook Connect platform.
To support all that activity, Facebook has built two massive data centers, has two more under construction, and leases additional server space in at least nine data centers on both coasts of the United States. More than 70 percent of Facebook’s audience is in other countries, prompting Facebook to announce its first non-U.S. data center in Lulea, Sweden.
The company’s massive armada of servers and storage must work together seamlessly to deliver each Facebook page. “Loading a user’s home page typically requires accessing hundreds of servers, processing tens of thousands of individual pieces of data, and delivering the information selected in less than one second,” the company said.
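That page-assembly pattern is a classic scatter-gather: fan a request out to many backends in parallel, then collect the fragments within a deadline. Here is a minimal sketch of the idea in Python; the service names and `fetch_fragment` helper are hypothetical illustrations, not Facebook’s actual internals.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_fragment(service: str, user_id: int) -> tuple:
    # Hypothetical stand-in: in reality each call would hit a different
    # backend service (news feed, friend lists, photos, ads, ...).
    return service, f"{service}-data-for-{user_id}"

def load_home_page(user_id: int, services: list, timeout: float = 1.0) -> dict:
    """Fan out to all backends in parallel and gather results
    within a single page-load deadline."""
    page = {}
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = {pool.submit(fetch_fragment, s, user_id): s for s in services}
        for fut in as_completed(futures, timeout=timeout):
            service, data = fut.result()
            page[service] = data
    return page

page = load_home_page(42, ["feed", "friends", "photos", "ads"])
```

The deadline matters as much as the parallelism: delivering the page “in less than one second” means slow backends must be bounded, not waited on indefinitely.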
For most of its history, Facebook has managed its infrastructure by leasing “wholesale” data center space from third-party landlords. Wholesale providers build the data center, including the raised-floor technical space and the power and cooling infrastructure, and then lease the completed facility. In the wholesale model, users can occupy their data center space in about five months, rather than the 12 months needed to build a major data center. This has allowed Facebook to scale rapidly to keep pace with the growth of its audience.
Where are Facebook’s Data Centers Located?
In January 2010 Facebook announced plans to build its own data centers, beginning with a facility in Prineville, Oregon. This typically requires a larger up-front investment in construction and equipment, but allows greater customization of power and cooling infrastructure. The social network has since announced plans for data centers in Forest City, North Carolina (November 2010) and Lulea, Sweden (October 2011). The company has brought the facilities in Prineville and North Carolina online, and begun work on second data centers on both campuses.
Facebook currently leases space in about six different data centers in Silicon Valley, located in Santa Clara and San Jose, and at least one in San Francisco. The company has also leased space in three wholesale data center facilities in Ashburn, Virginia. Both Santa Clara and Ashburn are key data center hubs, where hundreds of fiber networks meet and connect, making them ideal for companies whose content is widely distributed.
If Facebook’s growth continues at the current rate, it will likely require a larger network of company-built data centers, as seen with Google, Microsoft, Yahoo and eBay.
“When Facebook first began with a small group of people using it and no photos or videos to display, the entire service could run on a single server,” said Jonathan Heiliger, Facebook’s vice president of technical operations.
Not so anymore. Facebook doesn’t say how many web servers it uses to power its infrastructure. Technical presentations by Facebook staff suggest that as of June 2010 the company was running at least 60,000 servers in its data centers, up from 30,000 in 2009 and 10,000 back in April 2008.
What kind of servers does Facebook use?
Facebook’s servers are powered by chips from both Intel and AMD, with custom-designed motherboards and chassis built by Quanta Computer of Taiwan. The servers use a 1.5U (2.65 inch) chassis, allowing the use of larger heat sinks and fans to improve cooling efficiency.
The cabling and power supplies are located on the front of the servers, so Facebook staff can work on the equipment from the cold aisle rather than the enclosed, 100-degree-plus hot aisle.
Facebook’s servers include custom power supplies that allow servers to use 277-volt AC power instead of the traditional 208 volts. This allows power to enter the building at 400/277 volts and come directly to the server, bypassing the step-downs seen in most data centers as the power passes through UPS systems and power distribution units (PDUs). The custom power supplies were designed by Facebook and built by Delta Electronics of Taiwan and California-based Power One.
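The payoff from skipping conversion steps is multiplicative: each transformation stage loses a small fraction of the power, and the losses compound. The figures below are illustrative assumptions for the sake of the arithmetic, not Facebook’s published numbers.

```python
def end_to_end_efficiency(stage_efficiencies: dict) -> float:
    """Multiply per-stage efficiencies to get the power actually
    reaching the server as a fraction of utility power."""
    eff = 1.0
    for stage_eff in stage_efficiencies.values():
        eff *= stage_eff
    return eff

# Traditional chain: double-conversion UPS plus a PDU transformer
# stepping 480V down to 208V (illustrative loss figures).
traditional = end_to_end_efficiency({
    "double-conversion UPS": 0.94,
    "PDU transformer 480V -> 208V": 0.98,
})

# Facebook-style chain: 277V delivered straight to the server's
# power supply, with only minor conditioning losses (assumed).
direct_277v = end_to_end_efficiency({
    "line conditioning": 0.995,
})
```

Even with generous assumptions for the traditional chain, the direct-to-server design recovers several percent of the facility’s power, which is significant at data-center scale.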
Facebook contemplated installing on-board batteries on its servers, but settled on in-row UPS units. Each UPS system houses 20 batteries, arranged in five strings of 48-volt DC batteries. Facebook’s power supplies include two connections, one for AC utility power and another for the DC-based UPS system. The company has systems in place to manage surge suppression and deal with harmonics (current irregularities).
What kind of software does Facebook Use?
Facebook was developed from the ground up using open source software. The site is written primarily in the PHP programming language and uses a MySQL database infrastructure. To accelerate the site, the Facebook Engineering team developed a program called HipHop to transform PHP source code into C++ and gain performance benefits.
Facebook has one of the largest MySQL database clusters anywhere, and is the world’s largest user of memcached, an open source caching system. Memcached was an important enough part of Facebook’s infrastructure that CEO Mark Zuckerberg gave a tech talk on its usage in 2009.
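The standard way a site like this pairs memcached with MySQL is the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache for subsequent reads. The sketch below illustrates the pattern in Python with a tiny in-process stand-in for a memcached client (so it runs without a server); the `query_db` helper and key format are hypothetical.

```python
import time

class MiniCache:
    """Tiny in-process stand-in for a memcached client (get/set with TTL)."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires < time.monotonic():
            del self._store[key]   # lazily expire stale entries
            return None
        return value

    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.monotonic() + ttl)

cache = MiniCache()

def query_db(user_id):
    # Hypothetical stand-in for a MySQL lookup.
    return {"id": user_id, "name": f"user{user_id}"}

def get_user(user_id):
    """Cache-aside read: cache hit avoids the database entirely."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = query_db(user_id)
        cache.set(key, user, ttl=300)
    return user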
Facebook has built a framework that uses RPC (remote procedure calls) to tie together infrastructure services written in any language, running on any platform. Services used in Facebook’s infrastructure include Apache Hadoop, Apache Cassandra, Apache Hive, FlashCache, Scribe, Tornado, Cfengine and Varnish.
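The key idea behind such an RPC framework is a language-neutral wire format: the client serializes a method name and parameters, and the server, in whatever language, deserializes, dispatches, and replies the same way. Below is a minimal in-process sketch of that idea using JSON in Python; real frameworks of this kind add interface definitions, binary encodings, and network transports.

```python
import json

# Server side: a registry mapping method names to implementations.
_handlers = {}

def rpc_method(fn):
    """Register a function as callable over RPC."""
    _handlers[fn.__name__] = fn
    return fn

@rpc_method
def add(a, b):
    return a + b

def handle_request(raw: bytes) -> bytes:
    """Decode a language-neutral request, dispatch it, encode the reply."""
    req = json.loads(raw)
    result = _handlers[req["method"]](*req["params"])
    return json.dumps({"result": result}).encode()

# Client side: serialize the call the same way, regardless of the
# language the server happens to be written in.
def call(method, *params):
    raw = json.dumps({"method": method, "params": list(params)}).encode()
    return json.loads(handle_request(raw))["result"]
```

Because both sides only agree on the serialized format, a Python client can just as easily call a C++ or Java service, which is exactly what lets Facebook mix languages across its infrastructure services.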
Data center operations is a critical skill at Facebook, which now has 1.15 billion users, including 720 million who log in daily. Each day, Facebook users share 4.75 billion content items and “like” 4.5 billion items. The company now stores more than 240 billion photos, and adds 7 petabytes of photo storage each month.
To manage all that activity, Facebook has developed software to automate many aspects of data center operations. That includes software known as CYBORG, which detects problems with servers and attempts to fix the problems. If CYBORG exhausts automated repair options, it will send an alert to the ticketing system to dispatch a data center staffer to investigate the issue.
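CYBORG’s internals aren’t public, but the escalation logic the article describes maps onto a simple loop: try a sequence of automated repairs, and only open a ticket for a human if every repair fails. The sketch below is a hypothetical illustration of that loop; the repair actions are stand-ins.

```python
# Stand-in repair actions; real ones would run remote commands
# against the server and then re-probe its health.
def soft_reboot(host):
    return False   # pretend the reboot did not fix the problem

def reimage(host):
    return False   # pretend reimaging did not fix it either

REPAIRS = [soft_reboot, reimage]   # tried in order, cheapest first
tickets = []                       # stand-in for the ticketing system

def remediate(host: str) -> str:
    """Attempt automated repairs; escalate to a human only if all fail."""
    for repair in REPAIRS:
        if repair(host):
            return f"{host}: fixed by {repair.__name__}"
    tickets.append(host)   # dispatch a data center technician
    return f"{host}: escalated to ticket"

status = remediate("web1234")
```

Ordering the repairs cheapest-first keeps most incidents fully automated, which is what makes the “don’t dispatch a technician unless hands are needed” goal achievable.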
“Our goal is not to deploy a technician to the data center floor unless they actually have to physically handle a server,” said Delfina Eberly, Facebook’s director of data center operations.