Google platform
- For the company, see Google, Inc.; for the search engine see Google search; for other uses see Google (disambiguation).
Google, as one of the most popular Internet search engines, requires large computational resources to provide its service. This article describes Google's technological infrastructure, as presented in the company's public announcements.
Network topology
Google maintains an estimated 450,000 servers, arranged in racks located in clusters in cities around the world, with major centers in Mountain View, California; Virginia; Atlanta, Georgia; Dublin, Ireland; and a new facility constructed in 2006 in The Dalles, Oregon. (Carr, David F. "[How Google Works]." [Baseline Magazine]. July 6, 2006. Retrieved on July 10, 2006.) When a user attempts to connect to Google, Google's DNS servers perform load balancing so that the user can access Google's content as quickly as possible. They do this by returning the IP address of a cluster that is not under heavy load and is geographically close to the user. Each cluster has a few thousand servers, and once a connection to a cluster is made, hardware in the cluster performs further load balancing to send each query to the least loaded Web server.
Racks are custom-made and contain 40 to 80 servers (20 to 40 1U servers on either side); newer servers are 2U rackmount systems. [citation needed] Each rack has a switch. Servers are connected to the local switch via a 100 Mbit/s Ethernet link. Switches are connected to a core gigabit switch using one or two gigabit uplinks. [citation needed]
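The cluster selection performed at the DNS level can be illustrated with a minimal sketch. It is not Google's implementation: the `Cluster` type, the `pick_cluster` function, the 0.8 load threshold, the load figures, and the placeholder IP addresses are all assumptions made for illustration. The sketch simply returns the nearest cluster that is not under heavy load.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt


@dataclass
class Cluster:
    name: str
    ip: str        # placeholder address from the documentation range
    lat: float
    lon: float
    load: float    # assumed metric: fraction of capacity in use, 0.0 to 1.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def pick_cluster(clusters, user_lat, user_lon, max_load=0.8):
    """Return the nearest cluster whose load is below max_load.

    Falls back to the least-loaded cluster if every cluster is busy.
    """
    available = [c for c in clusters if c.load < max_load]
    if not available:
        return min(clusters, key=lambda c: c.load)
    return min(available, key=lambda c: distance_km(user_lat, user_lon, c.lat, c.lon))


# Hypothetical load figures; only the city names come from the article.
clusters = [
    Cluster("mountain-view", "192.0.2.10", 37.39, -122.08, load=0.95),
    Cluster("atlanta", "192.0.2.20", 33.75, -84.39, load=0.40),
    Cluster("dublin", "192.0.2.30", 53.35, -6.26, load=0.30),
]

# A user in San Francisco: the closest cluster is overloaded,
# so the next nearest cluster is returned instead.
chosen = pick_cluster(clusters, 37.77, -122.42)
print(chosen.name, chosen.ip)
```

In this example the Mountain View cluster is skipped because its assumed load exceeds the threshold, and the Atlanta address is returned. A real DNS-based balancer would also have to account for caching of DNS answers, which this sketch ignores.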
Main index
Since queries are composed of words, an inverted index of documents is required. Such an index maps each query word to the list of documents that contain it. Because of the number of documents stored on the servers, the index is very large and must be split into "index shards". Each shard is hosted by a set of index servers. The load balancer decides which index server to query based on the availability of each server.
Server types
Google's server infrastructure is divided into several types, each assigned to a different purpose:
- Google Web Servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
- Data-gathering servers are permanently dedicated to spidering the Web. They update the index and document databases and apply Google's algorithms to assign ranks to pages.
- Index servers each contain a set of index shards. They return a list of document IDs ("docid"), such that documents corresponding to a certain docid contain the query word. These servers need less disk space, but suffer the greatest CPU workload.
- Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
- Spelling servers make suggestions about the spelling of queries.
- See also: List of Google server types
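The role of the index shards and index servers can be illustrated with a small sketch of a sharded inverted index. The class names, the assignment of documents to shards by docid modulo the shard count, and the sample documents are assumptions for illustration only; the article does not say how Google partitions its index.

```python
from collections import defaultdict


class IndexShard:
    """Holds the inverted index for a subset of the documents."""

    def __init__(self):
        self.postings = defaultdict(set)   # word -> set of docids

    def add(self, docid, text):
        for word in text.lower().split():
            self.postings[word].add(docid)

    def lookup(self, word):
        # .get() avoids creating empty entries for unknown words.
        return self.postings.get(word.lower(), set())


class ShardedIndex:
    def __init__(self, num_shards):
        self.shards = [IndexShard() for _ in range(num_shards)]

    def add(self, docid, text):
        # Assumption: documents are assigned to shards by docid modulo shard count.
        self.shards[docid % len(self.shards)].add(docid, text)

    def search(self, query):
        """Return the docids of documents containing every query word."""
        words = query.lower().split()
        if not words:
            return []
        hits = set()
        for shard in self.shards:
            # Each shard intersects the posting lists for its own documents;
            # the per-shard results are then merged.
            hits |= set.intersection(*(shard.lookup(w) for w in words))
        return sorted(hits)


index = ShardedIndex(num_shards=2)
docs = {
    1: "google cluster architecture",
    2: "linux cluster servers",
    3: "google web servers",
    4: "inverted index shards",
}
for docid, text in docs.items():
    index.add(docid, text)

print(index.search("cluster"))
print(index.search("google servers"))
```

Here each shard answers only for its own documents, and the caller merges the partial docid lists, which mirrors the merging step attributed to the Google Web Servers above. Ranking, summaries, spelling suggestions, and advertisements are left out.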
Server hardware and software
Original hardware
The original hardware (ca. 1998) used by Google when it was located at Stanford University included the following ("[Google Stanford Hardware]." Stanford University, provided by Internet Archive. Retrieved on July 10, 2006):
- Sun Ultra II with dual 200 MHz processors and 256 MB of RAM. This was the main machine for the original Backrub system.
- 2 x 300 MHz Dual Pentium II servers donated by Intel, with 512 MB of RAM and 9 x 9 GB hard drives between the two. The main search ran on these.
- F50 IBM RS/6000 donated by IBM, with 4 processors, 512 MB of memory and 8 x 9 GB hard drives.
- Two additional boxes with 3 x 9 GB hard drives and 6 x 4 GB hard drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
- IBM disk expansion box with another 8 x 9 GB hard drives, donated by IBM.
- Homemade disk box containing 10 x 9 GB SCSI hard drives.
Current hardware
Servers are commodity-class x86 PCs running customized versions of GNU/Linux. The goal is to purchase CPU generations that offer the best performance per unit of power, not the best absolute performance. Other than the wages bill, the biggest cost that Google faces is electric power consumption. Estimates of the power required for over 250,000 servers range upwards of 20 megawatts, which could cost on the order of US$1–2 million per month in electricity charges. For this reason, the Pentium III has been the most favoured processor, but this could change in the future as processor manufacturers are increasingly limited by the power consumption of their devices.
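The monthly electricity estimate can be checked with a short calculation. The electricity prices used below (US$0.07 to US$0.14 per kWh) are an assumption chosen to illustrate the range; the article gives only the power draw and the resulting cost. The function names are likewise illustrative.

```python
HOURS_PER_MONTH = 24 * 30   # 720 hours in a 30-day month


def monthly_power_cost(megawatts, usd_per_kwh):
    """Monthly electricity cost in US dollars for a constant load."""
    kwh = megawatts * 1000 * HOURS_PER_MONTH
    return kwh * usd_per_kwh


def watts_per_server(megawatts, servers):
    """Average power draw per server, in watts."""
    return megawatts * 1_000_000 / servers


# Assumed price range; not taken from the article.
for price in (0.07, 0.14):
    cost = monthly_power_cost(20, price)
    print(f"${cost:,.0f} per month at ${price:.2f}/kWh")

print(f"{watts_per_server(20, 250_000):.0f} W per server")
```

A constant 20 MW load consumes 14.4 million kWh in a 30-day month, so the assumed prices give roughly US$1 million to US$2 million per month, in line with the figure quoted above. Spread over 250,000 servers, 20 MW corresponds to an average of 80 W per server.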
Published specifications:
- Over 250,000 servers, ranging from 533 MHz Intel Celeron to dual 1.4 GHz Intel Pentium III (as of 2005)
- One or more 80 GB hard disks per server (2003)
- 2–4 GiB memory per machine (2004)
Project 02
Google is currently developing a supercomputer at a data center located in the town of The Dalles, Oregon, on the Columbia River, approximately 80 miles from Portland. The project, codenamed Project 02, is expected to add substantially to its current global network, which is capable of processing billions of search queries per day and supports a growing repertoire of other services. (Markoff, John; Hansell, Saul. "[Google's quasi-secret power play]." San Diego Union Tribune. June 14, 2006. Retrieved on July 10, 2006.) The new complex is approximately the size of two football fields, with cooling towers four stories high. The project has already created hundreds of jobs in the area, mainly construction jobs at this point, with an expected 60 to 200 permanent positions later this year. Real estate prices in the small town of 12,000 have also increased by 40%.
Server operation
Most operations are read-only. When an update is required, queries are redirected to other servers, so as to simplify consistency issues. Queries are divided into sub-queries, which may be sent to different ducts in parallel, thus reducing latency.
To avoid the effects of unavoidable hardware failure, data stored in the servers may be mirrored using hardware RAID. Software is also designed to be fault tolerant. Thus, when a system goes down, the data is still available on other servers, which increases reliability.
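The combination of parallel sub-queries and replicated data can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Replica` class, the `ServerDown` exception, the use of a thread pool, and the sample data are assumptions for illustration, not a description of Google's software.

```python
from concurrent.futures import ThreadPoolExecutor


class ServerDown(Exception):
    """Raised when a (simulated) server does not answer."""


class Replica:
    """One server holding a copy of a shard's data (hypothetical)."""

    def __init__(self, docs, up=True):
        self.docs = docs   # docid -> text
        self.up = up

    def search(self, word):
        if not self.up:
            raise ServerDown()
        return [docid for docid, text in self.docs.items() if word in text.split()]


def query_shard(replicas, word):
    """Try each replica of a shard in turn until one answers."""
    for replica in replicas:
        try:
            return replica.search(word)
        except ServerDown:
            continue
    raise ServerDown("no replica available for this shard")


def parallel_search(shards, word):
    """Send the sub-query to every shard at once and merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = pool.map(lambda replicas: query_shard(replicas, word), shards)
        return sorted(docid for hits in partial for docid in hits)


shard_a = {1: "google cluster", 2: "linux kernel"}
shard_b = {3: "cluster hardware", 4: "web search"}
shards = [
    [Replica(shard_a, up=False), Replica(shard_a)],   # first copy is down
    [Replica(shard_b), Replica(shard_b)],
]
print(parallel_search(shards, "cluster"))
```

Because each shard is queried concurrently, the overall latency is close to that of the slowest shard rather than the sum of all of them. The failed replica of the first shard is skipped, and the query still returns results from the remaining copy.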
References
External links
- [Google Supercomputer]
- [The Google Linux Cluster] (video about Google's Linux cluster)
- [Web Search for a Planet: The Google Cluster Architecture] (Luiz André Barroso, Jeffrey Dean, Urs Hölzle)
- [How Google Works]
From Wikipedia, the Free Encyclopedia.
All text is available under the terms of the GNU Free Documentation License. See Wikipedia Copyrights for details.
