Cloud Computing, Big Data, & CDN Emerging Technologies
May 28, 2016 | Author: Adrian Lopez | Category: N/A
Cloud Computing
Cloud Introduction 1
Cloud Computing What does Cloud Computing do? • Provides online data storage • Enables configuration and accessing of online applications • Provides a variety of software usage • Provides computing platform and computing infrastructure
2
Cloud Computing Application Example • Using Gmail on my smartphone to check e-mails • Receive an e-mail with an MS PowerPoint attachment • However, MS PowerPoint and Windows OS are not installed on my smartphone! • Google Docs, Sheets, and Slides (part of the Google Drive service) can be used to open the file
3
Cloud Computing What is a Cloud? • Cloud can provide services through a public or private Network or the Internet, where the service hosting system is at a remote location • Cloud can support various applications • E-mail, Web Conferencing, Games, Database Management, CRM (Customer Relationship Management), etc.
4
Cloud Computing Cloud Models
5
Cloud Computing Cloud Models
• Public Cloud
  ˗ Enables public systems and service access
  ˗ Open architecture (e.g., e-mail)
  ˗ Could be less secure due to openness
• Private Cloud
  ˗ Enables service access within an organization
  ˗ Due to its private nature, it is more secure
6
Cloud Computing Cloud Models
• Community Cloud
  ˗ Cloud accessible by a group of organizations
• Hybrid Cloud
  ˗ Hybrid Cloud = Public Cloud + Private Cloud
  ˗ Private cloud supports critical activities
  ˗ Public cloud supports non-critical activities
7
Cloud Computing Cloud Service Models
Each lower service model supports the management, computing power, and security of the service model above it
• SaaS: Software as a Service
• PaaS: Platform as a Service
• IaaS: Infrastructure as a Service
8
Cloud Computing Software as a Service (SaaS) • Provides a variety of software applications as a service to end users
Platform as a Service (PaaS) • Provides a program executable platform for applications, development tools, etc.
Infrastructure as a Service (IaaS) • Provides the fundamental computing and security resources for the entire cloud • Backup storage, computing power, VM (Virtual Machines), etc.
9
Cloud Computing Cloud Service Models
• There are many other service models
• XaaS = Anything as a Service
  ˗ NaaS → N for Network as a Service
  ˗ DaaS → D for Database as a Service
  ˗ BaaS → B for Business as a Service
  ˗ etc.
10
Cloud Computing Cloud Benefits
11
Cloud Computing Characteristics
12
Cloud Computing
REFERENCES 13
References • K. Kumar and Y. H. Lu, “Cloud Computing for Mobile Users: Can Offloading Computation Save Energy?,” Computer, vol. 43, no. 4, pp. 51–56, Apr. 2010. • Wikipedia, http://www.wikipedia.org • Apple, iCloud, https://www.icloud.com • Google, Google Cloud, https://cloud.google.com/products [Accessed June 1, 2015] • Virtualization, Cisco’s IaaS cloud, http://www.virtualization.co.kr/data/file/01_2/1889266503_6f489654_1.jpg [Accessed June 1, 2015] • Tutorialspoint, Cloud computing, http://www.tutorialspoint.com/cloud_computing/cloud_computing_tutorial.pdf [Accessed June 1, 2015]
14
References Image sources • AWS Simple Icons Storage Amazon S3 Bucket with Objects, By Amazon Web Services LLC [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons • iCloud Logo, By EEIM (Own work) [Public domain], via Wikimedia Commons • MobileMe Logo, By Apple Inc. [Public domain], via Wikimedia Commons
15
Cloud Computing
Cloud Service Models 16
IaaS IaaS (Infrastructure as a Service) • Infrastructure support over the Internet • Cloud’s Computing & Storage Resources • Computing Power • Storage Services • Software Packages & Bundles • VLAN (Virtual Local Area Network) • VM (Virtual Machine) Features
18
IaaS VM (Virtual Machine) Administration • IaaS enables control of computing resources through Administrative Access to VMs → Server Virtualization features • Access to computing resources is enabled by Administrative Access to VMs • VM Administrative Command examples • Save data on cloud server • Start web server • Install new application
19
IaaS IaaS Procedures
20
IaaS IaaS Benefits • Flexible and Efficient Renting of Computer & Server Hardware • Rentable Resources • VM, Storage, Bandwidth, IP Addresses, Monitoring Services, Firewalls, etc. • Rent Payment Basis • Resource type • Usage time • Service packages
21
IaaS IaaS Benefits • Portability & Interoperability with Legacy Applications • Enables portability based on infrastructure resources that are used through Internet connections • Enables a method to maintain interoperability with legacy applications and workloads between IaaS clouds
22
PaaS PaaS (Platform as a Service) • Provides development & deployment tools for application development • Provides runtime environment for apps.
23
PaaS PaaS Types • Application Delivery-Only Environment • Provides on-demand scaling & application security • Stand-Alone Development Environment • Provides an independent platform for a specific function • Open Platform as a Service • Provides open source software to run applications for PaaS providers • Add-On Development Facilities • Enables customization to the existing SaaS platforms
25
PaaS PaaS Benefits
26
PaaS Benefits • Lower Administrative Overhead • User does not need to be involved in any administration of the platform • Lower Total Cost of Ownership • User does not need to purchase any hardware, memory, or server
27
PaaS Benefits • Scalable Solutions • Resources are scaled up or down automatically based on the application's resource demand • More Current System Software • The cloud provider maintains software upgrades & patch installations
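A minimal sketch of demand-based scaling, under stated assumptions: the function name `desired_instances`, the per-instance capacity, and the instance limits are all hypothetical and do not reflect any particular PaaS provider's scaling policy. It only shows the idea of deriving a resource count from current demand.

```python
from math import ceil

def desired_instances(requests_per_s, capacity_per_instance, min_n=1, max_n=10):
    """Instances needed to serve the current load, clamped to [min_n, max_n]."""
    needed = ceil(requests_per_s / capacity_per_instance)
    return max(min_n, min(max_n, needed))

# Hypothetical load levels; each instance is assumed to handle 100 requests/s.
for load in (0, 250, 5000):
    print(load, "requests/s ->", desired_instances(load, 100), "instance(s)")
```

The lower bound keeps the application reachable when idle, and the upper bound caps cost; a real platform would typically also smooth the load signal before acting on it.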
28
SaaS SaaS (Software as a Service) • Provides software applications as a service to the user • Software that is deployed on a cloud server which is accessible through the Internet
29
SaaS Characteristics • On Demand Availability • Cloud software is available anywhere that the cloud is reachable via Internet • Easy Maintenance • No user software upgrade or maintenance needed → All supported by the cloud • Flexible Scale Up or Scale Down • Centralized Management & Data
30
SaaS Characteristics • Enables a Shared Data Model • Multiple users can share a single data model and database • Cost Effectiveness • Pay based on usage • No risk in buying the wrong software • Multitenant Programming Solutions • All programmers are guaranteed to use the same software version → No version mismatch problems
31
Software-as-a-service Open SaaS Applications
32
Cloud Computing
Cloud Services 36
Cloud Services Google Cloud
• Google App Engine
  ˗ Released as a preview in April 2008
  ˗ PaaS (Platform as a Service) for web applications
  ˗ Provides automatic scaling based on resource demands and server load
• Google Cloud Storage
  ˗ Launched in May 2010
  ˗ Online file storage service
37
Cloud Services Google Cloud
• Google BigQuery
  ˗ Released in April 2012
  ˗ Data analysis tool that uses SQL-like queries to process big datasets in seconds
• Google Compute Engine
  ˗ Released in June 2012
  ˗ IaaS (Infrastructure as a Service) support to enable on demand launching of VMs (Virtual Machines)
38
Cloud Services Google Cloud
• Google Cloud Endpoints
  ˗ Released in November 2013
  ˗ Tool to create services inside App Engine
  ˗ Easily connects from Android, iOS, and JavaScript clients
• Google Cloud DNS (Domain Name System)
  ˗ DNS service supported by the Google Cloud
39
Cloud Services Google Cloud
• Google Cloud Datastore
  ˗ NoSQL (non-relational) data storage
• Google Cloud SQL (Structured Query Language)
  ˗ Released in February 2014 as GA (General Availability)
  ˗ Fully managed MySQL database
40
Cloud Services Amazon S3 (Simple Storage Service) • Online file storage web service offered by Amazon Web Services • Public web service released in the United States in March 2006 and in Europe in November 2007 • Provides storage through web services interfaces (REST, SOAP, and BitTorrent)
41
Cloud Services Amazon Cloud Drive
• Amazon Cloud Drive was released in March 2011
• Web storage application from Amazon
• Storage Space Characteristics
  ˗ Can be accessed from up to eight specific devices (e.g., mobile devices & different computers) and by using different browsers on the same computer
42
Cloud Services Amazon Cloud Drive
• Cloud Player (Originally bundled)
  ˗ Users can play music in their Cloud Drive from any computer or Android device
  ˗ Music browsing based on song titles, albums, artists, genres (website only), and playlists
43
Cloud Services Amazon Cloud Drive Options
• Unlimited Photos
  ˗ Unlimited storage for photos & raw data files
  ˗ 5 gigabytes of video storage
• Unlimited Everything
  ˗ Unlimited storage for photos, videos, documents, and various file types
44
Cloud Services iCloud
• Developed by Apple, Inc.
• Public release in October 2011
• Cloud Storage & Cloud Computing
• Operating system
  ˗ OS X (10.7 Lion or later)
  ˗ Microsoft Windows 7 or later
  ˗ iOS 5 or later
45
Cloud Services iCloud replaces MobileMe • Subscription-based collection of Apple’s online services and software • MobileMe was replaced by iCloud • MobileMe ceased services in June 2012 • MobileMe users were allowed transfers to iCloud until July 2012
46
Cloud Services iCloud Features
• Email, Contacts, and Calendars
• Find My Friends
• Backup & Restore
  ˗ Backup feature for device settings & data
  ˗ iOS 5 or later required
• Find My iPhone
  ˗ Enables a user to track the location of an iOS device or Mac
  ˗ Formerly a feature of MobileMe
47
Cloud Services iCloud Features
• Can manage lost or stolen Apple devices
• Back to My Mac
  ˗ Enables remote log in to other computers that have Back to My Mac installed (using the same Apple ID)
• iWork for iCloud
  ˗ Apple's iWork suite (Pages, Numbers, and Keynote) made available on a web interface
48
Cloud Services iCloud Features
• Photo Stream
  ˗ Can store most recent 1,000 photos
  ˗ Free storage for up to 30 days
• iCloud Photo Library
  ˗ Stores all photos at original resolution
  ˗ Stores photo metadata
• Storage (Introduced in 2011)
  ˗ 5 GB of free storage per account
49
Cloud Services iCloud Features
• iCloud Drive
  ˗ Can save photos, videos, documents, and apps
• iCloud Keychain
  ˗ Secure database for website and Wi-Fi passwords
  ˗ Secure credit card & debit card management for quick access and auto-fill
50
Cloud Services iCloud Features
• iTunes Match
  ˗ iTunes music library scan and match tracks function
  ˗ Serves tracks copied from CDs or other sources
51
Big Data
Big Data Examples 55
Big Data New FLU Virus Starts in the U.S.! • H1N1 flu virus (which has combined virus elements of the bird and swine (pig) flu) started to spread in the U.S. in 2009 • U.S. CDC (Centers for Disease Control and Prevention) was only collecting diagnostic data of Medical Doctors once a week • Using the CDC information to find how the flu was spreading would have an approximate 2 week lag, which is far too slow compared to the speed of the virus spreading
56
Big Data New FLU Virus Starts in the U.S.! • What vaccine was needed? • How much vaccine was needed? • Where was the vaccine needed? • Vaccine preparation and delivery plans could not be set up fast enough to safely prevent the virus from spreading out of control
57
Big Data New FLU Virus Starts in the U.S.! • Fortunately, Google published a paper about how they could predict the spread of the winter flu in the U.S. accurately down to specific regions and states • This paper was published in the journal Nature a few weeks before the H1N1 virus made the headline news
58
Big Data New FLU Virus Starts in the U.S.! • Millions of the most common search terms and millions of different mathematical models were tested on Google’s database • Google receives more than 3 billion search queries a day • The analysis system was set to look for correlation between the frequency of certain search queries and the spread of the flu over time and space
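As a rough illustration of the correlation search described above, the sketch below computes the Pearson correlation between a weekly search-term frequency series and a weekly flu-case series. The data values and the names `pearson`, `query_freq`, and `flu_cases` are made up for illustration; this is not Google's actual model, which tested many terms and models at far larger scale.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly data for one region (illustrative values only).
query_freq = [120, 150, 210, 330, 480, 460]   # searches for a flu-related term
flu_cases = [10, 14, 19, 31, 47, 44]          # reported cases in the same weeks

r = pearson(query_freq, flu_cases)
print(f"correlation = {r:.3f}")
```

A search term whose frequency tracks reported cases closely gives a value near 1, which is the kind of signal such a system would keep; a high correlation alone does not show that the term is a reliable predictor.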
59
Big Data New FLU Virus Starts in the U.S.! • Google’s method of analysis did not use data provided from hospitals or Medical Doctors • Google used Big Data analysis on the most common search terms people use • Google’s system proved to be more accurate and faster than analyzing government statistics
60
Big Data Wal-Mart • Wal-Mart’s Data Warehouse • Stores 4 petabytes (4 × 10^15 bytes) of data • Records every single purchase • Approximately 267 million transactions a day from 6,000 stores worldwide are recorded
61
Big Data Wal-Mart • Wal-Mart’s Data Analysis • Focused on evaluating the effectiveness of pricing strategies and advertising campaigns • Seeking improvements in inventory management and supply chains
62
Big Data Recommendation System using Big Data • Based on data analysis of simple elements • What purchases users made in the past • Which items they have in their virtual shopping cart • Which items customers rated and liked • What influence the ratings had on other customers' purchases
63
Big Data Amazon.com • Amazon.com’s Recommendation System • Item-to-Item Collaborative Filtering Algorithm • Personalization of the Online Store → Customized to each customer • Each customer’s store is based on the customer’s personal interest • Example: For a new mother, the store will display baby supplies and toys
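The idea behind item-to-item collaborative filtering can be sketched as follows: items are considered similar when the same customers bought both, and a customer is shown items similar to what they already own. This is a simplified illustration, not Amazon's production algorithm; the cosine-style similarity, the function names, and the purchase data are assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(purchases):
    """Cosine similarity between items, based on which customers bought both.

    purchases: dict mapping customer -> set of item names.
    Returns a dict mapping (item_a, item_b) -> similarity in (0, 1].
    """
    buyers = defaultdict(set)              # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    sims = {}
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            common = len(buyers[a] & buyers[b])
            if common:
                sims[(a, b)] = common / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims

def recommend(customer, purchases, sims, top_n=3):
    """Rank items the customer has not bought by similarity to owned items."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in owned and b not in owned:
            scores[b] += s
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical purchase histories.
purchases = {
    "ann": {"diapers", "baby bottle", "toy"},
    "bob": {"diapers", "baby bottle"},
    "cat": {"diapers", "toy", "novel"},
    "dan": {"novel"},
}
sims = item_similarities(purchases)
print(recommend("bob", purchases, sims))
```

Because the similarity table depends only on items, it can be computed ahead of time; looking up recommendations for one customer then touches only the items that customer owns.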
64
Big Data Citibank • Bank operations in 100 countries • Big Data analysis on the database of basic financial transactions can enable Global insight on investments, market changes, trade patterns, and economic conditions • Many companies (e.g., Zara, H&M, etc.) work with Citibank to locate new stores and factories
65
Big Data Product Development & Sales • For example, a Smartphone takes significant time and money to manufacture • In addition, the duration of popularity for a new Smartphone is limited • To maximize sales, a company needs to manufacture just the right amount of products and sell them in the right locations
66
Big Data Product Development & Sales • Producing too much will result in leftover stock and a big waste for the company! • Producing too little will result in a lost opportunity for company profit and growth! • Big Data analysis can help estimate how many smartphones could be sold and where the product could be popular, based on common search terms that people use → This can also be used to estimate how many products could be sold in a certain location → But why is this difficult?
67
Big Data
REFERENCES 68
References • V. Mayer-Schönberger, and K. Cukier, Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt, 2013. • T. White, Hadoop: The Definitive Guide. O'Reilly Media, 2012. • J. Venner, Pro Hadoop. Apress, 2009. • S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big Data, Analytics and the Path From Insights to Value,” MIT Sloan Management Review, vol. 52, no. 2, Winter 2011. • B. Randal, R. H. Katz, and E. D. Lazowska, "Big-data Computing: Creating revolutionary breakthroughs in commerce, science and society," Computing Community Consortium, pp. 1-15, Dec. 2008. • G. Linden, B. Smith, and J. York. "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb. 2003.
69
References • J. R. Galbraith, "Organizational Design Challenges Resulting From Big Data," Journal of Organization Design, vol. 3, no. 1, pp. 2-13, Apr. 2014. • S. Sagiroglu and D. Sinanc, “Big data: A review,” Proc. IEEE International Conference on Collaboration Technologies and Systems, pp. 42-47, May 2013. • M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mobile Networks and Applications, vol. 19, no. 2, pp. 171-209, Jan. 2014. • X. Wu, X. Zhu, G. Q. Wu, and W. Ding, “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97–107, Jan. 2014. • Z. Zheng, J. Zhu, and M. R. Lyu, “Service-Generated Big Data and Big Data-as-a-Service: An Overview,” Proc. IEEE International Congress on Big Data, pp. 403–410, Jun/Jul. 2013.
70
References • I. Palit and C.K. Reddy, “Scalable and Parallel Boosting with MapReduce,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1904-1916, 2012. • M.-Y Choi, E.-A. Cho, D.-H. Park, C.-J Moon, and D.-K. Baik, “A Database Synchronization Algorithm for Mobile Devices,” IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 392-398, May 2010. • IBM, What is big data?, http://www.ibm.com/software/data/bigdata/what-is-bigdata.html [Accessed June 1, 2015] • Hadoop Apache, http://hadoop.apache.org • Wikipedia, http://www.wikipedia.org
Image sources • Walmart Logo, By Walmart [Public domain], via Wikimedia Commons • Amazon Logo, By Balajimuthazhagan (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
71
Big Data
Big Data's 4 Vs 72
Big Data Big Data’s 4 V Big Challenges • Volume – Data Size • Variety – Data Formats • Velocity – Data Streaming Speeds • Veracity – Data Trustworthiness
73
Big Data Volume – Data Size • 40 Zettabytes (10^21) of data is predicted to be created by 2020 • 2.5 Quintillion bytes (10^18) of data are created every day • 6 Billion (10^9) people have mobile phones • 100 Terabytes (10^12) of data (at least) is stored by most U.S. companies • 966 Petabytes (10^15) was the approximate storage size of the American manufacturing industry in 2009
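To relate the units on this slide to each other, the short sketch below converts the daily figure (2.5 quintillion bytes) into exabytes per day and zettabytes per year. The helper `convert` and the `UNITS` table are illustrative names; decimal (SI) units are assumed throughout.

```python
# Decimal (SI) byte units: tera = 10^12, peta = 10^15, exa = 10^18, zetta = 10^21.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}

def convert(n_bytes, unit):
    """Express a byte count in the given SI unit."""
    return n_bytes / UNITS[unit]

daily_bytes = 25 * 10**17            # 2.5 quintillion bytes per day (from the slide)
yearly_bytes = daily_bytes * 365

print(f"per day : {convert(daily_bytes, 'EB'):.1f} EB")
print(f"per year: {convert(yearly_bytes, 'ZB'):.4f} ZB")
```

At that rate roughly 0.9 zettabytes would be created per year, which gives a sense of scale for the zettabyte-level prediction quoted on the slide.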
74
Big Data Variety – Data Formats • 150 Exabytes (10^18) was the estimated size of data for health care throughout the world in 2011 • More than 4 Billion (10^9) hours each month are used in watching YouTube • 30 Billion pieces of content are exchanged every month on Facebook • 200 Million monthly active users exchange 400 Million tweets every day
75
Big Data Velocity – Data Streaming Speeds • 1 Terabyte (10^12 bytes) of trade information is exchanged during every trading session at the New York Stock Exchange • 100 sensors (approximately) are installed in modern cars to monitor fuel level, tire pressure, etc. • 18.9 Billion network connections are predicted to exist by 2016
76
Big Data Veracity – Data Trustworthiness • 1 out of 3 business leaders have experienced trust issues with their data when trying to make a business decision • $3.1 Trillion (10^12) a year is estimated to be wasted in the U.S. economy due to poor data quality
77
Big Data New technology is needed to overcome these 4 V Big Data Challenges • Volume – Data Size • Variety – Data Formats • Velocity – Data Streaming Speeds • Veracity – Data Trustworthiness
78
Big Data
HADOOP 83
Hadoop Data Storage, Access, and Analysis • Hard drive storage capacity has tremendously increased • But the data read and write speeds to and from the hard drives have not significantly improved yet • Simultaneous parallel read and write of data with multiple hard disks requires advanced technology
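The gap between capacity and transfer speed can be made concrete with some arithmetic. The sketch below assumes a 1 TB dataset and a sequential transfer rate of 100 MB/s per disk; both numbers are illustrative assumptions, not figures from the slides.

```python
def read_time_seconds(data_bytes, transfer_rate_bps, n_disks=1):
    """Time to read data split evenly across n_disks that are read in parallel."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return data_bytes / n_disks / transfer_rate_bps

ONE_TB = 10**12
RATE = 100 * 10**6    # assumed 100 MB/s per disk

single = read_time_seconds(ONE_TB, RATE)
parallel = read_time_seconds(ONE_TB, RATE, n_disks=100)
print(f"1 disk   : {single / 3600:.2f} hours")
print(f"100 disks: {parallel / 60:.2f} minutes")
```

Under these assumptions one disk needs hours while 100 disks working in parallel need minutes, which is why spreading data across many disks is attractive despite the coordination it requires.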
84
Hadoop Data Storage, Access, and Analysis
• Challenge 1: Hardware Failure
  ˗ When using many computers for data storage and analysis, the probability that one computer will fail is very high
• Challenge 2: Cost
  ˗ To avoid data loss or computed analysis information loss, using backup computers and memory is needed, which helps the reliability, but is very expensive
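Why the failure probability grows so quickly with cluster size can be seen from a simple model: if each machine fails independently with probability p over some period, the chance that at least one of n machines fails is 1 − (1 − p)^n. The independence assumption and the value p = 0.01 are illustrative, not measured figures.

```python
def prob_any_failure(n_machines, p_fail):
    """Probability that at least one of n independent machines fails."""
    return 1 - (1 - p_fail) ** n_machines

# Assumed per-machine failure probability of 1% over the period of interest.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} machines: {prob_any_failure(n, 0.01):.4f}")
```

With these numbers a single machine rarely fails, but in a 1000-machine cluster some failure is almost certain, so the system has to treat failure as the normal case.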
85
Hadoop Data Storage, Access, and Analysis
• Challenge 3: Combining Analyzed Data
  ˗ Combining the analyzed data is very difficult
  ˗ If one part of the analyzed data is not ready, then the overall combining process has to be delayed
  ˗ If one part has errors in its analysis, then the overall combined result may be unreliable and useless
86
Hadoop Hadoop
• Hadoop is a Reliable Shared Storage and Analysis System
• Hadoop = HDFS + MapReduce + α
  ˗ HDFS provides Data Storage
    ˗ HDFS: Hadoop Distributed FileSystem
  ˗ MapReduce provides Data Analysis
    ˗ MapReduce = Map Function + Reduce Function
87
Hadoop HDFS: Hadoop Distributed FileSystem • DFS (Distributed FileSystem) is designed for storage management of a network of computers • HDFS is optimized to store huge files with streaming data access patterns • HDFS is designed to run on clusters of general computers
88
Hadoop HDFS: Hadoop Distributed FileSystem • HDFS was designed to be optimal in performance for a WORM (Write Once, Read Many times) pattern, which is a very efficient data processing pattern • HDFS was designed considering the time to read the whole dataset to be more important than the time required to read the first record
89
Hadoop HDFS • HDFS clusters use 2 types of nodes • Namenode (master node) • Datanode (worker node)
90
Hadoop HDFS: Namenode • Manages the filesystem namespace • Maintains the filesystem tree and the metadata for all the files and directories in the tree • Stores on the local disk using 2 file forms • Namespace Image • Edit Log
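How a namespace image and an edit log work together can be sketched as follows: the image is a snapshot of the filesystem metadata, and the edit log records the changes made since that snapshot, so the current state is recovered by replaying the log on top of the image. The data structures and operation names here are hypothetical and much simpler than HDFS's real on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace (path -> metadata)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = {"blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(edit[2])
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")

def load_namespace(image, edit_log):
    """Rebuild current state: start from the saved image, then replay the log."""
    namespace = {path: {"blocks": list(meta["blocks"])} for path, meta in image.items()}
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace

# Hypothetical snapshot and subsequent changes.
image = {"/logs/a.txt": {"blocks": ["blk_1"]}}
edit_log = [
    ("create", "/logs/b.txt"),
    ("add_block", "/logs/b.txt", "blk_2"),
    ("delete", "/logs/a.txt"),
]
print(load_namespace(image, edit_log))
```

Appending to a log is cheap compared with rewriting the whole image on every change, which is one motivation for keeping the two files separate.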
91
Hadoop HDFS: Datanodes • Workhorse of the filesystem • Store and retrieve blocks when requested by the client or the namenode • Report back to the namenode periodically with lists of blocks that were stored
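The periodic block report can be pictured with a toy model in which the namenode keeps a map from block IDs to the datanodes that currently hold them and refreshes it whenever a datanode reports in. The class and method names below are hypothetical and do not correspond to the actual HDFS API.

```python
from collections import defaultdict

class Namenode:
    """Toy namenode that only tracks which datanodes hold each block."""

    def __init__(self):
        self.block_locations = defaultdict(set)   # block id -> datanode ids

    def receive_block_report(self, datanode_id, block_ids):
        # Drop stale entries for this datanode, then record the fresh list.
        for holders in self.block_locations.values():
            holders.discard(datanode_id)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)

    def locate(self, block_id):
        """Datanodes known to hold the block, in sorted order."""
        return sorted(self.block_locations.get(block_id, ()))

nn = Namenode()
nn.receive_block_report("dn1", ["blk_1", "blk_2"])
nn.receive_block_report("dn2", ["blk_2"])
nn.receive_block_report("dn1", ["blk_2"])      # dn1 no longer reports blk_1
print(nn.locate("blk_2"), nn.locate("blk_1"))
```

In this model the block-to-datanode map is rebuilt purely from what the datanodes report, so the namenode does not need to persist it.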
92
Hadoop MapReduce • MapReduce is a programming model that abstracts the analysis problem from stored data • MapReduce transforms the analysis problem into a computation process that uses a set of keys and values
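A minimal, single-process sketch of the key-value idea is the classic word count: the map step turns each input line into (word, 1) pairs, the pairs are grouped by key, and the reduce step sums the values for each word. The runner `run_mapreduce` is a hypothetical stand-in for the framework and runs everything in memory; it is not the Hadoop API.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: (line offset, line text) -> list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: (word, [1, 1, ...]) -> (word, total)."""
    return (word, sum(counts))

def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group intermediate values by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return [reduce_fn(k, groups[k]) for k in sorted(groups)]

lines = [(0, "big data needs big storage"), (1, "data analysis")]
print(run_mapreduce(lines, map_fn, reduce_fn))
```

The analyst only writes `map_fn` and `reduce_fn`; grouping by key and distributing the work are left to the framework, which is what makes the model easy to spread over many machines.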
93
Hadoop MapReduce System Architecture • MapReduce was designed for tasks that consume several minutes or hours on a set of dedicated trusted computers connected with a broadband high-speed network managed by a single master data center
94
Hadoop MapReduce Characteristics • MapReduce uses a somewhat brute-force data analysis approach • The entire dataset (or a big part of the dataset) is processed for every query → Batch Query Processor model
95
Hadoop MapReduce Characteristics • MapReduce makes it possible to run an ad hoc query against the whole dataset and get the results in a reasonable time • Many distributed systems combine data from multiple sources (which is very difficult), but MapReduce does this in a very effective and efficient way
96
Hadoop Technical Terms used in MapReduce • Seek Time is the delay in moving the disk head to the place on the disk where data is read or written • Transfer Rate is the speed at which data is streamed to or from the disk • Transfer Rate has improved significantly more (i.e., now has much faster transfer speeds) compared to improvements in Seek Time (i.e., still relatively slow)
97
Hadoop MapReduce • MapReduce gains performance enhancement through optimal balancing of Seeking and Transfer operations • Reduce Seek operations • Effectively use Transfer operations • In the next lecture, we will compare MapReduce with a traditional RDBMS (Relational Database Management System)
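The trade-off between seeking and transferring can be illustrated with rough arithmetic. The sketch assumes a 10 ms seek time, a 100 MB/s transfer rate, a 1 TB dataset of 100-byte records, and an update touching 1% of the records; all of these values are assumptions chosen for illustration.

```python
SEEK_TIME_S = 0.010            # assumed 10 ms per seek
TRANSFER_RATE_BPS = 100e6      # assumed 100 MB/s sequential transfer

def seek_update_time(n_updates):
    """Time when each updated record costs one disk seek."""
    return n_updates * SEEK_TIME_S

def stream_rewrite_time(dataset_bytes):
    """Time to read the whole dataset sequentially and write it back."""
    return 2 * dataset_bytes / TRANSFER_RATE_BPS

dataset_bytes = 10**12                     # 1 TB
n_records = dataset_bytes // 100           # assumed 100-byte records
n_updates = n_records // 100               # update 1% of the records

print(f"seek-based: {seek_update_time(n_updates) / 3600:.1f} h")
print(f"streaming : {stream_rewrite_time(dataset_bytes) / 3600:.1f} h")
```

Under these assumptions, seeking to each of the updated records takes far longer than streaming through the entire dataset once, which is why avoiding seeks pays off for large-scale updates.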
98
Big Data
MapReduce vs. RDBMS 103
Hadoop MapReduce vs. RDBMS • RDBMS (Relational Database Management System) Characteristics • RDBMS is good for updating a small proportion of a big database • RDBMS uses a traditional B-Tree, which is highly dependent on the time required to perform seek operations
104
Hadoop MapReduce vs. RDBMS • MapReduce Characteristics • MapReduce is good for updating all (or a majority) of a big database • MapReduce uses Sort and Merge to rebuild the database, which depends more on transfer operations
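The sort-and-merge approach can be sketched as a single sequential pass over two sorted inputs: the existing dataset and a sorted batch of updates. The function `merge_rebuild` and the sample records are hypothetical; the sketch assumes each key appears at most once per input and works on in-memory lists rather than files.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_rebuild(sorted_data, sorted_updates):
    """Rebuild a sorted dataset by merging in a sorted batch of updates.

    Both inputs are lists of (key, value) sorted by key. When a key appears
    in both, the update wins. Each input is read once, front to back.
    """
    # Tag entries so that, for equal keys, the update (tag 1) sorts last.
    tagged = heapq.merge(
        ((k, 0, v) for k, v in sorted_data),
        ((k, 1, v) for k, v in sorted_updates),
        key=itemgetter(0, 1),
    )
    rebuilt = []
    for key, group in groupby(tagged, key=itemgetter(0)):
        *_, last = group                 # keep the highest-tagged entry
        rebuilt.append((key, last[2]))
    return rebuilt

data = [(1, "a"), (3, "c"), (5, "e")]
updates = [(3, "C"), (4, "d")]
print(merge_rebuild(data, updates))
```

Because both inputs are consumed strictly in order, the work is dominated by sequential transfer rather than by seeks, in contrast to updating records in place through an index.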
105
Hadoop MapReduce vs. RDBMS • RDBMS is good for applications that require the datasets of the database to be very frequently updated (e.g., point queries or small dataset updates) • MapReduce is better for WORM (Write Once and Read Many times) based data applications • MapReduce is a complementary system to RDBMS
106
Hadoop MapReduce vs. RDBMS

| | RDBMS | MapReduce |
|---|---|---|
| Data Size | Gigabytes (10^9) | Petabytes (10^15) |
| Access | Interactive & Batch | Batch |
| Updates | Read & Write Many Times | WORM (Write Once, Read Many Times) |
| Data Structure | Static Schema | Dynamic Schema |
| Integrity | High | Low |
| Scalability | Nonlinear | Linear |
107
Hadoop MapReduce vs. RDBMS: Data Types • Structured Data: Data that has a formal defined structure (e.g., XML documents or database tables) • Semi-Structured Data: Data that has a looser format where the data structure is used as a guide and may be ignored • Unstructured Data: Data that does not have any formal structure (e.g., plain text or image data)
108
Hadoop MapReduce vs. RDBMS: Data Types • MapReduce is very effective on unstructured and semistructured data • Why? • MapReduce interprets data during the data processing sessions • MapReduce does not use intrinsic properties of the data as input keys or input values. The parameters used are selected by the person analyzing the data
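The point that data is interpreted at processing time can be illustrated with two map functions that read the same raw text lines in different ways. The log format (method, path, status separated by spaces) and the function names are assumptions made for this example.

```python
def map_status(_offset, line):
    """Interpret a raw log line at processing time; emit (status, 1)."""
    fields = line.split()
    if len(fields) < 3 or not fields[2].isdigit():
        return []                      # skip lines that do not fit the format
    return [(fields[2], 1)]

def map_path(_offset, line):
    """A different view of the same raw lines; emit (path, 1)."""
    fields = line.split()
    return [(fields[1], 1)] if len(fields) >= 2 else []

# Hypothetical raw input, including one malformed line.
raw = ["GET /index.html 200", "GET /missing 404", "corrupted-line", "POST /form 200"]

for offset, line in enumerate(raw):
    print(map_status(offset, line), map_path(offset, line))
```

No schema is declared up front: each analysis chooses its own keys and values from the raw text, and malformed lines are simply skipped by the function that cannot interpret them.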
109
Hadoop MapReduce vs. RDBMS: Scalability • MapReduce has a programming model that is linearly scalable • MapReduce Functions: 2 types • Map function • Reduce function • Both of these functions define a Key-Value pair mapping relation (e.g., Key-Value pair 1 → Key-Value pair 2)
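One way to see why key-value functions scale is that the input can be split into partitions that are processed independently and then combined. The sketch below finds the maximum temperature per year and checks that the answer does not depend on the number of workers. The worker simulation, function names, and readings are hypothetical; reducing partial results a second time is valid here only because taking a maximum is associative and commutative.

```python
from collections import defaultdict

def map_fn(record):
    """(year, temperature) reading -> one (year, temperature) pair."""
    year, temp = record
    return [(year, temp)]

def reduce_fn(year, temps):
    """(year, [temperatures]) -> (year, maximum temperature)."""
    return (year, max(temps))

def run_partition(records):
    """Map and locally reduce one partition, as a single worker would."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return [reduce_fn(k, v) for k, v in groups.items()]

def run_job(records, n_workers):
    """Split the input across n_workers, then reduce the partial results."""
    partitions = [records[i::n_workers] for i in range(n_workers)]
    partials = defaultdict(list)
    for part in partitions:
        for key, value in run_partition(part):
            partials[key].append(value)
    return dict(reduce_fn(k, v) for k, v in partials.items())

readings = [(1949, 111), (1950, 0), (1949, 78), (1950, 22), (1950, -11)]
print(run_job(readings, n_workers=1))
print(run_job(readings, n_workers=1) == run_job(readings, n_workers=3))
```

Since each partition is handled without reference to the others, adding workers shortens the map phase roughly in proportion, as long as the data can be split evenly.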
110
Hadoop Hadoop Release Series
Release 2.6.0 became available Nov. 2014

| Feature | 1.x | 0.22 | 2.x |
|---|---|---|---|
| Secure authentication | Yes | No | Yes |
| Old configuration names | Yes | | |
| New configuration names | No | Yes | Yes |
| Old MapReduce API | Yes | Yes | Yes |
| New MapReduce API | Yes (with some missing libraries) | Yes | Yes |
| MapReduce 1 runtime (Classic) | Yes | Yes | No |
| MapReduce 2 runtime (YARN) | No | No | Yes |
| HDFS Federation | No | No | Yes |
| HDFS High-Availability | No | No | Yes |
111
Hadoop Hadoop Release Series • 2.x includes several major new features • MapReduce 2 is the new MapReduce runtime implemented on a new system called YARN • YARN • Yet Another Resource Negotiator • General resource management system for running distributed applications
Hadoop Hadoop Release Series • HDFS Federation partitions the HDFS namespace across multiple namenodes • Enables improved support for clusters with very large numbers of files • HDFS High-Availability feature uses standby namenodes for backup, and therefore, the namenode is no longer a potential SPOF (Single Point of Failure)
Big Data
MapReduce
MapReduce Hadoop • Hadoop is a Reliable Shared Storage and Analysis System • Hadoop = HDFS + MapReduce + α ˗ HDFS provides Data Storage (HDFS: Hadoop Distributed FileSystem) ˗ MapReduce provides Data Analysis (MapReduce = Map Function + Reduce Function)
MapReduce Scaling Out • Scaling out is done by the DFS (Distributed FileSystem), where the data is divided and stored across distributed computers & servers • Hadoop uses HDFS to move the MapReduce computation to several distributed computing machines, each of which processes its assigned part of the divided data
MapReduce Jobs • A MapReduce job is a unit of work that needs to be executed • A job consists of: input data, the MapReduce program, configuration information, etc. • A job is executed by dividing it into two types of tasks • Map Task • Reduce Task
MapReduce Node types for Job execution • Job execution is controlled by 2 types of nodes • Jobtracker • Tasktracker • Jobtracker coordinates all jobs • Jobtracker schedules all tasks and assigns the tasks to tasktrackers
MapReduce • Tasktracker executes its assigned tasks • Tasktracker sends progress reports to the Jobtracker • Jobtracker keeps a record of the progress of all jobs executed
MapReduce Data flow • Hadoop divides the input into input splits (or splits) suitable for the MapReduce job • Each split has a fixed size • Split size is commonly matched to the size of an HDFS block (64 MB) for maximum processing efficiency
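As a rough illustration of fixed-size splits, the sketch below cuts a file into (offset, length) pairs using the 64 MB figure from the slide. The function name `input_splits` and the 200 MB example file are hypothetical, and the logic is deliberately simplified; Hadoop's own split computation applies additional rules that are not modelled here.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the HDFS block size quoted in the slides

def input_splits(file_size, split_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    splits = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits

# A hypothetical 200 MB input file: three full splits plus one 8 MB remainder.
splits = input_splits(200 * 1024 * 1024)
```

One map task would be created per entry in `splits`, so the split size directly controls how many map tasks a job runs.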
MapReduce Data flow • Map Task is created for each split • Map Task executes the map function for all records within the split • Hadoop commonly executes the Map Task on the node where the input data resides
MapReduce Data flow • Data-Local Map Task • With data locality optimization, the map task reads its input without using the cluster network • The data-local flow shows why the optimal split size equals the HDFS block size (64 MB)
MapReduce Data flow [Figure labels: Node, Rack, Data Center, Map Task, HDFS Block] • Rack-Local Map Task • A node hosting the HDFS block replicas for a map task’s input split may already be running other map tasks • In that case, the Job Scheduler looks for a free map slot on a node in the same rack as one of the blocks
MapReduce Data flow • Off-Rack Map Task • Needed when the Job Scheduler cannot place the task as data-local or rack-local • Uses inter-rack network transfer
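The three placement cases (data-local, rack-local, off-rack) amount to a preference order. The following sketch shows one way to express that order; the functions `locality` and `pick_node`, the node and rack names, and the tie-breaking by node name are all assumptions made for illustration, not the actual Hadoop scheduler.

```python
def locality(candidate, replica_nodes, rack_of):
    """Classify a candidate node relative to the nodes holding the block replicas."""
    if candidate in replica_nodes:
        return "data-local"
    replica_racks = {rack_of[n] for n in replica_nodes}
    if rack_of[candidate] in replica_racks:
        return "rack-local"
    return "off-rack"

def pick_node(free_nodes, replica_nodes, rack_of):
    """Pick the free node with the best locality; ties are broken by node name."""
    order = {"data-local": 0, "rack-local": 1, "off-rack": 2}
    best = min(
        free_nodes,
        key=lambda n: (order[locality(n, replica_nodes, rack_of)], n),
    )
    return best, locality(best, replica_nodes, rack_of)

# Hypothetical cluster: four nodes on three racks, replicas on n1 and n3.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r3"}
replicas = {"n1", "n3"}
```

For example, if only `n2` and `n4` have free map slots, `pick_node` prefers `n2` because it shares rack `r1` with a replica, and falls back to an off-rack node only when nothing closer is free.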
MapReduce Map • Map task will write its output to the local disk • Map task output is not the final output; it is only the intermediate output
Reduce • Map task output is processed by Reduce Tasks to produce the final output • Reduce Task output is stored in HDFS • For a completed job, the Map Task output can be discarded
MapReduce Single Reduce Task • Node includes Split, Map, Sort, and Output units • Light blue arrows show data transfers within a node • Black arrows show data transfers between nodes
MapReduce Single Reduce Task • The number of reduce tasks is specified independently and is not based on the size of the input
MapReduce Combiner Function • User-specified function that runs on the Map output → the combiner output forms the input to the Reduce function • Specifically designed to minimize the data transferred between Map Tasks and Reduce Tasks • Mitigates the limited network bandwidth of the cluster and helps reduce the time to complete MapReduce jobs
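To make the effect of a combiner concrete, the sketch below counts how many records would leave a map task with and without local aggregation. It is a plain Python illustration with hypothetical names (`map_words`, `combine`); it works here because summation is associative and commutative, so partial sums computed on the map side do not change the final result.

```python
from collections import Counter

def map_words(line):
    """Map output for one line: one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Sum the values per key on the map side, before anything crosses the network."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

map_output = map_words("a b a a b c")   # 6 records without a combiner
combined = combine(map_output)          # 3 records after combining
```

In this toy input the map task would send three records to the reducers instead of six; on real data with many repeated keys the saving in network transfer can be much larger.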
MapReduce Multiple Reducers • Map tasks partition their output, each creating one partition for each reduce task • Each partition may contain many keys and their associated values • All records for a given key are kept in a single partition
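A hash-based partitioner is one common way to guarantee that all records for a key land in the same partition. The sketch below is an assumption-laden illustration: `partition` and `partition_output` are hypothetical names, and CRC32 is chosen only because it gives the same value on every run, not because Hadoop uses it.

```python
import zlib

def partition(key, num_reducers):
    """Map a key to a reduce task index; CRC32 keeps the result stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition_output(pairs, num_reducers):
    """Split one map task's output into one list per reduce task."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return partitions

pairs = [("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 5)]
parts = partition_output(pairs, 3)
```

Because the partition index depends only on the key, every map task sends a given key to the same reduce task, while a single partition may still hold several different keys.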
MapReduce Multiple Reducers Shuffle • Shuffle process is used in the data flow between the Map tasks and Reduce tasks
MapReduce Zero Reducer • Zero reducer uses no shuffle process • Applied when all of the processing can be carried out in parallel Map tasks
Big Data
HDFS
HDFS Hadoop • Hadoop is a Reliable Shared Storage and Analysis System • Hadoop = HDFS + MapReduce + α ˗ HDFS provides Data Storage (HDFS: Hadoop Distributed FileSystem) ˗ MapReduce provides Data Analysis (MapReduce = Map Function + Reduce Function)
HDFS HDFS: Hadoop Distributed FileSystem • A DFS (Distributed FileSystem) is designed for storage management across a network of computers • HDFS is optimized to store very large files (up to terabytes in size) with streaming data access patterns
HDFS HDFS: Hadoop Distributed FileSystem • HDFS was designed for optimal performance with a WORM (Write Once, Read Many times) pattern • HDFS is designed to run on clusters of general-purpose computers & servers from multiple vendors
HDFS HDFS Characteristics • HDFS is optimized for large scale and high throughput data processing • HDFS does not perform well for applications that require low-latency data access (e.g., in the tens of milliseconds range)
HDFS Blocks • Files in HDFS are divided into block-sized chunks → 64 Megabyte default block size • A block is the minimum amount of data that HDFS can read or write • Blocks simplify the storage and replication process → Provides fault tolerance & processing speed enhancement for larger files
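The block arithmetic and the idea of replication can be sketched as follows. The replication factor of 3 and the round-robin placement are assumptions chosen for illustration (real HDFS placement also considers racks), and `block_count` and `place_replicas` are hypothetical helper names.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size quoted above

def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file (ceiling division)."""
    return -(-file_size // block_size)

def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct datanodes, round-robin.

    Assumes replication <= len(datanodes); the policy is illustrative only.
    """
    placement = {}
    for b in range(num_blocks):
        placement[b] = [
            datanodes[(b + i) % len(datanodes)] for i in range(replication)
        ]
    return placement

one_gb = 1024 * 1024 * 1024
blocks = block_count(one_gb)  # a 1 GB file needs 16 blocks of 64 MB
placement = place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"])
```

Since each block lives on several datanodes, losing one machine leaves other copies of every block available, and different blocks of the same file can be read or processed in parallel.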
HDFS HDFS • HDFS clusters use 2 types of nodes • Namenode (master node) • Datanode (worker node)
HDFS Namenode • Manages the filesystem namespace • Keeps track of the datanodes that hold the blocks of each distributed file • Maintains the filesystem tree and the metadata for all the files and directories in the tree • Stores this information on the local disk in 2 files • Namespace Image • Edit Log
HDFS Namenode • Namenode holds the filesystem metadata in its memory • Namenode’s memory size determines the limit to the number of files in a filesystem • But then, what is Metadata?
HDFS Metadata • Comparable to the traditional library card catalog • Categorizes and describes the contents and context of the data files • Maximizes the usefulness of the original data file by making it easy to find and use
HDFS Metadata Types • Structural Metadata • Focuses on the data structure's design and specification • Descriptive Metadata • Focuses on the individual instances of application data or the data content
HDFS Datanodes • Workhorses of the filesystem • Store and retrieve blocks when requested by the client or the namenode • Periodically report back to the namenode with lists of the blocks they are storing
HDFS Client Access • Client can access the filesystem (on behalf of the user) by communicating with the namenode and datanodes • Client presents a filesystem interface similar to POSIX (Portable Operating System Interface), so the user code does not need to know about the namenode and datanodes to function properly
HDFS Namenode Failure • Namenode keeps track of the datanodes that hold the blocks of each distributed file → Without the namenode, the filesystem cannot be used • If the computer running the namenode malfunctions, the files cannot be reconstructed from the blocks on the datanodes → Files on the filesystem would be lost
HDFS Namenode Failure Resilience • Namenode failure prevention schemes 1. Namenode File Backup 2. Secondary Namenode
HDFS 1. Namenode File Backup • Back up the namenode files that form the persistent state of the filesystem’s metadata • Configure the namenode to write its persistent state to multiple filesystems → Synchronous and atomic backup • Common backup configuration → Copy to Local Disk and Remote FileSystem
HDFS 2. Secondary Namenode • Secondary namenode does not act the same way as the namenode • Secondary namenode periodically merges the namespace image with the edit log to prevent the edit log from becoming too large • Secondary namenode usually runs on a separate computer to perform the merge process because this requires significant processing capability and memory
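The periodic merge performed by the secondary namenode can be pictured as replaying the edit log onto a copy of the namespace image and then starting a fresh, empty log. The sketch below models the image as a dictionary and uses three made-up operation names (`create`, `add_block`, `delete`); none of this reflects the actual on-disk formats.

```python
def apply_edit(image, edit):
    """Apply one logged operation to a namespace image (dict: path -> metadata)."""
    op, path = edit[0], edit[1]
    if op == "create":
        image[path] = {"blocks": []}
    elif op == "add_block":
        image[path]["blocks"].append(edit[2])
    elif op == "delete":
        image.pop(path, None)
    return image

def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return (new_image, empty_log)."""
    merged = {path: {"blocks": list(meta["blocks"])} for path, meta in image.items()}
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []

image = {"/a.txt": {"blocks": ["blk_1"]}}
edits = [("create", "/b.txt"), ("add_block", "/b.txt", "blk_2"), ("delete", "/a.txt")]
new_image, new_log = checkpoint(image, edits)
```

After the checkpoint the edit log is empty again, which is the point of the merge: the log cannot grow without bound, and a restart only has to replay the edits made since the last checkpoint.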
HDFS Hadoop 2.x Release Series HDFS Reliability Enhancements • HDFS Federation • HDFS HA (High-Availability)
HDFS HDFS Federation • Allows a cluster to scale by adding namenodes • Each namenode manages a namespace volume and a block pool • Namespace volume is made up of the metadata for the namespace • Block pool contains all the blocks for the files in the namespace
HDFS HDFS Federation • Namespace volumes are all independent • Namenodes do not communicate with each other • Failure of a namenode is also independent of the other namenodes • A namenode failure does not influence the availability of another namenode’s namespace
HDFS HDFS High-Availability • A pair of namenodes (Primary & Standby) is set up in an Active-Standby configuration • Standby namenode keeps the latest edit log entries and an up-to-date block mapping • When the primary namenode fails, the standby namenode takes over serving client requests
HDFS HDFS High-Availability • Although the standby namenode can take over operation quickly (e.g., within a few tens of seconds), standby namenode activation is executed only after a sufficient observation period (e.g., approximately a minute or a few minutes) to avoid unnecessary namenode switching
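The observation period can be thought of as a timeout on the active namenode's silence. The sketch below is a hypothetical model, assuming a heartbeat timestamp and a 60-second period; `should_fail_over`, `NamenodePair`, and the node names are illustrative and do not correspond to Hadoop's failover controller.

```python
def should_fail_over(last_heartbeat, now, observation_period=60.0):
    """Activate the standby only after the active has been silent for the whole period."""
    return (now - last_heartbeat) >= observation_period

class NamenodePair:
    """Toy active-standby pair; names nn1 and nn2 are placeholders."""

    def __init__(self):
        self.active, self.standby = "nn1", "nn2"

    def tick(self, last_heartbeat, now, observation_period=60.0):
        """Swap roles if the active has been unresponsive long enough."""
        if should_fail_over(last_heartbeat, now, observation_period):
            self.active, self.standby = self.standby, self.active
            return True
        return False

pair = NamenodePair()
switched_early = pair.tick(last_heartbeat=0.0, now=10.0)   # short glitch: no switch
switched_late = pair.tick(last_heartbeat=0.0, now=75.0)    # silent past the period: switch
```

A longer period means fewer unnecessary switches during brief network glitches, at the cost of a longer outage when the active namenode has really failed.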
CDN (Content Delivery Network)
CDN Introduction
CDN Table of Contents • CDN Motivation & Structure • CDN Procedures • Hierarchical Content Delivery Model • CDN Market & Major Service Providers • CDN Research & Development
CDN CDN Motivation • CDN is a network constructed from a group of strategically placed and geographically distributed caching servers • CDN is one of the most efficient solutions for CPs (Content Providers) in serving a large number of user devices, for reduction in content download time and network traffic
CDN CDN Motivation • Network traffic that is accessed by mobile users (e.g., smart devices) is rapidly increasing • Mobile network performance is highly dependent on the content download of multimedia data and applications • Several mobile network operators have suffered from service outages or performance deterioration due to the significant increase in use of mobile devices
CDN CDN Structure • Using CDN, both content download time and network traffic are reduced • Caching Server: stores popular contents in advance [Figure: Content Provider, Caching Server, and User, showing the content request and delivery routes with CDN and without CDN]
CDN CDN in Mobile Networks • Mobile communication networks have a stronger need for reduced traffic load and content delivery time than broadband backbone networks, where capacity is abundant and traffic load reduction may be a less critical issue
CDN CDN Structure • CDN usually consists of the CP (Content Provider) and caching servers • CP possesses all contents to serve • Caching servers are distributed in the network and hold copies of selected contents that the CP stores
CDN CDN Structure • When a user requests content from its nearest caching server, the server can deliver the content if the requested content is in its cache • Otherwise, the caching server redirects the user’s request to the remotely located CP
CDN CDN Procedures • When a user requests content from its nearest caching server, the server can deliver the content if the requested content is in its cache
CDN CDN Procedures • If the requested content is not in the local server’s cache, the content request is redirected to the remotely located CP
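The two procedures (cache hit and redirect to the CP) can be summarised in a few lines. This is a minimal sketch with hypothetical classes `ContentProvider` and `CachingServer` and made-up content IDs; it only models where a request is served from, not real HTTP redirection.

```python
class ContentProvider:
    """Origin that holds all contents and counts how often it is contacted."""

    def __init__(self, contents):
        self.contents = contents
        self.requests = 0

    def fetch(self, content_id):
        self.requests += 1
        return self.contents[content_id]

class CachingServer:
    """Caching server near the user, pre-loaded with selected contents."""

    def __init__(self, origin, cached=None):
        self.origin = origin
        self.cache = dict(cached or {})

    def request(self, content_id):
        """Return (content, source): the cache on a hit, the CP on a miss."""
        if content_id in self.cache:
            return self.cache[content_id], "cache"
        return self.origin.fetch(content_id), "content provider"

cp = ContentProvider({"video1": b"video-bytes", "page2": b"<html>"})
edge = CachingServer(cp, cached={"video1": b"video-bytes"})
```

Requests for `video1` never reach the CP, which is where the reduction in download time and network traffic comes from; only the miss for `page2` travels to the remote origin.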
CDN Content Aging Procedure • Content aging is focused on delivering the most popular contents to users in the most effective way • Dependent on • Location of caching servers • Number of caching servers • Limited memory size of caching servers • Content Aging • Delete expired contents from the cache server • Download updated contents from the CP
CDN Content Aging Procedure • Each content has a content update period → TTL (Time to Live) • Few seconds for on-line trading • Few seconds for auction information • 24 hours or more for movies
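Content aging with per-content TTLs can be sketched as a cache that records an expiry time for every entry. The class `AgingCache`, the content IDs, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions made for illustration.

```python
class AgingCache:
    """Cache whose entries expire after a per-content TTL (in seconds)."""

    def __init__(self):
        self.entries = {}  # content_id -> (content, expires_at)

    def put(self, content_id, content, ttl, now):
        self.entries[content_id] = (content, now + ttl)

    def purge_expired(self, now):
        """Delete expired contents and return their IDs so they can be re-downloaded."""
        expired = sorted(
            cid for cid, (_, expires_at) in self.entries.items() if expires_at <= now
        )
        for cid in expired:
            del self.entries[cid]
        return expired

cache = AgingCache()
cache.put("stock-quote", "101.5", ttl=5, now=0)            # few seconds
cache.put("movie", "movie-bytes", ttl=24 * 3600, now=0)    # 24 hours
expired = cache.purge_expired(now=10)
```

After ten seconds only the short-lived quote has expired and would be fetched again from the CP, while the movie stays cached for the rest of its 24-hour period.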
CDN (Content Delivery Network)
CDN Hierarchical Content Delivery
Hierarchical Content Delivery Hierarchical Content Delivery • It is not possible for a caching server to save all contents that the CP (Content Provider) serves • Retrieving contents from the remotely located CP can cause a long content download time. In addition, a large amount of traffic will be generated by each server in support of the content’s packet routing
Hierarchical Content Delivery Hierarchical Content Delivery • For the given cache size of each server, it is important to maximize the hit rate of the local caching server such that the requested contents do not have to be retrieved from the CP • To accomplish this objective in the Internet in a scalable way, hierarchical cooperative content delivery techniques are used in providing content delivery to local caching servers
Hierarchical Content Delivery Hierarchical Content Delivery • CD & LCF (Content Distribution & Location Control Functions) controls the overall content delivery process, and has all content IDs of the CDN • CCF (Cluster Control Function) controls multiple CDPFs (Content Delivery Processing Functions) and saves content IDs of the cluster • CDPF stores and delivers the contents to the users
Hierarchical Content Delivery Hierarchical Content Delivery Network Example
Hierarchical Content Delivery Content Delivery Procedures • Case 1 • Requested content is in the local cluster • Content request message is delivered to the CCF • CCF sends a session request message to the CDPF to deliver the content to the user • CDPF delivers the content to the user
Hierarchical Content Delivery Content Delivery Procedures • Case 1 Procedures
Hierarchical Content Delivery Content Delivery Procedures • Case 2 • Requested content is not in the local cluster, but another local cluster (i.e., target cluster) has the content • Procedures • Content request message is redirected from the local cluster to the CD & LCF • CD & LCF checks if the requested content is in the other cluster • Requested content can be delivered from the target cluster to the user directly, or through the local cluster (the local cluster can store the requested content)
Hierarchical Content Delivery Content Delivery Procedures • Case 2 Procedures
Hierarchical Content Delivery Content Delivery Procedures • Case 3 • When the requested content is not in the CDN • Content request message is sent from the CD & LCF to the CP • CP delivers the content to the user through the local cluster • The requested content can be stored in the local cluster
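The three cases can be read as a lookup that widens step by step: local cluster first, then the other clusters known to the CD & LCF, then the CP. The sketch below is a simplified model under stated assumptions: `resolve`, the cluster names, and the content IDs are hypothetical, each cluster's ID set stands in for what its CCF knows, and storing the content locally in Case 3 is modelled as always happening although the slides say it only can.

```python
def resolve(content_id, local_cluster, clusters, cp_contents):
    """Return (case number, source) for a request arriving at local_cluster.

    clusters maps cluster name -> set of content IDs (what each CCF knows);
    looking across all clusters stands in for the CD & LCF's CDN-wide index.
    """
    if content_id in clusters[local_cluster]:
        return 1, local_cluster                   # Case 1: local CDPF delivers
    for name in sorted(clusters):
        if name != local_cluster and content_id in clusters[name]:
            return 2, name                        # Case 2: a target cluster has it
    if content_id in cp_contents:
        clusters[local_cluster].add(content_id)   # Case 3: fetched from CP, stored locally
        return 3, "CP"
    raise KeyError(content_id)

clusters = {"east": {"a"}, "west": {"b"}}
cp_contents = {"a", "b", "c"}
```

With these inputs a request at `east` for `a` is Case 1, for `b` is Case 2 served by `west`, and for `c` is Case 3; a second request for `c` then becomes Case 1 because the local cluster stored it.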
Hierarchical Content Delivery Content Delivery Procedure • Case 3 Procedures
CDN (Content Delivery Network)
CDN Market
CDN Market Measuring the CDN Market Value • There are many ways to evaluate the value of the CDN market • Evaluation is related to the diverse range of CDN industry participants • Example of industry participants • CSP (Communications Service Provider) • Industry manufacturers • CDN service providers • Content provider
CDN Market Measuring the CDN Market Value • For communication service providers, the CDN’s value includes improving retail service delivery and supporting their efforts to win and retain customers • For industry manufacturers, the market value is related to the demand from telcos, content providers and other businesses
CDN Market CDN Market Size • 2014 CDN Market size was $3.71 billion • CDN Market Components • Content delivery technologies, hardware, analytics, monitoring, encoding, transparent caching, DRM (Digital Rights Management), CMS (Content Management System), OVP (Online Video Platform), etc. • CDN Market Estimations • Expected to grow to $12.16 billion by 2019 • Predicted 26.3% CAGR (Compound Annual Growth Rate) from 2014 to 2019
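The growth figures can be checked with the standard CAGR formula, (end / start)^(1 / years) - 1. The sketch below applies it to the two dollar amounts quoted above, assuming five compounding years between 2014 and 2019; the function names `cagr` and `project` are illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

def project(start_value, rate, years):
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1.0 + rate) ** years

implied_rate = cagr(3.71, 12.16, 5)        # 2014 -> 2019, values in $ billion
projected_2019 = project(3.71, 0.263, 5)   # compounding the quoted 26.3%
```

Under this five-year assumption the two dollar figures imply a rate of roughly 26.8%, while compounding 26.3% from $3.71 billion gives about $11.9 billion. The small gap to the quoted numbers may come from rounding or from a different base-year convention in the original market report.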
CDN Market CDN Service Providers • Akamai has about 110,000 servers around the world. Akamai's services include cloud computing, HD video delivery, etc. • Amazon CloudFront delivers static and streaming contents. Amazon CloudFront works seamlessly with other Amazon Web and Cloud Service solutions • S3 (Simple Storage Service) • EC2 (Elastic Compute Cloud)
CDN Market CDN Service Providers • CDNetworks has POPs (Points of Presence) on 6 continents, including 20 POPs in China. World’s 3rd largest, and Asia’s #1, full-service provider • Level 3 supports a comprehensive encoding suite for video data, and intelligent traffic manager services (i.e., load balancing)
CDN Market CDN Service Providers • Limelight has 6,000 servers at 75 POPs (Points of Presence), and more than 30 regional content delivery centers in the U.S., Europe, and Asia • ChinaCache is a CDN market leader in China, with 127 POPs and 11,000 servers in China. CDN services include hotlink protection, custom CNAME for SSL, and Purge All.
CDN Market Telcos with a CDN resale agreement
CDN Provider | Operator (Market Region)
Akamai | Verizon (US), NTT Communications (Japan), du (UAE), Telekom Malaysia (Malaysia)
CDNetworks | Andorra Telecom (Andorra), MegaFon (Russia), Telecom Italia Sparkle (Italy), SingTel (Singapore)
ChinaCache | China Mobile (China), HGC (International)
CDN Market Telcos with a CDN resale agreement
CDN Provider | Operator (Market Region)
EdgeCast | AT&T (US), AAPT (Australia), Deutsche Telekom ICSS (Germany), Dogan Telecom (Turkey), Pacnet (Asia Pacific), Telus (Canada)
Jet-Stream | Telenet (Belgium), Ziggo (Netherlands)
Level 3 | Internexa (South America), MWeb (South Africa), STC (Saudi Arabia)
Limelight Networks | Bell Canada (Canada), Bestel (Mexico), Bharti Airtel (India), XO Communications (US)
CDN (Content Delivery Network)
CDN R&D
CDN CDN Research & Development • Content Aspects • Content Type based Differentiated Support • Data, Multimedia, Mobile Apps, etc. • Content Aging Control • Content Selection & Deletion • Content Replication Detection • Dynamic Page Publishing • Digital Rights Management • Live Event Management
CDN CDN Research & Development • System Aspects • Surrogate Server Location (Dynamic) • Storage Memory Size (Dynamic) • Content Delivery Method • Mobile Device Characteristics, Location • Network Latency • Security & Information Assurance • Anomaly Detection • User Authentication • Content Authentication
CDN Mobile CDN Research & Development • Mobile wireless networks have additional challenges in supporting CDN services, e.g., • GPS & Navigation Information • Mobile TV • ITS (Intelligent Transportation System) • LBS (Location Based Service) • Efficient content provisioning is required to provide scalable control over wide coverage areas while providing high levels of QoS with limited resources
CDN Mobile CDN Challenges • Mobile node constraints (limited storage, processing power, input capability) due to the portable size of mobile devices • Frequent network disconnections due to mobile users • Location oriented services regarding user mobility • Real time monitoring to obtain the real time status of mobile users
CDN CDN vs. Mobile CDN
Features | CDN | Mobile CDN [Future]
Content Type | Static, Dynamic, Streaming | Static, Dynamic, Streaming
Users Location | Fixed | Mobile, Fixed
Surrogate Location | Fixed | Fixed, [Mobile]
Surrogate Topology | ISP (Internet Service Provider) Local, Center of Service Area | BSs (Base Stations), RAN (Radio Access Network) Systems, [Mobile Devices]
Maintenance Complexity | Low~Medium | Medium~High [Dynamic]
Services | Multimedia & Data Services, etc. | Mobile Apps, LBS, [Mobile] Cloud, etc.