Cloud Computing, Big Data, & CDN Emerging Technologies

May 28, 2016 | Author: Adrian Lopez | Category: N/A


Cloud Computing

Cloud Introduction 1

Cloud Computing What does Cloud Computing do? • Provides online data storage • Enables configuration and accessing of online applications • Provides a variety of software usage • Provides computing platform and computing infrastructure

2

Cloud Computing Application Example • Using Gmail on my smartphone to check e-mails • Receive an e-mail with an MS PowerPoint attachment • However, MS PowerPoint and the Windows OS are not installed on my smartphone! • Google Drive’s Google Docs, Sheets, and Slides can be used to open the file

3

Cloud Computing What is a Cloud? • Cloud can provide services through a public or private Network or the Internet, where the service hosting system is at a remote location • Cloud can support various applications • E-mail, Web Conferencing, Games, Database Management, CRM (Customer Relationship Management), etc.

4

Cloud Computing Cloud Models

5

Cloud Computing Cloud Models
• Public Cloud
˗ Enables public systems and service access
˗ Open architecture (e.g., e-mail)
˗ Could be less secure due to openness
• Private Cloud
˗ Enables service access within an organization
˗ Due to its private nature, it is more secure

6

Cloud Computing Cloud Models
• Community Cloud
˗ Cloud accessible by a group of organizations
• Hybrid Cloud
˗ Hybrid Cloud = Public Cloud + Private Cloud
˗ Private cloud supports critical activities
˗ Public cloud supports non-critical activities

7

Cloud Computing Cloud Service Models Each lower service model provides management, computing power, and security for the service model above it

• SaaS: Software as a Service
• PaaS: Platform as a Service
• IaaS: Infrastructure as a Service

8

Cloud Computing Software as a Service (SaaS) • Provides a variety of software applications as a service to end users

Platform as a Service (PaaS) • Provides a program executable platform for applications, development tools, etc.

Infrastructure as a Service (IaaS) • Provides the fundamental computing and security resources for the entire cloud • Backup storage, computing power, VM (Virtual Machines), etc.

9

Cloud Computing Cloud Service Models • There are many other service models • XaaS = Anything as a Service
• NaaS → N for Network as a Service
• DaaS → D for Database as a Service
• BaaS → B for Business as a Service
• etc.

10

Cloud Computing Cloud Benefits

11

Cloud Computing Characteristics

12

Cloud Computing

REFERENCES 13

References • K. Kumar and Y. H. Lu, “Cloud Computing for Mobile Users: Can Offloading Computation Save Energy?,” Computer, vol. 43, no. 4, pp. 51–56, Apr. 2010. • Wikipedia, http://www.wikipedia.org • Apple, iCloud, https://www.icloud.com • Google, Google Cloud, https://cloud.google.com/products [Accessed June 1, 2015] • Virtualization, Cisco’s IaaS cloud, http://www.virtualization.co.kr/data/file/01_2/1889266503_6f489654_1.jpg [Accessed June 1, 2015] • Tutorialspoint, Cloud computing, http://www.tutorialspoint.com/cloud_computing/cloud_computing_tutorial.pdf [Accessed June 1, 2015]

14

References Image sources • AWS Simple Icons Storage Amazon S3 Bucket with Objects, By Amazon Web Services LLC [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons • iCloud Logo, By EEIM (Own work) [Public domain], via Wikimedia Commons • MobileMe Logo, By Apple Inc. [Public domain], via Wikimedia Commons

15

Cloud Computing

Cloud Service Models 16


IaaS IaaS (Infrastructure as a Service) • Infrastructure support over the Internet • Cloud’s Computing & Storage Resources • Computing Power • Storage Services • Software Packages & Bundles • VLAN (Virtual Local Area Network) • VM (Virtual Machine) Features

18

IaaS VM (Virtual Machine) Administration • IaaS enables control of computing resources through administrative access to VMs → server virtualization features • Access to computing resources is enabled by administrative access to VMs • VM administrative command examples • Save data on cloud server • Start web server • Install new application

19

IaaS IaaS Procedures

20

IaaS IaaS Benefits • Flexible and Efficient Renting of Computer & Server Hardware • Rentable Resources • VM, Storage, Bandwidth, IP Addresses, Monitoring Services, Firewalls, etc. • Rent Payment Basis • Resource type • Usage time • Service packages
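The rent payment basis above (resource type and usage time) can be illustrated with a small calculation. This is only a sketch: the resource names and hourly rates are hypothetical and are not taken from any real provider's price list.

```python
# Hypothetical hourly rates in US dollars (assumed values for illustration only).
HOURLY_RATES = {
    "vm_small": 0.05,
    "storage_100gb": 0.01,
    "static_ip": 0.005,
}


def rental_cost(usage_hours):
    """Sum rate * hours over every rented resource type."""
    return sum(HOURLY_RATES[name] * hours for name, hours in usage_hours.items())


# One 30-day month (720 hours) of a VM, a storage volume, and an IP address.
bill = rental_cost({"vm_small": 720, "storage_100gb": 720, "static_ip": 720})
print(f"Monthly bill: ${bill:.2f}")
```

A service package could be modeled the same way, as one more entry with its own rate.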

21

IaaS IaaS Benefits • Portability & Interoperability with Legacy Applications • Enables portability based on infrastructure resources that are used through Internet connections • Enables a method to maintain interoperability with legacy applications and workloads between IaaS clouds

22

PaaS PaaS (Platform as a Service) • Provides development & deployment tools for application development • Provides runtime environment for apps.

23


PaaS PaaS Types • Application Delivery-Only Environment • Provides on-demand scaling & application security • Stand-Alone Development Environment • Provides an independent platform for a specific function • Open Platform as a Service • Provides open source software to run applications for PaaS providers • Add-On Development Facilities • Enables customization to the existing SaaS platforms

25

PaaS PaaS Benefits

26

PaaS Benefits • Lower Administrative Overhead • User does not need to be involved in any administration of the platform • Lower Total Cost of Ownership • User does not need to purchase any hardware, memory, or server

27

PaaS Benefits • Scalable Solutions • Resources scale automatically according to application demand • More Current System Software • The cloud provider is responsible for software upgrades & patch installations

28

SaaS SaaS (Software as a Service) • Provides software applications as a service to the user • Software that is deployed on a cloud server which is accessible through the Internet

29

SaaS Characteristics • On Demand Availability • Cloud software is available anywhere that the cloud is reachable via the Internet • Easy Maintenance • No user software upgrade or maintenance needed → all supported by the cloud • Flexible Scale Up or Scale Down • Centralized Management & Data

30

SaaS Characteristics • Enables a Shared Data Model • Multiple users can share a single data model and database • Cost Effectiveness • Pay based on usage • No risk in buying the wrong software • Multitenant Programming Solutions • Multiple programmers are guaranteed to use the same software version → no version mismatch problems

31

Software-as-a-service Open SaaS Applications

32


Cloud Computing

Cloud Services 36

Cloud Services Google Cloud
• Google App Engine
˗ Released as a preview in April 2008
˗ PaaS (Platform as a Service) for web applications
˗ Provides automatic scaling based on resource demands and server load
• Google Cloud Storage
˗ Launched in May 2010
˗ Online file storage service

37

Cloud Services Google Cloud
• Google BigQuery
˗ Released in April 2012
˗ Data analysis tool that uses SQL-like queries to process big datasets in seconds
• Google Compute Engine
˗ Released in June 2012
˗ IaaS (Infrastructure as a Service) support to enable on demand launching of VMs (Virtual Machines)

38

Cloud Services Google Cloud
• Google Cloud Endpoints
˗ Released in November 2013
˗ Tool to create services inside App Engine
˗ Easily connects from Android, iOS, and JavaScript clients
• Google Cloud DNS (Domain Name System)
˗ DNS service supported by the Google Cloud

39

Cloud Services Google Cloud
• Google Cloud Datastore
˗ NoSQL (non-relational) data storage
• Google Cloud SQL (Structured Query Language)
˗ Released in February 2014 as GA (General Availability)
˗ Fully managed MySQL database

40

Cloud Services Amazon S3 (Simple Storage Service) • Online file storage web service offered by Amazon Web Services • Public web service released in the United States in March 2006 and in Europe in November 2007 • Provides storage through web services interfaces (REST, SOAP, and BitTorrent)

41

Cloud Services Amazon Cloud Drive
• Amazon Cloud Drive was released in March 2011
• Web storage application from Amazon
• Storage Space Characteristics
˗ Can be accessed from up to eight specific devices (e.g., mobile devices & different computers) and by using different browsers on the same computer

42

Cloud Services Amazon Cloud Drive
• Cloud Player (Originally bundled)
˗ Users can play music in their Cloud Drive from any computer or Android device
˗ Music browsing based on song titles, albums, artists, genres (website only), and playlists

43

Cloud Services Amazon Cloud Drive Options
• Unlimited Photos
˗ Unlimited storage for photos & raw data files
˗ 5 gigabytes of video storage
• Unlimited Everything
˗ Unlimited storage for photos, videos, documents, and various file types

44

Cloud Services iCloud
• Developed by Apple, Inc.
• Public release in October 2011
• Cloud Storage & Cloud Computing
• Operating system
˗ OS X (10.7 Lion or later)
˗ Microsoft Windows 7 or later
˗ iOS 5 or later

45

Cloud Services iCloud replaces MobileMe • MobileMe was a subscription-based collection of Apple’s online services and software • MobileMe was replaced by iCloud • MobileMe ceased services in June 2012 • MobileMe users were allowed to transfer to iCloud until July 2012

46

Cloud Services iCloud Features
• Email, Contacts, and Calendars
• Find My Friends
• Backup & Restore
˗ Backup feature for device settings & data
˗ iOS 5 or later required
• Find My iPhone
˗ Enables a user to track the location of an iOS device or Mac
˗ Formerly a feature of MobileMe

47

Cloud Services iCloud Features
• Can manage lost or stolen Apple devices
• Back to My Mac
˗ Enables remote log in to other computers that have Back to My Mac installed (using the same Apple ID)
• iWork for iCloud
˗ Apple's iWork suite (Pages, Numbers, and Keynote) made available on a web interface

48

Cloud Services iCloud Features
• Photo Stream
˗ Can store most recent 1,000 photos
˗ Free storage for up to 30 days
• iCloud Photo Library
˗ Stores all photos at original resolution
˗ Stores photo metadata
• Storage (Introduced in 2011)
˗ 5 GB of free storage per account

49

Cloud Services iCloud Features
• iCloud Drive
˗ Can save photos, videos, documents, and apps
• iCloud Keychain
˗ Secure database for website and Wi-Fi passwords
˗ Secure credit card & debit card management for quick access and auto-fill

50

Cloud Services iCloud Features
• iTunes Match
˗ iTunes music library scan and match tracks function
˗ Serves tracks copied from CDs or other sources

51


Big Data

Big Data Examples 55

Big Data New FLU Virus Starts in the U.S.! • The H1N1 flu virus (which combines elements of the bird and swine (pig) flu viruses) started to spread in the U.S. in 2009 • The U.S. CDC (Centers for Disease Control and Prevention) collected diagnostic data from medical doctors only once a week • Tracking the spread of the flu from CDC information involved a lag of approximately 2 weeks, which is far too slow compared to the speed at which the virus was spreading

56

Big Data New FLU Virus Starts in the U.S.! • What vaccine was needed? • How much vaccine was needed? • Where was the vaccine needed? • Vaccine preparation and delivery plans could not be set up fast enough to safely prevent the virus from spreading out of control

57

Big Data New FLU Virus Starts in the U.S.! • Fortunately, Google published a paper about how they could predict the spread of the winter flu in the U.S. accurately down to specific regions and states • This paper was published in the journal Nature a few weeks before the H1N1 virus made the headline news

58

Big Data New FLU Virus Starts in the U.S.! • Millions of the most common search terms and millions of different mathematical models were tested on Google’s database • Google receives more than 3 billion search queries a day • The analysis system was set to look for correlations between the frequency of certain search queries and the spread of the flu over time and space
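To make the idea of correlating search frequency with flu spread concrete, here is a minimal sketch that ranks search terms by their Pearson correlation with weekly flu case counts. The search terms and all numbers are made-up toy data, and Pearson correlation is an assumption chosen for illustration; the slides do not say which statistical models Google actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy weekly data (assumed): reported flu cases and search counts per term.
flu_cases = [10, 20, 40, 80, 60, 30]
term_counts = {
    "flu symptoms": [100, 210, 390, 820, 590, 310],
    "cheap flights": [500, 480, 510, 495, 505, 490],
}

# Terms whose frequency tracks the flu curve most closely come first.
ranked = sorted(
    term_counts,
    key=lambda term: pearson(term_counts[term], flu_cases),
    reverse=True,
)
print(ranked)
```

At real scale the same scoring would be repeated over millions of candidate terms, which is what makes it a Big Data problem.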

59

Big Data New FLU Virus Starts in the U.S.! • Google’s method of analysis did not use data provided from hospitals or Medical Doctors • Google used Big Data analysis on the most common search terms people use • Google’s system proved to be more accurate and faster than analyzing government statistics

60

Big Data Wal-Mart • Wal-Mart’s Data Warehouse • Stores 4 petabytes (4 × 10^15 bytes) of data • Records every single purchase • Approximately 267 million transactions a day from 6,000 stores worldwide are recorded

61

Big Data Wal-Mart • Wal-Mart’s Data Analysis • Focuses on evaluating the effectiveness of pricing strategies and advertising campaigns • Seeks ways to improve inventory management and supply chains

62

Big Data Recommendation System using Big Data • Based on data analysis of simple elements • What users purchased in the past • Which items they have in their virtual shopping cart • Which items customers rated and liked • What influence the ratings had on other customers’ purchases

63

Big Data Amazon.com • Amazon.com’s Recommendation System • Item-to-Item Collaborative Filtering Algorithm • Personalization of the Online Store → Customized to each customer • Each customer’s store is based on the customer’s personal interest • Example: For a new mother, the store will display baby supplies and toys
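The item-to-item idea can be sketched as follows: two items are considered similar when largely the same customers bought both, and a customer is shown items similar to what they already own. This is a simplified illustration with made-up purchase data; the cosine similarity over sets of buyers is an assumption for the sketch, not Amazon's production algorithm.

```python
from collections import defaultdict
from math import sqrt

# Toy purchase history (assumed data): customer -> set of purchased items.
purchases = {
    "alice": {"diapers", "baby wipes", "rattle"},
    "bob": {"diapers", "baby wipes"},
    "carol": {"laptop", "mouse"},
    "dave": {"diapers", "rattle"},
}


def item_customers(history):
    """Invert the history: item -> set of customers who bought it."""
    index = defaultdict(set)
    for customer, items in history.items():
        for item in items:
            index[item].add(customer)
    return index


def cosine(buyers_a, buyers_b):
    """Cosine similarity between two items, each seen as a set of buyers."""
    return len(buyers_a & buyers_b) / sqrt(len(buyers_a) * len(buyers_b))


def recommend(customer, history, top_n=2):
    """Score every item the customer lacks by similarity to items they own."""
    index = item_customers(history)
    owned = history[customer]
    scores = defaultdict(float)
    for mine in owned:
        for other, buyers in index.items():
            if other not in owned:
                scores[other] += cosine(index[mine], buyers)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked if score > 0][:top_n]


print(recommend("bob", purchases))
```

Because similarity is computed between items rather than between customers, the item-to-item table can be built offline and looked up quickly when a customer visits the store.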

64

Big Data Citibank • Bank operations in 100 countries • Big Data analysis on the database of basic financial transactions can enable Global insight on investments, market changes, trade patterns, and economic conditions • Many companies (e.g., Zara, H&M, etc.) work with Citibank to locate new stores and factories

65

Big Data Product Development & Sales • For example, a Smartphone takes significant time and money to manufacture • In addition, the duration of popularity for a new Smartphone is limited • To maximize sales, a company needs to manufacture just the right amount of products and sell them in the right locations

66

Big Data Product Development & Sales • Too many will result in leftovers and a big waste for the company! • Too few will result in a lost opportunity for company profit and growth! • Big Data analysis can help estimate how many smartphones to make and where they could be popular, based on common search terms that people use → Use this to also estimate how many products could be sold in a certain location → But why is this difficult?

67

Big Data

REFERENCES 68

References • V. Mayer-Schönberger and K. Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt, 2013. • T. White, Hadoop: The Definitive Guide. O'Reilly Media, 2012. • J. Venner, Pro Hadoop. Apress, 2009. • S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big Data, Analytics and the Path From Insights to Value,” MIT Sloan Management Review, vol. 52, no. 2, Winter 2011. • R. E. Bryant, R. H. Katz, and E. D. Lazowska, “Big-Data Computing: Creating Revolutionary Breakthroughs in Commerce, Science and Society,” Computing Community Consortium, pp. 1-15, Dec. 2008. • G. Linden, B. Smith, and J. York, “Amazon.com Recommendations: Item-to-Item Collaborative Filtering,” IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb. 2003.

69

References • J. R. Galbraith, “Organizational Design Challenges Resulting From Big Data,” Journal of Organization Design, vol. 3, no. 1, pp. 2-13, Apr. 2014. • S. Sagiroglu and D. Sinanc, “Big data: A review,” Proc. IEEE International Conference on Collaboration Technologies and Systems, pp. 42-47, May 2013. • M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mobile Networks and Applications, vol. 19, no. 2, pp. 171-209, Jan. 2014. • X. Wu, X. Zhu, G. Q. Wu, and W. Ding, “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97–107, Jan. 2014. • Z. Zheng, J. Zhu, and M. R. Lyu, “Service-Generated Big Data and Big Data-as-a-Service: An Overview,” Proc. IEEE International Congress on Big Data, pp. 403–410, Jun/Jul. 2013.

70

References • I. Palit and C.K. Reddy, “Scalable and Parallel Boosting with MapReduce,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1904-1916, 2012. • M.-Y Choi, E.-A. Cho, D.-H. Park, C.-J Moon, and D.-K. Baik, “A Database Synchronization Algorithm for Mobile Devices,” IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 392-398, May 2010. • IBM, What is big data?, http://www.ibm.com/software/data/bigdata/what-is-bigdata.html [Accessed June 1, 2015] • Hadoop Apache, http://hadoop.apache.org • Wikipedia, http://www.wikipedia.org

Image sources • Walmart Logo, By Walmart [Public domain], via Wikimedia Commons • Amazon Logo, By Balajimuthazhagan (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

71

Big Data

Big Data's 4 Vs 72

Big Data Big Data’s 4 V Big Challenges • Volume – Data Size • Variety – Data Formats • Velocity – Data Streaming Speeds • Veracity – Data Trustworthiness

73

Big Data Volume – Data Size • 40 zettabytes (1 ZB = 10^21 bytes) of data is predicted to be created by 2020 • 2.5 quintillion (10^18) bytes of data are created every day • 6 billion (10^9) people have mobile phones • At least 100 terabytes (1 TB = 10^12 bytes) of data is stored by most U.S. companies • 966 petabytes (1 PB = 10^15 bytes) was the approximate storage size of the American manufacturing industry in 2009

74

Big Data Variety – Data Formats • 150 exabytes (1 EB = 10^18 bytes) was the estimated size of health care data throughout the world in 2011 • More than 4 billion (10^9) hours of YouTube video are watched each month • 30 billion pieces of content are exchanged every month on Facebook • 200 million monthly active users exchange 400 million tweets every day

75

Big Data Velocity – Data Streaming Speeds • 1 terabyte (10^12 bytes) of trade information is exchanged during every trading session at the New York Stock Exchange • Approximately 100 sensors are installed in modern cars to monitor fuel level, tire pressure, etc. • 18.9 billion network connections are predicted to exist by 2016

76

Big Data Veracity – Data Trustworthiness • 1 out of 3 business leaders has experienced trust issues with their data when trying to make a business decision • $3.1 trillion (10^12) a year is estimated to be wasted in the U.S. economy due to poor data quality

77

Big Data New technology is needed to overcome these 4 V Big Data Challenges • Volume – Data Size • Variety – Data Formats • Velocity – Data Streaming Speeds • Veracity – Data Trustworthiness

78


Big Data

HADOOP 83

Hadoop Data Storage, Access, and Analysis • Hard drive storage capacity has tremendously increased • But the data read and write speeds to and from the hard drives have not significantly improved yet • Simultaneous parallel read and write of data with multiple hard disks requires advanced technology
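A quick calculation shows why reading from many disks in parallel matters. The drive size (1 TB) and transfer rate (100 MB/s) below are assumed round numbers for illustration, not measurements from the slides.

```python
TB = 10**12  # bytes
MB = 10**6   # bytes


def read_time_seconds(data_bytes, rate_bytes_per_s, drives=1):
    """Time to read the data when it is split evenly across `drives` disks."""
    return data_bytes / (rate_bytes_per_s * drives)


single = read_time_seconds(1 * TB, 100 * MB)                # one disk
parallel = read_time_seconds(1 * TB, 100 * MB, drives=100)  # 100 disks at once

print(f"1 drive:    {single / 3600:.1f} hours")
print(f"100 drives: {parallel / 60:.1f} minutes")
```

Under these assumptions a full read drops from hours to minutes, but only if the system can coordinate the parallel reads and survive individual disk failures.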

84

Hadoop Data Storage, Access, and Analysis
• Challenge 1: Hardware Failure
˗ When using many computers for data storage and analysis, the probability that one computer will fail is very high
• Challenge 2: Cost
˗ To avoid data loss or computed analysis information loss, using backup computers and memory is needed, which helps the reliability, but is very expensive
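The hardware failure challenge follows from simple probability: if each machine fails independently with probability p during some period, the chance that at least one of n machines fails is 1 - (1 - p)^n. The sketch below evaluates this for assumed values of p and n; independence between machines is also an assumption.

```python
def prob_any_failure(p_single, machines):
    """Probability that at least one of `machines` independent machines fails."""
    return 1.0 - (1.0 - p_single) ** machines


# Assumed: each machine has a 1% chance of failing during the period.
for n in (1, 10, 100, 1000):
    print(f"{n:5d} machines -> {prob_any_failure(0.01, n):.4f}")
```

Even with reliable individual machines, a large cluster should expect failures as a normal event, which is why replication (and its cost) becomes unavoidable.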

85

Hadoop Data Storage, Access, and Analysis
• Challenge 3: Combining Analyzed Data
˗ Combining the analyzed data is very difficult
˗ If one part of the analyzed data is not ready, then the overall combining process has to be delayed
˗ If one part has errors in its analysis, then the overall combined result may be unreliable and useless

86

Hadoop Hadoop
• Hadoop is a Reliable Shared Storage and Analysis System
• Hadoop = HDFS + MapReduce + α
˗ HDFS provides Data Storage
˗ HDFS: Hadoop Distributed FileSystem
˗ MapReduce provides Data Analysis
˗ MapReduce = Map Function + Reduce Function

87

Hadoop HDFS: Hadoop Distributed FileSystem • A DFS (Distributed FileSystem) is designed for storage management across a network of computers • HDFS is optimized to store huge files with streaming data access patterns • HDFS is designed to run on clusters of commodity (general-purpose) computers

88

Hadoop HDFS: Hadoop Distributed FileSystem • HDFS was designed to be optimal in performance for a WORM (Write Once, Read Many times) pattern, which is a very efficient data processing pattern • HDFS was designed considering the time to read the whole dataset to be more important than the time required to read the first record

89

Hadoop HDFS • HDFS clusters use 2 types of nodes • Namenode (master node) • Datanode (worker node)

90

Hadoop HDFS: Namenode • Manages the filesystem namespace • Maintains the filesystem tree and the metadata for all the files and directories in the tree • Stores this information persistently on the local disk in 2 files • Namespace Image • Edit Log
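One way to picture the relationship between the namespace image and the edit log: the image is a snapshot of the filesystem tree, and the edit log records changes made since that snapshot, so the current namespace is the image with the log replayed on top. The sketch below is a toy model; the operation names and the use of a flat set of paths are assumptions, not the actual HDFS on-disk formats.

```python
def replay(namespace_image, edit_log):
    """Rebuild the current namespace from a snapshot plus logged edits."""
    namespace = set(namespace_image)  # copy, so the snapshot stays untouched
    for operation, path in edit_log:
        if operation == "create":
            namespace.add(path)
        elif operation == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return namespace


image = {"/logs/2015-05.txt", "/tmp/scratch"}
edits = [("create", "/logs/2015-06.txt"), ("delete", "/tmp/scratch")]
print(sorted(replay(image, edits)))
```

Appending to a log is cheap, while rewriting the whole snapshot is not, which is a common reason for keeping the two files separate.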

91

Hadoop HDFS: Datanodes • Workhorse of the filesystem • Store and retrieve blocks when requested by the client or the namenode • Report back to the namenode periodically with lists of blocks that were stored
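The periodic block reports can be sketched as follows: each datanode tells the namenode which blocks it holds, and the namenode keeps a map from block to datanodes so that it can answer "where is this block?". The class and method names here are hypothetical and chosen for illustration; they are not the real Hadoop API.

```python
from collections import defaultdict


class Namenode:
    """Toy namenode that only tracks block locations."""

    def __init__(self):
        self.block_locations = defaultdict(set)  # block id -> datanode ids

    def receive_block_report(self, datanode_id, block_ids):
        # A report replaces whatever this datanode reported before.
        for holders in self.block_locations.values():
            holders.discard(datanode_id)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)

    def locate(self, block_id):
        return sorted(self.block_locations.get(block_id, ()))


namenode = Namenode()
namenode.receive_block_report("datanode-1", ["blk_1", "blk_2"])
namenode.receive_block_report("datanode-2", ["blk_2"])
print(namenode.locate("blk_2"))
```

Because locations are rebuilt from reports, the namenode does not need to store them persistently alongside the namespace.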

92

Hadoop MapReduce • MapReduce is a programming model that abstracts the analysis problem from the stored data • MapReduce transforms the analysis problem into a computation over sets of keys and values
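The key-value formulation can be illustrated with the classic word count, run here in a single process. This is a sketch of the model only: `run_mapreduce` is a hypothetical in-memory driver written for this example, whereas Hadoop distributes the same three steps (map, group by key, reduce) across a cluster.

```python
from collections import defaultdict


def map_fn(_line_number, line):
    """Map: (line number, text) -> a list of (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: (word, [1, 1, ...]) -> (word, total)."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """In-memory driver: map every record, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for out_key in sorted(groups):
        output.extend(reducer(out_key, groups[out_key]))
    return output


lines = [(0, "big data big cloud"), (1, "cloud data")]
result = run_mapreduce(lines, map_fn, reduce_fn)
print(result)
```

The analyst writes only `map_fn` and `reduce_fn`; everything about where the data lives and how it is grouped stays inside the framework.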

93

Hadoop MapReduce System Architecture • MapReduce was designed for jobs that take several minutes or hours on a set of dedicated, trusted computers connected by a high-bandwidth network within a single data center

94

Hadoop MapReduce Characteristics • MapReduce uses a somewhat brute-force data analysis approach • The entire dataset (or a big part of the dataset) is processed for every query • è Batch Query Processor model

95

Hadoop MapReduce Characteristics • MapReduce makes it possible to run an ad hoc query against the whole dataset in a reasonable time • Many distributed systems combine data from multiple sources (which is very difficult), but MapReduce does this in a very effective and efficient way

96

Hadoop Technical Terms used in MapReduce • Seek Time is the delay in moving the disk head to the place on the disk where data is read or written • Transfer Rate is the speed at which data streams to or from the disk (the disk's bandwidth) • Transfer Rate has improved significantly more than Seek Time (i.e., transfers are now much faster, while seeks are still relatively slow)
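The practical consequence of slow seeks and fast transfers can be checked with rough arithmetic. All constants below (10 ms per seek, 100 MB/s transfer, a 1 TB dataset of 100-byte records, one seek per updated record) are assumptions picked for illustration.

```python
SEEK_SECONDS = 0.010                # assumed average seek time
TRANSFER_BYTES_PER_S = 100 * 10**6  # assumed transfer rate: 100 MB/s
DATASET_BYTES = 10**12              # assumed dataset size: 1 TB
RECORD_BYTES = 100                  # assumed record size


def seek_update_seconds(fraction):
    """Update a fraction of the records in place, one seek per record."""
    records = DATASET_BYTES // RECORD_BYTES
    return records * fraction * SEEK_SECONDS


def streaming_rewrite_seconds():
    """Read the whole dataset once and write it back once, sequentially."""
    return 2 * DATASET_BYTES / TRANSFER_BYTES_PER_S


print(f"Seek-based update of 1%: {seek_update_seconds(0.01) / 86400:.1f} days")
print(f"Streaming full rewrite:  {streaming_rewrite_seconds() / 3600:.1f} hours")
```

Under these assumptions, rewriting the entire dataset sequentially is far faster than seeking to even 1% of its records, although for a handful of records the seek-based approach still wins.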

97

Hadoop MapReduce • MapReduce gains performance enhancement through optimal balancing of Seeking and Transfer operations • Reduce Seek operations • Effectively use Transfer operations • In the next lecture, we will compare MapReduce with a traditional RDBMS (Relational Database Management System)

98


Big Data

MapReduce vs. RDBMS 103

Hadoop MapReduce vs. RDBMS • RDBMS (Relational Database Management System) Characteristics • RDBMS is good for updating a small proportion of a big database • RDBMS uses a traditional B-Tree, which is highly dependent on the time required to perform seek operations

104

Hadoop MapReduce vs. RDBMS • MapReduce Characteristics • MapReduce is good for updating all (or a majority) of a big database • MapReduce uses Sort and Merge to rebuild the database, which depends more on transfer operations
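The sort-and-merge rebuild can be sketched like this: sort the existing records and the updates by key, then stream through both in a single sequential pass, letting an update replace the old record with the same key. The function name and the toy records are assumptions for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def sort_merge(database, updates):
    """Rebuild a key-value dataset in one sequential pass over sorted inputs."""
    # The middle field orders an update (1) after the old record (0) per key.
    old = ((key, 0, value) for key, value in sorted(database))
    new = ((key, 1, value) for key, value in sorted(updates))
    rebuilt = []
    for key, group in groupby(heapq.merge(old, new), key=itemgetter(0)):
        *_, newest = group  # the last entry for a key wins
        rebuilt.append((key, newest[2]))
    return rebuilt


database = [("b", 2), ("a", 1)]
updates = [("b", 20), ("c", 3)]
print(sort_merge(database, updates))
```

No random access is needed at any point, so the cost is dominated by transfer rate rather than seek time.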

105

Hadoop MapReduce vs. RDBMS • RDBMS is good for applications that require the datasets of the database to be very frequently updated (e.g., point queries or small dataset updates) • MapReduce is better for WORM (Write Once and Read Many times) based data applications • MapReduce is a complementary system to RDBMS

106

Hadoop MapReduce vs. RDBMS

Attribute | RDBMS | MapReduce
Data Size | Gigabytes (10^9) | Petabytes (10^15)
Access | Interactive & Batch | Batch
Updates | Read & Write Many Times | WORM (Write Once, Read Many Times)
Data Structure | Static Schema | Dynamic Schema
Integrity | High | Low
Scalability | Nonlinear | Linear

107

Hadoop MapReduce vs. RDBMS: Data Types • Structured Data: Data that has a formal defined structure (e.g., XML documents or database tables) • Semi-Structured Data: Data that has a looser format where the data structure is used as a guide and may be ignored • Unstructured Data: Data that does not have any formal structure (e.g., plain text or image data)

108

Hadoop MapReduce vs. RDBMS: Data Types • MapReduce is very effective on unstructured and semistructured data • Why? • MapReduce interprets data during the data processing sessions • MapReduce does not use intrinsic properties of the data as input keys or input values. The parameters used are selected by the person analyzing the data
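The point that data is interpreted at processing time, with the analyst choosing the key, can be shown on raw text lines. The record layout (year, station, temperature) and the values are made-up assumptions; nothing about the layout is stored with the data, and it is applied only when a line is parsed.

```python
# Raw, schema-free input lines (assumed layout: year,station,temperature).
RAW_LINES = [
    "1949,station-a,111",
    "1949,station-b,78",
    "1950,station-a,22",
    "1950,station-b,-11",
]


def max_temperature_by(lines, key_field):
    """Parse each line while processing; key_field picks the grouping key.

    key_field 0 groups by year, key_field 1 groups by station.
    """
    best = {}
    for line in lines:
        fields = line.split(",")  # structure is imposed here, not at load time
        key, temperature = fields[key_field], int(fields[2])
        best[key] = max(temperature, best.get(key, temperature))
    return best


print(max_temperature_by(RAW_LINES, key_field=0))
print(max_temperature_by(RAW_LINES, key_field=1))
```

The same stored lines answer two different questions simply by choosing a different key, with no schema change or reload.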

109

Hadoop MapReduce vs. RDBMS: Scalability • MapReduce has a programming model that is linearly scalable • MapReduce Functions: 2 types • Map function • Reduce function • Both of these functions define a Key-Value pair mapping relation (e.g., Key-Value pair 1 → Key-Value pair 2)

110

Hadoop Hadoop Release Series

Release 2.6.0 became available Nov. 2014

Feature | 1.x | 0.22 | 2.x
Secure authentication | Yes | No | Yes
Old configuration names | Yes | Deprecated | Deprecated
New configuration names | No | Yes | Yes
Old MapReduce API | Yes | Yes | Yes
New MapReduce API | Yes (with some missing libraries) | Yes | Yes
MapReduce 1 runtime (Classic) | Yes | Yes | No
MapReduce 2 runtime (YARN) | No | No | Yes
HDFS Federation | No | No | Yes
HDFS High-Availability | No | No | Yes

111

Hadoop Hadoop Release Series • 2.x includes several major new features • MapReduce 2 is the new MapReduce runtime implemented on a new system called YARN • YARN • Yet Another Resource Negotiator • General resource management system for running distributed applications

112

Hadoop Hadoop Release Series • HDFS Federation partitions the HDFS namespace across multiple namenodes • Enables improved support for clusters with very large numbers of files • HDFS High-Availability feature uses standby namenodes for backup, and therefore, the namenode is no longer a potential SPOF (Single Point of Failure)

113

Big Data

REFERENCES 114

References • V. Mayer-Schönberger and K. Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt, 2013. • T. White, Hadoop: The Definitive Guide. O'Reilly Media, 2012. • J. Venner, Pro Hadoop. Apress, 2009. • S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big Data, Analytics and the Path From Insights to Value,” MIT Sloan Management Review, vol. 52, no. 2, Winter 2011. • R. E. Bryant, R. H. Katz, and E. D. Lazowska, “Big-data Computing: Creating revolutionary breakthroughs in commerce, science and society,” Computing Community Consortium, pp. 1-15, Dec. 2008. • G. Linden, B. Smith, and J. York, “Amazon.com Recommendations: Item-to-Item Collaborative Filtering,” IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb. 2003.

115

References • J. R. Galbraith, “Organizational Design Challenges Resulting From Big Data,” Journal of Organization Design, vol. 3, no. 1, pp. 2-13, Apr. 2014. • S. Sagiroglu and D. Sinanc, “Big data: A review,” Proc. IEEE International Conference on Collaboration Technologies and Systems, pp. 42-47, May 2013. • M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mobile Networks and Applications, vol. 19, no. 2, pp. 171-209, Jan. 2014. • X. Wu, X. Zhu, G. Q. Wu, and W. Ding, “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97-107, Jan. 2014. • Z. Zheng, J. Zhu, and M. R. Lyu, “Service-Generated Big Data and Big Data-as-a-Service: An Overview,” Proc. IEEE International Congress on Big Data, pp. 403-410, Jun/Jul. 2013.

116

References • I. Palit and C. K. Reddy, “Scalable and Parallel Boosting with MapReduce,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1904-1916, 2012. • M.-Y. Choi, E.-A. Cho, D.-H. Park, C.-J. Moon, and D.-K. Baik, “A Database Synchronization Algorithm for Mobile Devices,” IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 392-398, May 2010. • IBM, What is big data?, http://www.ibm.com/software/data/bigdata/what-is-bigdata.html [Accessed June 1, 2015] • Hadoop Apache, http://hadoop.apache.org • Wikipedia, http://www.wikipedia.org

Image sources • Walmart Logo, By Walmart [Public domain], via Wikimedia Commons • Amazon Logo, By Balajimuthazhagan (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

117

Big Data

MapReduce 118

MapReduce Hadoop • Hadoop is a Reliable Shared Storage and Analysis System • Hadoop = HDFS + MapReduce + α

˗ HDFS provides Data Storage (HDFS: Hadoop Distributed FileSystem)

˗ MapReduce provides Data Analysis (MapReduce = Map Function + Reduce Function)

119

MapReduce Scaling Out • Scaling out is done by the DFS (Distributed FileSystem), where the data is divided and stored across distributed computers & servers • Hadoop uses HDFS to move the MapReduce computation to several distributed computing machines, each of which processes its assigned part of the divided data

120

MapReduce Jobs • A MapReduce job is a unit of work that needs to be executed • A job consists of: Data input, MapReduce program, Configuration Information, etc. • A job is executed by dividing it into tasks of two types • Map Task • Reduce Task

121

MapReduce Node types for Job execution • Job execution is controlled by 2 types of nodes • Jobtracker • Tasktracker • Jobtracker coordinates all jobs • Jobtracker schedules all tasks and assigns the tasks to tasktrackers

122

MapReduce

• Tasktracker executes its assigned task • Tasktracker sends progress reports to the Jobtracker • Jobtracker keeps a record of the progress of all jobs executed

123

MapReduce Data flow • Hadoop divides the input into input splits (or splits) suitable for the MapReduce job • Each split has a fixed size • Split size is commonly matched to the size of an HDFS block (64 MB) for maximum processing efficiency
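The division of an input file into fixed-size splits can be sketched as follows. This is a simplified illustration that assumes the split size equals the 64 MB block size mentioned above and that the last split simply holds the remainder; `input_splits` is a hypothetical helper, not a Hadoop function.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the HDFS block size quoted above


def input_splits(file_size, split_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of `file_size` bytes."""
    count = math.ceil(file_size / split_size)
    return [(i * split_size, min(split_size, file_size - i * split_size))
            for i in range(count)]
```

Since one Map Task is created per split, the length of the returned list is also the number of Map Tasks in this simplified model. A 200 MB file, for instance, yields three full splits and one shorter final split.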

124

MapReduce Data flow • Map Task is created for each split • Map Task executes the map function for all records within the split • Hadoop commonly executes the Map Task on the node where the input data resides

125

MapReduce Data flow

• Data-Local Map Task • Data locality optimization does not need to use the cluster network • Data-local flow process shows why the Optimal Split Size = 64 MB HDFS Block Size

126

MapReduce Data flow

(Figure labels: Node, Rack, Data Center)

• Rack-Local Map Task • A node hosting the HDFS block replicas for a map task’s input split could be running other map tasks • Job Scheduler will look for a free map slot on a node in the same rack as one of the blocks

127

(Figure labels: Map Task, HDFS Block)

MapReduce Data flow

• Off-Rack Map Task • Needed when the Job Scheduler cannot perform data-local or rack-local map tasks • Uses inter-rack network transfer
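The three placement cases (data-local, rack-local, off-rack) can be sketched as a preference order. This is a hedged illustration, not the actual Hadoop scheduler: the `rack_of` mapping, the node names, and both function names are assumptions made for the example.

```python
def locality(task_node, replica_nodes, rack_of):
    """Classify a map task placement relative to its input block replicas."""
    if task_node in replica_nodes:
        return "data-local"
    if rack_of[task_node] in {rack_of[n] for n in replica_nodes}:
        return "rack-local"
    return "off-rack"


def pick_node(free_nodes, replica_nodes, rack_of):
    """Choose the free node with the best locality (hypothetical policy)."""
    order = {"data-local": 0, "rack-local": 1, "off-rack": 2}
    return min(free_nodes,
               key=lambda n: order[locality(n, replica_nodes, rack_of)])
```

With replicas on nodes in racks r1 and r2, a free node in r1 that holds no replica is preferred over a free node in a third rack, because only the latter requires inter-rack transfer.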

128

MapReduce Map • Map task writes its output to the local disk • Map task output is not the final output; it is only intermediate output

Reduce • Map task output is processed by Reduce Tasks to produce the final output • Reduce Task output is stored in HDFS • For a completed job, the Map Task output can be discarded

129

MapReduce Single Reduce Task

• Node includes Split, Map, Sort, and Output units • Light blue arrows show data transfers within a node • Black arrows show data transfers between nodes

130

MapReduce Single Reduce Task

• Number of reduce tasks is specified independently, and is not based on the size of the input

131

MapReduce Combiner Function • User-specified function that runs on the Map output → Forms the input to the Reduce function • Specifically designed to minimize the data transferred between Map Tasks and Reduce Tasks • Solves the problem of limited network speed on the cluster and helps to reduce the time in completing MapReduce jobs
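The effect of a combiner can be sketched with a word-count map output. This is an illustration only, with hypothetical function names; it assumes a sum-style aggregation, which is safe to apply on the map side because addition is associative and commutative (an operation such as a mean would not be safe to combine this way).

```python
from collections import Counter


def map_words(line):
    """Map output before combining: one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per key on the map side."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())
```

For the line "a b a a b", the map task emits five pairs, but only two pairs need to cross the network after combining.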

132

MapReduce Multiple Reducers • Map tasks partition their output, each creating one partition for each reduce task • Each partition may contain many keys and their associated values • All records for a key are kept in a single partition
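The partitioning rule can be sketched with a hash of the key modulo the number of reduce tasks. This is an assumption-laden illustration: CRC32 is used here only because it is stable across Python runs, and the function names are hypothetical.

```python
import zlib


def partition(key, num_reducers):
    """Map a key to a reducer index with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_output(pairs, num_reducers):
    """Split one map task's output into one list per reduce task."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return partitions
```

Because the partition index depends only on the key, every record with the same key lands in the same partition, while a single partition may still hold several different keys.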

133

MapReduce Multiple Reducers Shuffle

• Shuffle process is used in the data flow between the Map tasks and Reduce tasks

134

MapReduce Zero Reducer

• Zero reducer uses no shuffle process • Applied when all of the processing can be carried out in parallel Map tasks

135


Big Data

HDFS 140

HDFS Hadoop • Hadoop is a Reliable Shared Storage and Analysis System • Hadoop = HDFS + MapReduce + α

˗ HDFS provides Data Storage (HDFS: Hadoop Distributed FileSystem)

˗ MapReduce provides Data Analysis (MapReduce = Map Function + Reduce Function)

141

HDFS

HDFS: Hadoop Distributed FileSystem • DFS (Distributed FileSystem) is designed for storage management of a network of computers • HDFS is optimized to store large terabyte size files with streaming data access patterns

142

HDFS

HDFS: Hadoop Distributed FileSystem • HDFS was designed to be optimal in performance for a WORM (Write Once, Read Many times) pattern • HDFS is designed to run on clusters of general computers & servers from multiple vendors

143

HDFS HDFS Characteristics • HDFS is optimized for large scale and high throughput data processing • HDFS does not perform well in supporting applications that require low-latency data access (e.g., in the tens of milliseconds range)

144

HDFS Blocks • Files in HDFS are divided into block-size chunks → 64 Megabyte default block size • A block is the minimum amount of data that HDFS can read or write • Blocks simplify the storage and replication process → Provides fault tolerance & processing speed enhancement for larger files

145

HDFS HDFS • HDFS clusters use 2 types of nodes • Namenode (master node) • Datanode (worker node)

146

HDFS Namenode • Manages the filesystem namespace • Namenode keeps track of the datanodes that have blocks of a distributed file assigned • Maintains the filesystem tree and the metadata for all the files and directories in the tree • Stores on the local disk using 2 file forms • Namespace Image • Edit Log

147

HDFS Namenode • Namenode holds the filesystem metadata in its memory • Namenode’s memory size determines the limit to the number of files in a filesystem • But then, what is Metadata?
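The relationship between namenode memory and the number of files can be sketched with a back-of-the-envelope estimate. The per-object cost of 150 bytes is an assumed rule of thumb, not a figure from the slides, and both function names are hypothetical.

```python
BYTES_PER_OBJECT = 150  # assumed rule-of-thumb cost per file or block entry


def namenode_memory_bytes(num_files, blocks_per_file=1):
    """Estimate namenode memory needed for file and block metadata."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


def max_files(memory_bytes, blocks_per_file=1):
    """Estimate how many files fit in a given amount of namenode memory."""
    return memory_bytes // (BYTES_PER_OBJECT * (1 + blocks_per_file))
```

Under this assumption, one million single-block files need roughly 300 MB of namenode memory, which illustrates why many small files are more costly for the namenode than a few large ones.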

148

HDFS Metadata • Traditional concept of the library card catalogs • Categorizes and describes the contents and context of the data files • Maximizes the usefulness of the original data file by making it easy to find and use

149

HDFS Metadata Types • Structural Metadata • Focuses on the data structure's design and specification • Descriptive Metadata • Focuses on the individual instances of application data or the data content

150

HDFS Datanodes • Workhorses of the filesystem • Store and retrieve blocks when requested by the client or the namenode • Periodically report back to the namenode with lists of the blocks that they are storing

151

HDFS Client Access • Client can access the filesystem (on behalf of the user) by communicating with the namenode and datanodes • Client can use a filesystem interface similar to POSIX (Portable Operating System Interface), so the user code does not need to know about the namenode and datanodes to function properly

152

HDFS Namenode Failure • Namenode keeps track of the datanodes that have blocks of a distributed file assigned → Without the namenode, the filesystem cannot be used • If the computer running the namenode malfunctions then reconstruction of the files (from the blocks on the datanodes) would not be possible → Files on the filesystem would be lost

153

HDFS Namenode Failure Resilience • Namenode failure prevention schemes 1. Namenode File Backup 2. Secondary Namenode

154

HDFS 1. Namenode File Backup • Back up the namenode files that form the persistent state of the filesystem’s metadata • Configure the namenode to write its persistent state to multiple filesystems → Synchronous and atomic backup • Common backup configuration → Copy to Local Disk and Remote FileSystem

155

HDFS 2. Secondary Namenode • Secondary namenode does not act the same way as the namenode • Secondary namenode periodically merges the namespace image with the edit log to prevent the edit log from becoming too large • Secondary namenode usually runs on a separate computer to perform the merge process because this requires significant processing capability and memory
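The merge of the namespace image with the edit log can be sketched with a dictionary and a list of logged operations. This is a conceptual illustration only: the operation names (`create`, `delete`, `rename`) and the tuple layout are hypothetical and do not reflect the real on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace dict."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]                 # edit[2] holds the block list
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)  # edit[2] is the new path
    return namespace


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return (image, empty log)."""
    merged = dict(image)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []
```

After a checkpoint the edit log starts empty again, which is the point of the merge: the log stays short, and the new image already reflects every logged change.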

156

HDFS Hadoop 2.x Release Series HDFS Reliability Enhancements • HDFS Federation • HDFS HA (High-Availability)

157

HDFS HDFS Federation • Allows a cluster to scale by adding namenodes • Each namenode manages a namespace volume and a block pool • Namespace volume is made up of the metadata for the namespace • Block pool contains all the blocks for the files in the namespace

158

HDFS HDFS Federation • Namespace volumes are all independent • Namenodes do not communicate with each other • Failure of a namenode is also independent to other namenodes • A namenode failure does not influence the availability of another namenode’s namespace

159

HDFS HDFS High-Availability • A pair of namenodes (Active & Standby) is set up in an Active-Standby configuration • Standby namenode stores the latest edit log entries and an up-to-date block mapping • When the active namenode fails, the standby namenode takes over serving client requests

160

HDFS HDFS High-Availability • Although the standby namenode can take over operation quickly (e.g., within a few tens of seconds), standby namenode activation is executed only after a sufficient observation period (e.g., approximately a minute or a few minutes) to avoid unnecessary namenode switching
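The observation period before activation can be sketched as a simple heartbeat timeout. This is a hypothetical model for illustration, not Hadoop's failover mechanism: the class, the node names, and the 60-second default are all assumptions.

```python
class FailoverMonitor:
    """Hypothetical monitor that promotes the standby after a quiet period."""

    def __init__(self, start, observation_period_s=60.0):
        self.observation_period_s = observation_period_s
        self.last_heartbeat = start
        self.active, self.standby = "namenode-1", "namenode-2"

    def heartbeat(self, now):
        """Record a heartbeat received from the active namenode."""
        self.last_heartbeat = now

    def check(self, now):
        """Swap roles once the active namenode has been silent long enough."""
        if now - self.last_heartbeat >= self.observation_period_s:
            self.active, self.standby = self.standby, self.active
            self.last_heartbeat = now  # restart observation for the new active
        return self.active
```

A short silence (for example, a brief network delay) leaves the roles unchanged; only a silence that lasts the full observation period triggers the switch.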

161


CDN (Content Delivery Network)

CDN Introduction 166

CDN Table of Contents • CDN Motivation & Structure • CDN Procedures • Hierarchical Content Delivery Model • CDN Market & Major Service Providers • CDN Research & Development

167

CDN CDN Motivation • CDN is a network constructed from a group of strategically placed and geographically distributed caching servers • CDN is one of the most efficient solutions for CPs (Content Providers) in serving a large number of user devices, for reduction in content download time and network traffic

168

CDN CDN Motivation • Network traffic that is accessed by mobile users (e.g., smart devices) is rapidly increasing • Mobile network performance is highly dependent on the content download of multimedia data and applications • Several mobile network operators have suffered from service outage or performance deterioration due to the significant increase in use of mobile devices

169

CDN CDN Structure • Using CDN, both content download time and network traffic are reduced • Caching servers store popular contents in advance

(Figure: Content Provider, Caching Server, and User, showing the content request and delivery routes with and without CDN)

170

CDN CDN in Mobile Networks • Mobile communication networks have a stronger need for both reduced traffic load and reduced content delivery time than broadband backbone networks • In backbone networks, capacity is abundant, so traffic load reduction may not be as critical an issue

171

CDN CDN Structure • CDN usually consists of the CP (Content Provider) and caching servers • CP possesses all contents to serve • Caching servers are distributed in the network and hold copies of selected contents that the CP stores

172

CDN CDN Structure • When a user requests content from its nearest caching server, the server can deliver the content if the requested content is in its cache

• Otherwise the caching server redirects the user’s request to the remotely located CP

173

CDN CDN Procedures • When a user requests content from its nearest caching server, the server can deliver the content if the requested content is in its cache

174

CDN CDN Procedures • If the requested content is not in the local server’s cache, the content request is redirected to the remotely located CP
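The hit and miss paths of this procedure can be sketched as follows. This is a minimal model under stated assumptions: the class names are hypothetical, the caching server fetches from the CP on the user's behalf rather than issuing a network-level redirect, and it stores the content after a miss, which the slides do not require.

```python
class ContentProvider:
    """Stands in for the remotely located CP that holds all contents."""

    def __init__(self, contents):
        self.contents = dict(contents)

    def fetch(self, content_id):
        return self.contents[content_id]


class CachingServer:
    """Stands in for the caching server nearest to the user."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def request(self, content_id):
        """Return (content, source) where source is 'cache' or 'origin'."""
        if content_id in self.cache:
            return self.cache[content_id], "cache"
        content = self.origin.fetch(content_id)
        self.cache[content_id] = content  # assumption: store after a miss
        return content, "origin"
```

The first request for an item travels to the CP; a repeat request for the same item is then served from the cache.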

175

CDN Content Aging Procedure • Content aging is focused on delivering the most popular contents to users in the most effective way • Dependent on • Location of caching servers • Number of caching servers • Limited memory size of caching servers • Content Aging • Delete expired contents from the cache server • Download updated contents from the CP

176

CDN

Content Aging Procedure • Each content has a content update period → TTL (Time to Live) • Few seconds for on-line trading • Few seconds for auction information • 24 hours or more for movies
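TTL-based content aging can be sketched with a small cache that records an expiry time per item. The class and method names are hypothetical, and time is passed in explicitly as `now` (in seconds) so the behavior is easy to follow.

```python
class AgingCache:
    """Hypothetical cache whose entries expire after a per-content TTL."""

    def __init__(self):
        self.entries = {}  # content_id -> (content, expires_at)

    def put(self, content_id, content, ttl_s, now):
        self.entries[content_id] = (content, now + ttl_s)

    def get(self, content_id, now):
        """Return cached content, or None if it is missing or expired."""
        entry = self.entries.get(content_id)
        if entry is None or now >= entry[1]:
            return None
        return entry[0]

    def age(self, now):
        """Delete expired entries and return their IDs for re-download."""
        expired = sorted(cid for cid, (_, exp) in self.entries.items()
                         if now >= exp)
        for cid in expired:
            del self.entries[cid]
        return expired
```

A trading quote stored with a TTL of a few seconds expires almost immediately, while a movie stored with a 24-hour TTL remains available; the IDs returned by `age` are the items to download again from the CP.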

177

CDN

REFERENCES 178

References • “Content Delivery Functional Architecture in NGN,” Telecommunication Standardization Sector of ITU, White Paper, Sep. 2010. • “Content delivery networks: Market dynamics and growth perspectives,” Informa Telecoms & Media, White Paper, Oct. 2012. • Cisco, Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networkingindex-vni/white_paper_c11-520862.pdf [Accessed June 1, 2015] • Akamai, http://www.akamai.com/index.html/ • LimeLight, http://www.limelight.com/ • Level 3, http://www.level3.com/ • CDNetworks, http://www.us.cdnetworks.com/

179

CDN (Content Delivery Network)

CDN Hierarchical Content Delivery 180

Hierarchical Content Delivery Hierarchical Content Delivery • It is not possible for a caching server to save all contents that the CP (Content Provider) serves • Retrieving contents from the remotely located CP can cause a long content download time. In addition, a large amount of traffic will be generated by each server in support of the content’s packet routing

181

Hierarchical Content Delivery Hierarchical Content Delivery • For the given cache size of each server, it is important to maximize the hit rate of the local caching server such that the requested contents do not have to be retrieved from the CP • To accomplish this objective in the Internet in a scalable way, hierarchical cooperative content delivery techniques are used in providing content delivery to local caching servers

182

Hierarchical Content Delivery Hierarchical Content Delivery • CD & LCF (Content Distribution & Location Control Functions) controls the overall content delivery process, and holds the content IDs of the whole CDN • CCF (Cluster Control Function) controls multiple CDPFs (Content Delivery Processing Functions) and stores the content IDs of its cluster • CDPF stores and delivers the contents to the users

183

Hierarchical Content Delivery Hierarchical Content Delivery Network Example

184

Hierarchical Content Delivery Content Delivery Procedures • Case 1 • Requested content is in the local cluster • Content request message is delivered to the CCF • CCF sends a session request message to the CDPF to deliver the content to the user • CDPF delivers the content to the user

185

Hierarchical Content Delivery Content Delivery Procedures • Case 1 Procedures

186

Hierarchical Content Delivery Content Delivery Procedures • Case 2 • Requested content is not in the local cluster, but another local cluster (i.e., target cluster) has the content • Procedures • Content request message is redirected from the local cluster to the CD & LCF • Continued…

187

Hierarchical Content Delivery Content Delivery Procedures • Case 2 • Procedures Continued… • CD & LCF checks if the requested content is in the other cluster • Requested content can be delivered from the target cluster to the user directly, or through the local cluster (the local cluster can store the requested content)

188

Hierarchical Content Delivery Content Delivery Procedures • Case 2 Procedures

189

Hierarchical Content Delivery Content Delivery Procedures • Case 3 • When the requested content is not in the CDN • Content request message is sent from the CD & LCF to the CP • CP delivers the content to the user through the local cluster • The requested content can be stored in the local cluster

190

Hierarchical Content Delivery Content Delivery Procedure • Case 3 Procedures
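The three delivery cases can be sketched as one lookup function. This is a simplified model: the `clusters` dictionary stands in for the content IDs tracked by the CCFs and the CD & LCF, `origin` stands in for the CP, and storing a copy in the local cluster (which the slides describe as optional) is assumed to always happen.

```python
def deliver(content_id, local_cluster, clusters, origin):
    """Return (content, case) following the three delivery cases.

    `clusters` maps cluster name -> {content_id: content}.
    `origin` maps content_id -> content and stands in for the CP.
    """
    local = clusters[local_cluster]
    if content_id in local:                        # Case 1: local cluster hit
        return local[content_id], 1
    for name, store in clusters.items():           # Case 2: a target cluster
        if name != local_cluster and content_id in store:
            local[content_id] = store[content_id]  # assumed local copy
            return store[content_id], 2
    content = origin[content_id]                   # Case 3: fetch from the CP
    local[content_id] = content                    # assumed local copy
    return content, 3
```

Because the local cluster keeps a copy in Cases 2 and 3, a repeated request for the same content is then handled as Case 1.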

191


CDN (Content Delivery Network)

CDN Market 194

CDN Market Measuring the CDN Market Value • There are many ways to evaluate the value of the CDN market • Evaluation is related to the diverse range of CDN industry participants • Example of industry participants • CSP (Communications Service Provider) • Industry manufacturers • CDN service providers • Content provider

195

CDN Market Measuring the CDN Market Value • For communication service providers, the CDN’s value includes improving retail service delivery and supporting their efforts to win and retain customers • For industry manufacturers, the market value is related to the demand from telcos, content providers and other businesses

196

CDN Market CDN Market Size • 2014 CDN Market size was $3.71 billion • CDN Market Components • Content delivery technologies, hardware, analytics, monitoring, encoding, transparent caching, DRM (Digital Rights Management), CMS (Content Management System), OVP (Online Video Platform), etc. • CDN Market Estimations • Expected to grow to $12.16 billion by 2019 • Predicted 26.3% CAGR (Compound Annual Growth Rate) from 2014 to 2019
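The CAGR figure can be checked against the two market sizes with the standard compound-growth formula. This is a small sketch with hypothetical function names; it treats 2014 to 2019 as five compounding years, which is an assumption about how the estimate was made.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.25 means 25%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value after compounding `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years
```

Computed this way, growth from $3.71 billion to $12.16 billion over five years corresponds to roughly 26.8% per year, while compounding 26.3% for five years gives about $11.9 billion. The small gap from the quoted figures may come from rounding or from a different compounding convention in the original estimate.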

197

CDN Market CDN Service Providers • Akamai has about 110,000 servers around the world. Akamai's services include cloud computing, HD video delivery, etc. • Amazon CloudFront delivers static and streaming contents. Amazon CloudFront works seamlessly with other Amazon Web and Cloud Service solutions • S3 (Simple Storage Service) • EC2 (Elastic Compute Cloud)

198

CDN Market CDN Service Providers • CDNetworks has POPs (Points of Presence) on 6 continents, including 20 POPs in China. World’s 3rd largest, and Asia’s #1, full-service provider • Level 3 supports a comprehensive encoding suite for video data, and intelligent traffic manager services (i.e., load balancing)

199

CDN Market CDN Service Providers • Limelight has 6,000 servers at 75 POPs (Points of Presence), and more than 30 regional content delivery centers in the U.S., Europe, and Asia • ChinaCache is a CDN market leader in China, with 127 POPs and 11,000 servers in China. Its CDN services include hotlink protection, custom CNAME for SSL, and Purge All.

200

CDN Market Telcos with a CDN resale agreement

CDN Provider | Operator (Market Region)
Akamai | Verizon (US), NTT Communications (Japan), du (UAE), Telekom Malaysia (Malaysia)
CDNetworks | Andorra Telecom (Andorra), MegaFon (Russia), Telecom Italia Sparkle (Italy), SingTel (Singapore)
ChinaCache | China Mobile (China), HGC (International)

201

CDN Market Telcos with a CDN resale agreement

CDN Provider | Operator (Market Region)
EdgeCast | AT&T (US), AAPT (Australia), Deutsche Telekom ICSS (Germany), Dogan Telecom (Turkey), Pacnet (Asia Pacific), Telus (Canada)
Jet-Stream | Telenet (Belgium), Ziggo (Netherlands)
Level 3 | Internexa (South America), MWeb (South Africa), STC (Saudi Arabia)
Limelight Networks | Bell Canada (Canada), Bestel (Mexico), Bharti Airtel (India), XO Communications (US)

202


CDN (Content Delivery Network)

CDN R&D 205

CDN CDN Research & Development • Content Aspects • Content Type based Differentiated Support • Data, Multimedia, Mobile Apps, etc. • Content Aging Control • Content Selection & Deletion • Content Replication Detection • Dynamic Page Publishing • Digital Rights Management • Live Event Management

206

CDN CDN Research & Development • System Aspects • Surrogate Server Location (Dynamic) • Storage Memory Size (Dynamic) • Content Delivery Method • Mobile Device Characteristics, Location • Network Latency • Security & Information Assurance • Anomaly Detection • User Authentication • Content Authentication

207

CDN Mobile CDN Research & Development • Mobile wireless networks have additional challenges in supporting CDN services, e.g., • GPS & Navigation Information • Mobile TV • ITS (Intelligent Transportation System) • LBS (Location Based Service) • Efficient content provisioning is required to provide scalable control over wide coverage areas while providing high levels of QoS with limited resources

208

CDN Mobile CDN Challenges • Mobile node constraints (limited storage, processing power, input capability) due to the portable size of mobile devices • Frequent network disconnections due to mobile users • Location oriented services regarding user mobility • Real time monitoring to obtain the real time status of mobile users

209

CDN CDN vs. Mobile CDN

Features | CDN | Mobile CDN [Future]
Content Type | Static, Dynamic, Streaming | Static, Dynamic, Streaming
User Location | Fixed | Mobile, Fixed
Surrogate Location | Fixed | Fixed, [Mobile]
Surrogate Topology | ISP (Internet Service Provider) Local, Center of Service Area | BSs (Base Stations), RAN (Radio Access Network) Systems, [Mobile Devices]
Maintenance Complexity | Low~Medium | Medium~High [Dynamic]
Services | Multimedia & Data Services, etc. | Mobile Apps, LBS, [Mobile] Cloud, etc.

210
