
APPENDIX 1

“Advertising promotions for Restaurants of Delhi using Big Data Analytics”

By Md. Masum Hossain, Lakshmi, Anugya Saraswat, Pranjal Sinha

A PROJECT REPORT Submitted to the Department of Computer Sciences & Engineering In partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology

April, 2016


DECLARATION

I hereby declare that this project work submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which has been accepted for the award of any other degree or diploma of the university or other institute of higher learning, except where due acknowledgment has been made in the text.

Place:

Signature of the Student

Date

Names: Md. Masum Hossain, Lakshmi, Anugya Saraswat, Pranjal Sinha


CERTIFICATE
This is to certify that the report entitled "Advertising promotions for Restaurants of Delhi using Big Data Analytics", by Mr. Md. Masum Hossain (Roll No. 120102019), Ms. Lakshmi (Roll No. 120102015), Ms. Anugya Saraswat (Roll No. 130102801) and Mr. Pranjal Sinha (Roll No. 110101177), submitted to Sharda University towards the fulfillment of the requirements of the degree of Bachelor of Technology, is a record of bona fide final-year project work carried out by them in the Department of Computer Science, School of Engineering and Technology, Sharda University. The results/findings contained in this project have not been submitted in part or in full to any other University/Institute for the award of any other Degree/Diploma.

Signature of Head of Department Name: Dr. Ishan Ranjan (Office seal)

Signature of Supervisor Name: Ms. Supriya Khaitan Designation: Asst. Professor

Place:

Date:

Date:


Abstract
This project deals with advertisement promotion for the restaurants of Delhi using big data analytics. In the current era, usage of the Internet is rapidly increasing. Distributed processing of mass data across many machines, and personalized search services based on user profiles, have been hotspots of research and development. Hadoop is a software platform, written in Java, that makes the development and processing of mass data easy. Hadoop is scalable, economical, efficient and reliable, and it can be deployed on a big cluster composed of hundreds of low-cost machines. The main purpose of the analysis is the extraction of a food menu as per user requirements. The system finds out the user's fields of interest by receiving, organizing and collating the user's web-browsing information, or by mining data from history, such as browser temporary files, personal favourites and many more. Choosing food from online stores is quite confusing for most customers. Customers are interested in buying food that has been widely acclaimed and is available. On the other side, the owners of restaurants are also interested in knowing where their foods stand in the competition. Both these issues are tackled by click-stream analysis. Analysis of click-streams shows how a website is navigated and used by its visitors. The click-stream data of online food stores of Delhi contains information which is useful for understanding the effectiveness of marketing and merchandising efforts, such as how customers find the store, what food they see, what food they purchase, and finally what their feedback is. In our project, we have tried to help customers find popular, largely sold food items from the restaurants of Delhi much faster than with the normally available methods. For this purpose we intend to create a platform that maintains user profiles as well as food product advertisements. We intend to use Hadoop for click-stream analysis; based on the user profile and the click-stream analysis, our website will display only those advertisements which help the customer in arriving at decisions. In other words, the contents of the web page displayed to a customer will be determined on the basis of the user profile. With the advancement of technology, a large number of people are buying and selling food online. There are commonly used techniques for online marketing, such as banner cards and email campaigns. Being one of the biggest cosmopolitan cities in India, Delhi has restaurants whose websites are very busy sites.


So effective marketing depends largely on the success of online advertising and on how quickly responses are given. Analysing the effectiveness of a website is a matter of concern for corporates that rely on web marketing. Web purchasing of food involves attracting and retaining customers. Traditional database technology is indeed useful in managing online stores. However, it has serious limitations. The data generated by mouse clicks and the corresponding logs are too large to be analysed by traditional technology. New technology such as big data is being explored to find solutions to the above problems. In our project, we have decided to use the open-source technology Hadoop. Today the term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of non-traditional, less structured data: blogs, social media, email, sensors and photographs that can be mined for useful information. A decrease in the cost of storage and an increase in computing power have made it possible to collect large data. As a result, more and more companies are now compelled to include non-traditional yet potentially valuable data with their traditional enterprise data and to use it for their business intelligence analysis. To derive real business value from big data, we need the right tools to capture and organize a wide variety of data types from different sources, and to be able to easily analyse it within the context of all enterprise data. As the world turns towards the Internet for every day-to-day activity, the need for viewing and selecting food of one's choice is of prime importance to the restaurants. The same goes for the customers who buy food online: a list of irrelevant advertisements frustrates the user, and slow or delayed responses to queries prove to be the main reason for the failure of most sites. But in our project we have tried to make it smooth for users to select products by filtering the available products based on each customer's interests. The e-commerce field is emerging rapidly. Advertisers need a way to promote their products in the market, and that way is provided by personalized websites like this one. The reports provided by our website make it easier for advertisers to know the status of their products and hence to take the necessary measures to recover from any losses. To summarize, our project is a demonstration of using a new technology like big data as an adapter between the advertiser and the customer, saving money and time.


ACKNOWLEDGEMENT

A major project is a golden opportunity for learning and self-development. We consider ourselves very lucky and honoured to have had so many wonderful people lead us through the completion of this project. First and foremost we would like to thank Dr. Ishan Ranjan, HOD CSE, who gave us the opportunity to undertake this project. Our grateful thanks go to Asst. Prof. Ms. Supriya Khaitan for her guidance in the project work "Advertising promotions for Restaurants of Delhi using Big Data Analytics"; in spite of being extraordinarily busy with academics, she took time out to hear us, guide us and keep us on the correct path. We do not know where we would have been without her help. The CSE department monitored our progress and arranged all facilities to make life easier; we choose this moment to acknowledge their contribution gratefully. We are also grateful to the Mexican students for their kind help. We really appreciate all the restaurants from Chittagong and Delhi that frequently provided us their official data.

Name and signature of Students: Md. Masum Hossain (120102019), Lakshmi (120102015), Anugya Saraswat (130102801), Pranjal Sinha (110101177)

Table of Contents

Appendix 2                                                                   Page no.
Chapter-1: Project Introduction
1.1: Motivation …………………………………………………………….. 10
1.2: Overview ………………………………………………………………. 10
1.3: Expected outcome ……………………………………………………… 11
1.4: Gantt Chart ……………………………………………………………. 12
1.5: Possible risks ………………………………………………………….. 13
Chapter-2: Methodology
2.1: System view ………………………………………………………….. 17
2.2: System components & Functionalities ………………………………... 18
2.3: Data & relational views ………………………………………………. 19
Chapter-3: Design Criteria
3.1: System Design ………………………………………………………… 21
3.2: Design Diagrams ………………………………………………………. 22
3.3: Existing System ………………………………………………………... 23
3.4: Application areas ………………………………………………………. 25
3.5: Advantages of proposed system ……………………………………….. 25
3.6: System analysis ………………………………………………………… 26
Chapter-4: Development & Implementation
4.1: Developmental feasibility ……………………………………………… 30
4.2: Implementation Specifications ………………………………………… 31


Chapter-5: Results & Testing
5.1: Result …………………………………………………………………. 36
5.1.1: Success cases ……………………………………………………….. 39
5.1.2: Failure cases ………………………………………………………... 41
5.2: Testing ……………………………………………………………….. 41
5.2.1: Test results of various stages ……………………………………….. 42
Chapter-6: Conclusion & Future Improvements
6.1: Performance Estimation ……………………………………………… 44
6.3: Limitations …………………………………………………………… 46
6.4: Scope of Improvement ………………………………………………... 46
6.5: Conclusion …………………………………………………………….. 49

References


List of Tables

Page no.

Table 2.1: Features of HDFS…………………………………………………. 28 Table 2.2: Showing Hadoop response time…………………………………… 45 Table 2.3: Showing Data modeling comparison in RDBMS and Hadoop……. 46

List of Figures
2.1: Showing the work progress over time on the project ……………………. 12
2.2: Showing the work progress over time ……………………………………. 13
2.3: Showing big data insights ……………………………………………… 21
2.4: Architecture of the online food promotion system ………………………….. 22
2.5: Working of a general restaurant ………………………………………... 23
2.6: How HDFS stores incoming files across the cluster ………………… 27
2.7: How the MapReduce software framework works …………………………. 29
2.8: Outlook of the website where customers will be looking for food …….. 36
2.9: Section in the website where customers can see other customers' feedback 37
2.10: Showing the section in the website for customer feedback ……………………... 37
2.11: Database for menu and other details entry ……………………………... 38
2.12: Adding an item to the database ………………………………………… 38
2.13: Customer feedback ……………………………………………………… 39
2.14: Cluster summary ………………………………………………………... 39
2.15: Hadoop TaskTracker ………………………………………………….... 40
2.16: Failure in connecting to localhost:50070 …………………………………. 41
2.17: Showing jdk1.8.0_77 installed in the system ………………………...… 42
2.18: Hadoop version ………………………………………………………... 42
2.19: Successful startup of MapReduce ……………………………………… 43
2.20: Showing the starting of the database file system in the system …………………... 43


Appendix 2

CHAPTER-1 Introduction

1.1 Motivation
In restaurants all around the country, it seems to have become acceptable for customers to send back a dish because they don't like it. Not because it was cold, or too salty, or because of an untimely delay in the delivery, but purely because they happened to make the wrong call. We have some sympathy. It happens to us a lot of times: we order some food, then later change our mind; we want to see foods of similar kinds together in one place but are too tired to search each restaurant. We're all familiar with that pang of envy when our dining companion's choice looks better than our own. And, with the not inconsiderable costs of eating out in some places, it is only human to begrudge paying for something we've decided we are not going to enjoy. However, what people don't tend to think about is that the moment we complain to a server about a dish, we are certainly wasting our valuable time and subtly altering the very experience we are paying for. Any restaurant will tell us this. Everything changes; we tried to see how to reduce the hassle for customers by not making them wait long to order or receive food. So we are trying to introduce restaurants to the new technology of Hadoop big data, the use of which will make the whole process of ordering food much easier for customers, so that it won't be necessary for them to wait and search across different restaurants.

1.2 Overview of Project
The main target of our project is to demonstrate the benefit of using Hadoop big data over the normal databases in the market. In doing this we want to help people choose the right food for themselves. For this, we are going to create a database using big data analytics, where a user creates a profile and marks his/her interests, according to which the foods available from nearby restaurants are shown to him/her. Users can then find out the price of a particular food item and, at the same time, get to know about the offers related to it, so it is beneficial for the customer as well as the restaurants. Ultimately, our project is to help the people of a particular area choose the right food for their health and taste. Every day the number of people using the Internet to order products is increasing, and the same is happening with food items.


People are looking for food from different restaurants and trying to order the right food of their choice. The general database systems available in the market have a slow response time, because queries are made to each database in turn; as the pressure builds up on the databases, responses to the queries get slower.[3] But with the recently evolved big data technology we can make sure that the response is much faster, as the database works in parallel. So our aim is to build a database with a faster response rate, to reduce the congestion of customers ordering food online. We assume that faster responses and reduced congestion will make online food purchasing much easier and more convenient. We do have a few limitations: a big data database requires a high-capacity system (for example, 8-10 GB of RAM), and of course real-time big data analytics is not only positive, as it also presents some challenges. It requires special computer programming: the standard version of Hadoop is, at the moment, not yet suitable for real-time analysis, and new tools need to be bought and used. There are, however, quite a few tools available to do the job, and Hadoop will be able to process data in real time in the future. Using real-time insights requires a different way of working within any workflow: for example, if an organization normally receives insights only once a week, which is very common in a lot of organizations, receiving these insights every second will require a different approach and way of working. Insights require action, and instead of acting on a weekly basis, that action is now required in real time. This will have an effect on the culture. The objective should be to make the place, be it an organization, restaurant or office, an information-centric place.[7,8]

1.3 Expected outcome
As our project is mainly based on Hadoop big data, its main purpose is to introduce this new technology, which is evolving every day, and to help the very crowded restaurant sites become hassle-free and work without pressure. To that end, we expect the restaurants to understand the working of big data: normal databases work sequentially in the retrieval process, whereas big data works in parallel, so theoretically the retrieval process should be faster. Big data has become a buzz phrase for the data production, accumulation and analytic activities that have emerged as a result of accelerated data production, triggered by declining computing and storage costs. Big data is perceived as a major key to addressing food security challenges and productivity improvements in a future era of significant resource constraints and climate change. Indeed, some agribusiness firms are positioning themselves to exploit data by developing systems to collect, store and analyze it. Therefore, while data and analytics are going to continue to be important resources for the food agribusiness industry, the real source of sustained superior performance will be stakeholders' and their employees' sharper insights.


So, we expect our system to show restaurants a path that puts them one step ahead, furthering the overall betterment, development and progress of the business while reducing the hassle for customers.

1.4 Gantt Chart

Figure 2.1: Showing the work progress over time on the project
This Gantt chart shows the time taken by each task in this project, such as creating the website, collecting website details, adding different closures to the website, user details, database, price details, offers and user tactics, cluster 1 size, cluster 2 size, success rate, response rate among clusters, and bug rate.


Figure 2.2: Showing the work progress over time

1.5: Possible Risks
As with any business initiative, a big data project involves an element of risk. Any project can fail for any number of reasons: bad management, under-budgeting, or a lack of relevant skills. However, big data projects bring their own specific risks, due to the advanced technology often needed and the relative newness of the skillsets required to truly "think big" (or, as I prefer to say, "think smart") with data. Businesspeople are used to taking risks; assessing those risks and safeguarding against them comes naturally to them, or they don't stay in business for very long. So there's no need to be scared of big data. But of course we always need to be aware of the dangers that could potentially arise if we fail to cover all of them. The phenomenon of "big data" exacerbates the tension between potential benefits and privacy risks by upping the ante on both sides of the equation. On the one hand, big data unleashes tremendous benefits, not only to individuals but also to communities and society at large, including breakthroughs in health research, sustainable development, energy conservation and personalized marketing. On the other hand, big data introduces new privacy and civil liberties concerns, including high-tech profiling, automated decision-making, discrimination, and algorithmic inaccuracies or opacities that strain traditional legal protections.


The key problems have been:[4,5]
I. Cost: Data collection, aggregation, storage, analysis, and reporting all cost money. On top of this, there will be compliance costs, to avoid falling foul of the issues raised in the previous point. These costs can be mitigated by careful budgeting during the planning stages, but getting it wrong at that point can lead to spiraling costs, potentially negating any value added to our bottom line by the data-driven initiative. This is why "starting with strategy" is so vital. A well-developed strategy will clearly set out what we intend to achieve and the benefits that can be gained, so that they can be balanced against the resources allocated to the project. One of the restaurants coordinating with us was worried about the costs of storing and maintaining all the data it was collecting, to the point that it was considering pulling the plug on one particular analytics project, as the costs looked likely to exceed any potential savings. By identifying and eliminating irrelevant data from the project, the restaurant was able to bring costs back under control and achieve its objectives.
II. Bad Data: We have come across many big data projects that start off on the wrong foot by collecting irrelevant, out-of-date, or erroneous data. This usually comes down to insufficient time being spent on designing the project strategy. The big data gold rush has led to a "collect everything and think about analyzing it later" approach at many organizations. This not only adds to the growing cost of storing the data and ensuring compliance; it leads to large amounts of data that can become outdated very quickly. The real danger here is falling behind the competition. If some restaurants are not analyzing the right data, they won't be drawing the right insights that provide value. Meanwhile, competitors most likely will be running their own data projects, and if they are getting it right, they'll take the lead. Working with these restaurants, we were able to show them how to cut the data down, mostly into infographics, which clearly showed the relevant data while omitting a lot of the noise. That's just a simple checklist of the risks that every big data project needs to account for before one cent is spent on infrastructure or data collection. Businesses of all sizes should engage wholeheartedly with big data projects. If they don't, they run the serious risk of being left behind. But they also should be aware of the risks and enter into big data projects with their eyes wide open. In one example, NBC used test audiences but paid for it when many in the audience ranked successful shows such as Seinfeld poorly, and cheap copycats better. Eventually, the marketers discovered people were only responding to familiarity, not quality.[22]


III. Data Security: This risk is obvious and often uppermost in our minds when we are considering the logistics of data collection and analysis. Data theft is a rampant and growing area of crime, and attacks are getting bigger and more damaging. In fact, five of the six most damaging data thefts of all time (eBay, JP Morgan Chase, Adobe, Target, and Evernote) were carried out within the last two years. The bigger the data, the bigger the target it presents to criminals with the tools to steal and sell it. In the case of Target, hackers stole credit and debit card information of 40 million customers, as well as personal identifying information such as email and geographical addresses of up to 110 million people. In March, a federal judge approved a settlement in which Target would pay $10 million into a settlement fund, from which payments of up to $10,000 would be made to everyone affected by the breach. So while developing anything in such a big area, we have to keep this in mind so that restaurants do not face such problems.[15,25]

1.6 Software Requirement Specification:
Functional Requirements:
• Database (big data)
• A compatible version of Hadoop installed
Non-Functional Requirements:
• Showing food menus on the website
• Collecting data from the database
• Establishment of cluster connections
• Users finding the right food easily


b) External interfaces:
• When people visit the website, they will make an ID so that the individual service can provide them the right food from all the possible restaurants of Delhi.
• The system will generate an inquiry via HiveQL and fetch queries from the big data database.
c) Performance:
• The main reason to choose big data using Hadoop is its speed: when the same query was run on both an RDBMS and HDFS, the results showed the difference.
d) Attributes:
• Our software is easily portable once the systems are enabled with, or have installed, HDFS (it includes the HiveQL method).
• To maintain the software, owners or supervisors have to update the database from time to time; users must also put in their feedback regarding which foods they prefer and appreciate.
• To ensure security for users, separate user IDs will be made, so users can see only the food they are looking for.
• We value customers' personal security: no bank details or any other such details will be taken at the time of first registering.
e) Design constraints:
• The most difficult part of the project is maintaining a big database, which includes 1,000 GB of data from different restaurants, comprising customers' needs.
• HiveQL is used for fetching data; HDFS is the base for making the big data store. To make the website, HTML5, JavaScript and CSS are used.
• Resources are limited, since not all restaurants agree to share their information.
• The operating system should be Windows 2000/ME onwards.
• SAS/ACCESS engine: we were able to run our test queries through the SAS interface, which executes in the Hive environment within our Hadoop cluster.[19,21]


Chapter-2 Methodology

2.1 System View
The online product promotion system is executed on the Ubuntu platform. The database is created for actors to interact with the system. The website provides a personalized view to every user who logs in: the customer is shown products of his interest (i.e., products that match his profile), and the advertiser is provided with reports on his products. The background processing to achieve these targets is as follows (a minimal sketch of the reporting step appears after this list):[1,7]
o The database will be connected to the website, which works as the interface between the user and the database for real-time updating.
o The data stored in the click-dump will be transferred to Hadoop's file system (HDFS) using Sqoop. (Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs which can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive. Exports can be used to put data from Hadoop into a relational database.)
o The data from each restaurant will be stored on HDFS in date-wise folders. The tables stored in HDFS will help to calculate the targets.
o This stored click-dump will be processed by a Pig script (written in Pig Latin). The Pig script will return the number of clicks on each product and send this information to Hive by loading it into it.
o Hive will provide SQL-like commands to execute queries. These queries will produce the target results.
o This process will be scheduled for the whole day; a cron job will execute the whole process once a day.
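To make the reporting step concrete, the following is a minimal sketch of how the daily Hive query could be run from Java over Hive's JDBC interface. The HiveServer2 address, the "clicks" table and its column names are assumptions for illustration only, not the project's actual schema; a cron entry would simply run a job like this once a day.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: run the daily click-count query against Hive via JDBC.
// Assumes a HiveServer2 instance on localhost:10000 and a hypothetical
// "clicks" table already loaded by the Pig step described above.
public class DailyClickReport {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // Count clicks per product for one date-wise folder/partition.
            ResultSet rs = stmt.executeQuery(
                "SELECT product_id, COUNT(*) AS clicks "
                + "FROM clicks WHERE click_date = '2016-04-01' "
                + "GROUP BY product_id ORDER BY clicks DESC");
            while (rs.next()) {
                System.out.println(rs.getString("product_id")
                        + "\t" + rs.getLong("clicks"));
            }
        }
    }
}
```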


A second configuration involves building a big data system separately from, and in parallel with (rather than integrated into), the restaurant's existing production and enterprise systems. In this model, most companies still take advantage of the cloud for data storage but develop and experiment with enterprise-held big data applications themselves. This two-state approach allows the restaurants to construct the big data framework of the future, while building valuable resources and proprietary knowledge within the company. That provides complete internal control in exchange for duplicating much of the functionality of the current system, and it allows for a future migration to a full-fledged big data platform on which both systems (conventional and big data) will eventually merge.[23]

2.2 System Components & Functionalities:
As big data requires a huge amount of data to be arranged and retrieved in a short amount of time, the system should have a minimum of the following components:
• RAM 4 GB and above
• Virtualization activated, or else a virtual machine or virtual workstation installed on the system
• Ubuntu 14 or a later version installed
• JDK 7 or above installed on the system
• A compatible version of Hadoop installed

Functionalities: Adobe, the US computer software company, is the latest to develop its products to cope with the increasingly complex world of big data. The latest version of its Digital Marketing Suite is designed to allow those within the marketing functions of a business to perform analysis on masses of historical data to predict future trends. The company states that the new developments will allow digital marketers to 'improve a variety of digital marketing strategies, including personalized engagement, multi-channel campaign execution, and media monetization'. This will be achieved through the ability to forecast campaign results and to perform risk analysis, all through a predictive marketing dashboard. Brad Rencher, senior vice president and general manager of Digital Marketing Business at Adobe, said:[5,9]
"In the early days of digital marketing, analytics emerged to tell us what happened and, as analytics got better, why it happened. Then solutions emerged to make it easier to act on data and optimize results."


"But the sheer amount of available data presents a challenge to quickly extract insights and act while those insights are still valuable. The new predictive capabilities within the Digital Marketing Suite address these challenges and help marketers turn big data into a big opportunity." The announcement is indicative of something we hear a lot at the Big Data Insight Group: that big data analytics is going to become an invaluable tool for different teams throughout all departments of a restaurant, rather than being controlled by the IT department alone. Finance, business, marketing and product development will all be using masses of data to gain insights into various aspects of their organization and to improve their planning and performance accordingly. To learn more about big data and the opportunities it could present to an organization, regardless of size or sector, one may wish to attend the 1st Big Data Insight Group Forum. So, more than anything, our project will focus on navigating a path to insight and business value using technology's hottest new trend.

2.3: Data & Relational Views:
The high costs of rent, licensing and personnel are daunting. But restaurants face another obstacle: critics, not only professional journalists but also amateurs who offer their opinions on social media. A bad review on Google or Yelp can undercut even the best-planned ventures. The importance of quality is why a number of restaurants have started using big data to develop a better understanding of consumer preferences and to improve their food and service. In some cases, these businesses have already achieved revenue gains as a result of their efforts. Using external data to improve a menu: some restaurants are using outside software providers to gauge which dishes are likely to succeed and to reduce the uncertainty of making menu changes. Food Genius aggregates data from restaurant menus around the country to better understand pricing, food and marketing trends. Indeed, that is the option restaurants have in hand to stay competitive; but in the end, being the best in the market, and holding that position while other companies or restaurants are still on general data, means choosing something that makes a difference. For example, a restaurant customer can see what types of food-related keywords and phrases are trending online, the average price of a certain dish, and which menu items are growing or shrinking in popularity.[15,18]


"The food industry for far too long has made decisions based on its gut," says Justin Massa, CEO and founder of Food Genius. The Chicago-based company tracks menu items at more than 350,000 locations and has partnerships with the food delivery services Seamless and GrubHub. Massa said that the data can help restaurants seize opportunities in their niches. "The data is going to tell you something and give you important context, but the thing it comes down to is the identity of the brand. That's going to tell us how we're going to explore that data." Increasing customer satisfaction with internal data: some technology companies are helping restaurants improve operational efficiency. Avero, a restaurant software company, tracks purchases and voided items at the point of sale. Restaurants use the data to improve server performance, develop tactics to increase sales and even identify thieving employees, says Sandhya Rao, vice president of marketing and products. Rao says that restaurants may target promotions to certain days or times of the month. According to a company case study, among the 30-plus upscale casual restaurants that Avero works with, the average sales increase was five percent, or $250,000 each, over the course of a year. In the future, mobile apps will allow customers to leave reviews, sign up for loyalty programs, take surveys, and order food through their devices. "Operators can know customers better and customers can enjoy better experiences, which is an encouraging environment to keep them coming back and bringing their friends," said Jitendra Gupta, CEO. Another company, TapSavvy, is also using customer insights to assist restaurants. After customers eat at one of the restaurants that TapSavvy serves, they receive a tablet to fill out a survey and express criticisms or compliments. By letting customers give feedback while they're still in the restaurant, they're less likely to take out their aggression online, says TapSavvy co-founder Yaniv Tal. "If a customer leaves unhappy, word spreads very quickly," Tal says. To be sure, many restaurants are still not using big data. Yet Massa says that they are missing a potential opportunity to improve performance. He suggests that these businesses might begin by collecting information themselves. For our tests, we simulated a typical data warehouse-type workload where data is loaded in batch, and then queries are executed to answer strategic (not operational) business questions.


Chapter-3 Design Criteria

3.1: System Design:
With the advancement of technology, a large number of people are buying and selling food online. There are commonly used techniques for online marketing, such as banner cards and email campaigns. Being one of the biggest cosmopolitan cities in India, Delhi has restaurants whose websites are very busy sites, so effective marketing depends largely on the success of online advertising. Analyzing the effectiveness of a website is a matter of concern for corporates that rely on web marketing. Web purchasing of food involves attracting and retaining customers. Traditional database technology is indeed useful in managing online stores. However, it has serious limitations when it comes to analyzing the effectiveness of online ads.[9,10]

Figure 2.3: Showing big data insights
From this point of view, the study of online food product promotion for Delhi restaurants becomes an important aspect of web marketing. The data generated by mouse clicks and the corresponding logs are too large to be analyzed by traditional technology. New technology such as big data is being explored to find solutions to the above problems.


In this paper, we have decided to use the open-source technology Hadoop. Today the term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of non-traditional, less structured data: blogs, social media, email, sensors, and photographs that can be mined for useful information. A decrease in the cost of storage and an increase in computing power have made it possible to collect large data. As a result, more and more companies are now compelled to include non-traditional yet potentially valuable data with their traditional enterprise data and to use it for their business intelligence analysis. To derive real business value from big data, we need the right tools to capture and organize a wide variety of data types from different sources, and to be able to easily let the customer find the right food in the shortest possible time.

3.2: Design Diagrams

Figure 2.4: Architecture of the online food promotion system
Food marketing and advertising use banners and/or referral sites to attract customers from other sites to an online store. Online food websites use hyperlinks and image links within the store site to lead customers to the relevant pages. Analysing these links involves:


• Classifying hyperlinks by their purpose
• Tracking and measuring traffic on hyperlinks
• Analyzing effectiveness (revenue generated, profit, etc.)
A minimal sketch of how such link clicks could be recorded is shown below.
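The click-dump described in Chapter 2 has to be fed by some logging step on the website. The following is a minimal, hypothetical sketch of such a logger in Java; the record layout (user ID, link ID, link type, timestamp) and the file name are assumptions for illustration, not the project's actual format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

// Minimal sketch: append one click record per hyperlink visit to a local
// click-dump file, which Sqoop/Pig would later move into HDFS.
public class ClickLogger {
    private final Path dump;

    public ClickLogger(String dumpFile) {
        this.dump = Paths.get(dumpFile);
    }

    // linkType distinguishes banner, referral and in-store links,
    // supporting the "classify hyperlinks by purpose" step above.
    public void log(String userId, String linkId, String linkType)
            throws IOException {
        String record = String.join(",",
                userId, linkId, linkType, Instant.now().toString()) + "\n";
        Files.write(dump, record.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        ClickLogger logger = new ClickLogger("click-dump.csv");
        logger.log("user42", "menu-paneer-tikka", "in-store");
    }
}
```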

Figure 2.5: Working of a general restaurant

3.3 Existing System
Most companies are not completely satisfied with their current level of data capture and analysis. Most companies considering a move toward adopting big data technologies already have a well-staffed and relatively modern IT framework based on relational database (RDB) management systems and conventional data warehousing. Any company already managing a large amount of structured data with enterprise systems and data warehouses is therefore fairly well versed in the day-to-day issues of large-scale data management. It would seem natural for those companies to assume that, as big data is the next big thing in the evolution of information technology, it would make sense for them to simply build a NoSQL-type/Hadoop-type infrastructure themselves, incorporated directly into their current conventional framework. In fact, ESG, the advisory and IT market research firm, estimated that, at the beginning of 2014, more than half of large organizations would have begun this type of do-it-yourself approach.


As we've seen, as open source software, the price of a Hadoop-type framework (free) is attractive, and it is relatively easy, provided the company has employees with the requisite skills, to begin working up Hadoop applications using in-house data or data stored in the cloud. Currently many organizations are trying to implement big data technology in their database systems, which is a big step towards fast data transfer. Existing systems in the market have shifted their database systems onto Hadoop-based big data analytics once their owners got to know its working methodology. Today, big companies of the world such as Twitter and Facebook use big data to handle and manage their databases. But experimenting with some Hadoop/NoSQL applications for the marketing department is a far cry from developing a fully integrated big data system capable of capturing, storing and analyzing large, multi-structured data sets. In fact, successful implementation of enterprise-wide Hadoop frameworks is still relatively uncommon, and mostly the domain of very large and experienced data-intensive companies in the financial services or pharmaceutical industries. As we have seen, many of those big data projects still primarily involve structured data and depend on SQL and relational data models. Large-scale analysis of totally unstructured data, for the most part, still remains in the rarified realm of powerful Internet tech companies like Google, Yahoo, Facebook and Amazon, or massive retailers like Wal-Mart. Although cloud-based tools have obvious advantages, every company has different data and different analytical requirements. Because so many big data projects are still largely based on structured or semi-structured data and relational data models that complement current data management operations, many companies turn to their primary support vendors -- like Oracle or SAP -- to help them create a bridge between old and new and to incorporate Hadoop-like technologies directly into their existing data management approach. Oracle's Big Data Appliance, for example, asserts that its preconfigured offering -- once various costs are taken into account -- is nearly 40% less expensive than an equivalent do-it-yourself system and can be up and running in a third less time. And, of course, the more fully big data technologies are incorporated directly into a company's IT framework, the more complexity and the potential for data sprawl grow. Depending on configurations, full integration into a single, massive data pool (as advocated by big data purists) means pulling unstructured, unclean data into a company's central data reservoir (even if that data is distributed) and potentially sharing it out to be analyzed, copied and possibly altered by various users throughout the enterprise, often using different configurations of Hadoop or NoSQL written by different programmers for different reasons. Add to that the need to hire expensive Hadoop programmers and data scientists. For traditional RDB managers, that type of approach raises the specter of untold additional data disasters, costs and rescue-work requests to already overwhelmed IT staff.


3.4 Application Areas
Our project is largely about how data can be extracted in a faster and more efficient way, so that the customers of various restaurants can have their queries about food answered at a much faster rate. In a way, we are trying to open a path for restaurants that want to attract an online audience, letting customers get their desired food from one place rather than visiting different websites; the best part is the response time. This is the point where it all makes the difference: the customers do not have to wait for a long time because of data congestion. Some key application areas of our project are:
• It will let the customer get the right food in the shortest possible time.
• Knowing about foods of one's choice becomes far easier than ever expected.
• Creating a database in real time, with the help of big data analytics, will help users get real-time results.
• Quick responses will also retain the customers, from the restaurant's perspective.

3.5 Advantages of Proposed System

The advantages of processing data with big data in real time are many:
• Errors within the menu are known instantly. Real-time insight into errors helps restaurants react quickly to mitigate the effects of an operational problem. This can save the operation from falling behind or failing completely, or it can save customers from having to stop ordering or eating a particular food item or even using the service.
• New strategies of any restaurant are noticed immediately. With real-time big data analytics we can stay one step ahead of the competition, or get notified the moment a restaurant's direct competitor changes strategy or reduces its prices, for example.
• Service improves dramatically, which could lead to a higher conversion rate and extra revenue. When restaurants monitor the products used by their customers, they can proactively respond to upcoming failures. For example, cars with real-time sensors can give notice before something goes wrong and let the driver know that the car needs maintenance.


• Fraud can be detected the moment it happens, and proper measures can be taken to limit the damage. The financial world is very attractive to criminals. With a real-time safeguard system, attempts to hack into any restaurant's website are notified instantly, so the IT security department of the restaurant can take immediate, appropriate action.
• Cost savings: the implementation of real-time big data analytics tools may be expensive, but it will eventually save a lot of money. There is no waiting time for business leaders, and in-memory databases (useful for real-time analytics) also reduce the burden on a restaurant's overall IT landscape, freeing up resources previously devoted to responding to requests for reports.
• Better sales insights, which could lead to additional revenue. Real-time analytics tell exactly how well sales are doing, and if an internet retailer sees that a product is doing extremely well, it can take action to avoid missing out or losing revenue.
• Keeping up with customer trends: insight into competitive offerings, promotions or customer movements provides valuable information regarding coming and going customer trends. Faster decisions that better suit the (current) customer can be made with real-time analytics.

3.6 System Analysis
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs. Big data means a shift to scalable, elastic computing infrastructure; an explosion in the complexity and variety of available data; and the power and value that come from combining disparate data for comprehensive analysis. Together, these make Hadoop a critical new platform for data-driven enterprises like restaurants. Our database consists of two main components:
1. HDFS (Hadoop Distributed File System)
2. MapReduce


1. HDFS
The file store is called the Hadoop Distributed File System, or HDFS. HDFS provides scalable, fault-tolerant storage at low cost. The HDFS software detects and compensates for hardware issues, including disk problems and server failure. HDFS stores files across a collection of servers in a cluster. Files are decomposed into blocks, and each block is written to more than one of the servers (the number is configurable, but three is common). This replication provides both fault tolerance (loss of a single disk or server does not destroy a file) and performance (any given block can be read from one of several servers, improving system throughput). HDFS ensures data availability by continually monitoring the servers in a cluster and the blocks that they manage. Individual blocks include checksums. When a block is read, the checksum is verified, and if the block has been damaged it is restored from one of its replicas. If a server or disk fails, all of the data it stored is replicated to some other node or nodes in the cluster from the collection of replicas. As a result, HDFS runs very well on commodity hardware. It tolerates, and compensates for, failures in the cluster. As clusters get large, even very expensive fault-tolerant servers are likely to fail. Because HDFS expects failure, organizations can spend less on servers and let software compensate for hardware issues.

Figure 2.6: How HDFS stores incoming files across the cluster
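As a concrete illustration of the storage behaviour just described, the sketch below uses Hadoop's Java FileSystem API to write a file into HDFS with a replication factor of three and read it back. The NameNode address (localhost:9000) and the file path are assumptions for illustration, not the project's actual layout.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: store a menu file in HDFS with 3-way block replication,
// then read it back; HDFS handles block placement and checksums itself.
public class HdfsMenuStore {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode
        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/restaurants/2016-04-01/menu.csv");
            try (FSDataOutputStream out = fs.create(path, (short) 3)) {
                out.writeBytes("paneer-tikka,250\n");
            }
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(path)))) {
                System.out.println(in.readLine());
            }
        }
    }
}
```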


Rack awareness: Considers a node's physical location when allocating storage and scheduling tasks.
Minimal data motion: Hadoop moves compute processes to the data on HDFS, not the other way around. Processing tasks can occur on the physical node where the data resides, which significantly reduces network I/O and provides very high aggregate bandwidth.
Utilities: Dynamically diagnose the health of the file system and rebalance the data across nodes.
Rollback: Allows operators to bring back the previous version of HDFS after an upgrade, in case of human or systemic errors.
Standby NameNode: Provides redundancy and supports high availability (HA).
Operability: HDFS requires minimal operator intervention, allowing a single operator to maintain a cluster of thousands of nodes.
Table 2.1: Features of HDFS

2. MapReduce
HDFS delivers inexpensive, reliable, and available file storage. That service alone, though, would not be enough to create the level of interest, or to drive the rate of adoption, that has characterized Hadoop over the past several years. The second major component of Hadoop is the parallel data processing system called MapReduce. Conceptually, MapReduce is simple. MapReduce includes a software component called the job scheduler. The job scheduler is responsible for choosing the servers that will run each user job, and for scheduling the execution of multiple user jobs on a shared cluster. The job scheduler consults the NameNode for the location of all of the blocks that make up the file or files required by a job. Each of those servers is instructed to run the user's analysis code against its local block or blocks.


The MapReduce processing infrastructure includes an abstraction called an input split that permits each block to be broken into individual records, with special processing built in to reassemble records broken by block boundaries. The user code that implements a map job can be virtually anything. MapReduce allows developers to write and deploy code that runs directly on each DataNode server in the cluster. That code understands the format of the data stored in each block in the file, and can implement simple algorithms (count the number of occurrences of a single word, for example) or much more complex ones (e.g. natural language processing, pattern detection, machine learning, feature extraction, or face recognition). At the end of the map phase of a job, results are collected and filtered by a reducer. MapReduce guarantees that data will be delivered to the reducer in sorted order, so output from all mappers is collected and passed through a shuffle and sort process. The sorted output is then passed to the reducer for processing. Results are typically written back to HDFS. Because of the replication built into HDFS, MapReduce is able to provide some other useful features. For example, if one of the servers involved in a MapReduce job is running slowly while most of its peers have finished, the job scheduler can launch another instance of that particular task on one of the other servers in the cluster that stores the file block in question. This means that overloaded or failing nodes in a cluster need not stop, or even dramatically slow down, a MapReduce job.

Figure 2.7: How the MapReduce software framework works
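The canonical example mentioned above, counting occurrences of words, looks roughly as follows in Hadoop's Java MapReduce API; a click-per-product count over the click-dump would have the same shape, with the product ID as the key. This is a minimal sketch along the lines of the standard Hadoop word-count example, not the project's actual job.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal sketch of a MapReduce job: mappers emit (word, 1) for every
// word in their local blocks; reducers sum the counts after shuffle/sort.
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```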


Chapter-4: Development & Implementation

4.1: Developmental Feasibility
A computer-controlled vending machine selling snack foods on credit at the Stanford Artificial Intelligence Laboratory became one of the first Internet-connected appliances. There began the saga of pervasive connectivity, where every device is plugged into everything else, creating the defining trend of 2010 to 2020. In fact, the Internet of Things is anticipated to burgeon to about 26 billion units (excluding PCs, smartphones and tablets) by 2020, and perhaps several categories of the items that will be connected in 2020 don't even exist at present. The Internet of Things will explode connectivity, and it will also create value: as much as US$6.2 trillion in annual revenue by 2025, says a global consulting company. But it will also create massive, massive amounts of data: 40 zettabytes by 2020, according to one estimate. And as we all know, the bulk (over 80%) of big data is unstructured and in motion, existing in a variety of forms and formats both inside and outside company walls. Gathering this data is a huge challenge, but one that today's technology is capable of. It is what comes next, extracting accurate insights in real time and creating foresight from them, that enterprises are yet to nail. The difficult part of this project was collecting a huge amount of data from each restaurant, which in terms of time and size is rather bulky; this places much of the burden on the system handling the work, which has to be capable of dealing with such an amount of data. The minimum requirements for a system would be the following:
• Dual quad-core CPUs or greater with Hyper-Threading enabled. We had to estimate our computing workload and consider whether a more powerful CPU was needed.
• High Availability (HA) and dual power supplies for the master node's host machine.
• 4-8 GB of memory per processor core, with 6% overhead for virtualization.
• A 1 Gigabit Ethernet interface or greater, to provide adequate network bandwidth.
Though big data takes powerful systems to process, it is not impossible to assemble a handful of such systems rather than using many systems with limited or low power.


4.2: Implementation Specifications
When beginning the Big Data Extensions deployment tasks, we made sure that our system met all of the prerequisites. Big Data Extensions requires VMware to be installed and configured, and the environment must meet minimum resource requirements. We also had to make sure that we had licenses for the VMware components of the deployment.
VMware Requirements
Before we could install Big Data Extensions, we had to set up the following VMware products.
■ Install VMware 10.0 (or later) Enterprise or Enterprise Plus. Note: the Big Data Extensions graphical user interface is only supported when using the VMware Web Client 10.0 and later. If Big Data Extensions is installed on vSphere 9.0, all administrative tasks must be performed using the command-line interface. So we had to install the latest version of VMware Workstation.
■ When installing Big Data Extensions, VMware® vCenter™ Single Sign-On must be used to provide user authentication. When logging in, authentication can be passed to the VMware Single Sign-On server, which can be configured with multiple identity sources such as Active Directory and OpenLDAP. (OpenLDAP is a free, open-source implementation of the Lightweight Directory Access Protocol (LDAP) developed by the OpenLDAP Project.) On successful authentication, the username and password are exchanged for a security token which is used to access VMware components such as Big Data Extensions.
■ Enable the vSphere Network Time Protocol on the ESXi hosts. The Network Time Protocol (NTP) daemon ensures that time-dependent processes occur in sync across hosts.
Cluster Settings
We had to configure our cluster with the following settings.


■ Enable Hyper-V, or enable virtualization from the BIOS setup (Windows 10).
■ Enable Host Monitoring.
■ Enable Admission Control and set the desired policy. The default policy is to tolerate one host failure.
■ Set the virtual machine restart priority to High.
■ Set virtual machine monitoring to Virtual Machine and Application Monitoring.
■ Set the monitoring sensitivity to High.
■ Enable vMotion and Fault Tolerance logging.
■ All hosts in the cluster must have hardware VT enabled in the BIOS.
■ The Management Network VMkernel port must have vMotion and Fault Tolerance logging enabled.
Network Settings
Big Data Extensions deploys clusters on a single network. Virtual machines are deployed with one NIC, which is attached to a specific port group. The environment determines how this port group is configured and which network backs the port group.


Either a vSwitch or a vSphere Distributed Switch can be used to provide the port group backing a Serengeti cluster. A vDS acts as a single virtual switch across all attached hosts, while a vSwitch is per-host and requires the port group to be configured manually. When configuring the network for use with Big Data Extensions, the following ports must be open as listening ports (a small connectivity check is sketched below).
■ Ports 8080 and 8443 are used by the Big Data Extensions plug-in user interface and the Serengeti Command-Line Interface Client.
■ Port 9000 is used by SSH clients.
■ To avoid having to open a network firewall port to access Hadoop services, log into the Hadoop client node; from that node the cluster can be accessed.
■ To connect to the Internet (for example, to create an internal Yum repository from which to install Hadoop distributions), a proxy may be used.
Direct Attached Storage
Direct attached storage should be attached and configured on the physical controller so that each disk is presented separately to the operating system. This configuration is commonly described as Just a Bunch of Disks (JBOD). We had to create VMFS datastores on direct attached storage using the following disk drive recommendations.
■ 6-8 disk drives per host. The more disk drives per host, the better the performance.
■ 1-1.5 disk drives per processor core.
■ 7,200 RPM Serial ATA disk drives.
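To verify that the listening ports above are actually reachable before deploying, a quick check along these lines can be used. It is a generic TCP probe, not part of Big Data Extensions itself, and the host name is an assumption for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal sketch: probe the ports Big Data Extensions expects to be open.
public class PortCheck {
    public static void main(String[] args) {
        String host = "bde-management-server"; // assumed host name
        for (int port : new int[] {8080, 8443, 9000}) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 2000);
                System.out.println("Port " + port + " is open");
            } catch (IOException e) {
                System.out.println("Port " + port + " is not reachable");
            }
        }
    }
}
```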


Resource Requirements for the vSphere Management Server and Templates
■ A resource pool with at least 3.5 GB RAM.
■ 40 GB or more (recommended) disk space for the management server and Hadoop template virtual disks.
Resource Requirements for the Hadoop Cluster
■ Datastore free space not less than the total size needed by the Hadoop cluster, plus swap disks for each Hadoop node equal to the memory size requested.
■ A network configured across all relevant hosts, with connectivity to the network in use by the management server.
■ HA enabled for the master node if HA protection is needed. We used shared storage in order to use HA or FT to protect the Hadoop master node.
Hardware Requirements
Host hardware is listed in the VMware Compatibility Guide. To run at optimal performance, the vSphere and Big Data Extensions environment should be installed on the following hardware.
■ Dual quad-core CPUs or greater with Hyper-Threading enabled. If the computing workload can be estimated, consider using a more powerful CPU.
■ High Availability (HA) and dual power supplies for the master node's host machine.
■ 4-8 GB of memory per processor core, with 6% overhead for virtualization.


■ A 1 Gigabit Ethernet interface or greater to provide adequate network bandwidth.

Tested Host and Virtual Machine Support
The following is the maximum host and virtual machine configuration that has been confirmed to run successfully with Big Data Extensions.
■ We used a 64-bit MySQL database.
■ Virtual hosts deployed on 4 physical hosts, each running 3 virtual machines.

Licensing
We had to use a vSphere Enterprise license or above in order to use VMware High Availability (HA) and VMware Distributed Resource Scheduler (DRS).

VMware's products predate the virtualization extensions to the x86 instruction set, and do not require virtualization-enabled processors. On newer processors, the hypervisor is designed to take advantage of the extensions; however, unlike many other hypervisors, VMware still supports older processors. In such cases, it uses the CPU to run code directly whenever possible (for example, when running user-mode and virtual 8086 mode code on x86). When direct execution cannot operate, such as with kernel-level and real-mode code, VMware products use binary translation (BT) to rewrite the code dynamically. The translated code is stored in spare memory, typically at the end of the address space, which segmentation mechanisms can protect and make invisible. For these reasons, VMware operates dramatically faster than emulators, running at more than 80% of the speed at which the guest operating system would run directly on the same hardware. In one study VMware claims a slowdown over native ranging from 0-6 percent for the VMware ESX Server.

VMware's approach avoids some of the difficulties of virtualization on x86-based platforms. Virtual machines may deal with offending instructions by replacing them, or by simply running kernel code in user mode. Replacing instructions runs the risk that the code may fail to find the expected content if it reads itself; one cannot protect code against reading while allowing normal execution, and replacing in place becomes complicated. Running the code unmodified in user mode will also fail, as most instructions which merely read the machine state do not cause an exception and will betray the real state of the program, and certain instructions silently change behavior in user mode.


Chapter-5: Results & Testing

5.1: Result
After collecting the menus of different restaurants around Delhi and Chittagong city, we finally gathered around 135 GB of data, which includes pictures, videos, menus and restaurant details. Our project has a vast area of exploration, as we began by creating a database that holds the menu, the delivery details of each food item, time, price and a picture of the food; we have also added e-mail as a feedback channel from the user.
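To move the collected menu data into the Hadoop cluster for analysis, a loader along the following lines can be used. This is a minimal sketch against the Hadoop 1.x FileSystem API; the namenode URI and the local and HDFS paths are illustrative assumptions, not our exact configuration.

    // MenuLoader.java -- sketch of copying the collected menu data into HDFS.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MenuLoader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://localhost:9000"); // assumed namenode URI
            FileSystem fs = FileSystem.get(conf);

            // Copy the locally collected menu files into HDFS for analysis.
            fs.copyFromLocalFile(new Path("/data/delhi-menus"),   // assumed local dir
                                 new Path("/user/hadoop/menus")); // assumed HDFS dir
            System.out.println("Menu data copied to HDFS");
            fs.close();
        }
    }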

Figure 2.8: Outlook of the website where customers will be looking for food

We tried to make the outlook as good as we could so that customers feel as comfortable as possible and spend some time looking through the items. It is also user friendly, as all the options are close at hand for a new user. We are even moving towards adding immediate online help so that customers can get the necessary assistance regarding their orders; all of this will allow them to search for and find the right food in a much easier and faster way.


Figure 2.9: Section in the website where customers can see other customers' feedback

Figure 2.10: Showing section in website for customer feedback


Figure 2.11: Database entry for the menu and other details

Figure 2.12: Adding an item to the database

Adding an item means entering the food details: a photo of the food, its cost, delivery date, the code of the food category and the seller restaurant which dispatches the item.
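The "add item" step can be sketched as a plain JDBC insert. The database name, table name, column names and sample values below are hypothetical; they only illustrate the fields mentioned above (photo, cost, delivery date, category code and seller restaurant).

    // AddFoodItem.java -- hedged sketch of inserting a food record via JDBC.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class AddFoodItem {
        public static void main(String[] args) throws Exception {
            // Assumed local MySQL database named "foodstore".
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/foodstore", "root", "password");

            String sql = "INSERT INTO food_items "
                       + "(name, photo_path, cost, delivery_date, category_code, restaurant) "
                       + "VALUES (?, ?, ?, ?, ?, ?)";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, "Butter Chicken");             // hypothetical item
                ps.setString(2, "/images/butter_chicken.jpg"); // photo of the food
                ps.setDouble(3, 350.00);                       // cost
                ps.setString(4, "2016-04-10");                 // delivery date
                ps.setString(5, "NV01");                       // food category code
                ps.setString(6, "Delhi Darbar");               // seller restaurant
                ps.executeUpdate();
            }
            con.close();
        }
    }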


Figure 2.13: Customer feedback

5.1.1: Success cases
Our primary target was to set up a Hadoop single-node cluster. We successfully made three clusters, one of which (the mirror) is an Acer Aspire E-11. This system has 4 GB of RAM and a 2.67 GHz processor, which can run the initial programs of a Hadoop cluster.

Figure 2.14: Cluster Summary


Figure 2.15: Hadoop Task tracker

This shows that we have successfully initiated a Hadoop single-node cluster on our system, as it displays the summary of our work, including the total running nodes, running MapReduce tasks, occupied map and reduce task capacity, and average tasks per node. In the TaskTracker status we can see Hadoop's running tasks and their status, non-running tasks and their status, tasks from running jobs, and local logs. Installing Hadoop successfully first required a successful installation of the JDK. There were three primary steps we had to perform before testing the successful integration of the Hadoop cluster in the system:
 Formatting the NameNode ('hadoop namenode -format')
 Starting the Hadoop distributed file system daemons ('start-dfs.sh')
 Starting all MapReduce daemons in Hadoop ('start-mapred.sh')
These steps allow the user or admin to enable login permissions for the user to access the database made by Hadoop. After starting dfs and mapred in the command prompt, it shows the processes confirming that both dfs and mapred are loading into memory; it then starts the local host, ready to respond to any query by the user, handle entries made by the admin, and serve inquiries from users.


5.1.2: Failure cases
We had three localhost addresses to check on the status of our single-node cluster. Two of the addresses, http://localhost:50060 and http://localhost:50030, worked properly, as their screenshots have been added under the success cases. But the address http://localhost:50070 is not responding:

Figure 2.16: Failure in connecting localhost:50070
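A quick probe such as the following can narrow down this failure by testing all three Hadoop 1.x web UI ports at once; it is a small sketch, not part of our actual test harness.

    // WebUiCheck.java -- probe the Hadoop 1.x web UIs: 50030 (JobTracker),
    // 50060 (TaskTracker) and 50070 (NameNode) on localhost.
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class WebUiCheck {
        public static void main(String[] args) {
            int[] ports = {50030, 50060, 50070};
            for (int port : ports) {
                try {
                    HttpURLConnection con = (HttpURLConnection)
                            new URL("http://localhost:" + port).openConnection();
                    con.setConnectTimeout(3000); // fail fast if nothing listens
                    System.out.println("Port " + port + ": HTTP "
                            + con.getResponseCode());
                    con.disconnect();
                } catch (Exception e) {
                    System.out.println("Port " + port + ": no response ("
                            + e.getMessage() + ")");
                }
            }
        }
    }

In our case such a probe would report a response on 50030 and 50060 and a connection error on 50070, which typically points at the NameNode not having come up properly.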

5.2: Testing
We tested the single-node cluster that we made and successfully got two of the addresses working properly, which is at least enough to confirm that a Hadoop single-node cluster has been set up on the system and that we can begin entering data into it. We installed Hadoop version 1.2.1, whose MapReduce model will let users access the database smoothly even once it reaches its estimated size of 135 GB. Test cases 1 and 2 were successful, as the local host responded after starting MapReduce and the distributed file system and formatting the NameNode. This confirmed single-node status on the system we are currently working on.
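As an illustration of the MapReduce model such a cluster runs, the following is a minimal sketch of the click-stream idea behind our project: counting clicks per food item. It uses the Hadoop 1.x API; the comma-separated log format (userId,foodItem,timestamp) is an assumption made for illustration, not our exact log layout.

    // ClickCount.java -- sketch: count clicks per food item from click logs.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ClickCount {
        public static class ClickMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            public void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                // Assumed log line: userId,foodItem,timestamp
                String[] fields = value.toString().split(",");
                if (fields.length >= 2) {
                    ctx.write(new Text(fields[1]), ONE); // emit (foodItem, 1)
                }
            }
        }

        public static class ClickReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum)); // total clicks per item
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "click count");
            job.setJarByClass(ClickCount.class);
            job.setMapperClass(ClickMapper.class);
            job.setReducerClass(ClickReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

On the cluster, such a job would be submitted with 'hadoop jar', passing the click-log directory and an output directory as arguments.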


5.2.1: Test results of various stages
First we tested whether the Java JDK was successfully installed in the system.

Figure 2.17: Showing jdk1.8.0_77 installed in the system

Next we checked whether Hadoop is installed in the system.

Figure 2.18: Hadoop Version


Next we tested the startup of MapReduce in the system. If it cannot load, an error message is shown; if successful, it asks for the user password and starts the MapReduce connection.

Figure 2.19: Successful startup of MapReduce

Finally we tested the Hadoop distributed file system; to load it, we started dfs in the system with the command: start-dfs.sh

Figure 2.20: Showing the startup of the distributed file system
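Once start-dfs.sh has completed, a listing like the following confirms that the file system is answering; the namenode URI is again an assumption.

    // HdfsCheck.java -- sketch: list the HDFS root to confirm the daemons are up.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://localhost:9000"); // assumed URI
            FileSystem fs = FileSystem.get(conf);
            // A successful listing of "/" means HDFS is reachable.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
            fs.close();
        }
    }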


Chapter-6: Conclusion & Future Improvements

6.1: Performance Estimation
The smooth running of the process depends largely on the capacity of the system: the greater the capacity, the better the performance. Here we compare the results of Hadoop and an RDBMS.

Table 2.2: Showing Hadoop response time

The important takeaway is to understand, at a high level, how data is stored in HDFS and managed in the Hive environment. The physical data modeling experiments that we performed ultimately affect how the data is stored in blocks in HDFS, the nodes where the data is located, and how the data is accessed. This is particularly true for the tests in which we partitioned the data using the Partition statement to redistribute the data based on the buckets or ranges defined in the partitions. We began our experiments without indexes, partitions, or statistics in both schemas and in both environments. The intent of the first experiment was to determine whether a star schema or a flat table performed better in Hive or in the RDBMS for our queries. During subsequent rounds of testing, we used compression and added indexes and partitions to tune the data structures, following Data Modeling Considerations in Hadoop and Hive [18]. As a final test, we ran the same queries against our final data structures using Impala, which bypasses the MapReduce layer used by Hive.

Table Name          | RDBMS      | Hadoop (Text File) | Hadoop (Compressed Sequence File)
PAGE_CLICK_FACT     | 573.18 GB  | 328.30 GB          | 42.28 GB
PAGE_CLICK_FLAT     | 1001.11 GB | 991.47 GB          | 124.59 GB

Table 2.3: Showing data modeling comparison in RDBMS and Hadoop
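As a sketch of the partitioning experiments described above, the following creates a Hive table for click data, partitioned by date and stored as a sequence file, through the HiveServer2 JDBC driver. The table name, columns and server URL are assumptions, not our exact schema.

    // PartitionedClicks.java -- hedged sketch: create a date-partitioned
    // Hive table over JDBC, so each day's clicks land in their own partition.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class PartitionedClicks {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection con = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hadoop", "");
            try (Statement st = con.createStatement()) {
                // Partitioning by click_date controls how the data is
                // redistributed across HDFS blocks and nodes, which is
                // what the experiments above tune.
                st.execute("CREATE TABLE page_click_fact ("
                         + " user_id STRING, food_item STRING, ts STRING)"
                         + " PARTITIONED BY (click_date STRING)"
                         + " STORED AS SEQUENCEFILE");
            }
            con.close();
        }
    }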


6.2: Limitations
Due to the limited amount of time, we could not collect an ample amount of data. We also created three clusters but had to shift to a single mirror cluster, as maintaining three high-capacity clusters is not an easy task. Moreover, collecting different restaurant menus requires the restaurant authority to approve and accept the proposal. We wanted answers to the following questions:
 What percentage of viewers click on the advertisement?
 How many of the visitors actually purchase food from the store?
 How much revenue/profit is generated by the advertisement?
It was never easy to find out the actual benefit, as due to security concerns not all companies or restaurants are willing to provide this information. While installing Hadoop in the system, we also had to keep in mind the capacity of the system, which has to be above 3.5 GB of RAM and above 2.00 GHz of processing speed. This was not easy to arrange for multiple clusters: in our computer lab at SET I-214, most of the computers have 1 GB of RAM and below-standard processing speed, so combining three computers' RAM could barely make one machine's worth. If a single cluster goes down, the whole system becomes unresponsive, which is a difficult situation to resolve.

6.3: Scope of Improvement
There is an ample amount of scope for us to improve this project, as we are covering only the restaurants of Delhi and not sales support. It focuses on food products, advertisers and customers. More specifically, the system is designed to manage the product information. The system is also used to provide:
1. Statistical analysis of offers to advertisers.
2. Statistical analysis limited to the most-clicked food products, showing food of the user's interest, and report generation about the food's position in the market.
3. Food displayed on the dashboard that matches the customer's profile, with the benefit of collecting the delivery report as feedback, which can be used to improve the food offering.
With more and more people buying and selling food online, effective marketing for busy restaurant websites such as Delhi's depends largely on the success of online advertising. Analyzing the effectiveness of a website is a matter of concern for corporates that rely on web marketing, since purchasing food online involves attracting and retaining customers. Traditional database technology is indeed useful in managing online stores; however, it has serious limitations when it comes to analyzing the effectiveness of online ads. Here we need to find answers to daunting questions such as: what percentage of viewers click on the advertisement?

From this point of view, the study of online food product promotion for Delhi restaurants becomes an important aspect of web marketing. The data generated by mouse clicks and the corresponding logs is too large to be analyzed by traditional technology. New technology such as big data is being explored to find solutions to the above problems. In this project, we decided to use the open-source technology Hadoop. Today the term big data draws a lot of attention, but behind the hype there is a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of non-traditional, less structured data: blogs, social media, email, sensors and photographs that can be mined for useful information. The decrease in the cost of storage and the increase in computing power have made it possible to collect such large data. As a result, more and more companies are now compelled to include non-traditional yet potentially valuable data with their traditional enterprise data and to use it for their business intelligence analysis. To derive real business value from big data, we need the right tools to capture and organize a wide variety of data types from different sources, and to be able to easily analyze them within the context of all our enterprise data.


Conclusion:
As the world turns to the Internet for every day-to-day activity, the ability to view and select food of one's choice is of prime importance to restaurants. A list of irrelevant advertisements frustrates the user, and this proves to be the main reason for the failure of most sites. Our website makes it smooth for users to select products by filtering the available products based on each customer's interests. The e-commerce field is emerging rapidly, and advertisers need a way to promote their products in the market; personalized websites like this one provide that way. The reports provided by our website make it easier for advertisers to know the status of their products and hence take the necessary measures to recover from losses. So our project is an effort to minimize the hard work of people and restaurants and get them the things they want in the shortest possible time, from one place. We tried to explore the possibilities of handling people's food matters by connecting restaurants with evolving technology that reduces the hassle and gives customers a wonderful experience of ordering food online: customers do not need to visit different restaurants' sites, since in one place they get all their necessary items, prices and menus, and can also view feedback from other customers. Although databases do not solve all aspects of the big data problem, several tools, some based on databases, get part-way there. What is missing is two-fold: first, we must improve statistics and machine learning algorithms to be more robust and easier for unsophisticated users to apply, while simultaneously training students in their intricacies; second, we need to develop a data management ecosystem around these algorithms so that users can manage and evolve their data, enforce consistency properties over it, and browse, visualize and understand their algorithms.


References
[1] Michael G. Noll. Running Hadoop on Ubuntu Linux (single-node cluster). http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster, December 2012. [page 32, 39, 49, 120-140]
[2] Tom White. Hadoop: The Definitive Guide, "From Avro to ZooKeeper". O'Reilly Media, May 2012. [page 3-7]
[3] The Unified Modeling Language User Guide. Addison-Wesley, October 1998. [page 9-20]
[4] Ari Zilka (CTO, Hortonworks). Hadoop. 2011. [page 3, 6, 19, 27]
[5] Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung. The Google File System. 2003. [page 40-60]
[6] HUANG Lan, WANG Xiao-wei, ZHAI Yan-dong and YANG Bin. Extraction of user profile based on the Hadoop framework. IEEE, 2009. [page 19-29]
[7] LI Chao-qing and LI Xiang-yang. Several technical problems and solutions of mass data processing. Journal of China College of Insurance Management. [page 4, 29, 33, 40, 52]
[8] MIKE2.0. Big data definition. [page 1-7]
[9] Roger S. Pressman. Software Engineering: A Practitioner's Approach. 7th edition, McGraw-Hill, 2012. [page 9]
[10] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 2004. [page 9, 12, 33]
[11] Alan Gates. Programming Pig. O'Reilly Media, Inc., October 2011. [page 3-100]
[12] Kathleen Ting and Jarek Jarcec Cecho. Apache Sqoop Cookbook. O'Reilly Media, Inc., July 2013.
[13] Indian restaurants scenario in current days over time. http://india.blogs.nytimes.com/2012/05/01/in-india-more-food-and-more-suffering/?_r=0
[14] Hsinchun Chen, Roger H. L. Chiang and Veda C. Storey. Business Intelligence and Analytics: From Big Data to Big Impact. [page 19-65]
[15] Viktor Mayer-Schonberger and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work and Think. John Murray Publishers Ltd. [page 1-7]
[16] Sam Madden. From Databases to Big Data. Massachusetts Institute of Technology.
[17] Jules Polonetsky, Omer Tene and Joseph Jerome. Benefit-Risk Analysis for Big Data Projects. [page 33-50]
[18] Clark Bradley, Ralph Hollinshead, Scott Kraus, Jason Lefler and Roshan Taheri. Data Modeling Considerations in Hadoop and Hive. October 2013. [page 17, 40, 49]
[19] How big data is changing the database scenario for good. http://www.infoworld.com/article/3003647/database/how-big-data-is-changing-the-database-landscape-for-good.html
[20] Thomas H. Davenport, Jeanne G. Harris, Jinho Kim and Robert Morison. Analytics and Big Data: The Davenport Collection (6 items). [page 234-250]
[21] Big Data & Analytics: Bangladesh on a Parallel World. Boomerang Blog. www.boomerangbd.com/blog/.../big-data-analytics-bangladesh-on-a-parallel-world/
[22] IBM - PureData Big Data Analytics - Data Warehouse - Bangladesh. www.ibm.com/ibm/puresystems/bd/en/big-data/
[23] Narmin Tartila. Big Data & Analytics: Bangladesh on a Parallel World.
[24] Kenneth Cukier and Viktor Mayer-Schonberger. Big Data: A Revolution That Will Transform How We Live, Work, and Think. [page 9-135]
[25] Dirk Deroos. Hadoop For Dummies. [page 23, 50, 99]
[27] Eric Sammer. Hadoop Operations. [page 90, 124, 223]
[28] Alex Holmes. Hadoop in Practice. [page 23, 29, 57, 180]
[29] Donald Miner. MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop. [page 40, 43, 49, 67, 97]
[30] Boris Lublinsky, Kevin T. Smith and Alexey Yakubovich. Professional Hadoop Solutions. [page 34, 45, 77, 89]
[31] Erik Brynjolfsson, Andrew McAfee and Jeff Cummings. The Second Machine Age: Work, Progress and Prosperity in a Time of Brilliant Technologies. [page 33, 34, 49]
