Big Data Analytics Seminar Report 2020-21
August 6, 2024 | Author: Anonymous | Category: N/A
Big Data Analytics
Seminar Report 2020-21
ABSTRACT
Big data is a new driver of economic and societal change worldwide. The world's data collection is reaching a tipping point for major technological changes that can bring new ways of making decisions and of managing our health, cities, finance and education. While data complexity is increasing in volume, variety, velocity and veracity, the real impact hinges on our ability to uncover the 'value' in the data through Big Data Analytics technologies. Big Data Analytics poses a grand challenge for the design of highly scalable algorithms and systems that integrate the data and uncover large hidden values from datasets that are diverse, complex, and of a massive scale. Potential breakthroughs include new algorithms, methodologies, systems and applications in Big Data Analytics that discover useful and hidden knowledge from Big Data efficiently and effectively. Big data analytics must also be a team effort cutting across academic institutions, government, society and industry, carried out by researchers from multiple disciplines including computer science and engineering, health, data science, and social and policy areas.
Dept. of Computer Engineering
GPC Kasaragod
CONTENTS
INTRODUCTION
What is Data, Big Data, Big Data Analytics
Benefits of using Big Data Analytics
History and Evolution of Big Data Analytics
Why is Big Data Analytics Important
Types of Big Data
Characteristics of Big Data
Applications of Big Data
Advantages and Disadvantages of Big Data
Tools used in Big Data Analytics
The sources of Big Data
Impact of Big Data on Business
How it works and key technologies
Big Data Analytics uses and challenges
Lifecycle of Big Data Analytics
Different types of Big Data Analytics
CONCLUSION
REFERENCES
INTRODUCTION
Big data analytics is the process of examining large data sets containing a variety of data types (i.e., big data) to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations and other business benefits.
The primary goal of big data analytics is to help companies make more informed business decisions by enabling data scientists, predictive modelers and other analytics professionals to analyze large volumes of transaction data, as well as other forms of data that may be untapped by conventional business intelligence (BI) programs. That could include Web server logs and Internet clickstream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records and machine data captured by sensors connected to the Internet of Things.
With the launch of Web 2.0, a large amount of valuable business data started being generated outside the organization by consumers and, more generally, by web users. This data can be structured or unstructured, and can come from multiple sources such as social networks, products viewed in virtual stores, information read by sensors, GPS signals from mobile devices, IP addresses, cookies, bar codes, etc.
What is Data?
Data is the quantities, characters, or symbols on which a computer performs operations, and which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
What is Big Data?
Big Data is also data, but of enormous size. The term describes a collection of data that is huge in volume and still growing exponentially with time. In short, such data is so large and complex that traditional data management tools cannot store or process it efficiently. Data also exists in different formats: structured, semi-structured, and unstructured. For example, the data in a regular Excel sheet is structured, with a definite format. In contrast, emails are semi-structured, and pictures and videos are unstructured. All this data combined makes up Big Data.
What is Big Data Analytics?
Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences. It offers various advantages: among other things, it can be used for better decision making and for preventing fraudulent activities.
Benefits of using big data analytics:
Uncover the need for new features or products
Understand the full customer journey
More effective marketing
More effective customer support
Greater responsiveness to market trends
History and evolution of big data analytics
The concept of big data has been around for years; most organizations now understand that if they capture all the data that streams into their businesses, they can apply analytics and get significant value from it. But even in the 1950s, decades before anyone uttered the term "big data," businesses were using basic analytics (essentially numbers in a spreadsheet that were manually examined) to uncover insights and trends.
The new benefits that big data analytics brings to the table, however, are speed and efficiency. Whereas a few years ago a business would have gathered information, run analytics and unearthed information that could be used for future decisions, today that business can identify insights for immediate decisions. The ability to work faster, and stay agile, gives organizations a competitive edge they didn't have before.
Why is big data analytics important?
Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers. In the report Big Data in Big Companies, IIA Director of Research Tom Davenport interviewed more than 50 businesses to understand how they used big data. He found they got value in the following ways:
1. Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data, and they can also identify more efficient ways of doing business.
2. Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information immediately and make decisions based on what they've learned.
3. New products and services. With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. Davenport points out that with big data analytics, more companies are creating new products to meet customers' needs.
Types of Big Data
Big Data is generally categorized into three varieties:
Structured Data
Semi-Structured Data
Unstructured Data
Structured Data has a dedicated data model and a well-defined structure. It follows a consistent order and is designed so that it can be easily accessed and used by a person or a computer. Structured data is usually stored in well-defined columns, typically in databases. Example: Database Management Systems (DBMS).
Semi-Structured Data can be considered another form of Structured Data. It inherits a few properties of Structured Data, but most of it lacks a definite structure and does not obey the formal structure of data models such as an RDBMS. Example: Comma Separated Values (CSV) file.
Unstructured Data is a completely different type: it has no structure and does not follow the formal structural rules of data models. It does not even have a consistent format and varies all the time. Occasionally, however, it carries some accompanying information such as the date and time. Example: audio files, images, etc.
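The difference between the three types can be made concrete with a small sketch. It uses only the Python standard library; the table name, field names and sample values are invented for illustration, and JSON is used here as an assumed example of semi-structured data (the report itself cites CSV files).

```python
import json
import sqlite3

# Structured: a table with a fixed schema, as in a DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Kasaragod"), (2, "Ravi", "Kochi")],
)
structured_rows = conn.execute(
    "SELECT id, name, city FROM customers ORDER BY id"
).fetchall()
conn.close()

# Semi-structured: records describe their own fields, but the fields
# may differ from one record to the next (no enforced schema).
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["new"]}, {"id": 2, "name": "Ravi"}]'
)
fields_per_record = [sorted(record.keys()) for record in semi_structured]

# Unstructured: free text with no fields at all; only generic
# operations such as splitting into words apply directly.
unstructured = "Asha called about her order and asked for a refund."
word_count = len(unstructured.split())

print(structured_rows)
print(fields_per_record)
print(word_count)
```

Every row of the table has exactly the same columns, whereas the two JSON records carry different sets of keys, and the free text has to be processed before any field can be extracted from it.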
Characteristics of Big Data
Volume
Volume refers to the enormous amounts of information generated every second from social media, cell phones, cars, credit cards, M2M sensors, images, video, and so on. Such data is now stored across several locations in distributed systems and brought together by a software framework such as Hadoop. Facebook alone generates billions of messages, records about 4.5 billion clicks of the "like" button, and receives over 350 million new posts each day. Data at this scale can only be handled by Big Data technologies.
Variety
As discussed before, Big Data is generated in multiple varieties. Compared with traditional data such as phone numbers and addresses, much of today's data takes the form of photos, videos, audio and more, so that about 80% of the data is completely unstructured. Structured data is just the tip of the iceberg.
Veracity
Veracity means the degree of reliability that the data has to offer. Because a major part of the data is unstructured and some of it is irrelevant, Big Data systems need a way to filter or translate it, since reliable data is crucial to business development.
Value
Value is the major issue that we need to concentrate on. What matters is not just the amount of data that we store or process, but the amount of valuable, reliable and trustworthy data that is stored, processed and analyzed to find insights.
Velocity
Last but not least, velocity plays a major role: there is no point in investing so much only to end up waiting for the data. A major aim of Big Data is therefore to provide data on demand and at a faster pace.
Applications of Big Data
Big Data is considered the most valuable and powerful fuel that can run the massive IT industries of the 21st century. It is among the most widespread technologies and is used in almost every business sector. A few examples follow.
Travel and Tourism is one of the biggest users of Big Data technology. It makes it possible to predict the demand for travel facilities in many places and to improve business through dynamic pricing, among other things.
The Financial and Banking sector uses Big Data technology extensively. Big data analytics can help banks understand customer behavior based on investment patterns, shopping trends, motivation to invest, and personal or financial backgrounds.
Big Data has already started to make a huge difference in the healthcare sector. With the help of predictive analytics, medical professionals and healthcare personnel are now able to provide personalized healthcare services to individual patients.
The Telecommunication and Multimedia sector is one of the primary users of Big Data. It generates data on the scale of zettabytes, and handling such volumes requires Big Data technologies.
Government and Military also use Big Data technology heavily. Consider the amount of data a government generates in its records; in the military, a fighter jet is said to process petabytes of data during a flight.
Benefits or advantages of Big Data
Following are the benefits or advantages of Big Data:
➨Big data analysis derives innovative solutions. It helps in understanding and targeting customers, and in optimizing business processes.
➨It helps in improving science and research.
➨It improves healthcare and public health through the availability of patient records.
➨It helps in financial trading, sports, polling, security and law enforcement, etc.
➨Anyone can access vast information via surveys and obtain an answer to a query.
➨New data is added every second.
➨A single platform can carry a virtually unlimited amount of information.
Drawbacks or disadvantages of Big Data
Following are the drawbacks or disadvantages of Big Data:
➨Storing big data on traditional storage can cost a lot of money.
➨Much of big data is unstructured.
➨Big data analysis can violate principles of privacy.
➨It can be used to manipulate customer records.
➨It may increase social stratification.
➨Big data analysis is not useful in the short run. Data needs to be analyzed over a longer duration to leverage its benefits.
➨Big data analysis results are sometimes misleading.
➨Rapid updates in big data can cause results to diverge from the real figures.
Tools Used in Big Data Analytics
Here are some of the tools used in Big Data analytics:
Hadoop - helps in storing and analyzing data
MongoDB - used on datasets that change frequently
Talend - used for data integration and management
Cassandra - a distributed database used to handle large volumes of data
Spark - used for real-time processing and analysis of large amounts of data
Storm - an open-source real-time computation system
Kafka - a distributed streaming platform that also provides fault-tolerant storage
The Sources of Big Data
The bulk of big data comes from three primary sources: social data, machine data and transactional data. In addition, companies need to distinguish between data that is generated internally, that is, data that resides behind a company's firewall, and data that is generated externally and needs to be imported into a system. Whether data is unstructured or structured is also an important factor. Unstructured data does not have a pre-defined data model and therefore requires more resources to make sense of.
The three primary sources of Big Data
Social data comes from the Likes, Tweets and Retweets, Comments, Video Uploads, and general media that are uploaded and shared via the world's favorite social media platforms. This kind of data provides invaluable insights into consumer behavior and sentiment and can be enormously influential in marketing analytics. The public web is another good source of social data, and tools like Google Trends can be used to good effect to increase the volume of big data.
Machine data is information generated by industrial equipment, by sensors installed in machinery, and even by web logs that track user behavior. This type of data is expected to grow exponentially as the Internet of Things becomes more pervasive and expands around the world. Sources such as medical devices, smart meters, road cameras, satellites and games, together with the rapidly growing Internet of Things, will deliver data of high velocity, value, volume and variety in the very near future.
Transactional data is generated by all the daily transactions that take place both online and offline. Invoices, payment orders, storage records and delivery receipts are all transactional data. Yet data alone is almost meaningless, and most organizations struggle to make sense of the data they are generating and to put it to good use.
Impact of Big Data on Business
With the help of big data, companies aim to offer improved customer service, which can help increase profit. Enhanced customer experience is the primary goal of most companies. Other goals include better target marketing, cost reduction, and improved efficiency of existing processes. Big data technologies such as cloud-based analytics and Hadoop help companies store large volumes of data at significantly lower cost, and help businesses analyze information and improve decision-making. Furthermore, data breaches create a need for enhanced security, which these technologies can help address. Big data has the potential to bring social and economic benefits to businesses, and several government agencies have therefore formulated policies to promote its development.
Over the years, big data analytics has evolved with the adoption of agile technologies and an increased focus on advanced analytics. There is no single technology that encompasses big data analytics. Several technologies work together to help companies procure optimum value from the information. Among them are machine learning, artificial intelligence, quantum computing, Hadoop, in-memory analytics, and predictive analytics. These technology trends are likely to spur the demand for big data analytics over the forecast period.
Earlier, big data was mainly deployed by businesses that could afford the technologies and channels used to gather and analyze data. Nowadays, both large and small enterprises increasingly rely on big data for intelligent business insights, which boosts the demand for big data. Enterprises in all industries are considering how big data can be used in their business. Its uses are poised to improve productivity, identify customer needs, offer a competitive advantage, and create scope for sustainable economic development.
How it works and key technologies
There's no single technology that encompasses big data analytics. Of course, there's advanced analytics that can be applied to big data, but in reality several types of technology work together to help you get the most value from your information. Here are the biggest players:
Machine Learning. Machine learning, a specific subset of AI that trains a machine how to learn, makes it possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results, even on a very large scale. And by building precise models, an organization has a better chance of identifying profitable opportunities or avoiding unknown risks.
Data management. Data needs to be high quality and well-governed before it can be reliably analyzed. With data constantly flowing in and out of an organization, it's important to establish repeatable processes to build and maintain standards for data quality. Once data is reliable, organizations should establish a master data management program that gets the entire enterprise on the same page.
Data mining. Data mining technology helps you examine large amounts of data to discover patterns in the data, and this information can be used for further analysis to help answer complex business questions. With data mining software, you can sift through all the chaotic and repetitive noise in data, pinpoint what's relevant, use that information to assess likely outcomes, and then accelerate the pace of making informed decisions.
Hadoop. This open source software framework can store large amounts of data and run applications on clusters of commodity hardware. It has become a key technology for doing business due to the constant increase of data volumes and varieties, and its distributed computing model processes big data fast. An additional benefit is that Hadoop's open source framework is free and uses commodity hardware to store large quantities of data.
In-memory analytics. By analyzing data from system memory (instead of from your hard disk drive), you can derive immediate insights from your data and act on them quickly. This technology is able to remove data prep and analytical processing latencies to test new scenarios and create models; it's not only an easy way for organizations to stay agile and make better business decisions, it also enables them to run iterative and interactive analytics scenarios.
Predictive analytics. Predictive analytics technology uses data, statistical algorithms and machine-learning techniques to identify the likelihood of future outcomes based on historical data. It's all about providing a best assessment of what will happen in the future, so organizations can feel more confident that they're making the best possible business decision. Some of the most common applications of predictive analytics include fraud detection, risk, operations and marketing.
Text mining. With text mining technology, you can analyze text data from the web, comment fields, books and other text-based sources to uncover insights you hadn't noticed before. Text mining uses machine learning or natural language processing technology to comb through documents (emails, blogs, Twitter feeds, surveys, competitive intelligence and more) to help you analyze large amounts of information and discover new topics and term relationships.
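As a rough illustration of two of the technologies above, the sketch below counts terms across a few short documents (the simplest form of text mining) and phrases the count as map, shuffle and reduce steps, the MapReduce style of processing that Hadoop distributes across a cluster. Everything here runs in a single Python process; the sample documents and function names are invented for illustration and do not use any Hadoop API.

```python
from collections import defaultdict

# Hypothetical customer comments standing in for a large text corpus.
documents = [
    "delivery was late and support was slow",
    "great support and fast delivery",
    "late delivery again",
]

def map_phase(doc):
    """Emit a (term, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.lower().split()]

def shuffle_phase(pairs):
    """Group the emitted values by term, as happens between map and reduce."""
    grouped = defaultdict(list)
    for term, count in pairs:
        grouped[term].append(count)
    return grouped

def reduce_phase(grouped):
    """Sum the grouped values to get one total per term."""
    return {term: sum(counts) for term, counts in grouped.items()}

pairs = [pair for doc in documents for pair in map_phase(doc)]
term_counts = reduce_phase(shuffle_phase(pairs))

# Most frequent terms first; ties are broken alphabetically.
top_terms = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top_terms)
```

Because each document is mapped independently and each term is reduced independently, both phases can in principle be spread over many machines, which is what makes the model suitable for large data sets.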
Big data analytics uses and challenges
Big data analytics applications often include data from both internal systems and external sources, such as weather data or demographic data on consumers compiled by third-party information services providers. In addition, streaming analytics applications are becoming common in big data environments as users look to perform real-time analytics on data fed into Hadoop systems through stream processing engines, such as Spark, Flink and Storm.
Early big data systems were mostly deployed on premises, particularly in large organizations that collected, organized and analyzed massive amounts of data. But cloud platform vendors, such as Amazon Web Services (AWS) and Microsoft, have made it easier to set up and manage Hadoop clusters in the cloud. The same goes for Hadoop suppliers such as Cloudera-Hortonworks, which supports the distribution of the big data framework on the AWS and Microsoft Azure clouds. Users can now spin up clusters in the cloud, run them for as long as they need and then take them offline with usage-based pricing that doesn't require ongoing software licenses.
Big data has become increasingly beneficial in supply chain analytics. Big supply chain analytics utilizes big data and quantitative methods to enhance decision making processes across the supply chain. Specifically, big supply chain analytics expands datasets for increased analysis that goes beyond the traditional internal data found on enterprise resource planning (ERP) and supply chain management (SCM) systems. Also, big supply chain analytics implements highly effective statistical methods on new and existing data sources. The insights gathered facilitate better informed and more effective decisions that benefit and improve the supply chain.
Potential pitfalls of big data analytics initiatives include a lack of internal analytics skills and the high cost of hiring experienced data scientists and data engineers to fill the gaps.
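The idea behind the streaming analytics mentioned above can be sketched in a few lines: instead of waiting for a complete data set, a result is updated as each new value arrives. The example below is plain Python, not Spark, Flink or Storm code, and the sensor readings and function name are invented for illustration.

```python
from collections import deque

def rolling_average(stream, window_size):
    """Yield the mean of the most recent `window_size` values as each value arrives."""
    window = deque(maxlen=window_size)  # old values fall out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

# Hypothetical sensor readings arriving one at a time.
readings = [10, 20, 30, 40, 50]
averages = list(rolling_average(readings, window_size=3))
print(averages)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

A stream processing engine applies the same windowing idea, but over unbounded input and across many machines.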
Lifecycle of Big Data Analytics
Now, let's review the lifecycle of Big Data analytics:
Stage 1 - Business case evaluation - The Big Data analytics lifecycle begins with a business case, which defines the reason and goal behind the analysis.
Stage 2 - Identification of data - Here, a broad variety of data sources are identified.
Stage 3 - Data filtering - All of the identified data from the previous stage is filtered here to remove corrupt data.
Stage 4 - Data extraction - Data that is not compatible with the tool is extracted and then transformed into a compatible form.
Stage 5 - Data aggregation - In this stage, data with the same fields across different datasets are integrated.
Stage 6 - Data analysis - Data is evaluated using analytical and statistical tools to discover useful information.
Stage 7 - Visualization of data - With tools like Tableau, Power BI, and QlikView, Big Data analysts can produce graphic visualizations of the analysis.
Stage 8 - Final analysis result - This is the last step of the Big Data analytics lifecycle, where the final results of the analysis are made available to business stakeholders who will take action.
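Stages 3 to 6 of the lifecycle can be sketched as a small pipeline. This is only an illustration on toy data: the two order lists, the field names and the validity rule are assumptions made for the example, and a real pipeline would run on a big data platform rather than on in-memory lists.

```python
# Hypothetical raw inputs from two sources identified in stage 2.
web_orders = [
    {"customer_id": "C1", "amount": "120.50"},
    {"customer_id": "C2", "amount": "not-a-number"},  # corrupt record
    {"customer_id": "C1", "amount": "30"},
]
store_orders = [
    {"customer_id": "C2", "amount": "75.00"},
    {"customer_id": None, "amount": "10"},  # corrupt record
]

def is_valid(record):
    """Stage 3: keep only records with a customer id and a numeric amount."""
    if not record.get("customer_id"):
        return False
    try:
        float(record["amount"])
    except (TypeError, ValueError):
        return False
    return True

def transform(record):
    """Stage 4: convert the record into the form the analysis expects."""
    return {"customer_id": record["customer_id"], "amount": float(record["amount"])}

def aggregate(*datasets):
    """Stage 5: integrate datasets on the shared customer_id field."""
    totals = {}
    for dataset in datasets:
        for record in dataset:
            key = record["customer_id"]
            totals[key] = totals.get(key, 0.0) + record["amount"]
    return totals

clean_web = [transform(r) for r in web_orders if is_valid(r)]
clean_store = [transform(r) for r in store_orders if is_valid(r)]
totals = aggregate(clean_web, clean_store)

# Stage 6: a very simple analysis step, here the highest-spending customer.
top_customer = max(totals, key=totals.get)
print(totals, top_customer)
```

Keeping each stage in its own function mirrors the lifecycle: a corrupt record is dropped once, in stage 3, and later stages can assume clean input.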
Different Types of Big Data Analytics There are four types of Big Data analytics:
Descriptive Analytics
This summarizes past data into a form that people can easily read. It helps in creating reports on figures such as a company's revenue, profit and sales, and in tabulating social media metrics.
Use Case: The Dow Chemical Company analyzed its past data to increase facility utilization across its office and lab space. Using descriptive analytics, Dow was able to identify underutilized space. Consolidating this space helped the company save nearly US $4 million annually.
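A minimal sketch of descriptive analytics is a summary of past figures. The monthly revenue numbers below are invented for illustration and are unrelated to the Dow use case.

```python
import statistics

# Illustrative monthly revenue figures (arbitrary units).
monthly_revenue = {"Jan": 120, "Feb": 135, "Mar": 128, "Apr": 150}

values = list(monthly_revenue.values())
summary = {
    "total": sum(values),
    "mean": statistics.mean(values),
    "best_month": max(monthly_revenue, key=monthly_revenue.get),
    "worst_month": min(monthly_revenue, key=monthly_revenue.get),
}
print(summary)
```

Nothing is predicted or explained here; the code only condenses what has already happened into a readable report.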
Diagnostic Analytics
This is done to understand what caused a problem in the first place. Techniques such as drill-down, data mining, and data discovery are all examples. Organizations use diagnostic analytics because it provides in-depth insight into a particular problem.
Use Case: An e-commerce company's report shows that its sales have gone down, although customers are adding products to their carts. This can have various causes: the form didn't load correctly, the shipping fee is too high, or there are not enough payment options available. Diagnostic analytics can be used to find the actual reason.
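The e-commerce use case can be sketched as a simple drill-down: compare how many sessions reach each checkout step and look for the largest drop. The step names and counts are hypothetical.

```python
# Hypothetical counts of sessions reaching each checkout step.
funnel = [
    ("cart", 1000),
    ("shipping_form", 900),
    ("payment", 400),
    ("order_placed", 380),
]

def drop_off_rates(steps):
    """Return (from_step, to_step, fraction_lost) for each consecutive pair of steps."""
    rates = []
    for (prev_name, prev_count), (next_name, next_count) in zip(steps, steps[1:]):
        rates.append((prev_name, next_name, 1 - next_count / prev_count))
    return rates

rates = drop_off_rates(funnel)
worst = max(rates, key=lambda r: r[2])
print(worst[0], "->", worst[1])
```

With these numbers most customers are lost between the shipping form and payment, which would point the investigation toward shipping fees or the form itself rather than payment options.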
Predictive Analytics
This type of analytics looks at historical and present data to make predictions about the future. Predictive analytics uses data mining, AI, and machine learning to analyze current data and forecast what is likely to happen, for example customer trends and market trends.
Use Case: PayPal determines what kind of precautions they have to take to protect their clients against fraudulent transactions. Using predictive analytics, the company uses all the historical payment data and user behavior data and builds an algorithm that predicts fraudulent activities.
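The source does not describe PayPal's models, so the sketch below illustrates the general idea of predicting from historical data with a much simpler technique: a least-squares straight line fitted to past sales and extended one month ahead. The sales figures and function name are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical sales for the past five months.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(forecast_month_6)
```

Real predictive systems use far richer models and many more variables, but the pattern is the same: learn parameters from historical data, then apply them to a case that has not happened yet.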
Prescriptive Analytics
This type of analytics prescribes the solution to a particular problem. Prescriptive analytics works with both descriptive and predictive analytics. Most of the time, it relies on AI and machine learning.
Use Case: Prescriptive analytics can be used to maximize an airline's profit. This type of analytics is used to build an algorithm that automatically adjusts flight fares based on numerous factors, including customer demand, weather, destination, holiday seasons, and oil prices.
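A toy version of the airline use case shows how prescriptive analytics builds on a prediction: given a demand estimate for each candidate fare, it recommends the fare with the highest expected revenue. The linear demand model, its parameters and the seat capacity are assumptions made purely for illustration.

```python
def expected_bookings(fare, base_demand=200, sensitivity=0.5):
    """Hypothetical linear demand model: bookings fall as the fare rises."""
    return max(0.0, base_demand - sensitivity * fare)

def best_fare(candidate_fares, seats=120):
    """Pick the fare with the highest expected revenue, capped by seat capacity."""
    def revenue(fare):
        return fare * min(seats, expected_bookings(fare))
    return max(candidate_fares, key=revenue)

fares = [100, 150, 200, 250, 300]
recommended = best_fare(fares)
print(recommended)  # 200
```

In practice the demand model would itself come from predictive analytics and would take in factors such as season, weather and fuel prices; the prescriptive step is the choice of action on top of it.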
CONCLUSION
Big Data Analytics is a security-enhancing tool of the future. The amount of information that can be gathered, organized, and applied to users in a personalized fashion would take a human days, weeks, or even months to process. In a capitalist market such as that of the United States of America, competition is key. Time cannot be wasted gathering information and making decisions about incidents that have already taken place. Stopping incidents in their tracks, completing investigative work, and quarantining threatening sources need to happen immediately, so that administrators and management can make on-the-spot decisions. With big data analytics, more educated decisions can be made and the focus can remain on moving business operations forward.
The availability of Big Data, low-cost commodity hardware, and new information management and analytic software have produced a unique moment in the history of data analysis. The convergence of these trends means that, for the first time in history, we have the capabilities required to analyze astonishing data sets quickly and cost-effectively. These capabilities are neither theoretical nor trivial. They represent a genuine leap forward and a clear opportunity to realize enormous gains in efficiency, productivity, revenue, and profitability. The Age of Big Data is here, and these are truly revolutionary times if both business and technology professionals continue to work together and deliver on the promise.
REFERENCES
www.123seminarsonly.com
www.wikipedia.com
www.edureka.co