Info Miner

December 14, 2016 | Author: SaiKrishnaReddy | Category: N/A



PROJECT PRESENTATION COMPETITION APOGEE 2011 ABSTRACT

COLLEGE NAME: Manipal Institute of Technology, Manipal

TITLE OF PROJECT: InfoMiner

TEAM LEADER: Syed Aqueel Haider, 9008420619

[email protected]

TEAM MEMBERS: Rishabh Mehrotra (BITS Pilani)

9014516301

[email protected]

ABSTRACT

TITLE OF PROJECT: InfoMiner

CATEGORY PREFERENCE: Software Design (Adaptive Technology)

OBJECTIVE: To develop a Business Intelligence model that automatically crawls the web for news articles, detects the corporate ones, and finds the company each corporate article is about.

IMPLEMENTATION METHODOLOGY: Our project is divided into the following modules:

1. Automatically extracting (crawling) news articles from the web.
2. Classifying these news articles as corporate or non-corporate.
3. Using Natural Language Processing tools to find the name of the organization that the news article is about.
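The three modules can be read as a pipeline: crawl, keep only corporate articles, then attach the main organization to each survivor. The Java sketch below shows only that composition. The class name `InfoMinerPipeline`, the `run` method, and the in-memory stand-ins for Nutch, the SVM classifier, and the NER step are hypothetical; they are not the project's actual code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class InfoMinerPipeline {

    /** Runs the three modules in order and maps each corporate article to its main organization. */
    static Map<String, String> run(Supplier<List<String>> crawler,
                                   Predicate<String> isCorporate,
                                   Function<String, String> mainOrganization) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps crawl order
        for (String article : crawler.get()) {               // module 1: crawled articles
            if (isCorporate.test(article)) {                 // module 2: corporate filter
                result.put(article, mainOrganization.apply(article)); // module 3: NLP step
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the crawler, the classifier, and the NER-based extractor.
        Supplier<List<String>> crawler = () -> Arrays.asList(
                "Acme Corp announces merger with Globex",
                "Heavy rain expected in Manipal this week");
        Predicate<String> isCorporate = a -> a.contains("merger");
        Function<String, String> mainOrg = a -> a.split(" announces")[0];
        System.out.println(run(crawler, isCorporate, mainOrg));
        // prints "{Acme Corp announces merger with Globex=Acme Corp}"
    }
}
```

Passing each module in as a functional interface keeps the stages independent, so a real crawler or a trained classifier could replace a stand-in without touching `run`.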

We use the Apache Nutch crawler to crawl the web for news articles and pre-process them with POS (Part-of-Speech) tagging and an NER (Named Entity Recognition) parser to extract features for training the model. We use a Support Vector Machine (the LIBSVM toolkit) to train our classifier. All NLP techniques are implemented in Java.
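To illustrate how tagged text can become SVM training data, the Java sketch below converts one article into a line of LIBSVM's sparse input format, `<label> <index>:<value> ...`, with feature indices in ascending order. The `Token` class, the feature choice (counts of nouns plus one count of ORGANIZATION-labelled tokens), and the reserved index 1 are assumptions for illustration; the abstract does not specify the real feature set or the tagger's output format.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LibsvmFeatures {

    /** A token after POS tagging and NER (hypothetical representation). */
    static final class Token {
        final String word, pos, ner;
        Token(String word, String pos, String ner) { this.word = word; this.pos = pos; this.ner = ner; }
    }

    // Index 1 is reserved for the number of ORGANIZATION tokens (hypothetical feature).
    static final int ORG_COUNT_INDEX = 1;

    /**
     * Builds one LIBSVM line. Nouns are looked up in the vocabulary and added if unseen;
     * vocabulary indices start at 2 so they never collide with the reserved index.
     */
    static String toLibsvmLine(int label, List<Token> tokens, Map<String, Integer> vocab) {
        TreeMap<Integer, Integer> features = new TreeMap<>(); // sorted keys give ascending indices
        for (Token t : tokens) {
            if ("ORGANIZATION".equals(t.ner)) {
                features.merge(ORG_COUNT_INDEX, 1, Integer::sum);
            }
            if (t.pos.startsWith("NN")) { // Penn Treebank noun tags: NN, NNS, NNP, NNPS
                String key = t.word.toLowerCase();
                Integer idx = vocab.get(key);
                if (idx == null) {
                    idx = vocab.size() + 2;
                    vocab.put(key, idx);
                }
                features.merge(idx, 1, Integer::sum);
            }
        }
        StringBuilder sb = new StringBuilder(Integer.toString(label));
        for (Map.Entry<Integer, Integer> e : features.entrySet()) {
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = new HashMap<>();
        List<Token> tokens = Arrays.asList(
                new Token("Acme", "NNP", "ORGANIZATION"),
                new Token("reports", "VBZ", "O"),
                new Token("profit", "NN", "O"));
        System.out.println(toLibsvmLine(+1, tokens, vocab)); // prints "1 1:1 2:1 3:1"
    }
}
```

Sharing one vocabulary map across all articles keeps each word at a fixed index, which the classifier needs so that the same dimension means the same word in every training line.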

APPLICATION: In this era of information overload, we need intelligent systems that can read, interpret, and analyze information on their own. Our project is designed to meet these requirements. Companies need to know what their rivals are involved in and where on the web those rivals are being discussed; our project gives them this information about other companies.

The project's main application is in Business Intelligence.

JUSTIFY CHOICE OF CATEGORY: We use machine learning, specifically Support Vector Machines, to train a classifier that automatically labels news articles as corporate or non-corporate. In addition, after extracting news articles, our system learns to find the name of the organization that each article is about. The system thus acquires a decision-making capability that it uses to detect the main organization discussed in the news, which makes it a fit for the Adaptive Technology category.
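For readers unfamiliar with how an SVM classifier is trained, the Java sketch below trains a linear SVM on two hypothetical features per article and then classifies a new one. LIBSVM itself solves the SVM dual problem with an SMO-type solver; this sketch swaps in the Pegasos stochastic subgradient method on the primal problem, chosen only because it fits in a few dependency-free lines. The class name, the toy features, and the parameters (`lambda`, iteration count, seed) are assumptions, not the project's configuration.

```java
import java.util.Random;

public class PegasosSvm {

    /**
     * Trains a linear SVM with Pegasos updates. A constant feature 1 is implicitly
     * appended to every x so the last weight acts as the bias.
     */
    static double[] train(double[][] xs, int[] ys, double lambda, int iterations, long seed) {
        int d = xs[0].length + 1;              // +1 for the bias feature
        double[] w = new double[d];
        Random rnd = new Random(seed);
        for (int t = 1; t <= iterations; t++) {
            int i = rnd.nextInt(xs.length);    // pick one training example at random
            double eta = 1.0 / (lambda * t);   // Pegasos step size
            double margin = ys[i] * dot(w, xs[i]); // margin under the current weights
            for (int j = 0; j < d; j++) {
                w[j] *= (1 - eta * lambda);    // step for the L2 regularizer
            }
            if (margin < 1) {                  // hinge loss active: move toward y_i * x_i
                for (int j = 0; j < d - 1; j++) {
                    w[j] += eta * ys[i] * xs[i][j];
                }
                w[d - 1] += eta * ys[i];       // bias feature is always 1
            }
        }
        return w;
    }

    /** Computes w . [x, 1]. */
    static double dot(double[] w, double[] x) {
        double s = w[w.length - 1];
        for (int j = 0; j < x.length; j++) {
            s += w[j] * x[j];
        }
        return s;
    }

    /** +1 = corporate, -1 = non-corporate. */
    static int predict(double[] w, double[] x) {
        return dot(w, x) >= 0 ? +1 : -1;
    }

    public static void main(String[] args) {
        // Hypothetical features: [ORGANIZATION mentions, finance keywords such as "merger"].
        double[][] xs = {{3, 3}, {4, 3}, {3, 4}, {0, 0}, {1, 0}, {0, 1}};
        int[] ys = {+1, +1, +1, -1, -1, -1};
        double[] w = train(xs, ys, 0.01, 50_000, 42L);
        System.out.println(predict(w, new double[]{5, 4}));
    }
}
```

A fixed seed makes runs reproducible; in practice the regularization strength would be tuned on held-out articles, much as LIBSVM's cost parameter is.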

BASIC EXPLANATION OF THE PROJECT: With rapid advances in information technology, the amount of available information has increased tremendously. News articles constitute the largest available body of factual information about events happening in the world, and corporate news makes up a major share of these articles. Such news covers a wide range of events, including acquisitions, mergers, share and stock performance, product launches, executive changes, projects, and legal proceedings. This is a huge amount of information, and it is scattered haphazardly across the internet. Once organized systematically, however, this pool of information becomes a potentially valuable resource for tasks such as analyzing companies' market trends, supporting corporate decision making, and tracking the activities of rival companies. This project identifies corporate news within a collection of news articles and then pairs each item with the organization/company that the article is about. The model can distinguish the main organization (the focus of the news) from other organizations that are merely mentioned.
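The abstract does not say how the main organization is told apart from other mentioned organizations. As one hypothetical heuristic, the Java sketch below scores each ORGANIZATION entity by how often it appears, with bonuses for appearing in the headline or the first sentence, and returns the top scorer. The weights, the input shape (entities already grouped per sentence by an NER step), and the class name are all assumptions; a learned model could replace the fixed weights.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MainOrganization {

    // Hypothetical weights, not taken from the project.
    static final double HEADLINE_BONUS = 2.0;
    static final double FIRST_SENTENCE_BONUS = 1.0;

    /**
     * Returns the organization most likely to be the focus of the article, or null if none.
     * orgsPerSentence.get(k) lists the ORGANIZATION entities found in sentence k.
     * Ties go to the organization seen first (headline, then body order).
     */
    static String pick(List<String> headlineOrgs, List<List<String>> orgsPerSentence) {
        Map<String, Double> score = new LinkedHashMap<>(); // insertion order used for ties
        for (String org : headlineOrgs) {
            score.merge(org, HEADLINE_BONUS, Double::sum);
        }
        for (int k = 0; k < orgsPerSentence.size(); k++) {
            for (String org : orgsPerSentence.get(k)) {
                double s = 1.0 + (k == 0 ? FIRST_SENTENCE_BONUS : 0.0); // one point per mention
                score.merge(org, s, Double::sum);
            }
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : score.entrySet()) {
            if (e.getValue() > bestScore) { // strict '>' keeps the earlier entry on ties
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> headline = Arrays.asList("Acme");
        List<List<String>> body = Arrays.asList(
                Arrays.asList("Acme", "Globex"),
                Arrays.asList("Globex"),
                Arrays.asList("Globex", "Initech"));
        System.out.println(pick(headline, body)); // prints "Acme"
    }
}
```

In the example, Globex is mentioned more often in the body, but Acme's headline and first-sentence bonuses tie the scores at 4, and the tie goes to Acme as the first organization seen.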

