091104-M Hornick-Slides-Oracle Data Mining Case Study

March 3, 2018 | Author: krishnendu sengupta | Category: Cluster Analysis, Information Technology, Information Technology Management, Technology, Computing

Oracle Data Mining for Text, Clustering, and Classification: Case Study of a Recommendation Engine

Mark Hornick, Senior Manager, Development
Pablo Tamayo, Consulting MTS
Data Mining Technologies Group

Copyright © 2009 Oracle Corporation

Introduction

Recommendation Engine at Oracle OpenWorld Conference 2008 and 2009

- Recommend conference sessions to attendees
- Enhance session enrollment application
- Use Oracle Data Mining and Oracle Data Miner UI
- K-means, Naïve Bayes, Text Mining, Code Generation

Agenda

- Recommendation engine scenario
- Overview
- Technical problem and data
- Methodology for OOW '08 and '09
- Evaluating recommendation quality
- New features for OOW '09
- Demonstration
- OOW '08 results and summary

High Level Objectives

- Help attendees find relevant sessions
- Maximize individual OOW experience / value
- Increase session attendance

Technical Objectives and Constraints

- Recommend 2009 sessions before any history of who registered for any 2009 sessions
- Use no session ratings data from attendees
- Recommend sessions by relative preference
- Recommend exhibitors and demos for attendees
- Identify top N related sessions to a given session
- Use an automated data mining-based solution

Approach

Deduction
- Query refinement
- Users specify what they want to retrieve

Induction
- Model-based recommendation engine
- Recommend sessions most relevant to attendee profile
- Improve likelihood of finding sessions of interest

…enhance Schedule Builder tool with Oracle Data Mining-generated session recommendations

Enrollment Application – Schedule Builder

Oracle Data Mining

Automatically sifts through data to find hidden patterns, discover new insights, and make predictions

Wide range of capabilities
- Predict customer behavior (Classification)
- Predict or estimate a value (Regression)
- Group similar documents (Clustering and Text Mining)
- Identify factors that determine an outcome (Attribute Importance)
- Find profiles of targeted people or items (Decision Trees)
- Determine important relationships and “market baskets” (Associations)
- Extract higher-level text features (Feature Extraction)
- Find fraud or “rare events” (Anomaly Detection)
- …and others

Oracle Data Miner user interface supporting guided analytics

Approach – 30,000 ft.

- 2008 data (sessions, attendees, attendance) → model build
- 2009 data (sessions, attendees) → model apply
- New attendee registers and completes survey
- Ranked session recommendations for each attendee
- Attendee logs into Schedule Builder
- Ranked sessions retrieved
- 2009 session recommendations filtered by user criteria

Success Metrics

Conversion rate
- % attendees who used at least 1 recommendation
- Enrollment vs. actual attendance

Test Metrics
- Enrichment curve
- Global measure of merit

Conference Session Recommendation Problem

Sessions are single use
- No two are exactly alike conference to conference
- Sessions have no history and no future
- Don't know who will attend a given session until after the session
- No rating information available, attendance only

Infer preferences using higher level projections
- Session themes
- Attendee profiles

Conference Data OOW '08

Sessions (1850+): title, abstract, track(s)

Attendees (41700+): survey questions, position, product usage

Attendance (206700+): who attended which sessions

Attendee Interests from OOW '08 registration survey

Applications Fusion Agile BEA EBS Hyperion Primavera PeopleSoft Siebel JD Edwards On Demand App Integration Architecture Development and Management Strategy Product Area Customer Relationship Management Governance, Risk, and Compliance Master Data Management Fulfillment (order management / logistics) Supply Chain Management / Planning Human Capital Management Procurement Project Management Business Intelligence Product Lifecycle Management Asset Lifecycle Management Enterprise Performance Management Financial Management

Technology Business Intelligence Security SOA, BPM, Web Services, App Server Content Management, Collaboration, Web 2.0 Predictive Analytics, Data Mining Database Enterprise Management Identity Management Warehousing Performance / Scalability, GRID / RAC High Availability Middleware Development .Net Database Java Fusion Development Service-Oriented Architecture Tools Development and Management Oracle Services Oracle Consulting Oracle Support Oracle University Oracle Linux Support Oracle Advanced Customer Services Oracle On Demand

Industry Automotive Chemicals Communications Consumer Goods Education and Research Engineering, Construction and Real Estate Financial Services Healthcare High Tech Industrial Manufacturing Life Sciences Media and Entertainment Natural Resources Oil and Gas Professional Services Public Sector Retail Travel and Transportation

…and others

Data Preparation

Sessions
- Concatenate relevant columns to facilitate text mining

Attendance
- Remove duplicates

Attendees
- Synonyms in attribute values, e.g., state = OH and Ohio
- Incomplete data, e.g., region = null
- Multi-valued attributes requiring parsing, e.g., member of user groups separated by ';' or '/'

Map data columns between 2008 and 2009
- e.g., Advanced Customer Services split between Apps and Tech

Free-form columns, e.g., job title = Vice President, V.P., VP
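The attendee clean-up steps listed above can be sketched in a few lines of Python. This is only an illustration: the synonym table, the `UNKNOWN` placeholder, and the function names are assumptions, since the slides give examples of the problems but not the actual rules that were used.

```python
import re

# Hypothetical synonym table; the slide only gives "OH" vs. "Ohio" as an example.
STATE_SYNONYMS = {"ohio": "OH", "oh": "OH"}


def normalize_state(value):
    """Map synonymous spellings to one canonical value; None stays None."""
    if value is None:
        return None
    cleaned = value.strip()
    return STATE_SYNONYMS.get(cleaned.lower(), cleaned)


def fill_missing(value, default="UNKNOWN"):
    """Replace a null with an explicit placeholder category (an assumption)."""
    return default if value is None else value


def split_multi_valued(value):
    """Split a multi-valued field on ';' or '/' and drop empty pieces."""
    if not value:
        return []
    return [part.strip() for part in re.split(r"[;/]", value) if part.strip()]


print(normalize_state("Ohio"))
print(fill_missing(None))
print(split_multi_valued("IOUG; OAUG/ODTUG"))
```

Each multi-valued piece can then become its own indicator column, in the same way the `UG_MEM_*` attributes appear later in the deck.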

Free Form Fields: Job Title Example

create table ATTENDEE09_PREP as …
  case when a.job_title like '%Manager%' then 1 else 0 end job_title_manager,
  -- 'V.P.' counts toward both the president and the vice flags
  case when a.job_title like '%President%' or a.job_title like '%V.P.%' then 1 else 0 end job_title_president,
  case when a.job_title like '%Vice%' or a.job_title like '%V.P.%' then 1 else 0 end job_title_vice,
  …
from ATTENDEE09 a
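The same flagging idea can be sketched outside the database. The version below is an assumption-laden illustration, not the deployed logic: it matches case-insensitively (SQL `like` does not) and also catches the bare "VP" spelling mentioned in the data preparation notes. The pattern table and function name are hypothetical.

```python
import re

# Illustrative patterns. Only Manager, President, Vice and V.P. appear in the
# SQL example; the bare "VP" spelling is added here as an assumption.
JOB_TITLE_PATTERNS = {
    "job_title_manager": re.compile(r"manager", re.IGNORECASE),
    "job_title_president": re.compile(r"president|\bv\.?p\b", re.IGNORECASE),
    "job_title_vice": re.compile(r"vice|\bv\.?p\b", re.IGNORECASE),
}


def job_title_flags(title):
    """Return a 0/1 indicator per pattern for one free-form job title."""
    text = title or ""
    return {name: int(bool(pattern.search(text)))
            for name, pattern in JOB_TITLE_PATTERNS.items()}


for title in ["Vice President", "V.P., Engineering", "VP", "Senior Manager"]:
    print(title, job_title_flags(title))
```

The word boundary in the pattern keeps titles such as "MVP" from being flagged as vice president.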

Methodology

- Cluster sessions: 2008 Sessions → 2008 Session Clusters (themes)
- Build classification model to predict clusters for attendees, then score attendees for each cluster (2008 Attendees)
- New 2009 Sessions → New 2009 Sessions Cluster Scores Vectors
- New 2009 Attendees → New 2009 Attendee Cluster Scores Vector
- Vector multiply each attendee's cluster scores against each session's cluster scores for total order ranking of recommendations
- Result: Ranked Session Rec's (example scores: .86, .73, .66)

Model Building and Scoring Details

Cluster sessions
- Concatenate all session-related text
- Text Mining data preparation: create text index
- Lexer with stemming
- Custom “stopword” list

Session S291749

Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management

Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0

Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.

1. Perform stemming (example): Integrating → integrate, Accounts → account, utilizing → utilize, developed → develop, invoices → invoice

2. Remove stopwords (crossed out on the slide)

Creating a Text Index, Stoplist, Lexer Using Oracle Text

begin
  ctx_ddl.create_preference('oow_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('oow_lexer', 'index_stems', 'ENGLISH');
  ctx_ddl.set_attribute('oow_lexer', 'index_text', 'true');
  ctx_ddl.create_stoplist('oow_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('oow_stoplist', 'your');
  /*…*/
  ctx_ddl.add_stopword('oow_stoplist', 'oracle');
end;
/

-- The lexer and stoplist must exist before the index that references them
CREATE INDEX session09_txt_idx ON session09_txt (session_txt)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER OOW_LEXER STOPLIST OOW_STOPLIST');
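To make the effect of the lexer and stoplist concrete, here is a rough Python approximation applied to the session title. It is not how Oracle Text works internally: the stem table is a hypothetical lookup built from the pairs visible in the stemming example, and apart from 'your' and 'oracle' the stopwords are assumed stand-ins for a default English stoplist.

```python
import re

# Hypothetical stem lookup taken from the stemming example; a real lexer
# derives stems from linguistic rules rather than from a fixed table.
STEMS = {
    "integrating": "integrate",
    "accounts": "account",
    "utilizing": "utilize",
    "developed": "develop",
    "invoices": "invoice",
}

# 'your' and 'oracle' come from the stoplist code above; the remaining words
# are assumed stand-ins for the default English stopwords.
STOPWORDS = {"your", "oracle", "with", "and", "the", "of", "to", "in", "by", "a"}


def index_terms(text):
    """Lower-case and tokenize, then stem, then drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    stemmed = [STEMS.get(word, word) for word in words]
    return [word for word in stemmed if word not in STOPWORDS]


title = "Integrating Oracle Accounts Payable with Oracle Imaging and Process Management"
print(index_terms(title))
# ['integrate', 'account', 'payable', 'imaging', 'process', 'management']
```

Adding 'oracle' to the stoplist matters for this corpus because nearly every session mentions it, so it carries almost no information about a session's theme.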

Session Term Scores Example

Term            Score
Integrate       .23
Account         .04
Payable         .26
Imaging         .62
Process         .09
Management      .05
Technology      .17
Content         .08
Collaboration   .43

TF-IDF (term frequency – inverse document frequency)

- Statistical measure that evaluates the importance of a given word to a document in a corpus
- Word importance increases in proportion to the number of times the word appears in the document, but is offset by the frequency of the word in the corpus

TF-IDF Example: One way to compute

Consider a session, S1, whose title and abstract contain 100 words
- The word 'mining' appears 6 times in S1
- Term frequency (TF) for 'mining' in S1 is 6/100, or 0.06
- Of 1850 sessions, say 25 contain the word 'mining'
- Inverse document frequency (IDF) is calculated as ln(1850 / 25) = 4.3
- TF-IDF score for 'mining' in S1 is 0.06 * 4.3, or 0.26
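The arithmetic in this example can be checked with a short Python function. It implements only the "one way to compute" variant described here (raw term frequency times the natural log of the inverse document frequency); the function and parameter names are illustrative, and other TF-IDF weightings exist.

```python
import math


def tf_idf(term_count, doc_length, num_docs, docs_with_term):
    """TF-IDF as in the example: (count / length) * ln(N / document frequency)."""
    tf = term_count / doc_length
    idf = math.log(num_docs / docs_with_term)
    return tf * idf


# 'mining' appears 6 times in a 100-word session; 25 of 1850 sessions contain it.
score = tf_idf(term_count=6, doc_length=100, num_docs=1850, docs_with_term=25)
print(round(math.log(1850 / 25), 1))  # 4.3
print(round(score, 2))                # 0.26
```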

Session Term Scores Example

Specify the maximum number of terms used to represent the entire corpus and the maximum number of terms used to represent each document
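A per-document term limit can be illustrated by keeping only the highest-scoring terms of a session. The sketch below uses the term scores from the session example; how Oracle Data Mining actually enforces its limits is not described in the slides, so selection by score is an assumption, as is the function name.

```python
import heapq


def top_terms(term_scores, max_terms):
    """Keep the max_terms highest-scoring terms of one document."""
    best = heapq.nlargest(max_terms, term_scores.items(), key=lambda item: item[1])
    return dict(best)


# Term scores from the session example.
session_scores = {
    "integrate": 0.23, "account": 0.04, "payable": 0.26,
    "imaging": 0.62, "process": 0.09, "management": 0.05,
    "technology": 0.17, "content": 0.08, "collaboration": 0.43,
}

print(top_terms(session_scores, 3))
# {'imaging': 0.62, 'collaboration': 0.43, 'payable': 0.26}
```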

Model Building and Scoring Details

Cluster sessions
- Concatenate all session-related text
- Text Mining data prep: create text index
- Lexer with stemming
- Custom stop word list
- 1000 max terms in corpus
- 30 max terms per document
- Build k-Means model with 20 clusters (themes)
- Score 2008 and 2009 sessions to identify theme probabilities

Clustering Results for 2008 Sessions

ClusterID  Count  Theme (Cluster Name)
18         103    INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE
19          94    DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION
20          82    CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT
23          53    PLM-AGILE-PRODUCT-CONTACT-CENTER
24         127    SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES
25         148    INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING
26         112    DATABASE-11G-DATA-TECHNOLOGY-FEATURES
27          92    RAC-DATABASE-MANAGER-GRID-AVAILABILITY
28          66    ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS
29          77    CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE
30         125    CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP
31          62    HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING
32         121    SOA-BPM-SERVER-APPLICATION-FUSION
33          33    MEETING-SIG-IOUG-DATABASE-APPLICATION
34          95    EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1
35          52    JD-EDWARDS-ENTERPRISEONE-QUEST-OOW
36          76    TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION
37          80    SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY
38          80    12-SUITE-RELEASE-BUSINESS-PROCUREMENT
39          69    OAUG-SIG-SUITE-TRANSPORTATION-USERS

Model Building and Scoring Details

Classify attendee interests in themes
- Build Naïve Bayes model using 2008 attendees
- Predict 2009 attendee interest in each of the 20 themes (input: New 2009 Attendees)
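For readers unfamiliar with Naïve Bayes, the following is a minimal pure-Python sketch of the idea, not Oracle Data Mining's implementation. It treats interest in a single theme as a yes/no target, which is an assumption about how the models were set up (the per-theme probabilities shown for "Joe" later in the deck sum to more than 1, so they are not one multi-class distribution). The training rows, the Laplace smoothing constant `alpha`, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and (class, attribute, value) frequencies from training rows."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for attr, value in row.items():
            value_counts[(label, attr, value)] += 1
            values_seen[attr].add(value)
    return {"class_counts": dict(class_counts), "value_counts": dict(value_counts),
            "values_seen": dict(values_seen), "alpha": alpha, "n": len(rows)}


def predict_proba(model, row):
    """Return P(class | row) for every class, computed in log space."""
    log_scores = {}
    for label, count in model["class_counts"].items():
        log_p = math.log(count / model["n"])
        for attr, value in row.items():
            if attr not in model["values_seen"]:
                continue  # attribute never seen during training: ignore it
            k = len(model["values_seen"][attr])
            seen = model["value_counts"].get((label, attr, value), 0)
            # Laplace smoothing keeps unseen combinations from zeroing the product
            log_p += math.log((seen + model["alpha"]) / (count + model["alpha"] * k))
        log_scores[label] = log_p
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical 2008 attendees; label 1 = attended a session in the theme.
rows = [
    {"JOB_TITLE_DBA": 1, "DEV_EN_VI": 1},
    {"JOB_TITLE_DBA": 1, "DEV_EN_VI": 0},
    {"JOB_TITLE_DBA": 0, "DEV_EN_VI": 0},
    {"JOB_TITLE_DBA": 0, "DEV_EN_VI": 1},
]
labels = [1, 1, 0, 0]

model = train_naive_bayes(rows, labels)
new_attendee = {"JOB_TITLE_DBA": 1, "DEV_EN_VI": 1}
print(predict_proba(model, new_attendee))
```

Repeating this for each of the 20 themes would give one interest score per theme for every new attendee.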

Attendee Attributes

ATTEND_ID COMPANY_REVENUE
DB_REL_ODB_10G DB_REL_ODB_8I DB_REL_ODB_9I
DEV_EN_11G_PREVIEW DEV_EN_BORLAND_JBUILDER DEV_EN_ECLIPSE DEV_EN_MS_DOT_NET DEV_EN_MS_VISUAL_STUDIO DEV_EN_ORA_APPS_EXPRES DEV_EN_ORA_FORMS DEV_EN_ORA_JDEV_10G DEV_EN_ORA_SQL_DEV DEV_EN_OTHER DEV_EN_OTHER_JAVA_IDE DEV_EN_SQL_EDITORS DEV_EN_TEXT_EDITOR DEV_EN_TOAD DEV_EN_VI
GEOGRAPHIC_REGION INDUSTRY ORACLE_PARTNER
ORA_EBS ORA_JDE ORA_PS ORA_SIEBEL
PROFIT_MAGAZINE_SUBSCRIPTION
UG_MEM_APOUC UG_MEM_EOUC UG_MEM_HEUG UG_MEM_IOUG UG_MEM_OAUG UG_MEM_ODTUG UG_MEM_OHUG UG_MEM_QIUG
UG_INFO_APOUC UG_INFO_EOUC UG_INFO_HEUG UG_INFO_IOUG UG_INFO_OAUG UG_INFO_ODTUG UG_INFO_OHUG UG_INFO_QIUG UG_INFO_DO_NOT_SEND_ORA_INFO
JOB_TITLE_MANAGER JOB_TITLE_PARTNER JOB_TITLE_PROJECT_LEAD JOB_TITLE_MARKETING JOB_TITLE_PRESIDENT JOB_TITLE_VICE JOB_TITLE_DIRECTOR JOB_TITLE_ARCHITECT JOB_TITLE_ANALYST JOB_TITLE_DBA JOB_TITLE_DEVELOPER JOB_TITLE_SALES JOB_TITLE_PROD_MGR JOB_TITLE_CHIEF_OFFICER JOB_TITLE_CONSULTANT JOB_TITLE_SENIOR JOB_TITLE_STUDENT

“Joe the DBA”

Attribute           Value
DB_REL_ODB_10G      1
DEV_EN_TEXT_EDITOR  1
DEV_EN_VI           1
GEOGRAPHIC_REGION   Americas
INDUSTRY            Aerospace
ORACLE_PARTNER      Yes
JOB_TITLE_DBA       1
JOB_TITLE_SENIOR    1

Predict themes (clusters) for “Joe”

ClusterID  Probability  Theme (Cluster Name)
18         0.0005       INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE
19         0.3997       DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION
20         0.0002       CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT
23         0.0005       PLM-AGILE-PRODUCT-CONTACT-CENTER
24         0.0005       SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES
25         0.2190       INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING
26         0.4245       DATABASE-11G-DATA-TECHNOLOGY-FEATURES
27         0.3010       RAC-DATABASE-MANAGER-GRID-AVAILABILITY
28         0.0502       ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS
29         0.0009       CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE
30         0.0098       CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP
31         0.0031       HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING
32         0.0000       SOA-BPM-SERVER-APPLICATION-FUSION
33         0.0038       MEETING-SIG-IOUG-DATABASE-APPLICATION
34         0.0031       EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1
35         0.0260       JD-EDWARDS-ENTERPRISEONE-QUEST-OOW
36         0.0188       TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION
37         0.0278       SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY
38         0.0075       12-SUITE-RELEASE-BUSINESS-PROCUREMENT
39         0.0994       OAUG-SIG-SUITE-TRANSPORTATION-USERS

How Does This Session Rank for Joe?

Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management

Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0

Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.

Cluster Probabilities for Session S291749

ClusterID  Probability  Theme (Cluster Name)
18         0.0023       INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE
19         0.0021       DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION
20         0.9534       CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT
23         0.0020       PLM-AGILE-PRODUCT-CONTACT-CENTER
24         0.0020       SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES
25         0.0027       INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING
26         0.0018       DATABASE-11G-DATA-TECHNOLOGY-FEATURES
27         0.0032       RAC-DATABASE-MANAGER-GRID-AVAILABILITY
28         0.0018       ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS
29         0.0022       CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE
30         0.0026       CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP
31         0.0049       HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING
32         0.0037       SOA-BPM-SERVER-APPLICATION-FUSION
33         0.0015       MEETING-SIG-IOUG-DATABASE-APPLICATION
34         0.0016       EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1
35         0.0016       JD-EDWARDS-ENTERPRISEONE-QUEST-OOW
36         0.0027       TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION
37         0.0022       SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY
38         0.0037       12-SUITE-RELEASE-BUSINESS-PROCUREMENT
39         0.0019       OAUG-SIG-SUITE-TRANSPORTATION-USERS

Computing this Session's Score Specifically for Joe…

ClusterID  Joe's Cluster Probability  x  Session S291749 Cluster Probability  =  Product
18         0.0005                        0.0023                                  0.000001
19         0.3997                        0.0021                                  0.000848
20         0.0002                        0.9534                                  0.000216
23         0.0005                        0.0020                                  0.000001
24         0.0005                        0.0020                                  0.000001
25         0.2190                        0.0027                                  0.000587
26         0.4245                        0.0018                                  0.000780
27         0.3010                        0.0032                                  0.000960
28         0.0502                        0.0018                                  0.000088
29         0.0009                        0.0022                                  0.000002
30         0.0098                        0.0026                                  0.000025
31         0.0031                        0.0049                                  0.000015
32         0.0000                        0.0037                                  0.000000
33         0.0038                        0.0015                                  0.000006
34         0.0031                        0.0016                                  0.000005
35         0.0260                        0.0016                                  0.000041
36         0.0188                        0.0027                                  0.000051
37         0.0278                        0.0022                                  0.000062
38         0.0075                        0.0037                                  0.000028
39         0.0994                        0.0019                                  0.000191

SCORE: 0.003908
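The score is a dot product of the two cluster-probability vectors. The Python sketch below recomputes it from the four-decimal probabilities shown for Joe and for session S291749. The result lands close to, but not exactly on, the slide's 0.003908, presumably because the slide multiplied unrounded probabilities; that explanation is an assumption. The function name is illustrative.

```python
# Cluster probabilities in ClusterID order (18, 19, 20, 23, ..., 39),
# copied from the tables for Joe and for session S291749.
joe = [0.0005, 0.3997, 0.0002, 0.0005, 0.0005, 0.2190, 0.4245, 0.3010, 0.0502, 0.0009,
       0.0098, 0.0031, 0.0000, 0.0038, 0.0031, 0.0260, 0.0188, 0.0278, 0.0075, 0.0994]
session = [0.0023, 0.0021, 0.9534, 0.0020, 0.0020, 0.0027, 0.0018, 0.0032, 0.0018, 0.0022,
           0.0026, 0.0049, 0.0037, 0.0015, 0.0016, 0.0016, 0.0027, 0.0022, 0.0037, 0.0019]


def recommendation_score(attendee_probs, session_probs):
    """Sum of element-wise products of two equal-length probability vectors."""
    if len(attendee_probs) != len(session_probs):
        raise ValueError("vectors must cover the same clusters")
    return sum(a * s for a, s in zip(attendee_probs, session_probs))


score = recommendation_score(joe, session)
print(round(score, 4))  # 0.0039
```

The low score reflects the mismatch between the session's dominant theme (cluster 20) and Joe's near-zero probability for that theme.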

Recommendation Score Query

select attend_id, session_id, score
from (
  select a.attend_id, s.session_id,
         sum(a.probability * s.probability) score
  from SESSION_TXT09_SCORES_T20 s,
       ATTENDEE09_SCORES_T20 a
  where a.prediction = s.cluster_id
  group by a.attend_id, s.session_id
)
order by attend_id, score desc
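For readers who prefer to follow the logic outside SQL, the query's join, sum, and ordering can be mirrored in Python. This is a sketch with made-up rows: the tuple layouts and the function name are assumptions, and the toy probabilities are not from the conference data.

```python
from collections import defaultdict


def rank_sessions(attendee_scores, session_scores):
    """Join on cluster, sum probability products per (attendee, session), then rank.

    attendee_scores: iterable of (attend_id, cluster_id, probability)
    session_scores:  iterable of (session_id, cluster_id, probability)
    """
    sessions_by_cluster = defaultdict(list)
    for session_id, cluster_id, p_session in session_scores:
        sessions_by_cluster[cluster_id].append((session_id, p_session))

    totals = defaultdict(float)
    for attend_id, cluster_id, p_attendee in attendee_scores:
        for session_id, p_session in sessions_by_cluster.get(cluster_id, []):
            totals[(attend_id, session_id)] += p_attendee * p_session

    rows = [(attend_id, session_id, score)
            for (attend_id, session_id), score in totals.items()]
    # order by attend_id, score desc
    return sorted(rows, key=lambda row: (row[0], -row[2]))


# Hypothetical scores for one attendee and two sessions over two clusters.
attendee_scores = [("A1", 19, 0.4), ("A1", 20, 0.1)]
session_scores = [("S1", 19, 0.1), ("S1", 20, 0.9),
                  ("S2", 19, 0.8), ("S2", 20, 0.2)]

for row in rank_sessions(attendee_scores, session_scores):
    print(row)
```

Here S2 ranks above S1 for attendee A1 because S2's weight sits in the cluster A1 cares about most.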

Evaluating Recommendations: Producing Training (Build) and Test Datasets

'08 Session Data and '08 Attendee Data are each split into Build and Test sets, giving four combinations:
- Build sessions, Build attendees: build the models using these datasets
- Build sessions, Test attendees: typical space for recommendations (recommend same sessions to new attendees)
- Test sessions, Build attendees: cross-sell / up-sell space (recommend new sessions to same attendees)
- Test sessions, Test attendees: projection mining space (recommend new sessions to new attendees); test the models using these datasets

Evaluating Results: Session Recommendation Curve

Model scores as a function of rank
- Dot == scored session
- Threshold separating high from low confidence recommendations
- Linear behavior of recommendations
- Markers represent the location of “hits” (attendee attended session)

Enrichment Curve

Recommendation Enrichment Score
- Running calculation where enrichment is maximum deviation from 0
- Point of maximum enrichment
- Markers represent the location of “hits”

Example curves (axes: model score vs. model-ranked sessions)

Attendee   NE    Lift  ROC
W1152645   2.88  3.07  0.79
W1144260   1.63  2.47  0.71
W1134872   1.07  1.55  0.51
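The slides do not give formulas for these metrics, so the following Python sketch is an assumption about one reasonable reading: a running sum that steps up at each hit and down at each miss as it walks the ranked list, with enrichment taken as the largest deviation from 0, plus the area under the ROC curve computed from the same ranking. The normalization behind NE and the cutoff behind Lift are not stated in the deck and are not reproduced here. The hit list is made up.

```python
def enrichment_curve(hits):
    """Running sum over a ranked list: +1/n_hits at a hit, -1/n_misses at a miss."""
    n_hits = sum(hits)
    n_misses = len(hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("need at least one hit and one miss")
    running, curve = 0.0, []
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        curve.append(running)
    return curve


def max_deviation(curve):
    """Value of the running sum where it is farthest from 0."""
    return max(curve, key=abs)


def roc_auc(hits):
    """Fraction of (hit, miss) pairs in which the hit is ranked above the miss."""
    n_hits = sum(hits)
    n_misses = len(hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("need at least one hit and one miss")
    misses_below, ordered_pairs = n_misses, 0
    for is_hit in hits:
        if is_hit:
            ordered_pairs += misses_below
        else:
            misses_below -= 1
    return ordered_pairs / (n_hits * n_misses)


# Hypothetical ranking for one attendee, best-scored session first;
# True marks a session the attendee actually attended.
hits = [True, True, False, True, False, False, False, False]
curve = enrichment_curve(hits)
print(round(max_deviation(curve), 3), round(roc_auc(hits), 3))
```

A ranking that places hits near the top pushes the running sum up early, which is what the point of maximum enrichment on the curve captures.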

Global Measure of Merit

Random recommendations obtain an enrichment score of 1

Plot: P(NE) against NE (Normalized Enrichment) for the PM model and the random model

Recommending Exhibitors and Demos

- Use clustering model from session data
- Score exhibitors and demo text against 20 themes
- Use existing attendee theme scores to compute recommendation scores for each exhibitor and demo

Inputs shown: New 2009 Attendees; 2009 Exhibitors and Demos

Computing Related Sessions

Data preparation
- Focus on tracks, tags, categories
- Tokenize targeted terms from title and abstract fields, e.g., “Oracle Data Mining” → “OracleDataMining”

Cluster sessions into 200 clusters using K-Means

Multiply cluster score vectors for similarity score
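The phrase tokenization step can be illustrated with a small Python helper that collapses known multi-word terms into single tokens before indexing. The phrase list is hypothetical (only "Oracle Data Mining" appears on the slide), and how the targeted terms were actually chosen is not described.

```python
import re

# Hypothetical phrase list; the slide gives only "Oracle Data Mining".
PHRASES = ["Oracle Data Mining", "Business Intelligence"]


def join_phrases(text, phrases=PHRASES):
    """Replace each known phrase with a single token that has no spaces."""
    # Longest phrases first, so a longer phrase wins over one it contains.
    for phrase in sorted(phrases, key=len, reverse=True):
        joined = phrase.replace(" ", "")
        pattern = r"\b" + r"\s+".join(re.escape(word) for word in phrase.split()) + r"\b"
        text = re.sub(pattern, lambda match, joined=joined: joined, text,
                      flags=re.IGNORECASE)
    return text


print(join_phrases("Intro to Oracle Data Mining and Business Intelligence"))
# Intro to OracleDataMining and BusinessIntelligence
```

Joining the words keeps a product name from being split into generic tokens such as "data" that many unrelated sessions share.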

Computing Related Sessions

- Cluster sessions: 2009 Sessions → 2009 Themes (200 clusters)
- Score each session against each theme (cluster)
- 2009 Session Cluster Scores Vector x Other 2009 Sessions Cluster Scores Vectors = similarity scores (e.g., .95, .81, .67)
- Vector multiply each session's cluster scores against all other sessions' cluster scores for total order ranking of related sessions
- Result: Ranked Related Sessions
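A top-N related-sessions lookup along these lines might look like the Python sketch below. It stores each session's cluster scores as a sparse dictionary, which is a convenience assumption for 200 clusters, and the session IDs and scores are invented for illustration.

```python
import heapq


def related_sessions(session_id, cluster_scores, top_n=3):
    """Rank all other sessions by the dot product of their cluster score vectors.

    cluster_scores: dict of session_id -> {cluster_id: probability}
    """
    target = cluster_scores[session_id]

    def similarity(other):
        return sum(p * other.get(cluster, 0.0) for cluster, p in target.items())

    candidates = ((similarity(vector), other_id)
                  for other_id, vector in cluster_scores.items()
                  if other_id != session_id)
    return [(other_id, score) for score, other_id in heapq.nlargest(top_n, candidates)]


# Hypothetical cluster scores for four sessions.
cluster_scores = {
    "S1": {1: 0.9, 2: 0.1},
    "S2": {1: 0.8, 3: 0.2},
    "S3": {2: 0.7, 3: 0.3},
    "S4": {1: 0.5, 2: 0.5},
}

print(related_sessions("S1", cluster_scores, top_n=2))
```

With these toy values S2 and S4 come back as the sessions most related to S1, since both share its dominant cluster.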

OOW '08 Recommendation Engine Results

- Distinct Schedule Builder visitors: 15667
- Distinct visitors signup: 3266
- Distinct visitors attended: 1775
- Signup conversion rate: 20.8% (3266 / 15667)
- Attended conversion rate: 11.3% (1775 / 15667)

Conversion rate: percentage of attendees who used at least 1 recommendation

Conversion Rates in other Domains

OOW Signup Sessions: 20.8
OOW Attended Sessions: 11.3
Circa 2004

OOW '08 Recommendation Engine Results Detail

Recommendations Signup
- 1768 attendees (11.3%) selected exactly 1
- 820 (5.2%) selected 2 recommendations
- 678 attendees (4.3%) selected 3 or more
- 32 attendees selected between 8 and 10

Actually Attended
- 1246 attendees (8%) attended exactly 1
- 382 (2.4%) attended 2 recommended sessions
- 147 attendees (0.9%) attended 3 or more
- 23 attendees attended between 5 and 9

Chart: Recommendations: Selected vs. Attended (Selected Count and Attended Count for Exactly 1, Exactly 2, More than 3)

Summary

- Oracle Data Mining provides a robust platform for Text Mining and building a Recommendation Engine
- Oracle Data Mining with Oracle Data Miner code generation facilitated deployment of mining solution
- Recommendation evaluation techniques show the models were able to predict sessions of interest
- OOW conversion rates show that session recommendations were perceived as useful to attendees

For More Information

search.oracle.com Oracle Data Mining

or

oracle.com

www.oracle.com/technology/products/bi/odm/index.html

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
