Oracle Data Mining for Text, Clustering, and Classification: Case Study of a Recommendation Engine
Mark Hornick, Senior Manager, Development
Pablo Tamayo, Consulting MTS
Data Mining Technologies Group
Copyright © 2009 Oracle Corporation
Introduction
Recommendation Engine at Oracle OpenWorld Conference, 2008 and 2009
Recommend conference sessions to attendees
Enhance the session enrollment application
Use Oracle Data Mining and the Oracle Data Miner UI: K-means, Naïve Bayes, Text Mining, Code Generation
Agenda
Recommendation engine scenario
  Overview
  Technical problem and data
  Methodology for OOW '08 and '09
  Evaluating recommendation quality
New features for OOW '09
Demonstration
OOW '08 results and summary
High-Level Objectives
Help attendees find relevant sessions
Maximize the individual OOW experience / value
Increase session attendance
Technical Objectives and Constraints
Recommend 2009 sessions before there is any history of who registered for 2009 sessions
Use no session ratings data from attendees
Recommend sessions by relative preference
Recommend exhibitors and demos for attendees
Identify the top N sessions related to a given session
Use an automated, data mining-based solution
Approach
Deduction: query refinement; users specify what they want to retrieve
Induction: a model-based recommendation engine
  Recommend the sessions most relevant to the attendee profile
  Improve the likelihood of finding sessions of interest
…enhance the Schedule Builder tool with Oracle Data Mining-generated session recommendations
Enrollment Application – Schedule Builder
Oracle Data Mining
Automatically sifts through data to find hidden patterns, discover new insights, and make predictions
Wide range of capabilities:
  Predict customer behavior (Classification)
  Predict or estimate a value (Regression)
  Group similar documents (Clustering and Text Mining)
  Identify factors that determine an outcome (Attribute Importance)
  Find profiles of targeted people or items (Decision Trees)
  Determine important relationships and “market baskets” (Associations)
  Extract higher-level text features (Feature Extraction)
  Find fraud or “rare events” (Anomaly Detection)
  …and others
Oracle Data Miner user interface supporting guided analytics
Approach – 30,000 ft.
Build: 2008 data (sessions, attendees, attendance) is used to build the models
Apply: the models are applied to 2009 data (sessions, attendees) as each new attendee registers and completes the survey
Result: ranked session recommendations for each attendee
Approach – 30,000 ft. (continued)
Attendee logs into Schedule Builder
2009 Session recommendations filtered by user criteria
Ranked Sessions retrieved
Ranked Session Recommendations for Attendees
Success Metrics
  Conversion rate: % of attendees who used at least 1 recommendation
  Enrollment vs. actual attendance
Test Metrics
  Enrichment curve
  Global measure of merit
Conference Session Recommendation Problem
Sessions are single-use: no two are exactly alike from conference to conference
Sessions have no history and no future: we don't know who will attend a given session until after the session
No rating information is available, only attendance
Infer preferences using higher-level projections:
  Session themes
  Attendee profiles
Conference Data, OOW '08
Sessions (1850+): title, abstract, track(s)
Attendees (41700+): survey questions, position, product usage
Attendance (206700+): who attended which sessions
Attendee Interests from the OOW '08 Registration Survey
Applications: Fusion; Agile; BEA; EBS; Hyperion; Primavera; PeopleSoft; Siebel; JD Edwards; On Demand; App Integration; Architecture; Development and Management; Strategy
Product Area: Customer Relationship Management; Governance, Risk, and Compliance; Master Data Management; Fulfillment (order management / logistics); Supply Chain Management / Planning; Human Capital Management; Procurement; Project Management; Business Intelligence; Product Lifecycle Management; Asset Lifecycle Management; Enterprise Performance Management; Financial Management
Technology: Business Intelligence; Security; SOA, BPM, Web Services, App Server; Content Management, Collaboration, Web 2.0; Predictive Analytics, Data Mining; Database; Enterprise Management; Identity Management; Warehousing; Performance / Scalability, GRID / RAC; High Availability; Middleware
Development: .Net; Database; Java; Fusion Development; Service-Oriented Architecture; Tools; Development and Management
Oracle Services: Oracle Consulting; Oracle Support; Oracle University; Oracle Linux Support; Oracle Advanced Customer Services; Oracle On Demand
Industry: Automotive; Chemicals; Communications; Consumer Goods; Education and Research; Engineering, Construction and Real Estate; Financial Services; Healthcare; High Tech; Industrial Manufacturing; Life Sciences; Media and Entertainment; Natural Resources; Oil and Gas; Professional Services; Public Sector; Retail; Travel and Transportation
…and others
Data Preparation
Sessions: concatenate relevant columns to facilitate text mining
Attendance: remove duplicates
Attendees:
  Synonyms in attribute values, e.g., state = OH and Ohio
  Incomplete data, e.g., region = null
  Multi-valued attributes requiring parsing, e.g., user group memberships separated by ';' or '/'
  Mapping data columns between 2008 and 2009, e.g., Advanced Customer Services split between Apps and Tech
  Free-form columns, e.g., job title = Vice President, V.P., VP
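Two of the attendee clean-up steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the synonym map is hypothetical (only OH / Ohio comes from the slide), and missing values are deliberately left missing rather than guessed.

```python
import re

# Hypothetical synonym map; the slide only gives "OH" vs. "Ohio" as an example.
STATE_SYNONYMS = {"ohio": "OH"}

def normalize_state(value):
    """Map synonymous spellings of a state to one code; missing values stay missing."""
    if value is None:
        return None  # incomplete data is kept as null, not imputed
    v = value.strip()
    return STATE_SYNONYMS.get(v.lower(), v.upper())

def split_multi_valued(value):
    """Split a multi-valued field, such as user group membership, on ';' or '/'."""
    if not value:
        return []
    return [part.strip() for part in re.split(r"[;/]", value) if part.strip()]

print(normalize_state("Ohio"), split_multi_valued("IOUG; OAUG/ODTUG"))
```

Splitting on '/' assumes no legitimate value contains a slash; a real cleanup would check that against the data first.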
Free-Form Fields: Job Title Example
create table ATTENDEE09_PREP as
select …,
  case when a.job_title like '%Manager%' then 1 else 0 end job_title_manager,
  -- "V.P." abbreviates "Vice President", so it sets both the president and the vice flags
  case when a.job_title like '%President%' or a.job_title like '%V.P.%' then 1 else 0 end job_title_president,
  case when a.job_title like '%Vice%' or a.job_title like '%V.P.%' then 1 else 0 end job_title_vice,
  …
from ATTENDEE09 a;
Methodology
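Build a classification model to predict clusters (themes) for attendees, then score attendees for each cluster:
1. Cluster the 2008 sessions into session clusters (themes).
2. Using the 2008 attendees, build a classification model that predicts attendee interest in each theme.
3. Score the new 2009 sessions against the themes to get a cluster-score vector for each session.
4. Score the new 2009 attendees with the classification model to get a cluster-score vector for each attendee.
5. Vector-multiply (dot product) each attendee's cluster scores against each session's cluster scores for a total-order ranking of recommended sessions.
[Figure: 2008 sessions and attendees → 2008 session clusters (themes) → cluster-score vectors for new 2009 sessions and attendees → ranked session recommendations]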
Model Building and Scoring Details
Cluster sessions:
  Concatenate all session-related text
  Text mining data preparation: create a text index
  Lexer with stemming
  Custom “stopword” list
Session S291749
Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management
Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0
Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.
1. Perform stemming (example): Integrating → integrate, Accounts → account, utilizing → utilize, developed → develop, invoices → invoice, processing → process
2. Remove stopwords (marked X on the slide), e.g., words on the custom stoplist such as “your” and “oracle”
Creating a Text Index, Stoplist, and Lexer Using Oracle Text
-- Create the lexer and stoplist preferences first; the index references them by name.
begin
  ctx_ddl.create_preference('oow_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('oow_lexer', 'index_stems', 'ENGLISH');  -- index English word stems
  ctx_ddl.set_attribute('oow_lexer', 'index_text', 'true');
  ctx_ddl.create_stoplist('oow_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('oow_stoplist', 'your');
  /*…*/
  ctx_ddl.add_stopword('oow_stoplist', 'oracle');
end;
/

CREATE INDEX session09_txt_idx ON session09_txt (session_txt)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER OOW_LEXER STOPLIST OOW_STOPLIST');
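For experimenting outside the database, the stopword step can be approximated in Python. This sketch lowercases the text, splits it into alphanumeric tokens, and drops words from a hypothetical stoplist: only “your” and “oracle” come from the commands above, and the other entries are common English words chosen for illustration. It does not reproduce Oracle Text's English stemming, so “Integrating” stays “integrating”.

```python
import re

# Hypothetical stoplist: 'your' and 'oracle' come from the slide; the rest are illustrative.
STOPWORDS = {"your", "oracle", "in", "this", "how", "to", "with", "and", "by",
             "the", "of", "a", "was", "see", "learn"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that appear in the stoplist, keeping the original order."""
    return [t for t in tokens if t not in stopwords]

title = "Integrating Oracle Accounts Payable with Oracle Imaging and Process Management"
print(remove_stopwords(tokenize(title)))
```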
Session Term Scores Example
Term            Score
Integrate       .23
Account         .04
Payable         .26
Imaging         .62
Process         .09
Management      .05
Technology      .17
Content         .08
Collaboration   .43
…
TF-IDF (term frequency–inverse document frequency) is a statistical measure that evaluates how important a word is to a document in a corpus. A word's importance increases proportionally to the number of times it appears in the document, but is offset by how frequently the word appears across the corpus.
TF-IDF Example: One Way to Compute It
Consider a session, S1, whose title and abstract contain 100 words:
  The word 'mining' appears 6 times in S1, so the term frequency (TF) for 'mining' in S1 is 6/100 = 0.06
  Of 1850 sessions, say 25 contain the word 'mining'; the inverse document frequency (IDF) is ln(1850 / 25) = 4.3
  The TF-IDF score for 'mining' in S1 is 0.06 x 4.3 ≈ 0.26
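The worked example can be checked with a few lines of Python. The function implements exactly the variant described on the slide (relative term frequency times the natural log of N / df); other TF-IDF variants (log-scaled TF, smoothed IDF) give different numbers, and Oracle Text's internal scoring is not claimed to be this formula.

```python
import math

def tf_idf(term_count, doc_length, n_docs, docs_with_term):
    """TF-IDF as in the example: (count / length) * ln(N / df)."""
    tf = term_count / doc_length
    idf = math.log(n_docs / docs_with_term)
    return tf * idf

# 'mining' appears 6 times in a 100-word session; 25 of 1850 sessions contain it.
score = tf_idf(6, 100, 1850, 25)
print(round(score, 2))  # 0.26
```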
Specify the maximum number of terms used to represent the entire corpus, and the maximum number used to represent each document
Model Building and Scoring Details
Cluster sessions:
  Concatenate all session-related text
  Text mining data prep: create a text index
  Lexer with stemming
  Custom stopword list
  1000 max terms in corpus
  30 max terms per document
  Build a k-Means model with 20 clusters (themes)
  Score 2008 and 2009 sessions to identify theme probabilities
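The two term caps (1000 terms for the corpus, 30 per document) can be illustrated with a small Python sketch. How Oracle Data Mining actually selects the corpus terms is not stated on the slides; this sketch assumes, purely for illustration, that corpus terms are ranked by their summed TF-IDF across documents, and that each document then keeps its highest-scoring surviving terms.

```python
from collections import defaultdict

def cap_terms(doc_scores, max_corpus_terms=1000, max_doc_terms=30):
    """Limit the vocabulary, then keep each document's highest-scoring terms.

    doc_scores: {doc_id: {term: tf_idf_score}}
    ASSUMPTION: corpus terms are ranked by summed TF-IDF; the real selection rule may differ.
    """
    totals = defaultdict(float)
    for scores in doc_scores.values():
        for term, s in scores.items():
            totals[term] += s
    vocab = set(sorted(totals, key=totals.get, reverse=True)[:max_corpus_terms])
    capped = {}
    for doc_id, scores in doc_scores.items():
        kept = [(t, s) for t, s in scores.items() if t in vocab]
        kept.sort(key=lambda ts: ts[1], reverse=True)  # best terms first
        capped[doc_id] = dict(kept[:max_doc_terms])
    return capped

docs = {"S1": {"imaging": 0.62, "collaboration": 0.43, "payable": 0.26, "account": 0.04},
        "S2": {"imaging": 0.1, "mining": 0.5}}
print(cap_terms(docs, max_corpus_terms=3, max_doc_terms=2))
```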
Clustering Results for 2008 Sessions
ClusterID  Count  Theme (Cluster Name)
18         103    INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE
19          94    DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION
20          82    CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT
23          53    PLM-AGILE-PRODUCT-CONTACT-CENTER
24         127    SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES
25         148    INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING
26         112    DATABASE-11G-DATA-TECHNOLOGY-FEATURES
27          92    RAC-DATABASE-MANAGER-GRID-AVAILABILITY
28          66    ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS
29          77    CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE
30         125    CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP
31          62    HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING
32         121    SOA-BPM-SERVER-APPLICATION-FUSION
33          33    MEETING-SIG-IOUG-DATABASE-APPLICATION
34          95    EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1
35          52    JD-EDWARDS-ENTERPRISEONE-QUEST-OOW
36          76    TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION
37          80    SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY
38          80    12-SUITE-RELEASE-BUSINESS-PROCUREMENT
39          69    OAUG-SIG-SUITE-TRANSPORTATION-USERS
Model Building and Scoring Details
Classify attendee interests in themes:
  Build a Naïve Bayes model using the 2008 attendees
  Predict each new 2009 attendee's interest in each of the 20 themes
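A minimal categorical Naïve Bayes sketch in Python shows the idea of predicting theme interest from survey attributes. Two assumptions are not from the slides: how 2008 attendees were labeled per theme (here, a hypothetical 1/0 “interested in this theme” label), and the one-model-per-theme framing, which is suggested only by the fact that the 20 probabilities shown for “Joe” below sum to more than 1. This is not Oracle Data Mining's implementation.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels, alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing for one theme.

    rows:   list of dicts {attribute: value}, all with the same attributes
    labels: list of 1 (interested in the theme) or 0 (not interested); hypothetical labels
    """
    class_counts = Counter(labels)
    counts = {c: defaultdict(Counter) for c in class_counts}  # counts[c][attr][value]
    values = defaultdict(set)                                   # distinct values per attribute
    for row, c in zip(rows, labels):
        for attr, val in row.items():
            counts[c][attr][val] += 1
            values[attr].add(val)
    return class_counts, counts, values, alpha

def prob_interested(model, row):
    """Posterior probability that the attendee described by `row` has label 1."""
    class_counts, counts, values, alpha = model
    total = sum(class_counts.values())
    log_post = {}
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)  # log prior
        for attr, val in row.items():
            if attr in values:  # ignore attributes never seen in training
                k = len(values[attr])
                lp += math.log((counts[c][attr][val] + alpha) / (n_c + alpha * k))
        log_post[c] = lp
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post[1] - m) / z if 1 in log_post else 0.0

rows = [{"JOB_TITLE_DBA": 1}, {"JOB_TITLE_DBA": 1}, {"JOB_TITLE_DBA": 0}, {"JOB_TITLE_DBA": 0}]
model = train_nb(rows, [1, 1, 0, 0])
print(prob_interested(model, {"JOB_TITLE_DBA": 1}))
```

Training one such model per theme and scoring each new attendee with all 20 gives a theme-probability vector per attendee.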
Attendee Attributes
ATTEND_ID, COMPANY_REVENUE, DB_REL_ODB_10G, DB_REL_ODB_8I, DB_REL_ODB_9I, DEV_EN_11G_PREVIEW, DEV_EN_BORLAND_JBUILDER, DEV_EN_ECLIPSE, DEV_EN_MS_DOT_NET, DEV_EN_MS_VISUAL_STUDIO, DEV_EN_ORA_APPS_EXPRES, DEV_EN_ORA_FORMS, DEV_EN_ORA_JDEV_10G, DEV_EN_ORA_SQL_DEV, DEV_EN_OTHER, DEV_EN_OTHER_JAVA_IDE, DEV_EN_SQL_EDITORS, DEV_EN_TEXT_EDITOR, DEV_EN_TOAD, DEV_EN_VI, GEOGRAPHIC_REGION, INDUSTRY, ORACLE_PARTNER, ORA_EBS, ORA_JDE, ORA_PS, ORA_SIEBEL, PROFIT_MAGAZINE_SUBSCRIPTION, UG_MEM_APOUC, UG_MEM_EOUC, UG_MEM_HEUG, UG_MEM_IOUG, UG_MEM_OAUG, UG_MEM_ODTUG, UG_MEM_OHUG, UG_MEM_QIUG, UG_INFO_APOUC, UG_INFO_EOUC, UG_INFO_HEUG, UG_INFO_IOUG, UG_INFO_OAUG, UG_INFO_ODTUG, UG_INFO_OHUG, UG_INFO_QIUG, UG_INFO_DO_NOT_SEND_ORA_INFO, JOB_TITLE_MANAGER, JOB_TITLE_PARTNER, JOB_TITLE_PROJECT_LEAD, JOB_TITLE_MARKETING, JOB_TITLE_PRESIDENT, JOB_TITLE_VICE, JOB_TITLE_DIRECTOR, JOB_TITLE_ARCHITECT, JOB_TITLE_ANALYST, JOB_TITLE_DBA, JOB_TITLE_DEVELOPER, JOB_TITLE_SALES, JOB_TITLE_PROD_MGR, JOB_TITLE_CHIEF_OFFICER, JOB_TITLE_CONSULTANT, JOB_TITLE_SENIOR, JOB_TITLE_STUDENT
“Joe the DBA”
Attendee attributes:
  DB_REL_ODB_10G = 1
  DEV_EN_TEXT_EDITOR = 1
  DEV_EN_VI = 1
  GEOGRAPHIC_REGION = Americas
  INDUSTRY = Aerospace
  ORACLE_PARTNER = Yes
  JOB_TITLE_DBA = 1
  JOB_TITLE_SENIOR = 1
Predict themes (clusters) for “Joe”:
ClusterID  Probability
18         0.0005
19         0.3997
20         0.0002
23         0.0005
24         0.0005
25         0.2190
26         0.4245
27         0.3010
28         0.0502
29         0.0009
30         0.0098
31         0.0031
32         0.0000
33         0.0038
34         0.0031
35         0.0260
36         0.0188
37         0.0278
38         0.0075
39         0.0994
How Does This Session Rank for Joe?
Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management
Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0
Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.
Cluster Probabilities for Session S291749
ClusterID  Probability
18         0.0023
19         0.0021
20         0.9534
23         0.0020
24         0.0020
25         0.0027
26         0.0018
27         0.0032
28         0.0018
29         0.0022
30         0.0026
31         0.0049
32         0.0037
33         0.0015
34         0.0016
35         0.0016
36         0.0027
37         0.0022
38         0.0037
39         0.0019
Computing This Session's Score Specifically for Joe…
ClusterID  Joe's probability  x  S291749 probability  =  Product
18         0.0005             x  0.0023               =  0.000001
19         0.3997             x  0.0021               =  0.000848
20         0.0002             x  0.9534               =  0.000216
23         0.0005             x  0.0020               =  0.000001
24         0.0005             x  0.0020               =  0.000001
25         0.2190             x  0.0027               =  0.000587
26         0.4245             x  0.0018               =  0.000780
27         0.3010             x  0.0032               =  0.000960
28         0.0502             x  0.0018               =  0.000088
29         0.0009             x  0.0022               =  0.000002
30         0.0098             x  0.0026               =  0.000025
31         0.0031             x  0.0049               =  0.000015
32         0.0000             x  0.0037               =  0.000000
33         0.0038             x  0.0015               =  0.000006
34         0.0031             x  0.0016               =  0.000005
35         0.0260             x  0.0016               =  0.000041
36         0.0188             x  0.0027               =  0.000051
37         0.0278             x  0.0022               =  0.000062
38         0.0075             x  0.0037               =  0.000028
39         0.0994             x  0.0019               =  0.000191
SCORE (sum of products): 0.003908
The products appear to have been computed from unrounded probabilities, so some differ slightly from the product of the rounded values shown (e.g., cluster 19: 0.3997 x 0.0021 ≈ 0.000839).
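The session score for Joe is the dot product of his theme-probability vector with the session's. The short Python check below uses the probabilities as printed (rounded to four decimals), so it lands at about 0.00387 rather than the slide's 0.003908, which was presumably computed from unrounded values.

```python
# Probabilities as printed on the slides, for clusters 18, 19, 20 and 23-39.
cluster_ids = [18, 19, 20] + list(range(23, 40))
joe = [0.0005, 0.3997, 0.0002, 0.0005, 0.0005, 0.2190, 0.4245, 0.3010, 0.0502, 0.0009,
       0.0098, 0.0031, 0.0000, 0.0038, 0.0031, 0.0260, 0.0188, 0.0278, 0.0075, 0.0994]
s291749 = [0.0023, 0.0021, 0.9534, 0.0020, 0.0020, 0.0027, 0.0018, 0.0032, 0.0018, 0.0022,
           0.0026, 0.0049, 0.0037, 0.0015, 0.0016, 0.0016, 0.0027, 0.0022, 0.0037, 0.0019]
assert len(cluster_ids) == len(joe) == len(s291749) == 20

# Recommendation score = sum over themes of (attendee probability x session probability).
score = sum(a * s for a, s in zip(joe, s291749))
print(f"{score:.6f}")  # 0.003866
```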
Recommendation Score Query
select attend_id, session_id, score
from (
  -- score = sum over matching themes of attendee probability x session probability
  select a.attend_id, s.session_id,
         sum(a.probability * s.probability) score
  from SESSION_TXT09_SCORES_T20 s,
       ATTENDEE09_SCORES_T20 a
  where a.prediction = s.cluster_id
  group by a.attend_id, s.session_id
)
order by attend_id, score desc;
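The query's join-and-aggregate logic can be mirrored in Python for readers without the Oracle tables. This sketch assumes each score table is available as (id, cluster_id, probability) tuples, a layout chosen to mirror the columns the query uses; the IDs in the example are made up.

```python
from collections import defaultdict

def recommend(attendee_scores, session_scores):
    """Join on cluster id, sum probability products, order by attendee then score desc.

    attendee_scores: iterable of (attend_id, cluster_id, probability)
    session_scores:  iterable of (session_id, cluster_id, probability)
    """
    by_cluster = defaultdict(list)  # cluster_id -> [(session_id, probability)]
    for session_id, cluster_id, p in session_scores:
        by_cluster[cluster_id].append((session_id, p))
    totals = defaultdict(float)  # (attend_id, session_id) -> score (the GROUP BY)
    for attend_id, cluster_id, p_a in attendee_scores:
        for session_id, p_s in by_cluster[cluster_id]:
            totals[(attend_id, session_id)] += p_a * p_s
    # ORDER BY attend_id, score DESC
    return sorted(((a, s, sc) for (a, s), sc in totals.items()),
                  key=lambda row: (row[0], -row[2]))

attendees = [("W1", 19, 0.4), ("W1", 20, 0.1)]
sessions = [("S1", 19, 0.9), ("S1", 20, 0.1), ("S2", 19, 0.1), ("S2", 20, 0.9)]
print(recommend(attendees, sessions))
```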
Evaluating Recommendations: Producing Training (Build) and Test Datasets
The '08 session data and the '08 attendee data are each split into build and test sets. The models are built using the build datasets and tested using the test datasets. Combining the splits gives three evaluation spaces:
  Typical space for recommendations: recommend the same sessions to new attendees
  Cross-sell / up-sell space: recommend new sessions to the same attendees
  Projection mining space: recommend new sessions to new attendees
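A small Python sketch of this split: sessions and attendees are each divided into build and test sets, and any (attendee, session) attendance pair then falls into one of four quadrants. The random assignment and the 30% test fraction are assumptions; the slides do not give the proportions.

```python
import random

def split_ids(ids, test_fraction=0.3, seed=0):
    """Randomly split ids into (build, test) sets; the 30% test share is an assumption."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])

def evaluation_space(session_id, attendee_id, sessions_build, attendees_build):
    """Classify an (attendee, session) pair by whether each side was seen in training."""
    new_session = session_id not in sessions_build
    new_attendee = attendee_id not in attendees_build
    if not new_session and not new_attendee:
        return "build"
    if new_session and not new_attendee:
        return "cross-sell / up-sell"
    if not new_session and new_attendee:
        return "typical"
    return "projection mining"

print(evaluation_space("S9", "W9", {"S1"}, {"W1"}))  # projection mining
```

The projection mining quadrant (new sessions, new attendees) is the one that matches the live OOW setting, where neither the 2009 sessions nor the 2009 attendees exist in the training data.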
Evaluating Results: Session Recommendation Curve
[Figure: model scores as a function of rank; each dot is a scored session. Annotations mark the threshold separating high- from low-confidence recommendations, the linear behavior of recommendations, and the location of “hits” (sessions the attendee attended).]
Enrichment Curve
The recommendation enrichment score is a running calculation over the model-ranked sessions; the enrichment is its maximum deviation from 0, reached at the point of maximum enrichment. Markers represent the location of “hits”.
[Figures: model score and enrichment score vs. model-ranked sessions for three attendees]
  Attendee W1152645: NE = 2.88, Lift = 3.07, ROC = 0.79
  Attendee W1144260: NE = 1.63, Lift = 2.47, ROC = 0.71
  Attendee W1134872: NE = 1.07, Lift = 1.55, ROC = 0.51
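The slides describe enrichment as a running calculation whose value is its maximum deviation from 0, but do not give the formula. The Python sketch below assumes a Kolmogorov-Smirnov-style running sum, similar in spirit to gene set enrichment analysis: walking down the model-ranked sessions, it steps up on each hit and down on each miss, and reports the extreme value and the rank where it occurs.

```python
def enrichment_score(hits):
    """Running-sum enrichment over model-ranked sessions (best first).

    hits[i] is True if the attendee attended the i-th ranked session.
    ASSUMPTION: step up 1/n_hits on a hit and down 1/n_misses on a miss.
    Returns (value with the largest deviation from 0, 1-based rank where it occurs).
    """
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss")
    running, best, best_rank = 0.0, 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        running += 1.0 / n_hits if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best, best_rank = running, rank
    return best, best_rank

print(enrichment_score([True, True, False, False]))  # (1.0, 2)
```

Hits concentrated near the top of the ranking push the peak toward +1 early; hits scattered at random keep the running sum near 0.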
Global Measure of Merit
Random recommendations obtain a normalized enrichment (NE) score of 1.
[Figure: distribution P(NE) of normalized enrichment (NE) for the PM model and for a random model]
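One way to obtain a measure on which random recommendations score 1 is to divide an attendee's enrichment by the average enrichment of randomly ordered recommendations. The Python sketch below does that, assuming both a Kolmogorov-Smirnov-style running sum and normalization by the mean absolute score over random shuffles; neither detail is given on the slides.

```python
import random

def max_deviation(hits):
    """KS-style running enrichment (assumed formula): largest deviation from 0.
    Requires at least one hit and one miss."""
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    running = best = 0.0
    for hit in hits:
        running += 1.0 / n_hits if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def normalized_enrichment(hits, n_random=1000, seed=0):
    """ASSUMPTION: divide by the mean |enrichment| of randomly ordered recommendations,
    so that random recommendations score about 1 on average."""
    rng = random.Random(seed)
    shuffled = list(hits)  # copy so the caller's ranking is untouched
    null = []
    for _ in range(n_random):
        rng.shuffle(shuffled)
        null.append(abs(max_deviation(shuffled)))
    return max_deviation(hits) / (sum(null) / n_random)

ranked = [True] * 5 + [False] * 45  # all five hits ranked at the top
print(round(normalized_enrichment(ranked), 2))
```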
Recommending Exhibitors and Demos
Use the clustering model from the session data:
  Score the 2009 exhibitor and demo text against the 20 themes
  Use the existing theme scores of new 2009 attendees to compute recommendation scores for each exhibitor and demo
Computing Related Sessions
Data preparation:
  Focus on tracks, tags, and categories
  Tokenize targeted terms from the title and abstract fields, e.g., “Oracle Data Mining” → “OracleDataMining”
Cluster sessions into 200 clusters using K-Means
Multiply cluster-score vectors to get a similarity score
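The phrase tokenization step can be sketched in Python: each targeted multi-word term is collapsed into a single token so that the text index treats it as one term. The phrase list is hypothetical apart from “Oracle Data Mining”, and matching is case-sensitive in this sketch.

```python
import re

# Hypothetical targeted terms; only "Oracle Data Mining" comes from the slide.
TARGET_PHRASES = ["Oracle Data Mining", "Oracle Data Miner"]

def join_phrases(text, phrases=TARGET_PHRASES):
    """Replace each targeted phrase with one token by removing its spaces."""
    # Longest phrases first, so a shorter phrase never splits a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        joined = phrase.replace(" ", "")
        text = re.sub(pattern, lambda m: joined, text)
    return text

print(join_phrases("Using Oracle Data Mining with Oracle Data Miner"))
# Using OracleDataMining with OracleDataMiner
```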
Computing Related Sessions (continued)
1. Cluster the 2009 sessions into 2009 themes (200 clusters).
2. Score each session against each theme (cluster) to get a cluster-score vector for each session.
3. Vector-multiply each session's cluster scores against all other sessions' cluster scores for a total-order ranking of related sessions.
Result: ranked related sessions
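The related-session ranking reuses the same dot-product idea, now between two sessions. A minimal Python sketch, with hypothetical session IDs and two-theme vectors standing in for the 200-theme vectors:

```python
def related_sessions(session_vectors, session_id, top_n=5):
    """Rank other sessions by the dot product of their cluster-score vectors
    with the given session's vector, highest first."""
    target = session_vectors[session_id]
    scores = []
    for other_id, vec in session_vectors.items():
        if other_id == session_id:
            continue  # a session is not "related" to itself
        scores.append((other_id, sum(a * b for a, b in zip(target, vec))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

vectors = {"S1": [0.9, 0.1], "S2": [0.8, 0.2], "S3": [0.1, 0.9]}
print(related_sessions(vectors, "S1", top_n=2))
```

Scoring every pair is quadratic in the number of sessions, which is manageable for a conference of a couple of thousand sessions.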
OOW '08 Recommendation Engine Results
Distinct Schedule Builder visitors: 15667
Distinct visitors who signed up: 3266
Distinct visitors who attended: 1775
Signup conversion rate: 20.8% (3266 / 15667)
Attended conversion rate: 11.3% (1775 / 15667)
Conversion rate: percentage of attendees who used at least 1 recommendation
Conversion Rates in Other Domains
[Chart: conversion rates in other domains (circa 2004) compared with OOW signup sessions (20.8%) and OOW attended sessions (11.3%)]
OOW '08 Recommendation Engine Results Detail
Recommendations signup:
  1768 attendees (11.3%) selected exactly 1 recommendation
  820 attendees (5.2%) selected 2 recommendations
  678 attendees (4.3%) selected 3 or more
  32 attendees selected between 8 and 10
Recommendations actually attended:
  1246 attendees (8%) attended exactly 1 recommended session
  382 attendees (2.4%) attended 2 recommended sessions
  147 attendees (0.9%) attended 3 or more
  23 attendees attended between 5 and 9
[Chart: Recommendations, selected vs. attended counts for Exactly 1, Exactly 2, and More than 3]
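A quick Python check of the reported counts: the per-count breakdowns should add up to the signup and attendance totals, and the conversion rates follow from dividing by the number of distinct Schedule Builder visitors. Note that 3266 / 15667 works out to about 20.8%.

```python
# Counts reported on the OOW '08 results slides.
visitors = 15667   # distinct Schedule Builder visitors
signed_up = 3266   # distinct visitors who signed up
attended = 1775    # distinct visitors who attended

# Breakdowns by number of recommendations selected / attended should sum to the totals.
assert 1768 + 820 + 678 == signed_up
assert 1246 + 382 + 147 == attended

signup_rate = 100 * signed_up / visitors
attended_rate = 100 * attended / visitors
print(f"signup {signup_rate:.1f}%, attended {attended_rate:.1f}%")  # signup 20.8%, attended 11.3%
```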
Summary
Oracle Data Mining provides a robust platform for text mining and for building a recommendation engine
Oracle Data Mining with Oracle Data Miner code generation facilitated deployment of the mining solution
Recommendation evaluation techniques show that the models were able to predict sessions of interest
OOW conversion rates show that attendees perceived the session recommendations as useful
For More Information
Search for “Oracle Data Mining” at search.oracle.com
or visit oracle.com: www.oracle.com/technology/products/bi/odm/index.html
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.