SoLoMo Analytics for Telco Big Data Monetization - 06964900

May 27, 2016 | Author: Hán La | Category: N/A

SoLoMo analytics for telco Big Data monetization The mobile Internet has brought tremendous opportunities for businesses to capitalize on the vast amount of SoLoMo (social-location-mobile) data for delivering high-quality and personalized customer services. In this paper, we describe algorithms and technologies for discovering actionable customer insights using the combined power of social network, location pattern mining, and mobile usage analysis. We illustrate our implementation using Big Data platforms including IBM InfoSphere* BigInsights, IBM InfoSphere Streams, and IBM Netezza* Data Warehouse, while addressing various Big Data-related challenges, such as context generation of unstructured data and high-performance analytics for both data at rest and data in motion. The presented system combines location, social interactions, and user behavior data to find like-minded communities. The system leverages Big Data capabilities with the aim of scaling efficiently to the subscriber base of large telecoms.

Introduction In 2013, the mobile-phone user base reached almost 93% of the global population, with more than one billion smartphones in use [1]. Those devices have created a whole ecosystem of applications and services, which have rapidly changed our daily lives and continue to transform our society. This ongoing mobile revolution has brought tremendous opportunities for businesses to capitalize on the vast amount of data that is being generated through mobile phone usage. For example, phone calls are evidence of a social link between users. The applications used, or web pages visited, give hints of a user’s topical interest, activity, or commercial intention. The location of the device can also be used to enrich data about customers. Hence, the electronic traces of a mobile phone recorded by a telecom can be used to give deep insights about people’s interests, lifestyles, and social patterns. The need to analyze this data motivated us to develop a SoLoMo (social-location-mobile) analytics solution, which exploits knowledge of a user’s social network, locations, and online usage patterns to provide meaningful insights in a reasonable timeframe. One of the main challenges for processing this data is the size and the speed at which it is being generated. Every day, billions of electronic records are generated by mobile

Digital Object Identifier: 10.1147/JRD.2014.2336177

H. Cao W. S. Dong L. S. Liu C. Y. Ma W. H. Qian J. W. Shi C. H. Tian Y. Wang D. Konopnicki M. Shmueli-Scheuer D. Cohen N. Modani H. Lamba A. Dwivedi A. A. Nanavati M. Kumar

phone users through the phone calls they make, web activities they perform, the changes in their location, etc. New Big Data systems and technologies are required to handle such data (most of which is perishable) in a highly scalable, cost-effective, and fault-tolerant fashion. In this paper, we demonstrate our Big Data solution that is designed to process mobile data and discover actionable customer insights using the combination of social networks, location pattern mining, and mobile usage analysis. We also discuss details of the implementation, which was designed on top of Big Data platforms, including IBM InfoSphere* BigInsights, IBM InfoSphere Streams [2], and IBM Netezza* Data Warehouse [3]. Specifically, we present a unified SoLoMo analysis approach that can enable insights that were previously not possible. In particular, we introduce a method for finding like-minded communities that factors in the location data as well as derived interests along with the social interaction data. By taking location into account, we can find communities of users that are not only connected but also share the same physical spaces. Using derived interests from the mobile browsing history can help us find communities that are not only connected but also have similar interests. The like-minded communities we thus find can be leveraged wherever a word-of-mouth or peer-pressure marketing strategy is deemed appropriate, e.g., recommendations or viral marketing. Our focus in this paper is to present a

©Copyright 2014 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor. 0018-8646/14 © 2014 IBM

IBM J. RES. & DEV.

VOL. 58

NO. 5/6

PAPER 9

SEPTEMBER/NOVEMBER 2014

H. CAO ET AL.

9:1

Figure 1 SoLoMo architecture. (RT: real-time.)

system-oriented view, and due to space constraints, we do not present results on the efficiency or accuracy of the algorithms. Detailed experiments with real-world data would provide fertile territory for future research. Privacy policies and laws with respect to data mining of phone information vary from country to country, and any such analysis would be expected to take such laws and sensitivities into account.

System architecture The primary motivation for building the tool was to address the issues faced in analyzing the huge amount of data generated by a telecom every day. The data in the telecom domain is unique and creates challenges related not only to the volume of the data but also to its variety and velocity. Therefore, it was crucial to develop a system that can effectively deal with all of these characteristics. Our proposed Big Data solution uses a modular architecture to address the challenges posed in this domain.

The functionalities provided by the solution turn vast amounts of low-value inaccurate data into high-value customer insights. Figure 1 presents the five key components (marked with numbers in the figure) in the solution, as well as how this solution integrates with campaign management systems in the enterprise. IBM InfoSphere Streams component The IBM InfoSphere Streams component (denoted as 1 in Figure 1) is responsible for providing real-time performance on a continuous stream of data, such as network data and call data records (CDR). It is vital to obtain responses in real-time, since it is essential for some of the key functionalities of the solution to be carried out effectively. For example, real-time analysis of an event detected on the basis of location and mobile usage should be done to be able to instantaneously create a location-based service. We use IBM InfoSphere Streams [3] to provide such key features. The Streams-processed events are archived into a file-storage

format, and the analyzed data is saved in the IBM Netezza Data Warehouse. IBM InfoSphere BigInsights component To analyze the vast amount of CDR data, an IBM InfoSphere BigInsights [4] component (denoted as 2 in Figure 1) is used. BigInsights provides a Hadoop** MapReduce framework on top of which are implemented key functionalities such as general social network analytics (SNA) and spatial network analytics algorithms. These implementations help to efficiently extract each individual’s social neighborhood and key location characteristics. Note that the base Hadoop platform does not support social network analysis, and we add the parallel social network analysis accelerator to support the graph analysis. The compressed and cleaner customer data is then stored in a data warehouse. Telco analytics data warehouse/datamart component After IBM InfoSphere BigInsights and IBM InfoSphere Streams processing, the higher value insights are stored in a data warehouse with facts along three key dimensions. The social dimension contains the user’s social influence score and community affiliation. The location dimension stores facts about the user’s time and location. The mobile usage dimension stores the user’s top websites, mobile apps, and interests. In this component (denoted as 3 in Figure 1) in particular, we fully leverage the IBM Netezza system, which provides parallelism as well as integration capability with IBM SPSS* [5] to allow deep statistics and clustering algorithms to efficiently run on top of these base facts, so further insights can be derived. Another key feature of Netezza is that it natively supports in-database MapReduce. The inputs and outputs of MapReduce jobs are based on database tables. Modeling environment The new customer segments such as community, mobility, and lifestyle are discovered using graph algorithms and clustering models implemented through IBM SPSS (denoted as 4 in Figure 1). We describe these analytical models in subsequent sections. 
Visualization Finally, the solution also includes several web-based visualization components (denoted as 5 in Figure 1) that allow the end user of the solution to efficiently consume the SoLoMo insights. Those include visualizations related to social networks, location, and aggregation of the three key dimensions. We describe these in the sections below.

Social networks profiling We construct a social network of customers from call data records (CDRs) to mine social network features, which can provide deep insights into customer roles and behaviors on the

social network, such as their social influence and the community to which they belong. These features can be used in various applications like viral marketing, customer targeting, churn prediction, etc. Figure 2 shows the architecture of the social networks profiling module. Social network features The individual social network features describe user social influence, activity, etc. Such features are defined based on the key performance indicators (KPIs) such as PageRank [6], in-degree, out-degree, etc., which are derived from SNA. The group-based social network features represent the characteristics of communities with strong and weak connections. These features are defined based on group-based SNA KPIs like communities, cliques, k-cores, etc., which are derived from group-based SNA. The layered approach shown in Figure 2 allows developers to customize their own social network features based on the SNA KPIs. Parallel social network analysis As shown in Figure 2, the parallel social network analysis is built upon the MapReduce computation platform. Above the MapReduce computing framework, we have a graph data model and the message passing framework (MPF) as the unified SNA framework. The graph data model is object-oriented and is represented by the adjacency list. MPF is a unified architecture for parallel graph analysis algorithms. SNA developers can use the graph object data model and MPF to implement arbitrary graph analysis algorithms. In the SNA KPIs layer, we implemented typical SNA algorithms such as weakly connected components, k-core, maximal cliques, community detection, etc., using the graph object model and MPF. The details of these parallel algorithms are described in [7, 8]. Front-end visualization Front-end visualization for social network profiling represents social networking features from two perspectives. In particular, the community graph visualization represents group-based social features, such as the closeness of community. 
The ego-network visualization depicts individual social features, such as PageRank, in-degree, and out-degree, etc. We divide the problem of community-graph rendering into two sub-problems: 1) how to cluster the nodes within the same community and 2) how to distinguish individual features among the group. While the first is a common problem of the graph layout, the second problem is related to node visualization methods that involve information visualization and graph-drawing techniques. We use a multidimensional scaling (MDS) graph layout to assign locations to individual nodes in multidimensional spaces, such that individual nodes that are in the same community

Figure 2 Parallel social network analysis architecture. (SNA: social network analysis; KPI: key performance indicator; HDFS: Hadoop Distributed File System; HITS: hyperlink-induced topic search.)

are close to one another. Furthermore, individual nodes are displayed in different sizes and colors. Different colors correspond to different communities, and nodes with different sizes represent their different influence. The ego network for a given individual in a network is defined as the subgraph that represents all the direct relationships and two-step relationships between the selected individual and others. It indicates the impact of the selected individual on others.
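As a small illustration of the individual features and the ego network described above, the following Python sketch computes in-degree and out-degree from caller/callee pairs and extracts a two-step ego network. It is a single-machine sketch, not the parallel MapReduce implementation of the paper; the function names, the sample records, and the choice to keep only edges that lie on a path of at most two steps from the selected individual are assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_call_graph(cdrs):
    """Build a directed call graph from (caller, callee) pairs."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for caller, callee in cdrs:
        out_edges[caller].add(callee)
        in_edges[callee].add(caller)
    return out_edges, in_edges


def degree_kpis(out_edges, in_edges, user):
    """Return (in-degree, out-degree) of a subscriber."""
    return len(in_edges.get(user, ())), len(out_edges.get(user, ()))


def ego_network(out_edges, in_edges, ego, max_steps=2):
    """Return hop distances and edges of the ego network.

    Assumption: neighbours are followed regardless of call direction, and an
    edge is kept only if it lies on a path of at most max_steps from the ego.
    """
    hops = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if hops[node] == max_steps:
            continue  # do not expand beyond the last allowed step
        neighbours = out_edges.get(node, set()) | in_edges.get(node, set())
        for nxt in neighbours:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    edges = {
        (u, v)
        for u in hops
        for v in out_edges.get(u, ())
        if v in hops and min(hops[u], hops[v]) < max_steps
    }
    return hops, edges


# Hypothetical CDR sample: (caller, callee)
cdrs = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"), ("e", "a"), ("d", "f")]
out_edges, in_edges = build_call_graph(cdrs)
hops, edges = ego_network(out_edges, in_edges, "a")
```

Here subscriber "f" is three steps away from "a" and is therefore left out of the ego network.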

Location-based profiling Location information can provide useful insights for building customer profiles. CDRs contain location information that reflects movements and behaviors of subscribers. Although the location data in CDRs is typically at cell granularity, which is spatially coarse and not as precise as GPS (Global Positioning System) data, this data is still useful in characterizing the mobility of users. There are numerous mobility features that can be extracted from the CDR sequence to build customer profiles from the location perspective. We classify the features into three levels, as shown in Figure 3. Location features We now describe the location features that we extracted from the CDRs. The first of these features are the low-level

features (with physical context) that are directly extracted by statistics from the data layer, where the raw CDR data is combined with the map reference data. These features include the visiting frequency (can be represented by hotspots on map), the top-k frequently visited POI (points of interest) at specific location levels, and the top-k frequently visited POI types at aggregated concept levels, etc. With varying time window bases, these features can characterize customers’ behaviors at different time scales (e.g., last week, last month, etc.) and based on different temporal characteristics (e.g., working hours on working days, weekends and holidays, etc.). The middle-level features (with semantics context) are derived using probabilistic modeling and data mining techniques from the low-level features and the data layer. Such features include daily ranges of travel, speed (average and instant), likely transportation mode [9], likely home location(s) and work location(s) [10], popular routes given origin-destination (OD) pairs (e.g., home-to-work routes), likely OD pairs during a particular time period, and different types of trajectories’ geometric features, etc. The high-level features (with application context) are defined by application-driven methods. Different applications may have different focus on the customers’ mobility features. For instance, the managers of a shopping mall may not care about how far the target customers travel in a day, but

Figure 3 Location features extracted from CDR for location-based profiling. (OD: origin-destination; POI: point-of-interest).

may be more interested in which competitor shopping centers the target customers visit often. Such features can be defined, e.g., by combining low-level and middle-level features with application-driven thresholds. Location analysis Based on CDR data, the location analytics component consists of two parts: (1) off-line location analytics on data-at-rest, which extracts mobility features and models the customers’ moving patterns, and (2) real-time location analytics for customer targeting. Off-line location analytics The off-line location analytics component maintains a subscriber data model as the base for analyses. The raw CDR data is first processed by BigInsights, and the parsed data representing the trajectories is stored in a data warehouse. The aforementioned three-level mobility features are then calculated according to pre-defined time-window configurations; thus, the customer profiles can adapt over time as new data arrives. Data mining techniques provided by IBM SPSS are adopted in the data manipulations and the advanced pattern analyses. The two most typical moving-pattern modeling tasks are customer micro-segmentation and association rule mining.
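To make the low-level features and the time-window configurations more concrete, the following Python sketch counts visits per POI and returns the top-k POIs, optionally restricted to a temporal filter. The record layout, the function names, and the definition of working hours (Monday to Friday, 09:00 to 18:00) are assumptions made for illustration, not the configuration used in the system.

```python
from collections import Counter
from datetime import datetime


def working_hours(ts):
    """Assumed filter: Monday-Friday, 09:00-18:00."""
    return ts.weekday() < 5 and 9 <= ts.hour < 18


def top_k_pois(visits, k, keep=lambda ts: True):
    """Return the k most frequently visited POIs among records passing `keep`.

    visits: iterable of (timestamp, poi) pairs for one subscriber.
    """
    counts = Counter(poi for ts, poi in visits if keep(ts))
    return counts.most_common(k)


# Hypothetical trajectory of one subscriber (POI names are made up).
visits = [
    (datetime(2014, 9, 1, 10, 0), "office"),   # Monday
    (datetime(2014, 9, 1, 12, 30), "cafe"),
    (datetime(2014, 9, 2, 11, 0), "office"),
    (datetime(2014, 9, 3, 15, 0), "office"),
    (datetime(2014, 9, 3, 20, 0), "gym"),      # outside working hours
    (datetime(2014, 9, 6, 13, 0), "mall"),     # Saturday
    (datetime(2014, 9, 7, 13, 0), "mall"),     # Sunday
]

overall = top_k_pois(visits, 2)
at_work = top_k_pois(visits, 2, keep=working_hours)
```

The same function can be reused with other filters (e.g., weekends only, or the last month) to obtain features at different time scales.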

The customer micro-segmentation can be done by rule-based filtering or unsupervised learning. Filtering is preferable if the application user has clear requirements on what kind of customers they are targeting, and the provided features sufficiently characterize the target customers. Unsupervised learning such as clustering (e.g., k-means and DBSCAN [density-based spatial clustering of applications with noise]) [11] is done based on a feature set that the user is interested in. The subscriber clustering results typically are a number of subscriber groups. The subscribers in the same group behave similarly in the feature space, whereas the subscribers in different groups behave differently. The association rule mining aims to provide additional insights into customers’ behaviors. By combining additional data sources such as the pay-by-card data or mobile usage data, the analytics component is capable of determining the associations between behaviors (e.g., credit card consumption, web browsing, etc.) and the spatiotemporal contexts [12, 13]. Typical spatiotemporal contexts include the attributes of location, e.g., what kinds of POI are around, which can be dynamic and changing over time. As mentioned earlier, because CDR location data are typically at cell granularity, the location uncertainty may introduce some noise while carrying out location analytics.

However, empirical results show that the noise is often, if not always, tolerable in practice. The temporal characteristics of a behavior, such as day of week and hour in a day, are also important. For example, a typical association rule found from data can be: $\text{Sunday} \wedge \text{noon} \wedge \text{xxx mall} \rightarrow \text{shopping} \wedge \text{have lunch} \; (25\%, 86\%)$. The first number in the parentheses is the rule support, and the second is the rule confidence [11]. This rule is interpreted as: (1) 25% of all subscribers go to xxx mall for lunch every Sunday at noon, and (2) once a subscriber visits the mall on Sunday at noon, there is an 86% probability that he/she will buy something as well as have lunch. Such an analysis can provide predictive rules for a campaign. If it is known that some customers regularly go out (i.e., have a high probability of going out) for dinner on weekends, a coupon sent on a Friday afternoon or evening may significantly impact a customer’s final decision. This helps the user to reach the target customers at the best time before the purchase really happens. Real-time location analytics The incoming data stream from the offline analytics contains location information (e.g., GPS location of a cell-tower). Based on the patterns and rules discovered from the off-line analytics, real-time location analytics processes the incoming data stream in real time to (1) calculate or infer the precise location from multiple sources of location data with different precisions, including 2G/3G/4G CDR, available GPS tracking, and WiFi offload, etc., and (2) trigger predictive model scoring with the inferred real-time location and other spatiotemporal contexts as inputs. If a subscriber is regarded as satisfying the promotion condition, then additional promotion actions will be taken, e.g., sending a coupon to the subscriber’s cell phone number.
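The support and confidence of an association rule such as the one above can be computed with the standard definitions: support is the fraction of all records that contain both the antecedent and the consequent, and confidence is the fraction of records containing the antecedent that also contain the consequent. The Python sketch below applies these definitions to a handful of made-up records; the record encoding as sets of items and the function name are assumptions for illustration.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    records: list of sets, each holding the context and behavior items
    observed together (e.g., day, time, place, activity).
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_total = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical records; the values are illustrative only.
records = [
    {"Sunday", "noon", "xxx mall", "shopping", "have lunch"},
    {"Sunday", "noon", "xxx mall", "shopping", "have lunch"},
    {"Sunday", "noon", "xxx mall", "have lunch"},
    {"Sunday", "evening", "home"},
    {"Monday", "noon", "office", "have lunch"},
    {"Saturday", "noon", "park"},
    {"Friday", "evening", "cinema"},
    {"Tuesday", "morning", "office"},
]

support, confidence = rule_metrics(
    records,
    antecedent={"Sunday", "noon", "xxx mall"},
    consequent={"shopping", "have lunch"},
)
```

In this toy data set, two of the eight records satisfy the whole rule and three satisfy the antecedent, so the rule has support 2/8 and confidence 2/3.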

Mobile usage profiling Telecom companies derive user profiles from both structured and unstructured data such as user demographics and analysis of CDRs. While social network and location analysis usually handle structured CDR data, mobile usage analysis examines web browsing activity, extracted from the unstructured elements of Event Data Records (EDRs). EDRs extend CDRs beyond voice calls, and they capture various telecom network activities such as sent messages, web browsing, movie downloads, etc. From the browsing activity, we can understand the user interests and utilize them for customer targeting, micro-segmentation, and more. For example, users accessing www.nba.com are likely to be basketball fans and can be targeted with promotions of basketball-related products. We perform categorization and aggregation for utilizing web

browsing activity information: browsed pages are first categorized, thereby transforming opaque URLs (uniform resource locators) into meaningful categories; later on, categories are aggregated into comprehensive user profiles. In the following sections, we describe these two phases for generating user profiles based on mobile usage. Mobile usage features With respect to high-level features, telecom companies monitor the data traffic that traverses their systems. In particular, each HTTP (Hypertext Transfer Protocol) request is recorded in a system log, containing all users’ interactions with web pages (documents). A log record of the form ⟨user, document-url, context⟩ captures a single “user-document” association, in some context. Note that context captures metadata extracted from the association, e.g., time, date, user agent, and content type. With low-level features, every web page has a unique URL made of a mandatory scheme and domain along with an optional port, path, and query: “scheme://domain:port/path?query”. For brevity, we now ignore the scheme and port parts. We define three URL levels: domain, path, and query, respectively, made of just the domain, the domain with path, and all three. Similar to [14], we denote query URLs as dynamic queries. In this spirit, path URLs may contain information about static queries, which we also denote as title queries. Dynamic queries are typical for search engines such as Google** and Yahoo!**, as well as for internal searches of corporate networks. For example, from the query-level URL http://www.google.com/search?q=starbucks+menu we extract the dynamic query words “starbucks” and “menu”. Title queries usually originate from URLs that represent articles, where we split the path into query words, separated by ‘-’ or ‘_’. For example, from http://news.yahoo.com/will-states-accept-obama-s-insurance-exchange-fix214110316.html, a news article about Obama’s insurance exchange fix, we extract the title-query words “will”, “states”, “accept”, “obama”, “insurance”, “exchange”, and “fix”. Mobile usage analysis In this section, we describe the analysis flow for mobile usage as depicted in Figure 4. Modeling web pages Common approaches to modeling web page data [15, 16] extract page content and metadata such as title, hyperlinks, and layout and apply categorization of this information. Our approach makes use of two public taxonomies: DMOZ Open Directory Project (ODP) [17] and Wikipedia** [18]. In the next paragraphs, we describe these open data sources and how they are utilized. The ODP (Open Directory Project), one of the largest collaborative sources for manually annotated web pages,

Figure 4 Mobile usage analysis flow.

categorizes more than 4 million web pages into more than 590,000 categories. Categories such as arts, business, and computers are expressed as a tree, where subcategories represent more specific concepts than their parents. For example, the branch “Top/Arts/Television/Networks” is split into two subcategories, “Top/Arts/Television/Networks/Cable” and “Top/Arts/Television/Networks/Satellite”. Each category contains URL links to related websites, along with a short description of each website. Most ODP URLs are of type domain or path. For example, the domain URL money.cnn.com and the path URL www.cnn.com/CNN/Programs are both associated with the ODP category “Arts/Television/Networks/Cable/CNN”. Wikipedia is a dynamic, collaborative, free encyclopedia containing more than 4 million articles and more than 900,000 categories that are structured as a graph. Being highly dynamic, Wikipedia quickly reflects newly emerging events and concepts. Since Wikipedia articles tend to be quite wordy, Wikipedia is well suited to categorizing query URLs, both dynamic-query and title-query URLs (e.g., for emerging topics). URL pattern analyzer For the processing of dynamic-query and title-query URL forms and for handling some special case URLs, we employ

a hierarchical pattern analysis phase. Each pattern is driven by a regular expression and may depend on other patterns. Listing 1 shows an example pattern, applied for detecting dynamic-query URLs and extracting the query words. The “Res” element specifies which part of the regular expression is the result string, while the “Sep” field dictates how to split that string into words. The “Good Examples” and “Bad Examples” elements allow validating patterns at system startup. Finally, the “Type” element classifies the detected pattern, dictating which categorization logic should be applied. The type value is one of “Title Query”, “Dynamic Query”, “Explicit”, and “Page”. Matches of “Title Query” and “Dynamic Query” are handled by searching for categories over textual indices of ODP and Wikipedia. “Explicit” matches are special patterns tailored to match certain families of URLs to a predefined category, e.g., the “Adult” category. “Page” matches usually indicate that none of the other pattern types applied. We expand on handling the pattern types in the categorization section. Categorization In this section, we describe the process of categorization that consists of creating indices from the ODP and Wikipedia sources, and the categorization flow.
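The pattern analysis described above can be sketched in Python as follows. The pattern definitions of Listing 1 are not reproduced in this text, so the regular expressions below are hypothetical; only the roles of the result group ("res"), the separator ("sep"), and the type follow the description. The final cleanup step (stripping trailing digits and dropping single-letter tokens) is likewise an assumption, chosen so that the sketch reproduces the two worked URL examples given earlier.

```python
import re

# Hypothetical patterns; the named group "res" plays the role of "Res".
PATTERNS = [
    {
        "type": "Dynamic Query",
        "regex": re.compile(
            r"^https?://[^/]+/[^?]*\?(?:[^#]*&)?q=(?P<res>[^&#]+)"
        ),
        "sep": r"[+\s]+",
    },
    {
        "type": "Title Query",
        "regex": re.compile(
            r"^https?://[^/]+/(?:[^/?#]+/)*"
            r"(?P<res>[A-Za-z]+(?:[-_][A-Za-z0-9]+)+)(?:\.html?)?$"
        ),
        "sep": r"[-_]+",
    },
]


def clean(words):
    """Assumed cleanup: strip trailing digits, drop one-letter tokens."""
    stripped = (re.sub(r"\d+$", "", w).lower() for w in words)
    return [w for w in stripped if len(w) > 1]


def analyze(url):
    """Return (pattern type, query words); 'Page' if no pattern applies."""
    for pattern in PATTERNS:
        match = pattern["regex"].match(url)
        if match:
            words = re.split(pattern["sep"], match.group("res"))
            return pattern["type"], clean(words)
    return "Page", []


dynamic = analyze("http://www.google.com/search?q=starbucks+menu")
title = analyze(
    "http://news.yahoo.com/"
    "will-states-accept-obama-s-insurance-exchange-fix214110316.html"
)
page = analyze("http://www.cnn.com/CNN/Programs")
```

The "Explicit" type and dependencies between patterns are omitted from this sketch.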

Listing 1 Example of pattern analyzer.

Utility indices A few utility indices are prepared in advance and consulted for categorization: Wikipedia taxonomy, Wikipedia textual index, ODP textual index, and ODP URL index. Wikipedia indices: The Wikipedia taxonomy index captures the category hierarchy of Wikipedia, with each category accessible by name and pointing to all its parent categories. The Wikipedia textual index contains a document for each Wikipedia non-category document (i.e., articles that appear in Wikipedia). The text of that document is indexed and searchable, and the Wikipedia categories of that document are saved as metadata of that document. In addition to its immediate categories, their ancestor categories are indexed as metadata of that document, thereby maintaining the entire ancestor category set for the document, up to a certain predefined height. Our experiments indicated that in Wikipedia, ancestor categories of heights larger than four diverge significantly from the original immediate category; therefore, we ignore higher ancestors. The ancestor height is maintained in the index: zero for the immediate category, one for its parent categories, etc. In this process, cycles that evidently exist in the Wikipedia taxonomy are avoided by keeping only the shortest path to an ancestor. During search time (i.e., upon the arrival of a query-type URL), we access both Wikipedia indices as follows: the Wikipedia textual index is accessed, and the best articles that match the queries are retrieved; then the Wikipedia taxonomy index is used to extract the labels of the categories associated with the retrieved documents. ODP indices: The ODP textual index contains a single document for each ODP category. The short description of that category is indexed and searchable for that document.
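The Wikipedia ancestor indexing described above (height zero for immediate categories, a cut-off at height four, and only the shortest path to each ancestor kept) can be sketched as a breadth-first traversal of the taxonomy. This is a minimal sketch; the function name and the small taxonomy, including its category names and its cycle, are hypothetical.

```python
from collections import deque


def ancestor_heights(immediate, parents, max_height=4):
    """Map each category reachable from `immediate` to its minimal height.

    immediate: categories assigned directly to an article (height 0).
    parents: dict mapping a category name to its parent category names.
    Breadth-first order guarantees the first height found is the shortest,
    and already-seen categories are skipped, so cycles cannot loop.
    """
    heights = {cat: 0 for cat in immediate}
    queue = deque(immediate)
    while queue:
        cat = queue.popleft()
        if heights[cat] == max_height:
            continue  # ancestors above the cut-off are ignored
        for parent in parents.get(cat, ()):
            if parent not in heights:
                heights[parent] = heights[cat] + 1
                queue.append(parent)
    return heights


# Hypothetical taxonomy fragment with a cycle (Basketball -> NBA -> ...).
parents = {
    "NBA teams": ["Basketball teams", "NBA"],
    "NBA": ["Basketball leagues"],
    "Basketball teams": ["Basketball"],
    "Basketball leagues": ["Basketball"],
    "Basketball": ["Ball games", "NBA"],
    "Ball games": ["Sports"],
    "Sports": ["Recreation"],
    "Recreation": ["Culture"],
}

heights = ancestor_heights(["NBA teams"], parents)
```

In this fragment, "Recreation" and "Culture" lie above height four and are therefore not indexed.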

The ODP URL index also contains a document for each ODP category. The URLs associated with an ODP category are indexed into two searchable fields: the entire URL phrase is added to the “Complete URL” field, and if the URL has no query part, it is also added to the “Domain Suffix Path Prefix” (DSPP) field. For example, the URL http://www.x.y.z/a/b/c?q=r is added as is to the “Complete URL” field but not to the DSPP field, while the URL http://www.x.y.z/a/b/c is added to both fields. At search time, when accessing the ODP URL index, a special tokenization applies for searching the DSPP field. For example, the URL http://www.x.y.z/a/b/c?q=r is tokenized into {“$$x.y.z/a/b/c”, “$$x.y.z/a/b”, “$$x.y.z/a”, “$$x.y.z”, “x.y.z”, “$$y.z”, “y.z”}, and the longest match of these tokens is returned. Note a few aspects of this tokenization: only the domain and path of the URL are considered; the first generic domain component (“www”) is ignored; the “$$” string marks an entire domain part; first the path is trimmed into its prefixes, then the domain is trimmed into its suffixes; and the domain trimming stops at a domain name with two components. Categorization flow In this component, each input URL is passed through cascading logic, and the first step that holds sets the result category. We now explain the cascading steps. Step 1: Complete URL. We search for the complete URL in the “Complete URL” field of the ODP URL index. Step 2: URL pattern analysis. Analysis is applied on the URL, and the URL is handled according to the result type of the pattern. For the “Explicit” type, the explicitly specified category is returned. For “Dynamic Query”

Table 1 Example of user profile with top-three categories.

and “Title Query” types, the extracted words are used for constructing search queries, first for the ODP textual index and then for the Wikipedia taxonomy and textual indexes. Upon a match in the ODP textual index, the top result category is returned. Otherwise, a category by the same name is first searched for in the Wikipedia taxonomy. If one exists, it is returned. Otherwise, upon a match in the Wikipedia textual index, the top 100 result documents are selected, and their indexed ancestor categories (up to height 4) are accumulated, taking into account both document scores and ancestor heights. This results in a set of candidate categories, each with a score and name. To further select a meaningful category, a two-pass voting process is performed, in which candidate categories vote for each other: first, each category propagates its score to the words that make up its name, and then each word propagates its score back to all the categories that contain it. The top-scored category is returned. Step 3: ODP URL search. We search the DSPP field of the ODP URL index, tokenizing the input URL as explained above, and returning the longest (and hence first) match, if one exists. Step 4: Fail. Experiments with various logs from different geographies achieve average coverage of 87%. The quality is discussed in [19]. User profile Categorization turns user-URL associations into user-category associations and allows higher-level user profiles to be aggregated. Similar to the GROUP BY operator in databases, profile categories are ranked by accumulation, and the top-ranked categories are selected. For a more consistent profile presentation, ODP categories can be mapped into Wikipedia categories as suggested in [20, 21]. Table 1 shows an example of a user profile consisting of top category accumulations.
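The DSPP tokenization used in Step 3 can be sketched in Python as shown below. The token order follows the example given for http://www.x.y.z/a/b/c?q=r; how the index itself stores tokens is not specified in the text, so the dictionary lookup in `longest_match` and the function names are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def dspp_tokens(url):
    """Tokenize a URL for the DSPP field, longest token first."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    if labels[0] == "www":          # the generic first component is ignored
        labels = labels[1:]
    domain = ".".join(labels)
    segments = [s for s in parts.path.split("/") if s]  # query is dropped

    tokens = []
    # First trim the path into its prefixes (longest first).
    for n in range(len(segments), 0, -1):
        tokens.append("$$" + domain + "/" + "/".join(segments[:n]))
    # Then trim the domain into suffixes, stopping at two components.
    for start in range(max(len(labels) - 1, 1)):
        suffix = ".".join(labels[start:])
        tokens.append("$$" + suffix)  # "$$" marks an entire domain part
        tokens.append(suffix)
    return tokens


def longest_match(url, dspp_index):
    """Return the category of the first (longest) token found, else None.

    dspp_index: assumed to be a plain dict from token to category name.
    """
    for token in dspp_tokens(url):
        if token in dspp_index:
            return dspp_index[token]
    return None


tokens = dspp_tokens("http://www.x.y.z/a/b/c?q=r")

# Index entry based on the ODP example mentioned earlier in the text.
index = {"$$cnn.com/CNN/Programs": "Arts/Television/Networks/Cable/CNN"}
category = longest_match("http://www.cnn.com/CNN/Programs/some-show", index)
```

Because tokens are generated from longest to shortest, returning the first hit is equivalent to returning the longest match.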

Finding like-minded communities: Combining SoLoMo analysis Understanding the target audience is important for designing a campaign. Typically, a campaign for individual users leverages only clustering and micro-segmentation

techniques. However, if peer pressure or social influence is used in a campaign (e.g., in a viral marketing campaign), it is also important to identify target groups whose members are not only well connected but also share similar attributes, such as common interests or shopping patterns. We call a group with these features a like-minded community [22]. The similarity of a pair of users is the cosine similarity between the two users’ attribute vectors. The like-mindedness of a community is defined as the average similarity over all pairs of users in that community. In this section, we describe the steps to build like-minded communities using the points of interest (POIs) computed by the location analysis component and the interest profiles computed by the mobile usage analysis component. For example, in the location analysis component, we can use the points of interest to compute like-mindedness, claiming that two people are like-minded if they frequently visit a similar set of locations. Similarly, in mobile usage analysis, two people are like-minded if they share a number of interests. We also provide high-level implementation details of the SoLoMo analytics system using Big Data frameworks. We do not describe the algorithm for finding like-minded communities in full detail; instead, we focus on the details specific to the Big Data frameworks we used.

First, we construct the social network graph based on the CDR data of the subscribers. In the social graph, each node represents a subscriber, and an edge between two nodes indicates interaction between the two subscribers. The weight on the edge reflects how frequently and for how long the two subscribers have been talking. We then compute the interests of each user as described in the mobile usage analysis section and the POIs as described in the location-based profiling section. We then find the induced subgraph for each interest topic, and also for each point of interest.
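The graph construction and induced-subgraph steps described above can be sketched in a few lines of plain Python. This is only an in-memory illustration under stated assumptions, not the Netezza-based implementation: the record layout (caller, callee, duration), the undirected treatment of calls, the choice to keep call count and total duration side by side as the edge weight, and all names and sample data are hypothetical.

```python
from collections import defaultdict


def build_social_graph(cdrs):
    """Aggregate call records into an undirected weighted graph.

    Each record is (caller, callee, duration_seconds). How frequency and
    duration are combined into one weight is not specified in the text,
    so both are kept here (an assumption).
    """
    edges = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for caller, callee, seconds in cdrs:
        if caller == callee:
            continue  # ignore self-calls
        key = tuple(sorted((caller, callee)))  # one key per unordered pair
        edges[key]["calls"] += 1
        edges[key]["seconds"] += seconds
    return dict(edges)


def induced_subgraph(edges, interests, topic):
    """Keep subscribers interested in `topic` and the edges among them.

    An edge survives only if both endpoints are retained, which is the
    standard definition of an induced subgraph.
    """
    nodes = {user for user, topics in interests.items() if topic in topics}
    kept = {e: w for e, w in edges.items() if e[0] in nodes and e[1] in nodes}
    return nodes, kept


# Hypothetical sample data.
cdrs = [
    ("a", "b", 60),
    ("b", "a", 120),
    ("b", "c", 30),
    ("c", "d", 300),
    ("a", "a", 10),
]
interests = {
    "a": {"sports", "music"},
    "b": {"sports"},
    "c": {"music"},
    "d": {"sports"},
}

graph = build_social_graph(cdrs)
nodes, sports_edges = induced_subgraph(graph, interests, "sports")
print(sorted(nodes), sports_edges)
```

The same filter can be applied with a point of interest in place of a topic by passing a mapping from subscribers to their POIs.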
An induced subgraph for a particular topic retains only those nodes of the social graph that have an interest in that topic, together with the edges connecting them. These induced subgraphs establish which subscribers share an interest and also frequently talk to one another. The induced subgraphs are found in parallel, leveraging Netezza’s NZSQL framework, using a join query on the subscribers and the topics in nzsql. Next, we find maximal cliques in each induced subgraph using the method proposed in [7]. As stated earlier, Netezza provides native support for the MapReduce computing paradigm. Over the cliques found in the previous step, we run frequent itemset mining, treating each maximal clique as a transaction and its users as items; this is possible because each maximal clique is simply a collection of users. For this purpose, we use the Netezza analytical function ARULE. The model name and the support level for the frequent itemset mining are specified while

H. CAO ET AL.

9:9

calling the ARULE function of Netezza. Identifiers for the cliques are passed as the TID (transaction ID) parameter, and the members of the cliques correspond to the item parameter. The maximum set size is also specified as a parameter. After finding the frequent itemsets (FIS), we apply a support threshold to prune them, retaining only groups of users who have appeared in a certain number of maximal cliques across the induced subgraphs for different points of interest and/or different mobile-usage-based interest topics. We can, of course, find the FIS separately on the POIs and the interest topics and then retain the FIS that meet the support criteria in either or both. Once the FIS of interest are determined, we take the union of all of them. We call the members of this union the core people. Using the method mentioned above, we again find the induced subgraph, this time for the core people, which we call the induced graph of core people. We then find communities on this induced graph. Note that the community-finding algorithms used in [22] are not suitable for a parallel implementation; hence, we use the algorithm proposed in [8] to find the like-minded communities. This algorithm produces potentially overlapping communities.

Once the like-minded communities are computed, an influencer score is calculated for all the members of the various communities (e.g., using PageRank [6]). The rank of the members within each community is calculated using the Netezza RANK analytical function together with the PARTITION BY clause. Analytical functions help improve query processing [23] by allowing simpler SQL queries to be executed. For each community found, the following four metrics are computed to help characterize it:

Size: The number of members in the community.
Density: We use the average degree within the community as the density measure, computed as the ratio of the number of edges inside the community to the number of members in the community.
Like-mindedness: This metric indicates how similar the members of the community are. It is computed over one or more dimensions available from other modules, such as the location module or the mobile usage module. The score ranges from 0 to 1, with 0 indicating that the community is not at all like-minded and 1 indicating that the community consists of highly like-minded people.
Activity score: This score indicates, on average, how active each member is in terms of purchasing items, rating items, or, in general, any activity. It is the mean of the activity scores of the community members.

All of these metrics can help in launching an effective viral marketing campaign or provide deep insights into the subscribers. Using the Netezza High Capacity Appliance, these metrics can be computed efficiently over extremely large data volumes.
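As a rough illustration of the four community metrics defined above, the following Python sketch computes them for one community held in memory. It is an assumption-laden sketch rather than the Netezza implementation: user profiles are assumed to be sparse non-negative vectors stored as dictionaries (so that cosine similarity stays between 0 and 1), a community with fewer than two members is given a like-mindedness of 0 by convention, and all names and sample values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def community_metrics(members, edges, profiles, activity):
    """Return size, density, like-mindedness, and activity score."""
    members = set(members)
    size = len(members)

    # Density as stated in the text: edges inside the community / members.
    inside = sum(1 for a, b in edges if a in members and b in members)

    # Like-mindedness: mean cosine similarity over all unordered pairs.
    pairs = list(combinations(sorted(members), 2))
    if pairs:
        like = sum(cosine(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
    else:
        like = 0.0  # assumed convention for communities with fewer than 2 members

    return {
        "size": size,
        "density": inside / size if size else 0.0,
        "like_mindedness": like,
        "activity": sum(activity[m] for m in members) / size if size else 0.0,
    }


# Hypothetical sample data: "d" is outside the community.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
profiles = {"a": {"sports": 1.0}, "b": {"sports": 1.0}, "c": {"music": 1.0}}
activity = {"a": 2.0, "b": 4.0, "c": 0.0}

metrics = community_metrics({"a", "b", "c"}, edges, profiles, activity)
print(metrics)
```

Note that the edges-to-members ratio used for density equals half of the mean internal degree, since each internal edge contributes to the degree of two members.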

Related work A few parallel social network analysis platforms have been proposed in [7, 24, 25] to process large-scale graphs. These platforms accelerate processing through parallel computation. The focus of our work is on providing an integrated analytics solution that uses parallel social network analysis algorithms. The front-end visualizations in social network profiling are based on D3 (data-driven documents) [26], which is a representation-transparent approach to visualization for the web. In this work, we extended D3 to make our visualizations well integrated with the web environment.

Community detection has long been one of the fundamental topics of attention for network science researchers. Ever since the seminal paper by Girvan and Newman [27], much work and interest has been generated in this field. The algorithms that can be used to find communities can be broadly divided into six categories [28], namely (i) graph partitioning, (ii) hierarchical clustering, (iii) partitional clustering, (iv) spectral clustering, (v) divisive algorithms, and (vi) modularity-based methods. Most of the approaches try to optimize a given objective function. The most notable state-of-the-art algorithms are [29] and [30]; both try to maximize modularity as their objective function.

Characterizing human mobility by analyzing anonymized mobile phone data (typically CDRs) has recently become a hot research topic. Modani et al. [22] reviewed the methods of collecting location data from cellular phone networks. Isaacman et al. [10] proposed algorithms that identify generally important places, such as home and work locations, of subscribers. Becker et al. [31] presented a comprehensive study on how to use CDRs to calculate subscribers’ daily travel range, traffic volumes, and the carbon footprint of home-to-work commutes, among other measures.
In this work, we extracted user interests from the mobile browsing log using open source taxonomies such as ODP and Wikipedia. Several papers have utilized the ODP taxonomy along with web browsing logs for different purposes. Recently, Konopnicki and Shmueli-Scheuer [19] used the ODP taxonomy to model user profiles based only on domain- and URL-level browsing logs; in this work, we extend the scope to utilize Wikipedia as a source and to support dynamic queries, such as Title Query. The work in [32] focused on exploiting the ODP to achieve high-quality personalized web search based on the distance between the categories of the returned URLs and the user profile categories. The distance is measured using hierarchical semantics and the ODP tree structure. Tanudjaja and Mui [33] applied the ODP to enhance the HITS algorithm [34] using dynamic user profiles. Wikipedia has also been used to model user behavior for both search terms and web documents. Min and Jones [35] used an unsupervised clustering method to model user search interests using Wikipedia categories. Other authors [36, 37] generated user profiles based on page content, whereas in our setting, we only have access to the URL and do not fetch the page content.

Most work in this area has used either the connections or the interests of users to find communities. Little attention has been paid to integrating the connections among people with their shared interests. In most cases, text has been considered as the second attribute [38, 39]. Attribute-information-based clustering has also been proposed [40]. Modani et al. [22] proposed a way of finding communities with higher modularity and higher “like-mindedness,” where like-mindedness is a metric representing the similarity among members of a community in terms of product purchases or movie ratings. The same algorithm is used in SoLoMo on a variety of data attributes; it had previously been applied only to a movie rating dataset, so this tool also tests its ability to handle various types of data.

To the best of our knowledge, our tool may be the first to enable analysis of subscribers along all three dimensions: location, social, and mobile. Most existing tools combine only two of these dimensions. The Livehoods project [41] studied the friendship network together with the locations at which people checked in to derive neighborhoods of a city, which were qualitatively validated. Work by the MIT Reality Mining group discusses using mobile phones as social sensors [42], inferring social networks from calling patterns [43], computing communities [44], and profiling users based on spatiotemporal patterns [45]. Similarly, much work has been done on mobile and social networks [46–49].

Conclusions and future work In this paper, we presented our Big Data solution called SoLoMo, which provides a coherent set of analytics functions to process the vast amounts of data generated in the telecom area every day. In addition, we addressed the challenge of scale involved in this setting, where telcos have very large subscriber bases (several million), while still providing meaningful insight in an acceptable timeframe. Directions in which our work can be extended include 1) fusing available location data from multiple sources (CDR, 2G/3G/4G, GPS, etc.) while considering spatiotemporal constraints for more accurate location inference and 2) using the generated like-minded communities to design efficient strategies for social campaigns and to determine the appropriate targets for such campaigns.

*Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both. **Trademark, service mark, or registered trademark of Apache Software Foundation, Google, Inc., Yahoo! Inc., or Wikimedia Foundation in the United States, other countries, or both.

References
1. Global Mobile Statistics 2013 Part A. [Online]. Available: http://mobithinking.com/mobile-marketing-tools/latest-mobilestats/a
2. IBM InfoSphere Streams, IBM Corporation, Armonk, NY, USA. [Online]. Available: http://www-03.ibm.com/software/products/en/infosphere-streams/
3. IBM Netezza Data Warehouse, IBM Corporation, Armonk, NY, USA. [Online]. Available: http://www-01.ibm.com/software/data/netezza/
4. IBM InfoSphere BigInsights, IBM Corporation, Armonk, NY, USA. [Online]. Available: http://www-01.ibm.com/software/data/infosphere/biginsights/
5. IBM SPSS, IBM Corporation, Armonk, NY, USA. [Online]. Available: http://www-01.ibm.com/software/analytics/spss/
6. L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation ranking: Bringing order to the web,” Stanford InfoLab, Stanford, CA, USA, Tech. Rep., 1999.
7. W. Xue, J. Shi, and B. Yang, “X-RIME: Cloud-based large scale social network analysis,” in Proc. IEEE Int. Conf. SCC, 2010, pp. 506–513.
8. J. Shi, W. Xue, W. Wang, Y. Zhang, B. Yang, and J. Li, “Scalable community detection in massive social networks using MapReduce,” IBM J. Res. & Dev., vol. 57, no. 3/4, pt. 12, pp. 12:1–12:14, May–Jul. 2013.
9. L. Stenneth, O. Wolfson, P. S. Yu, and B. Xu, “Transportation mode detection using mobile phones and GIS information,” in Proc. 19th ACM SIGSPATIAL Int. Conf. Adv. Geogr. Inf. Syst., 2011, pp. 54–63.
10. S. Isaacman, R. Becker, R. Cáceres, S. Kobourov, M. Martonosi, J. Rowland, and A. Varshavsky, “Identifying important places in people’s lives from cellular network data,” in Proc. 9th Int. Conf. Pervasive Comput., 2011, pp. 133–151.
11. J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. San Mateo, CA, USA: Morgan Kaufmann, 2011.
12. W. Dong, L. Li, C. Zhou, Y. Wang, M. Li, C. Tian, and W. Sun, “Discovery of generalized spatial association rules,” in Proc. IEEE Int. Conf. SOLI, 2012, pp. 60–65.
13. W. Dong, W. Fan, L. Shi, C. Zhou, and X. Yan, “A general framework to encode heterogeneous information sources for contextual pattern mining,” in Proc. ACM Int. CIKM, 2012, pp. 65–74.
14. URL Types - The URL Clearinghouse. [Online]. Available: http://urlclearinghouse.wikidot.com/types
15. X. Qi and B. D. Davison, “Web page classification: Features and algorithms,” ACM Comput. Surv., vol. 41, no. 2, pp. 1–31, Feb. 2009.
16. D. Cohn and T. Hofmann, “The missing link - A probabilistic model of document content and hypertext connectivity,” in Proc. Adv. NIPS, 2001, pp. 430–436.
17. Open Directory Project (ODP). [Online]. Available: http://www.dmoz.org/
18. Wikipedia. [Online]. Available: http://www.wikipedia.org/
19. D. Konopnicki and M. Shmueli-Scheuer, “Customer analyst for the telecom industry,” in Large-Scale Data Analytics. New York, NY, USA: Springer Science and Business Media, 2014.
20. Articles With Open Directory Project Links. [Online]. Available: http://en.wikipedia.org/wiki/Category:Articles_with_Open_Directory_Project_links
21. Wikipedia Mapping. [Online]. Available: http://projects.dmoz.org/project.cgi?id=7
22. N. Modani, S. Nagar, S. Shannigrahi, R. Gupta, K. Dey, S. Goyal, and A. A. Nanavati, “Like-minded communities: Bringing the familiarity and similarity together,” J. World Wide Web, vol. 17, no. 5, pp. 899–919, 2014.
23. Netezza NPS v7.0.3 IEHSc. [Online]. Available: http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_overview_analytic_funcs.html
24. U. Kang, C. E. Tsourakakis, and C. Faloutsos, “Pegasus: A peta-scale graph mining system implementation and observations,” in Proc. IEEE Int. Conf. Data Mining, 2009, pp. 229–238.
25. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A system for large-scale graph processing,” in Proc. ACM Int. Conf. SIGMOD, 2010, pp. 135–146.
26. M. Bostock, V. Ogievetsky, and J. Heer, “D3 data-driven documents,” IEEE Trans. Vis. Comput. Graphics, vol. 17, no. 12, pp. 2301–2309, Dec. 2011.
27. M. Girvan and M. Newman, “Community structure in social and biological networks,” Proc. Nat. Acad. Sci. USA, vol. 99, no. 12, pp. 7821–7826, Jun. 2002.
28. S. Fortunato, “Community detection in graphs,” Phys. Rep., vol. 486, no. 3–5, pp. 3–5, Feb. 2010.
29. M. Newman, “Modularity and community structure in networks,” Proc. Nat. Acad. Sci. USA, vol. 103, no. 23, pp. 8577–8582, Jun. 2006.
30. V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” J. Stat. Mech., Theory Exp., vol. 2008, no. 10, p. P10008, Oct. 2008.
31. R. Becker, R. Cáceres, K. Hanson, S. Isaacman, J. M. Loh, M. Martonosi, J. Rowland, S. Urbanek, A. Varshavsky, and C. Volinsky, “Human mobility characterization from cellular network data,” Commun. ACM, vol. 56, no. 1, pp. 74–82, Jan. 2013.
32. P. A. Chirita, W. Nejdl, R. Paiu, and C. Kohlschütter, “Using ODP metadata to personalize search,” in Proc. 28th Annu. Int. ACM SIGIR, 2005, pp. 178–185.
33. F. Tanudjaja and L. Mui, “Persona: A contextualized and personalized web search,” in Proc. 35th Annu. Hawaii Int. Conf. Syst. Sci., 2001, p. 67.
34. J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM, vol. 46, no. 5, pp. 604–632, Sep. 1999.
35. J. Min and G. J. F. Jones, “Building user interest profiles from Wikipedia clusters,” presented at the Workshop Enriching Information Retrieval ENIR/SIGIR, Beijing, China, Jul. 2011.
36. K. Ramanathan, J. Giraudi, and A. Gupta, “Creating hierarchical user profiles using Wikipedia,” HP Labs, Palo Alto, CA, USA, Tech. Rep. 127, 2008.
37. K. Ramanathan and K. Kapoor, “Creating user profiles using Wikipedia,” in Proc. 28th Int. Conf. Conceptual Modeling, 2009, pp. 415–427.
38. N. Barbieri, F. Bonchi, and G. Manco, “Cascade-based community detection,” in Proc. WSDM, 2013, pp. 33–42.
39. M. Sachan, D. Contractor, T. Faruquie, and L. Subramaniam, “Using content and interactions for discovering communities in social networks,” in Proc. World Wide Web, 2012, pp. 330–340.
40. Y. Zhou, H. Cheng, and J. Yu, “Graph clustering based on structural/attribute similarities,” J. Proc. VLDB Endowment, vol. 2, no. 1, pp. 718–729, Aug. 2009.
41. J. Cranshaw, R. Schwartz, J. Hong, and N. Sadeh, “The Livehoods project: Utilizing social media to understand the dynamics of a city,” in Proc. ICWSM, 2012, pp. 58–65.
42. N. Eagle, “Mobile phones as social sensors,” in Handbook of Emergent Technologies in Social Research. Oxford, U.K.: Oxford Univ. Press, 2005.
43. N. Eagle, A. Pentland, and D. Lazer, “Inferring social network structure using mobile phone data,” Proc. Nat. Acad. Sci. USA, vol. 106, no. 36, pp. 15274–15278, Sep. 2007.
44. N. Eagle, Y. de Montjoye, and L. Bettencourt, “Community computing: Comparisons between rural and urban societies using mobile phone data,” in Proc. IEEE Soc. Comput., 2009, pp. 144–150.
45. M. A. Bayir, M. Demirbas, and N. Eagle, “Discovering spatiotemporal mobility profiles of cellphone users,” in Proc. Int. Symp. World Wireless, Mobile Multimedia Netw., 2009, pp. 1–9.
46. A. Nanavati et al., “Analyzing the structure and evolution of massive telecom graphs,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 5, pp. 703–718, May 2008.
47. V. Pandit, N. Modani, S. Mukherjea, A. Nanavati, S. Roy, and A. Agarwal, “Extracting dense communities from telecom call graphs,” in Proc. Commun. Syst. Softw. Middleware, 2008, pp. 82–89.
48. K. Dasgupta, R. Singh, B. Vishwanathan, D. Chakraborty, S. Mukherjea, A. Nanavati, and A. Joshi, “Social ties and their relevance to churn in mobile telecom networks,” in Proc. Extending Database Technol., 2008, pp. 668–677.
49. A. Nanavati, S. Gurumurthy, G. Das, D. Chakraborty, K. Dasgupta, S. Mukherjea, and A. Joshi, “On the structural properties of massive telecom graphs: Finding and implications,” in Proc. Int. Conf. Knowl. Manage., 2006, pp. 435–444.

Received February 22, 2014; accepted for publication March 17, 2014

Heng Cao IBM Research - China, Shanghai 201203, China ([email protected]). Ms. Cao is an IBM Senior Technical Staff Member and heads the IBM Research - China Shanghai Lab. She also serves as the IBM Research Global Labs’ Analytics Leader and leads the cross-geography Research teams in developing innovative analytics technologies to address the emerging analytics requirements from Growth Markets. Prior to that, she was on assignment from the IBM Thomas J. Watson Research Center to the IBM China Research Lab as the CTO for business analytics and optimization. Ms. Cao and her team participated in many IBM on-demand business transformation projects and successfully helped the business to improve performance through analytics. She was the recipient of many IBM awards including the IBM Outstanding Technical Achievement. She also received the 2008 National Women of Color Rising Star Award and the 2010 INFORMS (Institute for Operations Research and the Management Sciences) Daniel H. Wagner Prize.

Wei Shan Dong IBM Research - China, Beijing 100193, China ([email protected]). Dr. Dong is a Research Staff Member in IBM Research - China. He received his B.E. degree in computer science from the University of Science and Technology of China (USTC) in 2004, and his Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences in 2009. He joined IBM Research - China in 2009. His research interests include data mining (especially on spatiotemporal data), evolutionary computation, and computer vision.

Leslie S. Liu IBM Research - China, Beijing 100193, China ([email protected]). Dr. Liu is currently a Research Staff Member at IBM Research - China working on Big Data-related research including telecom mobility patterns and user profiling for the connected vehicle industry. Before joining IBM Research - China, Dr. Liu was a Staff Member at the IBM Thomas J. Watson Research Center, where he worked on innovations and research such as secure and scalable mobile systems in the enterprise, next-generation application development platforms, and cloud-based service models. Dr. Liu also led a mobile service engagement team with members from North America, China, Taiwan, and India. Dr. Liu and his team have been actively engaged in opportunities with customers from the financial, automotive, defense, and insurance industries. The team has generated multiple millions of dollars’ worth of revenue since 2008. Dr. Liu is the author of nine patent applications and has published many technical papers in mobile, multimedia, and cloud-related conferences and journals. Dr. Liu has also served as program chair and technical committee member for multiple IEEE (Institute of Electrical and Electronics Engineers) and ACM (Association for Computing Machinery) conferences.

Chun Yang Ma IBM Research - China, Beijing 100193, China ([email protected]). Dr. Ma is a Staff Researcher in IBM Research - China. She received her B.S. and Ph.D. degrees in computer science from Zhejiang University, China, in 2006 and 2012, respectively. Her current research interests include spatial databases, data access methods, and spatiotemporal data mining.

Wei Hong Qian IBM Research - China, Beijing 100193, China ([email protected]). Ms. Qian is a Staff Researcher in IBM Research - China. She received her B.S. and M.S. degrees from Zhejiang University, majoring in computer science and technology. Her research interests include interactive visual text analysis, interactive visual social network analysis, simple visualization, text analytics, embedded systems, etc.

Ju Wei Shi IBM Research - China, Beijing 100193, China (jwshi@cn.ibm.com). Mr. Shi is a Research Staff Member in the Information Management department at IBM Research - China. He received his B.S. and M.S. degrees in electrical engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2005 and 2008, respectively. He subsequently joined IBM Research - China, where he worked on Big Data analytics, such as Hadoop self-tuning, social network analysis using MapReduce, Hadoop performance on PowerPC, and data management and analytics applications across industries. He also worked in Microsoft Research Asia as a visiting student in 2005. Mr. Shi has more than 10 papers published and 20 patents filed.

Chun Hua Tian IBM Research - China, Beijing 100193, China ([email protected]). Dr. Tian is a Research Staff Member and Manager in the Service Research department. He holds a Ph.D. degree in automation science and engineering from Tsinghua University. His current research interests include data mining, logistics and supply chain management, and rule-based optimization.

Yu Wang IBM Research - China, Shanghai 201203, China ([email protected]). Mr. Wang is a Researcher in the IBM Research - China Shanghai Lab. He received his B.S. degree at XiDian University and an M.S. degree at SiChuan University. His current research interests include real-time databases and Big Data analytics such as spatial-temporal data analysis and social network analysis using MapReduce.

David Konopnicki IBM Research Division, IBM Research - Haifa, Haifa University Campus, 31905 Haifa ([email protected]). Dr. Konopnicki manages the Information Retrieval Group in IBM Research - Haifa and has been involved in unstructured content analytics both from a theoretical and a practical point of view. In academia, Dr. Konopnicki developed search systems for the early web. In the IBM Software Group, and in IBM Research, he has been leading a variety of projects: development of large-scale full-text search engines, building customer profiles from enterprise and social media sources, massive-scale analytics with applications to Telco companies, and more. Dr. Konopnicki is an IBM Master Inventor and holds a Ph.D. degree in computer science from the Technion-Israel Institute of Technology.

Michal Shmueli-Scheuer IBM Research Division, IBM Research - Haifa, Haifa University Campus, 31905 Haifa (shmueli@il.ibm.com). Dr. Shmueli-Scheuer is a Researcher in the Information Retrieval Group in IBM Research - Haifa. Dr. Shmueli-Scheuer received her Ph.D. degree in information and computer science at the University of California, Irvine, in 2009. Her area of expertise is in the fields of large-scale analytics, database, and information systems, focusing on user-behavior analytics and information management on the web. She has authored numerous papers on data management and information retrieval in leading conferences.

Doron Cohen IBM Research Division, IBM Research - Haifa, Haifa University Campus, 31905 Haifa ([email protected]). Mr. Cohen is a Researcher in the Information Retrieval Group in IBM Research - Haifa. He holds an M.Sc. degree from the Technion-Israel Institute of Technology, and in 1990 he joined IBM to first work on compiler backend optimizations and later on information retrieval. Mr. Cohen has authored several papers on information retrieval in leading conferences.

Natwar Modani IBM Research - India, ISID Campus, Vasant Kunj, New Delhi-70, India ([email protected]). Mr. Modani is a Senior Software Engineer in the Telecom Research Innovation Center at the IBM Research - India Lab. He received an M.E. (Integrated) degree in electrical communication engineering from the Indian Institute of Science (IISc), Bangalore, India. He subsequently joined IBM Research - India, where he has worked in eCommerce, autonomic systems, and social network analysis areas. He has received an IBM Client Value Outstanding Technical Achievement Award and an IBM Research Division Award. He is coauthor of 15 patents and 19 technical papers.

Hemank Lamba IBM Research - India, ISID Campus, Vasant Kunj, New Delhi-70, India ([email protected]). Mr. Lamba is a software engineer in the Social Network Analytics Group at IBM Research - India. Prior to IBM, he was at Indraprastha Institute of Information Technology Delhi, where he completed his B.Tech. (with Hons.) in computer science engineering in 2012. He has authored or coauthored six papers in peer-reviewed international conferences. He has worked on solutions dealing with viral marketing, mining unusual patterns, and incentive mechanism design.

Ananth Dwivedi IBM Research - India, ISID Campus, Vasant Kunj, New Delhi-70, India ([email protected]). Mr. Dwivedi is a software engineer in the Social Network Analysis Group at IBM Research - India. He received a B.Tech. degree in computer engineering from the Indian Institute of Technology, Banaras Hindu University in 2012. He has worked on viral marketing campaign management (Vibes).

Amit A. Nanavati IBM Research - India, ISID Campus, Vasant Kunj, New Delhi-70, India ([email protected]). Dr. Nanavati is a Research Staff Member in the Mobile and Telecom Research department at the India Research Lab. He received a B.S. degree in computer science from Maharaja Sayajirao University in 1989, and an M.S. degree in systems science and a Ph.D. degree in computer science from Louisiana State University in 1994 and 1996, respectively. He subsequently joined Netscape Communications Corporation and then moved to IBM Research - India in 2000. In 2011, he was named a Master Inventor and became an IBM Academy of Technology member in 2013. He has authored over 40 patents (19 issued) and 45 publications. He coauthored a book on Speech in Mobile and Pervasive Environments published by John Wiley, United Kingdom, in 2012.

Manish Kumar IBM Telecom Industry, Sales and Distribution, IBM Singapore Pte Ltd, Singapore 486048 ([email protected]). Mr. Kumar is a Solutions Leader in the IBM Asia Pacific region. He received his B.Sc. Honors degree in physics from Dibrugarh University, India, in 1994. He has worked in a number of large communications service provider companies and subsequently joined IBM Global Telco Solutions Center, where he created the industry-defining service delivery platform solution that created a multi-million revenue opportunity for IBM customers and IBM. He was awarded the Asia Pacific Hypergrowth Award. Mr. Kumar specializes in new growth and revenue-generating services and platforms and is an active member of multiple Telco forums and communities.
