US20160247163A1
2016-08-25
15/021,695
2014-10-14
US 11,270,316 B2
2022-03-08
WO; PCT/IB2014/065299; 20141014
WO; WO2015/056170; 20150423
Mark D Featherstone | Tony Wu
Elliott, Ostrander & Preston, P.C.
2035-11-03
A method for data processing includes obtaining from multiple different organizations (24, 52, 54, 56) customer relationship management (CRM) records (110) and communication records (90). Reference vectors (130) of feature values are computed for the communication records in a training set. Global weights are computed for the feature values by evaluating the reference vectors for all of the different organizations. For each organization, respective company weights are computed by evaluating specifically the reference vectors computed over the CRM records and communication records belonging to the organization. For each person belonging to a given organization, respective user weights are computed for the feature values by evaluating specifically the reference vectors computed over the communication records that identify the person as the user. The weights are applied in order to assign the communication records that are not in the training set to respective ones of the CRM records.
G06Q10/06311 » CPC further
Administration; Management; Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models; Operations research or analysis; Resource planning, allocation or scheduling for a business operation Scheduling, planning or task assignment for a person or group
G06Q10/107 » CPC further
Administration; Management; Office automation, e.g. computer aided management of electronic mail or groupware ; Time management, e.g. calendars, reminders, meetings or time accounting Computer aided management of electronic mail
G06Q30/01 » CPC main
Commerce, e.g. shopping or e-commerce Customer relationship, e.g. warranty
G06F16/24578 » CPC further
Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data; Querying; Query processing with adaptation to user needs using ranking
G06F16/285 » CPC further
Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data; Databases characterised by their database models, e.g. relational or object models; Relational databases Clustering or classification
G06Q10/10 » CPC further
Administration; Management Office automation, e.g. computer aided management of electronic mail or groupware ; Time management, e.g. calendars, reminders, meetings or time accounting
G06Q30/00 IPC
Commerce, e.g. shopping or e-commerce
G06F16/28 IPC
Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data Databases characterised by their database models, e.g. relational or object models
G06F16/2457 IPC
Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data; Querying; Query processing with adaptation to user needs
G06Q10/06 » CPC further
Administration; Management Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
This application claims the benefit of U.S. Provisional Patent Application 61/891,540, filed Oct. 16, 2013, which is incorporated herein by reference.
The present invention relates generally to customer relationship management (CRM) systems, and particularly to methods, apparatus and software for automation of data entry into CRM systems.
Computerized CRM systems and software are widely used in all sorts of businesses to manage company interactions with current, future and past customers, including sales, marketing, and customer service and support. CRM systems store contact data, communications, and other customer-related information and documents from all departments that interact with customers in a central repository. This centralization and organization gives management and employees access to data on demand and facilitates cooperation between departments and enhancement of business processes.
The leading supplier of CRM software-as-a-service at present is Salesforce.com, which offers a suite of cloud-based products, including the “Sales Cloud” sales force automation package. This package keeps track of contacts including leads, opportunities, accounts, partners and competitors. It also integrates with e-mail programs, such as Microsoft Outlook®, and enables users to associate e-mail items with the proper contacts, as well as synchronizing calendars and tracking follow-up of leads and opportunities. Similar capabilities, with similar sorts of sales force automation data models, are offered by other CRM vendors, such as SAP SE (Walldorf, Germany), as well as Microsoft Dynamics CRM and Oracle.
Embodiments of the present invention that are described hereinbelow provide methods, systems and software for automating the entry of data into CRM systems.
There is therefore provided, in accordance with an embodiment of the present invention, a method for data processing, which includes obtaining from each organization among multiple different organizations a respective first plurality of customer relationship management (CRM) records belonging to the organization, each CRM record including multiple CRM fields containing first data. A respective second plurality of communication records belonging to the organization is also obtained, including a set of the communication records that have been assigned to respective CRM records within the first plurality. Each communication record includes multiple communication record fields containing second data relating to a communication and identifying at least a user of the communication.
Respective reference vectors of feature values are computed for the communication records in the set. Each feature value indicates a degree of correspondence between a second datum in a specified communication record field of a given communication record and a first datum in a specified CRM field of a respective CRM record to which the given communication record has been assigned. Global weights are computed for the feature values by evaluating the reference vectors computed for all of the different organizations. For each organization, respective company weights are computed for the feature values by evaluating specifically the reference vectors computed over the CRM records and communication records belonging to the organization. For each person belonging to a given organization, respective user weights are computed for the feature values by evaluating specifically the reference vectors computed over the communication records that identify the person as the user.
Further vectors of the feature values are computed for the communication records that are not in the set, and the global weights, company weights, and user weights are applied to the further vectors in order to assign the communication records that are not in the set to respective ones of the CRM records. Entries are inserted in the respective ones of the CRM records to indicate the communication records that have been assigned thereto.
Typically, computing the further vectors of the feature values includes computing a respective vector for each communication record and each of one or more candidate CRM records for assignment of the communication record thereto, and assigning the communication record includes applying the respective vector in deciding whether to assign the communication record to each of the candidate CRM records. Applying the respective vector may include computing a weighted sum over the feature values in the respective vector using the global weights, company weights, and user weights, and assigning the communication record to a candidate CRM record if the weighted sum meets a predefined criterion.
In a disclosed embodiment, computing the global weights, company weights, and user weights includes applying a support vector machine to the reference vectors of the feature values for the communication records in the set in order to define classifiers that assign the communication records to the CRM records based on the global weights, company weights, and user weights.
In some embodiments, computing the company weights includes, after computing the global weights, evaluating the reference vectors computed over the CRM records and communication records belonging to each organization in order to find differences between the global weights and the company weights for the organization. Similarly, computing the user weights includes, after computing the company weights for the given organization, evaluating the reference vectors computed over the communication records belonging to each person belonging to the given organization in order to find differences between the company weights and the user weights for the person.
Typically, obtaining the CRM records includes identifying in the CRM records at least some of the communication records that have been associated with the respective CRM records, and including the identified communication records in the set for use in computing the weights. Additionally or alternatively, obtaining the CRM records includes, after inserting entries in the respective ones of the CRM records to indicate the communication records that have been assigned thereto, identifying the inserted entries to which modifications were made, after insertion in the CRM records, by users of the CRM records, and computing the global weights, company weights, and user weights includes applying the modifications in determining the weights. The modifications may include deletion of the assigned communication records, reassignment of the assigned communication records to other CRM records, and changes in entries in the CRM records corresponding to the assigned communication records.
In disclosed embodiments, the vectors include feature values indicative of personas occurring both in the given communication record and the respective CRM record, feature values indicative of a relation between a date of the given communication record and timeline of the respective CRM record, and/or feature values indicative of company and product names occurring both in the given communication record and the respective CRM record.
There is also provided, in accordance with an embodiment of the present invention, data processing apparatus, which includes a memory, configured to store records belonging to multiple different organizations. The records include, for each organization, a respective first plurality of customer relationship management (CRM) records belonging to the organization, each CRM record including multiple CRM fields containing first data; and a respective second plurality of communication records belonging to the organization, including a set of the communication records that have been assigned to respective CRM records within the first plurality, each communication record including multiple communication record fields containing second data relating to a communication and identifying at least a user of the communication.
A processor is configured to compute respective reference vectors of feature values for the communication records in the set, each feature value indicating a degree of correspondence between a second datum in a specified communication record field of a given communication record and a first datum in a specified CRM field of a respective CRM record to which the given communication record has been assigned. The processor is configured to compute global weights for the feature values by evaluating the reference vectors computed for all of the different organizations, to compute, for each organization, respective company weights for the feature values by evaluating specifically the reference vectors computed over the CRM records and communication records belonging to the organization, and to compute, for each person belonging to a given organization, respective user weights for the feature values by evaluating specifically the reference vectors computed over the communication records that identify the person as the user. The processor is further configured to compute further vectors of the feature values for the communication records that are not in the set, to apply the global weights, company weights, and user weights to the further vectors in order to assign the communication records that are not in the set to respective ones of the CRM records, and to insert entries in the respective ones of the CRM records to indicate the communication records that have been assigned thereto.
There is additionally provided, in accordance with an embodiment of the present invention, a computer software product, including a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to obtain records belonging to multiple different organizations. The records include, for each organization, a respective first plurality of customer relationship management (CRM) records belonging to the organization, each CRM record including multiple CRM fields containing first data; and a respective second plurality of communication records belonging to the organization, including a set of the communication records that have been assigned to respective CRM records within the first plurality, each communication record including multiple communication record fields containing second data relating to a communication and identifying at least a user of the communication.
The instructions cause the computer to compute respective reference vectors of feature values for the communication records in the set, each feature value indicating a degree of correspondence between a second datum in a specified communication record field of a given communication record and a first datum in a specified CRM field of a respective CRM record to which the given communication record has been assigned. The instructions further cause the computer to compute global weights for the feature values by evaluating the reference vectors computed for all of the different organizations, to compute, for each organization, respective company weights for the feature values by evaluating specifically the reference vectors computed over the CRM records and communication records belonging to the organization, and to compute, for each person belonging to a given organization, respective user weights for the feature values by evaluating specifically the reference vectors computed over the communication records that identify the person as the user. The instructions cause the computer to compute further vectors of the feature values for the communication records that are not in the set, to apply the global weights, company weights, and user weights to the further vectors in order to assign the communication records that are not in the set to respective ones of the CRM records, and to insert entries in the respective ones of the CRM records to indicate the communication records that have been assigned thereto.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
FIG. 1 is a block diagram that schematically illustrates a system for communications and customer relationship management, in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart that schematically illustrates a method for automated CRM data entry, in accordance with an embodiment of the present invention;
FIG. 3 is a block diagram that schematically illustrates a method for construction of feature vectors, in accordance with an embodiment of the present invention; and
FIG. 4 is a flow chart that schematically illustrates a method for automatic learning of feature weights, in accordance with an embodiment of the present invention.
Although CRM systems that are known in the art provide facilities for storing applicable communications (such as e-mail exchanges and calendar items) in the appropriate CRM records, implementation of this capability requires extensive manual data entry by users of the system. As a result, most CRM systems have problems of partial, missing and “dirty” (inaccurate) data.
Embodiments of the present invention that are described herein address these problems by providing computerized tools that automatically match a company's CRM records and communication records, and on this basis insert appropriate communication-based entries (alternatively referred to simply as “communication entries”) into the CRM system without requiring additional user interaction. These tools thus provide the company with more complete and accurate CRM data to assist sales and management personnel in customer follow-up, and may also add insights based on the CRM and communication data, such as identification and development of new opportunities that might otherwise be missed. The present patent application focuses on the matching and entry insertion functions of these tools. These functions operate in conjunction with a company's existing CRM and communication systems without requiring modification to these systems, and are capable of serving multiple, unrelated organizations.
Automatic insertion of communication-related entries into a company's CRM system requires that each relevant communication record be matched to the correct CRM record. Correct matching is a major challenge, since the same salesperson, customer, or product may appear in many different communication records and CRM records. Incorrect matches introduce “noise” into the CRM system that reduces and can even negate the usefulness of the automatic tool.
The embodiments disclosed herein overcome these problems by a process of multi-tier learning, applied at three different levels over multiple different organizations: globally, over all of the organizations; at the company level, for each organization; and at the user level, for each person within an organization.
In the disclosed embodiments, the matching tool obtains and stores CRM records and communication records belonging to the organization from each organization among multiple different organizations. The CRM records comprise multiple CRM fields containing CRM data, including communication entries. Each communication record likewise comprises multiple communication record fields containing data relating to a communication and identifying at least one user of the communication in question within the organization. Typically, the corpus of communication records collected by the matching tool includes a set of the records that have already been assigned to respective CRM records, and can thus be used as a training set for purposes of the learning process.
To initiate the learning process, the matching tool computes respective reference vectors of feature values for the communication records in the training set. Each feature value indicates a degree of correspondence between a communication datum in a specified field of a given communication record and a CRM datum in a specified field of the CRM record to which the given communication record has been assigned. For example, at the simplest level, a feature value may be one or zero to indicate whether or not one of the names (in the form of e-mail addresses) in the “to” field of an e-mail record is identical to (the e-mail address of) the “owner” of a CRM record; but typically, tens or hundreds of such features are evaluated and included in the feature vector. As other examples, the feature values may indicate a relation between a date of the given communication record and timeline of the respective CRM record, or company and product names occurring both in the given communication record and in the respective CRM record.
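The feature examples just described can be illustrated with a minimal Python sketch. The `Email` and `CrmRecord` types, their field names, and the five features chosen here are hypothetical and are not taken from the disclosure, which contemplates tens or hundreds of features.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Email:  # hypothetical communication record
    sender: str
    to: list[str]
    cc: list[str]
    sent: date
    body: str


@dataclass
class CrmRecord:  # hypothetical CRM record (e.g. an opportunity)
    owner: str
    opened: date
    closed: date
    account_name: str


def feature_vector(email: Email, record: CrmRecord) -> list[int]:
    """Return binary feature values for one e-mail/CRM record pair."""
    owner = record.owner.lower()
    return [
        int(email.sender.lower() == owner),                      # "from" matches owner
        int(owner in (a.lower() for a in email.to)),             # "to" contains owner
        int(owner in (a.lower() for a in email.cc)),             # "cc" contains owner
        int(record.opened <= email.sent <= record.closed),       # date falls in record timeline
        int(record.account_name.lower() in email.body.lower()),  # company name appears in text
    ]


email = Email("rep@acme.example", ["buyer@intel.example"], [],
              date(2014, 3, 5), "Following up on the Intel quote.")
record = CrmRecord("rep@acme.example", date(2014, 1, 1), date(2014, 6, 30), "Intel")
print(feature_vector(email, record))  # [1, 0, 0, 1, 1]
```

Each position in the returned list corresponds to one feature, so the same ordering must be used for every pair in order for learned weights to line up with feature values.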
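Based on these reference vectors, the matching tool computes global weights for the feature values, by evaluating the reference vectors computed for all of the different organizations; company weights for each organization, by evaluating the reference vectors computed over the records belonging to that organization; and user weights for each person in a given organization, by evaluating the reference vectors computed over the communication records that identify that person as the user.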
The matching tool then applies these weights in classifying and assigning further communication records that are not in the initial training set. For each of these further records, the tool computes feature values and applies the global weights, company weights, and user weights to the corresponding feature vectors in order to assign the communication records to respective CRM records. The tool then inserts entries in the respective CRM records corresponding to the communication records that have been assigned in this manner.
Typically, the matching tool computes a respective vector for each of these further communication records and each candidate CRM record to which the communication record might be assigned, and uses the feature vector in computing a score. In some embodiments, the matching tool computes the score as a weighted sum over the feature values in the respective vector using the global weights, company weights, and user weights, and then assigns the communication record to a candidate CRM record if the weighted sum meets a predefined criterion, such as if the score exceeds a certain threshold. In the disclosed embodiments, the matching tool applies a support vector machine (SVM) to the reference vectors of the training set in order to define classifiers that assign the communication records to the CRM records and provide values of the global weights, company weights, and user weights.
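A weighted-sum decision of this kind might be sketched as follows. How the three weight sets are combined is an assumption here: the sketch treats the company and user weights as difference values added to the global weights, and the function names, record identifiers, weights, and threshold are all hypothetical.

```python
def effective_weights(global_w, company_delta, user_delta):
    """Combine the three tiers into one weight per feature (assumed additive)."""
    return [g + c + u for g, c, u in zip(global_w, company_delta, user_delta)]


def score(features, weights):
    """Weighted sum over the feature values of one candidate pair."""
    return sum(f * w for f, w in zip(features, weights))


def assign(features_by_candidate, weights, threshold):
    """Return the candidate CRM record ids whose score exceeds the threshold."""
    return [crm_id for crm_id, feats in features_by_candidate.items()
            if score(feats, weights) > threshold]


weights = effective_weights(
    global_w=[0.5, 0.5, -0.25],
    company_delta=[0.25, 0.0, 0.0],
    user_delta=[0.0, -0.25, 0.0],
)
candidates = {"opp-1": [1, 1, 0], "opp-2": [0, 1, 1]}
print(weights)                              # [0.75, 0.25, -0.25]
print(assign(candidates, weights, 0.5))     # ['opp-1']
```

In this example the first candidate scores 1.0 and the second scores 0.0, so only the first meets the criterion.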
The global weights, company weights, and user weights may be defined serially. In other words, after computing the global weights, the matching tool evaluates the reference vectors computed over the CRM records and communication records belonging to each organization in order to find differences between the global weights and the company weights for the organization. Then, after computing the company weights for a given organization, the matching tool evaluates the reference vectors computed over the communication records belonging to each person in the given organization in order to find differences between the company weights and the user weights for the person. The company weights and user weights may then be stored and applied in the matching process as difference values, relative to the global or respective company weight.
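The serial scheme can be sketched as learning a difference vector on top of a fixed base. The disclosed embodiments use a support vector machine; the pure-Python hinge-loss sub-gradient trainer below is a simplified stand-in chosen only to keep the example self-contained, and the feature layout, training data, and hyperparameters are hypothetical.

```python
def score(weights, x):
    """Weighted sum of feature values."""
    return sum(w * xi for w, xi in zip(weights, x))


def train_delta(samples, base, epochs=50, lr=0.1, reg=0.01):
    """Learn a difference vector so that (base + delta) separates the samples.

    samples: list of (feature_vector, label) with label +1 (match) or -1.
    Only delta is regularized, so it stays small where the base already fits.
    """
    delta = [0.0] * len(base)
    for _ in range(epochs):
        for x, y in samples:
            w = [b + d for b, d in zip(base, delta)]
            violated = y * score(w, x) < 1  # hinge-loss margin check
            for i, xi in enumerate(x):
                grad = reg * delta[i] - (y * xi if violated else 0.0)
                delta[i] -= lr * grad
    return delta


def add(a, b):
    return [x + y for x, y in zip(a, b)]


# Hypothetical feature layout: [bias, to_matches_owner, cc_matches_owner]
all_orgs = [([1, 1, 0], +1), ([1, 0, 1], -1)]  # pooled over all organizations
one_org = [([1, 0, 1], +1)]                    # this organization differs

global_w = train_delta(all_orgs, base=[0.0, 0.0, 0.0])
company_delta = train_delta(one_org, base=global_w)  # stored as a difference
company_w = add(global_w, company_delta)
```

The same call, with `company_w` as the base and one person's samples as input, would yield a user-level difference vector.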
The matching tool may assemble the initial training set automatically, by identifying in the CRM database certain communication records that have already been associated with particular CRM records (for example, communication entries that have been saved in the CRM system manually by system users). After the matching tool has processed communication records to assign them to CRM records and has accordingly inserted entries in the CRM records, the matching tool may subsequently identify the inserted entries to which modifications were made by users of the CRM records. Such modifications may comprise, for example, deletion of the assigned communication records, reassignment of the assigned communication records to other CRM records, and/or changes in entries in the CRM records corresponding to the assigned communication records. These sorts of modifications can be particularly useful in correcting the weights, and the matching tool applies them accordingly.
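One way to turn such entries into labeled training pairs is sketched below. The entry dictionaries, their keys, and the labeling policy (a deletion or reassignment counts as a negative example for the original pairing) are assumptions made for illustration, not details given in the disclosure.

```python
def training_pairs(entries):
    """Convert CRM communication entries into (comm_id, crm_id, label) triples.

    label is +1 for a pairing a user made or confirmed, -1 for one a user rejected.
    """
    pairs = []
    for e in entries:
        if e["source"] == "manual":
            # Entered by a user: trusted positive example.
            pairs.append((e["comm_id"], e["crm_id"], +1))
        elif e.get("modification") == "deleted":
            # Automatic entry removed by a user: negative example.
            pairs.append((e["comm_id"], e["crm_id"], -1))
        elif e.get("modification") == "reassigned":
            # Automatic entry moved by a user: negative for old, positive for new.
            pairs.append((e["comm_id"], e["crm_id"], -1))
            pairs.append((e["comm_id"], e["reassigned_to"], +1))
    return pairs


entries = [
    {"comm_id": "m1", "crm_id": "opp-1", "source": "manual"},
    {"comm_id": "m2", "crm_id": "opp-1", "source": "auto", "modification": "deleted"},
    {"comm_id": "m3", "crm_id": "opp-2", "source": "auto",
     "modification": "reassigned", "reassigned_to": "opp-3"},
    {"comm_id": "m4", "crm_id": "opp-2", "source": "auto"},  # untouched: not used here
]
print(training_pairs(entries))
```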
FIG. 1 is a block diagram that schematically illustrates a system 20 for communications and customer relationship management, which uses a CRM matching and analysis tool 22 in accordance with an embodiment of the present invention. System 20 serves an organization 24, which includes many users 26, in its interaction with customers 28. Much of this interaction has to do with communication via public networks 30, such as the Internet, and the elements of system 20 are typically (although not necessarily) connected physically to one another via the Internet. Tool 22 interacts with and serves not only organization 24, but also other organizations 52, 54, 56, . . . , represented as enterprises B, C, D, . . . . Organizations 24, 52, 54, 56, . . . , are independent of one another, in the sense that they typically have different sets of users 26 and customers 28; deal with different lines of products and services; and are unaffiliated with one another in terms of ownership of the organizations.
Users 26 in organization 24 may generate various types of communication records, relating to e-mail, calendar entries, voice calls, and video conferences, for example, and tool 22 may receive and process records of all such types of communications. For the sake of simplicity, however, the present example will focus on e-mails exchanged between user computers 32 and customer computers 34 via network 30. Such e-mails are transmitted and received via a mail server 36, which may be either a dedicated physical server maintained in organization 24 or a remote server (in a cloud-based service, for example) that is maintained by a service provider. In the pictured example, user computers 32 communicate with mail server 36 and other components of system 20 via an enterprise network 38. Alternatively or additionally, user computers 32 may include mobile devices, such as smart phones and tablets, as well as home computers, which are used outside the confines of organization 24 and communicate with mail server 36 and other system components via public network 30 and/or other private networks. In any case, mail server 36 maintains its communication records in a memory 40, such as disk storage, typically in the form of a database.
A CRM server 42 maintains CRM records for organization 24 in a memory 44, typically in the form of a CRM database. As in the case of the mail server, CRM server 42 may comprise a dedicated server within organization 24, as shown in FIG. 1, or may be maintained remotely, possibly as a cloud-based service.
Matching and analysis tool 22 comprises a processor 46, which communicates with mail server 36 and CRM server 42, as well as with other organizations 52, 54, 56, . . . , typically (although not necessarily) via network 30. Processor 46 is coupled to a memory 48, which stores both program code, for carrying out the functions described herein, and data collected from servers 36 and 42. The functions of tool 22 are described in detail hereinbelow.
Processor 46 typically comprises one or more general-purpose computer processors, which are programmed in software to carry out the functions that are described herein. The software program code may be stored in memory 48, typically in tangible, non-transitory storage media, such as optical, magnetic, or electronic memory media. Tool 22 may also comprise a user interface 50, as well as other appropriate communication and computing components that are known in the art. Although tool 22 is shown in FIG. 1 as a single physical unit, the functions of tool 22 may alternatively be distributed over multiple computers and may be implemented in a cloud-based service. Further additionally or alternatively, some or all of the functions of tool 22 may be integrated with CRM server 42 or mail server 36.
FIG. 2 is a flow chart that schematically illustrates a method for automated CRM data entry performed by tool 22, in accordance with an embodiment of the present invention. The method includes two stages: a learning stage 60 and a matching stage 62. In learning stage 60, tool 22 collects and analyzes data from mail server 36 and CRM server 42 in order to identify features and compute the appropriate weights to assign to each. These weights are then applied in matching stage 62 in order to assign communication records to CRM records. This assignment may take the form of a Customer Communication Graph (CCG), which represents the full matching result of stage 62 across all entities. The CCG is a bipartite graph containing two sets: U, all communication entities, and V, all CRM records. An edge e exists between vertices u and v if and only if u and v are found to match in stage 62.
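The CCG described above can be represented as a pair of adjacency maps, one per vertex set. The class and method names in this minimal Python sketch are hypothetical.

```python
from collections import defaultdict


class CustomerCommunicationGraph:
    """Bipartite graph: U = communication entities, V = CRM records."""

    def __init__(self):
        self._by_comm = defaultdict(set)  # u -> {v, ...}
        self._by_crm = defaultdict(set)   # v -> {u, ...}

    def add_match(self, comm_id, crm_id):
        """Add edge (u, v) when the matching stage finds a match."""
        self._by_comm[comm_id].add(crm_id)
        self._by_crm[crm_id].add(comm_id)

    def has_edge(self, comm_id, crm_id):
        return crm_id in self._by_comm.get(comm_id, set())

    def crm_records_for(self, comm_id):
        return set(self._by_comm.get(comm_id, set()))

    def communications_for(self, crm_id):
        return set(self._by_crm.get(crm_id, set()))


ccg = CustomerCommunicationGraph()
ccg.add_match("mail-1", "opp-7")
ccg.add_match("mail-2", "opp-7")
print(sorted(ccg.communications_for("opp-7")))  # ['mail-1', 'mail-2']
```

Keeping both directions indexed makes it cheap to list all communications for a CRM record as well as all CRM records for a communication.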
Stages 60 and 62 may be performed in alternation, whereby the results of learning stage 60 are updated from time to time (for example, once a week or once a month) using new inputs that have been collected in the interim, and the updated weights are then applied in refining the matching results in stage 62.
The results of automatic assignment of communication records to CRM records in stage 62 may be used in proposing enhancements to the CRM system maintained by organization 24, at a CRM enhancement stage 64. These enhancements may include, for example:
Learning stage 60 begins with collection of CRM records from memory 44 and communication records from memory 40, at a data collection step 70. Typically, these records are held in memory 48 during processing. Processor 46 identifies some of these records as training data, including specifically those communication records regarding which entries have already been made in corresponding CRM records. For example, processor 46 may find e-mail entries that users 26 have assigned to particular opportunities and consequently entered manually in appropriate CRM records, in order to keep track of e-mail communications that they have conducted with customers 28 or other parties regarding opportunities listed by CRM server 42. In such cases, tool 22 may be confident that the e-mails in question are correctly assigned to the corresponding CRM records. As another example, processor 46 may identify e-mail entries that were made automatically by tool 22 and were later modified by users 26, and may incorporate these entries in the training set, as well.
Processor 46 identifies and computes values of features of the communication records and CRM records in the training set, at a feature identification step 72. A “feature” in this context refers to an attribute shared by a communication record (such as an e-mail) and a particular CRM record. The value of the feature indicates, for each candidate CRM record to which a particular communication record may be assigned, the degree to which the attributes in question match. For example, each e-mail/CRM record pair may have features indicating whether the “to” field, “from” field, or “cc” field of the e-mail contains the name of the user who is the “owner” of the CRM record in question. (The term “names,” as used in the context of the present description and in the claims, should be understood as including e-mail addresses within its scope.) The values of these features are binary: one or zero to indicate whether or not the fields match. Other features, such as match scores computed over the textual content of the e-mail message, may have continuous values, but processor 46 may normalize and binarize them, as well.
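The normalization and binarization of continuous-valued features might look like the following sketch. Min-max scaling and a cutoff of 0.5 are assumptions for illustration; the description does not specify the normalization method or the cutoff.

```python
def normalize(values):
    """Min-max scale raw match scores to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: nothing to distinguish
    return [(v - lo) / (hi - lo) for v in values]


def binarize(values, cutoff=0.5):
    """Map normalized scores to binary feature values."""
    return [int(v >= cutoff) for v in values]


raw_text_scores = [2.0, 8.0, 5.0, 10.0]  # hypothetical text-match scores
scaled = normalize(raw_text_scores)
print(scaled)            # [0.0, 0.75, 0.375, 1.0]
print(binarize(scaled))  # [0, 1, 0, 1]
```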
Processor 46 typically computes an entire vector of different feature values for each communication/CRM record pair that is identified in the training set. Details of further features that may be incorporated in the feature vector and methods for their computation are described hereinbelow with reference to FIG. 3.
Using the feature vectors computed at step 72, processor 46 applies an automatic learning process to assign weights to all the features in the vector, at a weight learning step 74. The weight of each feature indicates the relative correlation between the value of the feature and the likelihood of a match between the communication record and the candidate CRM record for which the feature is computed. In other words, assuming that the feature vector is a vector of binary values, a strong positive weight for a given feature means that a match is likely when the feature has the value one, while a negative weight indicates that a match is unlikely in such a case. (For example, the inventors have found that although features indicating that the âtoâ or âfromâ field of an e-mail matches the âownerâ of a CRM record typically have strong positive weights, the feature indicating a match between the âccâ field of the e-mail and the âownerâ should generally receive a negative weight.)
As explained above, learning step 74 actually comprises three sub-steps: a global learning step 76, a company learning step 78, and a user learning step 80. These steps are typically (although not necessarily) carried out serially, so that the results of step 76 serve as the basis for step 78, and the results of step 78 serve as the basis for step 80. Details of step 74 and its sub-steps are presented in FIG. 4. Upon completion of step 74, processor 46 has computed three sets of weights: global weights {gi} over all organizations sampled by tool 22; company weights {ci} for each organization; and user weights {ui} for each user in each of the organizations. Assuming the feature vector contains an array of n feature values, there will similarly be n weights in each set, i.e., i=1, . . . , n. The weights are typically normalized to a predefined range, such as [-1, 1].
In matching stage 62, processor 46 computes the feature vectors for communication records that have not yet been classified and stored in the CRM database in memory 44, at a vector evaluation step 82. The feature values are computed using the same criteria as were applied in step 72. For each communication record, processor 46 computes feature vectors with respect to a number of candidate CRM records that are identified as likely matches, or possibly with respect to all CRM records of the organization to which the communication record belongs.
In order to choose the candidate CRM records for a given e-mail, processor 46 may, for example, extract all domains (such as company.com) from the addresses of the e-mail recipients and from the e-mail text (including recipients of all other e-mails in the same thread). The processor may also extract named entities from the e-mail text. Processor 46 then chooses as candidates those CRM records that have a relation to at least one of the domains or entities extracted. (For example, if the e-mail was sent to someone@intel.com, all opportunities related to all Intel accounts may be selected for matching.) The matching process is not applied to e-mails that are not relevant to customer relationships, such as e-mails sent by robots (automatic e-mails), spam, and internal e-mails.
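The domain-based part of this candidate selection can be sketched as follows. The sketch assumes a hypothetical index, `records_by_domain`, that maps each domain to the identifiers of related CRM records; named-entity extraction and thread handling are omitted.

```python
from typing import Dict, Iterable, List, Set


def extract_domains(addresses: Iterable[str]) -> Set[str]:
    """Collect the lower-cased domain part of each address that contains '@'."""
    domains: Set[str] = set()
    for address in addresses:
        _, sep, domain = address.strip().lower().rpartition("@")
        if sep and domain:
            domains.add(domain)
    return domains


def choose_candidates(addresses: Iterable[str],
                      records_by_domain: Dict[str, List[str]]) -> List[str]:
    """Return IDs of CRM records related to at least one extracted domain.

    records_by_domain is a hypothetical index from a domain to CRM record IDs.
    """
    candidates: List[str] = []
    for domain in sorted(extract_domains(addresses)):
        for record_id in records_by_domain.get(domain, []):
            if record_id not in candidates:  # keep each candidate once
                candidates.append(record_id)
    return candidates
```

For instance, with an index entry for intel.com, an e-mail sent to someone@intel.com would yield every record listed under that domain as a candidate.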
For each feature vector (f1, . . . , fn) of each communication record/CRM record pair, processor 46 computes a score s, using the weights found at step 74:
$$s = \sum_{i=1}^{n} (g_i + c_i + u_i)\, f_i \qquad (1)$$
Processor 46 treats the pair of records as a match if the corresponding score s exceeds a predefined threshold. The threshold may be set empirically, in order to maximize the accuracy of assignment, i.e., to achieve the desired balance between false-positive assignments and false-negative (missed) matches.
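The score of formula (1) and the threshold test can be illustrated by the short sketch below. The function names and the example threshold values are assumptions for illustration; the disclosure states only that the threshold is set empirically.

```python
from typing import Sequence


def match_score(f: Sequence[float], g: Sequence[float],
                c: Sequence[float], u: Sequence[float]) -> float:
    """Compute s = sum_i (g_i + c_i + u_i) * f_i, as in formula (1)."""
    if not (len(f) == len(g) == len(c) == len(u)):
        raise ValueError("feature and weight vectors must have equal length")
    return sum((gi + ci + ui) * fi for gi, ci, ui, fi in zip(g, c, u, f))


def is_match(f: Sequence[float], g: Sequence[float], c: Sequence[float],
             u: Sequence[float], threshold: float) -> bool:
    """A pair is treated as a match when the score exceeds the threshold."""
    return match_score(f, g, c, u) > threshold
```

Because the three weight sets are summed per feature, the company and user weights act as corrections to the global weights rather than as independent models.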
For each match identified at step 82, tool 22 adds a new communication entry in the corresponding CRM record held by server 42, at an entry creation step 84. Typically, processor 46 transmits a message over network 30 to CRM server 42, instructing server 42 to create the entry in the appropriate record in memory 44, as though one of users 26 had submitted such an instruction. The updated entry will then be available to users 26 for subsequent reference, as well as to tool 22 for use at step 64. In addition, if a user finds that some aspect of an entry of this sort in a CRM record is incorrect, the user may correct the entry, for example, by deleting it, editing it (making changes in the record), or moving the entry to a different record. CRM server 42 keeps track of such changes and reports them to tool 22 at the next iteration through step 70.
FIG. 3 is a block diagram that schematically illustrates a method for construction of a feature vector 130 between a communication record, such as an e-mail 90, and a candidate CRM record 110, in accordance with an embodiment of the present invention. The types and features of the CRM records that are relevant in this regard are listed below in Appendix A.
As a preliminary step, as noted above, in order to choose the e-mails to be matched to CRM records, processor 46 of tool 22 identifies the e-mails that are related directly to customer accounts, opportunities and leads, and discards other e-mails (such as automatic, spam, and internal e-mails) from further processing. In addition, to enhance the accuracy of matching between the chosen e-mails and CRM records, processor 46 cleanses the CRM and e-mail data that are to be used in the matching process. These aspects of the present embodiment are described below in Appendix B.
Processor 46 then extracts the following data from each e-mail 90 that is chosen for processing:
Processor 46 extracts the following data from each CRM record 110:
The above communication and CRM data are listed by way of example, and additional features may similarly be extracted and applied in the matching process, as will be apparent to those skilled in the art after reading the present disclosure.
In order to choose the candidate CRM records 110 to which a given e-mail 90 may be matched, processor 46 sorts the CRM records in relation to the characteristics of the e-mail, for example:
Feature vector 130 may contain elements corresponding to the following features, for example, wherein each vector element receives the value one or zero depending on whether the feature evaluates as true or false:
For e-mails 90 in the training set that is used in step 74 (FIG. 2), a match flag 132 is maintained to indicate whether the e-mail is correctly matched to CRM record 110. Most commonly, flag 132 is set depending on user input, i.e., an indication by one of users 26 that this particular e-mail should or should not be assigned to this CRM record 110.
For this purpose, tool 22 automatically matches e-mails that were manually entered into the CRM database in memory 44, or sent from CRM server 42, to e-mails found in a user mailbox on mail server 36. This matching may be based on the following logic:
Tool 22 may also match calendar items (such as meetings and calls) in similar fashion, by treating the calendar item as an e-mail. Each field in the calendar item record is treated as an e-mail message field, mutatis mutandis (for example, the meeting subject replaces the e-mail subject, and the meeting participants replace the e-mail recipients). The matching algorithm for e-mails is then applied to the transformed calendar item.
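As a minimal sketch of this transformation, a calendar item can be mapped field by field onto an e-mail-like record. All dictionary keys below are hypothetical; only the subject and participant mappings are stated in the text, and the organizer, description, and start-time mappings are assumptions added for completeness.

```python
from typing import Any, Dict


def calendar_item_as_email(item: Dict[str, Any]) -> Dict[str, Any]:
    """Map hypothetical calendar fields onto hypothetical e-mail fields."""
    return {
        "subject": item.get("meeting_subject", ""),         # stated in the text
        "recipients": list(item.get("participants", [])),   # stated in the text
        "sender": item.get("organizer", ""),                # assumption
        "body": item.get("description", ""),                # assumption
        "date": item.get("start_time"),                     # assumption
    }
```

The resulting record can then be passed to the same feature computation that is used for e-mails.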
FIG. 4 is a flow chart that schematically shows details of learning step 74, in accordance with an embodiment of the present invention. Processor 46 first computes the global model weights over all organizations served by tool 22, at global learning step 76. In this step, the processor computes respective feature vectors 130 for all e-mail/CRM record pairs in the training set and then finds the respective weights gi of the features that, when inserted into formula (1) above, will provide the widest possible separation between correct assignments and incorrect assignments. In other words, when such weights are applied to feature vectors computed for e-mail/CRM pairs outside the training set, correct assignments of e-mails to CRM records (i.e., assignments corresponding to those that would be made by a human user) will receive high positive scores, while incorrect assignments will receive much lower scores.
Various methods that are known in the art may be used in step 76 (and in subsequent steps 78 and 80) in computing optimal weights using the given training set. The inventors have found that a support vector machine (SVM) gives good results in generating classifiers (in terms of feature weights) that maximize the distance, in feature space, between different CRM records to which e-mails are to be assigned. For example, the Python™ scikit-learn SVM package (specifically the linear kernel SVM) can be adapted for this purpose with good results.
Having the global set of weights found at step 76 is useful in efficient learning and in making initial assignments of e-mails to CRM records for organizations newly served by tool 22. Because of the statistical nature of the classification, and in particular due to variations in communication and business practices among different organizations served by tool 22, however, a significant number of incorrect classifications can be expected if only the global weights gi are used. For this reason, processor 46 evaluates the company model weights for each organization at step 78. The company model weights are computed in terms of the difference of each weight ci relative to the corresponding global weight gi. The company weights are computed in like fashion, using an SVM, for example, except that the training set in this case includes only the e-mail/CRM record pairs belonging to the organization in question.
Finally, given both the global and company weights, processor 46 computes user model weights ui for each user at step 80. These weights account for the differences in e-mail and business practices among different users within each organization and are intended to achieve the maximal possible separation among e-mail classifications. Again, the user model weights are computed in terms of the difference of each weight ui relative to the corresponding company weight ci, using only the e-mails belonging to each particular user in computing his or her set of weights. Processor 46 thus applies the SVM for each user individually at this step.
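The serial global, company, and user learning can be sketched as follows. To keep the example free of external dependencies, plain subgradient descent on a hinge loss stands in for the SVM package mentioned above; this substitution, the hyperparameters, and the function names are assumptions of the sketch and not the disclosed implementation. Labels are +1 for a correct assignment and -1 for an incorrect one.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {+1, -1})


def train_delta(samples: Sequence[Sample], base: Sequence[float],
                epochs: int = 200, lr: float = 0.1, reg: float = 0.01) -> List[float]:
    """Learn a correction `delta` so that base + delta separates the samples.

    Minimizes hinge loss plus an L2 penalty on delta by subgradient descent.
    """
    delta = [0.0] * len(base)
    for _ in range(epochs):
        for features, label in samples:
            score = sum((b + d) * x for b, d, x in zip(base, delta, features))
            for i, x in enumerate(features):
                grad = reg * delta[i]
                if label * score < 1.0:  # margin violated: hinge term is active
                    grad -= label * x
                delta[i] -= lr * grad
    return delta


def train_hierarchy(all_samples: Sequence[Sample],
                    company_samples: Sequence[Sample],
                    user_samples: Sequence[Sample]):
    """Return (g, c, u): global weights, then company and user differences."""
    n = len(all_samples[0][0])
    g = train_delta(all_samples, [0.0] * n)                # global model
    c = train_delta(company_samples, g)                    # difference from g
    u = train_delta(user_samples, [gi + ci for gi, ci in zip(g, c)])  # difference from g + c
    return g, c, u
```

Training each level as a difference from the level above means that an organization or user with little training data simply inherits the broader model, since the corresponding differences remain near zero.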
Tool 22 updates its set of training data from time to time, at a benchmarking step 140, based on user entries made at step 84. Processor 46 then repeats steps 76, 78 and 80 in order to refine the weight values and improve the classification results. For example, processor 46 may recompute the global model once a month, the company model once a week, and the user model once a day, depending on the availability of new training data. Recomputation steps may be skipped if no new training data are available.
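The example recomputation schedule can be expressed as a small helper, sketched below. The interval table uses the monthly, weekly, and daily figures from the text, with "once a month" approximated as 30 days; the function and key names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Example intervals from the text; "once a month" is approximated as 30 days.
RETRAIN_INTERVALS = {
    "global": timedelta(days=30),
    "company": timedelta(days=7),
    "user": timedelta(days=1),
}


def models_due(last_trained: Dict[str, datetime], now: datetime,
               has_new_data: bool) -> List[str]:
    """Return the model levels to recompute, in global, company, user order."""
    if not has_new_data:
        return []  # recomputation is skipped when no new training data exist
    return [level for level, interval in RETRAIN_INTERVALS.items()
            if now - last_trained[level] >= interval]
```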
At benchmarking step 140, tool 22 updates its sets of training data based on user inputs to CRM server 42, and may typically add the following sorts of information:
The appropriate components of learning step 74 are then repeated, and the resulting refined weights are applied in matching new communication records collected at step 84 from mail server 36.
Although the embodiments described herein relate specifically to CRM systems and records, the principles of the present invention may similarly be applied, mutatis mutandis, in enterprise record-keeping systems of other sorts. It will thus be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.
Given an e-mail message, tool 22 first identifies its conversation type. The conversation type can be one of the following:
The participants of an e-mail message are defined to be the sender and all the recipients of the message.
A conversation in which all participants are related to the Company. There are several ways of verifying whether an e-mail address (of a participant) is internal:
A conversation between the Company and one of its Accounts (found in the CRM database). A participant's e-mail address is identified as related to an Account if:
A conversation between the Company and one of its Leads (explicitly found in the CRM database). At the time of the e-mail message, the lead status should be "non-converted."
A conversation between the Company and participants having e-mail addresses from a single domain only. This domain should not match any known Account domains.
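A rough sketch of how the four conversation types described above might be distinguished is given below. The type labels, the order in which the checks are applied, and the representation of the CRM data as sets of domains and addresses are all assumptions of this example; the specific verification criteria are not reproduced here.

```python
from typing import Iterable, Set


def conversation_type(participants: Iterable[str],
                      company_domains: Set[str],
                      account_domains: Set[str],
                      open_lead_addresses: Set[str]) -> str:
    """Classify a conversation from its participants' e-mail addresses.

    The returned labels are hypothetical names for the four types described.
    """
    addresses = [p.strip().lower() for p in participants]
    domains = {a.rpartition("@")[2] for a in addresses}
    external = domains - company_domains
    if not external:
        return "internal"        # all participants are related to the Company
    if external & account_domains:
        return "account"         # a participant belongs to a known Account
    if any(a in open_lead_addresses for a in addresses):
        return "lead"            # a participant is a non-converted Lead
    if len(external) == 1:
        return "single-domain"   # one external domain, not a known Account
    return "unclassified"
```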
Tool 22 automatically identifies e-mails that are not business-process related. There are two kinds of traits that define such e-mails:
1) Non-Business (Irrelevant) E-Mail Address
Any e-mail conversation that involves an address that is marked as irrelevant (see below) will be marked as an irrelevant message.
An address is marked as an irrelevant address according to the following criteria:
2) Non-Business (Irrelevant) E-Mail Message
An e-mail message is marked as irrelevant according to the following criteria:
An e-mail can be of one or more special types. Classification into these special types helps tool 22 to assess the relevance of an e-mail to a business interaction.
An e-mail message is marked as an automatic "Out of office" message according to the following features:
An e-mail message is marked as a scheduling message according to the following features:
An e-mail that matches one of the delivery notification patterns is considered to be a "delivery status notification." The list of patterns may be generated from open sources as well as from company proprietary data.
An e-mail is considered a "draft" if it appears only in a single user mailbox and is marked as a draft e-mail by the e-mail server.
As part of the matching process, the system determines whether a given list of domains is related to a given textual value (e.g., company name).
Unifying Domain Names
As a part of the matching process, a unified domain name is used. The system unifies the domain name by following these steps:
The steps for finding a match between an input string and a domain name:
Tool 22 checks all combinations of abbreviations of these words against the domain name, testing whether any abbreviation at least two characters long is found as a substring of the domain name, or whether the domain name is found as a substring of the abbreviation. Tool 22 identifies a match if such a combination is found and the ratio between the lengths of the domain name and the abbreviation is above a certain threshold.
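The abbreviation check can be sketched as follows, under several assumptions that are not stated in the text: an abbreviation of a word is taken to be any prefix of that word (including the empty prefix, so that a word may be skipped), a combination is the in-order concatenation of one prefix per word, the length ratio is taken as shorter over longer, and the threshold value of 0.5 is arbitrary.

```python
from itertools import product
from typing import Iterator, Sequence


def abbreviations(words: Sequence[str]) -> Iterator[str]:
    """Yield concatenations of one (possibly empty) prefix per word, length >= 2."""
    prefix_sets = [[w[:k] for k in range(len(w) + 1)] for w in words]
    for combo in product(*prefix_sets):
        abbr = "".join(combo)
        if len(abbr) >= 2:
            yield abbr


def matches_domain(text: str, domain_name: str, min_ratio: float = 0.5) -> bool:
    """Return True if some abbreviation of `text` matches the unified domain name."""
    name = domain_name.lower()
    for abbr in abbreviations(text.lower().split()):
        if abbr in name or name in abbr:
            # Length ratio, shorter over longer (an assumed reading of the text).
            ratio = min(len(abbr), len(name)) / max(len(abbr), len(name))
            if ratio > min_ratio:
                return True
    return False
```

Under these assumptions, the input "International Business Machines" matches the domain name "ibm", whereas "Intel" does not match a much longer domain name that merely begins with the same letters, because the length ratio falls below the threshold.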
1. A method for data processing, comprising:
obtaining from each organization among multiple different organizations:
a respective first plurality of customer relationship management (CRM) records belonging to the organization, each CRM record comprising multiple CRM fields containing first data; and
a respective second plurality of communication records belonging to the organization, including a set of the communication records that have been assigned to respective CRM records within the first plurality, each communication record comprising multiple communication record fields containing second data relating to a communication and identifying at least a user of the communication;
computing respective reference vectors of feature values for the communication records in the set, each feature value indicating a degree of correspondence between a second datum in a specified communication record field of a given communication record and a first datum in a specified CRM field of a respective CRM record to which the given communication record has been assigned;
computing global weights for the feature values by evaluating the reference vectors computed for all of the different organizations;
computing, for each organization, respective company weights for the feature values by evaluating specifically the reference vectors computed over the CRM records and communication records belonging to the organization;
computing, for each person belonging to a given organization, respective user weights for the feature values by evaluating specifically the reference vectors computed over the communication records that identify the person as the user;
computing further vectors of the feature values for the communication records that are not in the set, and applying the global weights, company weights, and user weights to the further vectors in order to assign the communication records that are not in the set to respective ones of the CRM records; and
inserting entries in the respective ones of the CRM records to indicate the communication records that have been assigned thereto.
2. The method according to claim 1, wherein computing the further vectors of the feature values comprises computing a respective vector for each communication record and each of one or more candidate CRM records for assignment of the communication record thereto, and wherein assigning the communication record comprises applying the respective vector in deciding whether to assign the communication record to each of the candidate CRM records.
3. The method according to claim 2, wherein applying the respective vector comprises computing a weighted sum over the feature values in the respective vector using the global weights, company weights, and user weights, and assigning the communication record to a candidate CRM record if the weighted sum meets a predefined criterion.
4. The method according to claim 1, wherein computing the global weights, company weights, and user weights comprises applying a support vector machine to the reference vectors of the feature values for the communication records in the set in order to define classifiers that assign the communication records to the CRM records based on the global weights, company weights, and user weights.
5. The method according to claim 1, wherein computing the company weights comprises, after computing the global weights, evaluating the reference vectors computed over the CRM records and communication records belonging to each organization in order to find differences between the global weights and the company weights for the organization.
6. The method according to claim 5, wherein computing the user weights comprises, after computing the company weights for the given organization, evaluating the reference vectors computed over the communication records belonging to each person belonging to the given organization in order to find differences between the company weights and the user weights for the person.
7. The method according to claim 1, wherein obtaining the CRM records comprises identifying in the CRM records at least some of the communication records that have been associated with the respective CRM records, and including the identified communication records in the set for use in computing the weights.
8. The method according to claim 1, wherein obtaining the CRM records comprises, after inserting entries in the respective ones of the CRM records to indicate the communication records that have been assigned thereto, identifying the inserted entries to which modifications were made, after insertion in the CRM records, by users of the CRM records, and wherein computing the global weights, company weights, and user weights comprises applying the modifications in determining the weights.
9. The method according to claim 8, wherein the modifications comprise deletion of the assigned communication records, reassignment of the assigned communication records to other CRM records, and changes in entries in the CRM records corresponding to the assigned communication records.
10. The method according to claim 1, wherein the vectors comprise feature values indicative of personas occurring both in the given communication record and the respective CRM record.
11. The method according to claim 1, wherein the vectors comprise feature values indicative of a relation between a date of the given communication record and timeline of the respective CRM record.
12. The method according to claim 1, wherein the vectors comprise feature values indicative of company and product names occurring both in the given communication record and the respective CRM record.
13. Data processing apparatus, comprising:
a memory, configured to store records belonging to multiple different organizations, the records comprising, for each organization:
a respective first plurality of customer relationship management (CRM) records belonging to the organization, each CRM record comprising multiple CRM fields containing first data; and
a respective second plurality of communication records belonging to the organization, including a set of the communication records that have been assigned to respective CRM records within the first plurality, each communication record comprising multiple communication record fields containing second data relating to a communication and identifying at least a user of the communication; and
a processor, which is configured to compute respective reference vectors of feature values for the communication records in the set, each feature value indicating a degree of correspondence between a second datum in a specified communication record field of a given communication record and a first datum in a specified CRM field of a respective CRM record to which the given communication record has been assigned,
wherein the processor is configured to compute global weights for the feature values by evaluating the reference vectors computed for all of the different organizations, to compute, for each organization, respective company weights for the feature values by evaluating specifically the reference vectors computed over the CRM records and communication records belonging to the organization, and to compute, for each person belonging to a given organization, respective user weights for the feature values by evaluating specifically the reference vectors computed over the communication records that identify the person as the user, and
wherein the processor is configured to compute further vectors of the feature values for the communication records that are not in the set, to apply the global weights, company weights, and user weights to the further vectors in order to assign the communication records that are not in the set to respective ones of the CRM records, and to insert entries in the respective ones of the CRM records to indicate the communication records that have been assigned thereto.
14. The apparatus according to claim 13, wherein the further vectors of the feature values comprise a respective vector computed by the processor for each communication record and each of one or more candidate CRM records for assignment of the communication record thereto, and wherein the processor is configured to apply the respective vector in deciding whether to assign the communication record to each of the candidate CRM records.
15. (canceled)
16. The apparatus according to claim 13, wherein the processor is configured to apply a support vector machine to the reference vectors of the feature values for the communication records in the set in order to define classifiers that assign the communication records to the CRM records based on the global weights, company weights, and user weights.
17. The apparatus according to claim 13, wherein the processor is configured, after computing the global weights, to evaluate the reference vectors computed over the CRM records and communication records belonging to each organization in order to find differences between the global weights and the company weights for the organization.
18. (canceled)
19. The apparatus according to claim 13, wherein the processor is configured to identify in the CRM records at least some of the communication records that have been associated with the respective CRM records, and to include the identified communication records in the set for use in computing the weights.
20. The apparatus according to claim 13, wherein the processor is configured, after inserting entries in the respective ones of the CRM records to indicate the communication records that have been assigned thereto, to identify the inserted entries to which modifications were made, after insertion in the CRM records, by users of the CRM records, and to apply the modifications in determining the weights.
21. (canceled)
22. The apparatus according to claim 13, wherein the vectors comprise feature values indicative of personas occurring both in the given communication record and the respective CRM record.
23-24. (canceled)
25. A computer software product, comprising a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to obtain records belonging to multiple different organizations, the records comprising, for each organization:
a respective first plurality of customer relationship management (CRM) records belonging to the organization, each CRM record comprising multiple CRM fields containing first data; and
a respective second plurality of communication records belonging to the organization, including a set of the communication records that have been assigned to respective CRM records within the first plurality, each communication record comprising multiple communication record fields containing second data relating to a communication and identifying at least a user of the communication,
wherein the instructions cause the computer to compute respective reference vectors of feature values for the communication records in the set, each feature value indicating a degree of correspondence between a second datum in a specified communication record field of a given communication record and a first datum in a specified CRM field of a respective CRM record to which the given communication record has been assigned,
wherein the instructions cause the computer to compute global weights for the feature values by evaluating the reference vectors computed for all of the different organizations, to compute, for each organization, respective company weights for the feature values by evaluating specifically the reference vectors computed over the CRM records and communication records belonging to the organization, and to compute, for each person belonging to a given organization, respective user weights for the feature values by evaluating specifically the reference vectors computed over the communication records that identify the person as the user, and
wherein the instructions cause the computer to compute further vectors of the feature values for the communication records that are not in the set, to apply the global weights, company weights, and user weights to the further vectors in order to assign the communication records that are not in the set to respective ones of the CRM records, and to insert entries in the respective ones of the CRM records to indicate the communication records that have been assigned thereto.
26-36. (canceled)