Patent application title:

TRAINING PREDICTIVE MODELS BASED ON REWARD SIGNALS AND HYPERPARAMETER SEARCHING

Publication number:

US20250371437A1

Publication date:

2025-12-04

Application number:

18/680,861

Filed date:

2024-05-31

Smart Summary: Techniques are developed to help machine learning models suggest the best workflow for users of software applications. First, a training data set is created, which includes user features, workflow steps, and a reward metric. Next, different sets of hyperparameters are generated to train various predictive models. The best hyperparameter set is chosen based on how well each model performs. Finally, a machine learning model is trained using the selected hyperparameters and the training data, and then it is put into use.

Abstract:

Aspects of the present disclosure provide techniques for training and using machine learning models to predict and present an optimal workflow to a user of a software application. An example method generally includes generating a training data set including a plurality of exemplars including features associated with a user of a software application, a sequence of workflow steps presented to the user of the software application, and a reward metric. A plurality of hyperparameter sets for training a plurality of predictive models is generated. The plurality of predictive models are trained based on the plurality of hyperparameter sets. A hyperparameter set from the plurality of hyperparameter sets is selected based on performance metrics for each of the plurality of predictive models. A machine learning model is trained based on the selected hyperparameter set and the training data set, and the trained machine learning model is deployed.

Inventors:

Divya BEERAM, Fremont, CA, United States
Nan Jiang, Mountain View, CA, United States
Zhao HU, Fremont, CA, United States
Aakanksha SAH, Thousand Oaks, CA, United States
Piyush CHOUDHARY, San Jose, CA, United States
Yashwanth MUSIBOYINA, San Jose, CA, United States
Siwei (Stephen) YU, Ontario, Canada

Applicant:

Intuit Inc., Mountain View, CA, United States

Classification:

G06N20/20 » CPC main

Machine learning Ensemble learning

Description

INTRODUCTION

Aspects of the present disclosure relate to machine learning models.

BACKGROUND

Software applications can be consumed on a variety of devices, including desktop computers, laptops, tablets, smartphones, and the like. These applications may be native applications (e.g., applications for which an executable file is built specifically for that platform), web components hosted in a native application, or web applications in which data provided by a user is processed remotely. Generally, these applications implement various workflows which can be decomposed into a plurality of mini jobs (also referred to as workflow steps, sub-workflows, etc.) which can be shown in an arbitrary order. As the number of mini jobs included in a workflow increases, the number of sequences in which these mini jobs can be displayed to a user of the software application increases rapidly. For example, assuming each sequence presents every mini job exactly once, a workflow including three mini jobs has six possible sequences (3!); a workflow including four mini jobs has twenty-four possible sequences (4!); and a workflow including five mini jobs has one hundred twenty possible sequences (5!).

Different users of the software application may respond differently to different sequences of mini jobs in a workflow. For example, users with certain characteristics or associated with an entity with certain characteristics may respond differently to one sequence of mini jobs than to another sequence of mini jobs (e.g., may complete a workflow if a first sequence of mini jobs is presented to the user but may not complete the workflow if a second sequence of mini jobs is presented to the user).

Accordingly, techniques for presenting effective workflow sequences to a user of a software application are needed.

BRIEF SUMMARY

Certain embodiments provide a computer-implemented method for training predictive models to predict and present an optimal workflow to a user of a software application. An example method generally includes generating a training data set including a plurality of exemplars. Each exemplar generally includes features associated with a user of a software application, a sequence of workflow steps presented to the user of the software application, and a reward metric associated with the user of the software application and the sequence of workflow steps. A plurality of hyperparameter sets for training a plurality of predictive models is generated, with the plurality of predictive models being trained to identify a sequence of workflow steps to present to users of the software application. The plurality of predictive models are trained based on the plurality of hyperparameter sets. A hyperparameter set from the plurality of hyperparameter sets is selected based on performance metrics for each of the plurality of predictive models. A machine learning model is trained based on the selected hyperparameter set and the training data set, and the trained machine learning model is deployed.

Certain embodiments provide a computer-implemented method for using a predictive model to predict and present an optimal workflow to a user of a software application. An example method generally includes receiving, from a user of a software application, a request to execute a workflow in the software application. Using a predictive model and features associated with the user of the software application, a workflow sequence that maximizes a reward metric for the user of the software application is generated. The generated workflow sequence is executed.

Other embodiments provide processing systems configured to perform the aforementioned methods as well as those described herein; non-transitory, computer-readable media comprising instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the aforementioned methods as well as those described herein; a computer program product embodied on a computer readable storage medium comprising code for performing the aforementioned methods as well as those further described herein; and a processing system comprising means for performing the aforementioned methods as well as those further described herein.

The following description and the related drawings set forth in detail certain illustrative features of one or more embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

The appended figures depict certain aspects of the one or more embodiments and are therefore not to be considered limiting of the scope of this disclosure.

FIG. 1 depicts an example computing environment in which predictive machine learning models are trained and used to predict optimal workflows to present to users of a software application based on a reward metric, according to embodiments of the present disclosure.

FIG. 2 illustrates example operations for training machine learning models based on a plurality of hyperparameter sets to predict optimal workflows to present to users of a software application based on a reward metric, according to embodiments of the present disclosure.

FIG. 3 illustrates example operations for deploying a workflow sequence to a user of a software application using a machine learning model trained to predict an optimal workflow for the user of the software application, according to embodiments of the present disclosure.

FIG. 4 illustrates an example system on which embodiments of the present disclosure can be performed.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.

DETAILED DESCRIPTION

As discussed, software applications may implement workflows as a series of mini jobs that can be presented to a user of these software applications in order to allow the user to perform the workflow. In many cases, these workflows may be order-invariant or at least partially order-invariant, such that a user may perform any sequence of mini jobs in order to complete the workflow. However, different users may react differently to different sequences of mini jobs. For example, when a user is using a software application for the first time, the wide range of features and presented recommendations may not be conducive to allowing the user to use the software application efficiently and effectively. Further, in many cases, the software application may not have sufficient information from historical user activity to effectively customize the application or the order in which workflow sequences are presented to the user such that the user can efficiently and effectively use the software application.

Because the software application may not have sufficient information to allow for customization of the user experience and the order in which the mini jobs of a workflow are presented to the user, the software application may present workflows to a user of the software application according to an a priori defined sequence of mini jobs or to a randomly selected sequence of mini jobs. In doing so, the software application may not effectively present a workflow to a user of the software application that would address the user's preferences and thus allow the user to efficiently and effectively use the software application. Further, many machine learning models that are used in software applications to customize the behavior of these software applications are descriptive models that predict future behavior. While predicting future behavior may be useful in various tasks such as fraud detection, autocompletion, or the like, these descriptive models generally result in the execution of reactive actions that address what has previously occurred as opposed to proactive actions that potentially have an effect on future outcomes.

Embodiments of the present disclosure provide techniques for training and using machine learning models to predict sequences of workflows in a software application that are likely to allow a user of the software application to efficiently and effectively use the software application. In particular embodiments, multiple machine learning models may be trained using techniques described herein based on unique sets of hyperparameters and other training parameters, and the multiple machine learning models may be examined to identify a model and hyperparameter set which results in optimal inference performance such that the identified hyperparameter set may be used to train a machine learning model that is ultimately deployed.

As discussed in further detail herein, a reward metric may be defined for use in optimizing the workflow presented to the user of the software application. This reward metric may, for example, be based on a difference between a value of a parameter for users who have not completed the workflow and a value of the parameter for users who have completed the workflow, such that the reward metric serves as a proxy for some underlying or corresponding metric. To train each of the multiple machine learning models, sequences of mini jobs of a workflow may be randomly presented to users to gather training data used to train the machine learning model (e.g., to identify a workflow sequence for a given user's attributes that maximizes the reward metric for the user). After a sufficient number of samples have been gathered for the training data set, the multiple machine learning models may be trained to generate a workflow sequence that, as discussed, maximizes the reward metric for the user. The multiple trained machine learning models may be examined to determine which model (and, consequently, which hyperparameter set) results in optimal inference performance. The selected hyperparameter set may then be used to train a machine learning model, and the trained machine learning model may be deployed for use within the application.

By training such an optimized machine learning model with an optimized hyperparameter set as described herein, aspects of the present disclosure may dynamically generate workflow sequences in an optimal manner for different users of the software application based on user-specific prioritization of different mini jobs within the workflow, which, as discussed, generally allows for the generation and presentation of user interfaces and workflow steps that are likely to result in the user being able to effectively and efficiently use the software application. Thus, aspects of the present disclosure may reduce the number of requests for new user interfaces generated by a user of the software application resulting from, for example, the user of a software application hunting for a feature, which may reduce the amount of computing resources (e.g., messaging bandwidth, power, etc.) consumed in rendering user interfaces and displaying information in a software application relevant to a specific user of the software application. Still further, aspects of the present disclosure may allow for relevant portions of a workflow to be proactively presented to a user of a software application, which may reduce the amount of user navigation through different portions of a workflow and also reduce the amount of computing resources (e.g., messaging bandwidth, power, etc.) consumed in rendering user interfaces and displaying information in a software application relevant to a specific user of the software application. By training and evaluating multiple machine learning models with different hyperparameter sets in order to select a model having the most optimized hyperparameter set for such a task, techniques described herein further avoid inaccuracies and resource inefficiencies that would otherwise be associated with use of a suboptimal machine learning model (e.g., trained with a suboptimal hyperparameter set).

Example Generating Workflow Sequences in Software Applications Using Machine Learning Models

FIG. 1 illustrates an example computing environment 100 in which machine learning models are trained and used to identify an optimal workflow for a user of a software application based on maximization of a reward metric, according to aspects of the present disclosure. As illustrated, computing environment 100 includes an application server 110, a client device 120, and a user data repository 130.

Application server 110 is generally representative of a computing system, such as a server, a cloud compute instance, or the like, which can train machine learning models and host a software application including a machine learning model that may be accessed by users of a client device 120 in the computing environment 100. As illustrated, application server 110 includes a workflow sequence generator 112, an application 114, and a predictive model trainer 116.

Generally, the workflow sequence generator 112 allows for the creation of a training data set by generating random workflow sequences for users of the application 114 to use to complete a workflow during an initial stage of deployment of the application 114 on the application server 110. In some aspects, the workflow sequence generator 112 can associate each variation of a workflow sequence (e.g., each unique ordering of mini jobs in a workflow) with a unique workflow sequence index and randomly select a workflow sequence index to present to a user of the application 114. In some aspects, the workflow sequence generator 112 can randomly generate a sequence by randomly selecting mini jobs to include in a sequence. Random selection of a sequence by the workflow sequence generator 112 may continue until a machine learning model that predicts an optimal sequence for a user of the application 114 is trained, or until a threshold number of samples is acquired (e.g., at least x samples per unique sequence of mini jobs).
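
The index-based selection described above may be sketched as follows. This is a minimal illustration only, assuming that every variant presents each mini job exactly once; the function names and the mini job names are hypothetical and do not come from the disclosure.

```python
import itertools
import random

def index_workflow_variants(mini_jobs):
    """Associate each unique ordering of the mini jobs with a sequence index."""
    return dict(enumerate(itertools.permutations(mini_jobs)))

def pick_random_variant(variants, rng):
    """Randomly select a workflow sequence index and return it with its ordering."""
    index = rng.randrange(len(variants))
    return index, variants[index]

# Hypothetical mini job names, used only for illustration.
variants = index_workflow_variants(["connect_bank", "add_customer", "send_invoice"])
rng = random.Random(0)  # fixed seed so the sketch is repeatable
index, sequence = pick_random_variant(variants, rng)
```

Logging the selected index alongside the user's subsequent activity is one way the presented sequence could later be recovered when the training data set is assembled.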

After the workflow sequence generator 112 generates a workflow sequence for a user of the software application, the workflow sequence generator 112 outputs information identifying the generated sequence to the application 114. The application 114 outputs the workflow sequence to the application 122 executing on the client device 120. In response, the application 114 receives user-provided data and other interaction data which can be committed to the user data repository 130 and used to train multiple machine learning models (also referred to as predictive models) to predict an optimal workflow for a user of the application 114 based on maximization (or conversely, minimization) of a reward metric defined for the application 114 or the workflow thereof.

During execution of the application 114, a reward metric may be monitored for each user who has been presented a workflow sequence for execution. Generally, the reward metric may be linked to a metric measured for users who have completed the workflow sequence and a corresponding metric for users who have not completed the workflow sequence. In some aspects, the reward metric may be an a priori defined value r_i for each action in a set of actions associated with the workflow sequence, such that completion of a mini job in a workflow sequence is associated with a reward metric of r_i=N for some numeric value N, and non-completion of the mini job in the workflow sequence is associated with a reward metric of r_i=0. In some aspects, the reward metric may differ for each mini job in a workflow sequence, with some mini jobs in a workflow sequence (e.g., mini jobs associated with high user retention or a demonstrated history of being associated with user value, mini jobs that are not commonly completed by users of the software application or show a significant difference in performance between users who have completed a mini job and users who have not completed a mini job) being assigned higher values than other mini jobs (e.g., mini jobs that are commonly completed by users of the software application). In another example, consider an accountancy application in which users have the ability to classify or otherwise assign categories to transactions recorded in an accountancy ledger. A reward metric may be based, for example, on revenue or profitability metrics for users who have completed a transaction categorization workflow for one or more transactions recorded in the application and revenue or profitability metrics for users who have not completed a transaction categorization workflow for any transaction recorded in the application. It should be recognized that the foregoing are merely examples of reward metrics which can be monitored and logged in order to train and/or refine a predictive model, and the use and logging of other reward metrics for generating a training data set for a predictive model may be contemplated.
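
As a minimal sketch of the per-action reward described above, the following assumes a hypothetical table of a priori reward values, one per mini job, and sums the reward over the mini jobs a user actually completed. The job names and values are illustrative assumptions only.

```python
def total_reward(sequence, completed, reward_per_job):
    """Sum r_i over the sequence: the job's reward if completed, otherwise 0."""
    return sum(reward_per_job[job] if job in completed else 0.0 for job in sequence)

# Hypothetical a priori reward values; less commonly completed jobs score higher.
reward_per_job = {"connect_bank": 3.0, "add_customer": 1.0, "send_invoice": 2.0}
sequence = ("connect_bank", "send_invoice", "add_customer")
completed = {"connect_bank", "add_customer"}

reward = total_reward(sequence, completed, reward_per_job)  # 3.0 + 0.0 + 1.0
```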

As users execute workflows, the application 114 can log historical user activity data and other user data and commit the user data to the user data repository 130 for the predictive model trainer 116 to use in training machine learning models to predict an optimal workflow for other users of the application 114. Generally, after the initial data acquisition process is completed (e.g., to generate a training data set used to train the machine learning models), a total number of samples N_s may be recorded across the N variants of the workflow. Because variants are selected uniformly at random, each workflow variant may be presented approximately N_s/N times during the initial data acquisition process. By randomly generating and presenting workflow variants to users of the application 114, the workflow sequence generator 112 and the application 114 can ensure that a sufficiently large training data set is available for the predictive model trainer 116 to train predictive models.

In some aspects, the training data set generated based on the logged historical user activity data and other user data committed to the user data repository 130 may be a series of n-tuples including a set of features associated with a user of the application 114, information identifying the workflow sequence presented to the user, information identifying the actions performed by the user, and a reward metric associated with the actions performed by the user. Generally, the features associated with the user of the application may include features which are relevant to a specific workflow and which are available in the user data repository 130 (or other data repositories) to include in an n-tuple. The features may include static features, such as user profile information that is fixed a priori, and dynamic features, such as user activity within the application 114 (e.g., clickstream data, search history, other time-series data describing how the user has previously used the application, etc.). In an accountancy application, for example, the features associated with the user of the application 114 may include static features such as information identifying the industry classification for the user's organization, organization size features (e.g., number of employees, revenue or profit metrics, etc.), age data, and/or the like, and/or dynamic features (e.g., user activity data, as discussed above).
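
One possible in-memory shape for such an n-tuple is sketched below. The field names, feature keys, and values are assumptions made for illustration; the disclosure does not prescribe a particular schema.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class Exemplar:
    """One training exemplar: who the user is, what was shown, what happened."""
    user_features: Dict[str, float]      # static and dynamic features
    presented_sequence: Tuple[str, ...]  # workflow sequence shown to the user
    actions_performed: Tuple[str, ...]   # mini jobs the user completed
    reward: float                        # reward metric for those actions

# Hypothetical exemplar for an accountancy application.
exemplar = Exemplar(
    user_features={"employee_count": 12.0, "sessions_last_30_days": 7.0},
    presented_sequence=("connect_bank", "send_invoice", "add_customer"),
    actions_performed=("connect_bank", "add_customer"),
    reward=4.0,
)
```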

The predictive model trainer 116 uses the historical user activity data committed to the user data repository 130 to train predictive models, each of which is configured to allow the workflow sequence generator 112 and/or application 114 to proactively predict an optimal workflow for a user of the software application. As discussed, an optimal workflow for a user of a software application may be a workflow that results in the maximization, or at least optimization, of a reward metric, where the reward metric serves as a proxy metric that measures (or at least indicates) a likelihood that the user will be able to use the application 114 efficiently and effectively when presented with a given workflow sequence.

Generally, the predictive model trainer 116 can train a predictive model as a causal model that ingests the user features and variants of workflow sequences and outputs information identifying the workflow sequence that optimizes the reward metric. Generally, the optimization of the reward metric may be an optimization of the sum of a reward derived from each mini job (also referred to as an “action”) performed by the user of the application 114, assuming that the user performs each action in the identified workflow sequence.

In some aspects, the predictive model trainer 116 can train a given machine learning model using uplift modeling techniques. By using uplift modeling techniques, aspects of the present disclosure can model the incremental impact of each action performed by a user of the application 114. To train a machine learning model using uplift modeling techniques, the model may be trained as a single-learner uplift model so that the training data set generated by the application 114 is not split; splitting the training data set could cause model accuracy to decrease due to data scarcity. The learner used in the single-learner uplift model may be a tree-based model, such as a gradient boosting tree or the like, which results in a model that maps user features and a variant of a workflow step to a total predicted reward. In other aspects, the predictive model trainer 116 can train a given machine learning model as a long short-term memory (LSTM) model that accounts for timing relationships between actions performed within the application 114, or as a deep learning model, an ensemble model, or another machine learning model that can predict an optimal workflow sequence for the user of the application 114 (e.g., the workflow sequence that results in the highest expected total reward, assuming user completion of each action or mini job within the workflow sequence).
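
The single-learner arrangement may be illustrated with the sketch below, in which one learner is fit on the whole training data set and takes the user features and the workflow variant as joint inputs. To keep the sketch dependency-free, a table of mean rewards stands in for the tree-based learner; this stand-in is an assumption made purely for illustration and is not a gradient boosting tree. The class name, feature tuples, and reward values are likewise hypothetical.

```python
from collections import defaultdict

class SingleLearnerSketch:
    """One learner over (user features, workflow variant) jointly.

    Stand-in learner: mean observed reward per (features, variant) cell,
    falling back to the variant mean and then to the global mean.
    """

    def __init__(self):
        self._cell = defaultdict(list)
        self._variant = defaultdict(list)
        self._all = []

    def fit(self, rows):
        # Every row trains the same learner; the data is never split by variant.
        for features, variant, reward in rows:
            self._cell[(features, variant)].append(reward)
            self._variant[variant].append(reward)
            self._all.append(reward)
        return self

    def predict(self, features, variant):
        candidates = (
            self._cell.get((features, variant)),
            self._variant.get(variant),
            self._all,
        )
        for bucket in candidates:
            if bucket:
                return sum(bucket) / len(bucket)
        return 0.0

# Hypothetical rows: (hashable user features, variant index, observed reward).
rows = [
    (("retail", "small"), 0, 4.0),
    (("retail", "small"), 0, 2.0),
    (("retail", "small"), 1, 1.0),
    (("services", "large"), 1, 6.0),
]
model = SingleLearnerSketch().fit(rows)
```

Because the variant is an input rather than a partition key, a prediction for a user and variant pair that was never observed together can still draw on rows recorded for other users or other variants.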

In some aspects, to train a machine learning model, the predictive model trainer 116 can randomly generate a number S of seeds {S_i} for each of a plurality of hyperparameter combinations in a hyperparameter search space {H_i}, for use in generating hyperparameters and other parameters for training the machine learning model. For example, for a tree-based machine learning model, the hyperparameter sets H in the search space may be a set of parameters including a tree depth, a number of trees, a gamma (or regularizing) parameter, and the like.

For each hyperparameter set H in the hyperparameter search space {H_i}, the predictive model trainer 116 may train S models using the seeds in {S_i} as a seed for a randomizer that selects various configuration parameters for a machine learning model. These configuration parameters may include, for example, parameters defining the testing, training, and validation splits from a training data set, a gradient boosting split strategy, and other parameters that can be randomly generated. As a result, the predictive model trainer may train S×|{H_i}| models (that is, S models for each hyperparameter set in the search space) out of which a hyperparameter set can be selected for use in ultimately training the model deployed by the predictive model trainer 116 for use in identifying an optimal workflow to present to a user of a software application.
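
The enumeration of training runs may be sketched as follows. The grid values, the function names, and the choice to reuse the same S seeds for every hyperparameter set are assumptions made for illustration; generating a fresh group of seeds per hyperparameter set would fit the description equally well.

```python
import itertools
import random

def build_search_space(grid):
    """Expand a grid of candidate values into a list of hyperparameter sets."""
    names = sorted(grid)
    combos = itertools.product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]

def plan_training_runs(grid, num_seeds, master_seed=0):
    """Pair every hyperparameter set with each of the S random seeds."""
    rng = random.Random(master_seed)
    seeds = [rng.randrange(2**32) for _ in range(num_seeds)]
    return [(h, s) for h in build_search_space(grid) for s in seeds]

# Hypothetical search space for a tree-based learner.
grid = {"max_depth": [3, 5], "n_trees": [100, 200], "gamma": [0.0, 1.0]}
runs = plan_training_runs(grid, num_seeds=4)  # 8 sets x 4 seeds
```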

To determine which hyperparameters are to be used to train the machine learning model deployed by the predictive model trainer 116, performance metrics may be defined for performant models and non-performant models. In one example, a performant model may be associated with a t-statistic p-value that is less than a high performance threshold value, and a non-performant model may be associated with a t-statistic p-value that is less than a low performance threshold value. The t-statistic may be calculated, for example, based on an average reward over all points in the training data set predicted by a machine learning model and an average reward over points in the training data set where the predicted workflow sequence generated by the machine learning model matches a baseline workflow sequence or baseline policy for generating a workflow sequence for a user of the software application.
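
A two-sample comparison of this kind may be sketched as below. The sketch uses Welch's t-statistic and, as a simplifying assumption, approximates the p-value with a standard normal distribution rather than a t-distribution, which is only reasonable for large samples. How the two reward samples are drawn from the training data set is left to the caller, and the reward values shown are hypothetical.

```python
import math
import statistics

def welch_t_statistic(sample_a, sample_b):
    """Welch's t-statistic for the difference in mean reward (a minus b)."""
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

def one_sided_p_value(t_statistic):
    """Normal approximation to the one-sided p-value (an assumption, see above)."""
    return 1.0 - statistics.NormalDist().cdf(t_statistic)

# Hypothetical rewards for the model's sequences and for the baseline policy.
model_rewards = [5.0, 6.0, 7.0, 6.0, 5.0, 7.0]
baseline_rewards = [3.0, 4.0, 3.0, 5.0, 4.0, 3.0]

t_stat = welch_t_statistic(model_rewards, baseline_rewards)
p_value = one_sided_p_value(t_stat)
```

A negative t-statistic under this convention would indicate that the model's sequences earned less reward on average than the baseline.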

Based on the t-statistic p-value for each of the S×|{H_i}| trained models, the predictive model trainer 116 can identify which hyperparameter set of the {H_i} hyperparameter sets results in a causal model that has the highest inference performance. Within a hyperparameter set H∈{H_i}, the ratio of performant models and the ratio of non-performant models may be calculated. The ratio of performant models may be the ratio of the number of models having a t-statistic p-value less than the high performance threshold value and having no negative statistical measurements to the total number of models S, and the ratio of non-performant models may be the ratio of the number of models having a t-statistic p-value less than the low performance threshold value and having at least one negative statistical measurement to the total number of models S. Because a hyperparameter set should not result in models that perform poorly, the predictive model trainer 116 can eliminate hyperparameter sets in {H_i} that have a ratio of non-performant models above 0 or some other threshold value. Similarly, a hyperparameter set should result in a large number of models that are performant and thus have high predictive value; thus, the predictive model trainer 116 can identify, from the hyperparameter sets in {H_i} that have not been eliminated for having a ratio of non-performant models exceeding a non-performant model threshold, the hyperparameter set H* having the highest ratio of performant models to total models. The selected hyperparameter set H* may subsequently be used to train the machine learning model, using the techniques discussed above, and the machine learning model trained using the selected hyperparameter set H* may be deployed to the workflow sequence generator 112 and/or application 114 for use in identifying an optimal workflow for a user of the software application.
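
The elimination and selection steps may be sketched as follows. Each trained model is summarized here by its p-value and a flag indicating whether it has at least one negative statistical measurement; the threshold values, the set labels, and the function name are hypothetical.

```python
def select_hyperparameter_set(results, high_threshold, low_threshold,
                              max_nonperformant_ratio=0.0):
    """Pick the set with the highest performant ratio among non-eliminated sets.

    results maps a hyperparameter set label to a list of
    (p_value, has_negative_measurement) pairs, one per trained model.
    """
    best_label, best_ratio = None, -1.0
    for label, models in results.items():
        total = len(models)
        if total == 0:
            continue
        performant = sum(1 for p, neg in models if p < high_threshold and not neg)
        nonperformant = sum(1 for p, neg in models if p < low_threshold and neg)
        if nonperformant / total > max_nonperformant_ratio:
            continue  # eliminated: too many poorly performing models
        if performant / total > best_ratio:
            best_label, best_ratio = label, performant / total
    return best_label, best_ratio

# Hypothetical results for three hyperparameter sets with S = 4 seeds each.
results = {
    "H1": [(0.01, False), (0.02, False), (0.30, False), (0.04, True)],
    "H2": [(0.01, False), (0.03, False), (0.04, False), (0.20, False)],
    "H3": [(0.01, False), (0.20, False), (0.40, False), (0.60, True)],
}
selected, ratio = select_hyperparameter_set(
    results, high_threshold=0.05, low_threshold=0.10
)
```

In this example, H1 is eliminated because one of its models is non-performant, and H2 is selected over H3 because three of its four models are performant.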

The predictive model trainer 116 can deploy the trained machine learning model to the workflow sequence generator 112 and/or the application 114 for subsequent workflow sequence generation for users of the application 114. When a user uses the application 114 (or specific portions thereof), the workflow sequence generator 112 and/or application 114 can use the machine learning model to generate a workflow sequence that maximizes the user's reward metric, assuming the completion of each action or mini job within the workflow sequence (though not necessarily in order of completion). That is, the trained machine learning model may model an outcome (e.g., the total reward metric generated by performing each action within a given workflow sequence) as a function of user features and a workflow sequence. The model may seek the variant of the workflow sequence that maximizes the outcome (e.g., the total reward metric) and return the variant of the workflow sequence that maximizes the outcome.
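
The search over variants at inference time may be sketched as an exhaustive argmax, shown below. The scoring function is a toy stand-in for a trained model, and the feature layout and mini job names are hypothetical; exhaustive enumeration is only practical for a small number of mini jobs.

```python
import itertools

def best_sequence(predict_reward, user_features, mini_jobs):
    """Return the ordering of mini jobs with the highest predicted reward."""
    return max(
        itertools.permutations(mini_jobs),
        key=lambda sequence: predict_reward(user_features, sequence),
    )

def toy_predict_reward(user_features, sequence):
    """Toy stand-in for a trained model: earlier positions weigh more."""
    priority = user_features["priority"]
    return sum(
        priority[job] * (len(sequence) - position)
        for position, job in enumerate(sequence)
    )

# Hypothetical per-user priorities over three mini jobs.
user_features = {"priority": {"add_customer": 0.2, "connect_bank": 0.9, "send_invoice": 0.5}}
chosen = best_sequence(
    toy_predict_reward, user_features, ["add_customer", "connect_bank", "send_invoice"]
)
```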

The machine learning models trained by the predictive model trainer 116 and deployed to the workflow sequence generator 112 and/or the application 114 for workflow sequence generation may be used at a variety of points within the application 114. In one example, the machine learning models trained by the predictive model trainer 116 can be used when a new user begins using the application 114. After the user has provided some basic user information (which can be used as feature inputs into the machine learning model), the machine learning model can predict which variant of an initial attachment or enrolment workflow results in the maximization of a reward metric (or is likely to maximize the reward metric). The identified variant of the attachment or enrolment workflow sequence may be executed by the application 114 and/or the application 122 executing on the client device 120 (which is representative of a variety of client devices which can access an application 114 executing on a remote server, such as a smartphone, a tablet computer, a desktop computer, or the like). In another example, the machine learning models trained by the predictive model trainer 116 may be used when a user begins using a new portion of the application or otherwise uses features that the user has not used before and/or which may be new to the user (e.g., in a more fully featured version of the application 114 to which the user may have upgraded).

Example Methods for Training Machine Learning Models to Predict Optimal Workflow Sequences in Software Applications

FIG. 2 illustrates example operations 200 that may be performed to train machine learning models to predict optimal workflows to present to users of a software application based on a reward metric, according to embodiments of the present disclosure. Operations 200 may be performed by any computing device which can train and use one or more machine learning models to predict an optimal workflow for a user of a software application based on a training data set of captured user data, such as the application server 110 illustrated in FIG. 1.

As illustrated, operations 200 begin at block 210 with generating a training data set including a plurality of exemplars. Generally, each exemplar includes features associated with a user of a software application, a sequence of workflow steps presented to the user of the software application, and a reward metric associated with the user of the software application and the sequence of workflow steps. As discussed, the training data set may be generated based on random presentation of sequences of workflow steps (or mini jobs) to users of the software application over a period of time. The training data set may, in some aspects, include an equal, or at least roughly equal, number of samples for each possible sequence of workflow steps which can be presented to a user of the software application.

At block 220, the operations 200 proceed with generating a plurality of hyperparameter sets for training a plurality of predictive models for identifying a sequence of workflow steps to present to users of the software application. In some aspects, the plurality of hyperparameter sets may include hyperparameter sets in a hyperparameter search space {H_i}, with the hyperparameter search space being a multidimensional space including a dimension for each of a plurality of hyperparameters used to train a model. For example, for a tree-based model, the hyperparameter search space may include dimensions for tree depth, a number of trees, gamma (tree split value), and the like.
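By way of example only, a random search over such a space might be sketched as follows. The candidate values, the parameter names `max_depth`, `n_trees`, and `gamma`, and the helper `sample_hyperparameter_sets` are assumptions chosen for illustration. The disclosure does not specify ranges or a sampling strategy.

```python
import random

# Hypothetical search space: one dimension per hyperparameter of a tree-based model.
SEARCH_SPACE = {
    "max_depth": [3, 4, 5, 6, 8],      # tree depth
    "n_trees": [50, 100, 200, 400],    # number of trees
    "gamma": [0.0, 0.1, 0.5, 1.0],     # tree split value
}


def sample_hyperparameter_sets(space, n_sets, seed=0):
    """Draw n_sets points from the search space by independent uniform sampling."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_sets)
    ]


hyperparameter_sets = sample_hyperparameter_sets(SEARCH_SPACE, n_sets=10)
```

Each element of `hyperparameter_sets` is one point H_i in the search space. A grid search or a Bayesian search could be substituted without changing the downstream steps.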

In some aspects, generating the plurality of hyperparameter sets includes generating, for each respective hyperparameter set of the plurality of hyperparameter sets, a respective random seed value for separating the training data set into a training subset and a validation subset.
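A minimal sketch of such a seeded split, assuming an 80/20 division and hypothetical helper names `attach_seeds` and `split_by_seed`, is shown below. The validation fraction and the 32-bit seed range are illustrative assumptions only.

```python
import random


def attach_seeds(hyperparameter_sets, master_seed=0):
    """Return copies of the hyperparameter sets, each with its own split seed."""
    rng = random.Random(master_seed)
    return [{**hp, "split_seed": rng.randrange(2**32)} for hp in hyperparameter_sets]


def split_by_seed(exemplars, seed, validation_fraction=0.2):
    """Deterministically split exemplars into (training, validation) subsets."""
    rng = random.Random(seed)
    indices = list(range(len(exemplars)))
    rng.shuffle(indices)
    n_validation = int(len(indices) * validation_fraction)
    validation_indices = set(indices[:n_validation])
    training = [e for i, e in enumerate(exemplars) if i not in validation_indices]
    validation = [e for i, e in enumerate(exemplars) if i in validation_indices]
    return training, validation


seeded_sets = attach_seeds([{"max_depth": 4}, {"max_depth": 6}])
training, validation = split_by_seed(list(range(100)), seeded_sets[0]["split_seed"])
```

Because the split depends only on the seed, the same hyperparameter set always sees the same training and validation subsets, which keeps repeated runs comparable.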

At block 230, the operations 200 proceed with training the plurality of predictive models based on the plurality of hyperparameter sets.

At block 240, the operations 200 proceed with selecting a hyperparameter set from the plurality of hyperparameter sets based on performance metrics for each of the plurality of predictive models.

In some aspects, selecting the hyperparameter set from the plurality of hyperparameter sets comprises calculating, for each respective predictive model of the plurality of predictive models, one or more error metrics between an average reward over a plurality of sequences of workflow steps and an average reward for outputs of the respective predictive model matching a defined baseline policy.

In some aspects, selecting the hyperparameter set comprises discarding hyperparameter sets associated with predictive models having a predictive value below a threshold value with at least one error metric of the one or more error metrics being a negative value.

In some aspects, selecting the hyperparameter set comprises selecting the hyperparameter set having a ratio of performant models to total models exceeding a threshold value and no nonperformant models.
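One possible reading of the selection criteria above is sketched below, purely as an illustration. It assumes that several predictive models are trained per hyperparameter set (for example, with different seeds), that the error metric for a model is the difference between its average reward and the baseline policy's average reward (here called `lift`), that a model is "performant" when its lift meets a threshold, and that a model is "nonperformant" when its lift is negative. These definitions, the threshold values, and the function names are assumptions and are not fixed by the disclosure.

```python
from statistics import mean


def lift(model_rewards, baseline_rewards):
    """Average reward under the model's policy minus average reward under the baseline."""
    return mean(model_rewards) - mean(baseline_rewards)


def select_hyperparameter_set(lifts_by_set, performant_lift=0.05, min_ratio=0.5):
    """Pick the set with the highest performant-model ratio above min_ratio.

    lifts_by_set maps a hyperparameter-set name to the lifts of its trained models.
    Sets containing any model with negative lift are discarded outright.
    Returns None when no set qualifies.
    """
    best_name, best_ratio = None, min_ratio
    for name, lifts in lifts_by_set.items():
        if any(value < 0 for value in lifts):
            continue  # at least one nonperformant model: discard this set
        ratio = sum(value >= performant_lift for value in lifts) / len(lifts)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name


# Hypothetical lifts for three hyperparameter sets, three models each.
lifts_by_set = {
    "A": [0.10, 0.08, -0.01],  # discarded: one model is worse than the baseline
    "B": [0.06, 0.07, 0.02],   # two of three models are performant
    "C": [0.01, 0.02, 0.06],   # only one of three models is performant
}
selected = select_hyperparameter_set(lifts_by_set)
```

In this example, set "B" is selected: set "A" contains a nonperformant model, and set "C" does not exceed the assumed ratio threshold of 0.5.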

At block 250, the operations 200 proceed with training a machine learning model based on the selected hyperparameter set and the training data set.

At block 260, the operations 200 proceed with deploying the machine learning model.

Example Methods for Deploying Optimal Workflow Sequences in Software Applications Using Machine Learning Models

FIG. 3 illustrates example operations 300 for deploying a workflow sequence to a user of a software application using a machine learning model trained to predict an optimal workflow for the user of the software application, according to embodiments of the present disclosure. Operations 300 may be performed by any computing device which can use one or more machine learning models to predict an optimal workflow for a user of a software application based on a training data set of captured user data, such as the application server 110 illustrated in FIG. 1.

As illustrated, operations 300 begin at block 310, with receiving, from a user of a software application, a request to initiate a workflow in the software application.

In some aspects, the request to initiate the workflow in the software application may be received implicitly as part of an initialization process for the user in the software application. The initialization process for the user may, for example, be a process that is executed when the user uses the application for the first time. In another example, the initialization process may be a process that is executed when the user uses a feature within the software application for the first time.

In some aspects, the request to initiate the workflow in the software application may be an explicit request to execute a specific workflow in the software application.

At block 320, the operations 300 proceed with generating, using a predictive model and features associated with the user of the software application, a workflow sequence that maximizes a reward metric for the user of the software application.

In some aspects, the predictive model may be a machine learning model trained to output a predicted workflow sequence based on user features, with the predicted workflow sequence maximizing the reward metric for the user of the software application. The user features which may be input into the machine learning model may include static features associated with the user of the workflow and dynamic features associated with the user of the workflow. The static features may include, for example, features derived from a priori defined data associated with the user, such as the size and age of an organization with which the user is associated. The dynamic features may include, for example, time-series data associated with user activity within the application, such as a search history, clickstream history, or the like.
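For illustration only, block 320 might be realized by scoring every candidate ordering of the workflow steps and returning the ordering with the highest predicted reward, as sketched below. The function `toy_predict_reward` is a hypothetical stand-in for a trained model, and the feature and step names are invented for the example. Exhaustive enumeration is an assumption that is practical only for a small number of steps.

```python
from itertools import permutations


def best_workflow_sequence(predict_reward, user_features, steps):
    """Return the ordering of steps with the highest predicted reward."""
    return max(
        permutations(steps),
        key=lambda sequence: predict_reward(user_features, sequence),
    )


def toy_predict_reward(user_features, sequence):
    """Hypothetical stand-in for a trained model: favors an early 'connect_bank' step."""
    position = sequence.index("connect_bank")
    weight = 1.0 if user_features["org_size"] < 10 else 0.2
    return weight * (len(sequence) - position)


user_features = {"org_size": 5, "org_age_years": 1}   # static features (assumed names)
steps = ["invite_team", "send_invoice", "connect_bank"]
sequence = best_workflow_sequence(toy_predict_reward, user_features, steps)
```

Here the returned sequence begins with `connect_bank`, since the stand-in model scores that placement highest for every user.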

In some aspects, the reward metric may be a cumulative reward metric calculated over each step in the predicted workflow sequence. The cumulative reward metric may be generated using a common reward value assigned to each step (or mini job) in the workflow. In some aspects, the cumulative reward metric may be generated using a unique reward value that is assigned to each respective step in the workflow. In some aspects, the reward metric may correspond to a predicted increase in a user metric assuming user completion of each step in the workflow, such as a predicted increase in revenue for the user's organization or the like.
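The two cumulative reward variants described above (a common reward value per step, or a unique reward value per step) can be sketched as follows. The function name, the step names, and the reward values are hypothetical.

```python
def cumulative_reward(sequence, step_rewards=None, common_reward=1.0):
    """Sum the reward over every step in a workflow sequence.

    If step_rewards is None, each step contributes the same common_reward.
    Otherwise step_rewards maps each step to its own reward value.
    """
    if step_rewards is None:
        return common_reward * len(sequence)
    return sum(step_rewards[step] for step in sequence)


sequence = ("connect_bank", "invite_team", "send_invoice")
uniform_total = cumulative_reward(sequence)
weighted_total = cumulative_reward(
    sequence,
    step_rewards={"connect_bank": 2.0, "invite_team": 0.5, "send_invoice": 1.5},
)
```

In this sketch `uniform_total` is 3.0 and `weighted_total` is 4.0.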

At block 330, the operations 300 proceed with executing the generated workflow sequence.

Example System for Generating and Rendering User Interfaces in Software Applications Using Machine Learning Models

FIG. 4 illustrates an example system 400 in which user interface definitions are generated in response to receipt of an input query for data from a software application using machine learning models. System 400 may correspond to the application server 110 illustrated in FIG. 1. In some aspects, system 400 may perform the methods as described with respect to FIG. 2.

As shown, system 400 includes a central processing unit (CPU) 402, one or more I/O device interfaces 404 that may allow for the connection of various I/O devices 414 (e.g., keyboards, displays, mouse devices, pen input, etc.) to the system 400, network interface 406 through which system 400 is connected to network 490 (which may be a local network, an intranet, the internet, or any other group of computing devices communicatively connected to each other), a memory 408, and an interconnect 412.

CPU 402 may retrieve and execute programming instructions stored in the memory 408. Similarly, the CPU 402 may retrieve and store application data residing in the memory 408. The interconnect 412 transmits programming instructions and application data, among the CPU 402, I/O device interface 404, network interface 406, and memory 408.

CPU 402 is included to be representative of a single CPU, multiple CPUs, a single CPU having multiple processing cores, and the like.

Memory 408 is representative of a volatile memory, such as a random access memory, or a nonvolatile memory, such as nonvolatile random access memory, phase change random access memory, or the like. As shown, memory 408 includes a workflow sequence generator 420, an application 430, a predictive model trainer 440, and a user data repository 450.

Workflow sequence generator 420 generally corresponds to the workflow sequence generator 112 illustrated in FIG. 1. As discussed, the workflow sequence generator 420 may operate differently depending on whether a machine learning model that predicts optimal workflows to execute in the application 430 has been trained and deployed. If a machine learning model that predicts optimal workflows has not been trained and deployed, the workflow sequence generator 420 can randomly generate a workflow sequence for a requested workflow and instruct the application 430 to execute the randomly generated workflow sequence. If a machine learning model that predicts optimal workflows has been trained and deployed, the workflow sequence generator 420 can use the machine learning model to identify an optimal workflow for a user of the software application based on user features and a reward maximization strategy. The user features may include static features associated with the user or the user's organization and dynamic features associated with the user. These static features, in one example, may include features derived from a priori defined data associated with the user, such as the size and age of an organization with which the user is associated. The dynamic features may include, for example, time-series data associated with user activity within the application, such as a search history, clickstream history, or the like.
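The two operating modes of the workflow sequence generator 420 might be sketched as below. The class name, its constructor arguments, and the callable model interface are assumptions made for the example and do not describe any particular implementation.

```python
import random
from itertools import permutations


class WorkflowSequenceGenerator:
    """Random sequences before a model is deployed, model-selected sequences after."""

    def __init__(self, steps, model=None, seed=None):
        self.sequences = list(permutations(steps))
        self.model = model  # hypothetical callable: (user_features, sequence) -> reward
        self.rng = random.Random(seed)

    def generate(self, user_features):
        if self.model is None:
            # No trained model yet: present a random sequence, which also
            # yields the exploration data later used for training.
            return self.rng.choice(self.sequences)
        # Trained model available: choose the sequence with the highest predicted reward.
        return max(self.sequences, key=lambda seq: self.model(user_features, seq))


steps = ["connect_bank", "invite_team", "send_invoice"]
untrained = WorkflowSequenceGenerator(steps, seed=0)
random_sequence = untrained.generate({"org_size": 5})

# Hypothetical model that rewards placing "send_invoice" last.
trained = WorkflowSequenceGenerator(
    steps, model=lambda features, seq: float(seq[-1] == "send_invoice")
)
chosen_sequence = trained.generate({"org_size": 5})
```

Keeping both modes behind a single `generate` call means the application 430 does not need to know whether a model has been deployed.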

Application 430 generally corresponds to the application 114 illustrated in FIG. 1. Generally, the application 430 presents a workflow sequence generated by the workflow sequence generator 420 to a user of the application (e.g., via transmission of a user interface defining the workflow sequence to a client device, such as the client device 120 illustrated in FIG. 1, via the network interface 406). During execution of the workflow, the application 430 captures user data which is committed to the user data repository 450 (which corresponds to the user data repository 130 illustrated in FIG. 1) for use in generating a training data set that can be used to train the machine learning models used by the workflow sequence generator 420 to identify a workflow sequence which results in the highest predicted reward metric value for the user of the application.

Predictive model trainer 440 generally corresponds to the predictive model trainer 116 illustrated in FIG. 1. The predictive model trainer 440 generally generates a training data set of user data and a workflow sequence presented to a user of the software application, mapped to a reward metric corresponding to a total reward obtained by the user assuming completion of the workflow. The predictive model trainer trains the machine learning model to be a prescriptive model using techniques such as uplift modeling techniques. After training the machine learning model, the predictive model trainer 440 deploys the machine learning model for use in predicting the optimal workflow to present to a user of the application 430.

Note that FIG. 4 is just one example of a system, and other systems including fewer, additional, or alternative components are possible consistent with this disclosure.

Example Clauses

Implementation Examples Are Described in the Following Numbered Clauses

Clause 1: A processor-implemented method, comprising: generating a training data set including a plurality of exemplars, each exemplar including features associated with a user of a software application, a sequence of workflow steps presented to the user of the software application, and a reward metric associated with the user of the software application and the sequence of workflow steps; generating a plurality of hyperparameter sets for training a plurality of predictive models for identifying a sequence of workflow steps to present to users of the software application; training the plurality of predictive models based on the plurality of hyperparameter sets; selecting a hyperparameter set from the plurality of hyperparameter sets based on performance metrics for each of the plurality of predictive models; training a machine learning model based on the selected hyperparameter set and the training data set; and deploying the machine learning model.

Clause 2: The method of Clause 1, wherein generating the plurality of hyperparameter sets comprises generating, for each respective hyperparameter set of the plurality of hyperparameter sets, a respective random seed value for separating the training data set into a training subset and a validation subset.

Clause 3: The method of any of Clauses 1 or 2, wherein: the predictive models comprise tree-based uplift models, and each hyperparameter set of the plurality of hyperparameter sets comprises a tree depth parameter, a number of trees parameter, and a tree split value parameter.

Clause 4: The method of any of Clauses 1 through 3, wherein training the plurality of predictive models comprises training a model to maximize the reward metric.

Clause 5: The method of any of Clauses 1 through 4, wherein selecting the hyperparameter set from the plurality of hyperparameter sets comprises calculating, for each respective predictive model of the plurality of predictive models, one or more error metrics between an average reward over a plurality of sequences of workflow steps and an average reward for outputs of the respective predictive model matching a defined baseline policy.

Clause 6: The method of Clause 5, wherein selecting the hyperparameter set further comprises discarding hyperparameter sets associated with predictive models having a predictive value below a threshold value with at least one error metric of the one or more error metrics being a negative value.

Clause 7: The method of any of Clauses 1 through 6, wherein selecting the hyperparameter set comprises selecting the hyperparameter set having a ratio of performant models to total models exceeding a threshold value and no nonperformant models.

Clause 8: A system, comprising: a memory having executable instructions stored thereon; and a processor configured to execute the executable instructions to perform the methods of any one of Clauses 1 through 7.

Clause 9: A system, comprising: means for performing the methods of any one of Clauses 1 through 7.

Clause 10: A computer-readable medium having instructions stored thereon which, when executed by a processor, perform the methods of any one of Clauses 1 through 7.

Additional Considerations

The preceding description is provided to enable any person skilled in the art to practice the various embodiments described herein. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. For example, changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. Also, features described with respect to some examples may be combined in some other examples. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.

As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).

As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” may include resolving, selecting, choosing, establishing and the like.

The methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. Further, the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.

The various illustrative logical blocks, modules and circuits described in connection with the present disclosure may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

A processing system may be implemented with a bus architecture. The bus may include any number of interconnecting buses and bridges depending on the specific application of the processing system and the overall design constraints. The bus may link together various circuits including a processor, machine-readable media, and input/output devices, among others. A user interface (e.g., keypad, display, mouse, joystick, etc.) may also be connected to the bus. The bus may also link various other circuits such as timing sources, peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further. The processor may be implemented with one or more general-purpose and/or special-purpose processors. Examples include microprocessors, microcontrollers, DSP processors, and other circuitry that can execute software. Those skilled in the art will recognize how best to implement the described functionality for the processing system depending on the particular application and the overall design constraints imposed on the overall system.

If implemented in software, the functions may be stored or transmitted over as one or more instructions or code on a computer-readable medium. Software shall be construed broadly to mean instructions, data, or any combination thereof, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Computer-readable media include both computer storage media and communication media, such as any medium that facilitates transfer of a computer program from one place to another. The processor may be responsible for managing the bus and general processing, including the execution of software modules stored on the computer-readable storage media. A computer-readable storage medium may be coupled to a processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. By way of example, the computer-readable media may include a transmission line, a carrier wave modulated by data, and/or a computer readable storage medium with instructions stored thereon separate from the wireless node, all of which may be accessed by the processor through the bus interface. Alternatively, or in addition, the computer-readable media, or any portion thereof, may be integrated into the processor, such as the case may be with cache and/or general register files. Examples of machine-readable storage media may include, by way of example, RAM (Random Access Memory), flash memory, ROM (Read Only Memory), PROM (Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), registers, magnetic disks, optical disks, hard drives, or any other suitable storage medium, or any combination thereof. The machine-readable media may be embodied in a computer-program product.

A software module may comprise a single instruction, or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media. The computer-readable media may comprise a number of software modules. The software modules include instructions that, when executed by an apparatus such as a processor, cause the processing system to perform various functions. The software modules may include a transmission module and a receiving module. Each software module may reside in a single storage device or be distributed across multiple storage devices. By way of example, a software module may be loaded into RAM from a hard drive when a triggering event occurs. During execution of the software module, the processor may load some of the instructions into cache to increase access speed. One or more cache lines may then be loaded into a general register file for execution by the processor. When referring to the functionality of a software module, it will be understood that such functionality is implemented by the processor when executing instructions from that software module.

The following claims are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims. Within a claim, reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. No claim element is to be construed under the provisions of 35 U.S.C. § 112(f) unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.” All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.

Claims

What is claimed is:

1. A processor-implemented method, comprising:

generating a training data set including a plurality of exemplars, each exemplar including features associated with a user of a software application, a sequence of workflow steps presented to the user of the software application, and a reward metric associated with the user of the software application and the sequence of workflow steps;

generating a plurality of hyperparameter sets for training a plurality of predictive models for identifying a sequence of workflow steps to present to users of the software application;

training the plurality of predictive models based on the plurality of hyperparameter sets;

selecting a hyperparameter set from the plurality of hyperparameter sets based on performance metrics for each of the plurality of predictive models;

training a machine learning model based on the selected hyperparameter set and the training data set; and

deploying the machine learning model.

2. The method of claim 1, wherein generating the plurality of hyperparameter sets comprises generating, for each respective hyperparameter set of the plurality of hyperparameter sets, a respective random seed value for separating the training data set into a training subset and a validation subset.

3. The method of claim 1, wherein:

the predictive models comprise tree-based uplift models, and

each hyperparameter set of the plurality of hyperparameter sets comprises a tree depth parameter, a number of trees parameter, and a tree split value parameter.

4. The method of claim 1, wherein training the plurality of predictive models comprises training a model to maximize the reward metric.

5. The method of claim 1, wherein selecting the hyperparameter set from the plurality of hyperparameter sets comprises calculating, for each respective predictive model of the plurality of predictive models, one or more error metrics between an average reward over a plurality of sequences of workflow steps and an average reward for outputs of the respective predictive model matching a defined baseline policy.

6. The method of claim 5, wherein selecting the hyperparameter set further comprises discarding hyperparameter sets associated with predictive models having a predictive value below a threshold value with at least one error metric of the one or more error metrics being a negative value.

7. The method of claim 1, wherein selecting the hyperparameter set comprises selecting the hyperparameter set having a ratio of performant models to total models exceeding a threshold value and no nonperformant models.

8. A processor-implemented method, comprising:

receiving, from a user of a software application, a request to execute a workflow in the software application;

generating, using a predictive model and features associated with the user of the software application, a workflow sequence that maximizes a reward metric for the user of the software application, the predictive model comprising one of a plurality of predictive models trained based on a set of hyperparameters resulting in a set of predictive models having a highest level of performance; and

executing the generated workflow sequence.

9. The method of claim 8, wherein the features associated with the user of the software application comprise at least one of static features defining characteristics of the user of the software application or dynamic features associated with user activity within the software application.

10. The method of claim 8, wherein the reward metric comprises a cumulative reward metric calculated over each step in the generated workflow sequence.

11. The method of claim 8, wherein the reward metric comprises a total revenue associated with completion of the workflow.

12. A processing system, comprising:

at least one memory having executable instructions stored thereon; and

one or more processors configured to execute the executable instructions to cause the processing system to:

generate a training data set including a plurality of exemplars, each exemplar including features associated with a user of a software application, a sequence of workflow steps presented to the user of the software application, and a reward metric associated with the user of the software application and the sequence of workflow steps;

generate a plurality of hyperparameter sets for training a plurality of predictive models for identifying a sequence of workflow steps to present to users of the software application;

train the plurality of predictive models based on the plurality of hyperparameter sets;

select a hyperparameter set from the plurality of hyperparameter sets based on performance metrics for each of the plurality of predictive models;

train a machine learning model based on the selected hyperparameter set and the training data set; and

deploy the machine learning model.

13. The processing system of claim 12, wherein to generate the plurality of hyperparameter sets, the one or more processors are configured to cause the processing system to generate, for each respective hyperparameter set of the plurality of hyperparameter sets, a respective random seed value for separating the training data set into a training subset and a validation subset.

14. The processing system of claim 12, wherein:

the predictive models comprise tree-based uplift models, and

each hyperparameter set of the plurality of hyperparameter sets comprises a tree depth parameter, a number of trees parameter, and a tree split value parameter.

15. The processing system of claim 12, wherein to train the plurality of predictive models, the one or more processors are configured to cause the processing system to train a model to maximize the reward metric.

16. The processing system of claim 12, wherein to select the hyperparameter set from the plurality of hyperparameter sets, the one or more processors are configured to cause the processing system to calculate, for each respective predictive model of the plurality of predictive models, one or more error metrics between an average reward over a plurality of sequences of workflow steps and an average reward for outputs of the respective predictive model matching a defined baseline policy.

17. The processing system of claim 16, wherein to select the hyperparameter set, the one or more processors are further configured to cause the processing system to discard hyperparameter sets associated with predictive models having a predictive value below a threshold value with at least one error metric of the one or more error metrics being a negative value.

18. The processing system of claim 12, wherein to select the hyperparameter set, the one or more processors are configured to cause the processing system to select the hyperparameter set having a ratio of performant models to total models exceeding a threshold value and no nonperformant models.

Sources:

United States Patent and Trademark Office (USPTO)
