US20250391037A1
2025-12-25
19/229,047
2025-06-05
Smart Summary: An information processing system helps convert hidden data from a world model into accurate physical measurements. It includes a part that finds connections between these physical measurements based on pairs of hidden variables from different times. These pairs are used as input in a regression model. The output shows how the physical quantities relate to each other over time. This technique aims to ensure that the physical quantities obtained are correct and stable.
An object is to provide a technique capable of stably obtaining a correct physical quantity in a technique of converting a latent variable output from a world model into a physical quantity. An information processing apparatus includes a relationship derivation unit that derives, for an object to be inferred or predicted in a world model, a relationship between physical quantities from a pair of latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input, and having the relationship between the physical quantities respectively corresponding to the different times as an output.
G06T7/251 » CPC main
Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
G06T7/75 » CPC further
Image analysis; Determining position or orientation of objects or cameras using feature-based methods involving models
G06T2207/10016 » CPC further
Indexing scheme for image analysis or image enhancement; Image acquisition modality Video; Image sequence
G06T7/246 IPC
Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
G06T7/73 IPC
Image analysis; Determining position or orientation of objects or cameras using feature-based methods
This application is based upon and claims the benefit of priority from Japanese patent application No. 2024-100776, filed on Jun. 21, 2024, the disclosure of which is incorporated herein in its entirety by reference.
The present disclosure relates to an information processing apparatus, an information processing method, and an information processing program.
A world model is known as a technique for inferring or predicting a state of an object included as a subject in a moving image. Examples of documents in which the world model is disclosed include Patent Literature 1.
Patent Literature 1: Japanese Unexamined Patent Application Publication No. 2023-143222
In the world model, the state of the object to be inferred or predicted is represented (encoded) as a latent variable. Therefore, in order to perform information processing using the world model, it is necessary to convert the latent variable output from the world model into a physical quantity such as a position or a pose. However, in a configuration using a regression model that converts a latent variable at a certain time into a physical quantity at the time (particularly, a physical quantity with respect to an absolute coordinate system), a regression result (the physical quantity obtained by the conversion) is unstable in some cases. As an example, in a case where an object has a symmetrical shape, an object coordinate system is unstable, resulting in an unstable regression result. The instability of the regression result is manifested as, for example, discontinuity of the regression result (a difference between physical quantities corresponding to two times is not reduced even if a difference between the two times is reduced and a difference between states corresponding to the two times is reduced).
The present disclosure has been made in view of the above problem, and an example object thereof is to provide a technique capable of stably obtaining a correct physical quantity in a technique of converting a latent variable output from a world model into a physical quantity.
An information processing apparatus according to an example aspect of the present disclosure includes a relationship derivation unit for deriving, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.
An information processing apparatus according to a first example aspect of the present disclosure includes a model generation unit for generating a regression model that uses, for an object included in a world model representing a state of the object as a latent variable, a pair of the latent variables corresponding to different times output from the world model as an input and uses a relationship between physical quantities respectively corresponding to the different times as an output, wherein the model generation unit generates the regression model in such a manner that a correspondence between the input and the output of the regression model best approximates a correspondence between a pair of latent variables output from the world model and a relationship between the physical quantities acquired from a data set.
An information processing method according to a second example aspect of the present disclosure includes deriving, by a processor, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.
An information processing method according to a third example aspect of the present disclosure includes generating, by a processor, a regression model that uses, for an object included in a world model representing a state of the object as a latent variable, a pair of the latent variables corresponding to different times output from the world model as an input and uses a relationship between physical quantities respectively corresponding to the different times as an output, wherein the processor generates the regression model in such a manner that a correspondence between the input and the output of the regression model best approximates a correspondence between a pair of latent variables output from the world model and a relationship between the physical quantities acquired from a data set.
An information processing program according to a fourth example aspect of the present disclosure causes a processor to execute a relationship derivation process of deriving, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.
An information processing program according to a fifth example aspect of the present disclosure causes a processor to execute a model generation process of generating a regression model that uses, for an object included in a world model representing a state of the object as a latent variable, a pair of the latent variables corresponding to different times output from the world model as an input and uses a relationship between physical quantities respectively corresponding to the different times as an output, wherein, in the model generation process, the processor generates the regression model in such a manner that a correspondence between the input and the output of the regression model best approximates a correspondence between a pair of latent variables output from the world model and a relationship between the physical quantities acquired from a data set.
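As a non-limiting sketch of what "best approximates" could mean in the model generation described above, the following assumes a least-squares criterion, scalar latent variables, and a regression model of the form ρ ≈ w·(Z′ − Z) + b. The function name `fit_regression_model` and the toy data set are hypothetical; the disclosure does not prescribe a particular fitting criterion or model family.

```python
from typing import Sequence, Tuple


def fit_regression_model(
    latent_pairs: Sequence[Tuple[float, float]],
    relationships: Sequence[float],
) -> Tuple[float, float]:
    """Least-squares fit of rho ~ w * (z' - z) + b (assumed toy model form)."""
    # Feature: difference between the two latent variables of each pair.
    xs = [z_ip - z_i for z_i, z_ip in latent_pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(relationships) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, relationships))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Hypothetical data set: latent pairs from the world model and the
# relationships (here, scalar displacements) acquired for the same times.
latent_pairs = [(0.0, 1.0), (1.0, 3.0), (3.0, 3.5)]
relationships = [2.0, 4.0, 1.0]
w, b = fit_regression_model(latent_pairs, relationships)
```

In practice the latent variables are vectors and the regression model may be nonlinear; the sketch only shows the input/output correspondence that the fitting is meant to approximate.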
According to an example aspect of the present disclosure, there is an exemplary effect that the technique capable of stably obtaining the correct physical quantity can be provided in the technique of converting the latent variable output from the world model into the physical quantity.
The above and other aspects, features and advantages of the present disclosure will become more apparent from the following description of certain exemplary embodiments when taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram illustrating a configuration of an information processing apparatus according to the present disclosure;
FIG. 2 is a flowchart illustrating a flow of an information processing method according to the present disclosure;
FIG. 3 is a block diagram illustrating a configuration of an information processing apparatus according to the present disclosure;
FIG. 4 is a flowchart illustrating a flow of an information processing method according to the present disclosure;
FIG. 5 is a block diagram illustrating a configuration of an information processing apparatus according to the present disclosure;
FIG. 6 is a flowchart illustrating a flow of an information processing method according to the present disclosure;
FIG. 7 is a block diagram illustrating a configuration of an information processing apparatus according to the present disclosure;
FIG. 8 is a flowchart illustrating a flow of an information processing method according to the present disclosure;
FIG. 9 is a block diagram illustrating a configuration of an information processing apparatus according to the present disclosure;
FIG. 10 is a flowchart illustrating a flow of an information processing method according to the present disclosure;
FIG. 11 is a block diagram illustrating a configuration of a computer that functions as the information processing apparatus according to the present disclosure; and
FIG. 12 is a schematic view illustrating operation of a world model used by the information processing apparatus according to the present disclosure.
Hereinafter, embodiments will be exemplified. However, the present disclosure is not limited to example embodiments described below, and various alterations can be made within the scope described in the claims. For example, embodiments obtained by appropriately combining techniques (some or all of things or methods) adopted in the following example embodiments can also be included in the scope of the present disclosure. In addition, embodiments obtained by appropriately omitting some of the techniques adopted in the following example embodiments can also be included in the scope of the present disclosure. In addition, effects mentioned in the following example embodiments are examples of effects expected in the example embodiments, and do not define the extension of the present disclosure. That is, embodiments that do not achieve the effects mentioned in the following example embodiments can also be included in the scope of the present disclosure.
In the example embodiments described below, a world model M0 is used. Therefore, the world model M0 will be described with reference to FIG. 12 before describing the example embodiments. FIG. 12 is a schematic diagram illustrating the operation of the world model M0.
The world model is a model that is constructed by machine learning using limited information of the outside world (real world or virtual world) and approximates a structure of the outside world. In the present disclosure, a world model M0 constructed by object-centric representation learning is considered. In the object-centric representation learning, the world model M0 representing a state of an object Sj as a latent variable is constructed using a moving image I including objects S1, S2, . . . , and Sm as subjects. Here, m is any natural number equal to or more than 1. The moving image I includes an image I1 corresponding to a time t1, an image I2 corresponding to a time t2 (t1<t2), . . . , and an image In corresponding to a time tn (tn−1<tn). Here, n is any natural number equal to or more than 2. Each image Ii is a still image and is also called a frame or a frame image. Here, i is any natural number equal to or more than 1 and equal to or less than n. In a case where the world model M0 is a model that approximates a structure of the real world, the moving image I may be, for example, a live-action video representing the real world. In addition, in a case where the world model M0 is a model that approximates a structure of a virtual world, the moving image I can be, for example, a computer graphics (CG) video representing the virtual world. In the world model M0 constructed by the object-centric representation learning, regarding each of the objects Sj included in the world model M0, a state (a physical quantity such as a position or a pose) at each time ti is represented (encoded) as a latent variable Zi(j). Hereinafter, a set of the objects S1, S2, . . . , and Sm included as the subjects in the moving image I, that is, a set of the objects S1, S2, . . . , and Sm included in the world model M0 is also referred to as an “object group {S1, S2, . . . , Sm}”.
The world model M0 derives a latent variable Zi(j), which is a latent variable related to an object Sj and corresponding to a time ti, from an image Ii corresponding to the time ti and a latent variable Zi−1(j), which is a latent variable related to the object Sj and corresponding to a time ti−1, in an inference period. Here, j is any natural number equal to or more than 1 and equal to or less than m. The world model M0 can perform the above inference on the respective objects Sj in parallel. In this case, an output of the world model M0 at each time ti included in the inference period is a set {Zi(j)|j=1, 2, . . . , m} of a latent variable Zi(1) related to the object S1, a latent variable Zi(2) related to the object S2, . . . , and a latent variable Zi(m) related to the object Sm.
In addition, the world model M0 derives a latent variable Zi(j), which is a latent variable related to an object Sj and corresponding to a time ti, from a latent variable Zi−1(j), which is a latent variable related to the object Sj and corresponding to a time ti−1, in a prediction period following the inference period. That is, no image Ii is input in the prediction period. The world model M0 can perform the above prediction on the respective objects Sj in parallel. In this case, an output of the world model M0 at each time ti included in the prediction period is a set {Zi(j)|j=1, 2, . . . , m} of the latent variable Zi(1) related to the object S1, the latent variable Zi(2) related to the object S2, . . . , and the latent variable Zi(m) related to the object Sm.
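As a non-limiting sketch, the two periods can be pictured as one recurrent loop for a single object Sj. The sketch assumes, consistently with the description of the latent variable derivation unit 12, that an image is input only in the inference period and that the prediction period uses the previous latent variable alone. The function `toy_world_model` and its update rules are hypothetical stand-ins and do not represent ViMON, OP3, G-SWM, GATSBI, or any other actual world model.

```python
from typing import List, Optional, Sequence

Latent = List[float]


def toy_world_model(prev_latent: Latent,
                    image: Optional[Sequence[float]] = None) -> Latent:
    """Hypothetical stand-in for one step of M0 for one object."""
    if image is not None:
        # Inference period: combine the image I_i with Z_{i-1}.
        return [0.5 * z + 0.5 * x for z, x in zip(prev_latent, image)]
    # Prediction period: extrapolate from Z_{i-1} only (assumed toy dynamics).
    return [1.1 * z for z in prev_latent]


def rollout(z0: Latent, images: Sequence[Sequence[float]],
            n_predict: int) -> List[Latent]:
    """Return Z_1 .. Z_n: inference over the images, then n_predict predictions."""
    latents: List[Latent] = []
    z = z0
    for image in images:
        z = toy_world_model(z, image)
        latents.append(z)
    for _ in range(n_predict):
        z = toy_world_model(z)
        latents.append(z)
    return latents


latents = rollout([0.0, 0.0], [[1.0, 2.0], [1.0, 2.0]], n_predict=1)
```

For the object group {S1, S2, . . . , Sm}, the same loop would be run once per object, or in parallel.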
Note that examples of the known world model include ViMON, OP3, G-SWM, and GATSBI. However, the world model M0 that can be used in the example embodiments described below is not limited thereto. Any world model can be used in each of the example embodiments described below as long as the world model represents a state of an object as a latent variable.
As an example without limiting the present disclosure, the world model M0 can be utilized for (1) inference and prediction of motion of an object in a virtual space in a computer game, (2) inference and prediction of motion of an object in a real space in a physical simulation, (3) inference and prediction of an object (for example, an obstacle) in a real space in automated driving or control of a mobile body (an automobile, a ship, an aircraft, or the like), (4) inference and prediction of an object (for example, a workpiece) in a real space in control of a robot arm, and the like.
A first example embodiment, which is an example of an embodiment, will be described in detail with reference to the drawings. The present example embodiment is a basic form of each example embodiment described below. Note that an application range of each technique adopted in the present example embodiment is not limited to the present example embodiment. That is, each technique adopted in the present example embodiment can also be adopted in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs. In addition, each technique illustrated in the drawings referred to for describing the present example embodiment can also be employed in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs.
A configuration of an information processing apparatus 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram illustrating the configuration of the information processing apparatus 1.
As illustrated in FIG. 1, the information processing apparatus 1 includes a relationship derivation unit 11.
The relationship derivation unit 11 is a means for deriving, from a pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ (ti<ti′) output from the world model M0, a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. Note that the two times ti and ti′ may be two adjacent (continuous) times, such as a time t1 and a time t2, or may be two non-adjacent (non-continuous) times, such as the time t1 and a time t3.
The relationship derivation unit 11 uses a regression model M1 to derive the relationship ρi,i′(j) from the pair (Zi(j), Zi′(j)). An input of the regression model M1 is the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) output from the world model M0. An output of the regression model M1 is the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j). Note that the pair (Zi(j), Zi′(j)) of two latent variables Zi(j) and Zi′(j) may be input to the regression model M1 as one variable obtained by connecting these two latent variables Zi(j) and Zi′(j). For example, in a case where each of the two latent variables Zi(j) and Zi′(j) is represented by a three-dimensional vector, the pair (Zi(j), Zi′(j)) of two latent variables Zi(j) and Zi′(j) may be input to the regression model M1 as a six-dimensional vector obtained by connecting the two three-dimensional vectors.
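As a non-limiting sketch of the input connection described above, the following concatenates two three-dimensional latent variables into one six-dimensional vector and passes it to a linear map standing in for the regression model M1. The names `concat_pair` and `linear_regression_model`, as well as the weights, are hypothetical; the weights are chosen by hand so that the output equals Zi′(j) − Zi(j), purely to make the data flow visible.

```python
from typing import List, Sequence


def concat_pair(z_i: Sequence[float], z_ip: Sequence[float]) -> List[float]:
    """Connect the pair (Z_i, Z_i') into a single input vector."""
    return list(z_i) + list(z_ip)


def linear_regression_model(x: Sequence[float],
                            weights: Sequence[Sequence[float]],
                            bias: Sequence[float]) -> List[float]:
    """Hypothetical linear stand-in for M1: returns weights @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Hand-picked (not learned) weights: output = second half minus first half.
WEIGHTS = [
    [-1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0, 0.0, 1.0],
]
BIAS = [0.0, 0.0, 0.0]

x = concat_pair([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])   # six-dimensional input
rho = linear_regression_model(x, WEIGHTS, BIAS)      # three-dimensional output
```

A trained regression model would generally be nonlinear and would not coincide with a plain difference of latent variables; only the shapes of the input and the output are meant to carry over.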
As an example without limiting the disclosure, the physical quantity oi(j) corresponding to each time ti may be a position rj(ti) of the object Sj at that time ti. In this case, the relationship ρi,i′(j) may be a displacement from the position rj(ti) of the object Sj at the time ti to a position rj(ti′) of the object Sj at the time ti′. In this case, the position rj(ti), the position rj(ti′), and the relationship ρi,i′(j) are each represented by a three-dimensional vector, and a relationship of rj(ti′)=ρi,i′(j)+rj(ti) holds among these vectors.
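As a non-limiting numerical illustration of the vector relationship rj(ti′)=ρi,i′(j)+rj(ti), the following sketch computes displacements between three positions and checks that applying them in sequence reproduces the later positions. The helper names and the position values are hypothetical.

```python
from typing import List, Sequence


def displacement(r_i: Sequence[float], r_ip: Sequence[float]) -> List[float]:
    """rho such that r(t_i') = rho + r(t_i)."""
    return [b - a for a, b in zip(r_i, r_ip)]


def apply_displacement(r_i: Sequence[float], rho: Sequence[float]) -> List[float]:
    """Recover r(t_i') from r(t_i) and the displacement rho."""
    return [a + d for a, d in zip(r_i, rho)]


# Hypothetical positions of one object at times t1 < t2 < t3.
r1 = [1.0, 2.0, 0.5]
r2 = [1.5, 2.0, 0.0]
r3 = [2.5, 1.0, 0.0]

rho_12 = displacement(r1, r2)
rho_23 = displacement(r2, r3)

# Displacements over consecutive intervals add up along the sequence.
r3_from_chain = apply_displacement(apply_displacement(r1, rho_12), rho_23)
```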
In addition, as an example without limiting the disclosure, the physical quantity oi(j) corresponding to each time ti may be a pose qj(ti) of the object Sj at that time ti. In this case, the relationship ρi,i′(j) may be a pose change from the pose qj(ti) of the object Sj at the time ti to a pose qj(ti′) of the object Sj at the time ti′. In this case, the pose qj(ti), the pose qj(ti′), and the relationship ρi,i′(j) are each represented by a quaternion, and a relationship of qj(ti′)=ρi,i′(j)*qj(ti) holds among these quaternions.
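As a non-limiting sketch of the quaternion relationship, the following assumes unit quaternions stored as (w, x, y, z) and the Hamilton product for the operator *. Under these assumptions the pose change satisfying q(ti′)=ρ*q(ti) is ρ = q(ti′)*conj(q(ti)). The helper names and the example rotations about the z axis are hypothetical.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # assumed (w, x, y, z) ordering


def q_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_conj(q: Quat) -> Quat:
    """Conjugate; equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def pose_change(q_i: Quat, q_ip: Quat) -> Quat:
    """rho such that q(t_i') = rho * q(t_i), for unit quaternions."""
    return q_mul(q_ip, q_conj(q_i))


def rot_z(angle_rad: float) -> Quat:
    """Unit quaternion for a rotation about the z axis."""
    return (math.cos(angle_rad / 2.0), 0.0, 0.0, math.sin(angle_rad / 2.0))


q_i = rot_z(math.radians(30.0))    # pose at time t_i
q_ip = rot_z(math.radians(75.0))   # pose at time t_i'
rho = pose_change(q_i, q_ip)       # a 45 degree rotation about z
```

Because a unit quaternion and its negation represent the same rotation, a practical implementation might also normalize the sign of ρ; that step is omitted here.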
In addition, as an example without limiting the disclosure, the physical quantity oi(j) corresponding to each time ti may be a combination of the position rj(ti) and the pose qj(ti) of the object Sj at the time ti. In this case, the relationship ρi,i′(j) can be a combination of the displacement from the position rj(ti) of the object Sj at the time ti to the position rj(ti′) of the object Sj at the time ti′ and the pose change from the pose qj(ti) of the object Sj at the time ti to the pose qj(ti′) of the object Sj at the time ti′.
Note that, instead of using the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) as the input of the regression model M1, a difference Zi′(j)−Zi(j) between the latent variables Zi(j) and Zi′(j) may be used as the input of the regression model M1. In this case, the relationship derivation unit 11 derives the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) from the difference Zi′(j)−Zi(j) between the latent variables Zi(j) and Zi′(j) using the regression model M1.
In addition, instead of using the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) as the input of the regression model M1, a set (Zi(j), Zi′(j), Zi′(j)−Zi(j)) of the latent variable Zi(j), the latent variable Zi′(j), and the difference Zi′(j)−Zi(j) may be used as the input of the regression model M1. In this case, the relationship derivation unit 11 derives the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) from the set (Zi(j), Zi′(j), Zi′(j)−Zi(j)) of the latent variable Zi(j), the latent variable Zi′(j), and the difference Zi′(j)−Zi(j) using the regression model M1.
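As a non-limiting sketch, the three input variants described above (the pair, the difference, and the set of both latent variables together with their difference) can be built by one helper. The function name `build_input` and the mode strings are hypothetical.

```python
from typing import List, Sequence


def build_input(z_i: Sequence[float], z_ip: Sequence[float],
                mode: str = "pair") -> List[float]:
    """Assemble the input of the regression model from Z_i and Z_i'."""
    diff = [b - a for a, b in zip(z_i, z_ip)]          # Z_i' - Z_i
    if mode == "pair":
        return list(z_i) + list(z_ip)
    if mode == "difference":
        return diff
    if mode == "pair_and_difference":
        return list(z_i) + list(z_ip) + diff
    raise ValueError(f"unknown mode: {mode}")


z_i = [1.0, 2.0, 3.0]
z_ip = [1.5, 2.0, 2.0]
x_pair = build_input(z_i, z_ip, "pair")                 # 6 components
x_diff = build_input(z_i, z_ip, "difference")           # 3 components
x_full = build_input(z_i, z_ip, "pair_and_difference")  # 9 components
```

The regression model M1 must be generated for the same variant that is used at derivation time, since the input dimension differs between the variants.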
A flow of an information processing method S1 will be described with reference to FIG. 2. FIG. 2 is a flowchart illustrating the flow of the information processing method S1.
As illustrated in FIG. 2, the information processing method S1 includes a relationship derivation process S11. Note that the information processing method S1 is executed by the information processing apparatus 1 or a computer, for example.
The relationship derivation process S11 is a process for deriving, from a pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ (ti<ti′) output from the world model M0, a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. Here, the two times ti and ti′ may be two continuous times, such as a time t1 and a time t2, or may be two non-continuous times, such as the time t1 and a time t3. Note that the relationship derivation process S11 is executed, for example, by the relationship derivation unit 11 of the information processing apparatus 1 or by a processor of the computer.
In the relationship derivation process S11, the regression model M1 is used to derive the relationship ρi,i′(j) from the pair (Zi(j), Zi′(j)). An input of the regression model M1 is the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) output from the world model M0. An output of the regression model M1 is the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j). Note that the pair (Zi(j), Zi′(j)) of two latent variables Zi(j) and Zi′(j) may be input to the regression model M1 as one variable obtained by connecting these two latent variables Zi(j) and Zi′(j). For example, in a case where each of the two latent variables Zi(j) and Zi′(j) is represented by a three-dimensional vector, the pair (Zi(j), Zi′(j)) of two latent variables Zi(j) and Zi′(j) may be input to the regression model M1 as a six-dimensional vector obtained by connecting the two three-dimensional vectors.
Note that, instead of using the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) as the input of the regression model M1, a difference Zi′(j)−Zi(j) between the latent variables Zi(j) and Zi′(j) may be used as the input of the regression model M1. In this case, in the relationship derivation process S11, the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) is derived from the difference Zi′(j)−Zi(j) between the latent variables Zi(j) and Zi′(j) using the regression model M1.
In addition, instead of using the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) as the input of the regression model M1, a set (Zi(j), Zi′(j), Zi′(j)−Zi(j)) of the latent variable Zi(j), the latent variable Zi′(j) and the difference Zi′(j)−Zi(j) may be used as the input of the regression model M1. In this case, in the relationship derivation process S11, the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) is derived from the set (Zi(j), Zi′(j), Zi′(j)−Zi(j)) of the latent variable Zi(j), the latent variable Zi′(j), and the difference Zi′(j)−Zi(j) using the regression model M1.
Note that, as an example without limiting the present disclosure, FIG. 2 illustrates a flow of the information processing method S1 implemented in a case where i′=i+1, that is, in a case where a relationship ρi,i+1(j) between physical quantities oi(j) and oi+1(j) corresponding to two times ti and ti+1 adjacent to each other is derived in the relationship derivation process S11. In this case, the relationship derivation process S11 is repeated n−1 times, and relationships ρ1,2(j), ρ2,3(j), . . . , and ρn−1,n(j) are sequentially derived.
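As a non-limiting sketch of the case i′=i+1, the relationship derivation process can be written as a loop over adjacent latent variables that runs n−1 times for n latent variables. The function names are hypothetical, and `toy_regression_model` is a stand-in that returns the plain difference of its inputs rather than a trained regression model M1.

```python
from typing import Callable, List, Sequence

Latent = Sequence[float]


def toy_regression_model(z_i: Latent, z_ip: Latent) -> List[float]:
    """Hypothetical stand-in for M1 (not a trained model)."""
    return [b - a for a, b in zip(z_i, z_ip)]


def derive_adjacent_relationships(
    latents: Sequence[Latent],
    regression_model: Callable[[Latent, Latent], List[float]],
) -> List[List[float]]:
    """Derive rho_{1,2}, rho_{2,3}, ..., rho_{n-1,n} in order."""
    relationships = []
    for k in range(len(latents) - 1):      # repeated n - 1 times
        relationships.append(regression_model(latents[k], latents[k + 1]))
    return relationships


latents = [[0.0, 0.0], [1.0, 0.5], [1.5, 1.5], [1.5, 2.0]]   # Z_1 .. Z_4
rhos = derive_adjacent_relationships(latents, toy_regression_model)
```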
Note that the time interval between the time ti and the time ti′ is preferably short in order to enhance the stability of the regression. This is because, as the time interval between the time ti and the time ti′ becomes longer, the uniqueness of the regression decreases. For example, it becomes more difficult to distinguish between pose changes of an object having rotational symmetry (for example, to distinguish between no rotation and a 180° rotation of such an object). In this regard, a configuration for deriving the relationship ρi,i+1(j) between the physical quantities oi(j) and oi+1(j) corresponding to the two adjacent times ti and ti+1 is the best mode.
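The loss of uniqueness can be illustrated with a non-limiting sketch. It assumes a rectangle, which has two-fold rotational symmetry, and summarizes its appearance by the unordered set of its corner positions. The helper `appearance` and the corner coordinates are hypothetical.

```python
import math
from typing import FrozenSet, Tuple

# Corners of a hypothetical rectangle centred on the origin.
RECT = [(2.0, 1.0), (-2.0, 1.0), (-2.0, -1.0), (2.0, -1.0)]


def appearance(angle_deg: float) -> FrozenSet[Tuple[float, float]]:
    """Unordered corner set after rotating the rectangle by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Rounding absorbs floating-point noise so that sets can be compared.
    return frozenset(
        (round(c * x - s * y, 9), round(s * x + c * y, 9)) for x, y in RECT
    )


same_as_unrotated = appearance(180.0) == appearance(0.0)    # indistinguishable
differs_at_quarter = appearance(90.0) != appearance(0.0)    # distinguishable
```

If the interval is short enough that the object can only rotate by a small angle between ti and ti′, the 180° alternative is ruled out and the pose change becomes unique again.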
In the information processing apparatus 1 and the information processing method S1, the configuration in which the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the times ti and ti′ is regressed from the latent variables Zi(j) and Zi′(j) corresponding to the different times ti and ti′ is adopted instead of a configuration in which the physical quantity oi(j) corresponding to the time ti is regressed from the latent variable Zi(j) corresponding to the time ti. Therefore, regression that does not depend on a coordinate system representing a physical quantity is possible. Therefore, the instability of a regression result can be reduced. That is, the relationship ρi,i′(j) between correct physical quantities oi(j), oi′(j) corresponding to the times ti and ti′ can be stably obtained.
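One facet of this coordinate-system independence can be checked numerically. As a non-limiting sketch limited to a shift of the origin, the following shows that absolute positions change when the origin is moved, whereas the displacement between two times does not. The helper names and values are hypothetical, and other changes of coordinate system (for example, the choice of an object coordinate system) are not covered by the sketch.

```python
from typing import List, Sequence


def displacement(r_i: Sequence[float], r_ip: Sequence[float]) -> List[float]:
    """Displacement from r(t_i) to r(t_i')."""
    return [b - a for a, b in zip(r_i, r_ip)]


def shift_origin(r: Sequence[float], origin: Sequence[float]) -> List[float]:
    """Express position r in a coordinate system whose origin is moved."""
    return [a - o for a, o in zip(r, origin)]


r_i = [1.0, 2.0, 0.5]
r_ip = [1.5, 2.0, 0.0]
new_origin = [10.0, -4.0, 2.0]

rho_original = displacement(r_i, r_ip)
rho_shifted = displacement(shift_origin(r_i, new_origin),
                           shift_origin(r_ip, new_origin))
```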
In addition, in the configuration in which the physical quantity oi(j) corresponding to the time ti is regressed from the latent variable Zi(j) corresponding to the time ti, there may be a problem that the operation of a non-learning object (an object not included as a subject in an image used for learning of the regression model) cannot be guaranteed because an object coordinate system is not defined. On the other hand, such a problem hardly occurs in the information processing apparatus 1 and the information processing method S1 since the configuration in which the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the different times ti and ti′ is regressed from the latent variables Zi(j) and Zi′(j) at the times ti and ti′ is adopted.
As an example without limiting the present disclosure, the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) obtained by the information processing apparatus 1 can be utilized for (1) inference and prediction of motion of an object in a virtual space in a computer game, (2) inference and prediction of motion of an object in a real space in a physical simulation, (3) inference and prediction of an object (for example, an obstacle) in a real space in automated driving or control of a mobile body (an automobile, a ship, an aircraft, or the like), (4) inference and prediction of an object (for example, a workpiece) in a real space in control of a robot arm, and the like. In the case of application to the automated driving of a mobile body, it is possible to predict a displacement and a pose change of an obstacle (a person, an animal, another mobile body, or the like) using the information processing apparatus 1 and to automatically drive the mobile body such that the mobile body does not collide with the obstacle. In addition, in the case of application to the control of a robot arm, it is possible to predict a displacement and a pose change of a workpiece using the information processing apparatus 1 and to control the robot arm such that a hand provided at a distal end of the robot arm reaches the workpiece.
A second example embodiment, which is an example of an embodiment, will be described in detail with reference to the drawings. Components having the same functions as the components described in the above-described example embodiments are denoted by the same reference signs, and the description thereof will be appropriately omitted. Note that an application range of each technique adopted in the present example embodiment is not limited to the present example embodiment. That is, each technique adopted in the present example embodiment can also be adopted in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs. In addition, each technique illustrated in each of the drawings referred to for describing the present example embodiment can be employed in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs.
Next, a configuration of an information processing apparatus 1A will be described with reference to FIG. 3. FIG. 3 is a block diagram illustrating the configuration of the information processing apparatus 1A.
As illustrated in FIG. 3, the information processing apparatus 1A is obtained by adding a latent variable derivation unit 12 to the information processing apparatus 1 (see the first example embodiment).
The latent variable derivation unit 12 is a means for deriving latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ using the world model M0 for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. In a case where a time ti is included in an inference period, the latent variable derivation unit 12 obtains a latent variable Zi(j) corresponding to the time ti by inputting an image Ii corresponding to the time ti and a latent variable Zi−1(j) corresponding to a time ti−1 to the world model M0. In addition, in a case where a time ti′ is included in the inference period, a latent variable Zi′(j) corresponding to the time ti′ is similarly obtained. On the other hand, in a case where a time ti is included in a prediction period, the latent variable derivation unit 12 obtains a latent variable Zi(j) corresponding to the time ti by inputting a latent variable Zi−1(j) corresponding to a time ti−1 to the world model M0. In addition, in a case where a time ti′ is included in the prediction period, a latent variable Zi′(j) corresponding to the time ti′ is similarly obtained.
The relationship derivation unit 11 of the information processing apparatus 1A derives a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two different times ti and ti′ from a pair (Zi(j), Zi′(j)) of the latent variables Zi(j) and Zi′(j) corresponding to the two times ti and ti′ derived by the latent variable derivation unit 12.
A flow of an information processing method S1A will be described with reference to FIG. 4. FIG. 4 is a flowchart illustrating the flow of the information processing method S1A.
As illustrated in FIG. 4, the information processing method S1A is obtained by adding a latent variable derivation process S12 to the information processing method S1 (see the first example embodiment). Note that the information processing method S1A is executed by the information processing apparatus 1A or a computer, for example.
The latent variable derivation process S12 is executed before the relationship derivation process S11 as illustrated in FIG. 4.
The latent variable derivation process S12 is a process for deriving latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ using the world model M0 for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. In a case where a time ti is included in an inference period, in the latent variable derivation process S12, a latent variable Zi(j) corresponding to the time ti is obtained by inputting an image Ii corresponding to the time ti and a latent variable Zi−1(j) corresponding to a time ti−1 to the world model M0. In addition, in a case where a time ti′ is included in the inference period, a latent variable Zi′(j) corresponding to the time ti′ is similarly obtained. On the other hand, in a case where a time ti is included in a prediction period, in the latent variable derivation process S12, a latent variable Zi(j) corresponding to the time ti is obtained by inputting a latent variable Zi−1(j) corresponding to a time ti−1 to the world model M0. In addition, in a case where a time ti′ is included in the prediction period, a latent variable Zi′(j) corresponding to the time ti′ is similarly obtained. Note that the latent variable derivation process S12 is executed, for example, by the latent variable derivation unit 12 of the information processing apparatus 1A or by a processor of the computer.
In the relationship derivation process S11 of the information processing method S1A, a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two different times ti and ti′ is derived from a pair (Zi(j), Zi′(j)) of the latent variables Zi(j) and Zi′(j) corresponding to the two times ti and ti′ derived in the latent variable derivation process S12.
Note that, as an example without limiting the present disclosure, FIG. 4 illustrates a flow of the information processing method S1A implemented in a case where i′=i+1, that is, in a case where a relationship ρi,i+1(j) between physical quantities oi(j) and oi+1(j) corresponding to two adjacent times ti and ti+1 is derived in the relationship derivation process S11. In this case, the relationship derivation process S11 is repeated n−1 times, and relationships ρ1,2(j), ρ2,3(j), . . . , and ρn−1,n(j) are sequentially derived.
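As a non-limiting illustration, the repetition of the relationship derivation process S11 over adjacent times can be sketched as follows. The function toy_regression_model is a hypothetical placeholder for a trained regression model M1 (here it simply returns the element-wise difference of its inputs), and the name derive_adjacent_relationships is likewise an assumption made for this sketch.

```python
from typing import Callable, List, Sequence

Latent = List[float]
Relation = List[float]


def toy_regression_model(z_i: Latent, z_next: Latent) -> Relation:
    """Hypothetical placeholder for the regression model M1."""
    return [b - a for a, b in zip(z_i, z_next)]


def derive_adjacent_relationships(
    regression_model: Callable[[Latent, Latent], Relation],
    latents: Sequence[Latent],
) -> List[Relation]:
    """Apply the regression model to each adjacent pair of latent variables.

    For n latent variables this performs n - 1 derivations, yielding the
    relationships for the pairs (1, 2), (2, 3), ..., (n - 1, n).
    """
    return [
        regression_model(latents[i], latents[i + 1])
        for i in range(len(latents) - 1)
    ]
```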
In the information processing apparatus 1A and the information processing method S1A, the configuration in which the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the times ti and ti′ is regressed from the latent variables Zi(j) and Zi′(j) corresponding to the different times ti and ti′ is adopted instead of a configuration in which the physical quantity oi(j) corresponding to the time ti is regressed from the latent variable Zi(j) corresponding to the time ti. Therefore, regression that does not depend on a coordinate system representing a physical quantity is possible, and the instability of a regression result can consequently be reduced. That is, the relationship ρi,i′(j) between correct physical quantities oi(j) and oi′(j) corresponding to the different times ti and ti′ can be stably obtained.
In addition, in the configuration in which the physical quantity oi(j) corresponding to the time ti is regressed from the latent variable Zi(j) corresponding to the time ti, there may be a problem that the operation of a non-learning object (an object not included as a subject in an image used for learning of the regression model) cannot be guaranteed because an object coordinate system is not defined. On the other hand, such a problem hardly occurs in the information processing apparatus 1A and the information processing method S1A since the configuration in which the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the different times ti and ti′ is regressed from the latent variables Zi(j) and Zi′(j) at the times ti and ti′ is adopted.
A third example embodiment, which is an example of an embodiment, will be described in detail with reference to the drawings. Components having the same functions as the components described in the above-described example embodiments are denoted by the same reference signs, and the description thereof will be appropriately omitted. Note that an application range of each technique adopted in the present example embodiment is not limited to the present example embodiment. That is, each technique adopted in the present example embodiment can also be adopted in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs. In addition, each technique illustrated in each of the drawings referred to for describing the present example embodiment can be employed in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs.
A configuration of an information processing apparatus 1B will be described with reference to FIG. 5. FIG. 5 is a block diagram illustrating the configuration of the information processing apparatus 1B.
As illustrated in FIG. 5, the information processing apparatus 1B is obtained by adding a physical quantity calculation unit 13, an initial physical quantity acquisition unit 14, and a coordinate conversion unit 15 to the information processing apparatus 1 (see the first example embodiment). Note that the information processing apparatus 1B may further include the latent variable derivation unit 12, similarly to the information processing apparatus 1A (see the second example embodiment).
The physical quantity calculation unit 13 is a means for calculating a physical quantity oi′(j) corresponding to a time ti′ from a physical quantity oi(j) corresponding to a time ti calculated by the physical quantity calculation unit 13 and a relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ derived by the relationship derivation unit 11 for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. Both the physical quantity oi(j) corresponding to the time ti referred to by the physical quantity calculation unit 13 and the physical quantity oi′(j) corresponding to the time ti′ calculated by the physical quantity calculation unit 13 are relative physical quantities with respect to a physical quantity o1(j) corresponding to the initial time t1.
As an example without limiting the disclosure, the physical quantity oi(j) corresponding to each time ti may be a position rj(ti) of the object Sj at that time ti. In this case, the relationship ρi,i′(j) may be a displacement from the position rj(ti) of the object Sj at the time ti to a position rj(ti′) of the object Sj at the time ti′. In this case, the position rj(ti), the position rj(ti′), and the relationship ρi,i′(j) are each represented by a three-dimensional vector. In this case, the physical quantity calculation unit 13 sets a position rj(t1) to 0 (vector), calculates a position rj(t2) according to rj(t2)=ρ1,2(j)+rj(t1), calculates a position rj(t3) according to rj(t3)=ρ2,3(j)+rj(t2), . . . , and calculates the position rj(ti′) according to rj(ti′)=ρi′−1,i′(j)+rj(ti′−1). Therefore, the position rj(ti′)=ρi′−1,i′(j)+ρi′−2,i′−1(j)+ . . . +ρ1,2(j) at each time ti′ calculated by the physical quantity calculation unit 13 is a relative position with respect to the position rj(t1) at the initial time t1.
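As a non-limiting illustration, the accumulation of displacements into relative positions described above can be sketched as follows. The function name accumulate_positions and the tuple representation of a three-dimensional vector are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

Vector3 = Tuple[float, float, float]


def accumulate_positions(displacements: Sequence[Vector3]) -> List[Vector3]:
    """Accumulate adjacent-time displacements into relative positions.

    The position at the initial time is set to the zero vector, and each
    later position is the previous position plus the displacement for
    that step, so every result is relative to the initial position.
    """
    positions: List[Vector3] = [(0.0, 0.0, 0.0)]
    for rho in displacements:
        px, py, pz = positions[-1]
        positions.append((px + rho[0], py + rho[1], pz + rho[2]))
    return positions
```

For n−1 displacements the function returns n positions, the first of which is always the zero vector.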
In addition, as an example without limiting the disclosure, the physical quantity oi(j) corresponding to each time ti may be a pose qj(ti) of the object Sj at that time ti. In this case, the relationship ρi,i′(j) may be a pose change from the pose qj(ti) of the object Sj at the time ti to a pose qj(ti′) of the object Sj at the time ti′. In this case, the pose qj(ti), the pose qj(ti′), and the relationship ρi,i′(j) are each represented by a quaternion. In this case, the physical quantity calculation unit 13 sets a pose qj(t1) to 1 (quaternion), calculates a pose qj(t2) according to qj(t2)=ρ1,2(j)*qj(t1), calculates a pose qj(t3) according to qj(t3)=ρ2,3(j)*qj(t2), . . . , and calculates the pose qj(ti′) according to qj(ti′)=ρi′−1,i′(j)*qj(ti′−1). Therefore, the pose qj(ti′)=ρi′−1,i′(j)*ρi′−2,i′−1(j)* . . . *ρ1,2(j) at each time ti′ calculated by the physical quantity calculation unit 13 is a relative pose with respect to the pose qj(t1) at the initial time t1.
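As a non-limiting illustration, the accumulation of pose changes into relative poses described above can be sketched as follows. The sketch assumes that quaternions are stored as (w, x, y, z) tuples and that "*" denotes the Hamilton product; the function names quat_mul and accumulate_poses are hypothetical.

```python
from typing import List, Sequence, Tuple

Quaternion = Tuple[float, float, float, float]  # assumed order: (w, x, y, z)


def quat_mul(a: Quaternion, b: Quaternion) -> Quaternion:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def accumulate_poses(pose_changes: Sequence[Quaternion]) -> List[Quaternion]:
    """Accumulate adjacent-time pose changes into relative poses.

    The pose at the initial time is set to the identity quaternion, and
    each later pose is the pose change for that step multiplied from the
    left onto the previous pose.
    """
    poses: List[Quaternion] = [(1.0, 0.0, 0.0, 0.0)]
    for rho in pose_changes:
        poses.append(quat_mul(rho, poses[-1]))
    return poses
```

For example, two successive pose changes of 90 degrees about the same axis accumulate to a relative pose of 180 degrees about that axis.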
The initial physical quantity acquisition unit 14 is a means for acquiring a physical quantity O1(j) corresponding to the initial time t1 for each object Sj included in the object group {S1, S2, . . . , Sm} or for the specific object Sj selected from the object group {S1, S2, . . . , Sm}. The physical quantity O1(j) corresponding to the initial time t1 acquired by the initial physical quantity acquisition unit 14 is an absolute physical quantity.
As an example without limiting the disclosure, in a case where the physical quantity oi(j) corresponding to each time ti is the position rj(ti) of the object Sj at the time ti, the initial physical quantity acquisition unit 14 acquires an absolute position Rj(t1) (vector) of the object Sj at the initial time t1.
In addition, as an example without limiting the disclosure, in a case where the physical quantity oi(j) corresponding to each time ti is the pose qj(ti) of the object Sj at the time ti, the initial physical quantity acquisition unit 14 acquires an absolute pose Qj(t1) (quaternion) of the object Sj at the initial time t1.
The coordinate conversion unit 15 is a means for converting the relative physical quantity oi′(j) corresponding to the time ti′ calculated by the physical quantity calculation unit 13 into the absolute physical quantity Oi′(j) corresponding to the time ti′ by using the absolute physical quantity O1(j) corresponding to the initial time t1 acquired by the initial physical quantity acquisition unit 14 for each object Sj included in the object group {S1, S2, . . . , Sm} or for the specific object Sj selected from the object group {S1, S2, . . . , Sm}.
As an example without limiting the disclosure, in a case where the physical quantity oi(j) corresponding to each time ti is the position rj(ti) of the object Sj at the time ti, the coordinate conversion unit 15 converts a relative position ri′(j) at the time ti′ calculated by the physical quantity calculation unit 13 into an absolute position Ri′(j)=R1(j)+ri′(j) at the time ti′ by using an absolute position R1(j) at the initial time t1 acquired by the initial physical quantity acquisition unit 14.
In addition, as an example without limiting the disclosure, in a case where the physical quantity oi(j) corresponding to each time ti is the pose qj(ti) of the object Sj at the time ti, the coordinate conversion unit 15 converts a relative pose qi′(j) corresponding to the time ti′ calculated by the physical quantity calculation unit 13 into an absolute pose Qi′(j)=Q1(j)*qi′(j) at the time ti′ by using an absolute pose Q1(j) at the initial time t1 acquired by the initial physical quantity acquisition unit 14.
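As a non-limiting illustration, the conversion from relative to absolute physical quantities can be sketched as follows. The sketch assumes (w, x, y, z) quaternions and the Hamilton product, and it applies the multiplication order exactly as written in the preceding paragraph; the function names to_absolute_position and to_absolute_pose are hypothetical.

```python
from typing import Tuple

Vector3 = Tuple[float, float, float]
Quaternion = Tuple[float, float, float, float]  # assumed order: (w, x, y, z)


def quat_mul(a: Quaternion, b: Quaternion) -> Quaternion:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def to_absolute_position(initial_abs: Vector3, relative: Vector3) -> Vector3:
    """Absolute position = absolute initial position + relative position."""
    return (
        initial_abs[0] + relative[0],
        initial_abs[1] + relative[1],
        initial_abs[2] + relative[2],
    )


def to_absolute_pose(initial_abs: Quaternion, relative: Quaternion) -> Quaternion:
    """Absolute pose = absolute initial pose * relative pose (order as in the text)."""
    return quat_mul(initial_abs, relative)
```

When the absolute initial pose is the identity quaternion, the absolute pose coincides with the relative pose, which gives a quick sanity check for an implementation.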
Note that the information processing apparatus 1B may be configured to (1) exclusively output the relative physical quantity oi(j) corresponding to each time ti, (2) exclusively output the absolute physical quantity Oi(j) corresponding to each time ti, (3) output both the relative physical quantity oi(j) and the absolute physical quantity Oi(j) corresponding to each time ti, or (4) output a physical quantity selected by a user out of the relative physical quantity oi(j) and the absolute physical quantity Oi(j) corresponding to each time ti.
In the case of exclusively outputting the relative physical quantity oi(j) corresponding to each time ti, the initial physical quantity acquisition unit 14 and the coordinate conversion unit 15 can be omitted from the information processing apparatus 1B. In addition, in the case of outputting the physical quantity selected by the user out of the relative physical quantity oi(j) and the absolute physical quantity Oi(j) corresponding to each time ti, it is preferable to add, to the information processing apparatus 1B, a switching means for switching a physical quantity to be output to the physical quantity selected by the user.
A flow of an information processing method S1B will be described with reference to FIG. 6. FIG. 6 is a flowchart illustrating the flow of the information processing method S1B.
As illustrated in FIG. 6, the information processing method S1B is obtained by adding a physical quantity calculation process S13, an initial physical quantity acquisition process S14, and a coordinate conversion process S15 to the information processing method S1 (see the first example embodiment). The information processing method S1B may further include the latent variable derivation process S12, similarly to the information processing method S1A (see the second example embodiment). Note that the information processing method S1B is executed by the information processing apparatus 1B or a computer, for example.
As illustrated in FIG. 6, the physical quantity calculation process S13, the initial physical quantity acquisition process S14, and the coordinate conversion process S15 are executed after the relationship derivation process S11. However, the physical quantity calculation process S13 and the initial physical quantity acquisition process S14 may be executed in any order. That is, the initial physical quantity acquisition process S14 may be executed after the physical quantity calculation process S13 is executed, or the physical quantity calculation process S13 may be executed after the initial physical quantity acquisition process S14 is executed. Alternatively, the physical quantity calculation process S13 and the initial physical quantity acquisition process S14 may be executed in parallel.
The physical quantity calculation process S13 is a process for calculating a physical quantity oi′(j) corresponding to a time ti′ from a physical quantity oi(j) corresponding to a time ti calculated in the physical quantity calculation process S13 of the previous cycle and a relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ derived in the relationship derivation process S11 of the current cycle for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. Both the physical quantity oi(j) corresponding to the time ti referred to in the physical quantity calculation process S13 and the physical quantity oi′(j) corresponding to the time ti′ calculated in the physical quantity calculation process S13 are relative physical quantities with respect to a physical quantity o1(j) corresponding to the initial time t1. Note that the physical quantity calculation process S13 is executed, for example, by the physical quantity calculation unit 13 of the information processing apparatus 1B or by a processor of the computer.
The initial physical quantity acquisition process S14 is a process for acquiring a physical quantity O1(j) corresponding to the initial time t1 for each object Sj included in the object group {S1, S2, . . . , Sm} or for the specific object Sj selected from the object group {S1, S2, . . . , Sm}. The physical quantity O1(j) corresponding to the initial time t1 acquired in the initial physical quantity acquisition process S14 is an absolute physical quantity. Note that the initial physical quantity acquisition process S14 is executed, for example, by the initial physical quantity acquisition unit 14 of the information processing apparatus 1B or by the processor of the computer.
The coordinate conversion process S15 is a process for converting the relative physical quantity oi′(j) corresponding to the time ti′ calculated in the physical quantity calculation process S13 into the absolute physical quantity Oi′(j) corresponding to the time ti′ using the absolute physical quantity O1(j) corresponding to the initial time t1 acquired in the initial physical quantity acquisition process S14 for each object Sj included in the object group {S1, S2, . . . , Sm} or for the specific object Sj selected from the object group {S1, S2, . . . , Sm}. Note that the coordinate conversion process S15 is executed, for example, by the coordinate conversion unit 15 of the information processing apparatus 1B or by the processor of the computer.
Note that, as an example without limiting the present disclosure, FIG. 6 illustrates a flow of the information processing method S1B implemented in a case where i′=i+1, that is, in a case where a relationship ρi,i+1(j) between physical quantities oi(j) and oi+1(j) corresponding to two adjacent times ti and ti+1 is derived in the relationship derivation process S11. In this case, the relationship derivation process S11 is repeated n−1 times, and an absolute physical quantity O2(j), an absolute physical quantity O3(j), . . . , and an absolute physical quantity On(j) are sequentially calculated.
Note that the information processing method S1B may be configured to (1) exclusively output the relative physical quantity oi(j) corresponding to each time ti, (2) exclusively output the absolute physical quantity Oi(j) corresponding to each time ti, (3) output both the relative physical quantity oi(j) and the absolute physical quantity Oi(j) corresponding to each time ti, or (4) output a physical quantity selected by a user out of the relative physical quantity oi(j) and the absolute physical quantity Oi(j) corresponding to each time ti.
In the case of exclusively outputting the relative physical quantity oi(j) corresponding to each time ti, the initial physical quantity acquisition process S14 and the coordinate conversion process S15 can be omitted from the information processing method S1B. In addition, in the case of outputting the physical quantity selected by the user out of the relative physical quantity oi(j) and the absolute physical quantity Oi(j) corresponding to each time ti, it is preferable to add, to the information processing method S1B, a switching process of switching a physical quantity to be output to the physical quantity selected by the user.
In the information processing apparatus 1B and the information processing method S1B, the configuration in which the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the times ti and ti′ is regressed from the latent variables Zi(j) and Zi′(j) corresponding to the different times ti and ti′ is adopted instead of a configuration in which the physical quantity oi(j) corresponding to the time ti is regressed from the latent variable Zi(j) corresponding to the time ti. Therefore, regression that does not depend on a coordinate system representing a physical quantity is possible. Therefore, the instability of a regression result can be reduced. As a result, correct physical quantities oi′(j) and Oi′(j) corresponding to the time ti′ can be stably obtained.
In addition, in the configuration in which the physical quantity oi(j) corresponding to the time ti is regressed from the latent variable Zi(j) corresponding to the time ti, there may be a problem that the operation of a non-learning object (an object not included as a subject in an image used for learning of the regression model) cannot be guaranteed because an object coordinate system is not defined. On the other hand, such a problem hardly occurs in the information processing apparatus 1B and the information processing method S1B since the configuration in which the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the different times ti and ti′ is regressed from the latent variables Zi(j) and Zi′(j) at the times ti and ti′ is adopted.
As an example without limiting the present disclosure, the physical quantities oi′(j) and Oi′(j) obtained by the information processing apparatus 1B can be utilized for (1) inference and prediction of motion of an object in a virtual space in a computer game, (2) inference and prediction of motion of an object in a real space in a physical simulation, (3) inference and prediction of an object (for example, an obstacle) in a real space in automated driving or control of a mobile body (an automobile, a ship, an aircraft, or the like), (4) inference and prediction of an object (for example, a workpiece) in a real space in control of a robot arm, and the like. In the case of application to the automated driving of a mobile body, it is possible to predict a position and a pose of an obstacle (a person, an animal, another mobile body, or the like) using the information processing apparatus 1B and to automatically drive the mobile body such that the mobile body does not collide with the obstacle. In addition, in the case of application to the control of a robot arm, it is possible to predict a position and a pose of a workpiece using the information processing apparatus 1B and to control the robot arm such that a hand provided at a distal end of the robot arm reaches the workpiece.
A fourth example embodiment, which is an example of an embodiment, will be described in detail with reference to the drawings. The present example embodiment is a basic form of each example embodiment described below. Note that an application range of each technique adopted in the present example embodiment is not limited to the present example embodiment. That is, each technique adopted in the present example embodiment can also be adopted in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs. In addition, each technique illustrated in the drawings referred to for describing the present example embodiment can also be employed in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs.
A configuration of an information processing apparatus 2 will be described with reference to FIG. 7. FIG. 7 is a block diagram illustrating the configuration of the information processing apparatus 2.
As illustrated in FIG. 7, the information processing apparatus 2 includes a model generation unit 21.
The model generation unit 21 is a means for generating the regression model M1 for deriving, from a pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ (ti<ti′) output from the world model M0, a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. Note that the two times ti and ti′ may be two adjacent (continuous) times, such as a time t1 and a time t2, or may be two non-adjacent (non-continuous) times, such as the time t1 and a time t3.
In order to generate the regression model M1, the model generation unit 21 uses a data set DS including the physical quantity oi(j) corresponding to each time ti for each object Sj included in the object group {S1, S2, . . . , Sm} or for the specific object Sj selected from the object group {S1, S2, . . . , Sm}. The model generation unit 21 trains the regression model M1 such that a correspondence between input and output of the regression model M1 best approximates a correspondence between a pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ acquired from the world model M0 and a relationship ρi,i′(j) of physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ acquired from the data set DS.
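As a non-limiting illustration, the training described above can be sketched with a deliberately simplified stand-in: a scalar linear model fitted by least squares. The choice of a one-dimensional latent variable, a one-dimensional physical quantity, the latent difference as input, and the function name fit_scalar_regression are all assumptions made for this sketch; the disclosure does not restrict the regression model M1 to this form.

```python
from typing import Sequence, Tuple


def fit_scalar_regression(
    latents: Sequence[float],
    quantities: Sequence[float],
    index_pairs: Sequence[Tuple[int, int]],
) -> float:
    """Fit a weight w so that w * (z_b - z_a) approximates (o_b - o_a).

    latents[k] is a latent variable obtained from the world model and
    quantities[k] is the physical quantity taken from the data set for
    the same time. The weight minimizes the sum of squared errors over
    the given pairs of time indices (closed-form least squares).
    """
    xs = [latents[b] - latents[a] for a, b in index_pairs]
    ys = [quantities[b] - quantities[a] for a, b in index_pairs]
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("latent differences are all zero; cannot fit")
    return sum(x * y for x, y in zip(xs, ys)) / sxx
```

Because only differences of physical quantities enter the fit, adding a constant offset to every entry of quantities leaves the fitted weight unchanged.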
Note that a physical quantity corresponding to each time ti constituting the data set DS may be a relative physical quantity oi(j) or an absolute physical quantity Oi(j). This is because a relationship between absolute physical quantities Oi(j) and Oi′(j) corresponding to the two times ti and ti′ coincides with the relationship ρi,i′(j) between the relative physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′.
In order to enhance the stability of the regression, it is preferable to generate the regression model M1 in which the correspondence between input and output satisfies the following condition (1). In addition, in order to enhance the stability of the regression and further reduce an accumulation error, it is preferable to generate the regression model M1 in which the correspondence between the input and output satisfies the following condition (2) in addition to the following condition (1).
Condition (1): The correspondence of input and output of the regression model M1 best approximates a correspondence between a pair (Zi(j), Zi+1(j)) of latent variables Zi(j) and Zi+1(j) corresponding to two adjacent times ti and ti+1 acquired from the world model M0 and a relationship ρi,i+1(j) between physical quantities oi(j) and oi+1(j) corresponding to the two times ti and ti+1 acquired from the data set DS.
Condition (2): The correspondence of input and output of the regression model M1 best approximates a correspondence between a pair (Zi(j), Zi+k(j)) of latent variables Zi(j) and Zi+k(j) corresponding to two non-adjacent times ti and ti+k acquired from the world model M0 and a relationship ρi,i+k(j) between physical quantities oi(j) and oi+k(j) corresponding to the two times ti and ti+k acquired from the data set DS. Here, k is a natural number equal to or more than 2.
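As a non-limiting illustration, the pairs of times used for training under condition (1) alone, or under conditions (1) and (2) together, can be enumerated as follows. The function name build_index_pairs and the parameter max_gap are assumptions made for this sketch, and time indices are zero-based here rather than one-based as in the text.

```python
from typing import List, Tuple


def build_index_pairs(num_times: int, max_gap: int) -> List[Tuple[int, int]]:
    """Enumerate training pairs (i, i + k) for k = 1, ..., max_gap.

    max_gap = 1 yields only adjacent pairs, as in condition (1).
    max_gap >= 2 additionally yields non-adjacent pairs, as in
    condition (2).
    """
    pairs: List[Tuple[int, int]] = []
    for gap in range(1, max_gap + 1):
        for i in range(num_times - gap):
            pairs.append((i, i + gap))
    return pairs
```

Each returned pair selects both the latent variables used as input and the physical quantities from which the target relationship is computed.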
As an example without limiting the disclosure, the physical quantity oi(j) corresponding to each time ti may be a position rj(ti) of the object Sj at that time ti. In this case, the relationship ρi,i′(j) may be a displacement from the position rj(ti) of the object Sj at the time ti to a position rj(ti′) of the object Sj at the time ti′. In this case, the position rj(ti), the position rj(ti′), and the relationship ρi,i′(j) are each represented by a three-dimensional vector, and a relationship of rj(ti′)=ρi,i′(j)+rj(ti) holds among these vectors. Therefore, the model generation unit 21 can calculate the relationship ρi,i′(j) according to, for example, ρi,i′(j)=rj(ti′)−rj(ti).
In addition, as an example without limiting the disclosure, the physical quantity oi(j) corresponding to each time ti may be a pose qj(ti) of the object Sj at that time ti. In this case, the relationship ρi,i′(j) may be a pose change from the pose qj(ti) of the object Sj at the time ti to a pose qj(ti′) of the object Sj at the time ti′. In this case, the pose qj(ti), the pose qj(ti′), and the relationship ρi,i′(j) are each represented by a quaternion, and a relationship of qj(ti′)=ρi,i′(j)*qj(ti) holds among these quaternions. Therefore, the model generation unit 21 can calculate the relationship ρi,i′(j) according to, for example, ρi,i′(j)=qj(ti′)/qj(ti).
In addition, as an example without limiting the disclosure, the physical quantity oi(j) corresponding to each time ti may be a combination of the position rj(ti) and the pose qj(ti) of the object Sj at the time ti. In this case, the relationship ρi,i′(j) can be a combination of the displacement from the position rj(ti) of the object Sj at the time ti to the position rj(ti′) of the object Sj at the time ti′ and the pose change from the pose qj(ti) of the object Sj at the time ti to the pose qj(ti′) of the object Sj at the time ti′.
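As a non-limiting illustration, the target relationships computed from the data set DS can be sketched as follows. The sketch assumes (w, x, y, z) quaternions and the Hamilton product, and it interprets the quaternion division as right multiplication by the inverse, which is consistent with the relationship in which the pose change multiplies the earlier pose from the left. The function names are hypothetical.

```python
from typing import Tuple

Vector3 = Tuple[float, float, float]
Quaternion = Tuple[float, float, float, float]  # assumed order: (w, x, y, z)


def displacement(r_i: Vector3, r_ip: Vector3) -> Vector3:
    """Displacement from the earlier position r_i to the later position r_ip."""
    return (r_ip[0] - r_i[0], r_ip[1] - r_i[1], r_ip[2] - r_i[2])


def quat_mul(a: Quaternion, b: Quaternion) -> Quaternion:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_inverse(q: Quaternion) -> Quaternion:
    """Inverse of a nonzero quaternion (conjugate divided by squared norm)."""
    w, x, y, z = q
    n2 = w * w + x * x + y * y + z * z
    if n2 == 0.0:
        raise ValueError("zero quaternion has no inverse")
    return (w / n2, -x / n2, -y / n2, -z / n2)


def pose_change(q_i: Quaternion, q_ip: Quaternion) -> Quaternion:
    """Pose change rho satisfying q_ip = rho * q_i, i.e. rho = q_ip * q_i^-1."""
    return quat_mul(q_ip, quat_inverse(q_i))
```

For the combined case, the target can simply be the pair formed by the outputs of displacement and pose_change for the same two times.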
Note that, instead of using the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) as the input of the regression model M1, a difference Zi′(j)−Zi(j) between the latent variables Zi(j) and Zi′(j) may be used as the input of the regression model M1. In this case, the model generation unit 21 generates the regression model M1 for deriving the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) from the difference Zi′(j)−Zi(j) between the latent variables Zi(j) and Zi′(j).
In addition, instead of using the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) as the input of the regression model M1, a set (Zi(j), Zi′(j), Zi′(j)−Zi(j)) of the latent variable Zi(j), the latent variable Zi′(j) and the difference Zi′(j)−Zi(j) may be used as the input of the regression model M1. In this case, the model generation unit 21 generates the regression model M1 for deriving the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) from the set (Zi(j), Zi′(j), Zi′(j)−Zi(j)) of the latent variable Zi(j), the latent variable Zi′(j), and the difference Zi′(j)−Zi(j).
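As a non-limiting illustration, the three input forms described above (the pair, the difference, and the set containing both latent variables and their difference) can be assembled as follows. The function name build_regression_input and the mode strings are assumptions made for this sketch.

```python
from typing import List, Sequence


def build_regression_input(
    z_i: Sequence[float],
    z_ip: Sequence[float],
    mode: str = "pair",
) -> List[float]:
    """Assemble the input vector of the regression model.

    mode = "pair":       concatenation of the two latent variables
    mode = "difference": later latent variable minus earlier one
    mode = "set":        both latent variables followed by their difference
    """
    if len(z_i) != len(z_ip):
        raise ValueError("latent variables must have the same length")
    diff = [b - a for a, b in zip(z_i, z_ip)]
    if mode == "pair":
        return list(z_i) + list(z_ip)
    if mode == "difference":
        return diff
    if mode == "set":
        return list(z_i) + list(z_ip) + diff
    raise ValueError("unknown mode: " + mode)
```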
A flow of an information processing method S2 will be described with reference to FIG. 8. FIG. 8 is a flowchart illustrating the flow of the information processing method S2.
As illustrated in FIG. 8, the information processing method S2 includes a model generation process S21. Note that the information processing method S2 is executed by the information processing apparatus 2 or a computer, for example.
The model generation process S21 is a process for generating the regression model M1 for deriving, from a pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ (ti<ti′) output from the world model M0, a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. Note that the two times ti and ti′ may be two adjacent (continuous) times, such as a time t1 and a time t2, or may be two non-adjacent (non-continuous) times, such as the time t1 and a time t3. Note that the model generation process S21 is executed, for example, by the model generation unit 21 of the information processing apparatus 2 or by a processor of the computer.
In the model generation process S21, the data set DS including the physical quantity oi(j) corresponding to each time ti is used for each object Sj included in the object group {S1, S2, . . . , Sm} or for the specific object Sj selected from the object group {S1, S2, . . . , Sm} in order to generate the regression model M1. In the model generation process S21, the regression model M1 is trained such that a correspondence between input and output of the regression model M1 best approximates a correspondence between a pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ acquired from the world model M0 and a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ acquired from the data set DS.
Note that a physical quantity corresponding to each time ti constituting the data set DS may be a relative physical quantity oi(j) or an absolute physical quantity Oi(j). This is because a relationship between absolute physical quantities Oi(j) and Oi′(j) corresponding to the two times ti and ti′ coincides with the relationship ρi,i′(j) between the relative physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′.
In addition, instead of using the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) as the input of the regression model M1, a difference Zi′(j)−Zi(j) between the latent variables Zi(j) and Zi′(j) may be used as the input of the regression model M1. In this case, the regression model M1 for deriving the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) from the difference Zi′(j)−Zi(j) between the latent variables Zi(j) and Zi′(j) is generated in the model generation process S21.
In addition, instead of using the pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) as the input of the regression model M1, a set (Zi(j), Zi′(j), Zi′(j)−Zi(j)) of the latent variable Zi(j), the latent variable Zi′(j), and the difference Zi′(j)−Zi(j) may be used as the input of the regression model M1. In this case, in the model generation process S21, the regression model M1 for deriving the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) is generated from the set (Zi(j), Zi′(j), Zi′(j)−Zi(j)) of the latent variable Zi(j), the latent variable Zi′(j), and the difference Zi′(j)−Zi(j).
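The three input forms described above (the pair, the difference, and the set of both) can be sketched as follows. The function name make_regression_input and the mode strings are hypothetical, and the latent variables are assumed to be one-dimensional arrays of equal length.

```python
import numpy as np

def make_regression_input(z_early, z_late, mode="pair"):
    """Build the input of the regression model from two latent variables.

    mode="pair":       (Z_i, Z_i')
    mode="difference": Z_i' - Z_i
    mode="set":        (Z_i, Z_i', Z_i' - Z_i)
    """
    z_early = np.asarray(z_early, dtype=float)
    z_late = np.asarray(z_late, dtype=float)
    diff = z_late - z_early
    if mode == "pair":
        return np.concatenate([z_early, z_late])
    if mode == "difference":
        return diff
    if mode == "set":
        return np.concatenate([z_early, z_late, diff])
    raise ValueError(f"unknown mode: {mode!r}")
```

The same mode has to be used when the regression model M1 is generated and when it is later applied, since the input dimension differs between the three forms.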
In the information processing apparatus 2 and the information processing method S2, the regression model M1 that regresses the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the different times ti and ti′ from the latent variables Zi(j) and Zi′(j) for those times is generated, instead of a regression model that regresses the physical quantity oi(j) corresponding to the time ti from the latent variable Zi(j) corresponding to the time ti. The regression model M1 therefore does not depend on the coordinate system in which a physical quantity is represented, which reduces the instability of a regression result obtained when the regression model is used.
In addition, in the configuration in which the physical quantity oi(j) corresponding to the time ti is regressed from the latent variable Zi(j) corresponding to the time ti, there may be a problem that the operation of a non-learning object (an object not included as a subject in an image used for learning of the regression model) cannot be guaranteed because an object coordinate system is not defined. On the other hand, such a problem hardly occurs in the information processing apparatus 2 and the information processing method S2 since the configuration in which the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the different times ti and ti′ is regressed from the latent variables Zi(j) and Zi′(j) at the times ti and ti′ is adopted.
A fifth example embodiment, which is an example of an embodiment, will be described in detail with reference to the drawings. Components having the same functions as the components described in the above-described example embodiments are denoted by the same reference signs, and the description thereof will be appropriately omitted. Note that an application range of each technique adopted in the present example embodiment is not limited to the present example embodiment. That is, each technique adopted in the present example embodiment can also be adopted in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs. In addition, each technique illustrated in each of the drawings referred to for describing the present example embodiment can be employed in the other example embodiments included in the present disclosure within a range in which no particular technical problem occurs.
Next, a configuration of an information processing apparatus 2A will be described with reference to FIG. 9. FIG. 9 is a block diagram illustrating the configuration of the information processing apparatus 2A.
As illustrated in FIG. 9, the information processing apparatus 2A is obtained by adding a latent variable derivation unit 22 to the information processing apparatus 2 (see the fourth example embodiment).
The latent variable derivation unit 22 is a means for deriving latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ using the world model M0 for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. In a case where a time ti is included in an inference period, the latent variable derivation unit 22 obtains a latent variable Zi(j) corresponding to the time ti by inputting an image Ii corresponding to the time ti and a latent variable Zi−1(j) corresponding to a time ti−1 to the world model M0. In addition, in a case where a time ti′ is included in the inference period, a latent variable Zi′(j) corresponding to the time ti′ is similarly obtained. On the other hand, in a case where a time ti is included in a prediction period, the latent variable derivation unit 22 obtains a latent variable Zi(j) corresponding to the time ti by inputting a latent variable Zi−1(j) corresponding to a time ti−1 to the world model M0. In addition, in a case where a time ti′ is included in the prediction period, a latent variable Zi′(j) corresponding to the time ti′ is similarly obtained.
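The difference between the inference period and the prediction period can be illustrated with a minimal sketch. The class ToyWorldModel below is a hypothetical stand-in for the world model M0 (it is not the world model of the present disclosure), and its methods infer and predict, as well as the function derive_latents, are names assumed for this illustration only.

```python
import numpy as np

class ToyWorldModel:
    """Hypothetical stand-in for the world model M0, for illustration only."""

    def __init__(self, latent_dim=4, image_dim=6, seed=0):
        rng = np.random.default_rng(seed)
        self.transition = 0.9 * np.eye(latent_dim)
        self.encoder = rng.normal(scale=0.1, size=(image_dim, latent_dim))

    def infer(self, image, z_prev):
        # Inference period: uses the image I_i and the previous latent Z_{i-1}.
        return np.tanh(z_prev @ self.transition + image @ self.encoder)

    def predict(self, z_prev):
        # Prediction period: uses only the previous latent Z_{i-1}.
        return np.tanh(z_prev @ self.transition)

def derive_latents(model, z_init, images, n_predict):
    """Return latents for len(images) inference steps followed by n_predict prediction steps."""
    latents = []
    z = z_init
    for image in images:          # times t_i within the inference period
        z = model.infer(image, z)
        latents.append(z)
    for _ in range(n_predict):    # times t_i within the prediction period
        z = model.predict(z)
        latents.append(z)
    return np.stack(latents)
```

Any two rows of the returned array, taken at different times, can serve as the pair of latent variables supplied to the model generation unit 21, whether the two times fall in the same period or in different periods.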
The model generation unit 21 of the information processing apparatus 2A trains the regression model M1 such that a correspondence between input and output of the regression model M1 best approximates a correspondence between a pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ acquired from the latent variable derivation unit 22 and a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ acquired from the data set DS.
A flow of an information processing method S2A will be described with reference to FIG. 10. FIG. 10 is a flowchart illustrating the flow of the information processing method S2A.
As illustrated in FIG. 10, the information processing method S2A is obtained by adding a latent variable derivation process S22 to the information processing method S2 (see the fourth example embodiment). Note that the information processing method S2A is executed by, for example, the information processing apparatus 2A or a computer.
As illustrated in FIG. 10, the latent variable derivation process S22 is executed before the model generation process S21.
The latent variable derivation process S22 is a process for deriving latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ using the world model M0 for each object Sj included in an object group {S1, S2, . . . , Sm} or for a specific object Sj selected from the object group {S1, S2, . . . , Sm}. In a case where a time ti is included in an inference period, in the latent variable derivation process S22, a latent variable Zi(j) corresponding to the time ti is obtained by inputting an image Ii corresponding to the time ti and a latent variable Zi−1(j) corresponding to a time ti−1 to the world model M0. In addition, in a case where a time ti′ is included in the inference period, a latent variable Zi′(j) corresponding to the time ti′ is similarly obtained. On the other hand, in a case where a time ti is included in a prediction period, in the latent variable derivation process S22, a latent variable Zi(j) corresponding to the time ti is obtained by inputting a latent variable Zi−1(j) corresponding to a time ti−1 to the world model M0. In addition, in a case where a time ti′ is included in the prediction period, a latent variable Zi′(j) corresponding to the time ti′ is similarly obtained. Note that the latent variable derivation process S22 is executed, for example, by the latent variable derivation unit 22 of the information processing apparatus 2A or by a processor of the computer.
In the model generation process S21 of the information processing method S2A, the regression model M1 is trained such that a correspondence between input and output of the regression model M1 best approximates a correspondence between a pair (Zi(j), Zi′(j)) of latent variables Zi(j) and Zi′(j) corresponding to two different times ti and ti′ acquired in the latent variable derivation process S22 and a relationship ρi,i′(j) between physical quantities oi(j) and oi′(j) corresponding to the two times ti and ti′ acquired from the data set DS.
In the information processing apparatus 2A and the information processing method S2A, the regression model M1 that regresses the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the different times ti and ti′ from the latent variables Zi(j) and Zi′(j) for those times is generated, instead of a regression model that regresses the physical quantity oi(j) corresponding to the time ti from the latent variable Zi(j) corresponding to the time ti. The regression model M1 therefore does not depend on the coordinate system in which a physical quantity is represented, which reduces the instability of a regression result obtained when the regression model is used.
In addition, in the configuration in which the physical quantity oi(j) corresponding to the time ti is regressed from the latent variable Zi(j) corresponding to the time ti, there may be a problem that the operation of a non-learning object (an object not included as a subject in an image used for learning of the regression model) cannot be guaranteed because an object coordinate system is not defined. On the other hand, such a problem hardly occurs in the information processing apparatus 2A and the information processing method S2A since the configuration in which the relationship ρi,i′(j) between the physical quantities oi(j) and oi′(j) corresponding to the different times ti and ti′ is regressed from the latent variables Zi(j) and Zi′(j) at the times ti and ti′ is adopted.
Some or all of the functions of the information processing apparatuses 1, 1A, 1B, 2, and 2A (hereinafter, also referred to as “each of the above apparatuses”) may be implemented by hardware such as an integrated circuit (IC chip) or may be implemented by software.
In the latter case, each of the above apparatuses is implemented by, for example, a computer that executes instructions of a program, which is software for implementing each function. An example of such a computer (hereinafter, referred to as a computer C) is illustrated in FIG. 11. FIG. 11 is a block diagram illustrating a hardware configuration of the computer C functioning as each of the above apparatuses.
The computer C includes at least one processor C1 and at least one memory C2. A program P for causing the computer C to operate as each of the above apparatuses is recorded in the memory C2. In the computer C, the processor C1 reads the program P from the memory C2 and executes the program P to implement each function of each of the above apparatuses.
As the processor C1, for example, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a micro processing unit (MPU), a floating-point unit (FPU), a physics processing unit (PPU), a tensor processing unit (TPU), a quantum processor, a microcontroller, or a combination thereof can be used. As the memory C2, for example, a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or a combination thereof can be used.
Note that the computer C may further include a random access memory (RAM) into which the program P is loaded at the time of execution and in which various types of data are temporarily stored. In addition, the computer C may further include a communication interface for transmitting and receiving data to and from other apparatuses. The computer C may further include an input/output interface for connecting input/output apparatuses such as a keyboard, a mouse, a display, and a printer.
In addition, the program P can be recorded on a non-transitory tangible recording medium M readable by the computer C. As such a recording medium M, for example, a tape, a disk, a card, a semiconductor memory, a programmable logic circuit, or the like can be used. The computer C can acquire the program P via such a recording medium M. In addition, the program P can be transmitted via a transmission medium. As such a transmission medium, for example, a communication network, a broadcast wave, or the like can be used. The computer C can also acquire the program P via such a transmission medium.
In addition, each of the above functions of each of the above apparatuses may be implemented by a single processor provided in a single computer, may be implemented by cooperation of a plurality of processors provided in a single computer, or may be implemented by cooperation of a plurality of processors provided in a plurality of computers, respectively. In addition, the program for causing each of the above apparatuses to implement each of the above functions may be stored in a single memory provided in a single computer, may be stored in a distributed manner in a plurality of memories provided in a single computer, or may be stored in a distributed manner in a plurality of memories provided in a plurality of computers, respectively.
The whole or part of the example embodiments disclosed above can be described as, but not limited to, the following supplementary notes.
An information processing apparatus including a relationship derivation means for deriving, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.
The information processing apparatus according to Supplementary Note 1, further including a latent variable derivation means for deriving the latent variable using the world model,
The information processing apparatus according to Supplementary Note 1, further including a physical quantity calculation means for calculating the physical quantity corresponding to a later time out of the different times from the physical quantity corresponding to an earlier time out of the different times and the relationship between the physical quantities derived by the relationship derivation means.
The information processing apparatus according to Supplementary Note 3, wherein
The information processing apparatus according to Supplementary Note 3, wherein
The information processing apparatus according to any one of Supplementary Notes 3 to 5, further including a coordinate conversion means for referring to an absolute physical quantity corresponding to an initial time and converting the physical quantity at the later time out of the different times, the physical quantity being relative and calculated by the physical quantity calculation means, into the physical quantity that is absolute and corresponds to the later time out of the different times.
An information processing apparatus including a model generation means for generating a regression model that uses, for an object included in a world model representing a state of the object as a latent variable, a pair of the latent variables corresponding to different times output from the world model as an input and uses a relationship between physical quantities respectively corresponding to the different times as an output,
The information processing apparatus according to Supplementary Note 7, further including a latent variable derivation means for deriving the latent variables using the world model,
An information processing method including deriving, by a processor, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.
An information processing method including generating, by a processor, a regression model that uses, for an object included in a world model representing a state of the object as a latent variable, a pair of the latent variables corresponding to different times output from the world model as an input and uses a relationship between physical quantities respectively corresponding to the different times as an output,
A non-transitory computer-readable medium storing an information processing program causing a processor to execute a relationship derivation process of deriving, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.
A non-transitory computer-readable medium storing an information processing program for causing a processor to execute a model generation process of generating a regression model that uses, for an object included in a world model representing a state of the object as a latent variable, a pair of the latent variables corresponding to different times output from the world model as an input and uses a relationship between physical quantities respectively corresponding to the different times as an output,
An information processing apparatus including at least one processor,
Note that the information processing apparatus may further include a memory. In addition, the memory may store a program for causing the at least one processor to execute the process.
An information processing apparatus including at least one processor, wherein
Note that the information processing apparatus may further include a memory. In addition, the memory may store a program for causing the at least one processor to execute the process.
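As a non-limiting illustration of the physical quantity calculation of Supplementary Note 3 and the coordinate conversion of Supplementary Note 6, the following sketch assumes that a position is a three-dimensional vector whose relationship is a displacement, that a pose is a unit quaternion in (w, x, y, z) order whose relationship is a pose-change quaternion applied by right multiplication, and that a relative position is measured from the absolute position at the initial time. The function names are hypothetical.

```python
import numpy as np

def quat_multiply(a, b):
    """Hamilton product of two quaternions given in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])

def next_position(p_early, displacement):
    """Position at the later time = position at the earlier time + displacement."""
    return np.asarray(p_early, dtype=float) + np.asarray(displacement, dtype=float)

def next_pose(q_early, q_change):
    """Pose at the later time = earlier pose multiplied by the pose change."""
    q = quat_multiply(q_early, q_change)
    return q / np.linalg.norm(q)  # renormalize to limit numerical drift

def to_absolute_position(p_relative, p_abs_initial):
    """Assumed conversion: absolute position = initial absolute position + relative position."""
    return np.asarray(p_abs_initial, dtype=float) + np.asarray(p_relative, dtype=float)
```

Applying next_position or next_pose repeatedly, starting from the physical quantity at the initial time, yields the physical quantity at each later time from the sequence of derived relationships.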
While the present disclosure has been particularly shown and described with reference to example embodiments thereof, the present disclosure is not limited to these example embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the claims. In addition, each embodiment can be appropriately combined with at least one of the other embodiments. Each of the drawings or figures is merely an example to illustrate one or more example embodiments. Each figure may not be associated with only one particular example embodiment, but may be associated with one or more other example embodiments. As those of ordinary skill in the art will understand, various features or steps described with reference to any one of the figures can be combined with features or steps illustrated in one or more other figures, for example to produce example embodiments that are not explicitly illustrated or described. Not all of the features or steps illustrated in any one of the figures to describe an example embodiment are necessarily essential, and some features or steps may be omitted. The order of the steps described in any of the figures may be changed as appropriate.
1. An information processing apparatus comprising:
at least one memory storing instructions; and
at least one processor configured to execute the instructions to derive, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.
2. The information processing apparatus according to claim 1, wherein the processor is further configured to execute the instructions to:
derive the latent variable using the world model, and
derive the relationship between the physical quantities from a pair of the derived latent variables.
3. The information processing apparatus according to claim 1, wherein the processor is further configured to execute the instructions to calculate the physical quantity corresponding to a later time out of the different times from the physical quantity corresponding to an earlier time out of the different times and the derived relationship between the physical quantities.
4. The information processing apparatus according to claim 3, wherein
the physical quantities are positions of the object,
the processor is further configured to execute the instructions to:
derive a vector representing a displacement from the position of the object at the earlier time out of the different times to the position of the object at the later time out of the different times; and
add a vector representing the position of the object at the earlier time out of the different times and the derived vector representing the displacement to calculate a vector representing the position of the object at the later time out of the different times.
5. The information processing apparatus according to claim 3, wherein
the physical quantities are poses of the object,
the processor is further configured to execute the instructions to:
derive a quaternion representing a pose change from the pose of the object at the earlier time out of the different times to the pose of the object at the later time out of the different times; and
multiply a quaternion representing the pose of the object at the earlier time out of the different times by the derived quaternion representing the pose change to calculate a quaternion representing the pose of the object at the later time out of the different times.
6. The information processing apparatus according to claim 3, wherein the processor is further configured to execute the instructions to:
refer to an absolute physical quantity corresponding to an initial time; and
convert the calculated physical quantity at the later time out of the different times, the physical quantity being relative, into the physical quantity that is absolute and corresponds to the later time out of the different times.
7. An information processing method comprising deriving, by a processor, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.
8. A non-transitory computer-readable medium storing an information processing program causing a processor to execute a relationship derivation process of deriving, for an object included in a world model representing a state of the object as a latent variable, a relationship between physical quantities from a pair of the latent variables using a regression model having the pair of latent variables corresponding to different times output from the world model as an input and having the relationship between the physical quantities respectively corresponding to the different times as an output.