Patent application title:

GENERATING A STRATIGRAPHIC TRAINING LIBRARY

Publication number:

US20260103972A1

Publication date:

2026-04-16

Application number:

19/357,432

Filed date:

2025-10-14

Smart Summary: A new method helps create a library of images of tiny fossils called calcareous nannofossils. It starts by collecting many images from different geological sources from a specific time period. Then, it picks a smaller group of images that includes examples from both areas rich in oil and those that aren't. This selected group is used to build a training library. The goal is to help machine learning models accurately identify and classify these fossils in various geological environments.

Abstract:

A method for curating a machine learning training library of calcareous nannofossil images for automated biostratigraphic analysis. The method includes receiving a plurality of images of various source materials from a target geologic time period. A subset of images is selected, comprising images from both a hydrocarbon-rich region and a hydrocarbon-poor region. A training library is generated based on this subset, enabling the development of machine learning models capable of accurate fossil identification and classification across diverse geological settings.

Inventors:

Emily Lillian Browning, Houston, TX, United States
David Bord, Spring, TX, United States
Ruairi Pierce Dunne, London, United Kingdom
Attila Balazs, Egham, United Kingdom

Assignee:

BP CORPORATION NORTH AMERICA INC., Houston, TX, United States

Applicant:

BP Corporation North America Inc., Houston, TX, United States

Classification:

E21B44/00 » CPC main

Automatic control, surveying or testing

E21B44/00 » CPC main

Automatic control systems specially adapted for drilling operations, i.e. self-operating systems which function to carry out or modify a drilling operation without intervention of a human operator, e.g. computer-controlled drilling systems ; Systems specially adapted for monitoring a plurality of drilling variables or conditions

E21B2200/20 » CPC further

Special features related to earth drilling for obtaining oil, gas or water Computer models or simulations, e.g. for reservoirs under production, drill bits

E21B2200/22 » CPC further

Special features related to earth drilling for obtaining oil, gas or water Fuzzy logic, artificial intelligence, neural networks or the like

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a non-provisional patent application which claims benefit of U.S. provisional patent application No. 63/707,435 filed Oct. 15, 2024, and entitled “Generating a Stratigraphic Training Library,” which is hereby incorporated herein by reference in its entirety for all purposes.

BACKGROUND

The field of biostratigraphy plays a role in hydrocarbon exploration and production, allowing for the determination of the relative ages of subsurface rock formations and their depositional environments. Calcareous nannofossils, microscopic fossils of marine algae, serve as biostratigraphic markers due to their distinct morphologies and evolutionary patterns.

Conventionally, biostratigraphic analysis has relied on manual microscopic examination and identification of these nannofossils by highly trained specialists. This manual process, while effective, is time-consuming, labor-intensive, and subject to inter-observer variability. The scarcity of experienced specialists further compounds these challenges, potentially creating bottlenecks in project timelines and impeding efficient decision-making.

Advancements in machine learning and computer vision have spurred efforts to automate aspects of biostratigraphic analysis. However, developing robust machine learning models for calcareous nannofossil identification has proven difficult. The three-dimensional (3D) nature of these fossils, coupled with variations in preservation and orientation, presents challenges for image recognition algorithms. Thus, conventional attempts have yielded limited accuracy, particularly at the species level, hindering their practical application in the oil and gas industry.

Another limitation in conventional approaches lies in the quality and diversity of training data. Studies have generally relied on relatively small and homogeneous image sets, sometimes sourced from a single geographic region or geologic time period. Such reliance limits the models' ability to generalize and perform accurately across a broader range of samples and depositional environments.

SUMMARY

In an embodiment, a method includes receiving a plurality of images of a plurality of source materials, where the source materials are from a target geologic time period and include a set of nannofossil species. The method further includes selecting a subset of the plurality of images, where the subset comprises images of source material from a first target geographic region and a second target geographic region, where the first target geographic region includes hydrocarbon amount greater than a first threshold and the second target geographic region includes hydrocarbon amount less than a second threshold. Additionally, the method further includes generating a training library comprising the subset.

In an embodiment, an electronic device includes one or more processors and a memory coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the electronic device to be configured to receive a plurality of images of a plurality of source materials, where the source materials are from a target geologic time period and include a set of nannofossil species. The electronic device is further configured to select a subset of the plurality of images, wherein the subset comprises images of source material from a first target geographic region and a second target geographic region, where the first target geographic region includes hydrocarbon amount greater than a first threshold and the second target geographic region includes hydrocarbon amount less than a second threshold. Additionally, the electronic device is configured to generate a training library comprising the subset.

In an embodiment, a non-transitory computer-readable medium stores instructions that, when executed by one or more processors of an electronic device, cause the electronic device to be configured to receive a plurality of images of a plurality of source materials, where the source materials are from a target geologic time period and include a set of nannofossil species. The electronic device is further configured to select a subset of the plurality of images, wherein the subset comprises images of source material from a first target geographic region and a second target geographic region, where the first target geographic region includes hydrocarbon amount greater than a first threshold and the second target geographic region includes hydrocarbon amount less than a second threshold. Additionally, the electronic device is configured to generate a training library comprising the subset.

Embodiments described herein include a combination of features and characteristics intended to address various shortcomings associated with certain prior devices, systems, and methods. The foregoing has outlined rather broadly the features and technical characteristics of the disclosed embodiments in order that the detailed description that follows may be better understood. The various characteristics and features described above, as well as others, will be readily apparent to those skilled in the art upon reading the following detailed description, and by referring to the accompanying drawings. It should be appreciated that the conception and the specific embodiments disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes as the disclosed embodiments. It should also be realized that such equivalent constructions do not depart from the spirit and scope of the principles disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.

FIG. 1A is a schematic level diagram that illustrates examples of samples for an analysis system constructed and operating in accordance with an embodiment of the disclosure;

FIG. 1B is a block diagram for an analysis system for analyzing samples in accordance with an embodiment of the disclosure;

FIG. 1C is a block diagram for a computing device suitable for use in an analysis system for analyzing samples in accordance with an embodiment of the disclosure;

FIG. 2 is a visual representation of biostratigraphic zonations in accordance with an embodiment of the disclosure;

FIG. 3A is a set of images depicting microscopic specimens at various focal lengths in accordance with an embodiment of this disclosure;

FIG. 3B is a set of images depicting microscopic specimens in various morphological orientations in accordance with an embodiment of this disclosure;

FIG. 3C is a set of images depicting microscopic specimens under various light sources in accordance with an embodiment of this disclosure;

FIG. 4 is a flow diagram for a method for curating a machine learning training library for automated biostratigraphic analysis in accordance with an embodiment of the disclosure; and

FIG. 5 is a block diagram of a computer system configured to implement one or more embodiments described herein.

DETAILED DESCRIPTION

It should be understood at the outset that although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or yet to be developed. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.

Thus, while several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.

In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled may be directly coupled or may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

The field of biostratigraphy, particularly when applied in the oil and gas industry, facilitates the determination of the age and depositional environment of subsurface formations. This information may help identify potential hydrocarbon reservoirs and improve drilling strategies. Biostratigraphy conventionally relies on manual analysis (e.g., by highly trained specialists) of microfossils, such as calcareous nannofossils. Such manual analysis is labor-intensive, time-consuming, and prone to subjective interpretation.

Calcareous nannofossils are microscopic, calcium carbonate platelets produced by marine algae. Their distinct morphologies and evolutionary patterns make them useful biostratigraphic markers. The identification and classification of nannofossils generally require extensive expertise and experience. Specialists must be able to recognize subtle differences in morphology, sometimes in varying preservation states and orientations.

Accurate biostratigraphic analysis is useful for understanding subsurface geology and improving exploration and production activities. The manual nature of conventional biostratigraphic analysis presents several challenges. The limited availability of highly skilled specialists may create workflow bottlenecks and delay decision-making. Subjectivity in nannofossil identification may lead to inconsistencies in results and uncertainty in stratigraphic interpretations. Such subjectivity may also have implications for drilling decisions and resource estimates. Further, the time required for manual analysis may be a cost driver in exploration and production activities. Thus, accelerating biostratigraphic workflows may lead to financial savings, as well as improved decisions related to hydrocarbon exploration, drilling, and/or production.

Machine learning and/or computer vision techniques may be applied to at least partially automate biostratigraphic analysis. However, even applying machine learning and/or computer vision techniques may have limitations due to the complexity of nannofossil identification and the lack of quality training data. Publicly available studies have achieved some success in genus-level identification, but species-level classification remains challenging. This is at least partly due to the subtle differences between species and the variability in nannofossil preservation. Additionally, previous attempts to apply machine learning to calcareous nannofossil analysis have been limited by the quality and quantity of training data. These studies generally used small image sets from limited geographic locations, leading to models with insufficient accuracy for industrial applications.

Embodiments of the present disclosure address these challenges by providing a two-step machine-learning approach for the automated detection and classification of calcareous nannofossils, which utilizes a method for curating a stratigraphic training library. Training libraries generally include datasets containing a plurality of data. Machine learning and computer vision models are trained on datasets. To improve such models, both the quantity of the data in the datasets and the curation of the datasets that are used to train the models may be of importance.

In some examples, a stratigraphic training library may include numerous (e.g., thousands, tens of thousands, or more) images of calcareous nannofossils. Training datasets may include images that represent a range of species, preservation states, orientations, and light sources. Curating the stratigraphic training library may include decisions about which data to include in the stratigraphic training library and labeling the data with information to enhance model training and performance. Curating the stratigraphic training library may also include an expert-driven process of selecting, collecting, and organizing nannofossil images into the stratigraphic training library for the machine learning model.

Curating the stratigraphic training library may rely on multiple approaches. A hydrocarbon exploration strategy may be directed to a target geographic region (e.g., the Gulf of Mexico (GoM)) and a target geologic time period (e.g., Miocene), and thus training images may be chosen from the geographic region and/or geologic time period. Focusing on a particular geographic region and/or geologic time period may also result in improved access to relevant well samples and existing expertise. Additionally, calcareous nannofossils from certain geologic time periods may be relatively widespread across multiple geographic regions, and thus focusing on those geologic time period(s) may result in more universally applicable training data. For example, Miocene calcareous nannofossils are widespread globally, making developed models potentially applicable to other regions beyond the GoM, increasing their potential impact and return on investment. Thus, well data and core samples from the GoM Miocene may provide an ample source of training data that is useful for building accurate machine-learning models.

In some examples, curating the stratigraphic training library may involve receiving a plurality of images of a plurality of source materials. The source materials may be from a target geologic time period and include a set of nannofossil species. A subset of images of source material from a first target geographic region and a second target geographic region may be selected. The first target geographic region may be one where hydrocarbon amount is greater than a particular threshold, while the second target geographic region may be one where hydrocarbon amount is below the same or another threshold. In this manner, the first target geographic region may be one of interest to hydrocarbon exploration, while the second target geographic region is of less interest to hydrocarbon exploration. The stratigraphic training library may be generated based on the subset of images. The inclusion of target geographic region-specific well samples may allow the models to be trained on specimens representative of the local environment, improving their applicability to the target geographic region.
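
By way of non-limiting illustration only, the selection described above might be sketched as follows. The record fields, region names, hydrocarbon amounts, and threshold values are hypothetical and are not taken from the disclosure; the sketch also assumes that a single hydrocarbon amount is available per region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    image_id: str
    region: str   # geographic region the source material came from
    period: str   # geologic time period of the source material
    species: str  # labeled nannofossil species


def select_subset(images, hydrocarbon_by_region, target_period,
                  first_threshold, second_threshold):
    """Select target-period images from hydrocarbon-rich and hydrocarbon-poor regions."""
    rich = {r for r, amount in hydrocarbon_by_region.items() if amount > first_threshold}
    poor = {r for r, amount in hydrocarbon_by_region.items() if amount < second_threshold}
    allowed = rich | poor
    subset = [img for img in images
              if img.period == target_period and img.region in allowed]
    # Require at least one image from each kind of region.
    if not any(img.region in rich for img in subset):
        raise ValueError("no images from a hydrocarbon-rich region")
    if not any(img.region in poor for img in subset):
        raise ValueError("no images from a hydrocarbon-poor region")
    return subset


# Hypothetical inputs: names, amounts, and thresholds are illustrative only.
hydrocarbon_by_region = {"region_a": 0.9, "region_b": 0.1, "region_c": 0.5}
images = [
    ImageRecord("img1", "region_a", "Miocene", "species_x"),
    ImageRecord("img2", "region_b", "Miocene", "species_y"),
    ImageRecord("img3", "region_c", "Miocene", "species_x"),
    ImageRecord("img4", "region_a", "Pliocene", "species_z"),
]
training_library = select_subset(images, hydrocarbon_by_region, "Miocene", 0.7, 0.3)
print([img.image_id for img in training_library])
```

In this sketch, images from a region falling between the two thresholds, or from outside the target geologic time period, are left out of the training library.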

In some examples, curating the stratigraphic training library may involve combining additional non-region-specific source materials. For example, the stratigraphic training library may incorporate images from sources including well-preserved deep-sea (e.g., International Ocean Drilling Program (IODP)) cores and outcrop samples, as well as target geographic region-specific (e.g., less well-preserved GoM) and entity-specific (e.g., from a particular company) well samples. Well-preserved deep-sea cores included in the source materials may differ from the target geographic region or target geologic time period of interest. This may allow the models to handle variability in nannofossil preservations encountered in real-world scenarios. The use of deep-sea cores, rich in diverse and well-preserved nannofossils, may increase the efficiency of data collection and labeling.

In some examples, curating the stratigraphic training library may involve selecting a sufficiently large number of species for training and establishing a minimum number of specimens per species to achieve a desired model accuracy. The selection of a sufficiently large number of species (e.g., greater than 100) for training may be based on the biostratigraphic significance and prevalence of the species in the target geographic region and geologic time period (e.g., GoM Miocene), allowing for the models to be focused on relevant markers. Biostratigraphic significance refers to the usefulness of a fossil or fossil assemblage in determining the relative age and correlating rock strata. Fossils with high biostratigraphic significance generally possess several characteristics, for example, abundance, distinct morphology, short stratigraphic range, and widespread geographic distribution.

Establishing a minimum number of specimens per species may also allow sufficient training data for each species, enhancing model accuracy and confidence in predictions. For example, training with a selected group of species, rather than all species, may provide acceptable results for a model to achieve a target accuracy. The accuracy of the models may be generally correlated to the number of specimen images for each species available in the training data. A graphical user interface (GUI) may be used to display accuracy of the models for each species and guide future training efforts, allowing for continuous improvement of the models.
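
As a non-limiting sketch, a per-species minimum might be checked by counting labeled specimens and reporting which species still fall short. The function name, species labels, and the minimum of three specimens are hypothetical and chosen only to keep the example small.

```python
from collections import Counter


def split_by_specimen_count(labels, min_specimens):
    """Return (ready, shortfall): species meeting the minimum, and how many
    more specimens each remaining species would need."""
    counts = Counter(labels)
    ready = {s: n for s, n in counts.items() if n >= min_specimens}
    shortfall = {s: min_specimens - n for s, n in counts.items() if n < min_specimens}
    return ready, shortfall


# Hypothetical labels, one per specimen image.
labels = ["species_a"] * 5 + ["species_b"] * 2 + ["species_c"] * 3
ready, shortfall = split_by_specimen_count(labels, min_specimens=3)
print(ready)      # species with enough specimens to train on
print(shortfall)  # additional specimens needed per under-represented species
```

A shortfall report of this kind is one way the per-species guidance mentioned above could be surfaced to direct further image collection.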

In some examples, curating the stratigraphic training library may involve including images of marker species in the training library. Marker species are species that possess characteristics that make them especially useful for biostratigraphic correlation and age dating. Including images of marker species may involve partial age filtering for a more specific dataset that aligns with the stratigraphic ranges under analysis. Age filtering may aid in species identification. For example, a pre-processing step may filter the species list based on the age ranges of the species to reduce misclassifications and refine model detections. Models may be trained to detect species across a particular geological record (e.g., Neogene). However, these species may have varying occurrences ranging from long-spanning to shorter, intermittent ranges. Since not all species used for training are present in every sample, it may be useful to limit misclassifications of species whose geological occurrences fall outside the sample age. Such misclassifications may be especially noticeable with look-alike species of nearly identical appearance but from different geologic ages. This filtering process may involve predefining an “age window” that estimates the age range of the samples being analyzed and pruning the list of species that the machine learning models may match to a more manageable subset that aligns with the stratigraphic range of the samples under analysis. A selection of specific marker species may be exempted from the filtering process to improve their detection. This filtering approach may enhance the accuracy of species identification and facilitate more accurate stratigraphic interpretations.
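
The age-window pruning described above might, as one non-limiting sketch, be expressed as an interval-overlap test. Ages are assumed to be given in millions of years ago (Ma), with each range written as (oldest, youngest); the species names, age values, and function name are hypothetical.

```python
def filter_species_by_age(age_ranges, window, exempt=frozenset()):
    """Keep species whose age range overlaps the window, plus exempt markers.

    age_ranges: species -> (oldest_ma, youngest_ma), with oldest_ma >= youngest_ma
    window:     (oldest_ma, youngest_ma) estimated for the samples under analysis
    exempt:     marker species kept regardless of the window
    """
    window_oldest, window_youngest = window
    kept = []
    for species, (oldest, youngest) in age_ranges.items():
        # Two ranges overlap when each one starts before the other ends.
        overlaps = oldest >= window_youngest and youngest <= window_oldest
        if overlaps or species in exempt:
            kept.append(species)
    return sorted(kept)


# Hypothetical age ranges in Ma; values are illustrative only.
age_ranges = {
    "species_a": (20.0, 10.0),
    "species_b": (8.0, 2.0),
    "species_c": (30.0, 25.0),
    "marker_x": (5.0, 4.0),
}
candidates = filter_species_by_age(age_ranges, window=(18.0, 12.0),
                                   exempt={"marker_x"})
print(candidates)
```

Here species_b and species_c fall entirely outside the window and are pruned, while marker_x is retained only because it is exempted.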

In some examples, curating the stratigraphic training library may involve grouping similar-looking species (also known as “look-alike” species) that share broad morphologies to allow for a more manageable dataset. The three-dimensional (3D) nature of nannofossils, combined with their random arrangement on the microscope slide, may lead to a large volume of ambiguous images that lack the diagnostic features needed to accurately identify a species or to distinguish between multiple species. While the fossils in these images may be accurately matched at a genus level (e.g., based on the fossil outline), precision may decline when attempting to classify the fossil at a species level. Some species within a genus may share similarities in their overall shape, leading to “noise” in the model results because the models may assign such ambiguous images across several species. Broader species categories (also known as “bins”) for specific groups prone to such issues may be used to help remedy this issue. For example, each “bin” may include 2 to 4 species that share broad morphologies, allowing the models to classify similar-looking species into fewer manageable and logical groups without compromising their biostratigraphic significance. This approach may increase positive high-confidence classifications made by the models while also reducing the time spent examining ambiguous images of similar-looking species.
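
As a non-limiting sketch, binning might be implemented as a relabeling step applied to the training labels before model training. The bin and species names are hypothetical, and enforcing the 2-to-4 size range as a hard limit is an assumption of this sketch rather than a requirement of the disclosure.

```python
def to_bin_labels(labels, bins, min_size=2, max_size=4):
    """Replace each look-alike species label with the name of its bin.

    bins: bin name -> collection of species sharing a broad morphology.
    Species that belong to no bin keep their original label.
    """
    species_to_bin = {}
    for bin_name, members in bins.items():
        if not min_size <= len(members) <= max_size:
            raise ValueError(f"bin {bin_name!r} has {len(members)} species")
        for species in members:
            if species in species_to_bin:
                raise ValueError(f"{species!r} appears in more than one bin")
            species_to_bin[species] = bin_name
    return [species_to_bin.get(label, label) for label in labels]


# Hypothetical bin of three similar-looking species.
bins = {"bin_1": ["species_a", "species_b", "species_c"]}
labels = ["species_a", "species_d", "species_c", "species_b"]
binned = to_bin_labels(labels, bins)
print(binned)
```

Species outside any bin, such as species_d here, continue to be classified individually.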

In some examples, curating the stratigraphic training library may involve incorporating training images from multiple focal lengths to further account for the 3D nature of nannofossils, incorporating top and side views of the same species to capture a fuller morphological range, and selecting training images from various light sources to account for different viewing conditions. For example, training the models with images from high, middle, and low focus levels may allow the models to recognize nannofossils in their natural 3D form, overcoming the limitations of automated imaging systems with fixed focus. The inclusion of both top and side views of select species may enhance the morphological range of the models, improving the ability of the models to identify them in different orientations. Training with images under various light sources (e.g., both cross-polarized light (XPL) and plain light (PL)) may allow for the models to recognize nannofossils in the different lighting conditions used in standard biostratigraphic workflows.
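
One non-limiting way to track this coverage is to enumerate the combinations of focus level, view, and light source and report which are not yet represented for a given species. The record layout and function name are hypothetical, and requiring every combination is an assumption of this sketch; as noted above, side views may be included only for select species.

```python
from itertools import product

FOCUS_LEVELS = ("high", "middle", "low")
VIEWS = ("top", "side")
LIGHT_SOURCES = ("XPL", "PL")  # cross-polarized light and plain light


def missing_conditions(records, species):
    """List (focus, view, light) combinations with no image for the species."""
    seen = {(r["focus"], r["view"], r["light"])
            for r in records if r["species"] == species}
    return [combo for combo in product(FOCUS_LEVELS, VIEWS, LIGHT_SOURCES)
            if combo not in seen]


# Hypothetical image metadata records.
records = [
    {"species": "species_a", "focus": "high", "view": "top", "light": "XPL"},
    {"species": "species_a", "focus": "middle", "view": "top", "light": "XPL"},
    {"species": "species_a", "focus": "low", "view": "top", "light": "PL"},
    {"species": "species_b", "focus": "high", "view": "side", "light": "PL"},
]
gaps = missing_conditions(records, "species_a")
print(len(gaps), "of", len(FOCUS_LEVELS) * len(VIEWS) * len(LIGHT_SOURCES),
      "combinations still missing for species_a")
```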

In some examples, curating the stratigraphic training library may involve utilizing a biostratigraphic zonation framework as a blueprint for guiding data collection and labeling, improving the alignment of the model with industry practices. Biostratigraphic zonation frameworks (e.g., the BP Gulf of Mexico Neogene Astronomically Tuned Time Scale (BP GNATTS)) record the ages and/or relative stratigraphic order of select species origination and/or extinction globally and/or regionally. Biostratigraphic zonation may be a post-processing step used to filter metadata, for example, attaching a non-overlapping time range to every fossil image and filtering out particular fossil age ranges. Conventional approaches have not made such use of biostratigraphic zonation frameworks as a foundation for building a machine learning tool. As an example, BP GNATTS, with its age resolution and standardization, has potential global applicability, suggesting that models trained on this framework may be deployed in other regions, expanding their impact beyond the GoM. If high-impact species are deemed necessary for replication, the biostratigraphic zonation framework (e.g., BP GNATTS) and calcareous nannofossil specialist-based resolutions may be prioritized for training.
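
As a non-limiting sketch of the post-processing step described above, non-overlapping zones might be represented by sorted age boundaries, with each image assigned to the zone containing its estimated age. The zone names and boundary values are hypothetical and do not reflect BP GNATTS or any actual framework; the sketch also assumes that an estimated age in millions of years ago (Ma) is available for each image.

```python
from bisect import bisect_right


def assign_zone(age_ma, boundaries_ma, zone_names):
    """Return the zone whose half-open range [start, end) contains age_ma, else None."""
    if len(boundaries_ma) != len(zone_names) + 1:
        raise ValueError("expected one more boundary than zones")
    idx = bisect_right(boundaries_ma, age_ma) - 1
    if 0 <= idx < len(zone_names):
        return zone_names[idx]
    return None


def tag_and_filter(images, boundaries_ma, zone_names, excluded_zones):
    """Attach a zone to each (image_id, age_ma) pair and drop excluded or unzoned images."""
    tagged = []
    for image_id, age_ma in images:
        zone = assign_zone(age_ma, boundaries_ma, zone_names)
        if zone is not None and zone not in excluded_zones:
            tagged.append((image_id, zone))
    return tagged


# Hypothetical boundaries in Ma, sorted from youngest to oldest.
boundaries_ma = [0.0, 5.0, 10.0, 20.0]
zone_names = ["zone_1", "zone_2", "zone_3"]
images = [("img1", 2.5), ("img2", 5.0), ("img3", 12.0), ("img4", 25.0)]
tagged = tag_and_filter(images, boundaries_ma, zone_names, excluded_zones={"zone_3"})
print(tagged)
```

Because the ranges are half-open, an age lying exactly on a boundary belongs to exactly one zone, which keeps the attached time ranges non-overlapping.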

In some examples, training images may be at least partly chosen and labeled by a biostratigrapher and/or an expert data labeler (e.g., having experience in the target geographic region). Expert data labelers may provide oversight of the training data, such as to improve quality and increase consistency in labeling of training images. In some examples, the proposed approach to training may be designed to approximate the expertise of a mid-career nannofossil specialist and, more particularly, one who is familiar with the Gulf of Mexico (GoM) region, which has a relatively complex geological history. Selection of expert analysts with experience in the target geographic region (e.g., GoM) nannofossils and the entity's (e.g., BP's) fossil concepts may provide for higher quality and consistent labeling of the training data. The involvement of multiple labelers may provide for a diverse dataset, capturing a wider range of interpretations and reducing potential biases. Additionally, a phased labeling approach, targeting different scales of biostratigraphic application, may allow for iterative model training and refinement so that the models may be deployed to meet various needs. For example, a first labeling phase may have a global application, focusing on imaging a broad interval (e.g., the entire Miocene); a second labeling phase may have a regional application, focusing on specific stratigraphic intervals and targeting specific species; and a third labeling phase may have a reservoir-scale application, focusing on species associated with an entity's reservoir intervals.

In some examples, model training may be enhanced with automatic data collection and a GUI to accelerate data collection and model retraining. The implementation of automated data collection and a GUI may increase the efficiency of data collection and model retraining, allowing for quicker iterations and improvements. Such automated workflows may allow for the collection of larger amounts of data, further enhancing the accuracy and robustness of the models. The ability to retrain models with a large quantity (e.g., thousands) of images within a given time period (e.g., days or weeks) may facilitate continuous improvement and adaptation to new data. These and other embodiments are described more fully below, with reference made to the accompanying figures.

FIG. 1A illustrates, at a high level, the acquiring of samples 104 and the analysis of the samples according to principles disclosed herein. Embodiments of the present disclosure may be especially beneficial in analyzing samples from sub-surface formations that are important in the production of oil and gas. As such, FIG. 1A illustrates environments 100 from which samples 104 to be analyzed by analysis system 102 may be obtained, according to various implementations. In these illustrated examples, samples 104 may be obtained from terrestrial drilling system 106 or from marine (ocean, sea, lake, etc.) drilling system 108, either of which is utilized to extract resources such as hydrocarbons (oil, natural gas, etc.), water, and the like. The samples 104 may be cutting samples and/or core samples. Optimization or improvement of oil and gas production operations is largely influenced by the structure and material properties of the rock formations into which terrestrial drilling system 106 or marine drilling system 108 is drilling or has drilled in the past.

The manner in which samples 104 are obtained, and the physical form of those samples, may vary widely. Examples of samples 104 useful in connection with embodiments disclosed herein include whole core samples, side wall core samples, outcrop samples, drill cuttings, and rock samples. As illustrated in FIG. 1A, the environments 100 include analysis system 102 that is configured to analyze images 128 (FIG. 1B) of samples 104 in order to determine the material properties of the corresponding sub-surface rock.

FIG. 1B illustrates, in a generic fashion, the constituent components of the analysis system 102 that analyzes images 128. In a general sense, analysis system 102 includes imaging device 122 for obtaining two-dimensional (2D) or three-dimensional (3D) images, as well as other representations, of samples 104, such images and representations including details of the internal structure of the samples 104. The particular type, construction, or other attributes of imaging device 122 may correspond to that of any type of device capable of producing an image representative of the internal structure of sample 104. The imaging device 122 generates one or more images 128 of sample 104, and forwards those images 128 to a computing device 120. The images 128 produced by imaging device 122 may be generated from a plurality of two-dimensional (2D) sections of sample 104.

FIG. 1C generically illustrates the architecture of computing device 120 in analysis system 102 according to various embodiments. In this example architecture, computing device 120 includes one or more processors 152, which may be of varying core configurations and clock frequencies as available in the industry. The memory resources of computing device 120 for storing data and/or program instructions for execution by the one or more processors 152 include one or more memory devices 154 serving as a main memory during the operation of computing device 120, and one or more storage devices 160, for example realized as one or more of non-volatile solid-state memory, magnetic or optical disk drives, or random-access memory. One or more peripheral interfaces 156 are provided for coupling to corresponding peripheral devices such as displays, keyboards, mice, touchpads, touchscreens, printers, and the like. Network interfaces 158, which may be in the form of Ethernet adapters, wireless transceivers, serial network components, etc., are provided to facilitate communication by computing device 120 via one or more networks such as Ethernet, wireless Ethernet, Global System for Mobile Communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), Universal Mobile Telecommunications System (UMTS), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE), and the like. In this example architecture, processors 152 are shown as coupled to components 154, 156, 158, and 160 by way of a single bus; of course, a different interconnection architecture, such as multiple dedicated buses and the like, may be incorporated within computing device 120.

While illustrated as a single computing device, computing device 120 may include several computing devices cooperating together to provide the functionality of a computing device. Likewise, while illustrated as a physical device, computing device 120 may also represent abstract computing devices such as virtual machines and “cloud” computing devices.

As shown in the example implementation of FIG. 1C, the computing device 120 includes software programs 162 including one or more operating systems, one or more application programs, and the like. According to embodiments, software programs 162 include program instructions corresponding to testing tool 130 (FIG. 1B), implemented as a standalone application program, as a program module that is part of another application or program, as the appropriate plug-ins or other software components for accessing testing tool software on a remote computer networked with computing device 120 via network interfaces 158, or in other forms and combinations of the same.

The program memory storing the executable instructions of software programs 162 corresponding to the functions of testing tool 130 may physically reside within computing device 120 or at other computing resources accessible to computing device 120, i.e. within the local memory resources of memory devices 154 and storage devices 160, or within a server or other network-accessible memory resources, or distributed among multiple locations. In any case, this program memory constitutes a non-transitory computer-readable medium that stores executable computer program instructions, according to which the operations described in this specification are carried out by computing device 120, or by a server or other computer coupled to computing device 120 via network interfaces 158 (e.g., in the form of an interactive application upon input data communicated from computing device 120, for display or output by peripherals coupled to computing device 120). The computer-executable software instructions corresponding to software programs 162 associated with testing tool 130 may have originally been stored on a removable or other non-volatile computer-readable storage medium (e.g., a DVD disk, flash memory, or the like), or downloadable as encoded information on an electromagnetic carrier signal, in the form of a software package from which the computer-executable software instructions were installed by computing device 120 in the conventional manner for software installation. It is contemplated that those skilled in the art will be readily able to implement the storage and retrieval of the applicable data, program instructions, and other information useful in connection with this embodiment, in a suitable manner for each particular application, without undue experimentation.

The particular computer instructions constituting software programs 162 associated with testing tool 130 may be in the form of one or more executable programs, or in the form of source code or higher-level code from which one or more executable programs are derived, assembled, interpreted or compiled. Any of a number of computer languages or protocols may be used, depending on the manner in which the desired operations are to be carried out. For example, these computer instructions for creating the model according to embodiments may be written in a conventional high-level language such as PYTHON, JAVA, FORTRAN, or C++, either as a conventional linear computer program or arranged for execution in an object-oriented manner. These instructions may also be embedded within a higher-level application. In any case, it is contemplated that those skilled in the art having reference to this description will be readily able to realize, without undue experimentation, embodiments in a suitable manner for the desired installations.

FIG. 2 visually represents biostratigraphic zonations 200, specifically focused on the Miocene epoch within the Neogene period. Biostratigraphy utilizes the presence and distribution of fossil organisms to establish relative ages of rock layers and correlate them across different geographical regions. The image showcases a stratigraphic column 210, a graphical tool commonly used in biostratigraphy to depict the vertical succession of rock units and their associated fossil assemblages.

The stratigraphic column 210 is divided into three primary sections: Late Miocene, Middle Miocene, and Early Miocene. The Late Miocene section encompasses the upper portion of the Miocene epoch, marked by the presence of specific horizons such as M100, M88, M72, and so forth. The numerical designations generally represent distinct horizons, each characterized by a unique assemblage of fossil species. The Middle Miocene section represents the middle portion of the Miocene epoch, featuring horizons such as M54 and M48, as well as portions of M57 and M40. The lowermost part of the stratigraphic column 210 may correspond to the Early Miocene epoch, featuring horizons such as M20 and M5, as well as a portion of M40.

The vertical scale on the left side of the column indicates the age of the horizons and fossil events and represents the progression of geologic time, with younger units at the top and older units at the bottom.

Overall, FIG. 2 illustrates a stratigraphic framework for the Miocene epoch, highlighting certain biozones and their relative positions within the geologic timescale. Such a framework may be useful for biostratigraphic correlation and age dating of rock units, particularly in the context of hydrocarbon exploration and production, where accurate age determination aids in understanding the depositional history and potential of subsurface reservoirs.

In some cases, a visual representation of biostratigraphic zonations may be specifically tailored to a particular region as well, such as the Gulf of Mexico (GoM) Miocene epoch. Like the stratigraphic column 210, the geographically-tailored stratigraphic column may also be segmented into three distinct sections (e.g., Late Miocene, Middle Miocene, and Early Miocene), each representing a subdivision of the Miocene. In the example application to the GoM region, the depicted biozones and their associated fossil assemblages may have been calibrated and refined based on study of the GoM sedimentary record. Thus, the geographically-tailored biostratigraphic zonations may serve as an additional or alternate example illustrating the geological context and stratigraphic framework within which the embodiments described herein may operate.

FIG. 3A depicts images of microscopic specimens at various focal lengths. Images 302, 304, 306 illustrate the concept of capturing “z-stack” images of microscopic specimens, specifically highlighting the impact of focal depth on the visual representation of 3D structures. This technique may be employed to address challenges associated with accurately identifying and classifying calcareous nannofossils, which are inherently 3D objects. Traditional microscopic examination allows for manual adjustment of focus to observe different aspects of the structure of a specimen. However, automated imaging systems (e.g., Olympus VS200 slide scanner) may have limitations in dynamically adjusting focus during image capture.

Such limitations may be overcome by capturing multiple images of the same specimen or region of interest at varying focal depths. The technique generates a stack of images that collectively provide a more comprehensive representation of the 3D structure of the specimen. Examples of various focal depths may include high focus (as shown in image 302), middle focus (as shown in image 304), and low focus (as shown in image 306). As can be seen, opposing faces may look different due to variations in focal view. These variations highlight possible benefits of capturing multiple focal planes to safeguard that certain diagnostic features, which may be obscured at certain focal depths, are adequately represented in the training library for machine learning models.

By incorporating z-stack images, the training library may capture the inherent three-dimensionality of nannofossils, allowing for machine learning models to learn and recognize these organisms more accurately across a range of focal views. Such an approach may enhance the robustness and reliability of the automated biostratigraphic analysis, aiding in replicating or even surpassing the expertise of a human analyst.

FIG. 3B depicts images of microscopic specimens in various morphological orientations. Image 332 and image 334 are presented side-by-side. Image 332 is of a specimen that has been rotated 45 degrees from its original position, while image 334 is of the specimen in its initial, unrotated state. These images demonstrate the optical properties of calcite, the mineral composing nannofossils, by showcasing how the appearance of a specimen may change when viewed at different orientations under a microscope. As can be seen, the specimen in both image 332 and image 334 exhibits a distinct cross-like or star-like pattern, but the intensity and distribution of light within the pattern differ between the two orientations. Thus, differences between image 332 and image 334 demonstrate that the optical properties of calcite may cause variations in the appearance of a nannofossil depending on its orientation. The fact that the appearance of the specimen changes due to the interaction of light with the calcite crystal lattice at different angles underscores a possible benefit of including images of the same nannofossil species captured at different morphological orientations in the training library.

By incorporating images captured at multiple orientations, the training library may account for this inherent variability, enabling the machine learning models to recognize and classify nannofossils more accurately, even when they are not perfectly aligned or oriented on a microscope slide. In traditional microscopy, the analyst manually rotates the microscope stage to view multiple orientations. Stages cannot be rotated in the current automatic microscope imaging system, but training models in multiple orientations allows this constraint to be overcome, and the model is able to recognize specimens that have naturally fallen in either orientation. Such an approach may further enhance the robustness and reliability of the automated biostratigraphic analysis, aiding in replicating or even surpassing the expertise of a human analyst.

FIG. 3C depicts images of microscopic specimens under various light sources. Image 362 and image 364 are presented side-by-side, showcasing the same microscopic field of view under distinct lighting conditions. Image 362 is captured under cross-polarized light (XPL), while image 364 is captured under plain light (PL) (or bright field). The differences in image 362 and image 364 demonstrate the impact of different light sources on the microscopic visualization of calcareous nannofossils.

The microscopic specimen in image 362 is illuminated with XPL. This technique exploits the birefringent properties of calcite, the mineral constituting nannofossils. As a result, the nannofossils depicted may exhibit vibrant colors and intricate patterns, revealing their crystal structure and aiding in species identification. Notably, a prominent nannofossil in the center displays a striking, multicolored pattern.

The microscopic specimen in image 364 is captured under PL, also known as brightfield illumination. This provides a simpler, more direct view of the specimens, emphasizing their overall shape and size. However, the finer details and internal structures may be less apparent compared to the XPL image. The same nannofossils observed in the XPL image are visible, but their appearance is less distinct, lacking the vibrant colors and intricate patterns.

FIG. 3C illustrates the effect of employing multiple light sources in nannofossil identification. For example, certain species or diagnostic features may be more readily discernible under one lighting condition versus another. Therefore, including images captured under both XPL and PL in a training library may enhance a model trained on that library by improving its ability to recognize and classify nannofossils accurately in diverse real-world scenarios.

By incorporating images obtained under both XPL and PL, the training library may account for the variations in nannofossil appearance arising from different lighting conditions commonly used in biostratigraphic analysis. This approach may further enhance the robustness and reliability of the automated biostratigraphic analysis, aiding in replicating or even surpassing the expertise of a human analyst.

FIG. 4 is a flow diagram for a method 400 for curating a machine learning training library for automated biostratigraphic analysis, and its subsequent use in hydrocarbon exploration, according to an embodiment of the disclosure. The method 400 may streamline the process of identifying and classifying calcareous nannofossils. By leveraging a curated training library of nannofossil images, a machine learning model trained on the training library may analyze new images, aiding in the assessment of hydrocarbon potential and informing drilling decisions.

At step 410, the method 400 includes receiving a plurality of images of a plurality of source materials. The source materials are from a target geologic time period and include a set of nannofossil species. In the field of biostratigraphic analysis, the quality and diversity of the training data used to develop machine learning models may be important. The ability to accurately identify and classify calcareous nannofossils may depend on the exposure of the model to a comprehensive representation of these organisms' morphologies, preservation states, and contextual occurrences.

Conventionally, training data for machine learning models has been limited to images obtained from a single geographic region or geologic time period, reflecting the immediate needs of a particular project (e.g., in the geographic region). While the resultant model may provide satisfactory results for projects within the single geographic region, the model may be less applicable (or less accurate when applied) to a broader range of samples and depositional environments. For example, training a model on samples only from the Gulf of Mexico may result in that model improperly and/or inaccurately identifying nannofossils in samples from other ocean basins (e.g., Atlantic, Pacific, Mediterranean). In other words, the limited diversity of the training data may lead to models that are prone to misidentification and inaccurate classification, particularly when encountering nannofossils with atypical preservation or from unfamiliar settings.

The plurality of source materials may include samples from a certain region, deep-sea cores with preserved nannofossils, or outcrop research material. The inclusion of source materials from certain regions may provide the machine learning model with direct exposure to the types of nannofossils likely to be encountered in that area of interest. This may enhance the ability of the model to recognize and classify species relevant to the specific geological context of hydrocarbon exploration. Furthermore, the incorporation of deep-sea cores with well-preserved nannofossils may offer a source of high-quality training data. Deep-sea environments may provide enhanced conditions for nannofossil preservation, resulting in specimens with clearly discernible morphological features. This may allow machine learning models to learn the ideal characteristics of various species, facilitating accurate identification even when confronted with less well-preserved specimens from other environments. Additionally, the inclusion of outcrop research material may further broaden the diversity of the training library. Outcrops expose nannofossil-bearing rocks at the Earth's surface, offering a glimpse into a wider range of depositional environments and diagenetic histories. This exposure may help the machine learning models to generalize their learning and recognize nannofossils across various geological contexts.

Astronomical tuning, a technique that utilizes astronomical cycles recorded in sedimentary rocks, offers a way to establish high-resolution age models for stratigraphic sequences. Astronomically tuned cores may be sourced from the International Ocean Discovery Program (IODP), a global research collaboration dedicated to advancing scientific understanding of Earth through ocean drilling. Incorporating preserved astronomically tuned cores, typically obtained from deep-sea environments, into the plurality of source materials used for generating the training library may offer several advantages. Deep-sea sediments provide enhanced conditions for nannofossil preservation due to the reduced impact of erosion, weathering, and diagenetic alteration. These enhanced conditions generally result in specimens with well-defined morphological features, facilitating accurate identification and training of machine learning models. Additionally, astronomical tuning of these cores may provide a high-resolution age model, improving correlation of nannofossil occurrences with specific time intervals. This correlation may enhance the ability of the model to learn the stratigraphic ranges of different species and improve its accuracy in age determination. Further, the use of deep-sea cores from locations distinct from the target geographic region may introduce additional variety into the training library. This broader representation of nannofossil diversity and preservation states may enhance the generalizability and robustness of the model, enabling it to perform effectively across a wider range of geological settings.

The complex morphology of calcareous nannofossils may be challenging to fully capture and interpret in a single two-dimensional image. Microscopic examination allows for manual adjustment of the focal plane, enabling the observer to visualize different aspects of the structure of the nannofossil in sharp focus. However, automated imaging systems, while offering advantages in terms of speed and efficiency, may have limitations in dynamically adjusting focus during image capture. This may result in images where certain features of the nannofossil are in focus while others are blurred, potentially hindering accurate identification and classification.

To address such challenges, multiple images of the same region, each captured at a different focal length, may be incorporated into the training library. This “z-stacking” approach may provide a more comprehensive representation of the three-dimensional structure of the nannofossil. By training machine learning models on a diverse set of images with varying focal depths, the models may learn and recognize a fuller range of morphological characteristics exhibited by nannofossils. This may enhance the ability of the models to accurately identify and classify these organisms, even when confronted with images where certain features may be out of focus due to the limitations of automated imaging systems.
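By way of non-limiting illustration only, the z-stacking approach described above might be organized as in the following sketch. The record fields, the focal plane labels ("high", "middle", "low"), and the file names are hypothetical and are not drawn from any particular imaging system; the sketch merely groups images by region of interest and retains regions for which every expected focal plane was captured.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    path: str         # hypothetical file name
    region_id: str    # hypothetical identifier of the imaged region of interest
    focal_plane: str  # hypothetical label: "high", "middle", or "low"


def group_z_stacks(records, required_planes=("high", "middle", "low")):
    """Group records by region and keep regions covering every required plane."""
    stacks = defaultdict(dict)
    for rec in records:
        stacks[rec.region_id][rec.focal_plane] = rec
    return {
        region: planes
        for region, planes in stacks.items()
        if all(plane in planes for plane in required_planes)
    }


records = [
    ImageRecord("roi1_high.png", "roi1", "high"),
    ImageRecord("roi1_middle.png", "roi1", "middle"),
    ImageRecord("roi1_low.png", "roi1", "low"),
    ImageRecord("roi2_high.png", "roi2", "high"),  # incomplete stack
]
complete_stacks = group_z_stacks(records)
```

In this sketch, an incomplete stack is simply excluded; an implementation might instead flag such regions for re-imaging.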

Further, calcareous nannofossils may exhibit a diversity of shapes and structures. Some species may possess intricate three-dimensional morphologies, with features that may only be fully appreciated when viewed from specific angles or orientations. Microscopic examination may allow for manual rotation of the microscope stage, enabling a human observer to manipulate the specimen and observe it from various perspectives. This dynamic interaction may facilitate the identification of diagnostic features that may be obscured or less apparent in a single, static view. However, automated imaging systems generally lack the capability to rotate specimens during image capture. Such limitation may result in a training library composed primarily of images capturing nannofossils in a single, sometimes random, orientation. Consequently, machine learning models trained on such a library may struggle to recognize and classify species when presented with images exhibiting different morphological orientations.

To overcome such challenges, multiple images of the same nannofossil species, captured at different morphological orientations, may be incorporated into the training library. This approach may replicate the dynamic viewing experience of a human analyst, exposing the machine-learning models to a fuller range of morphological variations that a single species may exhibit. By training on a diverse set of images showcasing different orientations, the models may develop a more comprehensive understanding of the morphology of each species. This may enhance the ability of the model to accurately identify and classify nannofossils, even when confronted with images where the specimen may be tilted, rotated, or otherwise positioned in a non-ideal manner. In traditional microscopy, the analyst manually rotates the microscope stage to view multiple orientations. Stages cannot be rotated in the current automatic microscope imaging system, but training models in multiple orientations allows this constraint to be overcome, and the model is able to recognize specimens that have naturally fallen in either orientation.

Additionally, calcareous nannofossils are composed primarily of calcite, a mineral that exhibits unique optical properties. One such property is birefringence, the ability to split a beam of light into two rays with different refractive indices. This phenomenon may be exploited in microscopy by using cross-polarized light (XPL), which reveals intricate patterns and colors within the nannofossil structure, aiding in species identification. However, while XPL provides insights into the crystallographic properties of nannofossils, it may not always reveal the full spectrum of morphological characteristics necessary for accurate classification. Plain light (PL), also known as brightfield illumination, may offer a more straightforward view of the specimen, highlighting its overall shape, size, and surface features. Microscopic examination generally allows for relatively easy switching between XPL and PL, enabling a human observer to utilize the strengths of both lighting conditions for comprehensive analysis. However, automated imaging systems generally have limitations in dynamically adjusting light sources during image capture. This may result in a training library where certain species are predominantly represented under one lighting condition, potentially hindering the ability of the machine-learning model to recognize and classify them under different illumination scenarios.

To address such challenges, multiple images of the same region of interest (e.g., of a physical specimen), each captured under a different light source (XPL and PL), may be incorporated into the training library. This approach may replicate the flexibility of a human analyst, exposing the machine learning models to the diverse visual information obtainable under different lighting conditions. By training on images captured under both XPL and PL, the models develop a more comprehensive understanding of nannofossil morphology and optical properties. This may enhance the ability of the model to accurately identify and classify these organisms, even when presented with images acquired under different illumination settings commonly used in biostratigraphic workflows.

At step 420, the method 400 includes selecting a subset of the plurality of images. The subset includes images of source material from a first target geographic region and a second target geographic region. The first target geographic region includes a hydrocarbon amount greater than a first threshold, and the second target geographic region includes a hydrocarbon amount less than a second threshold. For example, the first target geographic region may be a region with hydrocarbons present and the second target geographic region may be a region without hydrocarbons present.
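As one possible, non-limiting sketch of the selection at step 420, the two thresholds might be applied as follows. The `SourceImage` fields, the numeric hydrocarbon amounts, and the threshold values are invented for illustration and carry no units or real-world meaning; the requirement that the second threshold not exceed the first is likewise an assumption made so that no region can satisfy both conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceImage:
    path: str                  # hypothetical file name
    region: str                # geographic region of the source material
    hydrocarbon_amount: float  # hypothetical regional estimate, arbitrary units


def select_subset(images, first_threshold, second_threshold):
    """Keep images from regions above the first threshold or below the second."""
    if second_threshold > first_threshold:
        raise ValueError("second_threshold should not exceed first_threshold")
    rich = [img for img in images if img.hydrocarbon_amount > first_threshold]
    poor = [img for img in images if img.hydrocarbon_amount < second_threshold]
    if not rich or not poor:
        raise ValueError("subset needs both hydrocarbon-rich and hydrocarbon-poor material")
    return rich + poor


images = [
    SourceImage("gom_001.png", "Gulf of Mexico", 10.0),
    SourceImage("mid_001.png", "intermediate region", 3.0),
    SourceImage("car_001.png", "Caribbean", 0.5),
]
subset = select_subset(images, first_threshold=5.0, second_threshold=1.0)
```

Here the image from the intermediate region falls between the two thresholds and is left out of the subset, while the sketch raises an error if either type of region is unrepresented.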

In the context of hydrocarbon exploration, a challenge is the variability in nannofossil preservation and abundance across different geological settings. Nannofossils found in some hydrocarbon-rich regions (e.g., containing an amount of hydrocarbons greater than a threshold), such as the Gulf of Mexico, may exhibit different characteristics compared to those from hydrocarbon-poor regions (e.g., containing an amount of hydrocarbons less than a threshold), like certain parts of the Caribbean or other deep-sea environments. Training a model solely on images from one type of environment may lead to biased results and poor performance when applied to samples from another region. Such challenges may be addressed through the deliberate selection of a subset of images that encompasses source material from both a first target geographic region, possibly characterized by high hydrocarbon potential, and a second target geographic region, with lower hydrocarbon potential.

By including images from both types of regions, the training library may capture a broader spectrum of nannofossil variability, including variations in preservation, abundance, and associated species assemblages. This may allow the machine learning model to develop a more comprehensive understanding of nannofossil characteristics and their relationship to hydrocarbon presence, possibly leading to improved accuracy and generalizability in biostratigraphic analysis.

The use of specific thresholds for hydrocarbon amounts in defining the first and second target geographic regions may add further refinement to the selection process. For example, the GoM is an oil-rich region that may be considered a region in which the hydrocarbon amount is greater than a first threshold. In contrast, parts of the Caribbean may be considered oil-poor regions in which the hydrocarbon amount is less than a second threshold. Using thresholds may allow for the training data to include examples from environments with contrasting hydrocarbon potential, enhancing the ability of the model to discriminate between geological settings with varying resource prospects.

Relatedly, a distinction between the Caribbean and the GoM lies in the preservation and abundance of Miocene fossils. Caribbean sites generally exhibit relatively superior preservation and greater abundance of nannofossils due to favorable paleo-oceanographic conditions during the Miocene, which fostered a diverse and thriving ecosystem. The specific Caribbean site chosen for sample collection, while likely devoid of significant hydrocarbon resources, may be selected to access these well-preserved deep-sea cores through the IODP. The IODP primarily operates in areas with low hydrocarbon potential to minimize the risk of encountering oil and gas deposits during drilling.

The core samples obtained from the Caribbean, as opposed to cuttings typically collected from oil and gas wells, tend to exhibit relatively superior preservation and higher fossil abundance. In contrast, cuttings from oil and gas wells, particularly in the GoM, are usually derived from hydrocarbon-bearing formations generally dominated by sands, which are relatively less conducive to fossil preservation. Moreover, shale formations, where nannofossils are typically abundant, are less frequently cored in the GoM due to their lower economic value as reservoirs.

Therefore, the selection of Caribbean samples may serve dual purposes: geographic diversity and superior preservation and abundance. In terms of geographic diversity, such samples may aid in ensuring that the training library encompasses nannofossils from diverse geographic regions, capturing a wider range of species and morphological variations. In terms of superior preservation and abundance, the favorable preservation conditions and higher fossil abundance characteristic of Caribbean deep-sea cores, obtained through IODP drilling, may be leveraged to enhance the quality and representativeness of the training data. Such approaches to data collection, by incorporating both hydrocarbon-rich and hydrocarbon-poor regions with contrasting fossil preservation characteristics, may contribute to the development of more robust and adaptable machine learning models for automated biostratigraphic analysis.

Not all fossil species may be equally valuable for establishing relative ages of rock layers and correlating them across different geographic regions. Some species, referred to as “index fossils” or “marker species,” possess characteristics that make them particularly useful for biostratigraphic analysis. One characteristic may be short stratigraphic range, where the species existed for a relatively brief period, allowing for precise age determination of the rocks in which they are found. Another characteristic may be widespread geographic distribution, where the species is found in multiple locations, enabling correlation of rock layers across different regions. Abundance may be another characteristic, where the species is relatively common and easily recognizable, making it easier to find and identify in different samples. Yet another characteristic may be distinct morphology, where the species has unique features that distinguish it from other species, allowing for accurate identification.

In the context of machine learning-based nannofossil identification, prioritizing species with high biostratigraphic significance in the training library may be useful for developing models that may accurately replicate the expertise of a human biostratigrapher. By focusing on these key taxa, the model may learn to recognize and classify the most informative species for age determination and correlation, enhancing the accuracy and reliability of automated biostratigraphic analysis. These aspects may be addressed by specifying that the selection of the subset of images for the training library is based on the biostratigraphic significance of the nannofossil species represented in those images. This may ensure that the model is trained on relevant and informative species for biostratigraphic analysis rather than simply including a random or arbitrary collection of images. The selection process may involve consulting established biostratigraphic zonations, reviewing relevant literature, and leveraging the expertise of experienced biostratigraphers to identify the species that are most valuable for age dating and correlation within the target geographic region and geologic time period.

As discussed, “marker species” hold particular significance due to their unique characteristics and stratigraphic utility. These marker species possess distinct morphologies, limited stratigraphic ranges, and widespread geographic distributions, which may make them invaluable tools for correlating rock layers and establishing precise age determinations. In the context of machine learning-based nannofossil identification, marker species play an elevated role in anchoring the training library and enhancing the accuracy of biostratigraphic analysis. By prioritizing the inclusion of marker species in the training dataset, machine learning models may gain the ability to recognize and classify these taxa with increased confidence, enabling more precise age dating and correlation of subsurface formations.

To address the potential omission of marker species during the initial selection of training data, the subset may further include images of source materials including marker species. This may ensure that, even if marker species were inadvertently excluded during the initial selection process, they are actively sought out and incorporated into the training library.
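For illustration only, the supplementation of the subset with marker species might be sketched as below. The dictionary keys and the species names (`species_a`, `marker_x`, `marker_y`) are placeholders rather than real taxa, and the list of marker species is assumed to be supplied by a biostratigrapher or an established zonation.

```python
def add_missing_markers(subset, all_images, marker_species):
    """Extend the subset with every available image of a marker species it lacks."""
    present = {img["species"] for img in subset}
    missing = set(marker_species) - present
    extra = [img for img in all_images if img["species"] in missing]
    return list(subset) + extra


# Placeholder records; paths and species names are hypothetical.
all_images = [
    {"path": "img_001.png", "species": "species_a"},
    {"path": "img_002.png", "species": "marker_x"},
    {"path": "img_003.png", "species": "marker_y"},
    {"path": "img_004.png", "species": "marker_y"},
]
initial_subset = all_images[:2]  # marker_y was left out of the initial selection
final_subset = add_missing_markers(
    initial_subset, all_images, marker_species={"marker_x", "marker_y"}
)
```

Marker species already represented in the initial subset are left untouched in this sketch, so no image is added twice.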

This inclusion of marker species reinforces the ability of the model to replicate the expertise of a human biostratigrapher, who may prioritize the identification of these taxa for accurate stratigraphic interpretation. By ensuring that the model is trained on a representative sample of marker species, embodiments of this disclosure may enhance the capacity to generate reliable and precise biostratigraphic results, ultimately aiding in informed decision-making for hydrocarbon exploration and production activities.

Microscopic fossils may exhibit subtle morphological differences between species, and their three-dimensional nature, coupled with variations in preservation and orientation, may lead to ambiguous images that lack clear diagnostic features. Such ambiguity may be particularly pronounced in cases where multiple species within a genus share broad similarities in their overall shape or outline. Machine learning models, while capable of recognizing these general patterns, may struggle to confidently differentiate between such “look-alike” species at the species level. This may result in a decline in classification precision, introducing “noise” into the results as the model assigns ambiguous images across multiple closely related species. Such misclassifications may create challenges for users, who then have to manually review and validate the output of the model, undermining the efficiency gains promised by automation.

To overcome such challenges, the subset may further include images of source materials including a grouped species. The grouped species may include a plurality of nannofossil species that have morphological similarities. Grouped species or “bins” may be seen as an approach to handling look-alike species within the training library. Instead of forcing the model to make definitive distinctions between species with subtle morphological differences, embodiments of this disclosure may allow for the classification of such species into broader, more manageable categories. For example, each “bin” may encapsulate a group of two to four species that share overarching morphological similarities. By classifying ambiguous images into these bins, the model may avoid misclassifications while still preserving the biostratigraphic significance of the data.
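The binning idea can be illustrated with a small label-mapping sketch. The bin names and species names are placeholders, and mapping labels before training is only one assumed way to realize grouped species.

```python
# Hypothetical bins, each holding two to four look-alike species.
BINS = {
    "bin_1": {"species_a", "species_b", "species_c"},
    "bin_2": {"species_d", "species_e"},
}


def build_label_map(bins):
    """Invert {bin: species set} into {species: bin}."""
    label_map = {}
    for bin_name, members in bins.items():
        for species in members:
            label_map[species] = bin_name
    return label_map


def training_label(species, label_map):
    """Use the bin label for grouped species, else the species itself."""
    return label_map.get(species, species)


label_map = build_label_map(BINS)
labels = [training_label(s, label_map) for s in ["species_b", "species_e", "species_z"]]
```

In this sketch `species_b` and `species_e` are relabeled to their bins, while the ungrouped `species_z` keeps its species-level label.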

At step 430, the method 400 includes generating a training library including the subset. The generation of the training library may involve data organization, data annotation, and data augmentation.

The selected images may be organized and structured in a manner that facilitates efficient access and utilization by the machine learning algorithms. This may involve creating specific file formats, directories, or metadata structures to ensure compatibility with the chosen training framework. The images may be labeled or annotated by expert biostratigraphers, providing the information necessary for supervised learning. This may include identifying the nannofossil species present in each image, as well as any other relevant information such as preservation state, orientation, or imaging conditions. In some cases, the training library may be further enhanced through data augmentation techniques, such as image rotation, flipping, or cropping. This may increase the diversity of the training data and improve the ability of the model to generalize to new, unseen images.
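The augmentation techniques named above (rotation, flipping, cropping) might look like the following sketch, which treats an image as a plain 2D grid of pixel values so that it runs without any imaging library. The function names and the grid representation are assumptions; a practical pipeline would likely use an image-processing library.

```python
def rotate90(img):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in img]


def center_crop(img, size):
    """Cut a size-by-size window from the middle of a 2D grid."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def augment(img, crop_size):
    """Return three augmented variants of one image."""
    return [rotate90(img), flip_horizontal(img), center_crop(img, crop_size)]


# A 4x4 placeholder "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
variants = augment(image, crop_size=2)
```

Each variant would inherit the expert annotation of the original image, so one labeled image yields several labeled training examples.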

At step 440, the method 400 includes training a machine learning model with the training library. The training process may involve feeding the model with labeled images from the training library, allowing it to iteratively adjust its internal parameters to reduce errors and improve its predictive capabilities. This iterative learning process may aid in transforming the model from a generalized model into a fine-tuned model and a powerful tool for automated biostratigraphic analysis.
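The iterative adjustment of parameters can be illustrated with a deliberately small stand-in: logistic regression trained by stochastic gradient descent on hand-made two-value feature vectors. This is not the image model contemplated by the disclosure, which does not specify an architecture; the features, labels, learning rate, and epoch count are all assumptions chosen to keep the sketch self-contained.

```python
import math


def train_logistic(features, labels, epochs=200, lr=0.5):
    """Fit weights and a bias by repeatedly nudging them to reduce error."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            error = p - y                   # gradient of the log loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 when the linear score is non-negative, else class 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z >= 0 else 0


# Placeholder feature vectors standing in for labeled training images.
features = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
labels = [0, 0, 1, 1]
weights, bias = train_logistic(features, labels)
predictions = [predict(weights, bias, x) for x in features]
```

The same loop structure (predict, measure error, update parameters, repeat) carries over to larger models, although the update rule and the data loading would differ.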

At step 450, the method 400 includes receiving an image of a subsurface nannofossil from a prospective region of interest. The process of receiving an image may mark the transition from model development and training to real-world application. It may signify the point at which the technology interacts with new data, demonstrating its ability to generalize its learning and perform effectively on samples collected from potentially hydrocarbon-bearing formations. Receiving an image of a subsurface nannofossil from a prospective region of interest emphasizes the practical application of the trained model in analyzing geological samples obtained from areas being considered for hydrocarbon exploration or production. The use of a subsurface nannofossil highlights that the image depicts a fossil that may be embedded within rock layers beneath the Earth's surface, and likely accessed through drilling or coring operations. A prospective region of interest underscores the relevance of the analysis to hydrocarbon exploration, indicating that the image originates from an area with potential for resource discovery.

At step 460, the method 400 includes inputting the received image into the trained machine learning model. Applying the machine learning model to the received image may represent the culmination of the model development and training process, where the predictive capabilities of the model are put to the test in a real-world application. The application of the model may involve an interplay of algorithms and computational processes, designed to extract relevant features from the image, recognize patterns, and generate predictions about the nannofossil species present. The accuracy and reliability of these predictions may be directly influenced by the quality and diversity of the training library, as well as the sophistication of the machine learning algorithms employed.
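As a sketch of how a prediction might be produced from a model's raw output, the example below converts per-class scores into probabilities with a softmax and returns the most likely label. The raw scores, the class labels, the `min_confidence` cutoff, and the `needs_review` fallback are assumptions for illustration; the disclosure does not specify how predictions are thresholded.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels, min_confidence=0.6):
    """Return (label, probability), or flag low-confidence images for review."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return ("needs_review", probs[best])
    return (labels[best], probs[best])


# Placeholder class labels and raw model scores for two received images.
class_labels = ["species_a", "bin_1", "species_d"]
confident = classify([5.0, 1.0, 0.0], class_labels)
ambiguous = classify([1.0, 1.0, 1.0], class_labels)
```

Routing low-confidence images to a human reviewer is one possible way to limit the manual validation burden to the genuinely ambiguous cases.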

At step 470, the method 400 includes assessing hydrocarbon prevalence in the prospective region of interest. The ultimate objective of hydrocarbon exploration may be to identify and assess subsurface formations with the potential to contain economically viable accumulations of oil and gas. Biostratigraphic analysis, through the identification and classification of fossils like calcareous nannofossils, provides insights into the age, depositional environment, and potential hydrocarbon source rocks of these formations. Assessing hydrocarbon prevalence in the prospective region of interest based on the biostratigraphic analysis highlights a direct link between the output (e.g., nannofossil identification and classification) of the model and a goal of hydrocarbon exploration.

At step 480, the method 400 includes performing a drilling operation based on the assessment. By performing a drilling operation based on the assessment of hydrocarbon prevalence, the model is applied and leveraged to improve the drilling process, which provides benefits to the field of oil and/or gas production and/or exploration. Performing a drilling operation may encompass a broad range of activities related to the drilling process, including: initiating a new well, modifying an existing well, or terminating a well. The scope of this disclosure is not limited to a particular drilling operation, unless explicitly specified in one or more of the claims.

Referring now to FIG. 5, a computer system 500 suitable for implementing one or more embodiments disclosed herein is shown. Any of the systems and methods disclosed herein can be carried out (e.g., entirely or partially) on a computer or other device comprising a processor (e.g., a desktop computer, a laptop computer, a tablet, a server, a smartphone, or some combination thereof). The computer system 500 includes a processor 502 (which may be referred to as a central processor unit or CPU) that is in communication with memory devices including secondary storage 504, read only memory (ROM) 506, random access memory (RAM) 508, input/output (I/O) devices 510, and network connectivity devices 512. The processor 502 may be implemented as one or more CPU chips.

It is understood that by programming and/or loading executable instructions onto the computer system 500, at least one of the CPUs 502, the RAM 508, and the ROM 506 are changed, transforming the computer system 500 in part into a particular machine or apparatus having the novel functionality taught by the present disclosure. Thus, the RAM 508 and/or the ROM 506 may comprise a non-transitory machine-readable (or computer-readable) medium that may include instructions (which may be referred to herein as machine-readable instructions) that are executable by CPU 502 to provide functionality to computer system 500. Thus, in some embodiments, machine-readable instructions stored on a memory may be executed on a processor, so as to configure the processor to carry out some or all of the features of the methods described herein (e.g., method 400).

It is fundamental to the electrical engineering and software engineering arts that functionality that can be implemented by loading executable software into a computer can be converted to a hardware implementation by well-known design rules. Decisions between implementing a concept in software versus hardware typically hinge on considerations of stability of the design and numbers of units to be produced rather than any issues involved in translating from the software domain to the hardware domain. Generally, a design that is still subject to frequent change may be preferred to be implemented in software, because re-spinning a hardware implementation is more expensive than re-spinning a software design. Generally, a design that is stable that will be produced in large volume may be preferred to be implemented in hardware (for example in an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA)) because for large production runs the hardware implementation may be less expensive than the software implementation. Sometimes a design may be developed and tested in a software form and later transformed, by well-known design rules, to an equivalent hardware implementation in an application specific integrated circuit that hardwires the instructions of the software. In the same manner as a machine controlled by a new ASIC is a particular machine or apparatus, likewise a computer that has been programmed and/or loaded with executable instructions may be viewed as a particular machine or apparatus.

Additionally, after the system 500 is turned on or booted, the CPU 502 may execute a computer program or application. For example, the CPU 502 may execute software or firmware stored in the ROM 506 or stored in the RAM 508. In some cases, on boot and/or when the application is initiated, the CPU 502 may copy the application or portions of the application from the secondary storage 504 to the RAM 508 or to memory space within the CPU 502 itself, and the CPU 502 may then execute instructions of which the application is comprised. In some cases, the CPU 502 may copy the application or portions of the application from memory accessed via the network connectivity devices 512 or via the I/O devices 510 to the RAM 508 or to memory space within the CPU 502, and the CPU 502 may then execute instructions of which the application is comprised. During execution, an application may load instructions into the CPU 502, for example load some of the instructions of the application into a cache of the CPU 502. In some contexts, an application that is executed may be said to configure the CPU 502 to do something, e.g., to configure the CPU 502 to perform the function or functions promoted by the subject application. When the CPU 502 is configured in this way by the application, the CPU 502 becomes a specific purpose computer or a specific purpose machine.

The secondary storage 504 is typically comprised of one or more disk drives or tape drives and is used for non-volatile storage of data and as an over-flow data storage device if RAM 508 is not large enough to hold all working data. Secondary storage 504 may be used to store programs which are loaded into RAM 508 when such programs are selected for execution. The ROM 506 is used to store instructions and perhaps data which are read during program execution. ROM 506 is a non-volatile memory device which typically has a small memory capacity relative to the larger memory capacity of secondary storage 504. The RAM 508 is used to store volatile data and perhaps to store instructions. Access to both ROM 506 and RAM 508 is typically faster than secondary storage 504. The secondary storage 504, the RAM 508, and/or the ROM 506 may be referred to in some contexts as computer readable storage media and/or non-transitory computer readable media.

I/O devices 510 may include printers, video monitors, electronic displays (e.g., liquid crystal displays (LCDs), plasma displays, organic light emitting diode displays (OLED), touch sensitive displays, etc.), keyboards, keypads, switches, dials, mice, track balls, voice recognizers, card readers, paper tape readers, or other well-known input devices.

The network connectivity devices 512 may take the form of modems, modem banks, Ethernet cards, Omni-Path Architecture (OPA), InfiniBand (IB), universal serial bus (USB) interface cards, serial interfaces, token ring cards, fiber distributed data interface (FDDI) cards, wireless local area network (WLAN) cards, radio transceiver cards that promote radio communications using protocols such as code division multiple access (CDMA), global system for mobile communications (GSM), long-term evolution (LTE), worldwide interoperability for microwave access (WiMAX), near field communications (NFC), radio frequency identity (RFID), and/or other air interface protocol radio transceiver cards, and other well-known network devices. These network connectivity devices 512 may enable the processor 502 to communicate with the Internet or one or more intranets. With such a network connection, it is contemplated that the processor 502 might receive information from the network, or might output information to the network (e.g., to an event database) in the course of performing the methods described herein. Such information, which is sometimes represented as a sequence of instructions to be executed using processor 502, may be received from and outputted to the network, for example, in the form of a computer data signal embodied in a carrier wave.

Such information, which may include data or instructions to be executed using processor 502 for example, may be received from and outputted to the network, for example, in the form of a computer data baseband signal or signal embodied in a carrier wave. The baseband signal or signal embedded in the carrier wave, or other types of signals currently used or hereafter developed, may be generated according to several known methods. The baseband signal and/or signal embedded in the carrier wave may be referred to in some contexts as a transitory signal.

The processor 502 executes instructions, codes, computer programs, scripts which it accesses from hard disk, floppy disk, optical disk, solid state drives (SSD) (these various disk-based systems may all be considered secondary storage 504), flash drive, ROM 506, RAM 508, or the network connectivity devices 512. While only one processor 502 is shown, multiple processors may be present. Thus, while instructions may be discussed as executed by a processor, the instructions may be executed simultaneously, serially, or otherwise executed by one or multiple processors. Instructions, codes, computer programs, scripts, and/or data that may be accessed from the secondary storage 504, for example, hard drives, floppy disks, optical disks, and/or other device, the ROM 506, and/or the RAM 508 may be referred to in some contexts as non-transitory instructions and/or non-transitory information.

In an embodiment, the computer system 500 may comprise two or more computers in communication with each other that collaborate to perform a task. For example, but not by way of limitation, an application may be partitioned in such a way as to permit concurrent and/or parallel processing of the instructions of the application. Alternatively, the data processed by the application may be partitioned in such a way as to permit concurrent and/or parallel processing of different portions of a data set by the two or more computers. In an embodiment, virtualization software may be employed by the computer system 500 to provide the functionality of a number of servers that is not directly bound to the number of computers in the computer system 500. For example, virtualization software may provide twenty virtual servers on four physical computers. In an embodiment, the functionality disclosed above may be provided by executing the application and/or applications in a cloud computing environment. Cloud computing may comprise providing computing services via a network connection using dynamically scalable computing resources. Cloud computing may be supported, at least in part, by virtualization software. A cloud computing environment may be established by an enterprise and/or may be hired on an as-needed basis from a third-party provider. Some cloud computing environments may comprise cloud computing resources owned and operated by the enterprise as well as cloud computing resources hired and/or leased from a third-party provider.

In an embodiment, some or all of the functionality disclosed above may be provided as a computer program product. The computer program product may comprise one or more computer readable storage medium having computer usable program code embodied therein to implement the functionality disclosed above. The computer program product may comprise data structures, executable instructions, and other computer usable program code. The computer program product may be embodied in removable computer storage media and/or non-removable computer storage media. The removable computer readable storage medium may comprise, without limitation, a paper tape, a magnetic tape, magnetic disk, an optical disk, a solid-state memory chip, for example analog magnetic tape, compact disk read only memory (CD-ROM) disks, floppy disks, jump drives, digital cards, multimedia cards, and others. The computer program product may be suitable for loading, by the computer system 500, at least portions of the contents of the computer program product to the secondary storage 504, to the ROM 506, to the RAM 508, and/or to other non-volatile memory and volatile memory of the computer system 500. The processor 502 may process the executable instructions and/or data structures in part by directly accessing the computer program product, for example by reading from a CD-ROM disk inserted into a disk drive peripheral of the computer system 500. Alternatively, the processor 502 may process the executable instructions and/or data structures by remotely accessing the computer program product, for example by downloading the executable instructions and/or data structures from a remote server through the network connectivity devices 512. 
The computer program product may comprise instructions that promote the loading and/or copying of data, data structures, files, and/or executable instructions to the secondary storage 504, to the ROM 506, to the RAM 508, and/or to other non-volatile memory and volatile memory of the computer system 500.

In some contexts, the secondary storage 504, the ROM 506, and the RAM 508 may be referred to as a non-transitory computer readable medium or a computer readable storage media. A dynamic RAM embodiment of the RAM 508, likewise, may be referred to as a non-transitory computer readable medium in that while the dynamic RAM receives electrical power and is operated in accordance with its design, for example during a period of time during which the computer system 500 is turned on and operational, the dynamic RAM stores information that is written to it. Similarly, the processor 502 may comprise an internal RAM, an internal ROM, a cache memory, and/or other internal non-transitory storage blocks, sections, or components that may be referred to in some contexts as non-transitory computer readable media or computer readable storage media. At least some, if not all, of the steps or “blocks” of method 400 shown in FIG. 4, may be executed by the computer system 500 shown in FIG. 5, although it is to be understood that at least some of the steps of method 400 may be executed by systems other than computer system 500.

While several embodiments have been shown and described, modifications thereof can be made by one skilled in the art without departing from the scope or teachings herein. The embodiments described herein are exemplary only and are not limiting. Many variations and modifications of the systems, apparatus, and processes described herein are possible and are within the scope of the disclosure. For example, the relative dimensions of various parts, the materials from which the various parts are made, and other parameters can be varied. Accordingly, the scope of protection is not limited to the embodiments described herein, but is only limited by the claims that follow, the scope of which shall include all equivalents of the subject matter of the claims. Unless expressly stated otherwise, the steps in a method claim may be performed in any order. The recitation of identifiers such as (a), (b), (c) or (1), (2), (3) before steps in a method claim are not intended to and do not specify a particular order to the steps, but rather are used to simplify subsequent reference to such steps.

As such, the preceding discussion is directed to various exemplary embodiments. However, one skilled in the art will understand that the examples disclosed herein have broad application, and that the discussion of any embodiment is meant only to be exemplary of that embodiment, and not intended to suggest that the scope of the disclosure, including the claims, is limited to that embodiment.

Certain terms are used throughout the preceding description and claims to refer to particular features or components. As one skilled in the art will appreciate, different persons may refer to the same feature or component by different names. This document does not intend to distinguish between components or features that differ in name but not function. The drawing figures are not necessarily to scale. Certain features and components herein may be shown exaggerated in scale or in somewhat schematic form and some details of conventional elements may not be shown in interest of clarity and conciseness.

Unless the context dictates the contrary, all ranges set forth herein should be interpreted as being inclusive of their endpoints, and open-ended ranges should be interpreted to include only commercially practical values. Similarly, all lists of values should be considered as inclusive of intermediate values unless the context indicates the contrary.

In the preceding discussion and the claims, the terms “including” and “comprising” are used in an open-ended fashion and thus should be interpreted to mean “including, but not limited to . . . .” Also, the term “couple” or “couples” is intended to mean either an indirect or direct connection. Thus, if a first device couples to a second device, that connection may be through a direct engagement between the two devices or through an indirect connection established via other devices, components, nodes, and connections. In addition, as used herein, the terms “axial” and “axially” generally mean along or parallel to a particular axis (e.g., a central axis of a body or a port), while the terms “radial” and “radially” generally mean perpendicular to a particular axis. For instance, an axial distance refers to a distance measured along or parallel to the axis, and a radial distance means a distance measured perpendicular to the axis. Any reference to up or down in the description and the claims is made for purposes of clarity, with “up,” “upper,” “upwardly,” “uphole,” or “upstream” meaning toward the surface of the borehole and with “down,” “lower,” “downwardly,” “downhole,” or “downstream” meaning toward the terminal end of the borehole, regardless of the borehole orientation.

As used herein, the terms “approximately,” “about,” “substantially,” and the like mean within 10% (i.e., plus or minus 10%) of the recited value unless otherwise stated. Thus, for example, a recited angle of “about 80 degrees” refers to an angle ranging from 72 degrees to 88 degrees. Where single components, apparatuses, or systems are described as performing functions, multiple such components, apparatuses, or systems may implement the functions.
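The plus-or-minus 10% convention can be checked with a short calculation. The helper name `about_range` and its `tolerance_percent` parameter are hypothetical and serve only to reproduce the arithmetic of the 80-degree example.

```python
def about_range(value, tolerance_percent=10):
    """Return the (low, high) bounds implied by "about <value>"."""
    delta = abs(value) * tolerance_percent / 100
    return (value - delta, value + delta)


low, high = about_range(80)  # 80 degrees, plus or minus 10% (8 degrees)
```

For a recited value of 80, the bounds are 72 and 88, matching the range stated above.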

Thus, while several embodiments have been provided, the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented. Likewise, where single components, apparatuses, or systems are described as performing functions, multiple such components, apparatuses, or systems may implement the functions.

Claims

What is claimed is:

1. A method, comprising:

receiving a plurality of images of a plurality of source materials, wherein the source materials are from a target geologic time period and include a set of nannofossil species;

selecting a subset of the plurality of images, wherein the subset comprises images of source material from a first target geographic region and a second target geographic region, wherein the first target geographic region includes hydrocarbon amount greater than a first threshold and the second target geographic region includes hydrocarbon amount less than a second threshold; and

generating a training library comprising the subset.

2. The method of claim 1, further comprising:

training a machine learning model with the training library;

receiving an image of a subsurface nannofossil from a prospective region of interest;

applying the machine learning model to the received image to generate a biostratigraphic analysis;

assessing hydrocarbon prevalence in the prospective region of interest based on the biostratigraphic analysis; and

adjusting parameters of a drilling operation based on the assessment of hydrocarbon prevalence.

3. The method of claim 1, wherein the plurality of source materials includes preserved astronomically tuned cores.

4. The method of claim 1, wherein selecting the subset is based on biostratigraphic significance of the set of nannofossil species.

5. The method of claim 1, wherein the subset further comprises images of source materials including a marker species.

6. The method of claim 1, wherein the subset further comprises images of source materials including a grouped species, wherein the grouped species include a plurality of nannofossil species that have morphological similarities.

7. The method of claim 1, wherein the plurality of images includes a first image captured at a first focal length, and a second image captured at a second focal length, wherein the first image and the second image are each of a same region of interest within the source materials.

8. The method of claim 1, wherein the plurality of images includes a first image captured at a first morphological orientation, and a second image captured at a second morphological orientation, wherein the first image and the second image are each of a same nannofossil species in the source materials.

9. The method of claim 1, wherein the plurality of images includes a first image captured under a first light source, and a second image captured under a second light source, wherein the first image and the second image are each of a same region of interest within the source materials.

10. An electronic device, comprising:

one or more processors; and

a memory coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the electronic device to be configured to:

receive a plurality of images of a plurality of source materials, wherein the source materials are from a target geologic time period and include a set of nannofossil species;

select a subset of the plurality of images, wherein the subset comprises images of source material from a first target geographic region and a second target geographic region, wherein the first target geographic region includes hydrocarbon amount greater than a first threshold and the second target geographic region includes hydrocarbon amount less than a second threshold; and

generate a training library comprising the subset.

11. The electronic device of claim 10, wherein the instructions, when executed by the one or more processors, cause the electronic device to be further configured to:

train a machine learning model with the training library;

receive an image of a subsurface nannofossil from a prospective region of interest;

apply the machine learning model to the received image to generate a biostratigraphic analysis;

assess hydrocarbon prevalence in the prospective region of interest based on the biostratigraphic analysis; and

adjust parameters of a drilling operation based on the assessment of hydrocarbon prevalence.

12. The electronic device of claim 10, wherein the plurality of source materials includes preserved astronomically tuned cores.

13. The electronic device of claim 10, wherein selecting the subset is based on biostratigraphic significance of the set of nannofossil species.

14. The electronic device of claim 10, wherein the subset further comprises images of source materials including a marker species.

15. The electronic device of claim 10, wherein the subset further comprises images of source materials including a grouped species, wherein the grouped species include a plurality of nannofossil species that have morphological similarities.

16. The electronic device of claim 10, wherein the plurality of images includes a first image captured at a first focal length, and a second image captured at a second focal length, wherein the first image and the second image are each of a same region of interest within the source materials.

17. The electronic device of claim 10, wherein the plurality of images includes a first image captured at a first morphological orientation, and a second image captured at a second morphological orientation, wherein the first image and the second image are each of a same nannofossil species in the source materials.

18. The electronic device of claim 10, wherein the plurality of images includes a first image captured under a first light source, and a second image captured under a second light source, wherein the first image and the second image are each of a same region of interest within the source materials.

19. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors of an electronic device, cause the electronic device to be configured to:

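receive a plurality of images of a plurality of source materials, wherein the source materials are from a target geologic time period and include a set of nannofossil species;

select a subset of the plurality of images, wherein the subset comprises images of source material from a first target geographic region and a second target geographic region, wherein the first target geographic region includes hydrocarbon amount greater than a first threshold and the second target geographic region includes hydrocarbon amount less than a second threshold; and

generate a training library comprising the subset.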

20. The non-transitory computer-readable medium storing instructions of claim 19, wherein the stored instructions, when executed by the one or more processors, cause the electronic device to be further configured to:

train a machine learning model with the training library;

receive an image of a subsurface nannofossil from a prospective region of interest;

apply the machine learning model to the received image to generate a biostratigraphic analysis;

assess hydrocarbon prevalence in the prospective region of interest based on the biostratigraphic analysis; and

adjust parameters of a drilling operation based on the assessment of hydrocarbon prevalence.

21. The non-transitory computer-readable medium of claim 19, wherein the plurality of source materials includes preserved astronomically tuned cores.

22. The non-transitory computer-readable medium of claim 19, wherein selecting the subset is based on biostratigraphic significance of the set of nannofossil species.

23. The non-transitory computer-readable medium of claim 19, wherein the subset further comprises images of source materials including a marker species.

24. The non-transitory computer-readable medium of claim 19, wherein the subset further comprises images of source materials including a grouped species, wherein the grouped species include a plurality of nannofossil species that have morphological similarities.

25. The non-transitory computer-readable medium of claim 19, wherein the plurality of images includes a first image captured at a first focal length, and a second image captured at a second focal length, wherein the first image and the second image are each of a same region of interest within the source materials.

26. The non-transitory computer-readable medium of claim 19, wherein the plurality of images includes a first image captured at a first morphological orientation, and a second image captured at a second morphological orientation, wherein the first image and the second image are each of a same nannofossil species in the source materials.

27. The non-transitory computer-readable medium of claim 19, wherein the plurality of images includes a first image captured under a first light source, and a second image captured under a second light source, wherein the first image and the second image are each of a same region of interest within the source materials.
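
The curation steps recited in claims 19 and 21 to 27 can be illustrated with a minimal Python sketch. This is not the applicant's implementation. The record fields, the region labels, and the names `FossilImage`, `select_subset`, and `build_library` are hypothetical and chosen only for illustration. The sketch assumes each image already carries metadata for its species, region type, region of interest, focal setting, light source, and morphological orientation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the two region types the subset must cover.
REQUIRED_REGIONS = {"hydrocarbon_rich", "hydrocarbon_poor"}


@dataclass(frozen=True)
class FossilImage:
    """One image plus the metadata the claims refer to (all fields are assumed)."""
    image_id: str
    species: str
    region_type: str      # "hydrocarbon_rich" or "hydrocarbon_poor"
    roi_id: str           # region of interest within the source material
    focal_setting: float  # focal length or focus setting, units unspecified
    light_source: str     # label for the illumination used
    orientation: str      # morphological orientation label
    is_marker: bool = False


def select_subset(images, significant_species):
    """Keep marker species and biostratigraphically significant species.

    Raises ValueError if the kept images do not cover both region types.
    """
    kept = [im for im in images if im.is_marker or im.species in significant_species]
    missing = REQUIRED_REGIONS - {im.region_type for im in kept}
    if missing:
        raise ValueError(f"subset lacks images from: {sorted(missing)}")
    return kept


def build_library(subset, species_groups=None):
    """Group images by training label.

    species_groups maps morphologically similar species to a shared group
    label; species without an entry keep their own name as the label.
    """
    groups = species_groups or {}
    library = defaultdict(list)
    for im in subset:
        library[groups.get(im.species, im.species)].append(im)
    return dict(library)
```

Keeping focal setting, light source, and orientation as separate fields lets several images of the same region of interest or the same species sit side by side in the library, which is the situation claims 25 to 27 describe.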

Resources

Images & Drawings included:

Figs. 01 to 07 - GENERATING A STRATIGRAPHIC TRAINING LIBRARY

Sources:

United States Patent and Trademark Office
