Patent application title:

SYSTEMS AND METHODS FOR ANALYSIS OF SAMPLES ASSOCIATED WITH NONALCOHOLIC FATTY LIVER DISEASE

Publication number:

US20250327125A1

Publication date:
Application number:

18/870,228

Filed date:

2023-06-01

Smart Summary: New systems and methods have been developed to analyze biological samples related to non-alcoholic fatty liver disease. These tools help identify specific markers, known as biomarkers, that are linked to the disease. By recognizing these molecular signatures, researchers can better understand the condition. This information can support drug discovery and improve treatment options for patients. Overall, the goal is to enhance research and care for those affected by non-alcoholic fatty liver disease.

Abstract:

Provided herein are systems and methods for analysis of biological samples to identify biomarkers associated with non-alcoholic fatty liver disease. For example, provided herein are molecular signatures that find use in characterizing samples to facilitate research, drug discovery, and treatment associated with nonalcoholic fatty liver disease.

Inventors:

Applicant:

Classification:

C12Q2600/156 »  CPC further

Oligonucleotides characterized by their use Polymorphic or mutational markers

C12Q2600/158 »  CPC further

Oligonucleotides characterized by their use Expression markers

C12Q1/6883 »  CPC main

Measuring or testing processes involving enzymes, nucleic acids or microorganisms ; Compositions therefor; Processes of preparing such compositions involving nucleic acids; Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Nos. 63/347,799, filed Jun. 1, 2022, and 63/377,471, filed Sep. 28, 2022, the contents of which are herein incorporated by reference in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under DK107904 awarded by the National Institutes of Health. The government has certain rights in the invention.

SEQUENCE LISTING STATEMENT

The contents of the electronic sequence listing titled UM-39791-601.xml (Size: 27,011,446 bytes; and Date of Creation: May 31, 2023) are herein incorporated by reference in their entirety.

FIELD

Provided herein are systems and methods for analysis of biological samples to identify biomarkers associated with nonalcoholic fatty liver disease. For example, provided herein are molecular signatures that find use in characterizing samples to facilitate research, drug discovery, and treatment associated with nonalcoholic fatty liver disease.

BACKGROUND

Nonalcoholic fatty liver disease (NAFLD) is the most common liver disease worldwide and has no effective treatments. NAFLD is heritable.

With rising obesity rates, the prevalence of nonalcoholic fatty liver disease (NAFLD) has increased to epidemic proportions. NAFLD is caused by the deposition of excess fat in the liver (not due to alcohol), and can lead to advanced liver diseases including inflammation, fibrosis/cirrhosis (scarring), and hepatocellular carcinoma (HCC; liver cancer). NAFLD is also associated with metabolic diseases including dyslipidemia, hypertension, cardiovascular disease, and diabetes, though causal relationships have yet to be established. More than 90% of severely obese individuals suffer from advanced NAFLD, which is associated with a shorter lifespan. The disease imposes an annual direct medical cost of about $103 billion in the United States and will soon become the leading indication for liver transplantation in this country. The causes of NAFLD are poorly understood, and there are presently no effective treatments, making NAFLD treatment a large unmet medical need.

NAFLD is heritable, and variants associated with the disease have been identified. However, these variants explain only about 20% of the heritability. What is needed are systems and methods to better analyze the disease to facilitate drug discovery and disease prevention and treatment.

SUMMARY

Provided herein are systems and methods for analysis of biological samples to identify biomarkers associated with nonalcoholic fatty liver disease (NAFLD). For example, provided herein are molecular signatures that find use in characterizing samples to facilitate research, drug discovery, and treatment associated with nonalcoholic fatty liver disease.

In experiments conducted during the development of the invention, the largest genome-wide association meta-analysis to date of imaging- and diagnostic code-measured NAFLD was carried out. We identified a number of genome-wide significant NAFLD associated variants, a significant NAFLD associated gene, and confirmed ten additional, previously published liver function test (LFT) and NAFLD associated variants. These variants, and the genes and pathways they highlight, provide new insights into the pathogenesis of NAFLD, identify subtypes of disease, and create new genetic marker panels that can identify individuals at higher genetic risk of advanced liver disease and that facilitate research, drug discovery, and treatment of patients suffering from NAFLD.

For example, new NAFLD associated variants at TOR1B (Torsin Family 1 Member B), FTO (FTO Alpha-Ketoglutarate Dependent Dioxygenase), COBLL1 (Cordon-Bleu WH2 Repeat Protein Like 1)/GRB14 (Growth Factor Receptor Bound Protein 14), INSR (Insulin Receptor), SREBF1 (Sterol regulatory element-binding transcription factor 1), and PNPLA2 (Patatin Like Phospholipase Domain Containing 2), as well as reproducible NAFLD associated variants at APOE (Apolipoprotein E), MARC1 (Mitochondrial Amidoxime Reducing Component 1), GCKR (Glucokinase Regulator), TM6SF2 (Transmembrane 6 Superfamily Member 2), PNPLA3 (Patatin Like Phospholipase Domain Containing 3), GPAM (Glycerol-3-Phosphate Acyltransferase, Mitochondrial), TRIB1 (Tribbles Pseudokinase 1), MTTP (Microsomal Triglyceride Transfer Protein), ADH1B (Alcohol Dehydrogenase 1B (Class I), Beta Polypeptide), PTPRD (Protein Tyrosine Phosphatase Receptor Type D), and TMC4 (Transmembrane Channel Like 4)/MBOAT7 (Membrane Bound O-Acyltransferase Domain Containing 7), were identified.

Genes implicated by these variants play a role in mitochondrial, very-low-density lipoprotein (VLDL), cholesterol, and de novo lipogenesis processes. PheWAS analyses reveal at least seven subtypes of NAFLD. Genetic predisposition to NAFLD causally predisposes to cirrhosis, and genetic predisposition to higher body mass index and waist circumference causally predisposes to NAFLD. Individuals at the top 10% and 1% of genetic risk have 3- to 6-fold increased risk of NAFLD, cirrhosis, and hepatocellular carcinoma. These genetic variants identify subtypes of disease, improve estimates of disease risk, and guide development of targeted therapeutics as well as identifying subjects for appropriate interventions and preventative strategies.

For example, in some embodiments, compositions, kits, systems, and methods are provided for analyzing the one or more variants. Variants are detected directly or indirectly. In some embodiments, direct methods comprise use of a molecular assay such as a hybridization assay (e.g., using one or more allele-specific primers or probes), a sequencing assay, a microarray, a cleavage assay, or the like. In some embodiments, indirect methods comprise detection of variants in linkage disequilibrium with a variant, detection of altered gene expression relative to wild-type, or the like.

In some embodiments, the methods comprise analyzing a biological sample from a subject for one or more variants. In some embodiments, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, all) of the variants rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358 and mutations in MTTP are detected. In some embodiments, one or more of these variants is detected in combination with one or more other variants. In some such embodiments, the total number of variants detected or analyzed is less than 500, less than 200, less than 100, less than 50, or less than 25. In some embodiments, at least 10 of the listed variants are analyzed. In some embodiments, at least fifteen of the variants listed are analyzed. In some embodiments, at least 20 of the variants listed are analyzed. In some embodiments, only variants from the listed variants are analyzed. In other embodiments, additional variants not listed are analyzed in combination with one or more of the listed variants.
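By way of non-limiting illustration, the panel-size and listed-variant embodiments above can be checked with simple set operations. The following is a minimal sketch only: the function name `summarize_panel`, its parameters, and the two extra marker names are hypothetical and appear nowhere in the disclosure; only the rs identifiers are taken from the list above.

```python
# Listed variants are copied from the text above; everything else is illustrative.
LISTED_VARIANTS = frozenset({
    "rs738408", "rs58542926", "rs429358", "rs1260326", "rs28601761",
    "rs4918722", "rs2807834", "rs7661964", "rs1229984", "rs7029757",
    "rs17817449", "rs79953491", "rs112630404", "rs626283", "rs4561528",
    "rs10756038", "rs140201358",
})


def summarize_panel(panel, max_total=500, min_listed=1):
    """Report how a candidate panel relates to the listed variants."""
    panel = set(panel)
    listed = panel & LISTED_VARIANTS
    return {
        "total": len(panel),                        # variants analyzed in all
        "listed": len(listed),                      # how many are listed variants
        "only_listed": panel <= LISTED_VARIANTS,    # no additional variants
        "within_size_limit": len(panel) < max_total,
        "meets_minimum": len(listed) >= min_listed,
    }


# Hypothetical panel: ten listed variants plus two made-up extra markers.
example_panel = sorted(LISTED_VARIANTS)[:10] + ["rsEXTRA1", "rsEXTRA2"]
summary = summarize_panel(example_panel, max_total=25, min_listed=10)
```

Here `max_total=25` and `min_listed=10` mirror two of the recited thresholds (fewer than 25 total variants, at least 10 listed variants); other recited thresholds can be substituted.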

Any suitable sample may be used that contains nucleic acid amenable to analysis. In some embodiments, the biological sample is selected from the group consisting of blood, serum, plasma, saliva, tissue, hair, semen, and urine. In some embodiments, a biological sample is obtained from a subject suspected of having nonalcoholic fatty liver disease. Such suspicion may arise from any number of factors including, but not limited to, family history, obesity, signs or symptoms of disease, and a positive imaging or diagnostic test suggesting disease.

Also provided herein are methods of managing nonalcoholic fatty liver disease, comprising: analyzing a biological sample from a subject for one or more of the variants from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, or a variant or marker in linkage disequilibrium therewith, and mutations in MTTP; generating a fatty liver disease risk score based on the presence or absence of said variants; and treating the subject with a nonalcoholic fatty liver disease intervention if said risk score indicates a predisposition to or presence of nonalcoholic fatty liver disease. In some embodiments, the risk score is calculated using an algorithm that accounts for each of the analyzed variants. In some embodiments, the risk score further is based on one or more of blood count, liver enzyme test data, liver function test data, hepatitis A test data, hepatitis C test data, celiac disease screening test data, fasting blood sugar, hemoglobin A1C data, and lipid profile data. In some embodiments, the risk score further is based on one or more of: age, gender, and body composition. In some embodiments, the risk score further is based on one or more of abdominal ultrasound data, computerized tomography (CT) scanning data, magnetic resonance imaging (MRI) data, transient elastography data, and magnetic resonance elastography data. In some embodiments, the treating comprises applying a weight loss regime. In some embodiments, the treating comprises liver transplantation. In some embodiments, the treating comprises administration of a pharmaceutical agent. In some embodiments, the pharmaceutical agent is one or more of: an essential phospholipid (e.g., polyenylphosphatidylcholine); an anti-diabetic agent (e.g., insulin, metformin, pioglitazone, glucagon-like peptide-1 (GLP-1) agonists, sodium-glucose cotransporter-2 (SGLT-2) inhibitors, thiazolidinediones (TZD), obeticholic acid, ursodeoxycholic acid, RG-125); a dietary supplement (e.g., vitamin E, silymarin, S-adenosyl-L-methionine (SAMe), glutathione, glycyrrhizic acid); an antifibrotic agent (e.g., RAS blockers such as angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin II receptor blockers (ARBs), pentoxifylline, larsucosterol, galectin-3 inhibitors, cenicriviroc); and an anti-obesity agent (e.g., sibutramine).
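By way of non-limiting illustration, one common form of an algorithm that accounts for each analyzed variant is a weighted sum of risk-allele dosages. The sketch below assumes that form; the disclosure does not specify the algorithm, and the weights and threshold shown are hypothetical placeholders rather than effect sizes or cut-offs from the disclosure.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    score = 0.0
    for variant, weight in weights.items():
        dosage = dosages.get(variant)
        if dosage is None:
            continue  # variant not genotyped; skipped in this sketch
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage for {variant}: {dosage}")
        score += weight * dosage
    return score


# Hypothetical per-allele weights; NOT effect sizes from the disclosure.
EXAMPLE_WEIGHTS = {"rs738408": 0.5, "rs58542926": 0.25, "rs429358": 0.125}
EXAMPLE_THRESHOLD = 1.0  # hypothetical cut-off for flagging elevated risk

subject = {"rs738408": 2, "rs58542926": 1, "rs429358": 0}
score = genetic_risk_score(subject, EXAMPLE_WEIGHTS)
flagged = score >= EXAMPLE_THRESHOLD
```

In practice, clinical, demographic, or imaging inputs such as those recited above could be combined with the genetic term, for example as additional covariates in a regression model; that step is not shown here.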

Further provided herein are systems (e.g., kits, reaction mixtures, etc.) comprising: a set of reagents that specifically detect one or more variants from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, or a variant or marker in linkage disequilibrium therewith, and mutations in MTTP. In some embodiments, the system detects a total of less than 500, less than 200, less than 100, less than 50, or less than 25 variants. In some embodiments, the reagents comprise one or more primers or probes specific for the variants (e.g., primers or probes useful in allele-specific PCR or similar assays). In some embodiments, the reagents comprise nucleic acid sequencing reagents. In some embodiments, the reagents comprise a microarray (e.g., a hybridization based microarray).

Also provided herein is a non-transitory computer-readable storage medium comprising an instruction, wherein, when the instruction is run by at least one computer processor, the at least one processor performs operations comprising one or more or each of the steps: a) receiving data identifying the presence or absence of a variant in a biological sample from at least one of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, or a variant or marker in linkage disequilibrium therewith, and mutations in MTTP; b) generating a nonalcoholic fatty liver disease risk score from the data; and c) displaying or reporting said risk score. The displaying may comprise generating a written or electronic report for use by a physician, a researcher, or a patient, or a report in any other desired format.
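By way of non-limiting illustration, steps a) through c) can be arranged as a small receive, score, and report routine. This is a sketch under stated assumptions: the function names, the JSON report layout, the sample identifier, and the placeholder scorer (a simple count of variants present) are all hypothetical and are not taken from the disclosure.

```python
import json


def build_report(sample_id, variant_calls, scorer):
    """Steps a) to c): receive presence/absence data, score it, report it."""
    # a) receive: keep only explicit True/False calls, dropping missing ones
    calls = {rsid: bool(present) for rsid, present in variant_calls.items()
             if present is not None}
    # b) generate a risk score with whatever algorithm is supplied
    score = scorer(calls)
    # c) report: a JSON string that could be displayed or stored
    return json.dumps(
        {"sample_id": sample_id,
         "variants_called": len(calls),
         "variants_present": sorted(r for r, p in calls.items() if p),
         "risk_score": score},
        sort_keys=True,
    )


def count_present(calls):
    """Placeholder scorer: the number of variants called present."""
    return sum(calls.values())


report = build_report(
    "SAMPLE-001",  # hypothetical identifier
    {"rs738408": True, "rs58542926": False, "rs429358": True, "rs1260326": None},
    count_present,
)
```

Passing the scorer as an argument keeps the reporting step independent of the particular risk algorithm used.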

Further provided herein are methods of diagnosing fatty liver disease or predisposition to fatty liver disease comprising: analyzing a biological sample from a subject for one or more variants from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, or a variant or marker in linkage disequilibrium therewith, and mutations in MTTP.

BRIEF DESCRIPTION OF FIGURES

FIG. 1 shows the characteristics of a subset of GOLDPlus genome-wide significant variants in GOLD ancestry-based cohorts. For each variant the characteristics are shown for the GOLD ancestry-based analysis including: associated gene, NAFLD increasing effect allele (EA), effect allele frequency (EAF), effect/beta and 95% confidence interval, Cochran's Q heterogeneity I² metric and heterogeneity p-value, EA p-value (P), and sample size (N). Results are for meta-analysis of GOLD European ancestry (red), African ancestry (blue), Hispanic ancestry (green), Chinese ancestry (purple), and all ancestries pooled (black).

FIG. 2 shows the effects of NAFLD associated variants on other human diseases and traits. Associations between NAFLD associated variants and diseases are shown as Z-scores in the heatmap. White horizontal bars between the groups in the heatmaps were used to separate each k-means cluster. Red indicates that the NAFLD-increasing allele has increased association with the disease/trait, blue indicates decreased association, and white indicates no significant association. A horizontal bar atop the heatmap corresponds to overall groupings of the disease/traits in the key. Gray boxes on the vertical axis indicate the overall protein localization of the genes in each cluster.

FIGS. 3A-3C show the associations between NAFLD polygenic risk score with NAFLD, cirrhosis, and HCC in an independent cohort. Association between percentile of GOLDPlus NAFLD polygenic risk score on the independent MGI cohort on NAFLD (FIG. 3A), cirrhosis (FIG. 3B), or HCC (FIG. 3C). All results are depicted as odds ratios for NAFLD, cirrhosis, or HCC relative to individuals in the 0-10th percentile of polygenic risk score, adjusted for sex, age, age², and PCs 1-10. Error bars represent 95% confidence intervals.

FIG. 4 shows GOLDPlus NAFLD measures meta-analysis study design.

FIGS. 5A-5Q are LocusZoom plots of index GOLDPlus Significant Variants. The index variant is labeled in purple and, when applicable, an exonic variant in LD with the index variant is labeled in red; 1000G EUR ancestry linkage disequilibrium structure is used. (FIG. 5A) rs738408-PNPLA3, (FIG. 5B) rs58542926-TM6SF2, (FIG. 5C) rs429358-APOE, (FIG. 5D) rs1260326-GCKR, (FIG. 5E) rs28601761-TRIB1, (FIG. 5F) rs4918722-GPAM, (FIG. 5G) rs2807834-MARC1, (FIG. 5H) rs7661964-MTTP, (FIG. 5I) rs7029757-TOR1B, (FIG. 5J) rs1229984-ADH1B, (FIG. 5K) rs17817449-FTO, (FIG. 5L) rs79953491-COBLL1, (FIG. 5M) rs112630404-INSR, (FIG. 5N) rs626283-TMC4/MBOAT7, (FIG. 5O) rs4561528-SREBF1, (FIG. 5P) rs10756038-PTPRD, and (FIG. 5Q) rs140201358-PNPLA2.

FIG. 6 shows European GOLDPlus NAFLD measures meta-analysis schematic.

FIG. 7 shows characteristics of GOLDPlus genome-wide significant variants in GOLD ancestry-based cohorts. For each variant the characteristics are shown for the GOLD ancestry-based analysis including: associated gene, NAFLD increasing effect allele (EA), effect allele frequency (EAF), effect/beta and 95% confidence interval, Cochran's Q heterogeneity I² metric and heterogeneity p-value, EA p-value (P), and sample size (N). Results are for meta-analysis of GOLD European ancestry (red), African ancestry (blue), Hispanic ancestry (green), Chinese ancestry (purple), and all ancestries pooled (black).

FIG. 8 shows characteristics of GOLDPlus genome-wide significant variants in GOLD sex-specific cohorts. For each variant the characteristics are shown for the GOLD sex-specific analysis including: associated gene, NAFLD increasing effect allele (EA), effect allele frequency (EAF), effect/beta and 95% confidence interval, Cochran's Q heterogeneity I² metric and heterogeneity p-value, EA p-value (P), and sample size (N). Results are for meta-analysis of GOLD cohort males (blue), females (red), and pooled sexes (black).
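By way of non-limiting illustration, the heterogeneity metric referred to in the figure descriptions above can be computed from per-cohort effect estimates using the standard definitions of Cochran's Q and the I² statistic. The sketch below uses fixed-effect inverse-variance weights; the effect sizes and standard errors are made-up values for four hypothetical cohorts and are not data from the disclosure.

```python
def cochran_q(effects, std_errors):
    """Cochran's Q for per-study effects with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))


def i_squared(q, n_studies):
    """I^2 = max(0, (Q - df) / Q) * 100, with df = n_studies - 1."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Made-up effects and standard errors for four hypothetical cohorts.
effects = [0.10, 0.30, 0.20, 0.40]
std_errors = [0.1, 0.1, 0.1, 0.1]
q = cochran_q(effects, std_errors)
i2 = i_squared(q, len(effects))
```

With these illustrative inputs Q is approximately 5 on 3 degrees of freedom, giving an I² of approximately 40%.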

FIG. 9 shows DEPICT analysis of biological enrichment of NAFLD associated variants. Physiological system, cell, and tissue enrichment of NAFLD associated genetic variants. Height of the bar represents -log10 p-value. Orange shading represents statistical significance at false discovery rate (FDR) <0.05.

FIG. 10 shows K-means clustering of PheWAS results for NAFLD associated variants. Grid shows variant cluster assignment for K-means clusters of k=4, k=5, k=6, and k=7. Variants assigned to each cluster are shown in the color-coded legends.
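By way of non-limiting illustration, K-means clustering of variants by their association profiles can be sketched with a plain implementation of Lloyd's algorithm. The sketch is a simplification: centroids are seeded with the first k points for determinism, k=2 is used instead of the k=4 to k=7 described above, and the variant names and Z-score profiles are made up.

```python
def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2
                                            for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up Z-score profiles (rows: hypothetical variants, columns: traits).
profiles = {
    "variant_A": (4.0, 3.5, -0.2),
    "variant_B": (-3.0, 0.1, 2.8),
    "variant_C": (3.8, 3.9, 0.1),
    "variant_D": (-2.7, -0.3, 3.1),
}
labels = kmeans(list(profiles.values()), k=2)
clusters = dict(zip(profiles, labels))
```

Variants with similar trait-association patterns (here, A with C and B with D) receive the same cluster label.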

FIGS. 11A-11D show two-sample Mendelian randomization analysis for causal associations between NAFLD and fibrosis/cirrhosis and esophageal varices. Effect size is shown by a red point and 95% confidence interval by a red line for MR EGGER and inverse variance weighted methods for (FIG. 11A) NAFLD exposure (GOLD cohort, N=10 instruments) and K74: fibrosis/cirrhosis outcome (UKBB) and (FIG. 11B) NAFLD exposure (GOLD cohort, N=10 instruments) and I85: esophageal varices outcome (UKBB). The crosshairs on the plots in FIGS. 11C and 11D represent the 95% confidence intervals for each SNP-NAFLD or SNP-outcome association for (FIG. 11C) NAFLD exposure (GOLD cohort, N=10 instruments) and K74: fibrosis/cirrhosis outcome (UKBB) and (FIG. 11D) NAFLD exposure (GOLD cohort, N=10 instruments) and I85: esophageal varices outcome (UKBB).

FIGS. 12A-12D show two-sample Mendelian randomization analysis for causal associations between BMI, waist circumference, and NAFLD. Effect size is shown by a red point and 95% confidence interval by a red line for MR EGGER and inverse variance weighted methods for (FIG. 12A) waist circumference GWAS (UKBB, N=302 instruments (independent SNPs p-value <5E-08)) and GOLD cohort outcome and (FIG. 12B) BMI GWAS (UKBB, N=315 instruments (SNPs p-value <5E-08)) and GOLD cohort outcome. The crosshairs on the plots in FIGS. 12C and 12D represent the 95% confidence intervals for each SNP-NAFLD or SNP-outcome association for (FIG. 12C) waist circumference GWAS (UKBB, N=211 instruments) and GOLD cohort outcome and (FIG. 12D) BMI GWAS (UKBB, N=283 instruments) and GOLD cohort outcome.
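By way of non-limiting illustration, the inverse variance weighted method named in the Mendelian randomization figure descriptions above can be sketched from summary statistics using the standard fixed-effect formula. The summary statistics below are made-up values for three hypothetical instruments; MR-Egger regression, which additionally fits an intercept, is not shown.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate and its SE."""
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Made-up summary statistics for three hypothetical instruments.
beta_x = [0.10, 0.20, 0.30]    # SNP-exposure effects
beta_y = [0.05, 0.10, 0.15]    # SNP-outcome effects
se_y = [0.02, 0.02, 0.02]      # standard errors of the SNP-outcome effects
estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
# Approximate 95% confidence interval under a normal approximation.
low, high = estimate - 1.96 * std_error, estimate + 1.96 * std_error
```

Because every illustrative instrument has the same outcome-to-exposure ratio, the pooled estimate is approximately 0.5.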

FIGS. 13A and 13B show convolutional neural network schematic for UKBB MRI liver imaging (PCC values). Scatter plot of predicted UKBB MRI-PDFF values versus “true” UKBB MRI-PDFF values (as determined by Perspectum Diagnostics). Pearson correlation coefficients are shown for (FIG. 13A) gradient echo image protocol and (FIG. 13B) IDEAL image protocol.

FIG. 14 is a chart showing the effects of NAFLD associated variants in individual GOLDPlus meta-analysis datasets.

FIG. 15 is a table outlining the association of the identified biomarkers for 7 metabolic groups.

FIG. 16 is a schematic showing treatments for various indications of NAFLD.

FIGS. 17A-17F show the genetic and environmental factors associated with progression to cirrhosis in Michigan Genomics Initiative. Models were run as Fine-Gray competing risk analyses. Diabetes status (FIG. 17A), obesity status (FIG. 17B), and alanine aminotransferase (ALT) (FIG. 17C), with upper limit of normal (ULN) defined as 19 U/L in women and 30 U/L in men. PNPLA3-rs738409 genotype (FIG. 17D), TRIB1-rs28601761 genotype (FIG. 17E) and cirrhosis polygenic risk score (FIG. 17F), divided into quartiles (Q), with Q1 indicating the lowest quartile.

FIGS. 18A and 18B show PNPLA3 genotype and diabetes status identify a subgroup of patients with low FIB4 with cirrhosis incidence comparable to that of patients with high FIB4 in the Michigan Genomics Initiative (FIG. 18A) and the UK Biobank (FIG. 18B). Models were run as a Fine-Gray competing risk analysis. Patients were divided into three groups: high FIB4, low FIB4 with diabetes [(+) DM] and PNPLA3-rs738409-GG genotype [(+) PNPLA3], and low FIB4 with diabetes and PNPLA3-rs738409-CC or -CG genotype [(−) PNPLA3]. High FIB4 was defined as >=2.67 while low was defined as <2.67. Hazard ratios (HRs) and p values are shown at the top left of each graph and represent effects of each group after adjustment for age, sex, and principal components 1-10.
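By way of non-limiting illustration, the three-group division described for FIGS. 18A and 18B can be sketched as follows. The 2.67 cut-off and the group definitions come from the description above; the FIB-4 formula is the standard published index and is not recited in the disclosure, and the function names, genotype string encoding, and patient values are hypothetical.

```python
import math

FIB4_CUTOFF = 2.67  # from the text: high FIB4 is >= 2.67


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """Standard FIB-4 index: age * AST / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


def assign_group(fib4_value, has_diabetes, pnpla3_genotype):
    """Map a patient to one of the three groups described for FIGS. 18A-18B."""
    if fib4_value >= FIB4_CUTOFF:
        return "high FIB4"
    if has_diabetes and pnpla3_genotype == "GG":
        return "low FIB4, (+) DM, (+) PNPLA3"
    if has_diabetes and pnpla3_genotype in ("CC", "CG"):
        return "low FIB4, (+) DM, (-) PNPLA3"
    return None  # e.g., low FIB4 without diabetes: outside the three groups


# Hypothetical patient values, for illustration only.
value = fib4(age_years=50, ast_u_per_l=40, alt_u_per_l=25, platelets_1e9_per_l=200)
group = assign_group(value, has_diabetes=True, pnpla3_genotype="GG")
```

The sketch assumes genotypes are already normalized to the strings "CC", "CG", or "GG" for rs738409.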

DETAILED DESCRIPTION

Disclosed herein are a number of loci that include several genes not previously known to be associated with nonalcoholic fatty liver disease (NAFLD). The effect of these variants on NAFLD was congruent across study, ancestry, sex, and alcohol intake. However, some of the associated variants have EAF differences across ancestries which are consistent with differences in population burden of NAFLD. An additional gene, MTTP, was associated with NAFLD via gene-based analysis. Tissue and pathways enrichment analyses of these associations identified liver, lipid, cholesterol, steroid, alcohol, and monocarboxylic acid processes as being enriched. PheWAS analysis resulted in at least seven subtypes/clusters of NAFLD associated variants and implicated genes from these analyses that play a role in mitochondrial, VLDL, cholesterol, and de novo lipogenesis processes. A risk score of the NAFLD-associated genetic variants improved risk predictions when added to age, sex, and clinical factors in identifying people with elevated risk of NAFLD, cirrhosis, and hepatocellular carcinoma (HCC).

Carrying out the analysis across imaging, ICD-based, and NLP-based diagnosis of NAFLD provided substantial advantages over traditional histology- or single-modality-based GWAS. These measures are less expensive, less invasive, and more ethically applicable to asymptomatic individuals in the general population than liver biopsy. The inclusion of non-histology-measured NAFLD increased power and decreased ascertainment bias. Furthermore, by assessing heterogeneous effects of variants across multiple modalities, a variant associated with other types of liver disease, such as glycogen storage disease, that can be misdiagnosed as NAFLD can be identified and removed from the analysis. Also disclosed are machine learning methods to predict MRI-PDFF from abdominal MRI images which can be used to facilitate future studies incorporating imaging analysis for NAFLD and other imaging endpoints.

In addition to identifying novel variants associated with NAFLD, the combined effect of the single variants was assessed using Mendelian randomization (MR), pathway analysis, and a polygenic risk score (PRS). MR analysis suggested that obesity, as measured by high BMI or waist circumference, is causally related to development of NAFLD, but not the reverse. However, MR showed hepatic steatosis is causally related to fibrosis/cirrhosis.

Taken together, the genetic variants can identify individuals at higher risk of having NAFLD, cirrhosis and HCC. In an independent cohort, the risk score identified individuals at high risk of NAFLD, cirrhosis, and HCC in the top 5% of the risk score. The risk score added predictive ability when combined with other clinical risk factors, showing that it finds use to identify high-risk individuals who might benefit from more intense management of NAFLD risk factors.

Section headings as used in this section and the entire disclosure herein are merely for organizational purposes and are not intended to be limiting.

Definitions

The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. The present disclosure also contemplates other embodiments “comprising,” “consisting of,” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.

For the recitation of numeric ranges herein, each intervening number therebetween with the same degree of precision is explicitly contemplated. For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.

Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. The meaning and scope of the terms should be clear; in the event, however, of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.

As used herein, “nucleic acid” or “nucleic acid sequence” refers to a polymer or oligomer of pyrimidine and/or purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively (See Albert L. Lehninger, Principles of Biochemistry, at 793-800 (Worth Pub. 1982)). The present technology contemplates any deoxyribonucleotide, ribonucleotide, or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated, or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogeneous or homogeneous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states. In some embodiments, a nucleic acid or nucleic acid sequence comprises other kinds of nucleic acid structures such as, for instance, a DNA/RNA helix, peptide nucleic acid (PNA), morpholino nucleic acid (see, e.g., Braasch and Corey, Biochemistry, 41 (14): 4503-4510 (2002) and U.S. Pat. No. 5,034,506), locked nucleic acid (LNA; see Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 97:5633-5638 (2000)), cyclohexenyl nucleic acids (see Wang, J. Am. Chem. Soc., 122:8595-8602 (2000)), and/or a ribozyme. Hence, the term “nucleic acid” or “nucleic acid sequence” may also encompass a chain comprising non-natural nucleotides, modified nucleotides, and/or non-nucleotide building blocks that can exhibit the same function as natural nucleotides (e.g., “nucleotide analogs”); further, the term “nucleic acid sequence” as used herein refers to an oligonucleotide, nucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin, which may be single or double-stranded, and represent the sense or antisense strand.
The terms “nucleic acid,” “polynucleotide,” “nucleotide sequence,” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.

The terms “complementary” and “complementarity” refer to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base-pairing or other non-traditional types of pairing. The degree of complementarity between two nucleic acid sequences can be indicated by the percentage of nucleotides in a nucleic acid sequence which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 50%, 60%, 70%, 80%, 90%, and 100% complementary). Two nucleic acid sequences are “perfectly complementary” if all the contiguous nucleotides of a nucleic acid sequence will hydrogen bond with the same number of contiguous nucleotides in a second nucleic acid sequence. Two nucleic acid sequences are “substantially complementary” if the degree of complementarity between the two nucleic acid sequences is at least 60% (e.g., 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100%) over a region of at least 8 nucleotides (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides), or if the two nucleic acid sequences hybridize under at least moderate, preferably high, stringency conditions. Exemplary moderate stringency conditions include overnight incubation at 37° C. in a solution comprising 20% formamide, 5×SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C., or substantially similar conditions, e.g., the moderately stringent conditions described in Sambrook et al., infra.
High stringency conditions are conditions that use, for example (1) low ionic strength and high temperature for washing, such as 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50° C., (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin (BSA)/0.1% Ficoll/0.1% polyvinylpyrrolidone (PVP)/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride and 75 mM sodium citrate at 42° C., or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5×Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at (i) 42° C. in 0.2×SSC, (ii) 55° C. in 50% formamide, and (iii) 55° C. in 0.1×SSC (preferably in combination with EDTA). Additional details and an explanation of stringency of hybridization reactions are provided in, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (2001); and Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and John Wiley & Sons, New York (1994).
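By way of non-limiting illustration, the percentage-based measure of complementarity defined above can be sketched for two DNA sequences. The sketch is deliberately simplified: it considers only Watson-Crick pairs, compares equal-length sequences over their full length without gaps, and does not model hybridization under any stringency condition. The function names and example sequences are hypothetical.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(strand_a, strand_b):
    """Percent of positions that can Watson-Crick pair.

    Both strands are written 5'->3' and aligned antiparallel, so strand_b
    is read in reverse.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch compares equal-length sequences only")
    paired = sum(
        WATSON_CRICK.get(a) == b
        for a, b in zip(strand_a.upper(), reversed(strand_b.upper()))
    )
    return 100.0 * paired / len(strand_a)


def is_substantially_complementary(strand_a, strand_b,
                                   min_percent=60.0, min_length=8):
    """Apply the at-least-60%-over-at-least-8-nucleotides criterion."""
    return (len(strand_a) >= min_length
            and percent_complementarity(strand_a, strand_b) >= min_percent)


# ATGCATGCAT is its own reverse complement, so it pairs perfectly with itself.
perfect = percent_complementarity("ATGCATGCAT", "ATGCATGCAT")
# Changing the final base of the second strand breaks one of ten pairs.
partial = percent_complementarity("ATGCATGCAT", "ATGCATGCAA")
```

The defaults of 60% and 8 nucleotides mirror the lower bounds recited in the definition of “substantially complementary” above.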

As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are influenced by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, and the Tm of the formed hybrid. Hybridization methods involve the annealing of one nucleic acid to another, complementary nucleic acid, e.g., a nucleic acid having a complementary nucleotide sequence. The ability of two polymers of nucleic acid containing complementary sequences to find each other and “anneal” or “hybridize” through base pairing interaction is a well-recognized phenomenon. The initial observations of the “hybridization” process by Marmur and Lane, Proc. Natl. Acad. Sci. USA, 46:453 (1960) and Doty et al., Proc. Natl. Acad. Sci. USA, 46:461 (1960), have been followed by the refinement of this process into an essential tool of modern biology. For example, hybridization and washing conditions are now well known and exemplified in Sambrook et al., supra. The conditions of temperature and ionic strength determine the “stringency” of the hybridization.

“Hybridization probes” are nucleic acids capable of binding in a base-specific manner to a complementary strand of nucleic acid. Such probes include nucleic acids and peptide nucleic acids. Hybridization is usually performed under stringent conditions which are

The term “primer” refers to a single-stranded oligonucleotide capable of acting as a point of initiation of template-directed DNA synthesis under appropriate conditions, in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer, but typically ranges from 15 to 30 nucleotides. A primer sequence need not be exactly complementary to a template, but must be sufficiently complementary to hybridize with a template. The term “primer site” refers to the area of the target DNA to which a primer hybridizes. The term “primer pair” means a set of primers including a 5′ upstream primer, which hybridizes to the 5′ end of the DNA sequence to be amplified and a 3′ downstream primer, which hybridizes to the complement of the 3′ end of the sequence to be amplified.
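The primer pair definition above can be sketched in code. The following is a minimal illustration with hypothetical helper names (`reverse_complement`, `naive_primer_pair`); it simply takes the two ends of the region to be amplified, enforces the typical 15 to 30 nucleotide length, and does not model melting temperature, secondary structure, or specificity, which practical primer design would also consider.

```python
# Complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def naive_primer_pair(target: str, length: int = 18) -> tuple[str, str]:
    """Pick an upstream and a downstream primer from the ends of `target`.

    `target` is one strand of the region to amplify, written 5' to 3'.
    The upstream primer copies the 5' end of that strand; the downstream
    primer is the reverse complement of its 3' end, so it anneals to the
    strand as written.
    """
    if not 15 <= length <= 30:
        raise ValueError("primer length is typically 15 to 30 nucleotides")
    if len(target) < 2 * length:
        raise ValueError("target too short for two non-overlapping primers")
    target = target.upper()
    upstream = target[:length]
    downstream = reverse_complement(target[-length:])
    return upstream, downstream
```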

The nucleic acids, including any primers, probes and/or oligonucleotides can be synthesized using a variety of techniques currently available, such as by chemical or biochemical synthesis, and by in vitro or in vivo expression from recombinant nucleic acid molecules, e.g., bacterial or retroviral vectors. For example, DNA can be synthesized using conventional nucleotide phosphoramidite chemistry or other methodologies well known in the art. In addition, the nucleic acids can comprise uncommon and/or modified nucleotide residues or non-nucleotide residues, such as those known in the art.

The terms “polymorphism” and “variant” refer to the occurrence of two or more genetically determined alternative sequences or alleles in a population. Each divergent sequence is termed an allele, and can be part of a gene or located within an intergenic or non-genic sequence. A diallelic polymorphism has two alleles, and a triallelic polymorphism has three alleles. Diploid organisms can contain two alleles and may be homozygous or heterozygous for allelic forms. The first identified allelic form is arbitrarily designated the reference form or allele; other allelic forms are designated as alternative or variant alleles. The most frequently occurring allelic form in a selected population is typically referred to as the wild-type form.

As used herein, “treat,” “treating,” and the like mean a slowing, stopping, or reversing of progression of a disease or disorder. As such, “treating” means an application or administration of methods to a subject, where the subject has a disease or a symptom of a disease, where the purpose is to cure, heal, alleviate, relieve, alter, remedy, ameliorate, improve, or affect the disease or symptoms of the disease.

Preferred methods and materials are described below, although methods and materials similar or equivalent to those described herein can be used in practice or testing of the present disclosure. All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. The materials, methods, and examples disclosed herein are illustrative only and not intended to be limiting.

Analyzing Polymorphisms

Provided herein are methods comprising analyzing a biological sample from a subject for one or more of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, and mutations in MTTP.

The analysis described herein identified several genome-wide significant variants associated with hepatic steatosis and NAFLD, including rs738408-PNPLA3, rs58542926-TM6SF2, rs429358-APOE, rs1260326-GCKR, rs28601761-TRIB1, rs4918722-GPAM, rs2807834-MARC1, rs7661964-MTTP, rs7029757-TOR1B, rs1229984-ADH1B, rs17817449-FTO, rs79953491-COBLL1, rs112630404-INSR, rs626283-TMC4/MBOAT7, rs4561528-SREBF1, rs10756038-PTPRD, and rs140201358-PNPLA2.

The analyzed polymorphisms may be selected to include at least one polymorphism from each of the seven distinct clusters. In some embodiments, polymorphisms may contain at least one polymorphism from each of the significant variants and extended variants as shown in Table 1. In some embodiments, the polymorphisms may comprise at least two or all of the significant variants as shown in Table 1.

Presently, the polygenic risk score (PRS) is a composite of multiple SNPs, each weighted by its effect size (beta) in the GOLD consortium as below, with allele 1 being the effect allele and the beta being the weight. For each SNP, the beta is multiplied by the number of effect alleles carried by the individual, and the products are summed to obtain the PRS for that individual.
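The weighted sum described above can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and the caller supplies the weights, which in practice would be the consortium betas. Any numeric values used with it in a demonstration are placeholders rather than reported effect sizes.

```python
def polygenic_risk_score(effect_allele_counts: dict[str, int],
                         betas: dict[str, float]) -> float:
    """Sum beta * (number of effect alleles) over all SNPs in `betas`.

    `effect_allele_counts` maps rsID -> 0, 1, or 2 copies of allele 1
    (the effect allele) for one individual. A missing genotype raises
    KeyError instead of being silently imputed (an assumption of this
    sketch, not a requirement stated in the text).
    """
    score = 0.0
    for rsid, beta in betas.items():
        count = effect_allele_counts[rsid]
        if count not in (0, 1, 2):
            raise ValueError(f"{rsid}: allele count must be 0, 1, or 2")
        score += beta * count
    return score
```

For instance, with placeholder weights of 0.2 for rs738408 and 0.3 for rs58542926, an individual carrying two and one effect alleles, respectively, would score 0.2 × 2 + 0.3 × 1 = 0.7.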

The gene-based analyses identified multiple variants in MTTP that promote NAFLD. MTTP is a well-known gene whose protein product transfers phospholipids and triacylglycerols to nascent apoB for the assembly of lipoproteins. The absence of MTTP is known to cause the Mendelian disease abetalipoproteinemia, which causes malabsorption in the digestive tract, resulting in fatty liver and other health issues. The mutations in MTTP may include, but are not limited to, G661S, Q244E, E98D, and N166S.

The present invention provides a method for diagnosing fatty liver disease or predisposition to fatty liver disease or related diseases or conditions. The presence of such polymorphisms or mutations can be regarded as indicative of an individual's risk (increased or decreased) for the disease, especially in individuals who lack other predisposing or protective polymorphisms for the same disease. Even in cases where the predictive contribution of a given polymorphism is relatively minor by itself, overall assessment of the polymorphisms allows diagnosis with a much higher degree of certainty and reliability.

The present invention further provides a method of managing nonalcoholic fatty liver disease. Nonalcoholic fatty liver disease (NAFLD) is an umbrella term for a range of liver conditions affecting people who drink little to no alcohol. Some individuals with NAFLD can develop nonalcoholic steatohepatitis (NASH), an aggressive form of fatty liver disease, which is marked by liver inflammation and may progress to advanced scarring (cirrhosis), liver failure, or some forms of liver cancer. This damage is similar to the damage caused by heavy alcohol use. The methods disclosed herein may comprise managing the progression of nonalcoholic fatty liver disease to prevent a more aggressive form of liver disease. By extension, the methods disclosed herein may further act as an indication or prognosis of the risk of liver inflammation, liver scarring (cirrhosis), liver failure, or some forms of liver cancer.

The risk score may be calculated using an algorithm that accounts for one or more or each of the analyzed polymorphisms. The risk score may be calculated using non-weighted or weighted sums of risk polymorphisms using effect sizes from genome-wide association studies as their weights or effects of the particular polymorphism on the score. For example, those polymorphisms with inherently higher risk are weighted differently than those polymorphisms with lower individual risk.

The risk score may be based on other factors outside of the genetic polymorphisms described herein. Other factors may include the general health of the subject, previously identified disease in close family members, or other related identified disease or disorders. For example, risk factors may include high cholesterol, high levels of triglycerides in the blood, obesity, polycystic ovary syndrome, sleep apnea, diabetes, hypothyroidism, hypopituitarism, age, and concentration or abundance of abdominal body fat.

In some embodiments, the risk score further is based on one or more of blood count, liver enzyme test data, liver function test data, hepatitis A test data, hepatitis C test data, celiac disease screening test data, fasting blood sugar, hemoglobin A1C data, and lipid profile data. In some embodiments, the risk score further is based on one or more of abdominal ultrasound data, computerized tomography (CT) scanning data, magnetic resonance imaging (MRI) data, transient elastography data, and magnetic resonance elastography data.

The risk score may be a measure of an individual's risk of nonalcoholic fatty liver disease or related diseases in comparison to an average individual of a population or subset of a population. For example, the score may be in comparison to any other individual or an individual with a similar ethnic background, age, sex, or prior health condition.

The risk score may be used to align a subject's level of disease with appropriate treatments. For example, subjects with a specific disease phenotype may be linked to specific treatments for that subtype which result in the best management of the disease or lack unwanted side effects or long-term complications.

The risk score may be output or displayed in any number of formats, including reports with bins, a color or grayscale gradient, a thermometer, a gauge, a histogram, or a bar graph. The risk score may provide a numerical output which is associated with low, medium, or high risk of NAFLD. Alternatively, or in addition, the risk score may be output as a rank score in a population, such as a percentile of risk within a certain population. The risk score may be output with any proposed treatment recommendations or follow-up procedures to further assess risk. The risk score may be used to classify an individual into disease subtypes based on the at least seven subtypes/clusters of NAFLD associated variants and implicated genes from the analysis disclosed herein.
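As one way to picture the percentile and low/medium/high outputs described above, the following sketch ranks a score against a reference population and assigns a bin. The function names and the cut-points (50th and 90th percentile) are hypothetical choices made for illustration; the disclosure does not specify thresholds.

```python
from bisect import bisect_right


def percentile_rank(score: float, reference_scores: list[float]) -> float:
    """Percent of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    if not ordered:
        raise ValueError("reference population is empty")
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def risk_bin(percentile: float,
             low_cut: float = 50.0,
             high_cut: float = 90.0) -> str:
    """Map a percentile to a coarse bin; the cut-points are placeholders."""
    if percentile < low_cut:
        return "low"
    if percentile < high_cut:
        return "medium"
    return "high"
```

The reference population passed to `percentile_rank` could be restricted to individuals of similar ancestry, age, or sex, so that the rank is relative to a chosen subset of the population.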

The risk score may further indicate the need or the type of treatment for an individual suspected to have or at risk of developing nonalcoholic fatty liver disease. Treatments for nonalcoholic fatty liver disease include those known in the art to reduce risk and include lifestyle changes, surgery, or medicament regimes. In some embodiments, the treatments include adoption of a healthy diet and exercise program, optionally as part of a weight loss regime, control of blood sugar, cholesterol lowering medications, and abstaining from alcoholic drinks. In some embodiments, treating includes liver transplantation. In some embodiments, treating comprises administration of one or more active agents. In some embodiments, the active agent is selected from: an essential phospholipid (e.g., polyenylphosphatidylcholine); an anti-diabetic agent (e.g., insulin, metformin, pioglitazone, glucagon-like peptide-1 (GLP-1) agonists, sodium-glucose cotransporter-2 (SGLT-2) inhibitors, thiazolidinediones (TZD), obeticholic acid, ursodeoxycholic acid, RG-125); a dietary supplement (e.g., vitamin E, silymarin, S-adenosyl-L-methionine (SAMe), glutathione, glycyrrhizic acid); an antifibrotic agent (e.g., RAS blockers such as angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin II receptor blockers (ARBs), pentoxifylline, larsucosterol, galectin-3 inhibitors, cenicriviroc); an anti-obesity agent (e.g., sibutramine); or any combination thereof.

In some embodiments, the treating includes PNPLA3 siRNA, vitamin E administration, diet control, and Thyroid B agonists, for example when the patient is suspected to have or is at risk of low lipoprotein output. In some embodiments, the treating includes inhibitors of acetyl-CoA carboxylase (ACC), acyl-coenzyme A:diacylglycerol acyltransferase (DGAT), or fatty acid synthase (FASN), or inhibitors of SCD1 (e.g., synthetic fatty-acid/bile-acid conjugate (FABAC), e.g., Aramchol), for example when the patient is suspected to have or is at risk of diversion of TG and phospholipids to lipid droplets or excess glucose conversion to fatty acids. In some embodiments, the treating includes ISIS-ANGPTL3, an antisense inhibitor to angiopoietin-like 3, vitamin E administration, diet control, and Thyroid B agonists, for example when the patient is suspected to have or is at risk of high or normal lipoprotein output. In some embodiments, the treating includes agonists of SGLT2-I (Sodium/glucose cotransporter-2), FGF21 (Fibroblast growth factor 21), glucagon-like peptide 1 (GLP1), anti-CB1/PPAR agonists (e.g., cannabinoid CB1 receptor antagonists and/or peroxisome proliferator-activated receptor agonists), or inhibitors of microsomal triglyceride transfer protein (MTP or MTTP) (e.g., lomitapide), for example when the patient is suspected to have or is at risk of diabetes, insulin resistance, increases in fatty acids, or de novo lipogenesis (DNL). See, for example, FIG. 16.

In some embodiments, the treatments include modulating transcription, and thereby expression, of one or more target genes. For example, the treatments may include activation or repression of transcription of one or more target genes as listed in Table 7. In some embodiments, the treatments include knocking out one or more target genes. For example, the treatments may include knocking out one or more target genes as listed in Table 7.

In some embodiments, transcription of the target gene is modulated by administering a clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR associated (Cas) protein system for use in CRISPR interference (CRISPRi) or CRISPR activation (CRISPRa) (see, e.g., Konermann et al. Nature. 2014 Dec. 10. doi: 10.1038/nature14136; Qi, L. S., et al. (2013). Cell. 152 (5): 1173-83; Gilbert, L. A., et al., (2013). Cell. 154 (2): 442-51; and Maeder et al. Nat Methods 10 (10): 977-979 (2013)).

Binding of a Cas protein to specific DNA sequences through a guide RNA can naturally result in a transcription block, a process termed CRISPR interference (CRISPRi). For use in mammalian cells, CRISPRi is even more effective when transcriptional repressor domains are tethered to the Cas protein. Transcriptional repressors may inhibit transcription via: recruitment of other transcription factor proteins; modification of target DNA such as methylation; recruitment of a DNA modifier; modulation of histones associated with target DNA; recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones; or a combination thereof. Examples of transcriptional repressors include the Krüppel associated box (KRAB or SKD); KOX1 repression domain; the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD); histone lysine methyltransferases such as Pr-SET7/8, SUV4-20H1, RIZ1, and the like; histone lysine demethylases such as JMJD2A/JHDM3A, JMJD2B, JMJD2C/GASC1, JMJD2D, JARID1A/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY; histone lysine deacetylases such as HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11; DNA methylases such as HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), MET1, ZMET2, CMT1; periphery recruitment elements such as Lamin A and Lamin B; and functional domains thereof.

CRISPR/Cas systems can also be used to activate gene expression, in an approach termed CRISPR activation (CRISPRa). CRISPRa constructs generally utilize a Cas protein to recruit more than one transcription activation domain with a single gRNA. The activation domains may promote transcription via: recruitment of other transcription factor proteins; modification of target DNA such as demethylation; recruitment of a DNA modifier; modulation of histones associated with target DNA; recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones; or a combination thereof. Examples of activation domains include VP16; VP64; VP48; VP160; p65 subdomain (e.g., from NFkB); an activation domain of EDLL; TAL activation domain; histone lysine methyltransferases such as SET1A, SET1B, MLL1 to 5, ASH1, SMYD2, NSD1; histone lysine demethylases such as JHDM2a/b, UTX, JMJD3; histone acetyltransferases such as GCN5, PCAF, CBP, p300, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, SRC1, ACTR, P160, CLOCK; DNA demethylases such as Ten-Eleven Translocation (TET) dioxygenase 1 (TET1CD), TET1, DME, DML1, DML2, and ROS1; and functional domains thereof.

The Cas protein can recruit repressor or activation domains using direct fusions or protein linkers (e.g., SunTag). Alternatively, activation domains can be recruited using nucleic acid approaches, in which a guide RNA having binding motifs (e.g., MS2) recruits effector domains fused to RNA-motif binding proteins.

Any Cas protein that employs gRNA specific binding to bind to a specific target sequence can be utilized with the systems for CRISPRa and CRISPRi. Usually, a nuclease deficient version of a Cas protein is utilized, for example dCas9, a nuclease-dead Cas9 protein, but other Cas proteins can also be utilized in the methods herein, such as Cas3 and Cas12a.

In some embodiments, transcription of the target gene is knocked out by administering a CRISPR/nuclease protein system, e.g., CRISPR/Cas9, referred to as CRISPR-KO. An insertion or deletion induced by a single guide RNA (gRNA) is often used to generate knock-out cells. For example, a guide RNA targets Cas9 to a target gene, where it creates a double-stranded break (DSB). Cells can survive a DSB when an error-prone repair mechanism like nonhomologous end joining (NHEJ) results in insertion or deletion of one or more base pairs, precluding further binding of the gRNA. Such repairs can result in frameshift mutations and thereby disrupt gene function, oftentimes resulting in functional knockouts.
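The connection between an NHEJ-induced insertion or deletion and a frameshift follows from the triplet reading frame: a net change in length that is not a multiple of three shifts every downstream codon. The sketch below checks only that arithmetic. The function name is hypothetical, and it assumes both inputs are coding sequences covering the same region before and after repair.

```python
def indel_shifts_frame(reference_cds: str, edited_cds: str) -> bool:
    """Return True when the net length change is not a multiple of three.

    Assumption: `reference_cds` and `edited_cds` are the same coding
    region before and after repair, so any length difference reflects
    the inserted or deleted base pairs. In-frame changes (multiples of
    three, or substitutions with no length change) return False, even
    though they may still impair the encoded protein.
    """
    net_change = len(edited_cds) - len(reference_cds)
    return net_change % 3 != 0
```

A single-base deletion or a two-base insertion is reported as a frameshift, whereas a three-base deletion is not.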

The CRISPR/Cas systems comprise a guide RNA specific to a target gene to be modulated. The target gene may be any of those listed in Table 7, and the CRISPR/Cas system may comprise any of those gRNAs for CRISPRa, CRISPRi, and CRISPR-KO as indicated in Table 7.

The CRISPR/Cas systems, including Cas proteins and gRNAs, or polynucleotides encoding thereof, may be delivered by any suitable means. Methods of delivering polypeptides and polynucleotides to cells are well known in the art and may include DNA or RNA electroporation, transfection reagents such as liposomes or nanoparticles to deliver DNA or RNA; delivery of DNA, RNA, or protein by mechanical deformation (see, e.g., Sharei et al. Proc. Natl. Acad. Sci. USA (2013) 110 (6): 2082-2087, incorporated herein by reference); or viral transduction. Nucleic acids can be delivered as part of a larger construct, such as a plasmid or viral vector, or directly, e.g., by electroporation, lipid vesicles, viral transporters, microinjection, and biolistics (high-speed particle bombardment). Similarly, polynucleotides can be delivered by any method appropriate for introducing nucleic acids into a cell. In some embodiments, the polynucleotide is a DNA molecule. In some embodiments, the CRISPR/Cas system is provided in a DNA vector. In some embodiments, the CRISPR/Cas system is provided as an RNA molecule.

Additionally, delivery vehicles such as nanoparticle- and lipid-based polynucleotide or protein delivery systems can be used. Further examples of delivery vehicles and methods include lentiviral vectors, ribonucleoprotein (RNP) complexes, lipid-based delivery systems, gene gun, hydrodynamic delivery, electroporation or nucleofection, microinjection, and biolistics. Various gene delivery methods are discussed in detail by Nayerossadat et al. (Adv Biomed Res. 2012; 1:27) and Ibraheem et al. (Int J Pharm. 2014 Jan. 1;459 (1-2): 70-83), incorporated herein by reference.

The risk score may also be used for selection (e.g., inclusion or exclusion) for a clinical trial. For example, subjects with a specific risk score may be included in a clinical trial to specifically study those individuals at an increased risk for nonalcoholic fatty liver disease, e.g., a genetic enrichment trial. Alternatively, subjects with a specific risk score may be excluded from a clinical trial to avoid potential interference with clinical trial analysis.

In some embodiments, the presence of such polymorphisms or mutations can be regarded as indicative of an individual's risk (increased or decreased) for other diseases and conditions. As shown in FIG. 2, many of the polymorphisms or mutations had effects on metabolic and anthropometric traits such as lipid concentrations, cardiovascular disease, body mass index, waist/hip circumference, and liver enzyme levels.

In some embodiments, select polymorphisms or mutations are associated with higher low-density lipoprotein (LDL) and triglycerides (TG), increased risk of cardiovascular disease, lower high-density lipoprotein (HDL), and lower body mass index (BMI) and waist/hip circumference. In some embodiments, select polymorphisms or mutations are associated with higher LDL and TG and higher HDL. In some embodiments, select polymorphisms or mutations are associated with lower LDL, strongly increased risk of liver fibrosis/cirrhosis, and lower or no difference in alkaline phosphatase. In some embodiments, select polymorphisms or mutations are associated with decreased LDL and TG.

In some embodiments, rs28601761 and rs1260326 may be indicative of a decreased level of risk for cholelithiasis and/or cholecystitis. In some embodiments, rs1260326 may be associated with lower insulin-like growth factor 1 (IGF1) and sex hormone binding globulin (SHBG) levels. In some embodiments, rs429358 may be indicative of a decreased level of risk for familial Alzheimer's disease and LDL cholesterol.

The biological sample for analysis in the disclosed methods may be obtained from any suitable biological source, such as, a swab or brush, a physiological fluid including, but not limited to, whole blood, serum, plasma, interstitial fluid, saliva, ocular lens fluid, cerebral spinal fluid, sweat, urine, milk, ascites fluid, mucous, synovial fluid, peritoneal fluid, vaginal fluid, menses, amniotic fluid, semen, feces, and the like, or a tissue or cell sample including, but not limited to, hair, skin, blood, biopsies of the kidney, or liver or other organs or tissues, or sources such as saliva, cheek scrapings, urine, amniotic fluid or CVS samples. In some embodiments, the biological sample is selected from the group consisting of blood, serum, plasma, saliva, tissue, hair, semen, and urine.

The sample can be obtained from a subject using routine techniques known to those skilled in the art, and the sample may be used directly as obtained from the biological source or following a pretreatment to modify the character of the sample. Such pretreatment may include, for example, preparing plasma from blood, diluting viscous fluids, filtration, precipitation, dilution, distillation, mixing, concentration, inactivation of interfering components, the addition of reagents, lysing, and the like.

A “subject” or “patient” may be human or non-human and may include, for example, animal strains or species used as “model systems” for research purposes, such as a mouse model as described herein. Likewise, patient may include either adults or juveniles (e.g., children). Moreover, patient may mean any living organism, preferably a mammal (e.g., human or non-human). Examples of mammals include, but are not limited to, any member of the Mammalian class: humans, non-human primates such as chimpanzees, and other apes and monkey species; farm animals such as cattle, horses, sheep, goats, swine; domestic animals such as rabbits, dogs, and cats; laboratory animals including rodents, such as rats, mice and guinea pigs, and the like. Examples of non-mammals include, but are not limited to, birds, fish, and the like. In one embodiment of the methods and compositions provided herein, the mammal is a human. In some embodiments, the subject is suspected of having nonalcoholic fatty liver disease.

A polymorphism as described herein may be detected directly or indirectly. Direct detection methods may include inspecting a data set indicative of genetic characteristics derived from analysis of the individual's genome. A data set of genetic characteristics of the individual may include, for example, a listing of single nucleotide polymorphisms in the individual's genome or a complete or partial sequence of the individual's genomic DNA. Inspection of the data set including all or part of the individual's genome may optimally be performed by computer inspection. Screening may further comprise the step of producing a report identifying the individual and the identity of alleles at the site of at least one or more polymorphisms. Alternatively, the methods include obtaining and analyzing a nucleic acid sample (e.g., DNA or RNA) from an individual to determine whether the DNA contains informative polymorphisms, such as by combining a nucleic acid sample from the subject with one or more polynucleotide probes capable of hybridizing selectively to a nucleic acid carrying the polymorphism or sequencing the region of the DNA containing the polymorphisms. One skilled in the art will recognize that any one of the commonly available hybridization, amplification and array assay formats can readily be adapted to detect the polymorphisms disclosed herein.
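The computer inspection and reporting step described above can be sketched as a lookup over the polymorphisms listed in this disclosure. The function name, the input layout (a mapping from rsID to the pair of called alleles), and the report format are assumptions made for illustration; real genotype data would typically be read from a file format such as VCF.

```python
# rsIDs taken from the list of polymorphisms recited in this disclosure.
TARGET_RSIDS = (
    "rs738408", "rs58542926", "rs429358", "rs1260326", "rs28601761",
    "rs4918722", "rs2807834", "rs7661964", "rs1229984", "rs7029757",
    "rs17817449", "rs79953491", "rs112630404", "rs626283", "rs4561528",
    "rs10756038", "rs140201358",
)


def genotype_report(individual_id: str,
                    genotypes: dict[str, tuple[str, str]]) -> str:
    """Build a plain-text report of the alleles at each target site.

    `genotypes` maps rsID -> (allele, allele); sites absent from the
    data set are reported as "not called" rather than guessed.
    """
    lines = [f"Individual: {individual_id}"]
    for rsid in TARGET_RSIDS:
        alleles = genotypes.get(rsid)
        call = "/".join(alleles) if alleles else "not called"
        lines.append(f"{rsid}: {call}")
    return "\n".join(lines)
```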

In some embodiments, the polymorphisms are detected by a sequencing assay. The sequencing assay may be conducted by any means known in the art, such as the dideoxy chain termination method. In some embodiments, the sequencing assay is performed using high-throughput sequencing methods. Following sequencing, the data may be aligned or otherwise analyzed for the presence of the polymorphisms. Methods of alignment of sequences for comparison purposes are well known in the art.

In some embodiments, the polymorphisms may be detected by an amplification-based assay in which a polymorphism-specific primer hybridizes to a region on a target nucleic acid molecule that overlaps the polymorphism and only primes amplification of that form to which the primer exhibits perfect complementarity. This primer is used in conjunction with a second primer that hybridizes at a distal site. Amplification proceeds from the two primers, producing a detectable product that indicates the polymorphism is present in the test sample. A control is usually performed with a second pair of primers, one of which shows one or more mismatches at the polymorphic site and the other of which exhibits perfect complementarity to a distal site. The mismatches prevent amplification or substantially reduce amplification efficiency, so that either no detectable product is formed or it is formed in lower amounts or at a slower pace. Amplification assays are well-known in the art including polymerase chain reaction, ligase chain reactions, strand displacement assays, and the like.
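The logic of the amplification-based assay above can be illustrated with an idealized, all-or-nothing model in which a primer primes only the allelic form it matches perfectly. The function names are hypothetical. Each allele's site is written in the same 5' to 3' sense as the primer, so sequence identity stands in for perfect complementarity to the opposite strand. Real reactions show reduced rather than strictly absent amplification from mismatched templates, as the text notes.

```python
def count_mismatches(primer: str, site: str) -> int:
    """Count positions at which the primer differs from the site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(1 for p, s in zip(primer.upper(), site.upper()) if p != s)


def alleles_primed(primer: str, allele_sites: dict[str, str]) -> list[str]:
    """Return the alleles whose site the primer matches with no mismatch.

    `allele_sites` maps an allele label to the primer-binding region of
    that allelic form, including the polymorphic position.
    """
    return [allele for allele, site in allele_sites.items()
            if count_mismatches(primer, site) == 0]
```

In this model, a product from the polymorphism-specific primer and none from the mismatched control primer would indicate that the corresponding allele is present in the sample.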

In a hybridization-based assay, probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective DNA segments. Hybridization conditions should be sufficiently stringent that there is a significant detectable difference in hybridization intensity, and preferably an essentially binary response, whereby a probe hybridizes to only one of the loci or significantly more strongly to one locus. A probe may be designed to hybridize to a target sequence that contains a polymorphism anywhere along the sequence of the probe. However, the probe is preferably designed to hybridize to a segment of the target sequence such that the polymorphism aligns with a central position of the probe (e.g., a position within the probe that is at least three nucleotides from either end of the probe). This design of probe generally achieves good discrimination in hybridization between different allelic forms.
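The preferred probe geometry described above, with the polymorphism at a central position at least three nucleotides from either end, can be sketched as follows. The function names and the default probe length of 25 nucleotides are assumptions for illustration, and the returned window is the target-strand segment that the probe would be designed against.

```python
def centered_probe_window(sequence: str, snp_index: int,
                          probe_length: int = 25) -> str:
    """Return the window of `sequence` that places the SNP mid-probe.

    `snp_index` is the 0-based position of the polymorphism in
    `sequence`; the default length of 25 is a placeholder.
    """
    start = snp_index - probe_length // 2
    end = start + probe_length
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the SNP")
    return sequence[start:end]


def snp_is_central(probe_length: int, snp_offset: int,
                   min_flank: int = 3) -> bool:
    """Check that the SNP lies at least `min_flank` bases from each end."""
    left_flank = snp_offset
    right_flank = probe_length - 1 - snp_offset
    return left_flank >= min_flank and right_flank >= min_flank
```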

Indirect detection refers to determining the presence or absence of a specific polymorphism identified in the genetic profile by detecting a surrogate or proxy polymorphism that is in linkage disequilibrium with the SNP in the individual's genetic profile. Detection of a proxy polymorphism is indicative of a polymorphism of interest and is increasingly informative to the extent that the polymorphisms are in linkage disequilibrium, e.g., at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, or about 100% LD. Another indirect method involves detecting allelic variants of proteins accessible in a sample from an individual that are consequent of a risk-associated or protection-associated allele in DNA that alters a codon.
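The degree of linkage disequilibrium between a proxy polymorphism and a polymorphism of interest can be quantified in several ways; the text does not say which measure its percentages refer to. The sketch below computes r², one commonly used measure, from phased haplotype counts for two biallelic sites. The function name and the argument layout are assumptions for illustration.

```python
def ld_r_squared(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Compute r^2 between two biallelic sites from haplotype counts.

    n_AB counts haplotypes carrying allele A at site 1 and B at site 2,
    n_Ab those carrying A and b, and so on.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at site 2
    p_AB = n_AB / total           # frequency of the A-B haplotype
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    d = p_AB - p_A * p_B          # disequilibrium coefficient D
    return d * d / denom
```

A value near 1 indicates that the proxy allele almost always travels with the allele of interest, which is the situation in which detecting the proxy is most informative.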

Based on the polymorphisms and associated sequence information disclosed herein, detection reagents can be developed and used to assay any polymorphism of the present invention individually or in combination, and such detection reagents can be readily incorporated into a kit or system. The terms “kits” and “systems,” as used herein in the context of polymorphism detection reagents, are intended to refer to such things as combinations of multiple polymorphism detection reagents, or one or more polymorphism detection reagents in combination with one or more other types of elements or components (e.g., other types of biochemical reagents, containers, packages, substrates, electronic hardware components, etc.). Accordingly, the present invention further provides polymorphism detection kits and systems, including but not limited to, packaged probe and primer, arrays/microarrays of nucleic acid molecules, and beads that contain one or more probes, primers, or other detection reagents for detecting one or more polymorphisms of the present invention. The kits/systems can optionally include various electronic hardware components; for example, arrays (“DNA chips”) and microfluidic systems (“lab-on-a-chip” systems) provided by various manufacturers typically comprise hardware components.

In some embodiments, a polymorphism detection kit typically contains one or more detection reagents and other components (e.g., a buffer, enzymes such as DNA polymerases or ligases, chain extension nucleotides such as deoxynucleotide triphosphates, and in the case of Sanger-type DNA sequencing reactions, chain terminating nucleotides, positive control sequences, negative control sequences, and the like) necessary to carry out an assay or reaction, such as amplification and/or detection of a polymorphism-containing nucleic acid molecule. A kit may further contain means for determining the amount of a target nucleic acid, and means for comparing the amount with a standard, and can comprise instructions for using the kit to detect the polymorphism-containing nucleic acid molecule of interest. In one embodiment of the present invention, kits are provided which contain the necessary reagents to carry out one or more assays to detect one or more polymorphisms disclosed herein. In a preferred embodiment of the present invention, polymorphism detection kits/systems are in the form of nucleic acid arrays, or compartmentalized kits, including microfluidic/lab-on-a-chip systems.

Polymorphism detection kits or systems may contain, for example, one or more probes, or pairs of probes, that hybridize to a nucleic acid molecule at or near each target position. Multiple pairs of allele-specific probes may be included in the kit/system to simultaneously assay large numbers of polymorphisms, at least one of which is a polymorphism of the present invention. In some kits/systems, the allele-specific probes are immobilized to a substrate such as an array or bead. For example, the same substrate can comprise allele-specific probes for detecting any or all of the polymorphisms described herein.

A polymorphism detection kit or system of the present invention may include components that are used to prepare nucleic acids from a test sample for the subsequent amplification and/or detection of a polymorphism-containing nucleic acid molecule. Such sample preparation components can be used to produce nucleic acid extracts (including DNA and/or RNA), proteins or membrane extracts from any biological sample, as described herein.

The terms “arrays,” “microarrays,” and “DNA chips” are used herein interchangeably to refer to an array of distinct polynucleotides affixed to a substrate, such as glass, plastic, paper, nylon or other type of membrane, filter, chip, or any other suitable solid support. The polynucleotides can be synthesized directly on the substrate, or synthesized separate from the substrate and then affixed to the substrate by methods known in the art. Any number of probes, such as allele-specific probes, may be implemented in an array, and each probe or pair of probes can hybridize to a different polymorphism position. In the case of polynucleotide probes, they can be synthesized at designated areas (or synthesized separately and then affixed to designated areas) on a substrate using a chemical process. Each DNA chip can contain, for example, thousands to millions of individual synthetic polynucleotide probes arranged in a grid-like pattern and miniaturized (e.g., to the size of a dime). Preferably, probes are attached to a solid support in an ordered, addressable array.

Another form of kit contemplated by the present invention is a compartmentalized kit. A compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include, for example, small glass containers, plastic containers, strips of plastic, glass or paper, or arraying material such as silica. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the test samples and reagents are not cross-contaminated, or from one container to another vessel not included in the kit, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another or to another vessel. Such containers may include, for example, one or more containers which will accept the test sample, one or more containers which contain at least one probe or other polymorphism detection reagent for detecting one or more polymorphisms of the present invention, one or more containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and one or more containers which contain the reagents used to reveal the presence of the bound probe or other polymorphism detection reagents. The kit can optionally further comprise compartments and/or reagents for, for example, nucleic acid amplification or other enzymatic reactions such as primer extension reactions, hybridization, ligation, electrophoresis (preferably capillary electrophoresis), mass spectrometry, and/or laser-induced fluorescent detection. The kit may also include instructions for using the kit. Exemplary compartmentalized kits include microfluidic devices known in the art. In such microfluidic devices, the containers may be referred to as, for example, microfluidic “compartments,” “chambers,” or “channels.”

Microfluidic devices and systems miniaturize and compartmentalize processes such as probe/target hybridization, nucleic acid amplification, and capillary electrophoresis reactions in a single functional device. Such microfluidic devices typically utilize detection reagents in at least one aspect of the system, and such detection reagents may be used to detect one or more polymorphisms of the present invention. Exemplary microfluidic systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer included on a microchip. The movements of the samples may be controlled by electric, electroosmotic, or hydrostatic forces applied across different areas of the microchip to create functional microscopic valves and pumps with no moving parts. Varying the voltage can be used as a means to control the liquid flow at intersections between the micro-machined channels and to change the liquid flow rate for pumping across different sections of the microchip.

For genotyping polymorphisms, an exemplary microfluidic system may integrate, for example, nucleic acid amplification, primer extension, capillary electrophoresis, and a detection method such as laser-induced fluorescence detection. In a first step of an exemplary process for using such an exemplary system, nucleic acid samples are amplified, preferably by PCR. Then, the amplification products are subjected to automated primer extension reactions using ddNTPs (a specific fluorescence for each ddNTP) and appropriate oligonucleotide primers that hybridize just upstream of the targeted polymorphism. Once the extension at the 3′ end is completed, the primers are separated from the unincorporated fluorescent ddNTPs by capillary electrophoresis. The separation medium used in capillary electrophoresis can be, for example, polyacrylamide, polyethyleneglycol or dextran. The incorporated ddNTPs in the single nucleotide primer extension products are identified by laser-induced fluorescence detection.

The present disclosure also provides non-transitory computer-readable media. The non-transitory computer-readable media store instructions that, when executed by one or more processors, cause the one or more processors to perform some or all of the operations described in the disclosed methods. In some embodiments, the one or more processors perform operations comprising receiving data identifying the presence or absence of a polymorphism in a biological sample, generating a nonalcoholic fatty liver disease risk score from said data, and displaying or reporting said risk score.

The methods described herein can be implemented by one or more processors and a computer-readable medium storing instructions executable by the one or more processors to perform operations, as described above. At least one computer system may comprise the one or more processors and/or the computer-readable media. The computer system may further comprise one or more local servers or databases connected to or integrated with the at least one computer system. The one or more processors may be configured to communicate via wired or wireless communications with each other or other processors. The one or more processors may be configured to operate on one or more processor-controlled devices that can be similar or different devices.

The readable media described herein may protect the confidentiality and security of protected health information (PHI) in compliance with various privacy standards (e.g., Health Insurance Portability and Accountability Act (HIPAA)). Thus, the readable media may be considered HIPAA-compliant. The readable media and/or the one or more processors may provide or allow one or all of: means of access control, mechanisms to authenticate electronic PHI, functionalities for encryption/decryption, and mechanisms to log activity and implement audits. Data may be communicated using known encryption/decryption and security techniques. For example, DICOM imaging standards support encryption. The system and methods may anonymize any protected subject data.

EXAMPLES

Materials and Methods

Analyses were carried out in cohorts from the Genetics of Obesity-related Liver Disease (GOLD) Consortium, United Kingdom Biobank (UKBB), FinnGen, Electronic Medical Record and Genomics (eMERGE) Consortium, and Michigan Genomics Initiative (MGI) (FIG. 1).

GOLD Consortium—The multiethnic GOLD Consortium includes nine multiethnic cohorts with CT-measured steatosis (N=23,521): AGES,11 COPDGene,12 FamHS,13 FHS,14 GENOA,15 IRASFS,16 JHS,17 MESA,18 and OOA.19

UKBB—The UKBB cohort was previously described.20 Participants in the NAFLD analyses were included regardless of ethnicity and excluded if they or their relatives had abdominal MRI images. NAFLD cases were identified by ICD-9 571.8 or ICD-10 K76.0 codes. The UKBB NAFLD dataset included 1,827 NAFLD cases and 436,262 controls. A second UKBB NAFLD European only dataset was assembled as stated above and included 1,706 cases and 412,151 controls.

Convolutional neural network (CNN) model for UKBB liver MRI imaging—A CNN model was applied to determine liver proton density fat fraction (PDFF) from MRI in UKBB. UKBB uses two imaging protocols: gradient echo (GRE) (N=10,093) and IDEAL (N=35,779), which includes N=1,491 individuals that had undergone both protocols. To determine the MRI PDFF for all participants, a standard 2D U-Net was applied to segment the GRE and IDEAL liver data. ITK-SNAP software was used to manually annotate the liver in 98 randomly chosen images from the GRE protocol. Next, the segmented GRE images were split into training (N=64), validation (N=16), and test (N=18) sets. Liver segmentation achieved Dice scores over 94%. Similarly, the liver was manually annotated in 95 randomly chosen images from the IDEAL protocol. Next, the segmented IDEAL images were split into training (N=64), validation (N=16), and test (N=15) sets. The overall performance of the liver segmentation was also about 94% on Dice scores. After the liver had been identified by the 2D U-Net model on each slice for both imaging protocols, a 2D CNN Residual Neural Network (2D-CNN-ResNet) model using two steps was applied to the segmented liver. From the 4,616 individuals with true PDFF values, quantified by Perspectum Diagnostics from gradient echo imaging, 4,569 individuals with a full set of ten standard liver segmentation images were selected and split into training, validation, and test datasets. The 2D-CNN-ResNet model was trained and validated on 3,500 participants and tested on the remaining 1,069 participants. For the remaining 5,477 individuals from the gradient echo protocol, the CNN model developed here was used to predict PDFF. This 2D-CNN-ResNet model was then applied to estimate the PDFF value of participants from the IDEAL protocol. Based on the overlapping samples (N=1,491) with true PDFF values derived from the first step, a 2D-CNN-ResNet model was trained (N=952), validated (N=238), and tested (N=301). PDFF for the remaining 34,351 participants with only IDEAL imaging was then inferred using this CNN model. Inferred PDFF had a Pearson correlation coefficient of 0.976 and 0.984 in the validation and testing datasets, respectively. True PDFF values were also measured (FIG. 13). This is referred to herein as the UKBB MRI-PDFF dataset, which after accounting for genetic missingness (N=1,151) totaled N=43,293. A second UKBB MRI-PDFF dataset included only European participants and totaled N=41,834.
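By way of non-limiting illustration, the Dice score used above to evaluate liver segmentation can be computed as twice the overlap between a predicted mask and a manually annotated mask, divided by the total number of labeled pixels in both. The following is a minimal sketch only; the function name `dice_score` and the toy masks are hypothetical and are not taken from the segmentation pipeline described herein.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labeled as liver in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total liver pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as perfect agreement by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (illustrative values only): 2 shared pixels, 3 + 2 labeled pixels.
predicted = [0, 1, 1, 1, 0, 0]
annotated = [0, 1, 1, 0, 0, 0]
print(dice_score(predicted, annotated))
```

A score of 1.0 indicates identical masks and 0.0 indicates no overlap.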

eMERGE—The eMERGE NAFLD cohort (N=1,106 cases; 8,571 controls) was previously described and summary statistics are available at ebi.ac.uk/gwas/studies/GCST008468. Effect allele frequencies were not available and were estimated using UK Biobank Europeans.

FinnGen—FinnGen data freeze 4 summary statistics from finngen.fi/fi (N=651 NAFLD cases, 176,248 controls) were used for the analysis described herein.

MGI—MGI is a hospital-based cohort of patients seen at Michigan Medicine (Ann Arbor, MI). The MGI cohort was previously described.23 NAFLD cases were identified by ICD-9 571.8, or ICD-10 K76.0, and HCC by ICD-9 155.0 or ICD-10 C22.0. Cirrhosis was defined by ICD-9 571.2 or 571.5 or 571.6, or ICD-10 K70.2-4 or K74.x or K71.7 or NLP (which has been previously described).23

Genome-wide association study (GWAS) and meta-analysis—GWAS of autosomal variants was carried out assuming additive effects in each of the nine GOLD cohorts separately. The analyses were corrected for age, age², sex, alcoholic drinks, and principal components (PCs) or admixture. Sensitivity analyses by sex, study, and ancestry did not show significant heterogeneity, allowing the data to be combined across cohorts for all individuals with genetic data (N=23,521). The GOLD Consortium meta-analysis was performed using the inverse variance approach in METAL (08/28/2018 release).
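By way of non-limiting illustration, an inverse variance meta-analysis of the kind referenced above weights each cohort's effect estimate by the reciprocal of its squared standard error. The sketch below shows the standard fixed-effect calculation; it is not the METAL implementation, and the function name and input values are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance pooling of per-cohort effect estimates."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # Each cohort is weighted by 1 / SE^2, so precise cohorts count more.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Illustrative values only: two cohorts with equal precision.
beta, se, z = inverse_variance_meta([0.2, 0.4], [0.1, 0.1])
print(beta, se, z)
```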

GWAS of autosomal variants were carried out independently in UKBB using linear mixed modeling using SAIGE (version 0.29) with binary NAFLD or inverse normally-transformed MRI-PDFF as the dependent variable using an additive genetic model. A SNP imputation quality cutoff of 0.85 was used. The model was controlled for sex, age, age², and PCs 1-10.
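By way of non-limiting illustration, an inverse normal transformation of a quantitative trait such as MRI-PDFF can be sketched as a rank-based mapping onto standard normal quantiles. The source does not state which rank offset was used; the Blom offset (c = 3/8) and average ranks for ties below are assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # Map each rank to a probability strictly inside (0, 1), then to a quantile.
    return [standard_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Illustrative PDFF-like values only; the skewed inputs become symmetric scores.
print(inverse_normal_transform([3.1, 20.5, 5.4]))
```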

Summary statistics from FinnGen and eMERGE studies were combined with the UKBB NAFLD, UKBB MRI-PDFF, and GOLD CT steatosis analyses using a sample size and direction of effect meta-analysis implemented in METAL (FIG. 1) in an analysis referred to herein as GOLDPlus. Multi-allelic variants, indels, variants with minor allele frequency<0.001, and variants with minor allele count<400 were excluded. Variants with HetP-value<0.05 and opposing directionality across studies were also excluded. A p-value<5.0×10−8 was considered genome-wide significant. Given the multiethnic nature of the analysis, independent loci were identified using a 500 kb flanking criterion from the lowest p-value associated variant. To ascertain independent signals, a direct conditional analysis was also performed for all top hits using the UKBB multiethnic cohort. To perform conditional analysis, the genetic dosage of the loci was added to the other covariates (age, age², sex, PCs 1-10) of SAIGE step 1 and the GWAS was rerun.
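By way of non-limiting illustration, a sample size and direction of effect meta-analysis combines per-study Z-scores, signed relative to a common effect allele, using weights proportional to the square root of each study's sample size. The sketch below follows that general scheme and is not the METAL implementation; the function names and input values are hypothetical, and any use of an effective sample size for case-control studies is left to the caller.

```python
import math


def sample_size_meta(z_scores, sample_sizes):
    """Combine signed per-study Z-scores with sqrt(N) weights."""
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need one sample size per Z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Illustrative values only: two equally sized studies with concordant effects.
z_combined = sample_size_meta([2.0, 2.0], [100, 100])
print(z_combined, two_sided_p(z_combined))
```

Because only the sign and significance of each study enter the calculation, this scheme can combine studies whose phenotypes are measured on different scales (for example, a binary diagnosis and a continuous imaging measure), but it does not yield a pooled effect size.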

Ancestry-specific and sex-specific analyses in the GOLD Consortium—In order to assess ancestry-specific differences, a meta-analysis was conducted in the GOLD Consortium for each ancestry (European, African, Hispanic, and Chinese) separately and all ancestries together using METAL. Additionally, separate GWAS in men and women in the GOLD Consortium were conducted and meta-analyzed using METAL. Sex-specific GWAS analyses were controlled for age, age², and PCs 1-10. Cochran's Q was used to assess the observed heterogeneity and the I² metric was used for quantification. A Cochran's Q p-value<2.0×10−4 was considered significant.
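By way of non-limiting illustration, Cochran's Q and the I² metric referenced above can be computed from per-stratum effect estimates and standard errors as sketched below. The function name and the input values are hypothetical; the p-value for Q would be obtained by comparing Q to a chi-square distribution with the returned degrees of freedom, which is omitted here.

```python
def cochran_q_i2(betas, ses):
    """Return (Q, degrees of freedom, I^2 in percent) for per-stratum estimates."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two estimates with standard errors")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2 is the share of variation beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Illustrative values only: two strata with different effect estimates.
print(cochran_q_i2([0.1, 0.3], [0.1, 0.1]))
```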

GWAS analysis stratified by alcohol use—Using the UKBB MRI-PDFF data, alcohol-specific GWAS of heavy and light drinkers were performed. Heavy drinkers were identified as ≥14 drinks consumed per week for males or ≥7 drinks a week for females (N=21,396) and light drinkers as ≤1 drinks consumed per week for males and females (N=9,888). The UKBB MRI-PDFF GWAS were carried out as described above. A meta-analysis of the heavy and light drinkers was performed using METAL in order to assess the heterogeneity.

Previously published NAFLD/Steatosis variants—The effects of previously reported NAFLD/Steatosis variants were evaluated in GOLDPlus. A literature search was conducted for NAFLD and steatosis GWAS in PubMed and genome-wide significant variants were identified. Variants that were independent of the GOLDPlus genome-wide significant variants (500Kb flanking criteria from the lowest p-value associated variant) were assessed.

Phenome-wide association study (PheWAS)—Publicly available UKBB GWAS data from the Neale lab was utilized to perform a PheWAS of the NAFLD increasing alleles with related phenotypes. Associations were considered significant with a p-value<0.05.

PheWAS clustering—The PheWAS data was clustered by Z-score for the respective phenotype/variant combinations. Clustering was performed using R version 4.0.2. Optimal clusters were determined using the ‘NbClust’ package version 3.0. The ‘stats’ package was used for K-means clustering and the ‘dendextend’ version 1.13.4 and ‘dendogram’ packages were used for hierarchical clustering.

Mendelian randomization—A two-sample Mendelian randomization (MR) was performed, implemented in R version 3.6.0 using ‘TwoSampleMR’ version 0.5.5. For the analysis, the variant-NAFLD effect estimates from the GOLD Consortium (betas are required for MR and the GOLD Consortium data had the highest quality measures of hepatic steatosis in the population-based cohorts) were used. Only those variants with an F-statistic>10 were included in the MR analysis.43 MR was performed using the resulting variants as the exposure and related publicly available and UKBB GWAS (K74 fibrosis and cirrhosis of liver and I85 oesophageal varices, a complication of cirrhosis) as outcomes. The reverse analysis was also performed where independent genome-wide significant (p-value<5.0×10−8) variants from the aforementioned GWAS were used as exposure and the GOLD Consortium phenotype as the outcome. Inverse-variance weighted, penalized weighted median, weighted median, weighted mode, and MR-Egger methods were also applied. Tests for heterogeneity and horizontal pleiotropy were also performed.
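By way of non-limiting illustration, the instrument strength filter and an inverse-variance weighted (IVW) estimate can be sketched as follows. The per-variant F-statistic is approximated here as the squared ratio of the exposure effect to its standard error, and the IVW estimate uses the fixed-effect form; both are assumptions, as the source does not state the exact formulas, and this is not the ‘TwoSampleMR’ implementation. All names and input values are hypothetical.

```python
import math


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-variant instrument strength as (beta / SE)^2."""
    return (beta_exposure / se_exposure) ** 2


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics."""
    weights = [1.0 / s ** 2 for s in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Illustrative values only: (exposure beta, exposure SE, outcome beta, outcome SE).
variants = [(0.10, 0.02, 0.05, 0.01), (0.20, 0.05, 0.10, 0.01), (0.03, 0.02, 0.02, 0.01)]
# Keep only variants passing the F-statistic > 10 filter described above.
strong = [v for v in variants if f_statistic(v[0], v[1]) > 10]
estimate, se = ivw_estimate([v[0] for v in strong], [v[2] for v in strong], [v[3] for v in strong])
print(len(strong), estimate, se)
```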

Data-driven expression prioritization integration for complex traits (DEPICT)—DEPICT provides details regarding GWAS-prioritized tissues, genes, and pathways across cells and tissues.44 Enrichment was considered statistically significant at a false discovery rate (FDR) p-value<0.05.

Polygenic risk scores (PRS) and NAFLD risk factors—A PRS was created using the liver fat increasing variants (N=17) from the GOLDPlus meta-analysis. The PRS was based on a weighted sum of dosage of the NAFLD associated single variants. The beta value of each allele (from GOLD Consortium) was used to weight the PRS. The predictive power of the PRS was assessed on NAFLD, cirrhosis, and HCC cohorts in MGI European ancestry samples. PRS were defined as inverse-normally transformed rank units or as percentiles. Analyses were adjusted for age, age², sex, and PCs 1-10. The predictive power of the PRS was assessed in comparison to other NAFLD risk factors using univariate and multivariate linear models. NAFLD risk factors were the median outpatient values for the MGI cohort. Linear models were generated using the ‘glm’ function in R. The C-statistic was calculated using the ‘DescTools’ package in R.
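By way of non-limiting illustration, the weighted sum of dosage described above can be sketched as follows. The variant identifiers, weights, and dosages shown are placeholders chosen for illustration and are not values from the GOLD Consortium; the function name is likewise hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0 to 2) using per-variant beta weights."""
    missing = sorted(set(weights) - set(dosages))
    if missing:
        raise KeyError(f"no dosage supplied for: {', '.join(missing)}")
    return sum(weights[variant] * dosages[variant] for variant in weights)


# Placeholder weights (per-allele betas) and one individual's imputed dosages.
weights = {"variant_a": 0.24, "variant_b": 0.30, "variant_c": 0.05}
dosages = {"variant_a": 2.0, "variant_b": 0.5, "variant_c": 1.0}
print(polygenic_risk_score(dosages, weights))
```

Scores computed in this way for a cohort could then be converted to percentiles or rank-based units before modeling.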

Example 1

GOLDPlus meta-analysis

A meta-analysis of CT measured liver fat (GOLD) was carried out with UKBB MRI liver PDFF, UKBB NAFLD, eMERGE NAFLD, and FinnGen NAFLD in the largest meta-analysis to date of NAFLD (FIG. 4). In all cases the top associated variants for all datasets were at PNPLA3, verifying congruency across the phenotypes. Seventeen independent genome-wide significant variants were identified (p-value<5.0×10−8) (Table 1; FIG. 5). These variants are referred to as the GOLDPlus Significant Variants. Genes for annotation were prioritized if the index variant was a missense variant in the gene, in high LD (r²>0.7) with an exonic variant in the gene, and/or was an eQTL for the gene in liver. Genes that were within 1 Mb of the index variant and predominantly expressed in the liver, prioritized by DEPICT analysis, and/or nearest to the index variant were also prioritized for annotation.

One region contained two possibly independent loci in close proximity to each other: ADH1B-rs1229984, which is within 500 kb of MTTP-rs7661964. To confirm that these two signals were independent of each other, conditional analyses were carried out in the UKBB multiethnic dataset. ADH1B in the UKBB multiethnic cohort had a p-value=5.09E-06 and a p-value=1.03E-05 before and after conditioning on MTTP. MTTP had a p-value=2.01E-07 and a p-value=4.09E-07 before and after conditioning on ADH1B. Novel variants were defined as those more than 1 Mb away from genome-wide significant variants (p-value<5.0×10−8) from previously published NAFLD and hepatic steatosis GWAS. Novel associations were identified in or near TOR1B, FTO, COBLL1/GRB14, INSR, SREBF1, and PNPLA2 (Table 1; FIG. 5). Previously identified NAFLD associations were confirmed in or near PNPLA3, TM6SF2, APOE, GCKR, TRIB1, GPAM, MARC1, MTTP, ADH1B, TMC4/MBOAT7, and PTPRD. One genome-wide significant variant, LOC157273/PPP1R3B (rs4841132; p-value=4.21×10−13; HetP-value=7.44×10−19), was removed from downstream analysis due to phenotype heterogeneity (see Methods). rs4841132 is known to promote liver damage by increasing glycogen, which is a distinct pathology from NAFLD.

The index variants at several loci are missense variants: TM6SF2, APOE, GCKR, ADH1B, and PNPLA2. The index variants in PNPLA3, GPAM, MARC1, MTTP, and TMC4/MBOAT7 are in LD (r²>0.99 across all ethnicities) with missense variants PNPLA3 (I148M; rs738409), GPAM (V43I; rs2792751), MARC1 (T493A; rs2807834), MTTP (145T; rs3816873), and TMC4/MBOAT7 (TMC4 G17E; rs641738), respectively. The index variants associated with TRIB1 and SREBF1 are intergenic, while the variants in TOR1B, FTO, COBLL1/GRB14, INSR, and PTPRD are intronic. rs7029757 is an eQTL for TOR1B (FDR p-value=5.00E-04), which is expressed in the liver. TRIB1, MTTP, TOR1B, INSR and PTPRD are the genes nearest to the respective non-coding index variants. SREBF1 is within 1 Mb of the index variant and is highly expressed in the liver. rs79953491 is an intronic variant in COBLL1, which is expressed in the liver. Additionally, GRB14, which is highly expressed in the liver, is within 1 Mb of rs79953491. Literature review suggests that rs56094641 at FTO may exert its effects on BMI by affecting IRX3/6 expression in adipose tissue.

A second meta-analysis was performed using the same datasets but included only European ancestry participants (FIG. 6). Seventeen independent genome-wide significant variants were also identified (p-value<5.0×10−8) (PNPLA3, TM6SF2, APOE, GCKR, TRIB1, GPAM, MARC1, MTTP, ADH1B, TOR1B, TMC4/MBOAT7, COBLL1/GRB14, SREBF1, INSR, FTO, PNPLA2 and TAMM41/SYN2) (Table 2). The European meta-analysis differs only at one locus from the multiethnic analysis: TAMM41/SYN2 is genome wide significant in the European analysis whereas PTPRD is significant in the multiethnic analysis. The overlapping genome-wide significant variants shared across the two analyses have a less significant p value of association in the European data due to the smaller sample size in this dataset.

Example 2

Effects of Identified Variants by Study, Ancestry, Sex, and Alcohol Intake

The heterogeneity of effect of the NAFLD associated variants across the studies was assessed in GOLDPlus. After Bonferroni correction, only rs58542926 at TM6SF2 and rs429358 at APOE showed statistically significant heterogeneity of effect. However, their direction of effect across studies was congruent. For completeness, the effects of the loci overall are shown and stratified by cohort (Table 1 and FIG. 14, respectively).

The effects of the NAFLD associated variants were assessed across ancestries (FIGS. 1 and 7) (European (EUR), N=15,880; African (AFR), N=5,607; Hispanic (HIS), N=1,674; and Chinese (CHN), N=360) and sex (males, N=11,006; females, N=12,515) (FIG. 8). For these analyses, the GOLD Consortium data was utilized, which had the highest quality measures of hepatic steatosis in population-based cohorts across ancestries and sex. PNPLA3 (B=0.24 EUR, B=0.27 AFR, B=0.24 HIS, B=0.17 CHN, HetP-value=5.69×10−6) exhibited significant heterogeneity of effect across ancestries. However, the limited sample size of the Chinese ancestry cohort likely caused unstable estimates of the betas, influencing the estimates of heterogeneity. After removal of the Chinese cohort from the meta-analysis, the heterogeneity P-value was non-significant after Bonferroni correction (PNPLA3, HetP-value=0.69). No other loci showed significant heterogeneity of effect by ancestry or sex.

Greater than a 10% absolute difference in effect allele frequencies (EAF) was found for index variants in PNPLA3 (rs738408-T), GCKR (rs1260326-T), TRIB1 (rs28601761-C), GPAM (rs2792735-G), MARC1 (rs2642438-G), ADH1B (rs1229984-C), FTO (rs62033399-T), PTPRD (rs10756038-G), TMC4/MBOAT7 (rs641738-T), MAST3 (rs273507-C), ERLIN1 (rs17729876-G), OSGIN1 (rs4782568-C), COBLL1 (rs6712203-C), ITPR2 (rs10842708-G), SDCBP (rs113895159-C), and SUOX (rs705699-G) across ancestries (FIGS. 1 and 7). Variants in six genes, PNPLA3, GCKR, GPAM, PTPRD, COBLL1/GRB14, and INSR, had a relative decreased frequency of the NAFLD increasing allele while those in TRIB1, MARC1, and SREBF1 had an increased frequency in the African ancestry cohort as compared to the European ancestry cohort. In the Hispanic cohort, as compared to the European cohort, the frequency of the NAFLD increasing allele was lower in variants in GCKR and FTO and higher in PNPLA3, TRIB1, MARC1, COBLL1/GRB14, and SREBF1. In the Chinese cohort, as compared to the European cohort, the frequency of the NAFLD increasing allele was lower in variants in ADH1B, FTO, INSR, and TMC4/MBOAT7 and higher in PNPLA3, GCKR, TRIB1, MARC1, MTTP, COBLL1/GRB14, and SREBF1. The PNPLA2 variant is rare, was not well imputed in the GOLD Consortium datasets, and was thus removed during quality control (QC).

The starkest contrasts in allele frequencies across ancestries existed in ADH1B. In the Chinese ancestry cohort, ADH1B (rs1229984-C) had an EAF of 0.26, while it had >65% EAF in the European, African, and Hispanic ancestry cohorts. The variance explained across the ancestries paralleled the allele frequencies more than the effect sizes, which were similar across ancestries. The highest variances explained were 2.79% in the Hispanic cohort for PNPLA3, 2.42% in the Chinese cohort for GCKR, and 2.04% in the European cohort for PNPLA3. Taken together, these findings suggest EAF, more than effect size, accounts for the differences in genetic disease burden across ancestries. To assess the effects of alcohol, the largest population-based cohort, UKBB MRI-PDFF, was used to perform a GWAS analysis stratified by alcohol use. After Bonferroni correction, only ADH1B exhibited significant heterogeneity of effect (HetP-value=6.16E-04) between heavy (>14 drinks per week for males or >7 drinks a week for females; N=21,396) and light (≤1 drinks per week for males and females; N=9,888) drinkers for the NAFLD associated variants. ADH1B had a significantly greater effect (B=0.20) in heavy drinkers as compared to light drinkers (B=0.03).
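By way of non-limiting illustration, the dependence of variance explained on effect allele frequency noted above can be sketched with the common additive-model approximation 2 × EAF × (1 − EAF) × beta², which holds for a trait standardized to unit variance. The source does not state how variance explained was calculated, so this formula is an assumption, and the function name and input values are hypothetical.

```python
def variance_explained(eaf, beta):
    """Approximate fraction of trait variance explained by one additive variant.

    Assumes a standardized trait (unit variance) and Hardy-Weinberg proportions.
    """
    if not 0.0 <= eaf <= 1.0:
        raise ValueError("effect allele frequency must lie between 0 and 1")
    return 2.0 * eaf * (1.0 - eaf) * beta ** 2


# Illustrative values only: the same effect size at two allele frequencies.
for eaf in (0.05, 0.50):
    print(eaf, variance_explained(eaf, beta=0.2))
```

With a fixed effect size, the variance explained changes only through the allele frequency term, consistent with the observation that differences across ancestries tracked EAF rather than effect size.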

Example 3

Tissue, Gene-Set, and Pathway Analyses

To further understand the biology underlying NAFLD associations, DEPICT was used to identify enriched tissues and cell types (FDR p-value<0.05).44 Input into DEPICT included the 17 NAFLD associated single variants. Liver and adipose tissue were the most enriched tissue types (FIG. 9). Epithelial cells (hepatocytes) were the most enriched cell type (FIG. 9). Using MSigDB, significant gene functional overlaps were computed. Enrichment was found (FDR p-value<0.01) in the following biological functions: lipid homeostasis, lipid metabolic processes, monocarboxylic acid metabolic processes, alcohol metabolic processes, lipid biosynthesis, regulation of cholesterol biosynthesis, and steroid biosynthesis.

Example 4

Association of NAFLD Variants with Other Phenotypes

Publicly available GWAS data was utilized to perform a PheWAS of NAFLD-risk increasing alleles with ICD-based diseases; alcohol intake; cardiovascular and body composition measures; and lipid, metabolic, and liver function test blood values (FIG. 2). Clustering of the PheWAS results revealed six distinct groups with differing biological effects (FIG. 10). The NAFLD-risk increasing alleles of the variants broadly separated into two groups: one showing significant associations with increased serum low density lipoprotein cholesterol (LDL) and increased alanine aminotransferase (ALT) (TRIB1, GCKR, COBLL1/GRB14, INSR, PNPLA2, SREBF1, MTTP, GPAM, MARC1, TMC4/MBOAT7, TOR1B, and ADH1B associations) and the other group exhibiting decreased associations with LDL and increased associations with ALT (FTO, PTPRD, PNPLA3, TM6SF2, and APOE). Further separations showed NAFLD associating variants at TRIB1, GCKR, COBLL1/GRB14, INSR, PNPLA2, and SREBF1 were distinguished from TOR1B, MARC1, GPAM, TMC4/MBOAT7, and ADH1B associations by being associated with high serum triglycerides and low high-density lipoprotein (HDL) cholesterol. NAFLD associated variants at TRIB1 and GCKR were distinguished from COBLL1/GRB14, INSR, PNPLA2, SREBF1, and MTTP by being associated with low risk of cholelithiasis and cholecystitis; GCKR had particularly strong association with lower insulin-like growth factor 1 (IGF1) and sex hormone binding globulin (SHBG) levels. NAFLD increasing associations at PTPRD and FTO were associated with increased serum triglycerides whereas those at PNPLA3, TM6SF2, and APOE were associated with decreased serum triglycerides. FTO clustered alone, and differed from other loci in having very strong association with increased body mass index (BMI). Likewise, APOE clustered alone and differed from PNPLA3 and TM6SF2 associations in having an increased association with body composition measures and decreased association with familial Alzheimer's disease.

Example 5

Mendelian Randomization

To determine whether NAFLD causally influences liver and metabolic diseases and traits, two-sample Mendelian randomization was performed. NAFLD associated variants with an F-statistic>10 were used as a combined instrumental variable for steatosis (N=12; combined F-statistic=158.2). Using the GOLD Consortium effects as the exposure, NAFLD increased risk of liver fibrosis and cirrhosis (ICD K74; OR=1.002, 95% CI=1.001-1.003, MR-Egger p-value=1.69E-03) and esophageal varices (ICD I85; OR=1.003, 95% CI=1.002-1.004, MR-Egger p-value=1.75E-04) (FIG. 11). The MR-Egger heterogeneity p-values were non-significant for fibrosis (p-value=0.21) and esophageal varices (p-value=0.08). The MR-Egger pleiotropy p-values were non-significant for fibrosis (p-value=0.19) but were significant for esophageal varices (p-value=0.02), indicating horizontal pleiotropy may be driving the results of the esophageal varices Mendelian randomization. Sensitivity analyses are shown in FIGS. 11C-11D.

The causal effects of metabolic disorders, body composition measures and advanced liver disease were assessed on NAFLD. The GOLD Consortium was used as outcome and independent genome-wide significant variants (p-value<5E-08) from previously published GWAS (ebi.ac.uk/gwas/) as exposure. Increased BMI (OR=1.29, 95% CI=1.05-1.59, MR-Egger p-value=0.02) and waist circumference (OR=1.36, 95% CI=1.02-1.82, MR-Egger p-value=3.6E-02) increased risk of NAFLD (FIG. 12). The MR-Egger heterogeneity p-values were non-significant for BMI (p-value=0.051) and waist circumference (p-value=0.095). The MR-Egger pleiotropy p-values were non-significant for BMI (p-value=0.46) and waist circumference (p-value=0.296). The respective sensitivity analyses are shown in FIGS. 12C-12D.

Example 6

Effects on Liver Outcomes: NAFLD, Cirrhosis, Hepatocellular Carcinoma

In order to assess the cumulative effects of NAFLD increasing variants on disease a PRS was constructed based on a weighted sum of dosage (multiethnic ancestry) of the NAFLD associated single variants (N=17) and its effect was assessed in an independent cohort: MGI (Table 3). Higher NAFLD PRS was strongly associated with an increased odds-ratio for NAFLD in MGI (FIG. 3A). Compared to those in the bottom decile of the PRS, individuals in the top 10%, 5%, and 1% had OR=2.83 (95% CI=2.39-3.34), 3.40 (95% CI=2.83-4.09), and 4.66 (95% CI=3.53-6.14) for NAFLD, respectively. Higher NAFLD PRS was also associated with increased odds of both MGI cirrhosis (top 10% OR 2.47 (95% CI=1.95-3.12), 5% 3.39 (95% CI=2.64-4.36), and 1% 4.87 (95% CI=3.39-7.00)) and MGI HCC (top 10% OR 2.91 (95% CI=1.77-4.78), 5% 4.35 (95% CI=2.59-7.31), and 1% 6.34 (95% CI=3.14-12.78)) (FIGS. 3B-3C).

Example 7

PNPLA3 and Diabetes in NAFLD Progression

NAFLD was defined based on ALT elevation and cirrhosis based on ICD codes in both cohorts. In a Michigan Medicine cohort, the ALT criterion has 88.6% specificity for NAFLD. In UKBB, the ALT definition of NAFLD was validated among the subset of participants who underwent liver magnetic resonance imaging with proton density fat fraction measurement, and the specificity of ALT elevations was found to be 93.0% (3,272/3,515) for liver fat fraction >5.5%. ICD codes for cirrhosis demonstrated a positive predictive value of 86% in a Michigan Medicine cohort. The sensitivity of ICD-10 codes for cirrhosis was evaluated in a Michigan Medicine cohort by examining patients with NAFLD (defined by ALT as above) who had imaging evidence of cirrhosis. It was found that 973/1251 (77.8%) of these patients had an ICD code for cirrhosis within 12 months of the date of the imaging study showing cirrhosis, implying that ICD codes have acceptable sensitivity for cirrhosis. The sensitivity of ICD codes for cirrhosis could not be directly validated in UK Biobank due to lack of access to a “gold standard” metric of cirrhosis.

The MGI cohort included 7,893 participants with NAFLD, among whom the median age was 52 years and approximately half were female. As expected in a NAFLD cohort, there was a high prevalence of diabetes (36%) and obesity (58%). Incident cirrhosis developed in 590 (6.8%) of MGI participants during a median follow-up of 72.5 months (IQR 45.9-100.5 months), yielding an incidence rate of 4.01 per 1,000 PY overall and 3.58 per 1,000 PY among those who did not have baseline advanced fibrosis (FIB4<2.67).
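Incidence rates of the kind reported above are events divided by accumulated follow-up time, scaled to 1,000 person-years (PY). The sketch below uses hypothetical counts, since the total person-years of follow-up are not stated in the text; the function name is likewise illustrative.

```python
def incidence_rate_per_1000_py(events: int, person_years: float) -> float:
    """Events per 1,000 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events * 1000 / person_years


# Hypothetical example: 12 events observed over 3,000 person-years.
example_rate = incidence_rate_per_1000_py(events=12, person_years=3000.0)
print(f"{example_rate:.2f} per 1,000 PY")  # 4.00 per 1,000 PY
```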

Univariate analysis showed that Fibrosis-4 (FIB4) score was strongly predictive of incident cirrhosis. Other risk factors included diabetes (hazard ratio [HR] 2.14 [95% confidence interval (CI) 1.60-2.85, p=3.0×10⁻⁷]), higher body mass index (HR 1.83 [95% CI 1.26-2.64, p=0.0014] for obese vs. lean/overweight), and elevated ALT (HR 2.00 [95% CI 1.48-2.69], p=5.2×10⁻⁶ for ≥2x vs. <2x ULN) (FIG. 17). There was no significant association between hypertension or dyslipidemia and incident cirrhosis (p>0.05 for both comparisons).

Genetic variants previously associated with steatosis/cirrhosis were systematically evaluated. In MGI, only two of these individual variants were associated with an increased rate of progression to cirrhosis: PNPLA3-rs738409-GG (vs. -CC) with HR 3.48 (95% CI 2.32-5.22, p=1.7×10⁻⁹) and TRIB1-rs28601761-CC (vs. -GG) with HR 2.15 (95% CI 1.30-3.53, p=0.0026) (Table 4, FIG. 17). A previously reported polygenic risk score for cirrhosis was associated with incident cirrhosis, but an effect was observed only in the highest risk quartile: HR 2.30 (95% CI 1.53-3.46, p=6.3×10⁻⁵) vs. the lowest quartile. Variants in TM6SF2, HSD17B13, and other previously reported risk loci were not significantly associated with incident cirrhosis. A sensitivity analysis including only patients without baseline advanced fibrosis (FIB4<2.67) yielded the same overall findings for the association of cirrhosis with genetic and non-genetic predictors as in the overall cohort.

A multivariable model for incident cirrhosis was generated including the most consistent predictors of incident cirrhosis in both cohorts, namely PNPLA3-rs738409-G, TRIB1-rs28601761-C, diabetes, obesity (categorized as obese vs. lean/overweight), and ALT level (categorized as ≥2x vs. <2x ULN). All remained significantly associated with incident cirrhosis, with hazard ratios similar to those in the univariable analysis (Table 4). A sensitivity analysis in patients without baseline advanced fibrosis showed similar findings.

The remainder of the MGI analyses focused on patients without advanced fibrosis (FIB4<2.67) to determine whether genetic and/or environmental risk factors can identify a subgroup with more rapid disease progression.

PNPLA3 status was associated with increased risk of progression in the overall cohort of patients without advanced fibrosis (8.89 vs. 3.15 cases per 1,000 PY with PNPLA3-rs738409-GG vs. -CC/-CG genotype, respectively, p<0.0001) (Table 5). This association between PNPLA3 genotype and cirrhosis risk was even more notable when stratified by diabetes status, obesity status, and ALT level (Table 5). For example, among patients with diabetes, the cumulative incidence of cirrhosis was 3.2-fold higher in patients with PNPLA3-rs738409-GG vs. -CC/-CG genotype (16.4 vs. 5.1/1000 PY, respectively) (Table 5). A clinical risk score was generated based on diabetes, obesity, and ALT ≥2x ULN, where each patient received 2 points for diabetes and 1 point each for obesity and ALT ≥2x ULN. Patients were divided into low-, intermediate-, and high-risk groups (0-1, 2-3, and 4 points, respectively); these cutoffs were chosen because the cumulative incidence of cirrhosis was similar in patients with 0 vs. 1, or 2 vs. 3 points. PNPLA3-rs738409-GG genotype was again associated with much higher cumulative incidence in the low-risk (6.3 vs. 2.3/1000 PY) and intermediate-risk groups (8.9 vs. 4.0/1000 PY; p<0.05 for both), with a trend toward higher cumulative incidence in the high-risk group as well (22.9 vs. 9.1/1000 PY, p=0.14) (Table 5). TRIB1-rs28601761-CC genotype was associated with higher risk of cirrhosis than -GG or -GC genotypes (4.6 vs. 3.0/1000 PY overall, p=0.0072). This association was also significant in patients without diabetes, with obesity, or with ALT ≥2x ULN. In models including gene-environment interaction terms (e.g., PNPLA3-rs738409-G dosage*diabetes status), the interaction terms were not significant for either PNPLA3 or TRIB1 genotype and any of the environmental predictors (p>0.05 for all).
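The point assignment and cutoffs of the clinical risk score described above can be sketched as follows. The scoring rule (2 points for diabetes, 1 point each for obesity and ALT ≥2x ULN; 0-1 low, 2-3 intermediate, 4 high) and the ULN values (19 U/L for women, 30 U/L for men, per the Table 4 footnote) come from the text; the function names and the "female"/"male" keys are assumptions made for illustration.

```python
# Upper limit of normal (ULN) for ALT in U/L, as defined in the Table 4 footnote.
ULN_ALT_U_PER_L = {"female": 19.0, "male": 30.0}


def alt_at_least_2x_uln(alt_u_per_l: float, sex: str) -> bool:
    """True when ALT is at least twice the sex-specific upper limit of normal."""
    return alt_u_per_l >= 2 * ULN_ALT_U_PER_L[sex]


def clinical_risk_points(diabetes: bool, obese: bool, alt_elevated: bool) -> int:
    """2 points for diabetes, 1 point each for obesity and ALT >= 2x ULN."""
    return 2 * int(diabetes) + int(obese) + int(alt_elevated)


def risk_category(points: int) -> str:
    """Map 0-1 points to low, 2-3 to intermediate, and 4 to high risk."""
    if not 0 <= points <= 4:
        raise ValueError("points must be between 0 and 4")
    if points <= 1:
        return "low"
    if points <= 3:
        return "intermediate"
    return "high"


# Hypothetical patient: a woman with diabetes, not obese, ALT of 45 U/L.
example_points = clinical_risk_points(
    diabetes=True, obese=False, alt_elevated=alt_at_least_2x_uln(45.0, "female")
)
print(example_points, risk_category(example_points))  # 3 intermediate
```

Under this mapping, a patient without diabetes who has both obesity and ALT ≥2x ULN scores 2 points and falls in the intermediate group, which agrees with the group definitions given in the Table 5 footnote.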

Patients with low baseline FIB4 scores, but with diabetes and PNPLA3-rs738409-GG, had an incidence of cirrhosis similar to that of patients with high baseline FIB4 (HR=0.90 [95% CI 0.39-2.08], p=0.81), and markedly higher than those with low FIB4 score, diabetes, and PNPLA3-rs738409-CC or -CG genotypes (HR 3.03 [95% CI 1.44-6.67], p=0.0035; both comparisons were after adjustment for age, sex, and principal components 1-10) (FIG. 18). Thus, persons with low FIB4 but PNPLA3-rs738409-GG genotype and diabetes had a rate of progression indistinguishable from those with high FIB4 scores.

The findings from MGI in patients with NAFLD were validated in an independent cohort, UKBB. Unlike MGI, UKBB is a population-based cohort and, as expected, had a lower prevalence of comorbidities such as diabetes and obesity, and lower FIB4 scores. The UKBB cohort included 46,880 patients. Over a median follow-up of 155.2 months (IQR 147.2-163.1 months), 191 (0.40%) developed incident cirrhosis, yielding an incidence rate of 0.60 per 1000 PY overall and 0.39 per 1000 PY among those without baseline advanced fibrosis.

On univariable analysis, diabetes, obesity, elevated ALT, and PNPLA3-rs738409-GG genotype were associated with incident cirrhosis, as was the case in MGI. The association between the TRIB1-rs28601761-G allele and incident cirrhosis was not statistically significant. In UKBB, unlike in MGI, TM6SF2-rs58542926-T was associated with incident cirrhosis while the cirrhosis polygenic risk score was not. On multivariable analysis, the association between obesity and incident cirrhosis was no longer statistically significant (p=0.09), but there were otherwise no meaningful changes in the results. On sensitivity analysis including only those without advanced fibrosis at baseline, the findings were similar to those in the overall UKBB cohort.

Next, the combined effects on cirrhosis incidence of PNPLA3-rs738409 or TRIB1-rs28601761 genotype and the environmental factors of diabetes, obesity, or ALT elevation were evaluated in UKBB in patients without baseline advanced fibrosis (i.e., FIB4<2.67). The associations between PNPLA3 genotype, metabolic risk factors, and incident cirrhosis were similar to the findings in MGI. PNPLA3-rs738409-GG genotype was associated with higher overall cumulative incidence of cirrhosis than -CC or -CG genotype (0.61 vs. 0.37/1000 PY, p=0.042) (Table 5). These differences were even greater among patients with diabetes or obesity: UKBB participants with diabetes or obesity and PNPLA3-rs738409-GG genotype had a >3-fold higher cumulative incidence of cirrhosis than did those with the -CC or -CG genotype (3.4 vs. 1.0 events/1000 PY for diabetes and 1.27 vs. 0.42 events/1000 PY for obesity; p<0.001 for both). Similarly, compared to PNPLA3-rs738409-CG or -CC genotype, the -GG genotype was strongly associated with higher cumulative incidence of cirrhosis among patients with clinical risk score in the intermediate (1.31 vs. 0.53/1000 PY) or high range (5.78 vs. 1.78/1000 PY; p<0.05 for both). PNPLA3-rs738409 genotype was not significantly associated with incident cirrhosis among patients with a low clinical risk score, owing to the very small number of non-obese patients with incident cirrhosis and the PNPLA3-rs738409-GG genotype (n=2), or among people without diabetes, obesity, or ALT ≥2x ULN. TRIB1-rs28601761 genotype was not significantly associated with increased cumulative incidence of cirrhosis overall or in any subgroup in UKBB. As in MGI, gene-environment interaction terms were not significant between PNPLA3 or TRIB1 genotype and any of the above predictors (p>0.05 for all).
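The fold differences quoted above can be checked as crude ratios of the reported incidence rates. This is a sketch of the arithmetic only, not the model-based comparison used to obtain the p-values; the rates are taken from the text and the function name is illustrative.

```python
def rate_ratio(rate_exposed: float, rate_reference: float) -> float:
    """Crude ratio of two incidence rates expressed in the same units."""
    if rate_reference <= 0:
        raise ValueError("reference rate must be positive")
    return rate_exposed / rate_reference


# UKBB cumulative incidence per 1,000 PY, PNPLA3-rs738409-GG vs. -CC/-CG.
ukbb_diabetes_ratio = rate_ratio(3.40, 1.00)
ukbb_obesity_ratio = rate_ratio(1.27, 0.42)

print(f"diabetes: {ukbb_diabetes_ratio:.1f}-fold, obesity: {ukbb_obesity_ratio:.1f}-fold")
```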

As with MGI, patients with low baseline FIB4 score and diabetes who carried the PNPLA3-rs738409-GG genotype had a cumulative incidence of cirrhosis similar to that of the patients with high baseline FIB4 (HR=0.57 [95% CI 0.29-1.14], p=0.11) and much greater than those with PNPLA3-rs738409-CC or -CG genotypes (HR=3.33 [95% CI 1.61-7.14], p=0.0013; both comparisons adjusted for age, sex, and principal components 1-10) (FIG. 18).

TABLE 1
Variants associated with NAFLD measures in GOLDPlus meta-analysis
SNP ID CHR:POS EA OA EAF Z-score P-value Gene Annotation
rs738408 22:44324730 T C 0.22 35.21  1.53E−271 PNPLA3 (D, E, L, N); SAMM50 (D)
rs58542926 19:19379549 T C 0.07 22.76  1.19E−114 TM6SF2 (D, E*, L, N); NCAN (D); SUGP1 (D); MAU2 (D)
rs429358 19:45411941 T C 0.85 12.18 4.24E−34 APOE (D, E*, L); APOC1 (D, L); TOMM40 (D); PVRL2 (D)
rs1260326 2:27730940 T C 0.38 11.62 3.10E−31 GCKR (D, E*, L, Q); SNX17 (D); C2orf16 (Q)
rs28601761 8:126500031 C G 0.59 9.69 3.50E−22 TRIB1 (D, L, N)
rs4918722 10:113947040 C T 0.27 9.27 1.94E−20 GPAM (D, E, L, N)
rs2807834 1:220970593 G T 0.70 7.79 6.68E−15 MARC1 (D, E*, L, N)
rs7661964 4:100505326 A T 0.74 7.00 2.58E−12 MTTP (D, E, L, N); C4orf17 (D)
rs7029757 9:132566666 G A 0.91 6.68 2.38E−11 TOR1B (N, Q); TOR1A (D)
rs1229984 4:100239319 C T 0.95 6.56 5.57E−11 ADH1B (D, E*, L, N); ADH4 (L); ADH1A (L)
rs17817449 16:53813367 G T 0.39 6.15 7.56E−10 FTO (N); RPGRIP1L (D)
rs79953491 2:165555539 A G 0.88 5.95 2.71E−09 COBLL1 (D, N, E); GRB14 (L)
rs112630404 19:7218635 A T 0.18 5.85 4.88E−09 INSR (D, N)
rs626283 19:54677001 C G 0.43 5.75 8.99E−09 TMC4 (E*, Q, N); MBOAT7 (Q); LENG1 (D)
rs4561528 17:17979099 T C 0.35 5.57 2.52E−08 SREBF1 (D, L); MYO15A (D, Q, E); DRG2 (D, N); DRC3 (D, Q, E); ATPAF2 (D, Q); TOM1L2 (D, Q); LLGL1 (Q); G1D4 (E)
rs10756038 9:10462423 G A 0.72 5.47 4.58E−08 PTPRD (D, N)
rs140201358 11:823586 G C 0.01 5.50 3.81E−08 PNPLA2 (D, E*, N)
CHR:POS, chromosome:position; EA, effect allele; OA, other allele; EAF, effect allele frequency.
Gene annotation tag: Gene prioritized by Depict analyses (D); Index variant is exonic (E*); Index variant is in strong LD (r2 > 0.85) with an exonic variant in the indicated gene (E); Index variant is within 1 MB of a variant in the indicated gene that is highly expressed in the liver using Gtex (L); Gene nearest to the index SNP (N); Index variant is in eQTL (FDR p < 0.05) with the indicated gene (Q).
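The Z-score and P-value columns of Table 1 appear consistent with a two-sided normal test. The sketch below assumes that convention (it is not stated in the source) and compares a Z-score against the conventional genome-wide significance threshold of 5×10⁻⁸; the function and constant names are illustrative.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional threshold, assumed here


def two_sided_p_from_z(z_score: float) -> float:
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z_score) / math.sqrt(2.0))


# Z-score listed for rs10756038 (PTPRD) in Table 1; the table reports P = 4.58E-08.
p_ptprd = two_sided_p_from_z(5.47)
print(f"{p_ptprd:.2e}", p_ptprd < GENOME_WIDE_THRESHOLD)
```

Small differences from the tabulated P-values are expected because the Z-scores in the table are rounded to two decimal places.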

TABLE 2
Independent GOLDPlus European meta-analysis NAFLD variants
Gene CHR:BP rsID EA OA EAF Z-score P-value HetPVal
PNPLA3 22:44324730 rs738408 t c 0.22 32.72  7.93E−235 3.49E−02
TM6SF2 19:19388500 rs8107974 t a 0.08 22.86  1.19E−115 1.35E−07
APOE 19:45411941 rs429358 t c 0.85 12.37 3.61E−35 2.07E−02
GCKR 2:27598097 rs4665972 t c 0.40 10.68 1.25E−26 8.75E−02
TRIB1 8:126506694 rs112875651 g a 0.60 9.34 9.32E−21 8.43E−03
GPAM 10:113949664 rs10787429 t c 0.28 8.66 4.87E−18 7.75E−01
MARC1 1:220973563 rs2642442 t c 0.69 7.93 2.20E−15 1.17E−01
MTTP 4:100480915 rs138764179 t c 0.74 6.62 3.71E−11 9.74E−01
ADH1B 4:100239319 rs1229984 c t 0.97 6.54 6.22E−11 6.79E−01
TOR1B 9:132566666 rs7029757 g a 0.91 6.41 1.46E−10 5.44E−01
TMC4/MBOAT7 19:54677001 rs626283 c g 0.43 6.15 7.76E−10 7.13E−01
COBLL1/GRB14 2:165555539 rs79953491 a g 0.88 5.89 3.76E−09 2.19E−01
SREBF1 17:17977355 rs9303144 c t 0.31 5.68 1.39E−08 8.64E−01
INSR 19:7202759 rs8113542 g a 0.26 5.53 3.19E−08 6.90E−02
FTO 16:53811788 rs62033400 g a 0.40 5.49 3.95E−08 8.27E−01
PNPLA2 11:823586 rs140201358 g c 0.01 5.48 4.25E−08 4.69E−01
TAMM41/SYN2 3:11916108 rs559803897 c t 0.99 5.47 4.50E−08 9.61E−01

TABLE 3
Independent GOLDPlus European meta-analysis NAFLD variants
Multiethnic cohort
Covariates N Value in UKBB
mean age (SD) years 43,293 64.2 (7.7)
% female 43,293 51.5
Diseases N Value in UKBB
mean PDFF (SD) 43,293  3.9 (4.3)
European cohort
Covariates N Value in UKBB
mean age (SD) years 41,834 64.3 (7.7)
% female 41,834 51.7
Diseases N Value in UKBB
mean PDFF (SD) 41,834  3.9 (4.3)
Each row gives number of UKBB participants for which a measurement is available/characteristic is known (N); and, the value, as either mean with standard deviation (SD), or N for cases and controls.

TABLE 4
Univariable and multivariable predictors of incident cirrhosis in the Michigan Genomics Initiative cohort
Predictor  Univariable hazard ratio (95% confidence interval)  Univariable P value  Multivariable hazard ratio (95% confidence interval)  Multivariable P value
Diabetes 2.14 (1.60-2.85) 3.00E−07 2.01 (1.43-2.83) 5.70E−05
Body mass index
Lean/overweight (Referent) (Referent)
Obese 1.83 (1.26-2.64) 0.0014 1.50 (1.04-2.18) 0.031
Alanine aminotransferase
<2x ULN (Referent) (Referent)
>=2x ULN 2.00 (1.48-2.69)  5.2e−06 1.49 (1.06-2.10) 0.024
PNPLA3-rs738409 genotype
CC (Referent) (Referent)
CG 1.45 (1.06-1.98) 0.02 1.43 (1.00-2.06) 0.052
GG 3.48 (2.32-5.22) 1.70E−09 3.24 (2.01-5.23) 1.50E−06
TRIB1-rs28601761 genotype
GG (Referent) (Referent)
GC 1.44 (0.87-2.38) 0.15 1.20 (0.69-2.11) 0.52
CC 2.15 (1.30-3.53) 0.0026 1.91 (1.10-3.32) 0.022
Models were run as Fine-Gray competing risk analyses. Results are shown as hazard ratio (95% confidence interval). In univariable models, effect of each specific predictor is shown after adjustment for age, sex, and genetic principal components 1-10 to account for ethnic variation. Multivariable results indicate hazard ratios for each predictor additionally adjusted for all of the other predictors shown in this table. ULN, upper limit of normal, defined as 19 U/L for women and 30 U/L for men.

TABLE 5
Cumulative incidence of cirrhosis stratified by PNPLA3 genotype, in patients without baseline advanced fibrosis, in the Michigan Genomics Initiative and UK Biobank
Cohort  PNPLA3-rs738409-CC (lowest risk) or -CG  PNPLA3-rs738409-GG (highest risk)  P value
UK Biobank
All 0.37 (0.31-0.44) 0.61 (0.36-0.97) 0.042
Diabetes 1.00 (0.69-1.41) 3.40 (1.55-6.45) 0.00061
Obesity 0.42 (0.32-0.53) 1.27 (0.74-2.03) <0.0001
ALT >= 2x ULN 0.58 (0.46-0.73) 0.82 (0.41-1.47) 0.29
Clinical risk score
Low 0.27 (0.21-0.34) 0.15 (0.03-0.43) 0.28
Intermediate 0.53 (0.38-0.72) 1.31 (0.63-2.40) 0.0091
High 1.78 (1.00-2.94)  5.78 (1.88-13.50) 0.017
Michigan Genomics Initiative
All 3.15 (2.59-3.80)  8.89 (5.75-13.12) <0.0001
Diabetes 5.09 (3.70-6.83) 16.43 (8.20-29.40) <0.0001
Obesity 3.70 (2.80-4.80)  9.21 (4.76-16.09) 0.0022
ALT >= 2x ULN 4.16 (3.00-5.62) 13.85 (8.07-22.17) <0.0001
Clinical risk score
Low 2.27 (1.62-3.08)  6.32 (2.73-12.45) 0.0059
Intermediate 3.96 (2.74-5.53)  8.89 (3.57-18.31) 0.035
High  9.09 (4.54-16.26) 22.89 (4.72-66.90) 0.14
Cumulative incidence is shown as per 1,000 person-years (95% confidence interval), in the overall cohort and among patients with/without diabetes, obesity, or elevated alanine aminotransferase (ALT), and across the range of clinical risk score. Clinical risk score: low risk includes patients with no diabetes and no more than one of ALT >= 2x ULN or obesity; high risk includes those with diabetes, obesity, and ALT >= 2x ULN; and intermediate risk indicates all other patients. P value is for the association between PNPLA3 genotype (defined as rs738409-CC or -CG vs. -GG) and cumulative incidence of cirrhosis within each subgroup. ULN, upper limit of normal, defined as 19 U/L for women or 30 U/L for men. Absence of baseline advanced fibrosis was defined as baseline Fibrosis-4 score <2.67.

TABLE 6
Top NAFLD SNPs
Chromosome Position EA OA SNPID Nearest Gene
1 66554145 T C rs11208797 PDE4B
1 110650174 A G rs4839136 LINC01397
1 172354992 C T rs10752943 DNM3
1 219448378 C T rs12137855 LYPLAL1
1 220970028 G A rs2642438 MTARC1
1 235327523 G A rs112879517 ARID4B
2 21383514 G A rs1712246 TDRD15
2 25623603 G A rs114018216 DTNB
2 27169393 G A rs149219797 DPYSL5
2 27730940 T C rs1260326 GCKR
2 99738961 A G rs6741772 MRPL30
2 106914285 C T rs34071542 LOC402096
2 113841030 A G rs6734238 IL1RN
2 137655519 T C rs12999325 THSD7B
2 165528876 C T rs13389219 GRB14
3 5727851 A G rs1840069 MIR4790
3 12329783 C T rs17036160 PPARG
3 50208406 C G rs3774750 SEMA3F
4 17880416 C A rs7700107 LCORL
4 77173739 T C rs75132248 FAM47E, FAM47E-STBD1
4 88230100 T G rs10433937 HSD17B13
4 92929643 A C rs116160256 LNCPRESS2
4 100239319 C T rs1229984 ADH1B
4 100505326 A T rs7661964 MTTP
4 103710930 G A rs223454 LOC102723704
5 22988560 A C rs72750636 CDH12
5 148342399 T C rs2400785 SH3TC2
6 25818755 G A rs9461218 SLC17A1
6 31587870 T A rs2857694 PRRC2A
6 119484820 C G rs601575 MAN1A1
6 127383860 T A rs1936811 RSPO3
7 10521339 T C rs58074807 MGC4859
7 84532205 C G rs782894 SEMA3D
7 98980659 T G rs11973460 ARPC1B
8 6577140 T G rs2911980 AGPAT5
8 9183596 A G rs4841132 LOC157273
8 19824492 T C rs13702 LPL
8 126482077 A G rs2954021 TRIB1, LINC00861
9 10462423 G A rs10756038 PTPRD
9 15194625 C T rs613981 TTC39B
9 16792621 A G rs12553314 BNC2
9 33109149 A C rs13296330 MIR12117
9 132566666 G A rs7029757 TOR1B
10 36070931 C T rs7073191 PCAT5
10 78726447 G C rs118028160 KCNMA1-AS1
10 101912064 T C rs2862954 ERLIN1
10 113949664 T C rs10787429 GPAM, TECTB
10 135378544 T C rs9630002 SYCE1
11 823586 G C rs140201358 PNPLA2
11 122013169 C T rs531897 MIR100HG
12 19149829 T C rs10505835 PLEKHA5
12 21499248 T C rs75208026 SLCO1A2
12 82554772 A G rs75159697 LINC02426
12 85105077 C T rs10862921 SLC6A15
12 97557708 G T rs7307068 NEDD1
12 121424861 A G rs7310409 HNF1A
12 124506631 T C rs10773049 ZNF664-RFLNA
13 51106522 T A rs1239948 DLEU1
13 111019462 A C rs4773169 COL4A2
14 30067638 A G rs7146602 PRKD1
14 94844947 T C rs28929474 SERPINA1
15 73645403 G A rs11630240 HCN4
16 53806453 G A rs56094641 FTO
16 68644795 A G rs11643361 CDH3
17 17979099 T C rs4561528 MYO15A
17 64210580 C A rs1801689 APOH
19 7218635 A T rs112630404 INSR
19 18229208 T G rs56252442 MAST3
19 19379549 T C rs58542926 TM6SF2
19 33889593 A G rs7256564 PEPD
19 45411941 T C rs429358 APOE
19 54677001 C G rs626283 TMC4/MBOAT7
20 62336258 T C rs6062497 ARFRP1
22 17649774 C T rs5748926 IL17RA
22 44324727 G C rs738409 PNPLA3

TABLE 7
CRISPRa and CRISPRi gRNAs
Gene Target  CRISPRa gRNA SEQ ID NOs  CRISPRi gRNA SEQ ID NOs  CRISPR-KO gRNA SEQ ID NOs (two gene entries per row)
CEBPA 31-45 10177-10191 20358-20372 ACSL3  1-15 10147-10161 20328-20342
DGAT2 46-60 10192-10206 20373-20387 SCAP 16-30 10162-10176 20343-20357
NUDT10 61-75 10207-10221 20388-20402 FBXL14 661-675 10807-10821 20988-21002
USP22 76-90 10222-10236 20403-20417 CD27 676-690 10822-10836 21003-21017
FAM47E  91-105 10237-10251 20418-20432 C5AR1 691-705 10837-10851 21018-21032
HRC 106-120 10252-10266 20433-20447 INTS6 706-720 10852-10866 21033-21047
PRADC1 121-135 10267-10281 20448-20462 LYZL2 721-735 10867-10881 21048-21062
IP6K1 136-150 10282-10296 20463-20477 MAD2L1BP 736-750 10882-10896 21063-21077
DCAF8L1 151-165 10297-10311 20478-20492 TAF2 751-765 10897-10911 21078-21092
TTLL12 166-180 10312-10326 20493-20507 SLC10A3 766-780 10912-10926 21093-21107
PCGF1 181-195 10327-10341 20508-20522 SEC31A 781-795 10927-10941 21108-21122
GAGE1 196-210 10342-10356 20523-20537 NTPCR 796-810 10942-10956 21123-21137
PLEKHF2 211-225 10357-10371 20538-20552 SCD 811-825 10957-10971 21138-21152
CHP1 226-240 10372-10386 20553-20567 CCDC146 826-840 10972-10986 21153-21167
HILPDA 241-255 10387-10401 20568-20582 PAX8 841-855 10987-11001 21168-21182
GRIK5 256-270 10402-10416 20583-20597 TMEM11 856-870 11002-11016 21183-21197
PRR7 271-285 10417-10431 20598-20612 SSTR5 871-885 11017-11031 21198-21212
B3GNT6 286-300 10432-10446 20613-20627 GRPR 886-900 11032-11046 21213-21227
PITPNA 301-315 10447-10461 20628-20642 GSN 901-915 11047-11061 21228-21242
JPH2 316-330 10462-10476 20643-20657 ATXN2L 916-930 11062-11076 21243-21257
MAZ 331-345 10477-10491 20658-20672 HDAC4 931-945 11077-11091 21258-21272
SLC4A2 346-360 10492-10506 20673-20687 ZNF831 946-960 11092-11106 21273-21287
CALHM2 361-375 10507-10521 20688-20702 PREB 961-975 11107-11121 21288-21302
XAGE1A 376-390 10522-10536 20703-20717 OR6C75 976-990 11122-11134 21303-21317
JUP 391-405 10537-10551 20718-20732 ACACA  991-1005 11135-11149 21318-21332
PRR5-ARHGAP8 406-420 10552-10566 20733-20747 PSME3IP1 1006-1020 11150-11164 21333-21347
ST8SIA5 1021-1035 11165-11179 21348-21362
RTCB 421-435 10567-10581 20748-20762 GPAT4 1036-1050 11180-11194 21363-21377
PHKG2 436-450 10582-10596 20763-20777 HOXD9 1051-1065 11195-11209 21378-21392
UPK1A 451-465 10597-10611 20778-20792 HNF4A 1066-1080 11210-11224 21393-21407
INPP5K 466-480 10612-10626 20793-20807 PCDHGA7 1081-1095 11225-11239 21408-21422
GAMT 481-495 10627-10641 20808-20822 MIR6738 1096-1110 11240-11254 21423-21432
MID1IP1 496-510 10642-10656 20823-20837 OR2A5 1111-1125 11255-11269 21433-21447
APOA4 511-525 10657-10671 20838-20852 NPB 1126-1140 11270-11284 21448-21462
POU2AF3 526-540 10672-10686 20853-20867 KRTAP1-5 1141-1154 11285-11299 21463-21477
TMEM134 541-555 10687-10701 20868-20882 KCNG2 1155-1169 11300-11314 21478-21492
AIFM3 556-570 10702-10716 20883-20897 ATIC 1170-1184 11315-11329 21493-21507
CD24 571-585 10717-10731 20898-20912 MLLT1 1185-1199 11330-11344 21508-21522
DHH 586-600 10732-10746 20913-20927 PRKAR1B 1200-1214 11345-11359 21523-21537
FEM1B 601-615 10747-10761 20928-20942 MIR6765 1215-1229 11360-11374 21538-21552
SETDB1 616-630 10762-10776 20943-20957 STK11 1230-1244 11375-11389 21553-21567
FCER1G 631-645 10777-10791 20958-20972 JTB 1245-1259 11390-11404 21568-21582
KLK4 646-660 10792-10806 20973-20987 ADCY9 1260-1274 11405-11419 21583-21597
TAF7 1305-1319 11450-11464 21628-21642 ZNF688 1275-1289 11420-11434 21598-21612
CXXC1 1320-1334 11465-11479 21643-21657 JAG1 1290-1304 11435-11449 21613-21627
MIR6893 1335-1349 11480-11494 21658-21672 MYOM2 1950-1964 12095-12109 22269-22283
VEGFD 1350-1364 11495-11509 21673-21687 LRRC71 1965-1979 12110-12124 22284-22298
SETD1A 1365-1379 11510-11524 21688-21702 BMPER 1980-1994 12125-12139 22299-22313
PMAIP1 1380-1394 11525-11539 21703-21717 P4HTM 1995-2009 12140-12154 22314-22328
USP46 1395-1409 11540-11554 21718-21732 TXNL1 2010-2024 12155-12169 22329-22343
MIR1471 1410-1424 11555-11569 21733-21743 B9D2 2025-2039 12170-12184 22344-22358
FGFBP2 1425-1439 11570-11584 21744-21758 AHRR 2040-2054 12185-12199 22359-22373
CHAD 1440-1454 11585-11599 21759-21773 OR6A2 2055-2069 12200-12214 22374-22388
KCNC3 1455-1469 11600-11614 21774-21788 HOXA13 2070-2084 12215-12229 22389-22403
SCX 1470-1484 11615-11629 21789-21803 USP39 2085-2099 12230-12244 22404-22418
SOX17 1485-1499 11630-11644 21804-21818 FKBP1B 2100-2114 12245-12259 22419-22433
RALGAPA1 1500-1514 11645-11659 21819-21833 SBSPON 2115-2129 12260-12274 22434-22448
NKX2-3 1515-1529 11660-11674 21834-21848 RPIA 2130-2144 12275-12289 22449-22463
OR2C3 1530-1544 11675-11689 21849-21863 PRDM13 2145-2159 12290-12304 22464-22478
KMT2D 1545-1559 11690-11704 21864-21878 ENO2 2160-2174 12305-12319 22479-22493
FRMD8 1560-1574 11705-11719 21879-21893 ANGPT1 2175-2189 12320-12334 22494-22508
IFNA8 1575-1589 11720-11734 21894-21908 BNIP1 2190-2204 12335-12349 22509-22523
CDYL2 1590-1604 11735-11749 21909-21923 B3GNT4 2205-2219 12350-12364 22524-22538
COL7A1 1605-1619 11750-11764 21924-21938 HLA-F 2220-2234 12365-12379 22539-22553
CLDN1 1620-1634 11765-11779 21939-21953 GSE1 2235-2249 12380-12394 22554-22568
SSX2 1635-1649 11780-11794 21954-21968 RASGEF1B 2250-2264 12395-12409 22569-22583
KLHL20 1650-1664 11795-11809 21969-21983 PCSK1N 2265-2279 12410-12424 22584-22598
ATP13A1 1665-1679 11810-11824 21984-21998 RAB11FIP1 2280-2294 12425-12439 22599-22613
EGLN3 1680-1694 11825-11839 21999-22013 POLDIP3 2295-2309 12440-12454 22614-22628
CREBZF 1695-1709 11840-11854 22014-22028 MIR190A 2310-2324 12455-12469 22629-22632
RBM10 1710-1724 11855-11869 22029-22043 TPSD1 2325-2339 12470-12484 22633-22647
COMP 1725-1739 11870-11884 22044-22058 RHBDF1 2340-2354 12485-12499 22648-22662
PTCHD4 1740-1754 11885-11899 22059-22073 CHD7 2355-2369 12500-12514 22663-22677
RIT2 1755-1769 11900-11914 22074-22088 KLF9 2370-2381 12515-12529 22678-22692
ALX4 1770-1784 11915-11929 22089-22103 METTL22 2382-2396 12530-12544 22693-22707
IL17D 1785-1799 11930-11944 22104-22118 AURKB 2397-2411 12545-12559 22708-22722
AMN1 1800-1814 11945-11959 22119-22133 TSHZ1 2412-2426 12560-12574 22723-22737
MIR378J 1815-1829 11960-11974 22134-22148 FLT3 2427-2441 12575-12589 22738-22752
NF2 1830-1844 11975-11989 22149-22163 HNF1A 2442-2456 12590-12604 22753-22767
INF2 1845-1859 11990-12004 22164-22178 DISP2 2457-2471 12605-12619 22768-22782
SLC26A10P 1860-1874 12005-12019 22179-22193 OTUD7B 2472-2486 12620-12634 22783-22797
FBXO5 1875-1889 12020-12034 22194-22208 SLC7A4 2487-2501 12635-12649 22798-22812
FBXO11 1890-1904 12035-12049 22209-22223 POLR2F 2502-2516 12650-12664 22813-22827
ZNF395 1905-1919 12050-12064 22224-22238 USF1 2517-2531 12665-12679 22828-22842
EEF2K 1920-1934 12065-12079 22239-22253 LRP10 2532-2546 12680-12694 22843-22857
NMRK2 1935-1949 12080-12094 22254-22268 KLF1 2547-2561 12695-12709 22858-22872
HAPSTR1 2592-2606 12740-12754 22903-22917 REPIN1 2562-2576 12710-12724 22873-22887
MIR6803 2607-2621 12755-12769 22918-22932 VSTM2A 2577-2591 12725-12739 22888-22902
ELFN2 2622-2636 12770-12784 22933-22947 FAM25C 3236-3250 13385-13399 23548-23562
MBTPS1 2637-2651 12785-12799 22948-22962 COX6A2 3251-3265 13400-13414 23563-23577
ALPK1 2652-2666 12800-12814 22963-22977 HUWE1 3266-3280 13415-13429 23578-23592
RBP5 2667-2681 12815-12829 22978-22992 MIR6857 3281-3295 13430-13444 23593-23607
CARD6 2682-2696 12830-12844 22993-23007 CRHR2 3296-3310 13445-13459 23608-23622
BRAT1 2697-2711 12845-12859 23008-23022 UHRF1 3311-3325 13460-13474 23623-23637
TRIM10 2712-2726 12860-12874 23023-23037 SPSB4 3326-3340 13475-13489 23638-23652
SH3BP5L 2727-2741 12875-12889 23038-23052 NOTCH1 3341-3355 13490-13504 23653-23667
SUDS3 2742-2756 12890-12904 23053-23067 NRL 3356-3370 13505-13519 23668-23682
THOC6 2757-2771 12905-12919 23068-23082 SSTR1 3371-3385 13520-13534 23683-23697
PCDHA12 2772-2785 12920-12934 23083-23097 GTF3C1 3386-3400 13535-13549 23698-23712
AREG 2786-2800 12935-12949 23098-23112 ITLN1 3401-3415 13550-13564 23713-23727
GSC 2801-2815 12950-12964 23113-23127 KCNIP3 3416-3430 13565-13579 23728-23742
TEX264 2816-2830 12965-12979 23128-23142 ZSWIM8 3431-3445 13580-13594 23743-23757
KDM4D 2831-2845 12980-12994 23143-23157 CPEB1 3446-3460 13595-13609 23758-23772
OTUD7A 2846-2860 12995-13009 23158-23172 OR52B4 3461-3473 13610-13624 23773-23787
ENTPD1 2861-2875 13010-13024 23173-23187 KCNV1 3474-3488 13625-13639 23788-23802
ARMC5 2876-2890 13025-13039 23188-23202 SLC35C2 3489-3503 13640-13654 23803-23817
IL27 2891-2905 13040-13054 23203-23217 KRTAP19-7 3504-3516 13655-13669 23818-23832
SLC16A9 2906-2920 13055-13069 23218-23232 SERPINC1 3517-3531 13670-13684 23833-23847
CYP7A1 2921-2935 13070-13084 23233-23247 SLC4A8 3532-3546 13685-13699 23848-23862
TBC1D10B 2936-2950 13085-13099 23248-23262 FMNL1 3547-3561 13700-13714 23863-23877
TUBA3C 2951-2965 13100-13114 23263-23277 ZMYND19 3562-3576 13715-13729 23878-23892
MED30 2966-2980 13115-13129 23278-23292 PCNX3 3577-3591 13730-13744 23893-23907
ALDH2 2981-2995 13130-13144 23293-23307 RBM47 3592-3606 13745-13759 23908-23922
CCR9 2996-3010 13145-13159 23308-23322 AKR1C3 3607-3621 13760-13774 23923-23937
MTDH 3011-3025 13160-13174 23323-23337 CD22 3622-3636 13775-13789 23938-23952
CNN2 3026-3040 13175-13189 23338-23352 ADRA2C 3637-3651 13790-13804 23953-23967
CEACAM4 3041-3055 13190-13204 23353-23367 SERPINE1 3652-3666 13805-13819 23968-23982
CLEC19A 3056-3070 13205-13219 23368-23382 POU3F2 3667-3681 13820-13834 23983-23997
TRPS1 3071-3085 13220-13234 23383-23397 CEACAM1 3682-3696 13835-13849 23998-24012
ZNF784 3086-3100 13235-13249 23398-23412 TCEA1 3697-3711 13850-13864 24013-24027
NMUR1 3101-3115 13250-13264 23413-23427 SPPL3 3712-3726 13865-13879 24028-24042
MTFR1 3116-3130 13265-13279 23428-23442 RAI14 3727-3741 13880-13894 24043-24057
DOCK10 3131-3145 13280-13294 23443-23457 NR2E1 3742-3756 13895-13909 24058-24072
GPR135 3146-3160 13295-13309 23458-23472 GLYR1 3757-3771 13910-13924 24073-24087
MROH8 3161-3175 13310-13324 23473-23487 B3GNTL1 3772-3786 13925-13939 24088-24102
PLPPR3 3176-3190 13325-13339 23488-23502 ZBTB20 3787-3801 13940-13954 24103-24117
NRM 3191-3205 13340-13354 23503-23517 BICDL2 3802-3816 13955-13969 24118-24132
TNIP2 3206-3220 13355-13369 23518-23532 ITGB1 3817-3831 13970-13984 24133-24147
WFDC10A 3221-3235 13370-13384 23533-23547 LTBP1 3832-3846 13985-13999 24148-24162
HEATR9 3877-3891 14030-14044 24193-24207 THBS4 3847-3861 14000-14014 24163-24177
ZNF511 3892-3906 14045-14059 24208-24222 TBC1D25 3862-3876 14015-14029 24178-24192
MED16 3907-3921 14060-14074 24223-24237 G6PC3 4520-4534 14675-14689 24838-24852
PCDHGA9 3922-3935 14075-14089 24238-24252 RBBP8NL 4535-4549 14690-14704 24853-24867
PRR15 3936-3950 14090-14104 24253-24267 DTYMK 4550-4564 14705-14719 24868-24882
MIR6752 3951-3965 14105-14119 24268-24282 HCLS1 4565-4579 14720-14734 24883-24897
ZNF837 3966-3980 14120-14134 24283-24297 MRPS26 4580-4594 14735-14749 24898-24912
PARP4 3981-3995 14135-14149 24298-24312 CYCS 4595-4609 14750-14764 24913-24927
HSPBP1 3996-4010 14150-14164 24313-24327 BLCAP 4610-4624 14765-14779 24928-24942
TRIM56 4011-4025 14165-14179 24328-24342 BRDT 4625-4639 14780-14794 24943-24957
LYZL1 4026-4040 14180-14194 24343-24357 DDX60 4640-4654 14795-14809 24958-24972
CREB3L2 4041-4055 14195-14209 24358-24372 CNN1 4655-4669 14810-14824 24973-24987
GJB6 4056-4070 14210-14224 24373-24387 TNNC1 4670-4684 14825-14839 24988-25002
FSCN2 4071-4085 14225-14239 24388-24402 EQTN 4685-4699 14840-14854 25003-25017
PDIK1L 4086-4100 14240-14254 24403-24417 HPS6 4700-4714 14855-14869 25018-25032
MIR7109 4101-4115 14255-14269 24418-24432 RNASEH2A 4715-4729 14870-14884 25033-25047
ACKR2 4116-4129 14270-14284 24433-24447 NRDC 4730-4744 14885-14899 25048-25062
TMIE 4130-4144 14285-14299 24448-24462 SSH1 4745-4759 14900-14914 25063-25077
KIF1A 4145-4159 14300-14314 24463-24477 ADGRG4 4760 25078-25092
IRF8 4160-4174 14315-14329 24478-24492 CSMD2 4761-4775 14915-14928 25093-25107
NLRP11 4175-4189 14330-14344 24493-24507 ABHD5 4776-4790 14930-14944 25108-25122
ATP8A1 4190-4204 14345-14359 24508-24522 DNASE1L3 4791-4805 14945-14959 25123-25137
DDT 4205-4219 14360-14374 24523-24537 PUM1 4806-4820 14960-14974 25138-25152
CKMT2 4220-4234 14375-14389 24538-24552 PPP2R2C 4821-4835 14975-14989 25153-25167
ACSM3 4235-4249 14390-14404 24553-24567 VPS72 4836-4850 14990-15004 25168-25182
STRAP 4250-4264 14405-14419 24568-24582 CGNL1 4851-4865 15005-15019 25183-25197
MIR6850 4265-4279 14420-14434 24583-24597 ACAD9 4866-4880 15020-15034 25198-25212
CEBPE 4280-4294 14435-14449 24598-24612 ASNS 4881-4895 15035-15049 25213-25227
PRPF4B 4295-4309 14450-14464 24613-24627 NAT14 4896-4910 15050-15064 25228-25242
GSDME 4310-4324 14465-14479 24628-24642 MRGBP 4911-4925 15065-15079 25243-25257
UBQLN3 4325-4339 14480-14494 24643-24657 MRPS18A 4926-4940 15080-15094 25258-25272
IQCF2 4340-4354 14495-14509 24658-24672 PRR20A 4941-4955 15095-15109 25273-25287
UBE2J2 4355-4369 14510-14524 24673-24687 MYCBPAP 4956-4970 15110-15124 25288-25302
INSL3 4370-4384 14525-14539 24688-24702 SAC3D1 4971-4985 15125-15139 25303-25317
RILPL2 4385-4399 14540-14554 24703-24717 SRSF10 4986-5000 15140-15154 25318-25332
HDAC3 4400-4414 14555-14569 24718-24732 MIR6878 5001-5013 15155-15169 25333-25339
PMPCA 4415-4429 14570-14584 24733-24747 FLCN 5014-5028 15170-15184 25340-25354
RFC2 4430-4444 14585-14599 24748-24762 MYBPHL 5029-5043 15185-15199 25355-25369
HID1 4445-4459 14600-14614 24763-24777 ZNG1A 5044-5058 15200-15214 25370-25384
RETREG3 4460-4474 14615-14629 24778-24792 OR5AR1 5059-5072 15215-15229 25385-25399
GRSF1 4475-4489 14630-14644 24793-24807 HUS1 5073-5087 15230-15244 25400-25414
HADHB 4490-4504 14645-14659 24808-24822 COL6A1 5088-5102 15245-15259 25415-25429
NDUFA6 4505-4519 14660-14674 24823-24837 SASS6 5103-5117 15260-15274 25430-25444
ELSPBP1 5148-5162 15305-15319 25474-25488 MIR6129 5118-5132 15275-15289 25445-25458
GCGR 5163-5177 15320-15334 25489-25503 PELO 5133-5147 15290-15304 25459-25473
RAB4B 5178-5192 15335-15349 25504-25518 ZZZ3 5808-5822 15965-15979 26132-26146
SLC13A2 5193-5207 15350-15364 25519-25533 SUMO4 5823-5837 15980-15994 26147-26161
MIR6825 5208-5222 15365-15379 25534-25548 HSF1 5838-5852 15995-16009 26162-26176
NEK9 5223-5237 15380-15394 25549-25563 SHOX2 5853-5867 16010-16024 26177-26191
CYB5D1 5238-5252 15395-15409 25564-25578 PSME3 5868-5882 16025-16039 26192-26206
MAP1LC3B 5253-5267 15410-15424 25579-25593 TOR1A 5883-5897 16040-16054 26207-26221
ZNF829 5268-5282 15425-15439 25594-25608 MKLN1 5898-5912 16055-16069 26222-26236
INSIG1 5283-5297 15440-15454 25609-25623 MROH2B 5913-5927 16070-16084 26237-26251
BLOC1S6 5298-5312 15455-15469 25624-25638 MRPL18 5928-5942 16085-16099 26252-26266
NAA38 5313-5327 15470-15484 25639-25653 SP6 5943-5957 16100-16114 26267-26281
TMX1 5328-5342 15485-15499 25654-25668 FUCA1 5958-5972 16115-16129 26282-26296
GIMAP8 5343-5357 15500-15514 25669-25683 DNAAF10 5973-5987 16130-16144 26297-26311
TARS2 5358-5372 15515-15529 25684-25698 WDR44 5988-6002 16145-16159 26312-26326
PTPRR 5373-5387 15530-15544 25699-25713 TBCD 6003-6017 16160-16174 26327-26341
ZNF654 5388-5402 15545-15559 25714-25728 SAYSD1 6018-6032 16175-16189 26342-26356
DNAH11 5403-5417 15560-15574 25729-25743 ATG3 6033-6047 16190-16204 26357-26371
CFAP36 5418-5432 15575-15589 25744-25758 CC2D1B 6048-6062 16205-16219 26372-26386
EIF4B 5433-5447 15590-15604 25759-25773 ZMPSTE24 6063-6077 16220-16234 26387-26401
EMC9 5448-5462 15605-15619 25774-25788 KLK9 6078-6092 16235-16249 26402-26416
HIGD1A 5463-5477 15620-15634 25789-25803 NBPF4 6093-6107 16250-16264 26417-26431
KMT2B 5478-5492 15635-15649 25804-25818 CCZ1B 6108-6122 16265-16279 26432-26446
SPTBN5 5493-5507 15650-15664 25819-25833 ODAD4 6123-6137 16280-16294 26447-26461
SCYL1 5508-5522 15665-15679 25834-25848 SYT13 6138-6152 16295-16309 26462-26476
TMEM199 5523-5537 15680-15694 25849-25863 ZFR 6153-6167 16310-16324 26477-26491
PNPT1 5538-5552 15695-15709 25864-25878 STK40 6168-6182 16325-16339 26492-26506
RBBP4 5553-5567 15710-15724 25879-25893 RASGEF1C 6183-6197 16340-16354 26507-26521
TBX21 5568-5582 15725-15739 25894-25908 NPRL2 6198-6212 16355-16369 26522-26536
ZRSR2 5583-5597 15740-15754 25909-25923 CTAGE4 6213-6227 16370-16384 26537-26551
LHX9 5598-5612 15755-15769 25924-25938 NAA10 6228-6242 16385-16399 26552-26566
HPCA 5613-5642 15770-15799 25939-25968 CSTF2 6243-6257 16400-16414 26567-26581
ORC5 5643-5657 15800-15814 25969-25983 NDUFAF3 6258-6272 16415-16429 26582-26596
CCDC172 5658-5672 15815-15829 25984-25998 RASL10B 6273-6287 16430-16444 26597-26611
CDC14A 5673-5687 15830-15844 25999-26013 UNC13C 6288-6302 16445-16459 26612-26626
ANGPTL6 5688-5702 15845-15859 26014-26028 WASHC1 6303-6317 16460-16474 26627-26641
RFC5 5703-5717 15860-15874 26029-26043 C16orf87 6318-6332 16475-16489 26642-26656
NSUN2 5718-5732 15875-15889 26044-26058 TVP23B 6333-6347 16490-16504 26657-26671
SLC25A12 5733-5747 15890-15904 26059-26073 TM4SF5 6348-6362 16505-16519 26672-26686
MIR6760 5748-5762 15905-15919 26074-26086 LSM11 6363-6377 16520-16534 26687-26701
RPS28 5763-5777 15920-15934 26087-26101 ATP11A 6378-6392 16535-16549 26702-26716
TMEM9B 5778-5792 15935-15949 26102-26116 CIDEB 6393-6407 16550-16564 26717-26731
NAA25 5793-5807 15950-15964 26117-26131 VPS18 6408-6422 16565-16579 26732-26746
H2AC16 6453-6467 16610-16624 26777-26791 FAM120A 6423-6437 16580-16594 26747-26761
NEDD8 6468-6482 16625-16639 26792-26806 PIGN 6438-6452 16595-16609 26762-26776
CFLAR 6483-6492 16640-16654 26807-26821 SYDE2 7090-7104 17254-17268 27422-27436
LRRC2 6493-6507 16655-16669 26822-26836 ASCL3 7105-7119 17269-17283 27437-27451
CCND1 6508-6522 16670-16684 26837-26851 SPATA21 7120-7134 17284-17298 27452-27466
MTMR2 6523-6537 16685-16699 26852-26866 PNPLA2 7135-7149 17299-17313 27467-27481
CTPS1 6538-6552 16700-16714 26867-26881 SULT1A4 7150-7164 17314-17328 27482-27496
RPLP0 6553-6567 16715-16729 26882-26896 FOXF1 7165-7179 17329-17343 27497-27511
NKAIN4 6568-6582 16730-16744 26897-26911 ADSS2 7180-7194 17344-17358 27512-27526
NOL10 6583-6597 16745-16759 26912-26926 ALYREF 7195-7209 17359-17373 27527-27541
MT1G 6598-6612 16760-16774 26927-26941 FDFT1 7210-7224 17374-17388 27542-27556
DUSP7 6613-6627 16775-16789 26942-26956 GABRB3 7225-7239 17389-17403 27557-27571
TRIR 6628-6642 16790-16804 26957-26971 MRGPRX3 7240-7254 17404-17418 27572-27586
HINT1 6643-6657 16805-16819 26972-26986 UNC45A 7255-7269 17419-17433 27587-27601
AGMO 6658-6672 16820-16834 26987-27001 HABP4 7270-7284 17434-17448 27602-27616
DAGLA 6673-6687 16835-16849 27002-27016 IRAG1 7285-7299 17449-17463 27617-27631
LRRC39 6688-6699 16850-16864 27017-27031 USP10 7300-7314 17464-17478 27632-27646
TRIM47 6700-6714 16865-16879 27032-27046 SPACA9 7315-7329 17479-17493 27647-27661
CATSPER3 6715-6729 16880-16894 27047-27061 VCAM1 7330-7344 17494-17508 27662-27676
CD151 6730-6744 16895-16909 27062-27076 ECM2 7345-7359 17509-17519 27677-27691
PSD4 6745-6759 16910-16924 27077-27091 GINS3 7360-7374 17520-17534 27692-27706
RNF17 6760-6774 16925-16939 27092-27106 ILK 7375-7389 17535-17549 27707-27721
IST1 6775-6789 16940-16954 27107-27121 COG4 7390-7404 17550-17564 27722-27736
TMPPE 6790-6804 16955-16969 27122-27136 KLHL1 7405-7419 17565-17579 27737-27751
FBXL3 6805-6819 16970-16984 27137-27151 HECW1 7420-7434 17580-17594 27752-27766
CD3G 6820-6834 16985-16999 27152-27166 GPR171 7435-7443 17595-17609 27767-27781
ZNF420 6835-6849 17000-17014 27167-27181 MTRNR2L1 7444-7458 17610-17624 30530-30531
LHFPL1 6850-6864 17015-17029 27182-27196 IFNW1 7459-7473 17625-17639 27782-27796
SOX9 6865-6879 17030-17044 27197-27211 MIR590 7474-7488 17640-17654 27797-27799
RSRC2 6880-6894 17045-17059 27212-27226 SSU72 7489-7503 17655-17669 27800-27814
CAMK1 6895-6909 17060-17074 27227-27241 MST1L 7504-7518 17670-17684 27815-27829
C2CD2L 6910-6924 17075-17089 27242-27256 TNFRSF13C 7519-7533 17685-17699 27830-27844
PHF2 6925-6939 17090-17104 27257-27271 MIR1243 7534-7546 17700-17714 27845-27851
CPSF3 6940-6954 17105-17119 27272-27286 SYNCRIP 7547-7561 17715-17729 27852-27866
MYH4 6955-6969 17120-17133 27287-27301 OR4C46 7562-7573 17730-17744 27867-27881
KLHDC4 6970-6984 17134-17148 27302-27316 NLRP13 7574-7583 17745-17759 27882-27896
DXO 6985-6999 17149-17163 27317-27331 SEC62 7584-7598 17760-17774 27897-27911
FCHO2 7000-7014 17164-17178 27332-27346 H4C11 7599-7613 17775-17789 27912-27926
RHOA 7015-7029 17179-17193 27347-27361 HTR3A 7614-7628 17790-17804 27927-27941
MIR1199 7030-7044 17194-17208 27362-27376 PAFAH1B2 7629-7643 17805-17819 27942-27956
FBXO10 7045-7059 17209-17223 27377-27391 DTNA 7644-7658 17820-17834 27957-27971
PROCA1 7060-7074 17224-17238 27392-27406 CTNNBL1 7659-7673 17835-17849 27972-27986
IGSF5 7075-7089 17239-17253 27407-27421 TGIF1 7674-7688 17850-17864 27987-28001
ZMYND8 7719-7733 17895-17909 28032-28046 RPN1 7689-7703 17865-17879 28002-28016
MEF2B 7734-7748 17910-17924 28047-28061 RBP2 7704-7718 17880-17894 28017-28031
CYB5D2 7749-7763 17925-17939 28062-28076 NAALADL1 8337-8351 18531-18545 28669-28683
GPR141 7764-7778 17940-17954 28077-28091 IFT43 8352-8366 18546-18560 28684-28698
RCN3 7779-7793 17955-17969 28092-28106 EMC6 8367-8381 18561-18575 28699-28713
TCF19 7794-7808 17970-17984 28107-28121 ZACN 8382-8396 18576-18590 28714-28728
TMEM217 7809-7823 17985-17999 28122-28136 DHX34 8397-8411 18591-18605 28729-28743
RAD9A 7824-7838 18000-18014 28137-28151 TARP 8412-8426 18606-18620 28744-28758
KANSL1 7839-7853 18015-18029 28152-28166 FRAT2 8427-8441 18621-18635 28759-28773
OR4F16 7854-7868 18030-18044 28167-28181 FIBIN 8442-8456 18636-18650 28774-28788
DHFR 7869-7883 18045-18059 28182-28196 DLX5 8457-8471 18651-18665 28789-28803
ZNF510 7884-7898 18060-18074 28197-28211 TRMT112 8472-8486 18666-18680 28804-28818
TMEM14EP 7899-7901 18075-18089 28212-28226 MRPS6 8487-8501 18681-18695 28819-28833
TICAM1 7902-7916 18090-18104 28227-28241 GPR85 8502-8516 18696-18710 28834-28848
CACNB2 7917-7931 18105-18119 28242-28256 GRAMD4 8517-8531 18711-18725 28849-28863
TMEM233 7932-7946 18120-18134 28257-28271 PSMD9 8532-8546 18726-18740 28864-28878
PRELID3B 7947-7961 18135-18149 28272-28286 NUDT8 8547-8561 18741-18755 28879-28893
DIDO1 7962-7976 18150-18164 28287-28301 POTEJ 8562-8576 18756-18770 28894-28908
SPG21 7977-7991 18165-18179 28302-28316 ADAM19 8577-8591 18771-18785 28909-28923
MIR6721 7992-8006 18180-18194 28317-28331 SLC9A8 8592-8606 18786-18800 28924-28938
MAJIN 8007-8021 18195-18209 28332-28346 RPL9 8607-8621 18801-18815 28939-28953
GRM5 8022-8036 18210-18224 28347-28361 GUCA2B 8622-8636 18816-18830 28954-28968
OR5A2 8037-8051 18225-18239 28362-28376 PDE4B 8637-8651 18831-18845 28969-28983
SEMA6B 8052-8066 18240-18254 28377-28391 LINC01397 8652-8666 18846-18860 28984-28998
FHDC1 8067-8081 18255-18269 28392-28406 LINC00626 8667-8681 18861-18875 28999-29013
SLC6A20 8082-8096 18270-18284 28407-28421 DNM3 8682-8696 18876-18890 29014-29028
FAM169A 8097-8111 18285-18299 28422-28436 ZBTB41 8697-8711 18891-18905 29029-29043
CFAP77 8112-8126 18300-18314 28437-28451 MTARC1 8712-8726 18906-18920 29044-29058
ARF1 8127-8141 18315-18329 28452-28466 ARID4B 8727-8741 18921-18935 29059-29073
HTN1 8142-8156 18330-18335 28467-28473 TDRD15 8742-8756 18936-18950 29074-29088
MIR6785 8157-8171 18336-18350 28474-28488 DTNB 8757-8771 18951-18965 29089-29103
TESMIN 8172-8186 18351-18365 28489-28503 DPYSL5 8772-8786 18966-18980 29104-29118
SCNN1D 18366-18380 28504-28518 GCKR 8787-8801 18981-18995 29119-29133
C11orf86 8187-8201 18381-18395 28519-28533 MRPL30 8802-8816 18996-19010 29134-29148
DDI2 8202-8216 18396-18410 28534-28548 THSD7B 8817-8831 19011-19025 29149-29163
ZNF568 8217-8231 18411-18425 28549-28563 COBLL1 8832-8846 19026-19040 29164-29178
ADGRE3 8232-8246 18426-18440 28564-28578 MIR4790 8847-8861 19041-19047 29179-29180
PRPF38B 8247-8261 18441-18455 28579-28593 SEMA3F 8862-8876 19048-19062 29181-29195
SFMBT1 8262-8276 18456-18470 28594-28608 CPZ 8877-8891 19063-19077 29196-29210
CAPZB 8277-8291 18471-18485 28609-28623 LINC02494 8892-8901 19078-19092 29211-29225
LIN28B 8292-8306 18486-18500 28624-28638 LNCPRESS2 8902-8916 19093-19107 29226-29240
CNEP1R1 8307-8321 18501-18515 28639-28653 UNC5C 8917-8931 19108-19122 29241-29255
LDAH 8322-8336 18516-18530 28654-28668 ADH1B 8932-8946 19123-19133 29256-29270
CDH12 8977-8991 19164-19178 29301-29315 MTTP 8947-8961 19134-19148 29271-29285
SH3TC2 8992-9006 19179-19193 29316-29330 TRIM2 8962-8976 19149-19163 29286-29300
SLC17A1 9007-9021 19194-19208 29331-29345 C17orf113 9607-9621 19788-19802 29922-29936
HCG15 9022-9036 19209-19223 29346-29360 APOH 9622-9636 19803-19817 29937-29951
TRIM26 9037-9051 19224-19238 29361-29375 INSR 9637-9651 19818-19832 29952-29966
PRRC2A 9052-9066 19239-19253 29376-29390 JUND 9652-9666 19833-19847 29967-29981
HLA-DOA 9067-9081 19254-19268 29391-29405 TM6SF2 9667-9681 19848-19862 29982-29996
MAN1A1 9082-9096 19269-19283 29406-29420 APOE 9682-9696 19863-19877 29997-30011
RSPO3 9097-9111 19284-19298 29421-29435 TMC4 9697-9711 19878-19892 30012-30026
MGC4859 9112-9126 19299-19313 29436-29450 PYGB 9712-9726 19893-19907 30027-30041
AUTS2 9127-9141 19314-19328 29451-29465 CDH4 9727-9741 19908-19922 30042-30056
SEMA3D 9142-9156 19329-19343 29466-29480 ARFRP1 9742-9756 19923-19937 30057-30071
ARPC1B 9157-9171 19344-19358 29481-29495 MAP3K7CL 9757-9771 19938-19952 30072-30086
LINC02237 9172-9186 19359-19373 29496-29510 PNPLA3 9772-9786 19953-19967 30087-30101
TRIB1 9187-9201 19374-19388 29511-29525 ADIPOQ 9787-9801 19968-19982 30102-30116
PTPRD 9202-9216 19389-19403 29526-29540 LIPE 9802-9816 19983-19997 30117-30131
TTC39B 9217-9231 19404-19418 29541-29555 UCP1 9817-9831 19998-20012 30132-30146
BNC2 9232-9246 19419-19433 29556-29570 HSD17B13 9832-9846 20013-20027 30147-30161
MIR12117 9247-9261 19434-19448 29571-29576 MTARC2 9847-9861 20028-20042 30162-30176
GABBR2 9262-9276 19449-19463 29577-29591 MLXIP 9862-9876 20043-20057 30177-30191
TOR1B 9277-9291 19464-19478 29592-29606 LYPLAL1 9877-9891 20058-20072 30192-30206
ABO 9292-9306 19479-19493 29607-29621 TOR1AIP1 9892-9906 20073-20087 30207-30221
PCAT5 9307-9321 19494-19508 29622-29636 MLXIPL 9907-9921 20088-20102 30222-30236
ZNF487 9322-9336 19509-19523 29637-29651 CPT2 9922-9936 20103-20117 30237-30251
KCNMA1-AS1 9337-9351 19524-19538 29652-29666 PPARG 9937-9951 20118-20132 30252-30266
CWF19L1 9352-9366 19539-19553 29667-29681 TOR1AIP2 9952-9966 20133-20147 30267-30281
GPAM 9367-9381 19554-19568 29682-29696 CPT1A 9967-9981 20148-20162 30282-30296
SYCE1 9382-9396 19569-19583 29697-29711 LMNA 9982-9996 20163-20177 30297-30311
MIR100HG 9397-9411 19584-19598 29712-29726 ACAA2 9997-10011 20178-20192 30312-30326
PLEKHA5 9412-9426 19599-19613 29727-29741 SUN1 10012-10026 20193-20207 30327-30341
SLCO1A2 9427-9441 19614-19622 29742-29756 GRB14 10027-10041 20208-20222 30342-30356
LINC02426 9442-9456 19623-19637 29757-29771 TMPO 10042-10056 20223-20237 30357-30371
SLC6A15 9457-9471 19638-19652 29772-29786 HSD17B11 10057-10071 20238-20252 30372-30386
LINC02392 9472-9486 19653-19667 29787-29801 ERLIN1 10072-10086 20253-20267 30387-30401
NEDD1 9487-9501 19668-19682 29802-29816 PRKAA1 10087-10101 20268-20282 30402-30416
ZNF664-RFLNA 9502-9516 19683-19697 29817-29831 FASN 10102-10116 20283-20297 30417-30431
SERPINA1 10117-10131 20298-20312 30432-30446
DLEU1 9517-9531 19698-19712 29832-29846 APOB 10132-10146 20313-20327 30447-30461
ARGLU1 9532-9546 19713-19727 29847-29861 MIR4792 30462-30465
PRKD1 9547-9561 19728-19742 29862-29876 MIR4264 30466-30469
HCN4 9562-9576 19743-19757 29877-29891 LOC100289187 30470-30472
FTO 9577-9591 19758-19772 29892-29906 MIR5093 30473-30476
MYO15A 9592-9606 19773-19787 29907-29921 MIR3180-1 30477-30480
MIR34c 30489-30492 MIR5000 30481-30484
LOC100287534 30493-30495 MIR193b 30485-30488
MIR3678 30496-30499 MIR4461 30514-30517
MIR3159 30500-30503 MIR5701-1 30518-30521
MIR4673 30504-30507 MIR4787 30522-30525
LOC283403 30508-30510 MIR1224 30526-30529
LOC100287399 30511-30513 MIR4728 30532-30535
MIR101-1 30536-30539

REFERENCES

  • 1. Lazo M, Hernaez R, Eberhardt M S, et al. Prevalence of nonalcoholic fatty liver disease in the United States: the Third National Health and Nutrition Examination Survey, 1988-1994. Am J Epidemiol 2013; 178:38-45.
  • 2. Portillo Sanchez P, Bril F, Maximos M, et al. High Prevalence of Nonalcoholic Fatty Liver Disease in Patients with Type 2 Diabetes Mellitus and Normal Plasma Aminotransferase Levels. J Clin Endocrinol Metab 2014; 100.
  • 3. Crespo J, Fernandez-Gil P, Hernandez-Guerra M, et al. Are there predictive factors of severe liver fibrosis in morbidly obese patients with non-alcoholic steatohepatitis? Obes Surg 2001; 11:254-7.
  • 4. Younossi Z M, Blissett D, Blissett R, et al. The economic and clinical burden of nonalcoholic fatty liver disease in the United States and Europe. Hepatology 2016; 64:1577-1586.
  • 5. Romeo S, Kozlitina J, Xing C, et al. Genetic variation in PNPLA3 confers susceptibility to nonalcoholic fatty liver disease. Nat Genet 2008; 40:1461-5.
  • 6. Speliotes E K, Yerges-Armstrong L M, Wu J, et al. Genome-wide association analysis identifies variants associated with nonalcoholic fatty liver disease that have distinct effects on metabolic traits. PLOS Genet 2011; 7: e1001324.
  • 7. Luukkonen P K, Juuti A, Sammalkorpi H, et al. MARC1 variant rs2642438 increases hepatic phosphatidylcholines and decreases severity of non-alcoholic fatty liver disease in humans. J Hepatol 2020; 73:725-726.
  • 8. Parisinos C A, Wilman H R, Thomas E L, et al. Genome-wide and Mendelian randomisation studies of liver MRI yield insights into the pathogenesis of steatohepatitis. J Hepatol 2020; 73:241-251.
  • 9. Middleton M S, Heba E R, Hooker C A, et al. Agreement Between Magnetic Resonance Imaging Proton Density Fat Fraction Measurements and Pathologist-Assigned Steatosis Grades of Liver Biopsies From Adults With Nonalcoholic Steatohepatitis. Gastroenterology 2017; 153:753-761.
  • 10. Saadeh S, Younossi Z M, Remer E M, et al. The utility of radiological imaging in nonalcoholic fatty liver disease. Gastroenterology 2002; 123:745-50.
  • 11. Harris T B, Launer L J, Eiriksdottir G, et al. Age, Gene/Environment Susceptibility-Reykjavik Study: multidisciplinary applied phenomics. Am J Epidemiol 2007; 165:1076-87.
  • 12. Regan E A, Hokanson J E, Murphy J R, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD 2010; 7:32-43.
  • 13. Carr J J, Nelson J C, Wong N D, et al. Calcified coronary artery plaque measurement with cardiac CT in population-based studies: standardized protocol of Multi-Ethnic Study of Atherosclerosis (MESA) and Coronary Artery Risk Development in Young Adults (CARDIA) study. Radiology 2005; 234:35-43.
  • 14. Speliotes E K, Massaro J M, Hoffmann U, et al. Liver fat is reproducibly measured using computed tomography in the Framingham Heart Study. J Gastroenterol Hepatol 2008; 23:894-9.
  • 15. Daniels P R, Kardia S L, Hanis C L, et al. Familial aggregation of hypertension treatment and control in the Genetic Epidemiology Network of Arteriopathy (GENOA) study. Am J Med 2004; 116:676-81.
  • 16. Palmer N D, Goodarzi M O, Langefeld C D, et al. Genetic Variants Associated With Quantitative Glucose Homeostasis Traits Translate to Type 2 Diabetes in Mexican Americans; The GUARDIAN (Genetics Underlying Diabetes in Hispanics) Consortium. Diabetes 2015; 64:1853-66.
  • 17. Liu J, Musani S K, Bidulescu A, et al. Fatty liver, abdominal adipose tissue and atherosclerotic calcification in African Americans: the Jackson Heart Study. Atherosclerosis 2012; 224:521-5.
  • 18. Kramer H, Han C, Post W, et al. Racial/ethnic differences in hypertension and hypertension treatment and control in the multi-ethnic study of atherosclerosis (MESA). Am J Hypertens 2004; 17:963-70.
  • 19. Rampersaud E, Bielak L F, Parsa A, et al. The association of coronary artery calcification and carotid artery intima-media thickness with distinct, traditional coronary artery disease risk factors in asymptomatic adults. Am J Epidemiol 2008; 168:1016-23.
  • 20. Canela-Xandri O, Rawlik K, Tenesa A. An atlas of genetic associations in UK Biobank. Nature Genetics 2018; 50:1593-1599.
  • 21. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV 2016:770-778.
  • 22. Namjou B, Lingren T, Huang Y, et al. GWAS and enrichment analyses of non-alcoholic fatty liver disease identify new trait-associated genes and pathways across eMERGE Network. BMC Med 2019; 17:135.
  • 23. Chen V L, Chen Y, Du X, et al. Genetic variants that associate with cirrhosis have pleiotropic effects on human traits. Liver Int 2020; 40:405-415.
  • 24. Willer C J, Li Y, Abecasis G R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010; 26:2190-2191.
  • 25. Zhou W, Nielsen J B, Fritsche L G, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet 2018; 50:1335-1341.
  • 26. Dongiovanni P, Valenti L, Rametta R, et al. Genetic variants regulating insulin receptor signalling are associated with the severity of liver damage in patients with non-alcoholic fatty liver disease. Gut 2010; 59:267-73.
  • 27. Feitosa M F, Wojczynski M K, North K E, et al. The ERLIN1-CHUK-CWF19L1 gene cluster influences liver fat deposition and hepatic inflammation in the NHLBI Family Heart Study. Atherosclerosis 2013; 228:175-80.
  • 28. Chalasani N, Guo X, Loomba R, et al. Genome-wide association study identifies variants associated with histologic features of nonalcoholic Fatty liver disease. Gastroenterology 2010; 139:1567-76, 1576 e1-6.
  • 29. Eslam M, Hashem A M, Leung R, et al. Interferon-lambda rs12979860 genotype and liver fibrosis in viral and non-viral chronic liver disease. Nat Commun 2015; 6:6422.
  • 30. Wiedmann S, Fischer M, Koehler M, et al. Genetic variants within the LPIN1 gene, encoding lipin, are influencing phenotypes of the metabolic syndrome in humans. Diabetes 2008; 57:209-17.
  • 31. Shang X R, Song J Y, Liu P H, et al. GWAS-Identified Common Variants With Nonalcoholic Fatty Liver Disease in Chinese Children. J Pediatr Gastroenterol Nutr 2015; 60:669-74.
  • 32. Petta S, Grimaudo S, Camma C, et al. IL28B and PNPLA3 polymorphisms affect histological liver damage in patients with non-alcoholic fatty liver disease. J Hepatol 2012; 56:1356-62.
  • 33. Kitamoto T, Kitamoto A, Yoneda M, et al. Genome-wide scan revealed that polymorphisms in the PNPLA3, SAMM50, and PARVB genes are associated with development and progression of nonalcoholic fatty liver disease in Japan. Hum Genet 2013; 132:783-92.
  • 34. Anstee Q M, Darlay R, Cockell S, et al. Genome-wide association study of non-alcoholic fatty liver and steatohepatitis in a histologically characterised cohort. J Hepatol 2020; 73:505-515.
  • 35. Mancina R M, Dongiovanni P, Petta S, et al. The MBOAT7-TMC4 Variant rs641738 Increases Risk of Nonalcoholic Fatty Liver Disease in Individuals of European Descent. Gastroenterology 2016; 150:1219-1230 e6.
  • 36. Ma Y, Belyaeva O V, Brown P M, et al. 17-Beta Hydroxysteroid Dehydrogenase 13 Is a Hepatic Retinol Dehydrogenase Associated With Histological Features of Nonalcoholic Fatty Liver Disease. Hepatology 2019; 69:1504-1519.
  • 37. Park S L, Li Y, Sheng X, et al. Genome-Wide Association Study of Liver Fat: The Multiethnic Cohort Adiposity Phenotype Study. Hepatol Commun 2020; 4:1112-1123.
  • 38. Chen V L, Du X, Chen Y, et al. Genome-wide association study of serum liver enzymes implicates diverse metabolic and liver pathology. Nat Commun 2021; 12:816.
  • 39. Neale B. http://www.nealelab.is/uk-biobank/.
  • 40. Charrad M, Ghazzali N, Boiteau V, et al. NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software 2014; 61:1-36.
  • 41. Galili T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics 2015; 31:3718-20.
  • 42. Hemani G, Zheng J, Elsworth B, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife 2018; 7.
  • 43. Lawlor D A, Harbord R M, Sterne J A, et al. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med 2008; 27:1133-63.
  • 44. Pers T H, Karjalainen J M, Chan Y, et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun 2015; 6:5890.
  • 45. Andri S. DescTools: Tools for Descriptive Statistics. R package version 0.99.40, 2021.
  • 46. Lumley T, Brody J, Dupuis J, et al. Meta-analysis of a rare-variant association test. Stat Tech, University of Auckland 2012.
  • 47. Lee S, Emond M J, Bamshad M J, et al. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet 2012; 91:224-37.
  • 48. Yates A D, Achuthan P, Akanni W, et al. Ensembl 2020. Nucleic Acids Res 2020; 48: D682-D688.
  • 49. Zhou W, Zhao Z, Nielsen J B, et al. Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts. Nat Genet 2020; 52:634-639.
  • 50. Kahali B, Chen Y, Feitosa M F, et al. A Noncoding Variant Near PPP1R3B Promotes Liver Glycogen Storage and MetS, but Protects Against Myocardial Infarction. J Clin Endocrinol Metab 2021; 106:372-387.
  • 51. Landgraf K, Scholz M, Kovacs P, et al. FTO Obesity Risk Variants Are Linked to Adipocyte IRX3 Expression and BMI of Children-Relevance of FTO Variants to Defend Body Weight in Lean Children? PLOS One 2016; 11: e0161739.
  • 52. Liberzon A, Subramanian A, Pinchback R, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011; 27:1739-40.
  • 53. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 2020; 369:1318-1330.
  • 54. Polimanti R, Gelernter J. ADH1B: From alcoholism, natural selection, and cancer to the human phenome. Am J Med Genet B Neuropsychiatr Genet 2018; 177:113-125.
  • 55. Muenter M D, Perry H O, Ludwig J. Chronic vitamin A intoxication in adults. Hepatic, neurologic and dermatologic complications. Am J Med 1971; 50:129-36.
  • 56. Shin J Y, Hernandez-Ono A, Fedotova T, et al. Nuclear envelope-localized torsinA-LAP1 complex regulates hepatic VLDL secretion and steatosis. J Clin Invest 2019; 129:4885-4900.
  • 57. Innes H, Buch S, Hutchinson S, et al. Genome-Wide Association Study for Alcohol-Related Cirrhosis Identifies Risk Loci in MARC1 and HNRNPUL1. Gastroenterology 2020; 159:1276-1289 e7.
  • 58. Xia M, Chandrasekaran P, Rong S, et al. Hepatic Deletion of Mboat7 (Lpiat1) Causes Activation of SREBP-1c and Fatty Liver. J Lipid Res 2020.
  • 59. Chen Y, Chen C, Ke X, et al. Analysis of circulating cholesterol levels as a mediator of an association between ABO blood group and coronary heart disease. Circ Cardiovasc Genet 2014; 7:43-8.
  • 60. Wolpin B M, Kraft P, Xu M, et al. Variant ABO blood group alleles, secretor status, and risk of pancreatic cancer: results from the pancreatic cancer cohort consortium. Cancer Epidemiol Biomarkers Prev 2010; 19:3140-9.
  • 61. Zhong G C, Liu S, Wu Y L, et al. ABO blood group and risk of newly diagnosed nonalcoholic fatty liver disease: A case-control study in Han Chinese population. PLOS One 2019; 14: e0225792.
  • 62. Chambers J C, Zhang W, Sehmi J, et al. Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nat Genet 2011; 43:1131-8.
  • 63. Kathiresan S, Melander O, Guiducci C, et al. Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet 2008; 40:189-97.
  • 64. Beer N L, Tribble N D, McCulloch L J, et al. The P446L variant in GCKR associated with fasting plasma glucose and triglyceride levels exerts its effect through increased glucokinase activity in liver. Hum Mol Genet 2009; 18:4081-8.
  • 65. Ishizuka Y, Nakayama K, Ogawa A, et al. TRIB1 downregulates hepatic lipogenesis and glycogenesis via multiple molecular interactions. J Mol Endocrinol 2014; 52:145-58.
  • 66. Bauer R C, Sasaki M, Cohen D M, et al. Tribbles-1 regulates hepatic lipogenesis through posttranscriptional regulation of C/EBPalpha. J Clin Invest 2015; 125:3809-18.
  • 67. Agius L. Hormonal and Metabolite Regulation of Hepatic Glucokinase. Annu Rev Nutr 2016; 36:389-415.
  • 68. Nakajima S, Tanaka H, Sawada K, et al. Polymorphism of receptor-type tyrosine-protein phosphatase delta gene in the development of non-alcoholic fatty liver disease. J Gastroenterol Hepatol 2018; 33:283-290.
  • 69. Kozlitina J, Smagris E, Stender S, et al. Exome-wide association study identifies a TM6SF2 variant that confers susceptibility to nonalcoholic fatty liver disease. Nat Genet 2014; 46:352-6.
  • 70. Wang Y, Kory N, BasuRay S, et al. PNPLA3, CGI-58, and Inhibition of Hepatic Triglyceride Hydrolysis in Mice. Hepatology 2019; 69:2427-2441.
  • 71. Palmer N, Kahali B, Kuppa A, et al. Allele Specific Variation at APOE Increases Non-alcoholic Fatty Liver Disease and Obesity but Decreases Risk of Alzheimer's Disease and Myocardial Infarction. 2021.
  • 72. Hannah V C, Ou J, Luong A, et al. Unsaturated fatty acids down-regulate srebp isoforms 1a and 1c by two mechanisms in HEK-293 cells. J Biol Chem 2001; 276:4365-72.
  • 73. Abul-Husn N S, Cheng X, Li A H, et al. A Protein-Truncating HSD17B13 Variant and Protection from Chronic Liver Disease. N Engl J Med 2018; 378:1096-1106.
  • 74. Fox C S, Liu Y, White C C, et al. Genome-wide association for abdominal subcutaneous and visceral adipose reveals a novel locus for visceral fat in women. PLoS Genet 2012; 8: e1002695.
  • 75. Sirwi A, Hussain M M. Lipid transfer proteins in the assembly of apoB-containing lipoproteins. J Lipid Res 2018; 59:1094-1102.
  • 76. Burnett J R, Hooper A J, Hegele R A. Abetalipoproteinemia. In: Adam M P, Ardinger H H, Pagon R A, Wallace S E, Bean L J H, Mirzaa G, Amemiya A, eds. GeneReviews®. Seattle (WA), 1993.

Claims

1. A method comprising: analyzing a biological sample from a subject for ten to one hundred variants, wherein at least ten of the variants are from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, and mutations in MTTP.

2. The method of claim 1, wherein said at least ten of the variants comprises at least fifteen of the variants from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, and mutations in MTTP.

3. The method of claim 1, wherein said at least ten of the variants comprises each of the variants from the list rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, and mutations in MTTP.

4. The method of claim 1, wherein said at least ten of the variants consists of only the variants from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, and mutations in MTTP.

5. The method of claim 1, wherein said at least ten of the variants consists of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, and mutations in MTTP.

6. The method of any of claims 1-5, wherein said biological sample is obtained from a subject suspected of having nonalcoholic fatty liver disease.

7. The method of any of claims 1-6, wherein said biological sample is selected from the group consisting of blood, serum, plasma, saliva, tissue, hair, semen, and urine.

8. The method of any of claims 1-7, wherein said analyzing comprises directly detecting said variants using a molecule assay.

9. The method of claim 8, wherein the molecule assay is a hybridization assay or a sequencing assay.

10. The method of any of claims 1-9, wherein said analyzing comprises indirectly detecting said variants.

11. The method of claim 10, wherein said indirectly detecting comprises assessing gene expression or detecting a mutation in linkage disequilibrium with a variant.

12. A method of managing nonalcoholic fatty liver disease, comprising:

a) analyzing a biological sample from a subject for at least ten of the variants from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, and mutations in MTTP;

b) generating a fatty liver disease risk score based on the presence or absence of said variants; and

c) treating the subject with a nonalcoholic fatty liver disease intervention if said risk score indicates a predisposition to nonalcoholic fatty liver disease.

13. The method of claim 12, wherein said risk score is calculated using an algorithm that accounts for each of the analyzed variants.

14. The method of claim 12 or 13, wherein said risk score further is based on one or more of blood count, liver enzyme test data, liver function test data, hepatitis A test data, hepatitis C test data, celiac disease screening test data, fasting blood sugar, hemoglobin A1C data, and lipid profile data.

15. The method of any of claims 12-14, wherein said risk score further is based on one or more of abdominal ultrasound data, computerized tomography (CT) scanning data, magnetic resonance imaging (MRI) data, transient elastography data, and magnetic resonance elastography data.

16. The method of any of claims 12-15, wherein said treating comprises applying a weight loss regime.

17. The method of any of claims 12-16, wherein said treating comprises liver transplantation.

18. The method of any of claims 12-17, wherein said treating comprises administration of one or more active agents selected from the group consisting of an essential phospholipid; anti-diabetic agent; a dietary supplement; an antifibrotic agent; an anti-obesity agent; and any combination thereof.

19. A system comprising: a set of reagents that specifically detect ten to one hundred variants, wherein at least ten of the variants are from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, or a variant or marker in linkage disequilibrium therewith, and mutations in MTTP.

20. The system of claim 19, wherein said reagents comprise one or more primers or probes specific for said variants.

21. The system of claim 19 or 20, wherein said reagents comprise sequencing reagents.

22. The system of any of claims 19-21, wherein said reagents comprise a microarray.

23. A non-transitory computer-readable storage medium comprising an instruction, wherein when the instruction is run by at least one computer processor, the at least one processor performs operations comprising: a) receiving data identifying the presence or absence, in a biological sample, of at least ten variants from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, or a variant or marker in linkage disequilibrium therewith, and mutations in MTTP; b) generating a nonalcoholic fatty liver disease risk score from said data; and c) displaying or reporting said risk score.

25. A method of diagnosing fatty liver disease or predisposition to fatty liver disease comprising: analyzing a biological sample from a subject for at least ten variants from the list of rs738408, rs58542926, rs429358, rs1260326, rs28601761, rs4918722, rs2807834, rs7661964, rs1229984, rs7029757, rs17817449, rs79953491, rs112630404, rs626283, rs4561528, rs10756038, rs140201358, or a variant or marker in linkage disequilibrium therewith, and mutations in MTTP.
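
By way of illustration only, the operations recited in claims 12, 13, and 23 (receiving presence or absence data for at least ten of the listed variants, generating a risk score that accounts for each analyzed variant, and reporting the score) might be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed algorithm: the function names `risk_score` and `report`, the equal default weights, and the weighted-fraction scoring rule are hypothetical and do not appear in the application. Mutations in MTTP are omitted because the claims do not name specific MTTP variants.

```python
from typing import Mapping, Optional

# rs identifiers recited in the claims.
CLAIMED_VARIANTS = (
    "rs738408", "rs58542926", "rs429358", "rs1260326", "rs28601761",
    "rs4918722", "rs2807834", "rs7661964", "rs1229984", "rs7029757",
    "rs17817449", "rs79953491", "rs112630404", "rs626283", "rs4561528",
    "rs10756038", "rs140201358",
)
MIN_VARIANTS = 10  # the claims recite "at least ten" of the listed variants


def risk_score(calls: Mapping[str, bool],
               weights: Optional[Mapping[str, float]] = None) -> float:
    """Return a score in [0, 1] from variant presence/absence calls.

    ASSUMPTION: the score is the weighted fraction of analyzed variants
    that are present. The weights are hypothetical; the application does
    not disclose per-variant weights in this section.
    """
    # Keep only calls for variants that appear in the claimed list.
    analyzed = {v: bool(calls[v]) for v in CLAIMED_VARIANTS if v in calls}
    if len(analyzed) < MIN_VARIANTS:
        raise ValueError(
            f"need calls for at least {MIN_VARIANTS} listed variants, "
            f"got {len(analyzed)}"
        )
    supplied = weights or {}
    # Every analyzed variant contributes; unlisted weights default to 1.0.
    w = {v: float(supplied.get(v, 1.0)) for v in analyzed}
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    present = sum(w[v] for v, is_present in analyzed.items() if is_present)
    return present / total


def report(score: float, n_analyzed: int) -> str:
    """Format the score for display or reporting."""
    return f"Risk score: {score:.2f} ({n_analyzed} variants analyzed)"


if __name__ == "__main__":
    # Hypothetical input: ten variants analyzed, the first five present.
    example = {v: i < 5 for i, v in enumerate(CLAIMED_VARIANTS[:10])}
    print(report(risk_score(example), len(example)))
```

In this sketch, input with fewer than ten listed variants is rejected rather than scored, mirroring the "at least ten" limitation. Any threshold for deciding that a score "indicates a predisposition" would be a further assumption and is deliberately left out.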
