Patent application title:

METHODS AND PRODUCTS FOR PRODUCING ENGINEERED MAMMALIAN CELL LINES WITH AMPLIFIED TRANSGENES

Publication number:

US20180163233A1

Publication date:

2018-06-14

Application number:

15/783,243

Filed date:

2017-10-13

Abstract:

Methods of inserting genes into defined locations in the chromosomal DNA of cultured mammalian cell lines which are subject to gene amplification are disclosed. In particular, sequences of interest (e.g., genes encoding biotherapeutic proteins) are inserted proximal to selectable genes in amplifiable loci, and the transformed cells are subjected to selection to induce co-amplification of the selectable gene and the sequence of interest. The invention also relates to meganucleases, vectors and engineered cell lines necessary for performing the methods, to cell lines resulting from the application of the methods, and use of the cell lines to produce protein products of interest.

Inventors:

Derek JANTZ, Durham, NC, United States
Michael G. NICHOLSON, Chapel Hill, NC, United States
James Jefferson Smith, Morrisville, NC, United States

Assignee:

PRECISION BIOSCIENCES, INC., Durham, NC, United States

Classification:

C12N15/907 » CPC main

Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor; Recombinant DNA-technology; Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation; Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells

C12N9/93 » CPC further

Enzymes; Proenzymes; Compositions thereof ; Processes for preparing, activating, inhibiting, separating or purifying enzymes Ligases (6)

C12Y603/01002 » CPC further

Ligases forming carbon-nitrogen bonds (6.3); Acid-ammonia (or amine)ligases (amide synthases)(6.3.1) Glutamate-ammonia ligase (6.3.1.2)

C12Y105/01003 » CPC further

Oxidoreductases acting on the CH-NH group of donors (1.5) with NAD+ or NADP+ as acceptor (1.5.1) Dihydrofolate reductase (1.5.1.3)

C12N9/003 » CPC further

Enzymes; Proenzymes; Compositions thereof ; Processes for preparing, activating, inhibiting, separating or purifying enzymes; Oxidoreductases (1.) acting on nitrogen containing compounds as donors (1.4, 1.5, 1.6, 1.7) acting on CH-NH groups of donors (1.5) with NAD or NADP as acceptor (1.5.1) Dihydrofolate reductase [DHFR] (1.5.1.3)

C12Y301/04 » CPC further

Hydrolases acting on ester bonds (3.1) Phosphoric diester hydrolases (3.1.4)

C12N15/90 IPC

Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor; Recombinant DNA-technology; Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation Stable introduction of foreign DNA into chromosome

C12N9/00 IPC

Enzymes; Proenzymes; Compositions thereof ; Processes for preparing, activating, inhibiting, separating or purifying enzymes

C12N9/22 » CPC further

Enzymes; Proenzymes; Compositions thereof ; Processes for preparing, activating, inhibiting, separating or purifying enzymes; Hydrolases (3) acting on ester bonds (3.1) Ribonucleases RNAses, DNAses

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/806,175, filed Jul. 22, 2015, which is a continuation of U.S. patent application Ser. No. 14/091,572, filed Nov. 27, 2013, which is a continuation of International Application No. PCT/US2012/040599, filed Jun. 1, 2012, which claims priority to U.S. Provisional application No. 61/492,174 filed Jun. 1, 2011, the disclosures of all of which are hereby incorporated by reference in their entireties for all purposes.

FIELD OF THE INVENTION

The invention relates to the field of molecular biology and recombinant nucleic acid technology. In particular, the invention relates to methods of inserting genes into defined locations in the chromosomal DNA of cultured mammalian cell lines which are subject to gene amplification. The invention also relates to meganucleases, vectors and engineered cell lines necessary for performing the methods, cell lines resulting from the application of the methods, and use of the cell lines to produce protein products of interest.

BACKGROUND OF THE INVENTION

Therapeutic proteins are the primary growth driver in the global pharmaceutical market (Kresse, Eur J Pharm Biopharm 72, 479 (2009)). In 2001, biopharmaceuticals accounted for $24.3 billion in sales. By 2007, this number had more than doubled to $54.5 billion. The market is currently estimated to reach $78 billion by 2012 (Pickering, Spectrum Pharmaceutical Industry Dynamics Report, Decision Resources, Inc., 5 (2008)). This includes sales of “blockbuster” drugs such as erythropoietin, tissue plasminogen activator, and interferon, as well as numerous “niche” drugs such as enzyme replacement therapies for lysosomal storage disorders. The unparalleled growth in market size, however, is driven primarily by skyrocketing demand for fully human and humanized monoclonal antibodies (Reichert, Curr Pharm Biotechnol 9, 423 (2008)). Because they have the ability to confer a virtually unlimited spectrum of biological activities, monoclonal antibodies are quickly becoming the most powerful class of therapeutics available to physicians. Not surprisingly, more than 25% of the molecules currently undergoing clinical trials in the United States and Europe are monoclonal antibodies (Reichert, Curr Pharm Biotechnol 9, 423 (2008)).

Unlike more traditional pharmaceuticals, therapeutic proteins are produced in living cells. This greatly complicates the manufacturing process and introduces significant heterogeneity into product formulations (Field, Recombinant Human IgG Production from Myeloma and Chinese Hamster Ovary Cells, in Cell Culture and Upstream Processing, Butler, ed., (Taylor and Francis Group, New York, 2007)). In addition, protein drugs are typically required at unusually high doses, which necessitates highly scalable manufacturing processes and makes manufacturing input costs a major price determinant. For these reasons, treatment with a typical therapeutic antibody (e.g., the anti-HER2-neu monoclonal Herceptin®) costs $60,000-$80,000 for a full course of treatment (Fleck, Hastings Center Report 36, 12 (2006)). Further complicating the economics of biopharmaceutical production is the fact that many of the early blockbuster biopharmaceuticals are off-patent (or will be off-patent soon) and the US and EU governments are expected to greatly streamline the regulatory approval process for “biogeneric” and “biosimilar” therapeutics (Kresse, Eur J Pharm Biopharm 72, 479 (2009)). These factors should lead to a significant increase in competition for sales of many prominent biopharmaceuticals (Pickering, Spectrum Pharmaceutical Industry Dynamics Report, Decision Resources, Inc., 5 (2008)). Therefore, there is enormous interest in technologies which reduce manufacturing costs of protein therapeutics (Seth et al., Curr Opin Biotechnol 18, 557 (2007)).

Many of the protein pharmaceuticals on the market are glycoproteins that cannot readily be produced in easy-to-manipulate biological systems such as bacteria or yeast. For this reason, recombinant therapeutic proteins are produced almost exclusively in mammalian cell lines, primarily Chinese hamster ovary (e.g., CHO-K1), mouse myeloma (e.g., NSO), baby hamster kidney (BHK), murine C127, human embryonic kidney (e.g., HEK-293), or human retina-derived (e.g., PER-C6) cells (Andersen and Krummen, Curr Opin Biotechnol 13, 117 (2002)). Of these, CHO cells are, by far, the most common platform for bioproduction because they offer the best combination of high protein expression levels, short doubling time, tolerance to a wide range of media conditions, established transfection and amplification protocols, an inability to propagate most human pathogens, a paucity of blocking intellectual property, and the longest track record of FDA approval (Field, Recombinant Human IgG Production from Myeloma and Chinese Hamster Ovary Cells, in Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)).

Large-market biopharmaceuticals are typically produced in enormous stirred-tank bioreactors containing hundreds of liters of CHO cells stably expressing the protein product of interest (Chu and Robinson, Curr Opin Biotechnol 12, 180 (2001), Coco-Martin and Harmsen, Bioprocess International 6, 28 (2008)). Under optimized industrial conditions, such manufacturing processes can yield in excess of 5 g of protein per liter of cells per day (Coco-Martin and Harmsen, Bioprocess International 6, 28 (2008)). Because of the large number of cells involved (~50 billion cells per liter), the level of protein expression per cell has a very dramatic effect on yield. For this reason, all of the cells involved in the production of a particular biopharmaceutical must be derived from a single “high-producer” clone, the production of which constitutes one of the most time- and resource-intensive steps in the manufacturing process (Clarke and Compton, Bioprocess International 6, 24 (2008)).
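As a rough illustration of why per-cell expression matters, the two figures quoted above (about 5 g per liter per day and about 50 billion cells per liter) imply an average productivity on the order of 100 picograms per cell per day. The short Python sketch below only restates that arithmetic; the function and variable names are illustrative and do not come from the cited references.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the text.
# Names are illustrative assumptions, not taken from the cited sources.
YIELD_G_PER_L_PER_DAY = 5.0   # "in excess of 5 g of protein per liter ... per day"
CELLS_PER_L = 50e9            # "~50 billion cells per liter"
PG_PER_G = 1e12               # picograms per gram


def per_cell_productivity_pg(yield_g_per_l_day: float, cells_per_l: float) -> float:
    """Average protein output in picograms per cell per day."""
    if cells_per_l <= 0:
        raise ValueError("cells_per_l must be positive")
    return yield_g_per_l_day / cells_per_l * PG_PER_G


pcd = per_cell_productivity_pg(YIELD_G_PER_L_PER_DAY, CELLS_PER_L)
print(f"approx. {pcd:.0f} pg per cell per day")
```

Because volumetric yield scales linearly with this per-cell figure at a fixed cell density, a clone that expresses twice as much protein per cell would, under the same simplifying assumptions, double the output of the bioreactor.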

The first step in the large-scale manufacture of a biopharmaceutical is the transfection of mammalian cells with plasmid DNA encoding the protein product of interest under the control of a strong constitutive promoter. Stable transfectants are selected by using a selectable marker gene also carried on the plasmid. Most frequently, this marker is a dihydrofolate reductase (DHFR) gene which, when transfected into a DHFR deficient cell line such as DG44, allows for the selection of stable transfectants using media deficient in hypoxanthine. The primary reason for using DHFR as a selectable marker is that it enables a process called “gene amplification”. By growing stable transfectants in gradually increasing concentrations of methotrexate (MTX), a DHFR inhibitor, it is possible to amplify the number of copies of the DHFR gene present in the genome. Because the gene encoding the protein product of interest is physically coupled to the DHFR gene, this results in amplification of both genes with a concomitant increase in the expression level of the therapeutic protein (Butler, Cell Line Development for Culture Strategies: Future Prospects to Improve Yields, in Cell Culture and Upstream Processing, Butler, ed., (Taylor and Francis Group, New York, 2007)). Related systems for the creation of stable bioproduction lines use the glutamine synthetase (GS) or hypoxanthine phosphoribosyltransferase (HPRT) genes as selectable markers and require the use of GS- or HPRT- deficient cell lines as hosts for transfection (Clarke and Compton, Bioprocess International 6, 24 (2008)). In the case of the GS system, gene amplification is accomplished by growing cells in the presence of methionine sulphoximine (MSX) (Clarke and Compton, Bioprocess International 6, 24 (2008)). In the case of the HPRT system, gene amplification is accomplished by growing cells in HAT medium, which contains aminopterin, hypoxanthine, and thymidine (Kellems, ed. Gene amplification in mammalian cells: a comprehensive guide, Marcel Dekker, New York, 1993).

In all of these systems, the initial plasmid DNA comprising a biotherapeutic gene expression cassette and a selectable marker integrates into a random location in the genome, resulting in extreme variability in therapeutic protein expression from one stable transfectant to another (Collingwood and Urnov, Targeted Gene Insertion to Enhance Protein Production from Cell Lines, in Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). For this reason, it is necessary to screen hundreds to thousands of initial transfectants to identify cells which express acceptably high levels of gene product both before and after gene amplification (Butler, Cell Line Development for Culture Strategies: Future Prospects to Improve Yields, in Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). A second and more problematic consequence of random gene integration is the phenomenon of transgene silencing, in which recombinant protein expression slows or ceases entirely over time (Collingwood and Urnov, Targeted Gene Insertion to Enhance Protein Production from Cell Lines, in Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). Because these effects often do not manifest themselves for weeks to months following the initial transfection and screening process, it is generally necessary to carry and expand dozens of independent clonal lines to identify one that expresses the protein of interest consistently over time (Butler, Cell Line Development for Culture Strategies: Future Prospects to Improve Yields, in Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)).

This large number of screening and expansion steps results in a very lengthy and expensive process to simply generate the cell line that will, ultimately, produce the therapeutic of interest. Indeed, using conventional methods, a minimum of 10 months (with an average of 18 months) and an upfront investment of tens of millions of dollars in labor and material is required to produce an initial pool of protein-expressing cells suitable for industrial manufacturing (Butler, Cell Line Development for Culture Strategies: Future Prospects to Improve Yields, in Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). If one takes into account lost time on market for a blockbuster protein therapeutic, inefficiencies in cell line production can cost biopharmaceutical manufacturers hundreds of millions of dollars (Seth et al., Curr Opin Biotechnol 18, 557 (2007)).

Much of the time and expense of bioproduction cell line creation can be attributed to random genomic integration of the bioproduct gene resulting in clone-to-clone variability in genotype and, hence, variability in gene expression. One way to overcome this is to target gene integration to a defined location that is known to support a high level of gene expression. To this end, a number of systems have been described which use the Cre, Flp, or ΦC31 recombinases to target the insertion of a bioproduct gene (reviewed in Collingwood and Urnov, Targeted Gene Insertion to Enhance Protein Production from Cell Lines, in Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). Recent embodiments of these systems, most notably the Flp-In® system marketed by Invitrogen Corp. (Carlsbad, Calif.), couple bioproduct gene integration with the reconstitution of a split selectable marker so that cells with correctly targeted genes can be selected. As expected, these systems result in greatly reduced heterogeneity in gene expression and, in some cases, individual stable transfectants can be pooled, obviating the time and expense associated with expanding a single clone.

The principal drawback to recombinase-based gene targeting systems is that the recombinase recognition sites (loxP, FRT, or attB/attP sites) do not naturally occur in mammalian genomes. Therefore, cells must be pre-engineered to incorporate a recognition site for the recombinase before that site can be subsequently targeted for gene insertion. Because the recombinase site itself integrates randomly into the genome, it is still necessary to undertake extensive screening and evaluation to identify clones which carry the site at a location that is suitable for high level, long-term gene expression (Collingwood and Urnov, Targeted Gene Insertion to Enhance Protein Production from Cell Lines, in Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). In addition, the biomanufacturing industry is notoriously hesitant to adopt “new” cell lines, such as those that have been engineered to carry a recombinase site, that do not have a track record of FDA approval. For these reasons, recombinase-based cell engineering systems may not readily be adopted by the industry and an approach that allows biomanufacturers to utilize their existing cell lines is preferable.

SUMMARY OF THE INVENTION

The present invention depends, in part, upon the development of mammalian cell lines in which sequences of interest (e.g., exogenous, actively transcribed transgenes) are inserted proximal to an endogenous selectable gene in an amplifiable locus, and the discovery that (a) the insertion of such exogenous sequences of interest does not inhibit amplification of the endogenous selectable gene, (b) the exogenous sequence of interest can be co-amplified with the endogenous selectable gene, and (c) the resultant cell lines, with an amplified region comprising multiple copies of the endogenous selectable gene and the exogenous sequence of interest, are stable for extended periods even in the absence of the selection regime which was employed to induce amplification. Thus, in one aspect, the invention provides a method for producing cell lines which can be used for biomanufacturing of a protein product of interest by specifically targeting the insertion of an exogenous sequence of interest capable of actively expressing the protein product of interest proximal to an endogenous selectable gene.

In another aspect, the invention provides engineered cell lines that can be used to produce protein products of interest (e.g., therapeutic proteins such as monoclonal antibodies) at high levels.

It is understood that any of the embodiments described below can be combined in any desired way, and any embodiment or combination of embodiments can be applied to each of the aspects described below, unless the context indicates otherwise.

In one aspect, the invention provides a recombinant mammalian cell comprising an engineered target site stably integrated within a selectable gene within an amplifiable locus, wherein the engineered target site disrupts the function of the selectable gene and wherein the engineered target site comprises a recognition sequence for a site specific endonuclease.

In some embodiments, the selectable gene is glutamine synthetase (GS) and the locus is methionine sulphoximine (MSX) amplifiable. In some embodiments, the selectable gene is dihydrofolate reductase (DHFR) and the locus is Methotrexate (MTX) amplifiable.

In some embodiments, the selectable gene is selected from the group consisting of Dihydrofolate Reductase, Glutamine Synthetase, Hypoxanthine Phosphoribosyltransferase, Threonyl tRNA Synthetase, Na,K-ATPase, Asparagine Synthetase, Ornithine Decarboxylase, Inosine-5′-monophosphate dehydrogenase, Adenosine Deaminase, Thymidylate Synthetase, Aspartate Transcarbamylase, Metallothionein, Adenylate Deaminase (1,2), UMP-Synthetase and Ribonucleotide Reductase.

In some embodiments, the selectable gene is amplifiable by selection with a selection agent selected from the group consisting of Methotrexate (MTX), Methionine sulphoximine (MSX), Aminopterin, hypoxanthine, thymidine, Borrelidin, Ouabain, Albizziin, Beta-aspartyl hydroxamate, alpha-difluoromethylornithine (DFMO), Mycophenolic Acid, Adenosine, Alanosine, 2′ deoxycoformycin, Fluorouracil, N-Phosphonacetyl-L-Aspartate (PALA), Cadmium, Adenine, Azaserine, Coformycin, 6-azauridine, pyrazofuran, hydroxyurea, motexafin gadolinium, fludarabine, cladribine, gemcitabine, tezacitabine and triapine.

In some embodiments, the engineered target site is inserted into an exon of the selectable gene. In some embodiments, the site specific endonuclease is a meganuclease, a zinc finger nuclease or a TAL effector nuclease. In some embodiments, the recombinant cell further comprises the site specific endonuclease.

In one aspect, the invention provides a recombinant mammalian cell comprising an engineered target site stably integrated proximal to a selectable gene within an amplifiable locus, wherein the engineered target site comprises a recognition sequence for a site specific endonuclease.

In some embodiments, the engineered target site is downstream from the 3′ regulatory region of the selectable gene. In some embodiments, the engineered target site is 0 to 100,000 base pairs downstream from the 3′ regulatory region of the selectable gene. In other embodiments, the engineered target site is upstream from the 5′ regulatory region of the selectable gene. In some embodiments, the engineered target site is 0 to 100,000 base pairs upstream from the 5′ regulatory region of the selectable gene.
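The distance ranges above can be expressed as a simple coordinate check. The following Python sketch is only an illustration under stated assumptions: positions are integer base-pair coordinates that increase in the 5′ to 3′ direction of the selectable gene, `boundary_pos` stands for the end of the relevant regulatory region, and all names are hypothetical rather than part of the disclosure.

```python
def in_flanking_window(site_pos: int,
                       boundary_pos: int,
                       direction: str,
                       max_distance: int = 100_000,
                       min_distance: int = 0) -> bool:
    """Return True if a target site lies within the allowed distance of a
    regulatory-region boundary.

    Assumes coordinates increase 5' to 3' along the selectable gene.
    direction="downstream" measures from the 3' regulatory region outward;
    direction="upstream" measures from the 5' regulatory region outward.
    """
    if direction == "downstream":
        offset = site_pos - boundary_pos
    elif direction == "upstream":
        offset = boundary_pos - site_pos
    else:
        raise ValueError("direction must be 'upstream' or 'downstream'")
    return min_distance <= offset <= max_distance


# Hypothetical coordinates, for illustration only.
print(in_flanking_window(150_000, 100_000, "downstream"))  # 50,000 bp downstream
print(in_flanking_window(90_000, 100_000, "upstream"))     # 10,000 bp upstream
```

The optional `min_distance` and `max_distance` arguments allow a narrower window to be tested, for example the 5,000-60,000 base pair region described for the CHO DHFR locus in FIG. 2A.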

In another aspect, the invention provides a method for inserting an exogenous sequence into an amplifiable locus of a mammalian cell comprising: (a) providing a mammalian cell having an endogenous target site proximal to a selectable gene within the amplifiable locus, wherein the endogenous target site comprises: (i) a recognition sequence for an engineered meganuclease; (ii) a 5′ flanking region 5′ to the recognition sequence; and

In some embodiments, the method further comprises growing the modified cell in the presence of a compound that inhibits the function of the selectable gene to amplify the copy number of the selectable gene. In some embodiments, the exogenous sequence comprises a gene of interest.

In some embodiments, the endogenous target site is downstream from the 3′ regulatory region of the selectable gene. In some embodiments, the endogenous target site is 0 to 100,000 base pairs downstream from the 3′ regulatory region of the selectable gene. In other embodiments, the endogenous target site is upstream from the 5′ regulatory region of the selectable gene. In some embodiments, the endogenous target site is 0 to 100,000 base pairs upstream from the 5′ regulatory region of the selectable gene.

In one aspect, the invention provides a method for inserting an exogenous sequence into an amplifiable locus of a mammalian cell comprising: (a) providing a mammalian cell having an endogenous target site proximal to a selectable gene within the amplifiable locus, wherein the endogenous target site comprises: (i) a recognition sequence for an engineered meganuclease; (ii) a 5′ flanking region 5′ to the recognition sequence; and (iii) a 3′ flanking region 3′ to the recognition sequence; and (b) introducing a double-stranded break between the 5′ and 3′ flanking regions of the endogenous target site; (c) contacting the cell with an engineered target site donor vector comprising from 5′ to 3′: (i) a donor 5′ flanking region homologous to the 5′ flanking region of the endogenous target site; (ii) an exogenous sequence comprising an engineered target site; and (iii) a donor 3′ flanking region homologous to the 3′ flanking region of the endogenous target site; whereby the donor 5′ flanking region, the exogenous sequence and the donor 3′ flanking region are inserted between the 5′ and 3′ flanking regions of the endogenous target site by homologous recombination to provide a mammalian cell comprising the engineered target site; (d) introducing a double-stranded break between the 5′ and 3′ flanking regions of the engineered target site; (e) contacting the cell comprising the engineered target site with a sequence of interest donor vector comprising from 5′ to 3′: (i) a donor 5′ flanking region homologous to the 5′ flanking region of the engineered target site; (ii) an exogenous sequence comprising a sequence of interest; and (iii) a donor 3′ flanking region homologous to the 3′ flanking region of the engineered target site; whereby the donor 5′ flanking region, the exogenous sequence comprising the sequence of interest and the donor 3′ flanking region are inserted between the 5′ and 3′ flanking regions of the engineered target site by homologous recombination to provide an engineered mammalian cell comprising the sequence of interest.
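The two-step logic of steps (a) through (e) can be pictured with a toy string model. The Python sketch below is a deliberate simplification and not a description of the actual molecular mechanism or of any sequence in the disclosure: it assumes that whatever lies between the genomic 5′ and 3′ flanking regions is replaced by the donor's exogenous sequence, and every sequence and name in it is made up for illustration.

```python
def replace_between_flanks(genome: str, flank5: str, flank3: str, payload: str) -> str:
    """Toy model of homology-directed insertion: locate the 5' flank, then the
    next 3' flank, and replace everything between them with the payload."""
    start = genome.find(flank5)
    if start == -1:
        raise ValueError("5' flanking region not found")
    inner_start = start + len(flank5)
    inner_end = genome.find(flank3, inner_start)
    if inner_end == -1:
        raise ValueError("3' flanking region not found downstream of 5' flank")
    return genome[:inner_start] + payload + genome[inner_end:]


# Made-up sequences, for illustration only.
endo5, endo_site, endo3 = "ACGTAC", "TTGACC", "GGCATA"   # endogenous target site
eng5, eng_site, eng3 = "AGAGAG", "TTTTAAAA", "CTCTCT"    # engineered target site
sequence_of_interest = "ATGGCCTAA"

genome = "CCCC" + endo5 + endo_site + endo3 + "CCCC"

# Steps (b)-(c): insert the engineered target site at the endogenous site.
step1 = replace_between_flanks(genome, endo5, endo3, eng5 + eng_site + eng3)

# Steps (d)-(e): insert the sequence of interest at the engineered site.
step2 = replace_between_flanks(step1, eng5, eng3, sequence_of_interest)

print(step1)
print(step2)
```

In this simplified picture the first step leaves a landing pad in the genome, and the second step reuses the landing pad's own flanking regions as homology arms for the sequence of interest.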

In some embodiments, the method further comprises growing the engineered mammalian cell in the presence of a compound that inhibits the function of the selectable gene to amplify the copy number of the selectable gene. In some embodiments, the sequence of interest comprises a gene.

In another aspect, the invention provides a method for inserting an exogenous sequence into an amplifiable locus of a mammalian cell comprising: (a) providing a mammalian cell having an endogenous target site within a selectable gene within the amplifiable locus, wherein the endogenous target site comprises: (i) a recognition sequence for an engineered meganuclease; (ii) a 5′ flanking region 5′ to the recognition sequence; and (iii) a 3′ flanking region 3′ to the recognition sequence; and (b) introducing a double-stranded break between the 5′ and 3′ flanking regions of the endogenous target site; (c) contacting the cell with an engineered target site donor vector comprising from 5′ to 3′: (i) a donor 5′ flanking region homologous to the 5′ flanking region of the endogenous target site; (ii) an exogenous sequence comprising an engineered target site; and (iii) a donor 3′ flanking region homologous to the 3′ flanking region of the endogenous target site; whereby the donor 5′ flanking region, the exogenous sequence and the donor 3′ flanking region are inserted between the 5′ and 3′ flanking regions of the endogenous target site by homologous recombination to provide a mammalian cell comprising the engineered target site; (d) introducing a double-stranded break between the 5′ and 3′ flanking regions of the engineered target site; (e) contacting the cell comprising the engineered target site with a sequence of interest donor vector comprising from 5′ to 3′: (i) a donor 5′ flanking region homologous to the 5′ flanking region of the engineered target site; (ii) an exogenous sequence comprising a sequence of interest; and (iii) a donor 3′ flanking region homologous to the 3′ flanking region of the engineered target site; whereby the donor 5′ flanking region, the exogenous sequence comprising the sequence of interest and the donor 3′ flanking region are inserted between the 5′ and 3′ flanking regions of the engineered target site by homologous recombination to provide an engineered mammalian cell comprising the sequence of interest.

In some embodiments, the method further comprises growing the engineered mammalian cell in the presence of a compound that inhibits the function of the selectable gene to amplify the copy number of the selectable gene.

In some embodiments, the sequence of interest comprises a gene.

In some embodiments, the endogenous target site is within an intron of the selectable gene. In other embodiments, the endogenous target site is within an exon of the selectable gene.

In one aspect, the invention provides a recombinant meganuclease comprising a polypeptide having at least 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 15.

In another aspect, the invention provides a recombinant meganuclease comprising the amino acid sequence of SEQ ID NO: 15.

In another aspect, the invention provides a recombinant meganuclease which recognizes and cleaves a recognition site having at least 75%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 14. In one embodiment, the meganuclease recognizes and cleaves a recognition site of SEQ ID NO: 14.

In another aspect, the invention provides a recombinant meganuclease comprising a polypeptide having at least 75%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 9. In one embodiment, the recombinant meganuclease has the sequence of the meganuclease of SEQ ID NO: 9.

In another aspect, the invention provides a recombinant meganuclease which recognizes and cleaves a recognition site having at least 75%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 7. In one embodiment, the meganuclease recognizes and cleaves a recognition site of SEQ ID NO: 7.

In another aspect, the invention provides a recombinant meganuclease comprising a polypeptide having at least 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 10. In one embodiment, the recombinant meganuclease comprises the polypeptide of SEQ ID NO: 10.

In another aspect, the invention provides a recombinant meganuclease which recognizes and cleaves a recognition site having at least 75%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 8. In one embodiment, the meganuclease recognizes and cleaves a recognition site of SEQ ID NO: 8.

In another aspect, the invention provides a recombinant meganuclease comprising a polypeptide having at least 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 13. In one embodiment, the recombinant meganuclease comprises the polypeptide of SEQ ID NO: 13.

In another aspect, the invention provides a recombinant meganuclease which recognizes and cleaves a recognition site having at least 75%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 12. In one embodiment, the meganuclease recognizes and cleaves a recognition site of SEQ ID NO: 12.

In another aspect, the invention provides a recombinant meganuclease comprising a polypeptide having at least 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 29. In one embodiment, the recombinant meganuclease comprises the polypeptide of SEQ ID NO: 29.

In another aspect, the invention provides a recombinant meganuclease which recognizes and cleaves a recognition site having at least 75%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 30. In one embodiment, the meganuclease recognizes and cleaves a recognition site of SEQ ID NO: 30.
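The percent-identity thresholds recited in the preceding aspects can be illustrated with a minimal calculation. This section does not specify how identity is to be computed, so the Python sketch below makes a simplifying assumption: the two sequences are already aligned, have equal length, and contain no gaps, and identity is the fraction of matching positions. The sequences shown are arbitrary placeholders, not any SEQ ID NO from the disclosure.

```python
# Thresholds recited in the text; the calculation method is an assumption.
THRESHOLDS = (75, 80, 85, 90, 95, 97, 98, 99)


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length,
    ungapped sequences (a simplification of real alignment-based identity)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def thresholds_met(identity: float) -> list:
    """Return every recited threshold that the identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Placeholder 22-base sequences differing at two positions.
reference = "ACGTACGTACGTACGTACGTAC"
candidate = "ACGTACGTTCGTACGTACGAAC"

identity = percent_identity(reference, candidate)
print(f"{identity:.1f}% identity, meets thresholds: {thresholds_met(identity)}")
```

For sequences of different lengths, or where gaps are possible, an alignment step would be needed before counting matches; the choice of alignment algorithm and scoring parameters can change the resulting identity value.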

In another aspect, the invention provides recombinant mammalian cell lines which continue to express a protein product of interest from an exogenous sequence of interest present in an amplified region of the genome (i.e., present in 2-1,000 copies, co-amplified with a selectable gene in an amplifiable locus) for a period of at least 8, 9, 10, 11, 12, 13, or 14 weeks after removal of the amplification selection agent, and with a reduction of expression levels and/or copy number of less than 20, 25, 30, 35 or 40%.
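The stability criterion above combines a minimum time without selection and a maximum fractional loss of expression or copy number. The Python sketch below is one hedged way to express that check; the default limits (8 weeks, 20%) are the first values in the recited lists, and the function names and measurement values are hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percent drop from a baseline measurement (expression level or copy number)."""
    if before <= 0:
        raise ValueError("baseline measurement must be positive")
    return 100.0 * (before - after) / before


def meets_stability_criterion(before: float,
                              after: float,
                              weeks_off_selection: float,
                              min_weeks: float = 8,
                              max_reduction_pct: float = 20.0) -> bool:
    """True if the line was kept off selection for at least min_weeks and lost
    less than max_reduction_pct of its baseline expression or copy number."""
    return (weeks_off_selection >= min_weeks
            and percent_reduction(before, after) < max_reduction_pct)


# Hypothetical measurements in arbitrary units, for illustration only.
print(meets_stability_criterion(before=100, after=85, weeks_off_selection=12))
print(meets_stability_criterion(before=100, after=75, weeks_off_selection=12))
```

Passing a different `max_reduction_pct` (25, 30, 35 or 40) or `min_weeks` (9 through 14) tests the other combinations recited in the paragraph.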

In another aspect, the invention provides methods of producing recombinant cells with amplified regions including a sequence of interest and a selectable gene by subjecting the above-described recombinant cells to selection with a selection agent which causes co-amplification of the sequence of interest and the selectable gene.

In another aspect, the invention provides methods of producing a protein product of interest by culturing the above-described recombinant cells, or the above-described recombinant cells with amplified regions, and obtaining the protein product of interest from the culture medium or a cell lysate.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1. A general strategy for targeting a sequence of interest to an amplifiable locus.

FIGS. 2A and 2B. (A) Schematic of the CHO DHFR locus showing a preferred region for targeting a sequence of interest 5,000-60,000 base pairs downstream of the DHFR gene. (B) Schematic of the CHO GS locus showing a preferred region for targeting a sequence of interest 5,000-55,000 base pairs downstream of the GS gene.

FIG. 3. Strategy for inserting a sequence of interest into an amplifiable locus in a two-step process involving a pre-integrated engineered target sequence.

FIG. 4. Strategy for inserting an engineered target sequence into an amplifiable locus with concomitant removal of a portion of the selectable gene, followed by insertion of a sequence of interest and reconstitution of the selectable gene.

FIG. 5. Strategy for inserting an engineered target sequence into an amplifiable locus with concomitant disruption of the coding sequence of a selectable gene, followed by insertion of a sequence of interest and reconstitution of the selectable gene.

FIG. 6. Strategy for inserting an engineered target sequence into an amplifiable locus with concomitant disruption of the mRNA processing, followed by insertion of a sequence of interest and reconstitution of the selectable gene.

FIGS. 7A through 7D. (A) A direct-repeat recombination assay for site-specific endonuclease activity. (B) Results of the assay in (A) applied to the CHO-23/24 and CHO-51/52 meganucleases. (C) Alignment of sequences obtained from CHO cells transfected with mRNA encoding the CHO-23/24 meganuclease (SEQ ID NOS 37-39, 38, 40, 38, and 38, respectively, in order of appearance). (D) Alignment of sequences obtained from CHO cells transfected with mRNA encoding the CHO-51/52 meganuclease (SEQ ID NOS 41-51, respectively, in order of appearance).

FIGS. 8A and 8B. (A) Strategy for inserting an exogenous DNA sequence into the CHO DHFR locus using the CHO-51/52 meganuclease. (B) PCR products demonstrating insertion of an engineered target sequence.

FIGS. 9A through 9C. (A) Strategy for inserting an engineered target sequence into the CHO DHFR locus using the CHO-23/24 meganuclease, followed by Flp recombinase-mediated insertion of a sequence of interest. (B) PCR products from hygromycin-resistant clones produced in (A). (C) GFP expression by the 24 clones produced in (B).

FIGS. 10A through 10C. Results of experiments with a GFP-expressing CHO line produced by integrating a GFP gene expression cassette into the DHFR locus using a target sequence strategy as shown in FIG. 9.

FIGS. 11A through 11C. (A) A direct-repeat recombination assay, as in FIG. 7A. (B) The assay in (A) applied to the CHO-13/14 and CGS-5/6 meganucleases. (C) Alignment of sequences obtained from CHO cells transfected with mRNA encoding the CGS-5/6 meganuclease (SEQ ID NOS 52-56, 56, 56-63, 63, 63, and 63-64, respectively, in order of appearance).

DETAILED DESCRIPTION OF THE INVENTION

1.1 Introduction

The present invention depends, in part, upon the development of mammalian cell lines in which exogenous actively transcribed transgenes have been inserted proximal to an endogenous amplifiable locus, and the discovery that (a) the insertion of such exogenous actively transcribed transgenes does not prevent or substantially inhibit amplification of the endogenous amplifiable locus, (b) the exogenous actively transcribed transgene can be co-amplified with the endogenous amplifiable locus, and (c) the resultant cell line, with an amplified region comprising multiple copies of the endogenous amplifiable locus and the exogenous actively transcribed transgene is stable for extended periods even in the absence of the selection regime which was employed to induce amplification. Thus, in one aspect, the invention provides a method for producing cell lines which can be used for biomanufacturing of a protein product of interest by specifically targeting the insertion of an exogenous gene capable of actively expressing the protein product of interest proximal to an endogenous amplifiable locus. In another aspect, the invention provides engineered cell lines that can be used to produce protein products of interest (e.g., therapeutic proteins such as monoclonal antibodies) at high levels.

1.2 References and Definitions

The patent and scientific literature referred to herein establishes knowledge that is available to those of skill in the art. The entire disclosures of the issued U.S. patents, pending applications, published foreign applications, and scientific and technical references cited herein, including protein and nucleic acid database sequences, are hereby incorporated by reference to the same extent as if each was specifically and individually indicated to be incorporated by reference.

As used herein, the term “meganuclease” refers to naturally-occurring homing endonucleases (also referred to as Group I intron encoded endonucleases) or non-naturally-occurring (e.g., rationally designed or engineered) endonucleases based upon the amino acid sequence of a naturally-occurring homing endonuclease. Examples of naturally-occurring meganucleases include I-SceI, I-CreI, I-CeuI, I-DmoI, I-MsoI, I-AniI, etc. Rationally designed meganucleases are disclosed in, for example, WO 2007/047859 and WO 2009/059195, and can be engineered to have modified DNA-binding specificity, DNA cleavage activity, DNA-binding affinity, or dimerization properties relative to a naturally occurring meganuclease. A meganuclease may bind to double-stranded DNA as a homodimer (e.g., wild-type I-CreI), or it may bind to DNA as a heterodimer (e.g., engineered meganucleases disclosed in WO 2007/047859). An engineered meganuclease may also be a “single-chain meganuclease” in which a pair of DNA-binding domains derived from a natural meganuclease are joined into a single polypeptide using a peptide linker (e.g., single-chain meganucleases disclosed in WO 2009/059195).

As used herein, the term “single-chain meganuclease” refers to a polypeptide comprising a pair of meganuclease subunits joined by a linker. A single-chain meganuclease has the organization: N-terminal subunit—Linker—C-terminal subunit. The two meganuclease subunits will generally be non-identical in amino acid sequence and will recognize non-identical DNA sequences. Thus, single-chain meganucleases typically cleave pseudo-palindromic or non-palindromic recognition sequences. Methods of producing single-chain meganucleases are disclosed in WO 2009/059195.

As used herein, the term “site specific endonuclease” means a meganuclease, zinc-finger nuclease or TAL effector nuclease.

As used herein, with respect to a protein, the term “recombinant” means having an altered amino acid sequence as a result of the application of genetic engineering techniques to nucleic acids which encode the protein, and cells or organisms which express the protein. With respect to a nucleic acid, the term “recombinant” means having an altered nucleic acid sequence as a result of the application of genetic engineering techniques. Genetic engineering techniques include, but are not limited to, PCR and DNA cloning technologies; transfection, transformation and other gene transfer technologies; homologous recombination; site-directed mutagenesis; and gene fusion. In accordance with this definition, a protein having an amino acid sequence identical to a naturally-occurring protein, but produced by cloning and expression in a heterologous host, is not considered recombinant. As used herein, the term “engineered” is synonymous with the term “recombinant.”

As used herein, with respect to a meganuclease, the term “wild-type” refers to any naturally-occurring form of a meganuclease. The term “wild-type” is not intended to mean the most common allelic variant of the enzyme in nature but, rather, any allelic variant found in nature. Wild-type homing endonucleases are distinguished from recombinant or non-naturally-occurring meganucleases.

As used herein, the term “recognition sequence” refers to a DNA sequence that is bound and cleaved by a meganuclease. A recognition sequence comprises a pair of inverted, 9 base pair “half sites” which are separated by four base pairs. In the case of a homo- or heterodimeric meganuclease, each of the two monomers makes base-specific contacts with one half-site. In the case of a single-chain heterodimer meganuclease, the N-terminal domain of the protein contacts a first half-site and the C-terminal domain of the protein contacts a second half-site. In the case of I-CreI, for example, the recognition sequence is 22 base pairs and comprises a pair of inverted, 9 base pair “half sites” which are separated by four base pairs.
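The 9-4-9 layout described in this definition can be illustrated with a short Python sketch. It is not part of the disclosed methods, and the half-site and spacer used in the example are made-up placeholders rather than any real meganuclease recognition sequence. The sketch splits a 22 base pair sequence into its two half-sites and central four base pairs, and checks whether the half-sites are exact inverted repeats, as would be expected for a fully palindromic site.

```python
from typing import Tuple

HALF_SITE_LEN = 9   # length of each half-site, per the definition above
SPACER_LEN = 4      # base pairs separating the two half-sites

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_recognition_sequence(seq: str) -> Tuple[str, str, str]:
    """Split a 22 bp sequence into (first half-site, spacer, second half-site)."""
    expected = 2 * HALF_SITE_LEN + SPACER_LEN
    if len(seq) != expected:
        raise ValueError(f"expected {expected} bp, got {len(seq)}")
    first = seq[:HALF_SITE_LEN]
    spacer = seq[HALF_SITE_LEN:HALF_SITE_LEN + SPACER_LEN]
    second = seq[HALF_SITE_LEN + SPACER_LEN:]
    return first, spacer, second


def half_sites_are_inverted_repeats(seq: str) -> bool:
    """True when the second half-site is the reverse complement of the first."""
    first, _, second = split_recognition_sequence(seq)
    return second == reverse_complement(first)


# Hypothetical example values, chosen only to exercise the functions.
hypothetical_half_site = "GATTACAGG"
hypothetical_spacer = "TTCA"
palindromic_site = (
    hypothetical_half_site
    + hypothetical_spacer
    + reverse_complement(hypothetical_half_site)
)
# A site whose half-sites differ, as a heterodimer or single-chain enzyme might recognize.
non_palindromic_site = hypothetical_half_site + hypothetical_spacer + "AAAAAAAAA"
```

Under these assumptions, `half_sites_are_inverted_repeats` returns true for `palindromic_site` and false for `non_palindromic_site`. Real recognition sequences are often only pseudo-palindromic, so an exact-match test like this one is stricter than what an enzyme requires.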

As used herein, the term “target site” refers to a region of the chromosomal DNA of a cell comprising a target sequence into which a sequence of interest can be inserted. As used herein, the term “engineered target site” refers to an exogenous sequence of DNA integrated into the chromosomal DNA of a cell comprising an engineered target sequence into which a sequence of interest can be inserted.

As used herein, the term “target sequence” means a DNA sequence within a target site which includes one or more recognition sequences for a nuclease, integrase, transposase, and/or recombinase. For example, a target sequence can include a recognition sequence for a meganuclease. As used herein, an “engineered target sequence” means an exogenous target sequence which is introduced into a chromosome to serve as the insertion point for another sequence.

As used herein, the term “flanking region” or “flanking sequence” refers to a sequence of >3 or, preferably, >50 or, more preferably, >200 or, most preferably, >400 base pairs of DNA which is immediately 5′ or 3′ to a reference sequence (e.g., a target sequence or sequence of interest).

As used herein, the term “amplifiable locus” refers to a region of the chromosomal DNA of a cell which can be amplified by selection with one or more compounds (e.g., drugs) in the growth media. An amplifiable locus will typically comprise a gene encoding a protein which, under the appropriate conditions, is necessary for cell survival. By inhibiting the function of such an essential protein, for example with a small molecule drug, the amplifiable locus is duplicated many times over as a means of increasing the copy number of the essential gene. A gene of interest, if integrated into an amplifiable locus, will also become duplicated with the essential gene. Examples of amplifiable loci include the chromosomal regions comprising the DHFR, GS, and HPRT genes.

As used herein, the term “amplified locus” or “amplified gene” or “amplified sequence” refers to a locus, gene or sequence which is present in 2-1,000 copies as a result of gene amplification in response to selection of a selectable gene. An amplified gene or sequence can be a gene or sequence which is co-amplified due to selection of a selectable gene in the same amplifiable locus. In preferred embodiments, a sequence of interest is amplified to at least 3, 4, 5, 6, 7, 8, 9 or 10 copies.

As used herein, the term “selectable gene” refers to an endogenous gene that is essential for cell survival under some specific culture conditions (e.g., presence or absence of a nutrient, toxin or drug). Selectable genes are endogenous to the cell and are distinguished from exogenous “selectable markers” such as antibiotic resistance genes. Selectable genes exist in their natural context in the chromosomal DNA of the cell. For example, DHFR is a selectable gene which is necessary for cell survival in the presence of MTX in the culture medium. The gene is essential for growth in the absence of hypoxanthine and thymidine. If the endogenous DHFR selectable gene is eliminated, cells are able to grow in the absence of hypoxanthine and thymidine if they are given an exogenous copy of the DHFR gene. This exogenous copy of the DHFR gene is a selectable marker but is not a selectable gene. An amplifiable locus comprises a selectable gene and a target site. A target site is found outside of a selectable gene such that a selectable gene does not comprise a target site. Examples of selectable genes are given in Table 1.

As used herein, when used in connection with the position of a target site, recognition sequence, or inserted sequence of interest relative to the position of a selectable gene, the term “proximal” means that the target site, recognition sequence, or inserted sequence of interest is within the same amplifiable locus as the selectable gene, either upstream (5′) or downstream (3′) of the selectable gene, and preferably between the selectable gene and the next gene in the region (whether upstream (5′) or downstream (3′)). Typically, a “proximal” target site, recognition sequence, or inserted sequence of interest will be within <100,000 base pairs of the selectable gene, as measured from the first or last nucleotide of the first or last regulatory element of the selectable gene.

As used herein, the term “homologous recombination” refers to the natural, cellular process in which a double-stranded DNA-break is repaired using a homologous DNA sequence as the repair template (see, e.g. Cahill et al. (2006), Front. Biosci. 11:1958-1976). The homologous DNA sequence may be an endogenous chromosomal sequence or an exogenous nucleic acid that was delivered to the cell. Thus, for some applications of engineered meganucleases, a meganuclease is used to cleave a recognition sequence within a target sequence in a genome and an exogenous nucleic acid with homology to or substantial sequence similarity with the target sequence is delivered into the cell and used as a template for repair by homologous recombination. The DNA sequence of the exogenous nucleic acid, which may differ significantly from the target sequence, is thereby inserted or incorporated into the chromosomal sequence. The process of homologous recombination occurs primarily in eukaryotic organisms. The term “homology” is used herein as equivalent to “sequence similarity” and is not intended to require identity by descent or phylogenetic relatedness.

As used herein, the term “stably integrated” means that an exogenous or heterologous DNA sequence has been covalently inserted into a chromosome (e.g., by homologous recombination, non-homologous end joining, transposition, etc.) and has remained in the chromosome for a period of at least 8 weeks.

As used herein, the term “non-homologous end-joining” or “NHEJ” refers to the natural, cellular process in which a double-stranded DNA-break is repaired by the direct joining of two non-homologous DNA segments (see, e.g. Cahill et al. (2006), Front. Biosci. 11:1958-1976). DNA repair by non-homologous end-joining is error-prone and frequently results in the untemplated addition or deletion of DNA sequences at the site of repair. Thus, for certain applications, an engineered meganuclease can be used to produce a double-stranded break at a meganuclease recognition sequence within an amplifiable locus and an exogenous nucleic acid molecule, such as a PCR product, can be captured at the site of the DNA break by NHEJ (see, e.g. Salomon et al. (1998), EMBO J. 17:6086-6095). In such cases, the exogenous nucleic acid may or may not have homology to the target sequence. The process of non-homologous end-joining occurs in both eukaryotes and prokaryotes such as bacteria.

As used herein, the term “sequence of interest” means any nucleic acid sequence, whether it codes for a protein, RNA, or regulatory element (e.g., an enhancer, silencer, or promoter sequence), that can be inserted into a genome or used to replace a genomic DNA sequence. Sequences of interest can have heterologous DNA sequences that allow for tagging a protein or RNA that is expressed from the sequence of interest. For instance, a protein can be tagged with tags including, but not limited to, an epitope (e.g., c-myc, FLAG) or other ligand (e.g., poly-His). Furthermore, a sequence of interest can encode a fusion protein, according to techniques known in the art (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, Wiley 1999). In preferred embodiments, a sequence of interest comprises a promoter operably linked to a gene encoding a protein of medicinal value such as an antibody, antibody fragment, cytokine, growth factor, hormone, or enzyme. For some applications, the sequence of interest is flanked by a DNA sequence that is recognized by the engineered meganuclease for cleavage. Thus, the flanking sequences are cleaved allowing for proper insertion of the sequence of interest into genomic recognition sequences cleaved by an engineered meganuclease. For some applications, the sequence of interest is flanked by DNA sequences with homology to or substantial sequence similarity with the target site such that homologous recombination inserts the sequence of interest within the genome at the locus of the target sequence.

As used herein, the term “donor DNA” refers to a DNA molecule comprising a sequence of interest flanked by DNA sequences homologous to a target site. Donor DNA can serve as a template for DNA repair by homologous recombination if it is delivered to a cell with a site-specific nuclease such as a meganuclease, zinc-finger nuclease, or TAL-effector nuclease. The result of such DNA repair is the insertion of the sequence of interest into the chromosomal DNA of the cell. Donor DNA can be linear, such as a PCR product, or circular, such as a plasmid. In cases where a donor DNA is a circular plasmid, it may be referred to as a “donor plasmid.”
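The structure of a donor DNA, and the outcome of repair by homologous recombination, can be sketched as plain string operations in Python. This is a toy model under stated assumptions, not a description of the cellular process. The chromosome is treated as a single string, the cut site is given as an index, the function names are hypothetical, and the homology arms are far shorter than the flanking-sequence lengths preferred in this disclosure.

```python
def build_donor(chromosome: str, cut_index: int, arm_len: int,
                sequence_of_interest: str) -> str:
    """Donor = left homology arm + sequence of interest + right homology arm.

    The arms are copied from the chromosomal sequence on either side of the
    (hypothetical) nuclease cut site at cut_index.
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if not (arm_len <= cut_index <= len(chromosome) - arm_len):
        raise ValueError("cut site too close to the end of the sequence")
    left_arm = chromosome[cut_index - arm_len:cut_index]
    right_arm = chromosome[cut_index:cut_index + arm_len]
    return left_arm + sequence_of_interest + right_arm


def integrate_by_homology(chromosome: str, donor: str, arm_len: int) -> str:
    """Toy model of repair: insert the donor payload between its two arms."""
    if arm_len <= 0 or len(donor) < 2 * arm_len:
        raise ValueError("donor is shorter than its two homology arms")
    left_arm = donor[:arm_len]
    right_arm = donor[len(donor) - arm_len:]
    payload = donor[arm_len:len(donor) - arm_len]
    # The unmodified chromosome carries the two arms directly adjacent.
    junction = left_arm + right_arm
    if chromosome.count(junction) != 1:
        raise ValueError("homology arms do not match a unique chromosomal site")
    insert_at = chromosome.find(junction) + arm_len
    return chromosome[:insert_at] + payload + chromosome[insert_at:]


# Hypothetical toy sequences for illustration only.
toy_chromosome = "AACCGGTTACGTAGCTAGGATC"
toy_sequence_of_interest = "TTTTTTTT"
toy_donor = build_donor(toy_chromosome, cut_index=10, arm_len=5,
                        sequence_of_interest=toy_sequence_of_interest)
toy_modified = integrate_by_homology(toy_chromosome, toy_donor, arm_len=5)
```

In this sketch the modified sequence is the original chromosome with the sequence of interest inserted at the cut index, which mirrors the outcome described in the definition. The uniqueness check on the arm junction stands in, very loosely, for the requirement that the homology arms direct repair to a single target site.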

As used herein, unless specifically indicated otherwise, the word “or” is used in the inclusive sense of “and/or” and not the exclusive sense of “either/or.”

2.1 Transgene Targeting to Amplifiable Loci

The present invention provides methods for generating transgenic mammalian cell lines expressing a desired protein product of interest, including “high-producer” cell lines, by targeting the insertion of a gene encoding the protein product of interest (e.g., a therapeutic protein gene expression cassette) to regions of the genome that are amplifiable. Such regions in mammalian cells include the DHFR, GS, and HPRT genes, as well as others shown in Table 1.

The precise mechanism of gene amplification is not known. Indeed, it is very likely that there is no single mechanism by which gene amplification occurs but that a variety of different random chromosomal aberrations, in combination with strong selection for amplification, results in increased gene copy number (reviewed in Omasa (2002), J. Biosci. Bioeng. 94:600-605). It is clear that chromosomal location plays a major role in amplification and the stable maintenance of amplified genes (Brinton and Heintz (1995), Chromosoma 104:143-51). It has been found that transgenes integrated into chromosomal locations adjacent to telomeres are more easily amplified and, once amplified, tend to be stable at high copy numbers after the selection agent is removed (Yoshikawa et al. (2000), Cytotechnology 33:37-46; Yoshikawa et al. (2000), Biotechnol Prog. 16:710-715). This is significant because selection agents such as MTX and MSX are toxic and cannot be included in the growth media in a commercial biomanufacturing process. In contrast, transgenes integrated into regions in the CHO genome that are not adjacent to telomeres amplify inefficiently and rapidly lose copy number following the removal of selection agents from the media. For example, Yoshikawa et al. found that randomly-integrated transgenes linked to a DHFR selectable marker amplified to greater than 10-fold higher copy numbers when the integration site was adjacent to a telomere (Yoshikawa et al. (2000), Biotechnol Prog. 16:710-715). These researchers also found that an amplified transgene integrated into a non-telomeric region will lose >50% of its copies in only 20 days following the removal of MTX from the growth media. None of the selectable genes identified in Table 1 is adjacent to a telomere in the mouse genome (www.ensembl.com) and the similarity in genome organization between mouse and CHO makes it likely that these genes are in non-telomeric regions in CHO as well (Xu et al. (2011), Nat. Biotechnol. 29:735-741). Thus, the prior art instructs that the loci identified in Table 1, including the DHFR and GS loci, are not preferred locations to target transgene insertion if the goal is efficient and stable gene amplification.

In addition, in the case of endogenous gene amplification, it is clear that chromosomal sequences outside of the selectable gene sequence play an important role in facilitating amplification and in defining the length of DNA sequence that is co-amplified with the gene under selection (Looney and Hamlin (1987), Mol. and Cell. Biol. 7:569-577). In particular, it has been shown that the sequence and location of the DNA replication origin in relation to the selectable gene plays a major role in amplification. For example, it has been shown that amplification of the endogenous CHO DHFR locus is dependent upon a pair of replication origins found in the region 5,000-60,000 base pairs downstream of the DHFR gene coding sequence (Anachkova and Hamlin (1989), Mol. and Cell. Biol. 9:532-540; Milbrandt et al. (1981), Proc. Natl. Acad. Sci. USA 78:6042-6047). Further, Brinton and Heintz have shown that these same replication origins fail to promote gene amplification when incorporated randomly into the genome with a transgenic DHFR sequence (Brinton and Heintz (1995), Chromosoma. 104:143-51). This clearly demonstrates the importance of maintaining both the sequence and proper chromosomal context of these replication origins to promote DHFR gene amplification. Thus the art instructs that the region downstream of DHFR is critical to gene amplification and should not be disrupted by, for example, inserting a transgenic gene expression cassette as described in the present invention.

Surprisingly, we have discovered that DNA sequences, including exogenous transcriptionally active sequences, which are inserted proximal to (e.g., within <100,000 base pairs) selectable genes in mammalian cell lines (e.g., CHO-K1) will co-amplify in the presence of appropriate compounds which select for amplification. Thus, the present invention provides methods for reliably and reproducibly producing isogenic cell lines in which transgenes encoding protein products of interest (e.g., biotherapeutic gene expression cassettes) can be amplified but in which it is not necessary to screen a large number of randomly generated cell lines to identify those which express high levels of the protein product of interest and are resistant to gene silencing.

In addition, we have surprisingly found that the mammalian cell lines of the invention, in which a sequence of interest is co-amplified with a selectable gene in an amplifiable locus, are stable with respect to expression of the sequence of interest and/or copy number of the sequence of interest even in the absence of continued selection. That is, whereas the art teaches that amplified sequences will be reduced in copy number over time if selection is not maintained (see, e.g., Yoshikawa et al. (2000), Biotechnol Prog. 16:710-715), we have found that cell lines produced according to the methods of the invention continue to produce the protein products of interest (encoded by the sequences of interest) at levels within 20%-25% of the initial levels, even 14 weeks after removal of the selection agent. This is significant, as noted above, because selection agents such as MTX and MSX are toxic, and it would be highly desirable to produce biotherapeutic proteins in cell lines which do not require continued exposure to such selection agents. Therefore, in some embodiments, the invention provides recombinant mammalian cell lines which continue to express a protein product of interest from an exogenous sequence of interest present in an amplified region of the genome (i.e., present in 2-1,000 copies, co-amplified with a selectable gene in an amplifiable locus) for a period of at least 8, 9, 10, 11, 12, 13, or 14 weeks after removal of the amplification selection agent, and with a reduction of expression levels and/or copy number of less than 20, 25, 30, 35 or 40%.
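The stability criterion stated above (a minimum period without selection agent, combined with a maximum percent reduction in expression level or copy number) reduces to simple arithmetic. The following Python sketch shows one way to express it. The function names and the example measurements are hypothetical, and the default thresholds are just one of the combinations recited above (at least 8 weeks, less than 20% reduction).

```python
def percent_reduction(initial: float, later: float) -> float:
    """Percent drop from the initial value (negative if the value increased)."""
    if initial <= 0:
        raise ValueError("initial value must be positive")
    return (initial - later) * 100.0 / initial


def meets_stability_criterion(initial: float, later: float,
                              weeks_without_selection: float,
                              min_weeks: float = 8,
                              max_reduction_pct: float = 20.0) -> bool:
    """True when the cell line was measured at least min_weeks after removal
    of the selection agent and lost less than max_reduction_pct of its
    initial expression level or copy number."""
    long_enough = weeks_without_selection >= min_weeks
    small_loss = percent_reduction(initial, later) < max_reduction_pct
    return long_enough and small_loss


# Hypothetical measurements (arbitrary units), for illustration only.
example_initial_titer = 100.0
example_titer_week_14 = 85.0
example_is_stable = meets_stability_criterion(
    example_initial_titer, example_titer_week_14, weeks_without_selection=14
)
```

The same function can be applied to copy-number estimates instead of expression levels, and the thresholds can be varied across the recited ranges (8 to 14 weeks, 20 to 40%) by passing different `min_weeks` and `max_reduction_pct` values.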

The present invention also provides the products necessary to practice the methods, and to target insertion of sequences of interest into amplifiable loci in mammalian cell lines. A common method for inserting or modifying a DNA sequence involves introducing a transgenic DNA sequence flanked by sequences homologous to the genomic target and selecting or screening for a successful homologous recombination event. Recombination with the transgenic DNA occurs rarely but can be stimulated by a double-stranded break in the genomic DNA at the target site (Porteus et al. (2005), Nat. Biotechnol. 23: 967-73; Tzfira et al. (2005), Trends Biotechnol. 23: 567-9; McDaniel et al. (2005), Curr. Opin. Biotechnol. 16: 476-83). Numerous methods have been employed to create DNA double-stranded breaks, including irradiation and chemical treatments. Although these methods efficiently stimulate recombination, the double-stranded breaks are randomly dispersed in the genome, which can be highly mutagenic and toxic. At present, the inability to target gene modifications to unique sites within a chromosomal background is a major impediment to routine genome engineering.

One approach to achieving this goal is stimulating homologous recombination at a double-stranded break in a target locus using a nuclease with specificity for a sequence that is sufficiently large to be present at only a single site within the genome (see, e.g., Porteus et al. (2005), Nat. Biotechnol. 23: 967-73). The effectiveness of this strategy has been demonstrated in a variety of organisms using ZFNs (Porteus (2006), Mol Ther 13: 438-46; Wright et al. (2005), Plant J. 44: 693-705; Urnov et al. (2005), Nature 435: 646-51). Homing endonucleases are a group of naturally-occurring nucleases which recognize 15-40 base-pair cleavage sites commonly found in the genomes of plants and fungi. They are frequently associated with parasitic DNA elements, such as Group I self-splicing introns and inteins. They naturally promote homologous recombination or gene insertion at specific locations in the host genome by producing a double-stranded break in the chromosome, which recruits the cellular DNA-repair machinery (Stoddard (2006), Q. Rev. Biophys. 38: 49-95). Homing endonucleases are commonly grouped into four families: the LAGLIDADG (SEQ ID NO: 65) family, the GIY-YIG family, the His-Cys box family and the HNH family. These families are characterized by structural motifs, which affect catalytic activity and recognition sequence. For instance, members of the LAGLIDADG (SEQ ID NO: 65) family are characterized by having either one or two copies of the conserved LAGLIDADG (SEQ ID NO: 65) motif (see Chevalier et al. (2001), Nucleic Acids Res. 29(18): 3757-3774). The LAGLIDADG (SEQ ID NO: 65) homing endonucleases with a single copy of the LAGLIDADG (SEQ ID NO: 65) motif form homodimers, whereas members with two copies of the LAGLIDADG (SEQ ID NO: 65) motif are found as monomers.

Natural homing endonucleases, primarily from the LAGLIDADG (SEQ ID NO: 65) family, have been used to effectively promote site-specific genome modification in plants, yeast, Drosophila, mammalian cells and mice, but this approach has been limited to the modification of either homologous genes that conserve the endonuclease recognition sequence (Monnat et al. (1999), Biochem. Biophys. Res. Commun. 255: 88-93) or to pre-engineered genomes into which a recognition sequence has been introduced (Rouet et al. (1994), Mol. Cell. Biol. 14: 8096-106; Chilton et al. (2003), Plant Physiol. 133: 956-65; Puchta et al. (1996), Proc. Natl. Acad. Sci. USA 93: 5055-60; Rong et al. (2002), Genes Dev. 16: 1568-81; Gouble et al. (2006), J. Gene Med. 8(5):616-622).

Systematic implementation of nuclease-stimulated gene modification requires the use of engineered enzymes with customized specificities to target DNA breaks to existing sites in a genome and, therefore, there has been great interest in adapting homing endonucleases to promote gene modifications at medically or biotechnologically relevant sites (Porteus et al. (2005), Nat. Biotechnol. 23: 967-73; Sussman et al. (2004), J. Mol. Biol. 342: 31-41; Epinat et al. (2003), Nucleic Acids Res. 31: 2952-62).

I-CreI (SEQ ID NO: 1) is a member of the LAGLIDADG (SEQ ID NO: 65) family of homing endonucleases which recognizes and cuts a 22 base pair recognition sequence in the chloroplast chromosome of the alga Chlamydomonas reinhardtii. Genetic selection techniques have been used to modify the wild-type I-CreI cleavage site preference (Sussman et al. (2004), J. Mol. Biol. 342: 31-41; Chames et al. (2005), Nucleic Acids Res. 33: e178; Seligman et al. (2002), Nucleic Acids Res. 30: 3870-9; Arnould et al. (2006), J. Mol. Biol. 355: 443-58). More recently, a method of rationally-designing mono-LAGLIDADG (SEQ ID NO: 65) homing endonucleases was described which is capable of comprehensively redesigning I-CreI and other homing endonucleases to target widely-divergent DNA sites, including sites in mammalian, yeast, plant, bacterial, and viral genomes (WO 2007/047859).

Thus, in one embodiment, the invention provides engineered meganucleases derived from the amino acid sequence of I-CreI that recognize and cut DNA sites in amplifiable regions of mammalian genomes. These engineered meganucleases can be used in accordance with the invention to target the insertion of gene expression cassettes into defined locations in the chromosomal DNA of cell lines such as CHO cells. This invention will greatly streamline the production of desired cell lines by reducing the number of lines that must be screened to identify a “high-producer” clone suitable for commercial-scale production of a therapeutic glycoprotein.

The present invention involves targeting transgenic DNA “sequences of interest” to amplifiable loci. The amplifiable loci are regions of the chromosomal DNA that contain selectable genes that become amplified in the presence of selection agents (e.g., drugs). For example, the Chinese Hamster Ovary (CHO) cell DHFR locus can be amplified to ~1,000 copies by growing the cells in the presence of methotrexate (MTX), a DHFR inhibitor. Table 1 lists additional examples of selectable genes that can be amplified using small molecule drugs (Kellems, ed. Gene amplification in mammalian cells: a comprehensive guide. Marcel Dekker, New York, 1993; Omasa (2002), J. Biosci. Bioeng. 94:600-605).

TABLE 1

Amplifiable Genes

Selectable Gene Name	Amplified With

Dihydrofolate Reductase	Methotrexate (MTX)
Glutamine Synthetase	Methionine sulphoximine (MSX)
Hypoxanthine Phosphoribosyltransferase	Aminopterin, hypoxanthine, and thymidine
Threonyl tRNA Synthetase	Borrelidin
Na,K-ATPase	Ouabain
Asparagine Synthetase	Albizziin or Beta-aspartyl hydroxamate
Ornithine Decarboxylase	alpha-difluoromethylornithine (DFMO)
Inosine-5′-monophosphate dehydrogenase	Mycophenolic Acid
Adenosine Deaminase	Adenosine, Alanosine, 2′deoxycoformycin
Thymidylate Synthetase	Fluorouracil
Aspartate Transcarbamylase	N-Phosphonacetyl-L-Aspartate (PALA)
Metallothionein	Cadmium
Adenylate Deaminase (1, 2)	Adenine, Azaserine, Coformycin
UMP-Synthetase	6-azauridine, pyrazofuran
Ribonucleotide Reductase	hydroxyurea, motexafin gadolinium, fludarabine, cladribine, gemcitabine, tezacitabine, triapine

Several considerations must be taken into account when selecting a specific target site for the insertion of a sequence of interest within an amplifiable locus. First, the selected insertion site must be co-amplified with the gene under selection. In many cases, experimental data already exists in the art which delimits the amount of flanking chromosomal sequence that co-amplifies with a selectable gene of interest. This data, which precisely defines the extent of the amplifiable locus, exists for CHO DHFR (Ma et al. (1988), Mol Cell Biol. 8(6):2316-27), human DHFR (Morales et al. (2009), Mol Cancer Ther. 8(2):424-432), and CHO GS (Sanders et al. (1987), Dev Biol Stand. 66:55-63). Where such data does not already exist in the art, we predict that chromosomal DNA sequences <100,000 base pairs upstream or downstream of the selectable gene coding sequence are likely to co-amplify. Hence, these regions could be suitable sites for targeting the insertion of a sequence of interest.

Second, target sites should be selected which will not greatly impact the function of the selectable gene (e.g., the endogenous DHFR, GS, or HPRT gene). Because amplification requires a functional copy of the selectable gene, insertion sites within the promoter, exons, introns, polyadenylation signals, or other regulatory sequences that, if disrupted, would greatly impact transcription or translation of the selectable gene, should be avoided. For example, WO 2008/059317 discloses meganucleases which cleave DNA target sites within the HPRT gene. To the extent WO 2008/059317 discloses the insertion of genes into the HPRT locus, it teaches that the HPRT gene coding sequence should be disrupted in the process of transgene insertion to facilitate selection for proper targeting using 6-thioguanine. 6-thioguanine is a toxic nucleotide analog that kills cells having functional HPRT activity. Because cells produced in accordance with WO 2008/059317 will not have HPRT activity, they will not amplify an inserted transgene in response to treatment with an HPRT inhibitor and, so, cannot be used in the present invention. For the present invention, unless the precise limits of all regulatory sequences are already known for a particular selectable gene, insertion sites >1,000 base pairs, >2,000 base pairs, >3,000 base pairs, >4,000 base pairs, or, preferably, >5,000 base pairs, upstream or downstream of the gene coding sequence should be selected. However, if the locations of the regulatory sequences are known, the sequence of interest can be inserted immediately adjacent to either the most 5′ or the most 3′ regulatory sequence (e.g., immediately 3′ to the polyadenylation signal).

Lastly, target sites should be selected which do not disrupt other chromosomal genes which may be important for normal cell physiology. In general, gene insertion sites should be >1,000 base pairs, >2,000 base pairs, >3,000 base pairs, >4,000 base pairs, or, preferably, >5,000 base pairs, away from any gene coding sequence.
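The three site-selection criteria above can be expressed as a simple coordinate check. The following Python sketch is illustrative only: the `Gene` type, the `is_acceptable_site` function, and every coordinate are hypothetical, and the 100,000 base pair co-amplification limit and 5,000 base pair buffer are the default values suggested in the text for loci where no locus-specific data are available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A gene coding sequence as inclusive chromosomal coordinates (hypothetical)."""
    name: str
    start: int
    end: int


def distance_to(gene: Gene, pos: int) -> int:
    """Distance in base pairs from pos to the coding sequence (0 if inside it)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def is_acceptable_site(pos, selectable, other_genes,
                       max_coamp=100_000, min_buffer=5_000):
    """Apply the three criteria from the text to one candidate insertion site."""
    d = distance_to(selectable, pos)
    # Criterion 1: the site is assumed to co-amplify only if it lies
    # <100,000 bp from the selectable gene coding sequence.
    if d >= max_coamp:
        return False
    # Criterion 2: the site must not disturb the selectable gene itself.
    if d <= min_buffer:
        return False
    # Criterion 3: the site must not disturb any other chromosomal gene.
    return all(distance_to(g, pos) > min_buffer for g in other_genes)


# Hypothetical locus: a selectable gene and one downstream neighbor.
selectable = Gene("selectable", 100_000, 124_500)
neighbor = Gene("neighbor", 189_500, 195_000)

for pos in (126_000, 150_000, 187_000):
    print(pos, is_acceptable_site(pos, selectable, [neighbor]))
```

In this toy locus only the middle candidate passes: the first lies too close to the selectable gene and the third lies too close to the neighboring gene.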

Various methods of the invention are described schematically in the figures as follows:

FIG. 1 depicts a general strategy for targeting a sequence of interest to an amplifiable locus. In the first step, a site-specific endonuclease introduces a double-stranded break in the chromosomal DNA of a cell at a site that is proximal to an endogenous selectable gene. The cleaved chromosomal DNA then undergoes homologous recombination with a donor DNA molecule comprising a sequence of interest flanked by DNA sequences homologous to sequences flanking the endonuclease recognition sequence in the target site. As a result, the sequence of interest is inserted into the chromosomal DNA of the cell adjacent to the endogenous selectable gene. The modified cell is then grown in the presence of one or more compounds that inhibit the function of the selectable gene to induce an increase in the copy number (i.e., amplification) of the selectable gene. The sequence of interest, which is genetically linked to the selectable gene, will co-amplify with the selectable gene. The result is a stable transgenic cell line comprising multiple copies of the sequence of interest.

FIG. 2(A) depicts a schematic of the CHO DHFR locus showing a preferred region for targeting a sequence of interest 5,000-60,000 base pairs downstream of the DHFR gene. FIG. 2(B) depicts a schematic of the CHO GS locus showing a preferred region for targeting a sequence of interest 5,000-55,000 base pairs downstream of the GS gene. Promoters are shown as arrows. Exons are shown as rectangles, with non-coding exons in white and protein coding exons in gray.

FIG. 3 depicts a strategy for inserting a sequence of interest into an amplifiable locus in a two-step process involving a pre-integrated target sequence. In the first step, the chromosomal DNA of a cell is cleaved by a site-specific endonuclease at a site that is proximal to a selectable gene. The cleaved chromosomal DNA then undergoes homologous recombination with a donor DNA molecule comprising an exogenous target sequence flanked by DNA sequences homologous to the sequences flanking the endogenous target site. This results in the insertion of the new engineered target sequence into the chromosomal DNA of the cell proximal to the selectable gene. A sequence of interest can subsequently be targeted proximal to the same selectable gene using a nuclease, integrase, transposase, or recombinase that specifically recognizes the pre-integrated engineered target sequence. The modified cell is then grown in the presence of one or more compounds that co-amplify the selectable gene and the sequence of interest.

FIG. 4 depicts a strategy for inserting an engineered target sequence into a selectable gene (e.g., DHFR) with concomitant removal of a portion of the selectable gene. A site-specific endonuclease is first used to cleave the chromosomal DNA of the cell proximal to or within the selectable gene sequence. As shown in the figure, the endogenous target site is between exons 2 and 3 of the CHO DHFR gene (although the target site could be within any intron or exon, and the selectable gene could be any gene subject to amplification). The chromosomal DNA then undergoes homologous recombination with a first donor DNA (“donor DNA #1”) such that the sequence of the first donor DNA is inserted into the chromosomal DNA of the cell. As shown in the figure, this results in the replacement of the promoter and first two exons of DHFR by the new engineered target sequence (although the first donor DNA could replace more or less of the chromosomal DNA, such as only a portion of one exon). If such a replacement is made to all DHFR alleles in a cell, the resultant cell line is DHFR (−/−). A sequence of interest can subsequently be targeted proximal to the selectable gene in the cell line using an endonuclease, integrase, transposase, or recombinase that recognizes the engineered target sequence. As shown in the figure, the second donor DNA (“donor DNA #2”) comprises a sequence of interest as well as a promoter and the first two exons of DHFR. Proper targeting of this second donor DNA molecule results in the insertion of the sequence of interest at the engineered target sequence while simultaneously reconstituting a functional DHFR gene. Thus, properly targeted cell lines will be DHFR+ and can be selected using media deficient in hypoxanthine/thymidine. In addition, the sequence of interest can be co-amplified with the DHFR gene using MTX selection. The strategy diagrammed here for DHFR can be applied to any selectable gene in an amplifiable locus.

FIG. 5 depicts a strategy for inserting an engineered target sequence into an amplifiable locus with concomitant disruption of the coding sequence of a selectable gene. A site-specific endonuclease is first used to cleave the chromosomal DNA of the cell within the selectable gene coding sequence. As shown in the figure, the endogenous target site is in the third exon of the CHO GS gene. The chromosomal DNA then undergoes homologous recombination with a first donor DNA (“donor DNA #1”) such that the sequence of the first donor DNA is inserted into the chromosomal DNA of the cell. This results in the insertion of a new engineered target sequence into the GS coding sequence. If such an insertion occurs in both alleles of the GS gene and results in a frameshift mutation or otherwise disrupts the function of the GS gene, the resultant cell line will be GS (−/−). A sequence of interest can subsequently be targeted proximal to the amplifiable locus in the cell line using an endonuclease, integrase, transposase, or recombinase that recognizes the engineered target sequence. As shown in the figure, a second donor DNA (“donor DNA #2”) comprises a sequence of interest operably linked to a promoter as well as the 3′ portion of the GS coding sequence comprising exons 3, 4, 5, and 6. (The figure shows exons 3, 4, 5, and 6 joined into a single nucleotide sequence (i.e., with introns removed), but a sequence including either the naturally-occurring introns or one or more artificial introns could also be employed). Proper targeting of the second donor DNA molecule results in the insertion of the sequence of interest at the engineered target sequence while simultaneously reconstituting a functional GS gene. Thus, properly targeted cell lines will be GS+ and can be selected using media deficient in L-glutamine. In addition, the sequence of interest can be co-amplified with the GS gene using MSX selection. 
The strategy diagrammed here for GS can be applied to any selectable gene in an amplifiable locus.

FIG. 6 depicts a strategy for inserting an engineered target sequence into an amplifiable locus with concomitant disruption of the mRNA processing of a selectable gene. A site-specific endonuclease is first used to cleave the chromosomal DNA of the cell within an intron in the selectable gene. As drawn, the endogenous target site is in the intron between the third and fourth coding exons of the CHO GS gene. The chromosomal DNA then undergoes homologous recombination with a donor DNA #1 such that the sequence of the donor DNA is inserted in the chromosomal DNA of the cell. This results in the insertion of a new engineered target sequence into the GS coding sequence with an additional sequence that causes the GS mRNA to be processed incorrectly. As drawn, this additional sequence comprises a strong splice acceptor. If such an insertion occurs in both alleles of the GS gene, the artificial splice acceptor will cause the GS mRNA to splice incorrectly, resulting in a loss of GS expression and a requirement for growth in media containing L-glutamine. A sequence of interest can subsequently be targeted to the amplifiable locus in the cell line using an endonuclease, integrase, transposase, or recombinase that recognizes the engineered target sequence. As diagrammed, donor DNA #2 comprises a sequence of interest operably linked to a promoter as well as the 3′ portion of the GS coding sequence comprising exons 4, 5, and 6 joined into a single nucleotide sequence. (The figure shows exons 4, 5, and 6 joined into a single nucleotide sequence (i.e., with introns removed), but a sequence including either the naturally-occurring introns or one or more artificial introns could also be employed). Proper targeting of this donor DNA #2 molecule results in the insertion of the sequence of interest at the engineered target sequence while simultaneously reconstituting a functional GS gene. 
Thus, properly targeted cell lines will be GS+ and can be selected using media deficient in L-glutamine. In addition, the sequence of interest can be co-amplified with the GS gene using MSX selection. The strategy diagrammed here for GS can be applied to any selectable gene in an amplifiable locus.

FIG. 7(A) depicts a direct-repeat recombination assay for site-specific endonuclease activity. A reporter plasmid is produced comprising the 5′ two-thirds of the GFP gene (“GF”), followed by an endonuclease recognition sequence, followed by the 3′ two-thirds of the GFP gene (“FP”). Mammalian cells are transfected with this reporter plasmid as well as a gene encoding an endonuclease. Cleavage of the recognition sequence by the endonuclease stimulates homologous recombination between direct repeats of the GFP gene to restore GFP function. GFP+ cells can then be counted and/or sorted on a flow cytometer.
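The logic of the direct-repeat reporter can be illustrated at the level of sequence strings. The sketch below is a toy model, not a description of the cellular repair mechanism: the 30-base "gene", the lowercase recognition sequence, and both function names are hypothetical. It shows only why the 5′ two-thirds and 3′ two-thirds fragments share a repeated middle segment, and why collapsing that repeat after cleavage yields the intact gene.

```python
def build_reporter(gene: str, site: str):
    """Return (reporter, repeat): 5' two-thirds + site + 3' two-thirds of gene."""
    n = len(gene)
    third = n // 3
    five_prime = gene[:2 * third]        # the "GF" fragment
    three_prime = gene[n - 2 * third:]   # the "FP" fragment
    repeat = gene[n - 2 * third:2 * third]  # segment present in both fragments
    return five_prime + site + three_prime, repeat


def recombine_direct_repeats(reporter: str, site: str, repeat: str) -> str:
    """Model the outcome of recombination between the direct repeats."""
    cut = reporter.find(site)
    if cut == -1:
        return reporter  # no recognition sequence, so no cleavage
    left = reporter[:cut]
    right = reporter[cut + len(site):]
    if not (left.endswith(repeat) and right.startswith(repeat)):
        raise ValueError("direct repeats do not flank the recognition sequence")
    # Collapse the two copies of the repeat into one.
    return left + right[len(repeat):]


# Hypothetical sequences chosen only to make the three thirds distinguishable.
gene = "ATGGTGAGCA" + "AGGGCGAGGA" + "GCTGTTCTAA"
site = "ttgacgtcaa"

reporter, repeat = build_reporter(gene, site)
restored = recombine_direct_repeats(reporter, site, repeat)
print(gene in reporter)   # the uncut reporter lacks an intact gene
print(restored == gene)   # recombination restores the full sequence
```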

FIG. 7(B) depicts the results of the assay of FIG. 7(A) as applied to the CHO-23/24 and CHO-51/52 meganucleases. Light bars indicate the percentage of GFP+ cells when cells are transfected with the reporter plasmid alone (−endonuclease). Dark bars indicate the percentage of GFP+ cells when cells are co-transfected with a reporter plasmid and the corresponding meganuclease gene (+endonuclease). The assay was performed in triplicate and the standard deviation is shown.

FIG. 7(C) depicts alignment of sequences obtained from CHO cells transfected with mRNA encoding the CHO-23/24 meganuclease. The top sequence is from a wild-type (WT) CHO cell with the recognition sequence for CHO-23/24 underlined.

FIG. 7(D) depicts alignment of sequences obtained from CHO cells transfected with mRNA encoding the CHO-51/52 meganuclease. The top sequence is from a wild-type (WT) CHO cell with the recognition sequence for CHO-51/52 underlined.

FIG. 8(A) depicts a strategy for inserting an exogenous DNA sequence into the CHO DHFR locus using the CHO-51/52 meganuclease. CHO cells were co-transfected with mRNA encoding CHO-51/52 and a donor plasmid comprising an EcoRI site flanked by 543 base pairs of DNA sequence homologous to the region upstream of the CHO-51/52 recognition site and 461 base pairs of DNA sequence homologous to the region downstream of the CHO-51/52 recognition site. 48 hours post-transfection, genomic DNA was isolated and subjected to PCR using primers specific for the downstream region of the DHFR locus (dashed arrows).

FIG. 8(B) depicts the analysis of the PCR products, which were cloned into pUC-19. Plasmid DNA from 48 individual clones was digested with EcoRI and visualized on an agarose gel. 10 plasmids (numbered lanes) yielded a 647 base pair restriction fragment, consistent with cleavage of a first EcoRI site within the pUC-19 vector and a second EcoRI site in the cloned PCR fragment. These 10 plasmids were sequenced to confirm that they harbor a PCR fragment comprising a portion of the downstream DHFR locus with an EcoRI restriction site inserted into the CHO-51/52 recognition sequence. This restriction pattern was not observed when CHO cells were transfected with the donor plasmid alone.

FIG. 9(A) depicts a strategy for inserting an engineered target sequence into the CHO DHFR locus using the CHO-23/24 meganuclease. CHO cells were co-transfected with mRNA encoding CHO-23/24 and a donor plasmid comprising, in 5′ to 3′ orientation, an SV40 promoter, an ATG start codon, an FRT site, and a Zeocin-resistance (Zeo) gene. Zeocin-resistant cells were cloned by limiting dilution and screened by PCR to identify a clonal cell line in which the donor plasmid sequence integrated into the CHO-23/24 recognition site. After expansion, this cell line was co-transfected with a first plasmid encoding Flp recombinase operably linked to a promoter and second plasmid (donor plasmid #2) comprising a GFP gene under the control of a CMV promoter, an FRT site, and a hygromycin-resistance (Hyg) gene lacking a start codon. Flp-mediated recombination between FRT sites resulted in the integration of the donor plasmid #2 sequence into the engineered target sequence (i.e., the FRT site) such that a functional Hyg gene expression cassette was produced. FIG. 9(B) depicts PCR products from hygromycin-resistant clones produced as in (A) that were cloned by limiting dilution. Genomic DNA was extracted from 24 individual clones and PCR amplified using a first primer in the DHFR locus and a second primer in the Hyg gene (dashed lines). All 24 clones yielded a PCR product consistent with Hyg gene insertion into the engineered target sequence. FIG. 9(C) depicts GFP expression by the 24 clones produced in (B) using flow cytometry. All clones were found to express high levels of GFP with relatively little clone-to-clone variability.

FIG. 10. A GFP-expressing CHO line was produced by integrating a GFP gene expression cassette into the DHFR locus using an engineered target sequence strategy as shown in FIG. 9. This cell line was then grown in MTX as described in Example 2 to amplify the integrated GFP gene. (A) Flow cytometry plots showing GFP intensity on the Y-axis. In the pre-MTX cell line, GFP intensity averages approximately 2×10³, whereas in the cell line grown in 250 nM MTX, a distinct sub-population is visible (circled) in which GFP intensity approaches 10⁴. (B) MTX-treated cell lines were sorted by FACS to identify individual cells expressing higher amounts of GFP. Five such high-expression cells were expanded and GFP intensity was determined by flow cytometry. All five clones were found to have significantly increased GFP expression relative to the pre-MTX cell line. (C) Genomic DNA was isolated from the five clonal cell lines produced in (B) and subjected to quantitative PCR using a primer pair specific for the GFP gene. It was found that the five high-expression clones had significantly more copies of the GFP gene than the pre-MTX cell line. These results demonstrate that the copy number and expression level of a transgene integrated downstream of CHO DHFR can amplify in response to MTX treatment.

FIG. 11. (A) A direct-repeat recombination assay, as in FIG. 7(A). (B) The assay in (A) applied to the CHO-13/14 and CGS-5/6 meganucleases. Light bars indicate the percentage of GFP+ cells when cells are transfected with the reporter plasmid alone (−endonuclease). Dark bars indicate the percentage of GFP+ cells when cells are co-transfected with a reporter plasmid and the corresponding meganuclease gene (+endonuclease). The assay was performed in triplicate and the standard deviation is shown. (C) Alignment of sequences obtained from CHO cells transfected with mRNA encoding the CGS-5/6 meganuclease. The top sequence is from a wild-type (WT) CHO cell with the recognition sequence for CGS-5/6 underlined. Dashes indicate deleted bases. Bases that are italicized and in bold are point mutations or insertions relative to the wild-type sequence. Note that the mutations observed in at least clones 6d4, 6g5, 3b7, 3d11, 3e5, 6f10, 6hH8, 6d10, 6d7, 3g8, and 3a9 are expected to knock out GS gene function.

2.1.1 Gene Targeting to the CHO DHFR Locus

The CHO DHFR locus is diagrammed in FIG. 2A. The locus comprises the DHFR gene coding sequence in 6 exons spanning ˜24,500 base pairs. The Msh3 gene is located immediately upstream of DHFR and is transcribed divergently from the same promoter as DHFR. A hypothetical gene, 2BE2121, can be found ˜65,000 base pairs downstream of the DHFR coding sequence. Thus, there is a ˜65,000 base pair region downstream of the DHFR gene that does not harbor any known genes and is a suitable location for targeting the insertion of sequences of interest. Target sites for insertion of a sequence of interest generally should not be <1,000 base pairs, and preferably not <5,000 base pairs, from either the DHFR or 2BE2121 genes. This limits the window of preferred target sites to the region 1,000-60,000 base pairs, or 5,000-60,000 base pairs, downstream of the DHFR coding sequence. The sequence of this region is provided as SEQ ID NO: 2.

The human and mouse DHFR loci have an organization similar to the CHO locus. In both cases, the Msh3 gene is immediately upstream of DHFR and there is a large area devoid of coding sequences downstream of DHFR. In humans, the ANKRD34B gene is ˜55,000 base pairs downstream of DHFR, whereas in mouse it is ˜37,000 base pairs downstream of DHFR. Therefore, the genomic region downstream of DHFR is an appropriate location to insert genes of interest in CHO, human, and mouse cells and cell lines. Further, gene expression cassettes inserted into this region will be expressed at a high level, resistant to gene silencing, and capable of being amplified by treatment with MTX. Methods for amplifying the CHO cell DHFR locus are known in the art (see, e.g., Kellems, ed., Gene amplification in mammalian cells: a comprehensive guide. Marcel Dekker, New York, 1993) and typically involve gradually increasing the concentration of MTX in the growth media from 0 to as high as 0.8 mM over a period of several weeks.

2.1.2 Gene Targeting to the GS Locus

The CHO, human, and mouse glutamine synthetase (also known as “glutamate-ammonia ligase” or “GluL”) loci share a common organization (FIG. 2B). The TEDDM1 gene is immediately upstream of GS in all species (5,000 bp upstream in the case of human, ˜7,000 bp upstream in the case of mouse and CHO). The closest downstream gene, however, is ˜46,000 bp away in the case of human and ˜117,000 bp away in the case of mouse and CHO. Therefore, we predict that the chromosomal region 1,000-41,000 bp, or 5,000-41,000 bp, downstream of GS in human cells and the region 1,000-100,000 bp, or 5,000-100,000 bp, downstream of GS in mouse and CHO cells are appropriate locations to target the insertion of sequences of interest. Because DNA sites distal to the GS coding sequence are more likely to be susceptible to gene silencing, the chromosomal region 5,000-60,000 bp downstream of GS is a preferred location to target the insertion of a sequence of interest even in mouse or CHO cells. The sequence of this region from the CHO genome is provided as SEQ ID NO: 3. Gene expression cassettes inserted into this region will be expressed at a high level, resistant to gene silencing, and capable of being amplified by treatment with MSX. Less-preferred regions include the chromosomal region between the TEDDM1 and GS genes or the region <10,000 bp downstream of TEDDM1 (see FIG. 2B). Methods for amplifying the GS locus are known in the art (Bebbington et al. (1992), Biotechnology (N Y). 10(2):169-75).
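The target windows quoted in sections 2.1.1 and 2.1.2 follow from simple arithmetic on the approximate gene distances. The Python sketch below reproduces that arithmetic under stated assumptions: the function name and parameters are hypothetical, the far end of the window is kept 5,000 bp short of the nearest downstream gene, and the window is capped at the 100,000 bp co-amplification limit predicted earlier in the text. It does not model the additional gene-silencing consideration that narrows the preferred GS window in mouse and CHO cells.

```python
def downstream_window(next_gene_distance: int,
                      near_buffer: int = 5_000,
                      far_buffer: int = 5_000,
                      max_coamp: int = 100_000):
    """Return (start, end) offsets in bp downstream of the selectable gene.

    next_gene_distance: approximate distance to the nearest downstream gene.
    near_buffer: minimum distance from the selectable gene coding sequence.
    far_buffer: minimum distance from the downstream gene.
    max_coamp: assumed limit of the region that co-amplifies.
    """
    start = near_buffer
    end = min(next_gene_distance - far_buffer, max_coamp)
    if end <= start:
        raise ValueError("no acceptable window between the two genes")
    return start, end


# Approximate distances quoted in the text.
print(downstream_window(65_000))    # CHO DHFR to 2BE2121
print(downstream_window(46_000))    # human GS to nearest downstream gene
print(downstream_window(117_000))   # mouse/CHO GS to nearest downstream gene
print(downstream_window(65_000, near_buffer=1_000))  # less stringent near limit
```

With the default buffers these calls give 5,000-60,000 bp for CHO DHFR, 5,000-41,000 bp for human GS, and 5,000-100,000 bp for mouse and CHO GS, matching the ranges stated in the text.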

2.2 Engineered Endonucleases for Gene Targeting

A sequence of interest may be inserted into an amplifiable locus using an engineered site-specific endonuclease. Methods for generating site-specific endonucleases which can target DNA breaks to pre-determined loci in a genome are known in the art. These include zinc-finger nucleases (Le Provost et al. (2010), Trends Biotechnol. 28(3):134-41), TAL-effector nucleases (Li et al. (2011), Nucleic Acids Res. 39(1):359-72), and engineered meganucleases (WO 2007/047859; WO 2007/049156; WO 2009/059195). In one embodiment, the invention provides engineered meganucleases derived from I-CreI that can be used to target the insertion of a gene of interest to an amplifiable locus. Methods to produce such engineered meganucleases are known in the art (see, e.g., WO 2007/047859; WO 2007/049156; WO 2009/059195). In preferred embodiments, a “single-chain” meganuclease is used to target gene insertion to an amplifiable region of the genome. Methods for producing such “single-chain” meganucleases are known in the art (see, e.g., WO 2009/059195 and WO 2009/095742). In some embodiments, the engineered nuclease is fused to a nuclear localization signal (NLS) to facilitate nuclear uptake. Examples of nuclear localization signals include the SV40 NLS (amino acid sequence MAPKKKRKV (SEQ ID NO: 36)) which can be fused to the C- or, preferably, the N-terminus of the protein. In addition, an engineered nuclease may be tagged with a peptide epitope (e.g., an HA, FLAG, or Myc epitope) to monitor expression levels or localization or to facilitate purification.

2.3 Engineered Cell Lines with Sequences of Interest Targeted to Amplifiable Loci

In some embodiments, the invention provides methods for using engineered nucleases to target the insertion of transgenes into amplifiable loci in cultured mammalian cells. This method has two primary components: (1) an engineered nuclease; and (2) a donor DNA molecule comprising a sequence of interest. The method comprises contacting the DNA of the cell with the engineered nuclease to create a double-stranded DNA break in an endogenous recognition sequence in an amplifiable locus, followed by the insertion of the donor DNA molecule at the site of the DNA break. Such insertion of the donor DNA is facilitated by the cellular DNA-repair machinery and can occur by either the non-homologous end-joining pathway or by homologous recombination (FIG. 1).

The engineered nuclease can be delivered to the cell in the form of protein or, preferably, as a nucleic acid encoding the engineered nuclease. Such nucleic acid can be DNA (e.g., circular or linearized plasmid DNA or PCR products) or RNA. For embodiments in which the engineered nuclease coding sequence is delivered in DNA form, it should be operably linked to a promoter to facilitate transcription of the engineered nuclease gene. Mammalian promoters suitable for the invention include constitutive promoters such as the cytomegalovirus early (CMV) promoter (Thomsen et al. (1984), Proc Natl Acad Sci U S A. 81(3):659-63) or the SV40 early promoter (Benoist and Chambon (1981), Nature. 290(5804):304-10) as well as inducible promoters such as the tetracycline-inducible promoter (Dingermann et al. (1992), Mol Cell Biol. 12(9):4038-45).

In some embodiments, mRNA encoding the engineered nuclease is delivered to the cell because this reduces the likelihood that the gene encoding the engineered nuclease will integrate into the genome of the cell. Such mRNA encoding an engineered nuclease can be produced using methods known in the art such as in vitro transcription. In some embodiments, the mRNA is capped using 7-methyl-guanosine. In some embodiments, the mRNA may be polyadenylated.

Purified engineered nuclease proteins can be delivered into cells to cleave genomic DNA, which allows for homologous recombination or non-homologous end-joining at the cleavage site with a sequence of interest, by a variety of different mechanisms known in the art. For example, the recombinant nuclease protein can be introduced into a cell by techniques including, but not limited to, microinjection or liposome transfections (see, e.g., Lipofectamine™, Invitrogen Corp., Carlsbad, Calif.). The liposome formulation can be used to facilitate lipid bilayer fusion with a target cell, thereby allowing the contents of the liposome or proteins associated with its surface to be brought into the cell. Alternatively, the enzyme can be fused to an appropriate uptake peptide such as that from the HIV TAT protein to direct cellular uptake (see, e.g., Hudecz et al. (2005), Med. Res. Rev. 25: 679-736).

Alternatively, gene sequences encoding the engineered nuclease protein are inserted into a vector and transfected into a eukaryotic cell using techniques known in the art (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, Wiley 1999). The sequence of interest can be introduced in the same vector, a different vector, or by other means known in the art. Non-limiting examples of vectors for DNA transfection include virus vectors, plasmids, cosmids, and YAC vectors. Transfection of DNA sequences can be accomplished by a variety of methods known to those of skill in the art. For instance, liposomes and immunoliposomes are used to deliver DNA sequences to cells (see, e.g., Lasic et al. (1995), Science 267: 1275-76). In addition, viruses can be utilized to introduce vectors into cells (see, e.g., U.S. Pat. No. 7,037,492). Alternatively, transfection strategies can be utilized such that the vectors are introduced as naked DNA (see, e.g., Rui et al. (2002), Life Sci. 71(15): 1771-8).

General methods for delivering nucleic acids into cells include: (1) chemical methods (Graham et al. (1973), Virology 54(2):536-539; Zatloukal et al. (1992), Ann. N.Y. Acad. Sci., 660:136-153); (2) physical methods such as microinjection (Capecchi (1980), Cell 22(2):479-488), electroporation (Wong et al. (1982), Biochem. Biophys. Res. Commun. 107(2):584-587; Fromm et al. (1985), Proc. Nat'l Acad. Sci. USA 82(17):5824-5828; U.S. Pat. No. 5,384,253), and ballistic injection (Johnston et al. (1994), Methods Cell. Biol. 43(A): 353-365; Fynan et al. (1993), Proc. Nat'l Acad. Sci. USA 90(24): 11478-11482); (3) viral vectors (Clapp (1993), Clin. Perinatol. 20(1): 155-168; Lu et al. (1993), J. Exp. Med. 178(6):2089-2096; Eglitis et al. (1988), Adv. Exp. Med. Biol. 241:19-27; Eglitis et al. (1988), Biotechniques 6(7):608-614); and (4) receptor-mediated mechanisms (Curiel et al. (1991), Proc. Nat'l Acad. Sci. USA 88(19):8850-8854; Curiel et al. (1992), Hum. Gen. Ther. 3(2):147-154; Wagner et al. (1992), Proc. Nat'l Acad. Sci. USA 89 (13):6099-6103). In some preferred embodiments, 7-methyl-guanosine capped mRNA encoding the engineered nuclease is delivered to cells using electroporation.

The donor DNA molecule comprises a gene of interest operably linked to a promoter. In many cases, a donor molecule may comprise multiple genes operably linked to the same or different promoters. For example, donor molecules comprising monoclonal antibody expression cassettes may comprise a gene encoding the antibody heavy chain and a second gene encoding the antibody light chain. Both genes may be under the control of different promoters or they may be under the control of the same promoter by using, for example, an internal-ribosome entry site (IRES). Donor molecules may also comprise a selectable marker gene operably linked to a promoter to facilitate the identification of transgenic cells. Such selectable markers are known in the art and include neomycin phosphotransferase (NEO), hypoxanthine phosphoribosyltransferase (HPRT), glutamine synthetase (GS), dihydrofolate reductase (DHFR), and hygromycin phosphotransferase (HYG) genes.

In some embodiments, donor DNA molecules will additionally comprise flanking sequences homologous to the target sequences in the DNA of the cell. Such homologous flanking sequences comprise >3 or, preferably, >50 or, more preferably, >200 or, most preferably, >400 base pairs of DNA that are identical or nearly identical in sequence to the chromosomal locus recognized by the engineered nuclease (FIG. 1). Such homologous DNA sequences facilitate the integration of the donor DNA sequence into the amplifiable locus by homologous recombination.
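The arrangement of homologous flanking sequences around a sequence of interest can be sketched as string slicing. The example below is a minimal illustration, not a design procedure: the `build_donor` function, the randomly generated locus, and the position and 22 bp length of the recognition sequence are all hypothetical, and the donor is assumed to carry arms taken from the chromosomal sequence immediately upstream and downstream of the recognition sequence. The arm lengths of 543 and 461 base pairs mirror those described for FIG. 8(A), and the insert is an EcoRI site (GAATTC).

```python
import random


def build_donor(locus: str, site_start: int, site_end: int, insert: str,
                left_len: int = 400, right_len: int = 400) -> str:
    """Assemble left arm + insert + right arm from a chromosomal sequence.

    site_start and site_end are half-open indices of the recognition
    sequence within locus; the arms are the sequences flanking it.
    """
    if site_start - left_len < 0 or site_end + right_len > len(locus):
        raise ValueError("locus too short for the requested homology arms")
    left_arm = locus[site_start - left_len:site_start]
    right_arm = locus[site_end:site_end + right_len]
    return left_arm + insert + right_arm


# Hypothetical 1,500 bp locus with a 22 bp recognition sequence at 700-722.
rng = random.Random(0)
locus = "".join(rng.choice("ACGT") for _ in range(1500))
site_start, site_end = 700, 722

donor = build_donor(locus, site_start, site_end, "GAATTC",
                    left_len=543, right_len=461)
print(len(donor))  # 543 + 6 + 461 = 1010
```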

The “donor” DNA molecule can be circular (e.g., plasmid DNA) or linear (e.g., linearized plasmid or PCR products). Methods for delivering DNA molecules are known in the art, as discussed above.

In some embodiments, the engineered nuclease gene and donor DNA are carried on separate nucleic acid molecules which are co-transfected into cells or cell lines. For example, the engineered nuclease gene operably linked to a promoter can be transfected in plasmid form simultaneously with a separate donor DNA molecule in plasmid or PCR product form. In an alternative embodiment, the engineered nuclease can be delivered in mRNA form with a separate donor DNA molecule in plasmid or PCR product form. In a third embodiment, the engineered nuclease gene and donor DNA are carried on the same DNA molecule, such as a plasmid. In a fourth embodiment, cells are co-transfected with purified engineered nuclease protein and a donor DNA molecule in plasmid or PCR product form.

Following transfection with the engineered nuclease and donor DNA, cells are typically allowed to recover from transfection (24-72 hours) before being cloned using methods known in the art. Common methods for cloning a genetically engineered cell line include “limiting dilution” in which transfected cells are transferred to tissue culture plates (e.g., 48-well or 96-well plates) at a concentration of <1 cell per well and expanded into clonal populations. Other cloning strategies include robotic clone identification/isolation systems such as ClonePix™ (Genetix, Molecular Devices, Inc., Sunnyvale, Calif.). Clonal cell lines can then be screened to identify cell lines in which the sequence of interest is integrated into the intended target site. Cell lines can easily be screened using molecular analyses known in the art such as PCR or Southern Blot. For example, genomic DNA can be isolated from a clonal cell line and subjected to PCR amplification using a first (sense-strand) primer that anneals to a DNA sequence in the sequence of interest and a second (anti-sense strand) primer that anneals to a sequence in the amplifiable locus. If the donor DNA molecule comprises a DNA sequence homologous to the target site, it is important that the second primer is designed to anneal to a sequence in the amplifiable locus that is beyond the limits of homology carried on the donor molecule to avoid false positive results. Alternatively, cell lines can be screened for expression of the sequence of interest. For example, if the sequence of interest encodes a secreted protein such as an antibody, the growth media can be sampled from isolated clonal cell lines and assayed for the presence of antibody protein using methods known in the art such as Western Blot or Enzyme-Linked Immunosorbent Assay (ELISA). This type of functional screen can be used to identify clonal cell lines which carry at least one copy of the sequence of interest integrated into the genome. Additional molecular analyses such as PCR or Southern blot can then be used to determine which of these transgenic cell lines carry the sequence of interest targeted to the amplifiable locus of interest, as described above.
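The primer-placement rule for PCR screening described above can be checked with coordinate arithmetic. In this hedged sketch, the function names and all coordinates are hypothetical, positions are inclusive chromosomal coordinates, and the homology carried on the donor is assumed to extend a fixed number of base pairs to either side of the insertion site. A locus primer that anneals inside that region would also anneal to the donor molecule itself and could therefore give a product without targeted integration.

```python
def homology_region(site: int, left_arm: int, right_arm: int):
    """Chromosomal interval (inclusive) that is also present on the donor."""
    return site - left_arm, site + right_arm


def primer_is_informative(primer_start: int, primer_end: int,
                          site: int, left_arm: int, right_arm: int) -> bool:
    """True if the locus primer anneals entirely outside the donor homology."""
    region_start, region_end = homology_region(site, left_arm, right_arm)
    return primer_end < region_start or primer_start > region_end


# Hypothetical insertion site with 543 bp and 461 bp homology arms.
site, left_arm, right_arm = 10_000, 543, 461
print(homology_region(site, left_arm, right_arm))

# A primer inside the right arm would also anneal to the donor molecule.
print(primer_is_informative(10_300, 10_320, site, left_arm, right_arm))
# A primer beyond the arm anneals only to chromosomal DNA.
print(primer_is_informative(10_500, 10_520, site, left_arm, right_arm))
```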

The method of the invention can be used on any culturable and transfectable cell type such as immortalized cell lines and stem cells. In preferred embodiments, the method of the invention is used to genetically modify immortalized cell lines that are commonly used for biomanufacturing. This includes:

- 1. Hamster cell lines such as baby hamster kidney (BHK) cells and all variants of Chinese Hamster Ovary (CHO) cells, e.g., CHO-K1, CHO-S (Invitrogen Corp., Carlsbad, Calif.), DG44, or Potelligent™ (Lonza Group Ltd., Basel, Switzerland). Because the genome sequences of different hamster cell lines are very nearly identical, an engineered meganuclease which can be used to practice the invention in one hamster cell type (e.g., BHK cells) can generally be used to practice the invention in another hamster cell type (e.g., CHO-K1).
- 2. Mouse cell lines such as mouse hybridoma or mouse myeloma (e.g., NS0) cells. Because the genome sequences of different mouse cell lines are very nearly identical, an engineered meganuclease which can be used to practice the invention in one mouse cell type (e.g., mouse hybridoma cells) can generally be used to practice the invention in another mouse cell type (e.g., NS0).

3. Human cell lines such as human embryonic kidney cells (e.g., HEK-293 or 293S) and human retinal cells (e.g., PER.C6). Because the genome sequences of different human cell lines are very nearly identical, an engineered meganuclease which can be used to practice the invention in one human cell type (e.g., HEK-293 cells) can generally be used to practice the invention in another human cell type (e.g., PER.C6).

2.6 Pre-Engineered Cell Lines with Engineered Target Sequences in Amplifiable Loci.

In one embodiment, the invention provides cell lines which are pre-engineered to comprise a targetable “engineered target sequence” for gene insertion in an amplifiable locus in a mammalian cell line (FIG. 3). An engineered target sequence comprises a recognition sequence for an enzyme which is useful for inserting transgenic nucleic acids into chromosomal DNA sequences. Such engineered target sequences can include recognition sequences for engineered meganucleases derived from I-CreI (e.g., SEQ ID NOs: 37-87 from WO 2009/076292), recognition sequences for zinc-finger nucleases, recognition sequences for TAL effector nucleases (TALENs), the LoxP site (SEQ ID NO: 4) which is recognized by Cre recombinase, the FRT site (SEQ ID NO: 5) which is recognized by FLP recombinase, the attB site (SEQ ID NO: 6) which is recognized by lambda recombinase, or any other DNA sequence known in the art that is recognized by a site-specific endonuclease, recombinase, integrase, or transposase that is useful for targeting the insertion of nucleic acids into a genome. Thus, the invention allows one skilled in the art to use an engineered nuclease (e.g., a meganuclease, zinc-finger nuclease, or TAL effector nuclease) to insert an engineered target sequence into an amplifiable locus in a mammalian cell line. The resulting cell line comprising such an engineered target sequence at an amplifiable locus can then be contacted with the appropriate enzyme (e.g., a second engineered meganuclease, a second zinc-finger nuclease, a second TAL effector nuclease, a recombinase, an integrase, or a transposase) to target the insertion of a gene of interest into the amplifiable locus at the engineered target sequence. This two-step approach can be advantageous because the efficiency of gene insertion that can be achieved using an optimal meganuclease, zinc-finger nuclease, recombinase, integrase, or transposase might be higher than what can be achieved using the initial endonuclease (e.g., meganuclease or zinc-finger nuclease) that cleaves the endogenous target site to promote insertion of the engineered target sequence.

In an alternative embodiment, a cell line is produced by inserting an engineered target sequence into an amplifiable locus with the concomitant removal of all or a portion of the adjacent endogenous marker gene (FIG. 4). For example, an engineered meganuclease, zinc-finger nuclease, or TAL-effector nuclease can be used to remove the first two exons of both alleles of the CHO DHFR gene and replace them with an engineered target sequence for a different engineered meganuclease, ZFN, TALEN, recombinase, integrase, or transposase. The resulting cell line will be DHFR deficient and unable to grow in the absence of hypoxanthine/thymidine. Alternatively, for example, an engineered meganuclease, ZFN or TALEN can be used to remove the first exon of both alleles of the CHO GS gene and replace it with an engineered target sequence for a different engineered meganuclease, ZFN, TALEN, recombinase, integrase, or transposase (FIG. 4). The resulting cell line will be GS deficient and unable to grow in the absence of L-glutamine. Such a cell line is useful because a gene of interest can be inserted into the engineered target sequence in the pre-engineered cell line while simultaneously reconstituting the selectable gene (e.g., DHFR or GS). Thus, it is possible to select for transfectants harboring the gene of interest at the amplifiable locus using media conditions that select for DHFR+ or GS+ cells.

In an alternative embodiment, a cell line is produced in which an engineered target sequence is inserted into an amplifiable locus with disruption of the selectable gene (FIGS. 5, 6). This can be accomplished, for example, using a meganuclease which recognizes a DNA site in the coding sequence of the selectable gene. Such a meganuclease can be used to target the insertion of an engineered target sequence into the selectable gene coding sequence resulting in disruption of gene function by, for example, introducing a frameshift (FIG. 5). Alternatively, for example, an engineered target sequence can be inserted into an intron in the selectable gene sequence with an additional sequence that promotes improper processing of the selectable gene transcript (FIG. 6). Such sequences that promote improper processing include, for example, artificial splice acceptors or polyadenylation signals. Splice acceptor sequences are known in the art (Clancy (2008), “RNA Splicing: Introns, Exons and Spliceosome,” Nature Education 1:1) and typically comprise a 20-50 base pair pyrimidine-rich sequence followed by a sequence (C/T)AG(A/G). SEQ ID NO: 33 is an example of a splice acceptor sequence. Likewise, polyadenylation signals are known in the art and include, for example, the SV40 polyadenylation signal (SEQ ID NO: 34) and the BGH polyadenylation signal (SEQ ID NO: 35). In some embodiments, the resulting cell line harboring the new engineered target sequence in all alleles of the selectable gene will be deficient in the function of the gene due to mis-transcription or mis-translation and will be able to grow only under permissive conditions. For example, an engineered target sequence can be inserted into the GS gene sequence using a meganuclease resulting in a cell line that is GS−/− that can grow only in the presence of L-glutamine in the growth media. In a subsequent step, a gene of interest can be inserted into the engineered target sequence while simultaneously reconstituting the selectable gene (e.g., DHFR or GS). Thus, it is possible to select for transfectants harboring the gene of interest at the amplifiable locus using media conditions that select for DHFR+ or GS+ cells.
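The splice acceptor pattern described above (a 20-50 base pair pyrimidine-rich sequence followed by (C/T)AG(A/G)) can be illustrated with a minimal sequence scan. This is only a sketch: the function name, the fixed 20-base window, and the 80% pyrimidine threshold are hypothetical choices made for illustration and are not taken from this disclosure, and the example sequence is invented.

```python
import re


def find_candidate_splice_acceptors(seq, tract_len=20, min_pyrimidine_fraction=0.8):
    """Return (position, motif, pyrimidine_fraction) for each (C/T)AG(A/G) motif
    that is immediately preceded by a pyrimidine-rich window.

    tract_len and min_pyrimidine_fraction are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    # The lookahead lets overlapping motifs be reported as well.
    for match in re.finditer(r"(?=([CT]AG[AG]))", seq):
        start = match.start()
        if start < tract_len:
            continue  # not enough upstream sequence to evaluate the tract
        tract = seq[start - tract_len:start]
        fraction = sum(base in "CT" for base in tract) / tract_len
        if fraction >= min_pyrimidine_fraction:
            hits.append((start, match.group(1), round(fraction, 2)))
    return hits


if __name__ == "__main__":
    # Invented example: a 20-base pyrimidine tract followed by CAGG.
    example = "GATTACA" + "CTTTCTCTCTTTTCCTCTTC" + "CAGG" + "ATGGCC"
    print(find_candidate_splice_acceptors(example))
```

A scan of this kind only flags candidate motifs; whether a given sequence functions as a splice acceptor in a cell would still have to be determined experimentally.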

2.5 Transgenic Cell Lines for Biomanufacturing.

In some embodiments, the invention provides transgenic cell lines suitable for the production of protein pharmaceuticals. Such transgenic cell lines comprise a population of cells in which a gene of interest, operably linked to a promoter, is inserted into the genome of the cell at an amplifiable locus wherein the gene of interest encodes a protein therapeutic. Examples of protein therapeutics include: monoclonal antibodies, antibody fragments, erythropoietin, tissue-type plasminogen activator, Factor VIII, Factor IX, insulin, colony stimulating factors, interferons (e.g., interferon-α, interferon-β, and interferon-γ), interleukins (e.g., interleukin-2), vaccines, tumor necrosis factor, and glucocerebrosidase. Protein therapeutics are also referred to as “biologics” or “biopharmaceuticals.”

To be used for biomanufacturing, a transgenic cell line of the invention should undergo: (1) adaptation to serum-free growth in suspension; and (2) amplification of the gene of interest. In some embodiments, the invention is practiced on adherent cell lines which can be adapted to growth in suspension to facilitate their maintenance in shaker-flasks or stirred-tank bioreactors as is typical of industrial biomanufacturing. Methods for adapting adherent cells to growth in suspension are known in the art (Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). For regulatory reasons, it is generally necessary to further adapt biomanufacturing cell lines to chemically-defined media lacking animal-derived components (i.e., “serum-free” media). Methods for preparing such media and adapting cell lines to it are known in the art (Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). Such media can also be purchased commercially (e.g., CD-3 media for maintenance of CHO cells, available from Sigma-Aldrich, St. Louis, Mo.) and cells can be adapted to it by following the manufacturers' instructions. In some embodiments, the cell line is adapted to growth in suspension and/or serum-free media prior to being transfected with the engineered nuclease.

Lastly, methods for gene amplification are known in the art (Cell Culture and Upstream Processing, Butler, ed. (Taylor and Francis Group, New York, 2007)). In general, the process involves adding an inhibitor of a selectable gene product to the growth media to select for cells that express abnormally high amounts of the gene product due to gene-duplication events. In general, the concentration of inhibitor added to the growth media is increased slowly over a period of weeks until the desired level of gene amplification is achieved. Inhibitor is then generally removed from the media prior to initiating a bioproduction run to avoid the possibility of the inhibitor contaminating the protein therapeutic formulation. For example, the CHO DHFR locus can be amplified by slowly increasing the concentration of MTX in the growth media from 0 mM to as high as 0.8 mM over a period of several weeks. The GS locus can, likewise, be amplified by slowly increasing the concentration of MSX in the media from 0 μM to as high as 100 μM over a period of several weeks. Methods for evaluating gene amplification are known in the art and include Southern Blot and quantitative real-time PCR (rtPCR). In addition, or as an alternative, expression levels of the sequence of interest, which are generally correlated to gene copy number, can be evaluated by determining the concentration of protein therapeutic in the growth media using conventional methods such as Western Blot or ELISA.

Following cell line production, adaptation, and amplification, protein therapeutics can be produced and purified using methods that are standard in the biopharmaceutical industry.

EXAMPLES

This invention is further illustrated by the following examples, which should not be construed as limiting. Those skilled in the art will recognize, or be able to ascertain, using no more than routine experimentation, numerous equivalents to the specific substances and procedures described herein. Such equivalents are intended to be encompassed in the scope of the claims that follow the examples below. Example 1 refers to engineered meganucleases that can be used to target the insertion of a gene of interest downstream of the DHFR gene in CHO cells. Example 2 refers to engineered meganucleases that can be used to target the insertion of an engineered target sequence into the CHO DHFR gene with concomitant removal of DHFR exons 1 and 2. Example 2 also refers to engineered meganucleases that can be used to target the insertion of an engineered target sequence into the CHO GS gene. Example 3 refers to meganucleases that can be used to target the insertion of a gene of interest downstream of the GS gene in CHO cells.

EXAMPLE 1

Targeted Gene Insertion into the CHO DHFR Locus using Engineered Meganucleases

The CHO genomic DNA sequence 10,000-55,000 base pairs downstream of the DHFR gene was searched to identify DNA sites amenable to targeting with engineered meganucleases. Two sites (SEQ ID NO: 7 and SEQ ID NO: 8) were selected which are, respectively, 35,699 and 15,898 base pairs downstream of the DHFR coding sequence (Table 2).

TABLE 2

Example Recognition Sites for Engineered Meganucleases in the CHO DHFR Locus.

SEQ ID NO:	Target Site Sequence	Location Relative to CHO DHFR Coding Sequence
7	5′-TAAGGCCTCATATGAAAATATA-3′	35,699 bp downstream
8	5′-ATAGATGTCTTGCATACTCTAG-3′	15,898 bp downstream

1. Meganucleases that Recognize SEQ ID NO: 7 and SEQ ID NO: 8

An engineered meganuclease (SEQ ID NO: 9) was produced which recognizes and cleaves SEQ ID NO: 7. This meganuclease is called “CHO-23/24.” A second engineered meganuclease (SEQ ID NO: 10) was produced which recognizes and cleaves SEQ ID NO: 8. This meganuclease is called “CHO-51/52.” Each meganuclease comprises an N-terminal nuclear-localization signal derived from SV40, a first meganuclease subunit, a linker sequence, and a second meganuclease subunit.

2. Site-Specific Cleavage of Plasmid DNA by Meganucleases CHO-23/24 and CHO-51/52

CHO-23/24 and CHO-51/52 were evaluated using a direct-repeat recombination assay as described previously (Gao et al. (2010), Plant J. 61(1):176-87, FIG. 7A). A defective GFP reporter cassette was generated by first cloning a 5′ 480 bp fragment of the GFP gene into NheI/HindIII-digested pcDNA5/FRT (Invitrogen Corp., Carlsbad, Calif.) resulting in the plasmid pGF. Next, a 3′ 480 bp fragment of the GFP gene (including a 240 bp sequence duplicated in the 5′ 480 bp fragment) was cloned into BamHI/XhoI-digested pGF. The resulting plasmid, pGFFP, consists of the 5′ two-thirds of the GFP gene followed by the 3′ two-thirds of the GFP gene, interrupted by 24 bp of the pcDNA5/FRT polylinker. To insert the meganuclease recognition sites, complementary oligonucleotides comprising the sense and anti-sense sequence of each recognition site were annealed and ligated into HindIII/BamHI-digested pGFFP.

The coding sequences of the engineered meganucleases were inserted into the mammalian expression vector pCP under the control of a constitutive (CMV) promoter. Chinese hamster ovary (CHO) cells at approximately 90% confluence were transfected in 96-well plates with 150 ng pGFFP reporter plasmid and 50 ng of meganuclease expression vector or, to determine background, 50 ng of empty pCP, using Lipofectamine 2000 according to the manufacturer's instructions (Invitrogen Corp., Carlsbad, Calif.). To determine transfection efficiency, CHO cells were transfected with 200 ng pCP GFP. Cells were washed in PBS 24 h post-transfection, trypsinized and resuspended in PBS supplemented with 3% fetal bovine serum. Cells were assayed for GFP activity using a Cell Lab Quanta SC MPL flow cytometer and the accompanying Cell Lab Quanta analysis software (Beckman Coulter, Brea, Calif.).

Results are shown in FIG. 7B. It was found that both of the engineered meganucleases were able to cleave their intended recognition sites significantly above background within the context of a plasmid-based reporter assay.

3. Site-Specific Cleavage of CHO DHFR Locus by Meganucleases CHO-23/24 and CHO-51/52

To determine whether or not CHO-23/24 and CHO-51/52 are capable of cleaving their intended target sites in the CHO DHFR locus, we screened genomic DNA from CHO cells expressing either CHO-23/24 or CHO-51/52 to identify evidence of chromosome cleavage at the intended target site. This assay relies on the fact that chromosomal DNA breaks are frequently repaired by NHEJ in a manner that introduces mutations at the site of the DNA break. These mutations, typically small deletions or insertions (collectively known as “indels”) leave a telltale scar that can be detected by DNA sequencing (Gao et al. (2010), Plant J. 61(1):176-87).

CHO cells were transfected with mRNA encoding CHO-23/24 or CHO-51/52. mRNA was prepared by first producing a PCR template for an in vitro transcription reaction (SEQ ID NO: 20 and SEQ ID NO: 21). Each PCR product included a T7 promoter and 609 bp of vector sequence downstream of the meganuclease gene. The PCR product was gel purified to ensure a single template. Capped (m7G) RNA was generated using the RiboMAX T7 kit (Promega Corp., Fitchburg, Wis.) according to the manufacturer's instructions. Ribo m7G cap analog (Promega Corp., Fitchburg, Wis.) was included in the reaction and 0.5 μg of the purified meganuclease PCR product served as the DNA template. Capped RNA was purified using the SV Total RNA Isolation System (Promega Corp., Fitchburg, Wis.) according to the manufacturer's instructions.

1.5×10⁶ CHO-K1 cells were nucleofected with 3×10¹² copies of CHO-23/24 or CHO-51/52 mRNA (2×10⁶ copies/cell) using an Amaxa Nucleofector II device (Lonza Group Ltd., Basel, Switzerland) and the U-23 program according to the manufacturer's instructions. 48 hours post-transfection, genomic DNA was isolated from the cells using a FlexiGene kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The genomic DNA was then subjected to PCR to amplify the corresponding target site. In the case of cells transfected with mRNA encoding CHO-23/24, the forward and reverse PCR primers were SEQ ID NO: 16 and SEQ ID NO: 17. In the case of cells transfected with mRNA encoding CHO-51/52, the forward and reverse PCR primers were SEQ ID NO: 18 and SEQ ID NO: 19. PCR products were gel purified and cloned into pUC-19. 40 plasmids harboring PCR products derived from cells transfected with CHO-23/24 mRNA were sequenced, 13 of which were found to have mutations in the CHO-23/24 target site (FIG. 7C). 44 plasmids harboring PCR products derived from cells transfected with CHO-51/52 mRNA were sequenced, 10 of which were found to have mutations in the CHO-51/52 target site (FIG. 7D). These results indicate that CHO-23/24 and CHO-51/52 are able to cut their intended target sites downstream of the CHO DHFR gene.
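The figures reported in this section can be checked with simple arithmetic. The sketch below recomputes the mRNA dose per cell and the fraction of sequenced clones carrying a mutated target site, using only the counts stated above. The function names are hypothetical, and the resulting fractions are point estimates from small samples of cloned PCR products rather than precise measurements of cleavage efficiency.

```python
def copies_per_cell(total_copies: float, cells: float) -> float:
    """Average number of mRNA copies delivered per cell."""
    return total_copies / cells


def mutation_frequency(mutant_clones: int, sequenced_clones: int) -> float:
    """Fraction of sequenced clones with a mutated target site."""
    if sequenced_clones <= 0:
        raise ValueError("sequenced_clones must be positive")
    return mutant_clones / sequenced_clones


# (mutant clones, sequenced clones) as reported for each meganuclease.
reported_counts = {
    "CHO-23/24": (13, 40),
    "CHO-51/52": (10, 44),
}

if __name__ == "__main__":
    print(f"{copies_per_cell(3e12, 1.5e6):.1e} mRNA copies per cell")
    for name, (mutant, total) in reported_counts.items():
        print(f"{name}: {mutant}/{total} = {mutation_frequency(mutant, total):.1%}")
```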

4. Site-Specific Integration into the CHO DHFR Locus using an Engineered Meganuclease

To evaluate the efficiency of DNA insertion into the CHO DHFR locus using an engineered meganuclease, we prepared a donor plasmid (SEQ ID NO: 11) comprising an EcoRI restriction enzyme site flanked by DNA sequence homologous to the CHO-51/52 recognition site (FIG. 8A). Specifically, the donor plasmid of SEQ ID NO: 11 comprises a pUC-19 vector harboring a homologous recombination cassette inserted between the KpnI and HindIII restriction sites. The homologous recombination cassette comprises, in 5′- to 3′-order: (i) 543 base pairs of DNA identical to the sequence immediately upstream of the CHO-51/52 cut site, including the upstream half-site of the CHO-51/52 recognition sequence and the four base pair “center sequence” separating the two half-sites comprising the CHO-51/52 recognition sequence; (ii) an EcoRI restriction enzyme site (5′-GAATTC-3′); and (iii) 461 base pairs of DNA identical to the sequence immediately downstream of the CHO-51/52 cut site, including the downstream half-site of the CHO-51/52 recognition sequence and the four base pair “center sequence” separating the two half-sites comprising the CHO-51/52 recognition sequence. Note that this results in a duplication of the four base pair “center sequence” (5′-TTGC-3′) to maximize the likelihood of strand invasion by the 3′ overhangs generated by CHO-51/52 cleavage. We have discovered that donor plasmids comprising such a duplication of the center sequence are optimal substrates for gene targeting by homologous recombination.
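The arrangement of the center sequence duplication can be illustrated with the CHO-51/52 recognition sequence (SEQ ID NO: 8, Table 2). The sketch below splits the 22 base pair site into an upstream half-site, a four base pair center sequence, and a downstream half-site, then assembles the junction expected around the EcoRI site. The 9-4-9 layout is an assumption inferred from the 22 base pair length and the four base pair center described above, and the function names are hypothetical. Only the bases immediately around the insert are modeled, not the full 543 and 461 base pair homology arms.

```python
def split_recognition_site(site, half_site_len=9, center_len=4):
    """Split a recognition site into (upstream half-site, center, downstream half-site).

    The 9-4-9 layout is an assumption inferred from the 22 bp site length.
    """
    site = site.upper()
    if len(site) != 2 * half_site_len + center_len:
        raise ValueError("site length does not match the assumed layout")
    upstream_half = site[:half_site_len]
    center = site[half_site_len:half_site_len + center_len]
    downstream_half = site[half_site_len + center_len:]
    return upstream_half, center, downstream_half


def donor_junction(site, insert):
    """Sequence around the insert when the center sequence is kept on both sides."""
    upstream_half, center, downstream_half = split_recognition_site(site)
    return upstream_half + center + insert.upper() + center + downstream_half


CHO_51_52_SITE = "ATAGATGTCTTGCATACTCTAG"  # SEQ ID NO: 8
ECORI_SITE = "GAATTC"

if __name__ == "__main__":
    print(split_recognition_site(CHO_51_52_SITE))
    print(donor_junction(CHO_51_52_SITE, ECORI_SITE))
```

Splitting SEQ ID NO: 8 this way recovers the center sequence TTGC quoted in the text, and the assembled junction contains it once on each side of the EcoRI site.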

mRNA encoding CHO-51/52 was prepared as described above. 1.5×10⁶ CHO-K1 cells were nucleofected with 3×10¹² copies of CHO-51/52 mRNA (2×10⁶ copies/cell) and 1.5 μg of the donor plasmid (SEQ ID NO: 11). Nucleofection was performed using an Amaxa Nucleofector II device (Lonza Group Ltd., Basel, Switzerland) and the U-23 program according to the manufacturer's instructions. 48 hours post-transfection, genomic DNA was isolated from the cells using a FlexiGene kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The DNA was subjected to PCR using primers flanking the CHO-51/52 recognition site (SEQ ID NO: 18 and SEQ ID NO: 19). Importantly, these primers are beyond the limits of homologous sequence carried in the donor plasmid and, therefore, will amplify only the chromosomal DNA sequence and not the donor plasmid. PCR products were cloned into a pUC-19 plasmid and 48 clones were purified and digested with EcoRI (FIG. 8B). 10 plasmids yielded a restriction pattern consistent with the insertion of an EcoRI site into the CHO-51/52 recognition sequence. These data demonstrate that it is possible to use CHO-51/52 to precisely insert DNA downstream of the CHO DHFR gene at SEQ ID NO: 8.
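The requirement that screening primers anneal beyond the donor homology can be expressed as a simple coordinate check. The sketch below places the cut site at position 0 and treats the 543 base pair upstream arm and 461 base pair downstream arm as a single interval, then tests whether a primer binding site falls entirely outside it. The coordinate convention, the class and function names, and the example primer positions are all hypothetical; the actual binding positions of SEQ ID NO: 18 and SEQ ID NO: 19 are not given in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Inclusive interval in bp relative to the cut site (negative = upstream)."""
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def primer_is_outside_homology(primer: Interval,
                               upstream_arm_bp: int = 543,
                               downstream_arm_bp: int = 461) -> bool:
    """True if the primer binding site lies entirely outside the donor homology."""
    homology = Interval(-upstream_arm_bp, downstream_arm_bp)
    return not overlaps(primer, homology)


if __name__ == "__main__":
    # Hypothetical primer binding sites, for illustration only.
    candidates = {
        "forward (hypothetical)": Interval(-700, -680),
        "reverse (hypothetical)": Interval(600, 620),
        "too close (hypothetical)": Interval(-550, -530),
    }
    for name, site in candidates.items():
        print(name, primer_is_outside_homology(site))
```

A primer that fails this check could anneal to the donor plasmid itself, which is the source of false positives that the text warns about.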

5. Site-Specific Integration of an Engineered Target Sequence into the CHO DHFR Locus

A donor plasmid (SEQ ID NO: 25) was produced comprising an FRT sequence (SEQ ID NO: 5) adjacent to a zeocin resistance gene under the control of an SV40 early promoter (FIG. 9A). This cassette was flanked by DNA sequence homologous to the CHO DHFR locus immediately upstream or downstream of the CHO-23/24 recognition sequence. CHO cells were co-transfected with this donor plasmid and mRNA encoding CHO-23/24 as described above. 72 hours post-transfection, zeocin-resistant cells were cloned by limiting dilution and expanded for approximately 3 weeks. Clonal populations were then screened by PCR using a first primer in the SV40 promoter (SEQ ID NO: 26) and a second primer in the DHFR locus (SEQ ID NO: 16) to identify cell lines carrying the FRT/Zeocin sequence downstream of the DHFR gene. One such cell line carrying the integrated FRT insertion target sequence was subsequently co-transfected with a second donor plasmid (SEQ ID NO: 27) and a plasmid encoding Flp recombinase. SEQ ID NO: 27 comprises a GFP gene under the control of a CMV promoter, an FRT sequence, and a non-functional hygromycin resistance gene lacking an ATG start codon. Flp-stimulated recombination between FRT sites in the genome and the plasmid resulted in the incorporation of the entire plasmid sequence into the CHO genome at the site of the engineered target sequence. Such recombination restored function to the hygromycin-resistance gene by orienting it downstream of an ATG start codon integrated as part of the engineered target sequence. As such, successful integrations could be selected using hygromycin.

Hygromycin-resistant cells were cloned by limiting dilution and 24 individual clonal lines were assayed by PCR using a first primer in the hygromycin-resistance gene (SEQ ID NO: 28). All 24 clones yielded the expected PCR product (FIG. 9B), indicating that the GFP gene expression cassette was successfully inserted into the DHFR engineered target sequence in all cases. The 24 cell lines were then evaluated by flow cytometry and were found to express consistent levels of GFP (FIG. 9C).

6. Transgene Amplification

A GFP-expressing CHO line produced as described above was seeded at a density of 3×10⁵ cells/mL in 30 mL of media containing 50 nM MTX. Cells were cultured for 14 days before being re-seeded at the same density in media containing 100 nM MTX. Cells were cultured for another 14 days before being re-seeded in media containing 250 nM MTX. Following 14 days in culture, GFP expression in the treated cells was evaluated by flow cytometry and compared to GFP expression in the parental (pre-MTX) cell population (FIG. 10A). It was found that the MTX-treated cells had a distinct sub-population in which GFP expression was significantly increased. Individual high-expression cells from the MTX-treated population were then isolated using a cell sorter and 5 clones were expanded for 14 days in the absence of MTX. GFP expression in the 5 clonal cell populations was then evaluated by flow cytometry and compared with the parental (pre-MTX) cell population. It was found that the MTX-treated clones had approximately 4-6 times the GFP intensity of the pre-MTX cells. Quantitative PCR was then performed using a primer set specific for the GFP gene and it was found that the MTX-treated clones all had approximately 5-9 times as many copies of the GFP gene as the pre-MTX population. These data provide conclusive evidence that a transgene inserted downstream of the CHO DHFR gene can be amplified by treatment with MTX.
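The stepwise selection described in this section can be laid out as a simple timeline. The sketch below builds a schedule from the three MTX concentrations and the 14-day culture period reported above. The data structure and function names are hypothetical, and day 0 is taken to be the first seeding into MTX-containing media.

```python
from typing import List, NamedTuple


class SelectionStep(NamedTuple):
    mtx_nM: int      # MTX concentration during this step
    start_day: int   # day the cells are seeded into this concentration
    end_day: int     # day the step ends (re-seeding or analysis)


def build_schedule(concentrations_nM: List[int], days_per_step: int = 14) -> List[SelectionStep]:
    """Lay out consecutive selection steps of equal duration."""
    schedule = []
    day = 0
    for concentration in concentrations_nM:
        schedule.append(SelectionStep(concentration, day, day + days_per_step))
        day += days_per_step
    return schedule


# Concentrations and step length as reported in this example.
mtx_schedule = build_schedule([50, 100, 250])

if __name__ == "__main__":
    for step in mtx_schedule:
        print(f"days {step.start_day}-{step.end_day}: {step.mtx_nM} nM MTX")
```

Laid out this way, the selection phase spans 42 days before the flow cytometry comparison, followed by a further 14 days of clonal expansion in the absence of MTX.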

7. Stability of Gene Amplification

The five clonal cell lines expressing high levels of GFP that were produced in (6) above were then passaged for a period of 14 weeks in media with or without 250 nM MTX to evaluate the stability of gene amplification. GFP intensity was determined on a weekly basis and the quantitative PCR assay used to determine GFP gene copy number described above was repeated at the end of the 14-week evaluation period. As expected, the clones passaged in media with MTX maintained a high level of GFP expression with no clone deviating more than 20% from the GFP intensity determined in week 1. Quantitative PCR revealed that gene copy number likewise deviated by less than 20% for all clones. Surprisingly, gene amplification was equally stable in cell lines grown in media lacking MTX. Contrary to what would have been predicted based on the existing art, GFP gene expression was not reduced by more than 18% in any of the five cell lines over the 14-week evaluation period. Gene copy number determined by quantitative PCR was also stable with less than 24% deviation over time for all of the cell lines. These results indicate that a transgene amplified in the CHO DHFR locus is stable for an extended period of time, obviating the need to grow the cells in toxic selection agents that could contaminate bioproduct formulations.
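The stability criterion used above, deviation from the week 1 value, can be written as a short calculation. The sketch below computes the largest percent deviation of a weekly series from its first measurement and compares it to a tolerance. The function names, the default 20% tolerance, and the example intensity values are hypothetical; the underlying weekly measurements are not reported in this section.

```python
from typing import Sequence


def max_percent_deviation(values: Sequence[float]) -> float:
    """Largest absolute deviation from the first (week 1) value, in percent."""
    if not values:
        raise ValueError("values must not be empty")
    baseline = values[0]
    if baseline == 0:
        raise ValueError("week 1 value must be non-zero")
    return max(abs(value - baseline) / baseline * 100.0 for value in values)


def is_stable(values: Sequence[float], tolerance_percent: float = 20.0) -> bool:
    """True if no measurement deviates from week 1 by more than the tolerance."""
    return max_percent_deviation(values) <= tolerance_percent


if __name__ == "__main__":
    # Invented weekly GFP intensities (arbitrary units), for illustration only.
    weekly_intensity = [100.0, 95.0, 110.0, 88.0]
    print(f"max deviation: {max_percent_deviation(weekly_intensity):.1f}%")
    print("stable:", is_stable(weekly_intensity))
```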

EXAMPLE 2

Insertion of an Engineered Target Sequence into the CHO DHFR or GS Gene Coding Regions

As diagrammed in FIG. 4, an alternative method for targeting a sequence of interest to an amplifiable locus involves the production of a cell line in which a portion of a selectable gene is replaced by an engineered target sequence. The advantage of this approach is that the subsequent insertion of a sequence of interest can be coupled with reconstitution of the selectable gene so that cell lines harboring the properly targeted sequence of interest can be selected using the appropriate media conditions. A cell line harboring such an engineered target sequence can be produced using nuclease-induced homologous recombination. In this case, a site-specific endonuclease which cuts a recognition sequence near or within the selectable gene sequence is preferred.

1. Engineered Meganucleases that Cut within the DHFR or GS Genes.

A meganuclease called “CHO-13/14” (SEQ ID NO: 12) was produced which cuts a recognition sequence in the CHO DHFR gene (SEQ ID NO: 13). The recognition sequence is in an intron between Exon 2 and Exon 3 of CHO DHFR. A meganuclease called “CGS-5/6” (SEQ ID NO: 14) was produced which cuts a recognition sequence in the CHO GS gene (SEQ ID NO: 15). Each meganuclease comprises an N-terminal nuclear-localization signal derived from SV40, a first meganuclease subunit, a linker sequence, and a second meganuclease subunit.

2. Site-Specific Cleavage of Plasmid DNA by Meganucleases CHO-13/14 and CGS-5/6

CHO-13/14 and CGS-5/6 were evaluated using a direct-repeat recombination assay as described in Example 1 (FIG. 7A). Both meganucleases were found to efficiently cleave their intended recognition sequences within the context of a plasmid-based reporter assay (FIG. 7B).

3. Site-Specific Cleavage of the CHO GS Gene by CGS-5/6

CHO cells were transfected with mRNA encoding CGS-5/6. mRNA was prepared by first producing a PCR template for an in vitro transcription reaction (SEQ ID NO: 22). The PCR product included a T7 promoter and 609 bp of vector sequence downstream of the meganuclease gene. The PCR product was gel purified to ensure a single template. Capped (m7G) RNA was generated using the RiboMAX T7 kit (Promega Corp., Fitchburg, Wis.) according to the manufacturer's instructions. Ribo m7G cap analog (Promega Corp., Fitchburg, Wis.) was included in the reaction and 0.5 μg of the purified meganuclease PCR product served as the DNA template. Capped RNA was purified using the SV Total RNA Isolation System (Promega Corp., Fitchburg, Wis.) according to the manufacturer's instructions.

1.5×10⁶ CHO-K1 cells were nucleofected with 3×10¹² copies of CGS-5/6 mRNA using an Amaxa Nucleofector II device (Lonza Group Ltd., Basel, Switzerland) and the U-23 program according to the manufacturer's instructions. 48 hours post-transfection, genomic DNA was isolated from the cells using a FlexiGene kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The genomic DNA was then subjected to PCR to amplify the CGS-5/6 target site using the primers of SEQ ID NO: 23 and SEQ ID NO: 24. The PCR products were cloned into a pUC-19 plasmid and 94 plasmids harboring PCR products were digested with the BssSI restriction enzyme, which recognizes and cuts the sequence 5′-CTCGTG-3′ found within the CGS-5/6 recognition sequence. 17 plasmids were found to be resistant to BssSI, suggesting that the CGS-5/6 recognition site was mutated. These 17 plasmids were sequenced to confirm the existence of indels or point mutations within the CGS-5/6 recognition sequence (FIG. 7C). These results indicate that CGS-5/6 is able to cut its intended target site within the CHO GS gene. Because the CGS-5/6 recognition sequence is within an exon in the GS coding sequence, many of the mutations introduced by CGS-5/6 are expected to frameshift the GS gene. Therefore, CGS-5/6 is useful for knocking out CHO GS to produce GS (−/−) cell lines. Such cell lines are useful because they are amenable to GS selection and amplification for producing biomanufacturing cell lines.
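The logic of the BssSI resistance screen can be sketched as a search for the 5′-CTCGTG-3′ site on either strand of a cloned PCR product. In the sketch below, a clone whose insert has lost the site is classified as resistant. The function names and the two example sequences are hypothetical (the CGS-5/6 recognition sequence itself is not reproduced in this section), and the check considers only the insert, ignoring any BssSI sites elsewhere in the plasmid.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_site(seq: str, site: str = "CTCGTG") -> bool:
    """True if the restriction site occurs on either strand of seq."""
    seq = seq.upper()
    return site in seq or reverse_complement(site) in seq


def classify_clone(insert_seq: str) -> str:
    """Label a cloned insert by whether the BssSI site is still present."""
    if has_site(insert_seq):
        return "cut (site intact)"
    return "resistant (site lost)"


if __name__ == "__main__":
    # Invented inserts for illustration: one intact, one with a 2 bp deletion in the site.
    intact = "GGATCCAACTCGTGAAGCTT"
    deleted = "GGATCCAACTGTGAAGCTT"
    print(classify_clone(intact))
    print(classify_clone(deleted))
    print(f"reported resistant fraction: {17 / 94:.1%}")
```

Because resistance to digestion only shows that the site is absent, the sequencing step described above is still needed to confirm that each resistant clone carries an indel or point mutation in the recognition sequence.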

EXAMPLE 3

Meganucleases for Targeting Gene Insertion to the CHO GS Locus

1. Engineered Meganucleases that Cut Downstream of the CHO GS Gene.

An engineered meganuclease called “CHOX-45/46” (SEQ ID NO: 29) was produced which recognizes a DNA sequence (SEQ ID NO: 30) approximately 7,700 base pairs downstream of the CHO GS coding sequence. CHO cells were transfected with mRNA encoding CHOX-45/46 as described in Example 2. 72 hours post-transfection, genomic DNA was extracted from the transfected cell pool and the region downstream of the CHO GS gene was PCR amplified using a pair of primers (SEQ ID NO: 31 and SEQ ID NO: 32) flanking the CHOX-45/46 recognition sequence. PCR products were then cloned and 24 cloned products were sequenced. It was found that 14 of the 24 cloned PCR products (58.3%) had large mutations in the sequence consistent with meganuclease-induced genome cleavage followed by mutagenic repair by non-homologous end-joining. From these data, we conclude that the CHOX-45/46 meganuclease is able to specifically cleave a DNA site downstream of the CHO GS gene coding sequence and will likely be able to target the insertion of transgenes to this amplifiable locus in the genome.


SEQUENCE LISTING

SEQ ID NO: 1 (wild-type I-CreI, GenBank Accession # P05725)

1	MNTKYNKEFL LYLAGFVDGD GSIIAQIKPN QSYKFKHQLS LTFQVTQKTQ RRWFLDKLVD

61	EIGVGYVRDR GSVSDYILSE IKPLHNFLTQ LQPFLKLKQK QANLVLKIIE QLPSAKESPD

121	KFLEVCTWVD QIAALNDSKT RKTTSETVRA VLDSLSEKKK SSP

SEQ ID NO: 2 (Chromosomal region 5,000-55,000 base pairs downstream of CHO DHFR gene coding sequence)

1	taaaactcaa gatgccagct ttgtagctag cttaggaaac aaagtagtaa aaaataataa

61	tgggtgggtg aaggtctgaa gcatttacag agttctctca agacaaagca cagaggctgg

121	tggccacata acttggcaac tgatttgggg gaacagaata caagaaagga aatttaaata

181	ctgtttttct caatgttgaa ctatatgggc atagtcacag ctgcctaacc tatagagact

241	ggaagctgga acctcggcta tctaagatag aataatcaag aaatgtcaat tatttgagaa

301	aaacatcagg aataaatagc tgctaagtta caagttggtg ctttagacat ttggagagga

361	taggatgggg gctcccagac ctggggctcc ctaataaagc tgtgctggcc tacaagttcc

421	agggatcctc cagtccatgc ctcccactgt tgggactgcg ggcgatggtt tctgacgtgg

481	gtactgaggg cctgaactgt ccacacactt aagccacacg ccttttactg agtcatctcc

541	tcatctcaga acattttcct ttaatctttc ttaatgaaaa ggtcgcattt cttccgaggg

601	ctagcctcct gttactctct atacatgtca cataaaacta catgaaaact ttgaaggcac

661	tatatgtcca tactcagatg aaaagccatt agctgtggtc atacaaaacc ccacagacca

721	actgttggga aacatcagac ttttttcctg cagcgcctgc cctgatcttc cacagagaat

781	tcagtctcac tttttccagg atgacttctg aactatcacc gtaagatgag aatttgaaac

841	aaagatgtaa gtaatgaact tcatgtgttc tgaacacaca gcttagtgca ttgaaattac

901	gtaacacccg cttccttata agccatttct caaaatgttc ccattacacc tgcatcgggg

961	atgggtccca gaatcttcct tttaaataaa caccccagag gattctgaag ctagaacacc

1021	aaggactgac agagagaagc atgcctgtgg gcgactccag acacctggga gctgcctgct

1081	ttcttgctac tgatttagaa ggcatttgcc cccgaatggg gctgggggac tgtcactatt

1141	tctcattctc gggactttga aaggaagcaa aacagaaaac catgcaaagt ataagccacc

1201	atggaataat ggcagacgat ccggttgtgc agattagatt ttacatattg ctgattttga

1261	agctaaagac ctttcacttc ttaaatatat aataaaattc atacaagagt attttgtgta

1321	ggtaactcag tcagatacaa ggtaagcaaa gtaaatgata ggtgcccctt aacaaaatgc

1381	attctcatag ttcatttatc aattatagaa atggtggact ggagggaagg cttgaggtca

1441	ggagaatgtg ctgctcttcc agacagcccg ggttcttttc cccagcaatc tgggactcac

1501	gtctgcctgt agctccaggc ccaggggatc tggcaccttc ttctggcctc tgcaggcacc

1561	catacacaca tggcatacac acacatacac aaattctaaa attaaatagt aggttgtagg

1621	cctacacaaa aacatgcata cattaactaa ataattaata gttaataaat aaaaatcaac

1681	caaacacata cactgattaa gtaacatgac tctgtaaggt caaaggcggc tgaccagctg

1741	tgggaagggt taaataataa caatcacctt tgaaagactg gacctggtga ttaaggatgt

1801	tccagctgtg tcgtggatga gaaatcaaat gcataattga atgagtgcca ggaatagaac

1861	tggagacttt ctggtgagaa tgcttttact ggcagtagag tccctgtcta aacaggagag

1921	agacctgcag tagccctgtg gcggccctgc agtggccctg tgatggctct gcagttgtac

1981	tcttcctgag ataggagaca cactagagag tgtttctaat gagcagctcc tgtactttct

2041	gttcccctgg agaccgcacg tgtttctccg ataatacatt gacatttctg ttaaaccatt

2101	ttcttcttgg aacaaaaatg gagaacaaat cagattggtg tgtggtcttt taaataactt

2161	ggtacttaat aacacaaaac aaaattatca gaggctggat tttaggtgct ctcagcatct

2221	gccacccctg agccatcagt caggtcttgg aggaacaatc tccaaggaga aaacagttct

2281	gtcctcagaa aagctggagg aatatgagat tttctacagc actcatagca aaatcattta

2341	cggaagggat cctgagtaag atggcctctt cttcatcaca tggtcatagt ctgcttcaat

2401	ggggagaata gttcaatcta gcatcgagaa atcgaaggtt cccttttgac tggcaatgcc

2461	ccatagatag atagatatag attatgtata tattgtgtaa aacacacgta tgtatatata

2521	atacacatac atgtatgtgt atacatacat acatacatac atacatacat acatacatac

2581	atacatagat acgtgtgtgt gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt

2641	ttgagactga gtttctctac tatgtagctc tggctgtcct gaaagttgct aagtagacca

2701	gactggccag accagatcca ccctcctctg cctcctaagt gctgagatta aaggcctgca

2761	cccaccccca cccagcccat cttatatttt gcttcatttc aaagtaagct ctatgcatca

2821	tttattcctg catattatta gccatggttc agtcttgttt gtgttttgga atatttactt

2881	aacaaaactt gaaaaacatt tttcaagatt tgtttgtttt taagatttat ttatttatta

2941	tgtataataa taaatattat tatgaaaaac ggtgttctgc ctgcagggca gaagagggca

3001	ccagattgaa ttacagatgg ttgtgagcca ccatgtggtt gctgggactt gaactcagga

3061	cctctggaag agcagccagt acttttaact gctgagccat ctccccaggc ccaaaataca

3121	catcttaagt gtattgccac aagcatacat cttcatggcc caatcttctg tccatcactt

3181	cagacagctc tccttctttc cctggccagt cacaacaccc tcagctatca ggaaaggccc

3241	tatgggggtt gttttgtttt cccactccag ttcccttgcc tgctctgacc tcatgagtag

3301	actcatacag gatgtgctca cttcacttgg gatgatttct ttttcaccca ttgttgctct

3361	gcccagaatt tgttcctttt tattgtctta gtgttaatca actatcaaag ccagcaacaa

3421	aaaatagtag ggaaactttt ttgatagggt aaacctgatt gattgcaggc tttggttgcc

3481	ttgtttggtc tatccccttg agagtccctt acaatgtgag ttagttagtg gctgctaact

3541	agttgaatct caacttcctt tttctttaat gtgggtattt gtaaggaata gcccccttaa

3601	atctagattc tgttctcaaa tcaagcaagc tcaaggctgt aagcatggat tcaccaactt

3661	tcctgctcaa ggaatttaaa tgtctggtct ccatcatatt actttaatag taatagttta

3721	ttatacacat gtgccagctg tatatccctt ttcttcttga tggacctatg aactctgttg

3781	aggtgagatt tgaacccctt agaaggtgct agagaagagg tacctgatgg tcaaggcaag

3841	gctgatactt attcatgggt cccacatctg ctaatgtaag caataacaga taatatgctt

3901	tgtgtttaga cccacagtgg ttgcatgtac actaagtatg tatcatcatt gtcttatcgt

3961	tcctttagaa tacagctaat aattatgacc gctattctca tagcatttat attatatgag

4021	cattgtaaat tattttgaaa tgctttaaga tatacttgag aactatgcat atcatgcgta

4081	tgttgttcta ccagctggga ccttgaaatg agatcccttg aggccagcat aaagagaaag

4141	ttttcatctc aaacaaacaa aagatacact tgataataga tgagggataa atgtcatact

4201	ttttatatag tgattgagaa tctacagatt tgggtatcct ggtcacttag gagaccaagg

4261	gaggactatt agctctagag ctatgaactt tatctccaga ttccaaagcc aatacaaact

4321	ctagccaagt tggggtgctg ttacctgtat ccctctgtca aattccaagt gttttcacca

4381	cctttactgt atctttccaa ctgttctctt ttataaccac acatagttca tggtctttcc

4441	ttctctcact tgactgtgga gtaacctaac ttgcgtgttt ccagttttcg atctcttcct

4501	taaatctaca ctagttaacc acaaagaccc tcttttctga gctgtgtcta ttctatcact

4561	gtcaccattc cttaatgctc tcccagatgc agccaaactt cactttgggc ttgagagtct

4621	tctccaggtg acagtgacta atgtctccag attgagcatc taccatctac cctgtgtatt

4681	acacatgaat agccttagct tttcagcaat agacagatag atccatagtt agccatgtca

4741	acacccttct tcatgctgtt ctcacagtaa taagtcctaa ttcctgtttt ctcccatcta

4801	aactcaaccc tgtcctaaat accttactca aatcctaatt gtatctcttc cacaaacatt

4861	tcccccttct ctccattaca aggtggaaac tcagagatcc aggtgtcttg catgttgttg

4921	attctgtcct caacaaggaa ttccccaggt tcctgcacga aggaaagcat ggaggaccat

4981	acttgaggct actggtgtag tgggaagaca ggcccaaacc atgtcacaga aacccatcac

5041	cagaaagttg ggggaggcag cccagttgtg gagcaggaga aggagaaaac aggcttgggg

5101	aactgctagc tatgctttgt cacagtcaca agaaaaaagg gccctagcct ggcctacata

5161	ttctacaact tcctgaatct ttgctctgaa atgaagaggt ttggatggct gtctgggaat

5221	tcatcttgct tgcagtgaag ctccttgggg tatttgaaac caggaagttt gaaggagttg

5281	atgctaattg ttttctaaag tgtgtgagga gtactggcag agttcaggcc ttgtgaggaa

5341	agaatcctat atctagtctg cactcctggg cacatgagac attcagctat ctcccttata

5401	aagcatagaa agtactcttg tacttgacac agaaataatt tcagtatgta gagcattaaa

5461	aaaaagtatg aatgacttag agagatggct catcagttaa aagcacatac tgctcttcca

5521	gaggtcctga gttcaattcc caacaaccac aaaaactcac acatatgcat gtgattaaaa

5581	ataaaatctc tctctctctc tctctctgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgag

5641	tgtgtgtgtg tgtgtgtgag tgtgtgagtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg

5701	tgtgtgtgtg tgtgtgtgtg tgtgtgatgg tgggcttgtg tttgcaagcc cagcactagg

5761	gagttaaggc ctcactcaca gtgccaggcc agtctaggtt acagtgagtt ctagacagcc

5821	caagctacag agtaaggtac tgacaaagaa agaaagaaag aaaaaaagaa agaaagaaag

5881	aaagaaagaa agaaagaaag aaagaaagaa agaaagaaag aaagaaagga gagaggtgag

5941	agggagggaa ggaactggaa gggggaagga gggaaagaaa agaaaaagaa acaaccaaag

6001	gaacaaacca ctgtatgcca ttatacatta gctttgggct ttacaggtta tacactctat

6061	attgtcatag ccaatgtctc aatattccat aagaggtgtc tagttgtggg tatgttcttt

6121	cttagtcctt ttatttagac tacatgacct gtttttgcct aataggccat tagtaatact

6181	gacttctcca catgctgccc tcaaaactta ctcctggaag atctttattt aagctatgaa

6241	cgaaaatctt aaccctgtga cctgccaccc agaatgcctc tgggaacaac ctcaggcaac

6301	ctatcaagcc gcttttccaa catttggggc aacagggatt aaaattatga ttgttgtctg

6361	cctgctgagt tcaaactcac agagggacca gaagctgact cactgatatc aagcagttct

6421	aaattttcag tttaaaactc taattattaa acaggggatg tcctcagacc agcactcaag

6481	agaaggagat aggcagagct ctatgagttg agttataggc cagcctggtt ttcatagtga

6541	gttttagctc tccagagagt taccagcaag accctgtcac aaacaaataa aaacaaacaa

6601	acaattaggg gatatacata taactaaatg ataaagcctt acctagcaca ttcaagtccc

6661	caggttcaat tgctagccct gggtggggat ttggacaaat ttaaaaagac cttttttgta

6721	tcacacataa atatgactgc actggttgtt gttttccatg gaaacagaat caatgtggca

6781	tgtattttac ggcattagct catatagttg tgcaggctgg caagtgtgga atgtataggg

6841	caggccagga atcagaaatt gatacaaaat tcaggaaaga cctctgggtg caatggtgca

6901	cacctttaat tcaagcactt gaaaggcaga ggcaggtgat ctttgtgagt tccaggccac

6961	cctggtctac atagtgaatt ccgggacagc cagggcttca tagaaagaac ctgtctcaaa

7021	acacacaaac aatcagaggg aagggcttat tttgtttttg agacagggtc ttctatgtac

7081	cccaggctgg cctcaaactc atgctcttga tatgcccacc tcacaagtgc atgttaagat

7141	tacaggtgcc tgacacacac cacttttgtg aagtgctgaa gagtaagccc agggcttcat

7201	ggacgctggg caagcactgt gccagctgag ccacactccc cagtgtgcac gatactttgc

7261	aaagatagat ccatatggat gctgtgcttc tatctaaaca gaatgacaac cacactctgg

7321	caggttctgg ttcataactg agtcttattg gtcacctcct tctccatttt tcgctggtat

7381	ttctcaagga gagaccacaa atgagaagtg aagcctaact tttaatgcgg tctctcctat

7441	gtcacctaaa ttctagctca aacagggttt ctggctctta ccttttcctc gggtttctgg

7501	atacttgaag tgttaacggg catttctctt aaagaccaaa tctggccaga ttcaaatggc

7561	tggccttcaa ctcggcaaac taggaacaat aatgtccgct gcatgtggct tgtagcactc

7621	tgtttctatt catggacttg tgagtgattt ctgggaaaca cgaattataa gataagtcct

7681	tttcagtgga cttcacaagt tcaccctcag gtagtatact gtcaggtaga aacgtctttc

7741	agagaagcga gaggtgacaa gccctctggg ctggccattg tccctgctgg cattgaacac

7801	cctgttcagc acatgaaagc atcgcctgat gctcccaaag ctggagcact ggcagccccc

7861	tgcagtcagg tgtgtagggt gggttagcag gggtgcttag gcgggttttg tagttacctt

7921	ttcaacacaa atgcaaaagc cagagagaga gagagagaga gagagagaga gagagagaga

7981	gagagggaga gagagagaga gagagagaga gagagagaga gagagagaga gcaggaaagc

8041	atccaggctt tgaagcaagc cagccttcag ctctgtcctt gagccattct gagtggaatc

8101	gagtaattgt ctgcttggag aactgaagaa tagcacatgg caaagaacaa tttgtacctc

8161	gaatatattc attagcttgc atgtcaaaag gccacatgca gatagaaacc attatcttgg

8221	cattctttaa aaccttgcag ccttgagact tgaggtgcag aaacccacat gcccatgtga

8281	ctgactacct gtcgatctct ccagccctgc ctggctaaca gggacaatat agggggatgg

8341	tgggagggga cagcttagac tcctgtggac ttggattgaa agaagaacag ggaagacagg

8401	ggactgtgca aataagcact ctattaggac ctatttttgg tgtcttggga ccctcctact

8461	ggtttagctt aaattgagag gggatttggt ttgcctcact agctgtttct tcccactcaa

8521	ttcacaatta cagctttctt cattgtcatt aaaatacatt aaatgtgtac ttgttggggt

8581	aaggctttct gttgaaatct gcataaagac aatgtccaca gcccccagtc agtggaaaga

8641	gcagtaggac cagaaggcat gtgtttccat cccgagtcta tattggaatg tttgttaaaa

8701	cctgcacttg taagagacaa acactagaac catcagcttg caggtctaca ggccagtgtt

8761	gccagtgcag ataatgccca aactggaacc taaagatgaa ggcctttggg agctgaggtc

8821	gaagagtcag ctgtgatctc ccagatgtcc tcctcatgcc ccattgccac tctagcctcc

8881	cacctccaag cacatttggg atccaactgc taacccctgg tgttcttttc ttagttgaaa

8941	ttctcaggga ataacctaag agtctctgtc actcagtcta tggcatccta tgataacagc

9001	caaggctaaa tagccatcat tgttcttttt ccagatgctc agcaatgagg atgcagaggt

9061	gaacaaaggt ggttcagggc tgccctgatg atgaatttga caagccagaa tctaacaaga

9121	tcagtcggta aacagaatcc tccttcctat ccagagatgt tggcttgttc tgtcactgga

9181	tgggcatcat ttactataag tcatacaggc accagacact cagagataaa taacatgaac

9241	tttccagtct tatgcagtcc tgtctagttg acttgccagt attctcaagg aagttccacc

9301	ccagcccctg gcatccatag accaaggact ctggaatgtt ctgggaaagc tccacctgac

9361	ctcctagcac ccatatatcc aaagagtctg gaacgttatg gtggaagccc cacctctctc

9421	tccccagacc tcgccccctc aaaaagtcca ccaaagactc cccacccccc acacaccccc

9481	agatgctcaa gaccacttcc atagagtatt taaactgcct cccagaaaac agaattcatt

9541	ttttcagtct ctcttcccca tgtcctctca gggtgggggg caggggtatt agtattcaac

9601	cacctatact ggcctgtcct tggggttctg acaagatatg acctcagcta cagccactaa

9661	gatcaccacc tgtgtatatc cactatgctc ccttttaaaa gggccctgtc cacctcccat

9721	tctctctgtc tctctctctg tctctgtctc tgtgtgtgtg tgtctctgtc tctctctctc

9781	tttctctctc tctctgtctc tctctctctc tccttctctg cctgactctc cctccctccc

9841	ctgctctctt ctttcctgct gcttttgtcc ctagaggcta gtctcctctc tccccttccc

9901	ccttttccca ttcactttcc cccaataaaa aactctccac ccaagctcta tcacatggca

9961	tcattctctt gctccatgat tttaaaatca caatgaggag gggagcatgg aaaaattatc

10021	caggaagact ttatccatta aacctgggtg ctttttcttt cttccttcct tcctttcttt

10081	ccttctttct ttcttccttt cttttttcct ttcttccttt cttttttcct tttttccttt

10141	ctttttgttt tgttttgttt tgagacagcg tttctctgta gctttggaga ctgccctgaa

10201	actcaatctg tagagcaggc tggccttgag ctcacagaga tccacctgcc tctgcctccc

10261	atgtgcttga attaaaggtg tgcaccacca ctgcctggct taaaactggg ctttttctaa

10321	gtcagtttga tttggattgc tgcattggca gagaggttta ttggggtgca gaaacctttc

10381	aaccagcttt tgagctaatg atagagagaa gctcaaggaa ttggagcaat gcttgactag

10441	ggatgtcaga gggaggctat ccagaggagc ttacaactga ggtaaactta aaagttaggg

10501	agtttgtcaa cttcaaccca cagaatagag cagagccagg aggagctgag gcttctgagt

10561	gttatggtgg aagcatcacc ccaacccttg acatccatat gcctgaagag tctggaatgt

10621	tatggtggaa gttccaccca agcctccctt cccggtcgcc ctccaaaccc tgctacatct

10681	cagaaatccc accaaatgat gactccctcc cccagagata ttcaagacca ctcccacagg

10741	gtatttaaac tgccccccaa cccccagaaa atagatgtgt ggttttccaa tctctctttc

10801	ctatcacgtc tctggggagc tggcaggcca tttgggagca ttgtatccat taaacgactt

10861	ctcagtggag actctgaaag ccagaagagc ctagacagat agatgtcttg catactctag

10921	agactacaga tgccggccca gactattata tccagcaaaa gtttcaaaca ccatacaaag

10981	tcaaatttaa acagtatcta tctacaaatc caatattaca gaaggtgcta gtaggaaaac

11041	tccaaactaa gattaactat acctgtgaag acacaggaaa taatctcaca ctggcaaaag

11101	aagaaaaacc tctctctctc tctcctctct ctctctctct ctctctctct ctctctctct

11161	ctctctctct ctctctcaca cacacacaca cacacacaca cacacaccaa caccaatacc

11221	atgaacaaca aaataacagg aattaacaat aattgatgtg tgtgtatgtc cctgtgtgtg

11281	tgtccttgtg tgtgtctgtt tgtgtgtctg tgtatatgtt tgtcacctga ggggtggctc

11341	ttccttggtt tgtgaggttt ctacccaatc tataactccc ttttcttcat tcacttcctc

11401	atgtccttac tagtctctat tgtggattaa ggaaactgtg tggagaacag ttttcttcta

11461	gaaaagaaca ctagccatct catgtaatca aattggtgac tatcctaatt attatgagag

11521	agcttccgtc cagtaagtgc tagaagtaga tgcagagatc cacagacaag cactgagcca

11581	agctccagga gtcctgttga aaagagagag gaaggattgt aggagccaaa gagtcaagag

11641	catgacaggg aaacccacag agacagctga cctgggcttg tgggtgggag ctcatggact

11701	cttgaccaac aattagggaa cctgcatgag gccaacctag gaactctgca tgtgtgtgac

11761	agttgtatag catggtctgt ttgtgaggct tctagcagtg ggatcagggc ctgtccttgg

11821	cgcttgagct ggcttttggg aacctgttcc gcatgctgga ttaccacacc cagccttgat

11881	gctgggggaa gcacttggtc ctgcctcaac ttgatgcgcc ttgcattgtt ggattctcat

11941	gggaggactg cccctttctg aaaaagaaca aggagaagtg aataggggag gggattggga

12001	ggagaggaag gagaggaaac tgtgataggg atgtaaaata aattaaaaaa ttaattaatt

12061	aaaaaagaac acttgtactg gtagattggc taaaatgaaa caaagataaa agtacacagg

12121	aaaaagagag gagaaacctg gggagggggg ctccaaagag aggtgagggg gggatgggaa

12181	tggcagctta gtggaggaag gaagacatga cctacacgaa tcgagctgta gtttttatct

12241	ggagcatagg gtaaagatgt ttgaggagaa ggaggaacac atgcttgtaa aacatggtct

12301	tcagaaccag caacaatcat acagagtgtc cagggtccat gggcacatga aggacagacc

12361	aacacatatt taacagtaaa gtgtccatat ttggtatgaa agtgatgggt aaattgtcct

12421	gggactgtaa tttagttgta aaggacttgt ctggcatgtg ggtattcttg ggttccctcc

12481	ttagcactga aaaaaaaaaa aaacacacac acacacacac atatattcta gtgttttgta

12541	gaaaaggatt caaagaaagc catgatttct cttttgataa atccagaata atgtaataag

12601	aacacacagt ggtgtgattt cagcaatcaa gtacaggttg cttgtctgtt tgttgtatgg

12661	gatggttggg tggttgtttg cttggtttgt aagatgggtg ggtgggttgg tgggtggttg

12721	cttggttggg tagttggttg ggtgattggg tgggtgggta tttggttggg tgggtggtgg

12781	gttggttggt cgtttggttg ggtggggtgg gttttgtttt gagacaggga tttactctat

12841	atctcagttt gtctcaaact cactatgtgc acatgagtat gtgatgagat tatctaagac

12901	catagtgtct gtgttcatgg aatgtctctc tagcttagag aatttaaaaa atggccatgt

12961	agggaaaccc ctcagaaaag gagtttctat ggcctccaag aataagaatg gatcctccta

13021	gctcggagtc agcaaggaac tgaagccctt aattttatag acacaaagga atccattgtg

13081	tggctccttc ccagccaagt ctcagatgag tcacagacct gcatggcacc ttatgcagtc

13141	ttttgaggtc ccaagaatag gatgcagata agccatgcca gaatcccaac acacaaagcc

13201	ttagtgatat agtaaatatg tattgtgtct aggctgctgc atttctggtt atgctactgt

13261	gcagtaatac acaactaata cagatgtgat ggttaatatt atgtgacaac ttgagtgggg

13321	cacagaggta cagacacttg gtaaaccatt ctgggtgcac gtaaggatag ttttggatga

13381	cataaacatt tagattagta tgctgggtaa aatacattgt ccatcccaat gggcatgggc

13441	tttgtccaac tagatgacag ctggaataga aaagtctgcc tctctcatag ttctcaggcc

13501	tttgagctca gactagacag aactcacagg ttctctgagc tttccagctt gatgaatgtc

13561	catggcagtc ttcacactta acacctgaca gacttaatga tcatatgaac caattcaaat

13621	ctgaccatca ctcgggtcat tcttttgatt ctgtcacttt ggagaactaa taccgaggac

13681	ataaaatgcc atcacatcgt tattttcttc ctgtctgtga atatttttct tttttttctt

13741	ggtttttttt tttttttttt tttttttttt tttgtttttc tctgtgtagc tttggagcct

13801	atcctggcac ttgctctgga gaccaggctg accttgaact ctcagagatc cgcctgcctc

13861	tgcctcccga gtgctgggat taaaggcgtg taccaccaac gctcggcctg tctgtgaata

13921	tttaaaatga aaactttgga aatgttctga aaccagctgg tgtcagatag tcagagaact

13981	ttcgtaaggt aggtgtgggt tatagcataa tcccacacaa gaggctgaag caggaggatt

14041	ttgtgtttga gggcagctag agccacatgg tgagtccctg cctcaaaaca caaaagcaag

14101	acaaaaacaa gctccaaata agattcactg ggccctttct ttccttcctt ctcagtgagt

14161	ccacttgctt taaaatcagg tcttaaagac gcactagatg ctgaacttaa cagtaataat

14221	aaatatcttc tcttacagta cagattatgc tctataaaca ctgcactgat aaagttcagc

14281	cttaaccttt gttctgtaaa tgtttcctag tttttctact gccgtattat aagacaaatg

14341	tcagcatgaa ggcaggtttt tcagaaaaca cagcagctcc acagatggcc tctaatccat

14401	aatcattaaa gacaagactg caactttttc aactggaaat cattcaagat gtttttctga

14461	agtccctacc aggacacaag ccaccctggt tgctgtgtga catcagttag gtagactctg

14521	aactggcttc ccaagaaatt atacaaaagc aaggtgtcac ctagtattag cataacttct

14581	gataactact gtcttagctg gggtttctat tgctgtgaag agacaccatg accacagaaa

14641	ctcttataaa ggaaagcaat tattgggtcc agcttacagt tcagaggttt aatccattgt

14701	catgattgca ggaagtatgg tggcccacag gcagacatgg tgctggagaa gtagatgaga

14761	gttctatatc agattgacac acttcttcca acaaggccac acctccactc actctgagcc

14821	tatggggcca ttttcattca aaccaccaaa gctacaaggt agcttatacc ccagcttgct

14881	atttctgatg agacttagta aatagtctta aaagcccata aaatgactca aaactagttt

14941	ttttattatt attattagtt caaattagga agaagcttgc tttacatgtc aatcccttct

15001	ccctctccct catcaaaact agttttttgt tttttaggtt ttttttcaag acagggtttc

15061	tctgtgtagc tttggagcct atcctggcac tcgctctgga gaccaggctg gcctcgaact

15121	cacagagatc tgcctgcctt tgcctcccga gtgctgggat taaaggcatg caccaccaac

15181	acctggccaa aattagtttt aagtccagtt ctaggagctc caatgccctc ttttggcttc

15241	catgggaacc aggaacacta tatatatata tatatatata tatatatata tatatatata

15301	tatatattca ggcaaatatt tatgcatata aaaataaaat aaatcttttt tccttttttt

15361	tttaaagaag tgacattgtc ttggaatttt tgtggctgct ctgcccttat gtgtaactgg

15421	acactaccag catctaaaca ctggcctgaa accagccaaa gaaaaccttt gtgccaggtc

15481	ctgtgtcaaa gtattatgtt ccttttagga tatcctatat cctaaaggat ttattttact

15541	gatagcatct taacttcctt tgaaaggttg gtcttctcaa gcagtcctcg tggagctggc

15601	tcctcagcta atgccagggg acaataatga tcccctccca aaaccaaaca gaaaaccatg

15661	gcaactctgg tttccttggg cagcacctgc tttaagaatg agcaaatgac caatcagctc

15721	atgaaactaa atactctatt attactaaaa tatttttttg agacagggca tggaattcat

15781	cacatagttc aggttggcct tgaactcaga gagactcact tacctttgcc tcccacgtgc

15841	tggaattaaa ggcatgaacc accacaccaa acataacact tgaattttgg aagagtcctt

15901	cttccaatag atttgaggtt ttgaaaatgt ggcacagaaa atatgaattc aaatataatg

15961	aaaacaagag ataactttca actaagtttc tataggttct tgctaggaat cctaagcttg

16021	tctgaaactc tagagcttct gtttctagct tctgagtgtt agtattgtag gtatgtgccc

16081	tgcctcagtg tgatgttttt gataatctta aagaaatcaa agaaatttta taaaagacta

16141	gactgtgcta cacaaaaaga atattcagat gccaagaaag agttcttaga aattaagaaa

16201	tatgctacta gtataaatcc tttataaagt ggaatgacaa atctgatgaa atcttactaa

16261	aagtagaaaa acataaacat caaagacatg aataataaga aaatcatatt gtgcatatga

16321	ttaacctaaa acattaactt gcaaaaatag aatagtccca aaaagtaaac aaaataaata

16381	aatcaccaag aacatgatac aaggacaatt cctaggatga taaaacaaga atattcatta

16441	taaaaggccc tatcactaaa gcacaacaga aacagactca aaagataaat cttcattgtc

16501	actggagaga agtccatact atcatagcac tcagaaggaa ataaaaatca aaatgtcaaa

16561	aaggacctca gcctctgaaa cacaaataca aaatatgtcc cgccttcttg acacgcatta

16621	ctcttcaatt aacattttaa gaaaactata aactgttaaa gagagcttag tattttaaga

16681	aatctgtagc tatttctttt ataagcatga caactaagtt tccctgattt aaacagacct

16741	aaaaaaccgg tgaagtgagt ggagaaaggg gatacgaaga cagcatccca catgactgct

16801	cccagtaaag gcaaggtctt catccatttt atcctgaact ctgggaaatt tataaagaac

16861	agaaatgtat ttctctcagt tctggagcct cagtccagga cactaagtct aggtactaca

16921	ctctcacatg gtggaaagta gaaagcaagc tcacttgtca ctcactacct gatgcctctt

16981	tcatcaatcc cattgataag gaagagacct ggcatctcag tttcctaagg actcagctct

17041	tactaacatt agctgtcatt tctgggtcac tgtaacagaa agcctgacag aagcaaccca

17101	ggggaagaag gatgtatttt ggctcactgt ctctgaggat ttcaacttat cccagcaata

17161	aagggataaa ggcattgcag caggaatatg tgtggcagaa gctgtttatg tcacaataaa

17221	caaataaaca cacgctagcg cgcgcgcaca cacacacaca cacacacaca cacacacaca

17281	cacagagaga gagagagaga gagagagaga gagagagaga gagagggggg ggggcagaca

17341	gacagacaga gagggagaga ggcagagagg gagagagaga gagagagaga gagagagaga

17401	gagagagaga gagagagaga gagagaaatc aaaggcccac ctccatcaga ctggtcccat

17461	atcccaaatt tctagaacct cctaaaacaa caccatcaac tgagggagac atttttggat

17521	tgaaagcata atgccattac ccaggcagaa tctgcctgtc tgggggagtc acatttaagc

17581	catggtatca attgacctca tgtaatttca gaatactaca taaaactatc agatattttt

17641	catgatgaat ttctaaagct tgaaattccc tttgaataaa ggaccaacta cagaattttg

17701	ctgagtctac aattacatac atgaaaatgt aactacgaag tggccagcca caatgaaaat

17761	taaagtgttt gggtggtctg tctctattga tgctcttctt tgccctgttt ttttttaata

17821	ttgttgatgg tttgtttttc ttttaagata cttggcccca agaaaaaaaa tgacagcctt

17881	aattaatttt gtttactctc ctgacatttt aaaagacaaa tttatgaaga cctgactgtt

17941	ccatgtagta ttagaaagat gtaaaattaa gggttgctta agctgtgtag aattgaagag

18001	cacagcattt gagtgacagg gtacaattag agatcatcag ggatgtggca caaagtgtac

18061	tcaacctcac cttttcctgc ttagcagaga acagggtgcc tcggtgagat aggaaattaa

18121	tcaaatagaa gaagaaatag taattttaga aggatcaaat tttcctggtt agaatgatca

18181	aaactacaag acttgtaact aaaatatagt caaacccatt tcaactggaa tctgtgctat

18241	tcatgtatag attaactaga atctaatttt taaattttca tcttacttcc aaaaatattt

18301	gtccaaatac tctgtgaatg cattagtttc ttatgggaaa acatcatatc ttttgtacaa

18361	tgtgtttctt agcttgaggt tctctccaaa caggaccaag acgaggccag gaccatgtga

18421	tacaacccat agtcctcaag aaatagttgt cattttctta ttccaattgc atcccaaggt

18481	ctcatctcat tttgcgtgtg cctttgacac cccataccca cataaactaa ggtggtgtta

18541	ttttttgagg ccctgaaggt atcttcagga atccataagt gagccttaag ctgcatctgg

18601	atataggaat ctgaaagtgt cccttctctg catgatctct tctttcagtt tttcaagtca

18661	gtgtgccaca ggaatcagga acgataaatg gagaggggaa gtgcagttgc ttggtataga

18721	caccccagag ggctatttgc atcctgtcct tcaaaatctc tctgagcctt cctgcctaag

18781	ctgttttgag ttgggtttgt ggtaccagaa cccctgcccc cgccccattc tgactaatga

18841	gagagagaga gagagagaga gagagagaga gagagagaga gcagcagagc atagaatgaa

18901	agtaggttag aagggcaggt aaaagcactt tagacaagag caggtataag ggccttggac

18961	tccctcccca gaacacacac atgaaggtaa acgatggtta aaggatacag ataggatgtc

19021	gaagctggac gatcacttgc ttttgtgtgc ttgaagtgac aggctgtggc tttcgggttc

19081	atggggtctg ttgttgagtt cacagtctca ccatgttagc aagcatgtca ctattaagct

19141	ctatccccgc cccccttttt tgagacatgg tcttgctaac atacccagac cggcctagga

19201	agcactttgc agtctcagct cccctgagtg ctatgatcac tcgtgtgagc tacagtaccc

19261	aaaccagaat atgtgtgttg ggtgttatga gagtttacac attgctgcct tgaatgctgc

19321	tctgcttgag ttcctgtagg aagctgagct gggaacctaa gcttcctcct cccagatagc

19381	agtaaccctg cagagacctc ccaccaagac tagctaaccc ctccttcttg tgctgtactt

19441	agcaagaacc ccaaggttct gggtccttgt gctacagttc cagaagagta tgaacaatct

19501	tagcttttct gtatatgtgt ctgtgtctgt cctgtcagat caagtcccag cctcactgta

19561	tgcaacatga aaggctgtga aaactgtgca ttttgagaat gaacatcatt agtctccagt

19621	aagttcaaaa acaaatgaag gcagccactc ataagggtct ttaatgaggc aagggggcaa

19681	aagggtggtt tctgtttgtt caaagaagcc tgtcatacat tttcagaaaa tttagaaaca

19741	cgtatcatgt catttcacgt tagtatgaag tccttataat tcatttcata ttaaatgatt

19801	tcctttggtt agaagcaaaa ttatgcataa aatgtgttcc tttgtgtttg gagcaaaatt

19861	acaagttaca ttattagtta atattctagt tcttattttt cccaatctcc aagaagcaaa

19921	atattcccct aaaccctaaa gcatcaaatt atcctatcac acagtgacca gtcatcgtaa

19981	cctaaatatt aaagcatcag attatcctgt ctatggtgac cagtcattgt aacctaaata

20041	ttattgtaat gtggattaga gttaactata ccttttcatc acactataat gtaaacactc

20101	tccaaatctt tcaaagtctt gaaaacacaa tttataaata ctgtgttctg tttgttttga

20161	gacctgatcc ggttaggaat ttcaggctgt cctcaaactc atcatcttcc tgcctcactc

20221	aggtcctaag tgctgagatt aaaggtctat gctaccacag ccatacgaat gccatgtctc

20281	catcagctta tcacttctta acttttttct tttcttcttc tacatactgc tgagtaggag

20341	catcgatgac ctcagcctag taggaatggt tcccatgtga acccttaatc tgtaggaaga

20401	tgctggactt cttccattaa gactgatctc catttgaact tgacttgtct ctctcttgtg

20461	tggagctacc atcccatata taatcttctg gtttataaac agattgcttt accctcaaga

20521	tcctttgcta gcgcagcaat gtaagtttta atacaaacag taaggtctct gattggagtg

20581	tcatggtttg gttaagtgcc ctttccaagg gcccatatag ttaagggctc aaccaccaag

20641	tgatgcttgt ggataggagg cagggcctag tggacagtct ttaggtcatg gagctatgct

20701	gttgaggggg actgtggggt cctggtcttt ttcccactcc tttttaggtc ctagctatga

20761	ggtgagtggt tttgtcctat caagcacctc tgtcctgcca tggtgtaatt gattataact

20821	acaacctctg aaactaagcc agtataacct atttatctca agatgtaact tacaggtaat

20881	ggtaagataa agctaacaaa agacaaattg ttataatcca ggcaagcctg gccccatccc

20941	ttgggggcat ggcacagagt gtgtcaccca tctgtgcatg gcaagcagta ccctgactct

21001	gtatgctgat tcaaaggtcc cttaaagcaa actcctccca cttcctctct ttttctgcca

21061	tttctctgag gagggaggcc actgtctctc tgtctctctc tgtgtctctt tttctatctt

21121	cctctccctc tcttcccttt ccccaataaa ctttccacat taagttttgt ctgaaggtat

21181	ctgtttgtct ctcacccgcc ttttaggccc cacctaccat gggatctgcc aaaggtctca

21241	cctcgagctg tattcataac acaaatgaca gacaaagatc aaccctgaag actagtagga

21301	tgtagaaggc ctggagctga cctgaagaac actgctgact tcaacattgc ccatccgtca

21361	gttatgtagc attaaagtta tagtggttcc tcagaaagca gtctcctttg aaaacttctc

21421	gttttgtgtc taaatggaat taaatacctt gttcccgaat aattgtttta gttctcttga

21481	aagatcccgt atacttacta ttaagatgta tataaacctc aagctgaaag aatgacttcc

21541	cctatggcca gatcacaaga ctctccactg atgtgcccgt tgcaacctga ttagaggaag

21601	agggtcaaag ttccccaaga ttcagctgag ttcatgcaag ttttagaaaa aaaacaagat

21661	gttcctccac agttagaaag gagtggggct ggagggatga ctcactgaga aaggttattg

21721	tcgtacaagc atgaagacct gagctcgaag cctggcaccc atgtaaaaag aaaccatgca

21781	tggtagtgtg catcttcaat cccagcattg gggagacaga gaaagagaaa gggacatccc

21841	tagagcttcc tggtcagcca gccttggcaa gccagtgaac tccaggttca gtgagagacc

21901	tgtctgggga ggaaaaaggg agggagggag ggagagagag agagacacac acacacacac

21961	acacacacag agagagagag agagagagag agagagagag agagagagag agagattgag

22021	gaagatacct gatatcaacc tcacacactc atgtacccat gtatgtaggt accttcacac

22081	acacacacac acacacacac acacacacac acacacacac acacacacac acacacacac

22141	acggatggtg ttgaattcta aggctcttat ccacacatat atggagacaa atagaagaat

22201	tacagtcgtc cctgcctttg acgctactct gtttctccaa ccctgcttcc cagatatttt

22261	tcaacatcta ctcagccttg agtggttgca ctctgacccc aggacctctt tctgtgactt

22321	ccttggcctc ctgttttgtt tttctgatgc taaaaactga atctggggcc tcatgcacac

22381	aggaagatgc tataccaatg agctacaatt ttgttgccct ttttaatttt tgagatggtc

22441	tcactaaatt gttcaggatg gcccacttgt aattctcctg ccttagcttc ccaagtagct

22501	gggcttttat acagatctgt gcttccacac ctggctgagc agacactcat gatttcattt

22561	ctgctaatca ggtagttttc ttgcccctcg ctgccatttc ctacctgcct ttccttgcca

22621	actaaactgg ttcccacaag cgacaggcta tcatttctca gctcttccac aggttagctg

22681	tgcaatttgg tatgaatcat ttagcaagcc cagttctcct ctttgtaaaa cagatgattt

22741	agatgaaatt ttttcaaagt tctctttgaa ttaaaactat cactgccttg cttgctctct

22801	gactcttgga gaccatggcc tatccctgat tagtccttgg tccacagaag gatgggtggc

22861	attggatgtg ctgaacaatc aggtactttc atgtcacttg gagtcttaca gtaactgcat

22921	gtttcaaatg aatcctttct ggctctatta gtttcttttt tgtcactgtg aaaaaaacac

22981	ctgaaagaaa caaggcacgg tttgttctga ctctcggttc agaggatata gttcaccatg

23041	gaggcaggag cttctcacag ctgtaacagc catggagtca ggtggctagt tacagtcagc

23101	tggccttagc agtcagagag ccaagagagc tcagttgagg agagtccagc caggctgtag

23161	cccttaggac ctgctcccca gagatccact ttctacagta tcttctaaac agtgtcacta

23221	gatggtgacc aggtagtcaa gcacatgagc ctgagggata atatcattca aaccatagga

23281	ttagtctaga actgaaccag atcaagaacc aggttttctt ctcacataat agataccaca

23341	catcatgttc tcatatagag tgtgatctag gtattgtttc tccaaatgga gaagccaaca

23401	ctggatgact tacatagaaa gaaagagagg gaggaaacaa gcaagggagg gggaagagtg

23461	agaattattg gaacagtacc agtgcctcaa aatccttggt ggactagaga attagcctca

23521	ggaagaagcg actaggcttc ttacagcata gacatacagt tcttaccaga ggcacagcca

23581	tcatgggtgc catggggagc atgaagttca gctccatcca gccattccta gcgatttctg

23641	gcaacctctg tcctttgaga cacttcctga agatataaga gtccagggag agacatctga

23701	ttgctttgat cccaggatct tgggatggaa ttggtgttgt ctctgctcca gctccagggt

23761	caggaaggtg aaactggaaa cacaagctag cttttcttac ttagcaaaaa cccacaggtg

23821	acataaaaga cagattgaca cgagaacagc atggcagatt tatttagtca aagttttacc

23881	agacacaagc accttcagaa aggtaaagtc agagacctta ggggaatttt cttgccagaa

23941	tttttccaga agaatcaaca gccgtgtaac aataggacta gataaacaag taagactgga

24001	cctgcagcac aaatgtgaca ataggagttg gaatccccag gactcacata aagccatggg

24061	agccgaatgt aatggtcact tgtagtttca gcctcagatg ggggtgggga ttctccagaa

24121	taagcaggct agcaagacta gccatgttgc caagctctgg gttatattga gacactctgc

24181	ctcaatgagt aagtggaaga atgatggagg ccaacttcaa ccttggactt ccacatgaac

24241	acacatacac aatgcaacca tgcatccaca gtgtatgtac acacacacac acacacacac

24301	acacacacac acacacacac acgcaaatgg acaaagaaag aggtaaaacc tacaaggaat

24361	caactgaaca gaagccaact ggtctgcctg ttcagatcct ttttggcctc tctgtgtgct

24421	tccctttctc ctgggcatgg ggcaggcagg atctgtatgg ggtgagggtc ttcagagaag

24481	cgaacagcct tcctaggttt tatggctcag tttggtggag aggggatcta gtttctctta

24541	atcatctttt taaaaattta ttaatttatt ttttatattc caatcccagt tttccctccc

24601	tcctctcttc ccctccccca cctcccatct gttccttaga gagggtaaga cctcctctag

24661	gaagtctact aagtctgccc catcatctca ttgaggcagg accaaggcac ctctccaccc

24721	ctacactctg gtgtctaggc agaacaaggt atctctccat atagaatggg ctccactaag

24781	tcagtttgtg cattagtgtt agatcttgga cccacttcca gtggcctcat atattgtccc

24841	agtcacatcg ttgtcaccta tattaaggga gtctagttcg gtcttatgca ggttccccat

24901	ttgtcagact ggagtcagtg atctctcact agctctggtc agctgattct gtggtttccc

24961	catcatgatc ttgactcctt tgttcatatt gtcactcttg cctcacttca attgtactcc

25021	aggagcttgc ccattggtta gttgtggatt tctgcatctg cttccatcta tttctggaag

25081	agggttctat cttctctggg gttgtgaatt gtagactggg tatcttttgc tttatgtctg

25141	gtatatgctt atgagtgagt acatacaaca tttgtccttc tgggtctggg ttaccccact

25201	caggatgttt tttttctagt tctgtccatt tgcctgcaaa ttttagaatg tcattgtttc

25261	ttactgctga gtagtactgc attgtgtaaa tgtaccacat tttctttatc cattcttcag

25321	ttgaggggca tctaggttgt ttccaagttc tggttattac aaataatgtt cctatgaata

25381	tagttgagca aatgtccttg tggtatgaat gtgcctcctt tgggtatatg cacaaaagtg

25441	atatttcagg gtcttgaggt aggttgattc ctaattttct gagaaatcga catactaatt

25501	tccatggagg ctgtacaagt ttgcactccc accagcaatg gaggagtgtt ctctttactc

25561	cacatcctct ccaccataag ctgtcatcag tgtttttgat cttagccttt ctgatcagct

25621	taaaatggta tctcagggtt gttttgttaa tcatcttgag aaaaaggaat tctattttct

25681	gtgactggct ctgagagaga gagaagaggg aaaggtggga ggaatgtgtg ctttcaagac

25741	cttgtgttct cccttagctc aaagtactca ccatgaaaaa ccaccagcct ttggaggagc

25801	atgctcttgc agaggcaaga tcctggcttc ctcccatctt gaatttgcca aaatagcaaa

25861	gatgtttggg tgctggacag ccaaaaatga cagctgctca cttcacagct tcctcacgta

25921	tgattacaac tccactcatc atcaagcttt aattacatca tgagcaggct tatggctgag

25981	ccgttatcct cgcatccctt cgtctcatca ctgattcaca caaatcacta ggtgctccgg

26041	ttaatgaaaa catattcatc agtacagtga ctaattcatc aggccaacat ttacatggct

26101	cctctgcatg acaaaaatga atgtttagaa tgaataatga gtcaccagag gtgggggaca

26161	tcttctgagc acaggttgcc cttgtctttc ctggtactca atcccggctg aagagctgaa

26221	caaagctgag gttatttttc ccatgacagt gcattgtggt ttagagatct gtaagcggct

26281	tatcttgatt ggcagtttga ttggttctgg gatgtactaa gagacgtgcc tcatgggcat

26341	ttccagaaag aattaactga gggggaagct cctcgccccg agaatgggta ggagcatctg

26401	gtggggtaca gatgtaaagt ggtccaaggg agaagccgca tggcctgcct gccttcactc

26461	cttgctgctg agtgtgttta tcccatctat cccgttgttg cttctgttgc agttgcaatc

26521	ctgcttctcc aggccccagc gtagactgaa cagtggctgc ccagaaattc ccaattgaag

26581	cagccgaatg gtggactgag cacctctcag tcttcagtct ctctagtttg taggcaacca

26641	ttgttggacc caactcttag tagtaagcca atctactaaa tacagaaagg ccagtgagat

26701	ggctcagtat aggtgcttac caccaagctt ggtgacccga gttcaatccc caagactcat

26761	aaggaaagaa ctaactaccg agagttgttc tctgagctcc acacatgctg aaacatgggc

26821	ctccacatgt catgaacatg ttcacacaat acatatttat ctctatatat tcatttctta

26881	taatttttag aaaatttcat tttatgtata tgagtgtttt atctgtttgt atgtctgtgt

26941	accacatgca tgcctggtgc ctgaagaagt cataagaacg tatcagattc cctctaactg

27001	gagctaaaag aagattgaga ggtacctacc atctgagtgc taggaaccaa acctgtgtct

27061	tctggaagat cagtaagcat gcttaaccac tgagccatca tgccacttat ttgtaacaca

27121	tatccatcct attggttaca gtcctgactc atacagttag atagctgagg aacctagaat

27181	tcttctgctt ttttattaca aaacaaagaa ttttatctga cttacagttc tggccttagt

27241	cagggagctg cattgggaga tggcttctct actgtcagag tccagaggtg gccgtaaagt

27301	atcatatgac atgaggcaga aagtctaact tacttgagag ttaacttgga aatgtccaaa

27361	gagacagggg gctaagtccc tcttattgaa gagaccttcc atagaagtta gcctgacaga

27421	tggccttgcc tgaactgcat tgacagtctt acttggaagg cctgttttgg ttcctaagaa

27481	attcaaggat ccaccagaga agtgtgcagc cagcaagctg gactccctat cccaagcccc

27541	agctcctcct cagggacctc agcagtcctg tgtctagctt acctcagcga tggggggaaa

27601	gatgctgttt tcctgctaag agcacactat tttatattat tgttgacaca ggttggactg

27661	catgtaacag actctccaac aacacagtga agatacaagt gtgttttgct gcatttaaat

27721	gtctccccat ctgtccctgc taagacacct actgtccttc acatgtcact gaaaactcca

27781	ccccttatga gaagtcttcc ctgatgccat ctagacaagc taagagtgct ctgctctgca

27841	ctgagcagct tctcaactct ggggttatca ttgctctgca tcacaattag cacacgtggt

27901	agtggctgtg tttgtgtttt tccacaccat gagtccagac agcatccctc tcaccagcac

27961	gccataggca caagtgctca agagtagcag gacttgaaca tgtgtggttt atcatacaga

28021	cagctgctgc tcagagacca gatcaaattc aaagcaaaat agagagatga tggttcctgc

28081	catgagcgta ctgaacaagg acaaacatca ccatcataag gaactcagct gacagggagc

28141	ggtcaccaaa cttttttttc tgtaaagtga caaaaatagt taagtatttt gccctagaca

28201	tagtgggtgg tacacatgta atctcagcat ttgtcagagt gaggcagaga gttgaatgct

28261	gggctacgta gatagtctca aaaaataaat aaataagtaa ataaataaat aaataaataa

28321	aaggaagaaa taaaaaaaag aatttgttac tcaactctgc acaatggtgc aaaagaaaca

28381	ataagcatta tgtaacctag tgggtattgg ctgtttcact ttactaacag gcattgaaat

28441	ttcaattttg caaaattttc atgttccata ttacccttat ttttattctc ccctataaat

28501	ggtgactcac caatacgcaa ctggataaga ttagggtatt tttattaggg aatatgcctt

28561	acttacagag cacctaacca gccagcagga aacatagtaa agtagcgcat gccgatgaaa

28621	caaggaaaaa gaagaactac catgtgtgac ccctaaccct taaaacctct cccacatcac

28681	cctgaccatg cccattaggc gtggtcacct agccagcccc taggaggcat ggttacggtg

28741	tccccctaca ctcccctaat catttaaaga tgcaaatgca tgcttggtga tgggctaacc

28801	ttggctcatg ggctaatctt ggctcatggg ctaaccttag ctcatgggct aataatcaag

28861	gtttactaat ctctgtcaga cagccatttt ttttttgcag agaagaatcc ccatctttgg

28921	atcatttatt tattcctttt gtatatttga tgcaatttat aaccacaaga acctactatg

28981	tgactgcact gtgccagatg gcagagaaag ctaagccccg attcttgtgg catggactca

29041	cacaactcca gtacaggact gttagtgaca atctccttaa ggcataagca tactgcagtg

29101	gcagcctctg ggttaggaga caaggataca gtttatgaca cctggtatct ggaaggcatg

29161	aaacatgtca aatgctggct acacctaaga atcagcaaca tctagtctgg ccatagccta

29221	ggatgaatgt cacagggtct taggccagaa atgtatggcc gagctgtagc agggtcctct

29281	ctagggccag aattaattcc agtgtgatgg acagccaaga ccacagggat aacaaatgag

29341	cagtgccaat gacacgtgct tctccttatt attgctgcac agtgtttgtt acacatagca

29401	ttttcgcaca gtaatataat gtgcttgggt catcttgctt catatcccat cactccctcc

29461	atctccctag tgcctcccct gttacctttg cttctcagtt ttgtttctgc tttgatgtca

29521	acagcacata caagatttta tgcaatacat cacttcctga atggctctat ttggaaatca

29581	ctaaaaggta atttatggaa catttggggt ctttttgatt ttctaattta ccaaaaaatc

29641	cacctgggga aagacaatgg agttcaagga cttctaagag gggaatgtac catggtatgc

29701	tccagccagg ggaaccagtg cttcccagga gctatggctt acaaagtggg ttatcacatg

29761	aaagcaagac taaaataatc atctcaaata ttcattagat gtgggactcc taaccatctc

29821	acaatgcctc cctcggtcta cattaaataa gaaacctcca ttttgtgctt tgcgagaaaa

29881	tgactgaaga ttatacattt ggccttgaag tggaagtatt tttgaaaatc atgaatagga

29941	aaataataaa tctctcattt caacataaaa tataagggac aaggacatct actcatgctc

30001	caaggacgga cactgaattt tccatcaggt agttgcagaa cgctgtgtcg ctcaatcaaa

30061	aattcaggat gcattgctca gagtgcatta tattaaaaga tagcatcttg gaacacagga

30121	tgctcaggaa atgggaggga cattaatctg catgcagtga tcatctcctg caaagcgggc

30181	atgagagcct gatgggagac aagccatcca gatgcccata cccaggggag ctgtactggg

30241	ctgcagccct gcgccattca gccatgcacc aggctactcc ctcctcttcc agctttctcc

30301	ttctgatggc cataggatta gaagataagg gactctagtg caggtcaact gctgaccagt

30361	gtgaaaatgc acagactaca tgctggtaga tcagcacttc aaactactgt tcaccatcat

30421	ctctggaata agcactacat ttacagggtt caaacctcaa tgaatataaa caaacaaaac

30481	acacctccct tccttcactg tctcccattt ctttggttcc catctccaca tagaatttat

30541	aattaaaatt tctaagtatc tttccagaaa tacttcacac atgttataag caaatgtgct

30601	tttaaagata ctattttaaa ttatgaaaat ggttatatta gttgagataa aagaatagaa

30661	tgggaagttc cagaatttaa ggcctcatat gaaaatataa agcgctttct cttttaagtc

30721	tagggtaggt gtactagatc agcgctcagc tccataccat gaagccatcc aggagtcaga

30781	cctctctgac agccctgcca ttgtcacaga gaagtttctg tcaccagtgc tcatgctgtc

30841	agaggagcga aggagaaaag atgtgagacc tcccaagtca aagtcatcta tggataaaac

30901	cttagttgca tggcacacca gtgttaggga gtcggggaaa cacagccata gcccagcttc

30961	ctctctgttc ttgctcttat taccaccaga aagaggttgc ttagacaacc caaaccaaga

31021	cacagggctc tgtgggaggg aatcagtccc aggcttctgg cacatgctat gtcaccggaa

31081	agccccagcc ctactccgaa tccccacaag tacagcaaat atcagattat agcatttaaa

31141	ggggcactct tgccaaagag aagcaccatt ggaatagcca tgcttgagaa ctggtcctac

31201	ttactgcaga accatggata caggctccct tttgtagatg ggcttaataa atacttctat

31261	aagtgatact ctgctttgtg aaaatgacct cgtcaatatt caaagtaatc ctctggttta

31321	ggactactat gaacctgtgg ggttcattgt tcatgtggtt aaacagcaaa gagtagttag

31381	acagttgtcc tacgtcacag agggggacat atgctatgct tggttaaata gctgtcctgg

31441	tcagagggga ggcatgctat tctgcccttt ctgacagacc ctgattgcat agacatttca

31501	gtgagataaa ggaaggaagg gaagaaggag gaaagacaac attttttgct tctgttaagg

31561	tagagactat ctgtgatcca gttcagcaca gtgcctgtga gtagaagcta caggtcaggc

31621	aggagccaag gaaatgtatt gcttttctaa ttgaacaaag gacacacagc tgccatttat

31681	tttcttcatt ttgacccttc agccctgcac tgtggatatg acatcaagaa actaagcagc

31741	cattttgtga aaatgagatc taagttagta aatgtggctg aaaaagaagc cagctgcatc

31801	ctccctggat ttacgagggg gaaatgtagg catactaaat taaaacacta aaattgaccc

31861	aaagctattt tgactgatat ttaaatatag attctgctcc tggacattcc agagttcata

31921	ggacagttgc ttctgttcag aggattcctc ttcggggttg cctctccttc cttaggcctg

31981	cttgtcctgc ccaaagctgc ccaagtgcat caggccccaa accaacttct ccatcctgac

32041	gcacagcaga ctaaatatgc aactttgtgt ctcttcatcc caggacaaaa ctttcaccca

32101	gcccctgaca tctgagactc tactacaggt tatctattaa atcttttata aagaccaaga

32161	aacaaagtgt tggcatccaa actttggtaa atcatagcct tttaataaag tcaaatggac

32221	caatgtactc taacaaaaaa atatgggtct ctcatttctg aatggcagat ttcaagccct

32281	aagaaccaca atgctcacct actgggcaac actgagttac agagacccag ctcccccacc

32341	cctcaccaag ccagagaaac actctatctg aacaatcctt ggtccatgga gcaagaatta

32401	gacatagaat ttgtatctca ttgtttttta ggaaaacccc aaaggctatt atgaagtcag

32461	tttttctggg caccttttct ttcccatgac aacgagttgt gggcagtctc agcagaatac

32521	tgaagctgtg gcttggggag acagagcata tactggattg gagttcatgg gtgggtgcat

32581	ggaatcaatg ccgggcatgg gattcaagac cttatgcatg tgggtagatg ctttgttact

32641	gggataaatc ccccacctgg gatctgactt caagcacaat ctttggaagg cggcattggc

32701	tctctgctaa tttttctagc acttttattc cacttatttt ctgcttgttt gctttgggag

32761	ttttgttcgt tataagacag tcttgctgtg tatcctaggc tgatcacaaa cctgtggcag

32821	tccttttgtc agcaggccaa aattcccact ttatctctga agacagaaag tagattgagg

32881	aatatatgat aaagacactc atcaaagcca ggcatctatc tttacttttc ttaaagcatg

32941	tttttgaatg gcataaaacc atgtagacaa ggagtcttat gttgtacatg gtcctacttt

33001	gtcacttaca atataggata ctttcaataa gcttggtagc ccttgcccta ttctacttat

33061	tctgttctct cttcctcggg tcttggggag ccttcttacc aggtggggtg gcataaaggg

33121	aaaagtcaca aagctcttcc tattcctggt tcccctccta agtgtacctt gctggtggcc

33181	ttgctagcaa atgtagtata acatctgact tatctcctct cagatatggt tgttgtactt

33241	agataaattt aatctagaaa ctcaagctgt atgtctttgg ggaccagcat tacagagctc

33301	ttcccttcct gtccttacct caccttggct actgtagtaa gttaatcctg atgattcctc

33361	catgagtcct gaaactgatt agttccaaga gctggaggat gagaagggat atagcctggt

33421	gcagggacac tttccaatga ccacaagacc ttgcacaagg tacacatgga atgtgttaga

33481	ctgtctcctt tctgtcccta gcctcagttg ccccagtgtt tatcaatgtt tattaacatt

33541	gccctagcaa aaatactaca gactaggaag cttgggtaca attgaaaaga gcttctcagg

33601	gttctggata ccgggaagtg caaaggttca gcatctggac agggctgcta ttgtagtttc

33661	aaatggttct gctgcaacac ccctttgaga gaatgaacac tgcttttcac atggtggaga

33721	gtgcacagac accaacccaa ctcctgaagg ccctttctcg agggctctaa tccatcatga

33781	gggccatact ctcaggactc attacctccc caacatcccc tctctaaata gtaccacact

33841	gcatttgcat ttcaatatat cactggagat atataaatct ccagaccaca gcataccata

33901	aatcagataa ggcaggcctg ccttctatag cctttcactc agcaaaggtg tttctagccc

33961	aaagcagtct ggactctcac tctgaaacct cttgggagtg gtggccagaa atgacttccc

34021	atcatccctc tctcctgacc tggtccagca ccaggtcacc aggaaatcct ccaagtttca

34081	ttatccccac ccccaattgt ctcttgtctc tagcaaacct cttccaatac ttccttcctt

34141	ggtgggtgta gcaagccaga tgatagcctg ccaaagaagt tcacagcctc atttctggag

34201	cctatgaata tgttacattg tgtggtaaaa ggaactttgt aggtgtgatt aaattatgaa

34261	tcttgaagtg ggcagattat ccaagtgagt ccagtgaaat tgcaaaggta catcaccaac

34321	agtgaggcag gaaggccaga gggggagaag gaagcagaga ggcagaggga ggaaaagaca

34381	agccagggga ggggagtggg gggaaagaaa ggagagagag agagagagag agagagagag

34441	agagagagag agagagagag aaatatcaca cacacacaca cacacacaca cacacacaca

34501	cacacacaca cacacctgaa cctgattgtg gaggaagaaa ccactaacca aggcattcga

34561	ggcagccttt gaaagtcaca agagacaggg aaaacagatt ctctccctcg gcccttcaga

34621	atcaacacag ccccacaact gctgatttta gtcatgttaa agccaagttg gacttctgac

34681	tgccaaaact ttagacgagc aaataaatct gcactatttt aagataccaa tgtgatttgt

34741	tcatgaaaac aatcaataag gaactaataa agtagaagtg aaaattggat cacttctgaa

34801	gtttggtaat atccacagaa actggacaca tgctgacttt gtgagccata gctccacacc

34861	caggtatgcc ccctacagaa atgtgtatat aggtgggcag gagatgtcac ctgctgtgtt

34921	catagtcgca cctttagact ttcccaagcc tgagaatagc ccaaacacct accaggagca

34981	aaataaattg agatatacag acgcagtggg atactacact tctaaaagaa tgagaaaacc

35041	acgctataca ctgtatatcg tcggaacagt aacacagggg tgacaatcag gcaataggac

35101	atattctcta tggctttaga aaacataaaa atagcataac agttctgtta gtggcaatgt

35161	gttctgtttt gtgatctgta tgatgcttcg gtttgtgcaa aagctctgga cttacctttt

35221	aaatgtatgg tggtctatac cttttaaatg tatgctagat atacatgagt aaaaatgatt

35281	aaaagagatg gaggggagga gactcatgcc ttcataaaag tttgttctgt cctttctggc

35341	actgtccaag tgaatgtgtg taaacaaaga gtgacccacc ccaggtagtc caccttctta

35401	gaacctactt ctgctacaac atgtcctgtg aatgtgcacc aaatgtttac taagggatca

35461	tgccacaggg ttttgtttaa ataaagtatg tctacctagg ggtatattga ttgtctttcc

35521	ttttgagggg gggtctcaaa actacaaact agtttgtttt gagacaagta tgtagcccag

35581	gatggccttg aactcacacc ttctgtcctg cctctttccc agcactagga tggcaggtga

35641	gactatcagc ctggccccag gaaactatct ttgattgaca ttatctggtc agaaaagatc

35701	taccttttcc tccaccaggt cctccaaata catgaagagc tgaaacagtt ctgtctaccg

35761	aatttccttt tttcttgatg tttctgtgga atttaataca taaattttaa tttgcatttt

35821	tagcttttct attaagcctt aattagagta taatgaagtt atgaatttat aaaaataaaa

35881	acaaaacggt tgctcccaca atcactcagt cttgaagtga ggttctgact ttacctgaag

35941	tgggggaaga gagtgaggaa agggacctgc ggaagctgaa tctcagaccc acaagatgga

36001	tctgagatcc atccaagcga acgtggacgc agacccggag tagggacatc caggggtcat

36061	cttcatctgt cctcgctgtg cttctgcccc tttgctcctc taccagtctc agctgtcaaa

36121	gctcagtggc ctggagggga gatggggcgg ggcttaggat cgaaggcgga gcctcggaga

36181	gcatcttctg gcccccgggg cctggactgg cccgccgccc ccacctgcag cgcggcggag

36241	cgcgggcgcg tcactcccag cggaagcgcc agcctcgcgt ctggcgaggt gcgcgcttcg

36301	cggctcccgc tccagagctt cgtggcccgc ctgtgtctgc agagcagggg cgggggcccg

36361	gcggcaccga ctgggcactg agatccaagt agccactgaa tcgtagacag tcacccagct

36421	cggacagcgc gtcggggcgg gagcagatcg ggaaggtgaa ggaccactgc ggatccgaca

36481	gcgcgtccca ggtcagtcct cccgctgcac ttggggaaac tttgggatgc ggtgacggct

36541	gcgagatgag gacactgagg gtcgcgaggc cgcgtggccc ctgtgaaccc cgcgaacccg

36601	tacctgccgc gcacctgaca ccgcagctgc cagggcgggg accgaNaccc tgctgccgcg

36661	gaccactgcg ggccaccaag ggctagcggg cttcaggggc ctctcgggag cctccggctt

36721	gcccgcgccc agccgcgcgc ctccggtcct cgcgggtccc cagctccttt tggcggctcg

36781	cgcccggacc ccgcggggct gcggattccg ccgtcttcgg gcctcgtggc gctggaggag

36841	cggcccgggg gcccatggct gcagggtggc ggccccgcgg cgggagcggc gcgtgctcgg

36901	ccggtggagc gcgcgggtcg cggggttcgg ctggagcgcg tggccgcagg tgcctgtggc

36961	cgctgggcag cggaggtgag agcgcgggct ggggacgcgg agcggattgc aacctctggc

37021	tgcaggaacc agggtcgctg ggtgagcagt cctgtccccg cggcttccgg gcgtgcacat

37081	ccctggcacc cggcatccag accccatcag ctggaggcgg gctgcagagc ggcgcctgcc

37141	cgggccgagg accagtgcct cctgctctga cacgccatct caccaacgag ggcggggtgc

37201	tagattggcg ggctgcgcgg ggaccactgg ccagggcctt ctggcacaag cccttttcgt

37261	ggacagctgc ctgctctggc ttggagtgga ggagacgaaa tgagtacccc gcccccatca

37321	gcgccccaac actgtcgccc cagtcacctt cctttgccct tctccgacag caccttggac

37381	ttgctccctc ccgaattggg gaaaatctga ggaaaccagg cagggacctt ggagataccg

37441	cagcctgcat actcaacagc ctggaaatcc agtcaccttg gtacctcgct gcttcccaga

37501	cactttggag gagcaggttt gccatttcta ccccacatcc gtaccccatc ccccgtccgt

37561	ctctgctgag gaagggactc ttatgagaga agttgggatc taggtacccc ttaaggtagc

37621	cccagagtct gtggtaacta ggctcatagg taactaaaag gcatcctagc tctgtagctt

37681	tgtgagggaa acaaacctta ccaactaatt ccttcccttt ctgaatattt cttagaagac

37741	tggagaccaa cggaagccga ctgttctggc cagtctttgc accctttgct tggctctgac

37801	tctccttcct aggcagagaa acattttgct tatgacctct ggctggcctc cttccaatcg

37861	ctgcctggcc ttggactgcc catcaggact gtgatttttt ttttttttta agacctgatt

37921	aggaaaggct gcaagcctcc ggttctagaa ggctcaaact caggggtata ctcttctctg

37981	atacccatgt gctccctaat tccactgtgg caacacctct gcccttcact cccacaagaa

38041	aattggttgt caaacctctt ggggaagatg atggaggcat ccctgtggga gcagatgcag

38101	gatttggaag caaccaggaa acaaccagga gtgaggaatc ttttttaaag gctcacatga

38161	ttctggaact aagaaaagat ggagatgcca ccagtgtatg aagcttggcc tctcctcggc

38221	ccatcccacc caactcaggg aactggcata tgcaggacct gtattgggtg atgcatattt

38281	ggaacctagt acttattgaa ttcctaagca gtaaacacat tccgaatttg aaattcctca

38341	caatcatcta ctgNaatgta gatattaaac ccccaactta tgaatgatag ccccaaaatt

38401	gttaacattg agagagccca ggttccctgc cacctcttcc acaacaggac aggaactagg

38461	acaatgaata ggaccatttg agctttaggg tcatgtgccc actttacagc tccatagcca

38521	gacaactgtt ttataagaga gggcacaaag gaaaatcact gtcctgtcca aatgaataga

38581	aagctgggga tggtggcagg acaaaggcaa caggaaaaat catctccaac aaggctttcc

38641	aagcatatca gtcttatact actgccatgt tgggtaccac acaaatcagg tatctcaaac

38701	tggacgctgc ctagggaggt ctgtcatcta aaaaggcagg gagatattga gataaaatac

38761	acagaagcta gtatttaact ccaggctggc agataatagg aatgaccttg ggagggtgtg

38821	cttacctttc cttctctctt gaacaaaatg tggactggac cagatgagca ccaaggctcc

38881	accaactcta acagaccttg tgtggtgggc ttgcctgcaa acagacttga gctaggttgc

38941	tgtgcgtggg atccattcca gactcattta caaactcgta gtcagtgaaa tgtgataaac

39001	cgaacactgt agggatttct aaacaaggaa ttaaaaaact cgactccaaa tgggagagat

39061	gcaggcaaca aatcgacagt gtttatgtgc ctctgaatag ctttgatttc cttcggtagg

39121	agctgacagc tggctgacag aaagctcacc cagggagaga agagagaaaa atcaagtatg

39181	agattaggaa taatgttttc aggtaacttt ctattcccat tcggagtggg tgtctggaag

39241	ggcgagtgta gttatggctt gaattgctcc atttatccac agatattttc ttcccaaggg

39301	ctcctgattc taagatgctg ggctttgctt ctgtctccta gtttcctggt agcagggtag

39361	agagctgggg gtcccagcat tcagcctgca tattcttcct ctatcctcac tatctgctgc

39421	ctccattatt tgtggtcttt tggatctatt tggtcagaga gtcagtcttt ggtttcttgc

39481	cctggaaact gcttgttgct acttgtggtg ggggcagcat ttggaagtcc aggtgctctg

39541	cccacaaact ttcaacccat catttgtttt tcatcccttt ctcattgcca ctttgtgtgg

39601	tgcctgggac ttctgggacc tatagttcaa gggtcatata taccaatggc tcacatgaca

39661	gcactgatca ctctgccagc tctcctctct ttgcaaaact tatttcagat ttttcatttg

39721	acaatacctt tcctccagtt gtctttattc ttggcagcat atgccttgta acctttaaaa

39781	aggaaggtaa ataatttgag aaaaaatgta ccaagtcctc agtgatacat tcttactaaa

39841	gactcccagt tttaacaagg agttgggctg gagccatggc tcaacagtta agagcactac

39901	ctgctcttcc aaaggacaca aattccattc ccagaaccca catggcccct tccaaacatt

39961	gataactctc gttccagggc acctcatgcc ctttcctggc atctgagaga accagcataa

40021	acatacatgc aggtgaacat tcatacacat aaaatgaaca ttaaaaaaga aatgaaatag

40081	agaaagggtt tacataacta tttaataact aagactgcct aataatgtag ggacccataa

40141	agaaaatcta gtaagttttt acaagattcc actcaatcag accaaacatt actgttactg

40201	acagagtaaa aagtcacttc caatagtcca agaacaactt tgtttcattt ctcaggcact

40261	gtctgttttg tggcatatgt gcatggtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg

40321	tgtgtacagg tgaatgctgc tcgtgtatga gcacatgcag gtgtgtgttt gcatggtgtg

40381	tagacagagt ttctgacctg cctggtccca cagctgtttg gccacaaata aacatacaga

40441	ggcttatatt aattagaaac tgtttggcct atggcttagg cttctcactg gctatctctg

40501	tcttaattat taacccataa ctactaatct atgtatttct acgtggcgtt atcttaccgg

40561	agaatacttg gtgtcctatc ttctcagcaa ctacatggcg tcttctctct gcgtcttctc

40621	cccagaattc tcctcgtctg gttgccccgc ctatactttc tacctggcta ctggccaatc

40681	agtgttttat tcatcagcca ataagagaaa catatgtgaa gaaggacatt tccctatcaa

40741	tggtgtgtgt gtgtgtgttt gtgtgtgtgt gtgtgtgtgt gtgtgtatgt gtgtacatgg

40801	gtatgtgagc acatgtgggt atatgggtgc atgtgcacct gtgtgtgtgc atggtggcta

40861	gagttgaggt tagatgtctt ccttggctgc tctccacctt ttttttattg aagctctcac

40921	tgaacttaga gctcactgat tcagctagtc tagctacccg gcctgctctg ggggtcccct

40981	gccttcactt tccatgtggc taccatatct actttacatt tatgtgggta atggggatct

41041	gaactatggg gtcctcatgc ttgcatggca agtgctttat ggactaagac atctttctag

41101	cctttacctt tttttttttt gaaagagttt ttttttgcta actgggaact caacaccaga

41161	tagctagtct actggtcact gaggcccagg gatctactat ttctgcttct cttcccaagt

41221	gctgggacta cagactgtac caccatatcc atatttcttt tagcatgagc tctggaagtc

41281	aaactcaggt cctcacgctc acaaagtaag tgttttatct accaagccat cttcccatct

41341	ctgttgtttt aaaaggcttt gaatatggga tgtgatgaag ggaggtgaaa ttctgagata

41401	aatttcttga aaagaagaat gaatcaagta ggagaacctc ctcctggtgc tgtctttcag

41461	ttccatgtcc acacagcata aacattatga ttatcattcc acagattgta attagtcttt

41521	ctctgttttg ccagtctgct cccaaaaaat gacacagaga gacttcttat taatgatgaa

41581	agctttgcct tagcttaggc ttgtttctaa ctaactcttg taacttaaat taacccattt

41641	ctattcatct acctgctgcc acgtgattca tgacttttac ctctctctca ttctgcatat

41701	cctgcttcct ctgcttctgg ctcatgatcc cgcttttctt cctctccgag tgctctgtcc

41761	ccagaagtcc cgcctaacct cttcctgcct agcaattgcc catttggctc tttactaaac

41821	caatcacagt gacacatctt cacgcagtgt aaaggagtat tctgcaacaa caggtgatga

41881	agccaacatt ccaagaggcc agggcttgcc tagggcacat agctaactta agaaaattag

41941	gatcgcattc tacatctgtc tgactctgaa ttggatctga actgtgactt gcatggaaga

42001	cccaaagacc ctgagaaagt acaatgacaa aggggctgac tctgtccaca tggtgttagc

42061	ccaggtttcc cacaggagga aaacccatcc taggcaagag aagtggtctt catcaaacac

42121	tctatgaaaa gcaaatcaga ctcaaatgtc aggatttgtg ctttacagat cgatccggta

42181	agatgaaaga acttcctgaa agtgtgtgaa ggcctaaagt cagggctgtt catggaaggc

42241	actgactaca gaatgaggtg ccagaagcct agtcagagcc tctagggaat aaagtgtcag

42301	atgatcttct aaaaaagttg aagtttcacc agtaacagaa tggccccact attaaaatgt

42361	gagcaaactc agaagtcatt gtagcatata gaagcacaga cctatggatt gctggatgga

42421	gcccaggtat tcactccatc ctgaatagcc agctggggag ctagctcagt cagttaagta

42481	tttgctatgc aaatctgagg accagacttt ggtctcctgc atccacagaa atggtgcaca

42541	cttgtaatct cagcactggg gaagcagtca gccagatcca acagctgcct agccagcgga

42601	aacagcctta tcagaaactc atgggtcctg gtgaaagata ttatctcaaa taacaaggtg

42661	ggaagctcct gaaggacact ggaggttaac ttctggataa acataggctc gccccaccac

42721	cagtgagcat gtgcctaaat ccgtacataa caatgatgta aagatggaat tcattccagt

42781	gaaaagtaag cctcctggac tctttttttt tttttgttgc tagatattct cgagacctca

42841	ggagagaagg tttgccatca tctatataac atggtactca acttccctgt agtccacaac

42901	attcctattt ctatatgatg gagaagaggc cactgcccct cccagacatc tcagtctcaa

42961	atttgttacc agttccctct cctaataagt gcttagggtt agtgttgtag agaagggctt

43021	tacatgaagt gtgtgtgtgt gtgtgtggtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg

43081	tgtgtgtgtg tgtgtgtaac ctaaaggctt tccatgtttc cacactgaaa ggttcttaag

43141	actgagaaca accagataag agtccaaatt ctagaaacca tgggaaagtg taatattgaa

43201	agtcagaaca aggcatggtg gtgctcacct tgaaacccac cacttggggc agaggcagtc

43261	agatctctgt gagttcaagg cccagcctgg tctacagact gtacatagtg agttccaggg

43321	ccagaactac atagtgagat cttgtctggc caaaaatata taagtaaata aaataaatca

43381	gtacatggta acttgttctt atttcagtgt ctgtttctca agcatgactt tggcttaagg

43441	atttttccca acttgttttt gtgattgcca ctgtatcatt tctttgtgtg aagttactaa

43501	gtggtttctg tatttgatat tatgttctga cctagtttct tttcatatta aacccatttg

43561	tatatgaaaa ctgcaaagaa gtgggttttt tgttttttgg gttttttttt gtttgtttgt

43621	ttgtttgttt tttcttggtg ttctcatgtg acctttccaa tgtttgcttc cagaatagac

43681	ctgcaagttg ggatccacac tgccatctga agtcctgcac cccaagtttc aggtatgttt

43741	tgatggcaga atagcttttc tagactgtga caataggggc ataaagccac aaagcattcg

43801	ctttcctaca ggttatgcac ccactctctg agtgattggc tgtgcatcat gaatattatc

43861	aaaatggagg cagttcagtt tggagtgctg tcttttatgc gcttattcat ggcaatgcca

43921	atggaacatt cggcaacata tactactaat catgcatggt aactgaactg tgttgtgcaa

43981	ggaagacctc atatgaccta cctttgcata tgctgacctt ttctgtgaca gactcctata

44041	atactgagag tggtactgta tggaagagtg tgtgaaaatg tattgtttaa ataacagaca

44101	gatgcctcta aatacaacac ccaagcagag aaatggagca tcactggcac tttggaggcc

44161	tctgggtaac ctttccagat cacactgttt tccttcctcc accaataacc actttccctt

44221	tggatgctac tcatagttaa catctttact tttgttgttg tcccactgat gctaagaaaa

44281	ataacttcaa ctagcaagca caacactaga tgaattaaga gtgatattga ctgtgtgtgg

44341	tgagtctcag aagactagct gcctcaggat tcatgaatgc ttacaggaac cctttagcaa

44401	ggtcaggaat gagtcttagg atccatgtgg ctcatagtct ccagcctgga catggagtag

44461	cacagtgtct gagtgcccca agggaatggg cttgttcagg ctcccctccc cgtccccagt

44521	tccaacaggt ctcagatcca ggacatcaga gctgagtgaa gagcagagct aaaaggagca

44581	ccatcggagc cctagaagca gaataggggg ggacacagca cacagagaca agaactgagg

44641	ccaggctgct gtgtgctttg ggcctaagtt gacagatgaa acatggtagg gtgaccacat

44701	ggaggatgtc tgtgcacatc catcaaactg gcaggtcccc ccagcatttt ctgggagctt

44761	ggggtcctct tttccatgat cttcagcttc tgtattctat gtgcgctgtt accatttcat

44821	cttggtagag tctatccttc tgttatttct tgagagtatg tcccaattct tgcctggagg

44881	tttggctaaa tatagaattc taagcagagg gtcatttctc cttcagatat ttaaagacac

44941	tttctgtatt gtgcctcatt gccattgttg atatacctga atctaaattg atcccttggt

45001	gcgtgactta tccccacagc caagggcccc ttcccttctg gtctgtgctc tggaagtctg

45061	caggcacatg gtatgggtag ccactgtttc attcatagtt caatgctccg ataggccctt

45121	ttgatttgat aactctatcc ctttccccca ttcccgttga tgatttcttc ttttgttccc

45181	cttttgatat agtttccttg ctgatgctgt gctaaaatat tcctaccaaa aacaacctgg

45241	ggaggagagg cttcatttgg cttacaattc cagctcacag tcattgaggg aagtcagggc

45301	aggaactcaa ggcagggagc atggaggaat tgcctgctgg cttcctctct gacttactca

45361	caggttcttg taggctagct ttctgataac atctcaggac cacctgctta gcaatagtgt

45421	ggtccacagc aggtttgaac cttctgcatc agttactaat caagacattt gcccaaagac

45481	atgcccacag gccagattga tgtaggcagt tcttaaatca agtctttttt gtcaagtgac

45541	tctagactgt caagtcgaca gttgatgcta actaggacac tattctacca cttttcttgg

45601	tagaaatatt attcggatat tggagttctt ggactagttt ttctggttct ccttttcttt

45661	cttttcctgt tatttatatt tgttttatga gatagggtct ctctgtgaag ttgtcctaga

45721	ccttctggcc ctcctgctta taattcctaa gaactgatat tacaggcagg tgccatgagc

45781	ccaacgtttt ttcttttctt ttcactgcac tctgtttgag agtctcatcg tcacagtcat

45841	tcacatcttc tattgtcttg tttttctttt taaatgtgca ttggtgtttt gcctgtatgt

45901	atgtctgtgt gagggtgtca gatcttggaa ttacagttcc aaataatatt tctaccaaga

45961	aaaagtggta gttgtatcct agttggcatc aaatgtcacc ttgacagcct tgagtcacct

46021	gagaagaaag acttgattta ggagctacca tgtggttgct ggtaattgaa cccaggacct

46081	ctggaagagc acccagtgct cttaactgct gagccatctc tctggcttcc ttctattgac

46141	ttttgcaggc ttctttcttg ttcttttgca atttcatggt ctctgactgt tcttcacaga

46201	ctcttacctc atgcttaaga tgtctcttac tccttcaagg atactgagtt tttgaagttt

46261	taattctcct gactactgtc ttttccctcc tgtttgtcat tctctgtttg ccctggcctc

46321	tgtctttcat gcaggaagac ttttcatttg cttttaggtt tttattttaa ctattggttc

46381	atgactaaag ggctagatga aaaggccagt gagaaggctg gagcatatgg gtgatacttg

46441	tcaaccggga gcctcactgt ggaatgcttc agtggcatgt gaaatcctgt ggtatttgct

46501	caggcaagtg cagctgttga atgcagacca gagcagcttc cttcgaagga gtcagatgtt

46561	gctgactgtc tttctgcagc tggtcaggaa ggtgggatag acttcagctc ttttcaaaca

46621	gtggtcacca aacaaccact tgcccagaga ctttgtgctt taccattctc agagaacaga

46681	cctctggatg gccccatggt ggaagcagcg cacctgtcta tcacaggtgc tctgaaggag

46741	ttggaagaac tacccattgt ccacatttcc cacattttca catgccagct tcactctggg

46801	atctgggtga cagtggggct gacataatgg caggggttgc agtttcagac tcagagtatg

46861	tggtaggaat gctgctgtct gagggaagac tcatctgagc agtggaggct ttgcctgttc

46921	cctggcatca tttgacctgc ccctccttag aactgggaac cccagttcta aagctccctg

46981	ctttaaagat tctgtgttgg ggtaagttct tagctttctc aggctaggtc ctctgctctt

47041	gggtttccac ggcactgttg ttttccctct ggctttgtga gtggttgtct tttgaaaaac

47101	tagttagttt ggaaaatttt gggagggagt caaataagat gtatgcattt tgccatgtaa

47161	gtcctaacca agccatctgc tgtggtattt tcctgagttt ggttctgccc ctataggcag

47221	agtctgtcat cacagataat tgcattttga acttgagcat ctcccttcct tctttgtctg

47281	cctgaaaaag tctctttata aaaaaatgta atgttaattt aaaaagtatt cattattctt

47341	gtgttgtgat acatgagtat atatatgcta tgatgcatat gtgcaggttg gaggacaact

47401	ttctgtagtt ggttctctct ttctcccttc atgtaggttc tggggatcga acccaagtca

47461	tcaagcttgc acaacagcac ctttaccttc taagccttct catcagccct ttttttattg

47521	attgattggt tgattgattg attgattgat gctagggata gagcctaggg tcttttacat

47581	gctaagaaaa tgctctacca ctgaactgca ctcctagccc aacctgctaa attcttacac

47641	tgtcttcaaa aagaagctct gatgctggat tctgcaaagt ccatttttat ccctaaattc

47701	ctaaagctgt ttaaatctcg tgagtcttac tgtacagacc agctctgtgc accatcttcc

47761	acaatctcca tgacctcctc aggatgggct ggtatctctg cagctctgcc cagtgcctac

47821	caggaactta caggtgtcac caatgaattt attggtgcat gctcacttca tcttgtccct

47881	atccactttc tgctttgact ccttctggta agagacaagt gtgttaacta cttgtgctat

47941	caccacacag aaatccatat cccataatct tagtcctttt tatttactta tttttgagac

48001	agggtcacac tctgtagctc ccacactggc cttaaacact gacctcgaac tcatggtgat

48061	tctcctgcct aaacttctca aataccatga ttacaagagt gacacaccat gctgggagtc

48121	ataatcttaa gtttaaaagt gagggactgg tcagtttact gtgctaggtt gacattgtat

48181	agaaatgaac agccatgttg gtctggaaat gttcctagtt ttcatttgta caaggatatg

48241	cagtgtgtga aatagggaga gtcttaccta tgtgggtttg atcacagcaa ttaataaaat

48301	atgctctaaa taatgaaaaa agccagtaac tagtagtgtt tctgaatcct cactaaagct

48361	ttaatacatc ataaataata tatcactgca gattatgtct acatgttata catatcacat

48421	ttatagtaca atctgatctt tgtcacctac tgtaagcaca actgaaaaac aaattttctc

48481	atagctcaat attaagtcat tattatcccc ataataagta attattatcc ccataatgaa

48541	actatctatt gagggagtca gaatctgaga tagttaaata aatttaagca tgtattttta

48601	gtgtcaatgg taaaaattaa atgttcataa agcctgtatg actcctttta aagtagtttt

48661	aattttatgt gtatacatat atgcatgttt tgccttcttg tatgtctgag taccacttgt

48721	atgtctggtg cctgaggagg ccagaacgta tcagatcccc tgaaactggt attacagttt

48781	tgagctacta tgtggctgtt gggaattgaa cctggatgct ctgaaagagc agccagtgct

48841	cttaatgact aggccatctc tccattttct taaaaaaaaa tttaaaacat ttactctaag

48901	atttactttt atgtaggtgc gtgtgtgaat gtgtatggtt tatgcattgg ggtggggagg

48961	atggattagc acagtcacag aagactagag gagggtctct actattgctt tctgtcttct

49021	acccttgaga cagggtctct cactaaacct gaaactcacc tttgcagctg gggtagctgg

49081	tcagaaagat cctggaatct gtctttctcc ctggccctaa tgcttgagtt acaggcccat

49141	gtgaccatac ctgtcgtttt actggggttc tacagagtca aacccaagtc ctcacgcttg

49201	catagccagc gattttaccg actgagacat ttatctgccc caattcataa ttcttctctg

49261	cttccattaa taatcccatc tatgtcccct tcatacatat ttctgaaata gacaaaatga

49321	atacaagtta gacatcgagt ctgattaatc ttcaacttct ttgataacca ggtattgatt

49381	tctgactttt gaagatggat gaaggcacag aagtctccac tgatggaaat tccctgatca

49441	aagctgtcca tcagagccgg cttcgcctca caagactttt gctcgaaggt ggtgcttaca

49501	tcaacgagag caatgaccgt ggcgaaacac ctttaatgat tgcttgtaag accaaacaca

49561	ttgaccagca gagcgttggt agagccaaga tggttaaata ccttctagag aacagtgctg

49621	accccaacat ccaggacaaa tctgggaaaa gcgctctgat gcacgcatgc ttggaaagag

49681	cgggcccgga agtggtttcc ttgctgctca agagtggggc tgacctcagc ttgcaggacc

49741	attctggcta ctcagctctg gtgtatgcta taaatgcaga agacagagat accctcaaag

49801	tcctccttag tgcttgccag gcgaaaggaa aagaggtcat tatcataacc acagcaaagt

49861	caccctctgg gaggcatacc acccagcagt acctcaacat gcctcccgca gacatggatg

49921	agagccatcc gccagccacg ccttcagaaa ttgacatcaa gacagcctcc ttgccactct

49981	catgttcttc agagacggac c

SEQ ID NO: 3 (Chromosomal region 5,000-55,000 base pairs downstream of CHO GS gene coding sequence)

1	GGGCTCAGGC ATTTATCGTT CAGAGATTGA CTGAGCTGTA AAGATGGAAA GACAAACTTT

61	TTTTTTTTTT GATTGAGTCG GGGTTTCTCT ATGTAACAGC CCTGGCTGTC CAGGAACTCA

121	CTCTGTAGAC CAGGCTGGCC TTGAACTCAC AGAGATCTGC CTGCCCCTGC CTGTCGAATG

181	TTGGGATTAA AGGTGTGAGC CACCACCGCC CCGCTGACAA ACTAGACTTT TAGAATGTAT

241	TATGAGATAA GGTTTTGTTA TGTTGCCCAG GCTGGACTCA GATCTGTAGC AATCTATCTG

301	CTCCAGACTC CTGAGTGCTG GGATATACAG ACCTGAGTTA CCTGTACAGC TTTCTAATCA

361	TCCCCCGCTC CCCCAGAGAC AGGGTTTCTC TTTATTGTTT TGGAGCCTGT CCTGGCACTG

421	GCACTCACTC TGTAGACCAG GTTGGCCTCG AACTCACAGA GATCCACCTG TCTCTGCCTC

481	CTGAGTGCCG AGATTAAAGG TGTGCACCAC CAACACCCTA CTTTCTAATT CTTAAAGCAA

541	GGCTCCCAAC TCCTCCCTTG TGTGTAATCA ACAAGGTTCT TAGACCCTGT CTGCAGTGTG

601	GATTCCCACT AATAAGACAG TGGCGGCACA GTGCTGTGTG GCAGAGCAAG CGTCCATCTA

661	GTTCCTATTG TCATTCTATG ATTTGCTCTT CTGGGAGCCT TGTCATTCAG CAAGTTCCTG

721	GGCTTGTCTT GGGATTGCAA TGTGCCTCAG CTTGGCTAGT TCCTCTGCGG CAGAAGCAGT

781	GTTTGAACTC AGTGGGCACT CAGTCACTAC ATCTAACTTG TTTGAGGGCT CTCTGCATTT

841	GCTTTCCAAT TAAGGTTTAG GATGACTCCT CCCTGTGACT CTTATCATCC TGCCTATTAA

901	TGCTAAATTA GAGAGGCATT CAAGATAACT GCCGAAGATC TAATAAATAA ATGGGGTGGG

961	TGGGTAGGAC TATAAACCAG TTTATAGCAT GCAAGAAAGC TCTGAGCACC ACATTCAAAA

1021	ATAAAGTGCT GTGAGCCTGG TGGTGGTGGC TCACACCCTG ATCCCAGAAC TCAAGAAGTA

1081	GACAGAAGGC TCAGATTCAA GATTCAAGTT CTTCCACTAT ACAGCCAATT TGAAGTCAGC

1141	CCAGACTACA TGAGACCCTG TCTCAACTAA GCAAATGAAA GCAAACTGGG GTCCAAATAG

1201	GCACTATTCG ATGTTTTGAT GCAAGTTTGT GACTGAGGAG TGGAGGTGGC AAATGAAGAC

1261	TTTTTTCTTC CTCTTCTTCT TCCTCCTGGG TCCCGTTTTT TTTAGGGTGT TCTTAGGATA

1321	TGTATGTCTC ATTGGCACTA CTAAGAAGTG TGGGGTCTAG GGAACTTCCT GTTATGTATA

1381	CAAGCTAATC TTCAAACAAT TGTGTGGGCT GTTTTGGTAA CTACTCAAAT AATGCTATAG

1441	AAAATTGTAC AATATATTGG GGAAGGAAGG GAGTTTTACA CAGGAGTCAA CATGACTCTT

1501	GTCTCTGGAA AGCAACTTGT GATCCAATGA GGAGCTAAAT TTAGAGACAC AATTCAGGAA

1561	GAGAATCCAA TCAGAGCTTC CTTGTAAAAC AACTCACCTT CACAAACAAG TTCATTCCTA

1621	ATCGAATTTA AGGTCTAGAA ACTGCCAACC TATTAATGTT TCTATAAATA CACTTGGGGT

1681	CAACTACGTA GCCAAGGAAA TCTTTAATAA ATTGAACACA AATTGTCAGG GGAAGGTTAT

1741	TGCTGGGACT CCTGGAAGCA TGTATAAGCA GGGTAGGGGT GACATAGGGG TGGGGGGCAG

1801	TTAACTCACA GATATTAGTC TCAGATATTA ATGGCTTGTG TGTGAGCTGT CTGCCACACT

1861	TAATGTCAGT CACCTTGCCC GGAACTATTT TTCTCTCTGA TTCCAAATGT AGCTATTGGT

1921	CTATTAAATG ATTAACTTCC ACAGAAACTG ATAATATCCT TATGGAATCT GACTGTGGTA

1981	AGCCTGTACA CCCCCGCCCC AATTTCCTTC TAGATTTAGA ATTCCATTCC ATGAGCCATC

2041	ACACCCACGC TGAAAAAAGA AAACCTGTTG AATCAAATTT GTGTTTTGGA GGGTAAGAGC

2101	CACCCTTCCA ATTTATAAGG CTGTCTATTT CTTTGGGGGG GGGGAAATGA ACCAGTATCT

2161	TCTATTAGTA AAAGGAGTGT TTGAGCATGG GCACTACAAC CCACTTCTTT CAGGGAGATT

2221	CATTTTTCTC TGAGAACTCA GCCTCTCTGT GCTGGTGCCA CAGGAATTCT TAAACTCTTT

2281	CAACTCTCCA ATTAACCAGA GAGCAAACCC AGCACTTTCC ATCTATGAGA AATCTACACC

2341	ACTCATGGAA TCATTGTGTG CCCTCTCTCA CTGCCTAACA GGGGTACCCT TGCCAAAGAA

2401	AAGCAACTTA ATGCCAAAAA GGTGCATCAC CTGGCACTGC TTCCGAGGAT GGGCAATGTG

2461	CAAGCACTTT GTTCAGTGGC TCTGCCTTGG GGTCTCTTGA GGGGCGGCAG GTTACCTGGG

2521	GTGGGGGCGC ACACTCTCTG AAGGTGGGCT GCGTTCAGTT TCCTGCTTCA GGGGCTCCTT

2581	CATAGTACCG CCCCCTGATG AGTTTCTGCT CAGACTGGAA GGTGTCAGGT CCCAAAGAAA

2641	CCTGGGACAA GGCTCACTCA GTACCTGTCG CTTCTCCCAG CACGTCTCAC CCCACCCCTA

2701	CCCTAAACTT CTCTAGCCCA GAGGCTGGGC TCCCCCTTTC TCTTTCCTAC ATAACCCTGC

2761	CATTTTAGCT GTGAGCTCTC TCCGTCTTTA GCTCCTCTAC TGTTCTTTTA TCCTCTCTTT

2821	TCTCTCTCCT CTTCTTCTCT CACCCCCACC CCCACCCCCA TCTCTCCCCC CATGGTCTGG

2881	TTCAGTCTGG ACCCTTTCAG ATGCCTCTGT CTGAACTCTC CCTCATATCT CAATAAAACC

2941	CTTCTCTTCA GCCACGCCTT GGAGAGGTCA TAGGCTCATT TTCGTTCAGA AGGCCTATCA

3001	AAGAATCTGT GGGCTTATCT TTACATTCAC AATAGGCAGC TTGGCCCTGA GACCACAGTC

3061	CAGGTTAAAG TGTTACCTTG GAAAGAAAGT CTTTTATTCA AGGTGTCTGG TTTCTTTTCT

3121	TGTTTTTGTT TTTGTTTTTG GAGACAGGGT TTCTCTGTAT TATTTTGGAG GCTGTCCTGG

3181	AACTCGCTCT GTAGACCAGG CTGGCCTTGA ACTCACAGAG ATCCGCCTGC CTCTACCTCC

3241	TGAGTGCTGG GATTAAAGGC GTGAGCCACC AACGCCCGGC TCAAGTGTCT GGTTTCTTTT

3301	GATGTCTTTA GTTTCTTTAA TCCCATAATT CCTTTAATTA TACCCTCTTG TCTGTCGGAG

3361	AATGACATCA AGGATATCCA GTTCAAGGTT TCCTATGTAG TTCAGTCATA GAGTGCTTGC

3421	CCAGCTGCCA GACTCTGTCA GATGCCCAGC ACCACACACA TACAAAGCAT TTCCAGCTCT

3481	GTGTCTGTGT CAATTACTCC TGTCTGCTTC TCCATCCCCA GACACCAGGA GGGCCCACAA

3541	GAAGCTTGGA GCAGGGAAGA ATAAAGAGAC AATATCCATA GACACACAAA ACCTCCAAAG

3601	TACTTATGCA TTGAGGAATT ACAGCTTACA AATCCAGTCA CAGTATCTAT ATTCATGTTA

3661	GCCTGATTTC AATCCCCCAG CTACATATTC TTCCATGAGC TAGCTCCTTT CCTATTCAAG

3721	ACTCCCTTGA TAATAGTTGT TATCAGACTT TACCCCTATT AAAATATTTG GACCGTTTGA

3781	GAGCAATAGC TCACCTCTAT AATCTAGAAC CCAGGAAGTT AAAACAAGAT GTTTGCTGCA

3841	AGTTTGATGC CAGCCTGGGC TACATAGCAA TTTCCAGAAC ATCCTGAGCT ACAGGGCAAA

3901	ATTCTATCTT AAAAAACAAA AAGTAGACAG ATCAGGTGTT TCACCTTGTT TCAAAAAATG

3961	CAAAAAATAT TTTTTAATTG TAGAAATATA TACGCTAATT CCTTTGGTAC CCTAGGCCAA

4021	GTGACTAGAT GGGTTAGTCT TCCTTCTGGT CCTCACAGAA GAAAGTTAAG TTCTCAGCAG

4081	GAATAATAAA AAATATTAAA AAAAAAAACA AGCTGCAAAA TTCTGTTGTG GTTCTGCCAA

4141	AGTGTTCTCA GGAGTGAGGG CATACTGGGA TTTAGTCAAG CAGATATTTC TGTTTGAATA

4201	ACTAGGATCT GGGAGCCATG GGACACCACC CCCACCCATA AGGGCTACTG AAAACCACCC

4261	CTGGAAATCT GTAAATATTG CTAAGGCTCT ACCCTTTTGC TCAGAGAACA ACCACCCACA

4321	AGGATAGGGG ATAAGTTAGT TCTGTAGTAG AGTGCTTGCT TAGCACACAG AAAGTCTTTC

4381	TCTCTCTGTC TTTCTCTCTG TCTCTGTCTC TGTCTCTCTC TCTCTCTCTC TCTCACACAC

4441	ACACACACAC ACAAACAAAC ACATGAGTGC ACAAGAAACT TCTAGGTGCT ACTAAACTAA

4501	TGTAAAATCA TGCAAAGTTC ATAGAGAATT CAACAGCTAG TGACAGGATG ACCCGAACAC

4561	AAGATTCTGC CCTAGTCCTT GTATTCTGTA GTCCCCAGTT TCTCTTTACT GCCACAGTCT

4621	CCTATCTCTG ACAGCCTCCC TCTTTGCAGA TCTGGCAGTT TCTGGGCCTG GAACTGCTTT

4681	GGTAGAATGT CTGTACAGCA TGCACTAGGC ACTGGGTTTG ATCCCCAGCA CTGCATAAAT

4741	CAACTTTGAT GTCACACCTA TAATTTCAGC ACTTGGCAGG GATCGAAGCA GGAGGATCAG

4801	AGGTGAATCA AGGCCAGCCT GGGCTACTTG AAACCCTGGG GAGAGGGATA GAAGAAGGGG

4861	GAGGGGGGAG GGAGAAGAAA GGAAGGAGGG GGAGGGAAGA GGAGAGGAAG AGAGGAGGGA

4921	GAGGGAGGGA AACAGGGAGG GAGGAAGAGA AGGAGGGAGA GAGGGAGGAG GGAGGGAGAG

4981	ACTAGTGTAA GCAGAACCTG TAAGTTCTCT CCTCAGCCTC AACACACCCC AGCTCCCTGC

5041	TGTCTCCCGG TCCAGGGCTT CAGGGCCTGG CAGGACAGGC AGCAGGTTGT TTTGCTCTCA

5101	TAAAGCCATG TTACATAACT AACTAATGTT TTGAGCAGTG GAGCTGAGCC AATCTAGGTC

5161	ACATCAAGAG GGAATGGGGA AAGAGGATGA TCACGGAAGT GGTGAGAGGA AGGGAAACAA

5221	GAAGGGAGGA ATAAAAAAAA GAGGCGAGAG TGGAAATGGG GTGCGATTAT TTAATATCTG

5281	CTGCCTGTTC ATAGTTCCTG GTCCTTAGGG ACAGCATATA TTATCCTGAA AAGTCCTCTC

5341	TCTATTTTAT CTAGGCATTC TGTCATCCTA TAGCCCCCAC TCTGGATGGC TGAACTCTGT

5401	GCCAGCAGCC TGCAGGTATC ACCCCTTATT GGAGTGAGGT CTATTCCTTA TTGGAAGCAG

5461	TGGCAGGCTG GTAGGAAACA AACAGGCCTG GTGTTGTGGA ATGCTGTCCT CCCAGCATGA

5521	CCATCATTAG ACCTTATGGA AGCAGAGCGA GGGGGGCATT GTCCTCCTCC CCAGGCTCCT

5581	GCAAGCCTAC TCAGCTCAAC TGGTTCCCCG GGCCAGACTT AGGTGCAAGA GTTGCTTTGG

5641	TTTGTTATTG GTGGCCTGTG TAGCTGAGTA GACACATGCT CACCTACATG ATATATGATG

5701	GCTTGCAACC TTCTAAAAGT TCAGTTTCAG GAGATCCAGA ACCCTCTTTT GCCCTCCAAG

5761	GACACCAGAC ACCCATGTGG TACCCATACG TACATGCGGG CAAAACACTT GTGCATATAA

5821	AATAAAAAGA GATGGCTCCG TGGCTAAGAA TGCTCCCTAC CTCCAGCTCA CCCACATCTT

5881	CACAACTGAC TGTGAATCCA TCCATGGTTC TCTTCTGACC TCGGAGGGCA CCTGTGCCCA

5941	TGGGGCATAC ACATACACAT ACACAAAACA AGTATGTAAA TAAATAAATA TTTAAAATTG

6001	GGGCTGGAGA TGGCTTAGTG GTTGAGAGCA CTGGCTGATC CTCCAGAGGT CCAGAGTTCA

6061	ATTCCCAGCA CCTACATGGT GGCTCCCAAT CACCTAAAGT GGGACCTGAT GTCCTCTTCT

6121	GACATAAGGT CATACATGCA GATAGAGGAC TCAAATGCAT AAAATAAATA AATAAATCTT

6181	TAGAAAATAA GTACATAATA AATAAATATT TAAAATGACC CAAATTAAGA AAAAAATGAA

6241	GCCAGGCAGT GGTGGTACAC TCAGAAGGCA GAGGCAGGCA GATCTCTGAG TTTGAGACCA

6301	GCAGTTCCAG GACAGCCAGA GTTACACAGA GAAACTCTGT CTCAAAAAAA AAAAAGAAAA

6361	AAAAACAGAG AAAGAAGAGA GGAGAAAAAC AAGAACAAAA AATAACAAAA CAAAAACATG

6421	GCTTTCCCTT CATGGCATCT GCTTCATCTG CCTATTTGGT AATGATCAGG GCACTACACA

6481	CCCAGTGCTT CATACCCTGG CCATGTTTCT GTTCTTGGTG TCACCACCAA GTTTACTAAA

6541	GATGGTTCCA GAGTGACATT AGCAGCCCCA CACCCCAATT GCAGCTAGCA GTTGAGGAGA

6601	TTTCTGGCTT TTTGTCTAAG AGGAAGGTTC TTTGGCTAGG AGATATACTG AGAAGGACTA

6661	GGAAAAGGGG TGTCTAAGAA ACTTGGAGAG CACATTTTTC AAGTCAGAAA GAACATAGAC

6721	ATATTCTGGG GGTGGGGGTA GTAAGATAAT GGACCCTCCT AAGGGAAGGA TTGTGGGGTT

6781	TGCCTGAAGG GGCTGAAGCA GACCACTGAG CAGGCCAGAC CACCAGCAGC TTTTGAGAGG

6841	TGGGAACACT GCAGCTGAAG TCACTTGTCA CCTTCCCAGG TAGTTCTTAC TTCCAGCTCT

6901	GGCAGGGCTA GATAGCCTAG GAACTCCCAG ATAGGAGTTC TAGTTCTTCT TCTCCCAAGC

6961	TGACAGAACG TGAGCTCAGA GTCTAGGGAC ACTCCAGGTT AAGGACGGGG CCATTCTTGA

7021	TTGTCAGCAC AGATAGATTT TAATTAGAGA GCAATGACAT GACAGATAAA CAGCCCCTTA

7081	TCTAAAGGGG TACATCCCAA GACCCTGGAG GACTCTTGAA AACCCAGATA GGAGCCAGCC

7141	ACGGAAGCAT ATACCTTTAA TCCTAAGATT TGGGAGGCTG AGGTAGGAGG ATCTCTGTGA

7201	GTTTGAGGCC AGTCTTGTCT ACAAAGTGAA TTTTGGGACA GCTACACAGA GAAACCCTGT

7261	AAGAAAAAAA AAAAAAAGAA AGAAAGGAAG GAAGGAAGGA AGGAAGGAAG GAAGGAAGGG

7321	AAAGGAAGAA AAAGATAAAG GAAGAAAATC CAAATAGGAA AGAATCCCAT ATATACCATA

7381	TTTTTCTTAA ACATACATAG GTTTATTCAT TCTCTCTGTG TCTGTGTGTC TGTGTGTCTG

7441	TGTGTCTGTG TGTCTGTGTC TGTCTGTCTG TCTGTCTGTC TGTCTCTCTC TCTCTCTCTC

7501	TCTTTCTCTC CCTCTCTCTC TCTTTCTTGT CTCATAAATC TCAACACTCA GGGACCCAGA

7561	AGATATCCCA GTGGTTAAGA ATACACACTG CTCTTGCAGA CCTAAACTCA GTTCCTTGTC

7621	CCTACTTGGG GCAGCTCACA ACCACACCTG TAAGTCTAGC TCCAGGGAAT CCACACCTTC

7681	TGGCCTGTGC AGGCACCTGT GTGAAGGAGC ACATATCCTT CCCCATAATT AAAAAACAAT

7741	CATTGAAAAA TAAAACTCAA CCCCCTCCCC CGGGACTCAA ACCAGAGGTA GTCTCCCTGC

7801	CGTAGGCGCT CAAAAACTGG ACTTTCAGGT GTGAGCCTCT AGGCCAGGCT GCTTTTCTTA

7861	ACTGGCTACC GTGCTCTTGC CTGAAACTTC CAGCTTGAGA CCTCATAGTA AAAAGAACAT

7921	ACACGTCTTC TGTCTGTACT ATTTTACAGA CGGCTGACAT GTTCATACCA CGTATTTTAG

7981	CAATTTCAGC ACTTGGTATA TTTTCTGTCA TTCTCAAATA ACTTTCACCT TGCCACTTAG

8041	GGCAGTCCAA GGCTCCTCTT AGATATATCC AAATTATCAG CCACCACTTC TGCCTTTACT

8101	AAGTAAGACA GGGTACTTAA CATGGAGTAC TTAACACAAG CACTGTGATC TGAAGGTGGA

8161	GACTGCTTGC TACTCAGTCA CAGCTTAGCA TTGCTAGAAC AAATCCTGAA CAAAGGGTAA

8221	TTCATGACCC AGGCAGGGCA GAGGCGGATG GCTGTTCTTG CTCCTCAGAA ACCCCTGTGT

8281	ATAATTTCAA GCTTAGGAGT TGTTTGTCTT TGGATGGAGA GGGTCAGACC TAGGGCTTCA

8341	CTCACACTAG GCAAGCACCG CAGGTCTACC TTCGAAGAGA AGAATTTTCA CTTAGCGTTT

8401	TCAGATATAG GTCAACCTCA GCTGGCTGAA ACTTTGACTA AGTGAGCAAC TGTGAGGGTG

8461	GGGAACACAT GCATGCATTT CTTCATGTTA TAACATCTAT TTATACATAA ACATATCATA

8521	TAAATATATT CTATTGCATA TAAATATACA TAAATGCACA CTCATGTATA GATATCAATC

8581	ACATAATTTA TGCTTTTATT CATAGATTAT CTCTGGGAGG TGTACAATTA CTGACAATAC

8641	CTGCACATGA TAGTACACGT TGTTCTAGTT AGGTTTCTTT TGCTGTGACA AACACCACAA

8701	CCAAAAGCAA CTTGCAGAGG GAAGGGTTTA TTTCAGCTTA CAGTTGTATT CATTATGAAG

8761	AGTTGGGAAG TCAGGACAGG AACCTGGAGG CAGGAACTGA AGCAGAAACC ATGGAATAAT

8821	GCTGCTTACT GGTTTACCCA CCATGACTCA ACCTGCTTTC TTATATCACC AGGACTGCTT

8881	GCCCAGGGAT AGAACCACAC ATGGGGACTG TACCTCCCAC AACAATCATT GATCAAGAAA

8941	TGCCCTAGAG TCAGGGATGG TGGCAAATGC TTTTAATCCC AGCACTCGGG AGGCAGAACC

9001	AGGCCTTGAC TGTGAGGTCA AGGCCAGGCT GGTCTACAGA TTGAGTTCCA GGACAGCCAG

9061	GGCTACTCAG AGAAACCATG TCTCATGGAA AAGAAAAGGA GGAGGAGGAG AAAGGAGAAG

9121	GAAAAAGAGG AGGAGGAGGA GGAGGAGGAG GAGGAGGAGG AGGAGGAAAG AAGAAGAAGA

9181	AGAAGAAGAA GAAGAAGAAG AAGAAGAAGA AGAAGTAGAA GAAGAAGTGT CCACTGGACA

9241	ATCTGATGGT GGCGTTTCCC AATTGAAGTT CCCCTTCCAA GATAACTCCA GGATGTGTCA

9301	AGCAGACAAA AACAAGAACC AAGACACATG TTTATAATCC CAACACTGGG GAAGTGGAAT

9361	AAGAGGTTTG GCAGTTTAAG GCCATTTTCA GCTACATAGG GAGTTCCAGA CTATCCTGGC

9421	TACATGAGAC CCTGTCTCAA AACACCAAAA TGCAAGGGAA AAACAAAAAG CAAAATAATG

9481	AGTACAAATA GCAGTGACAT TCTGGGGAGA CAGCCTGGAG GGGGGGATTG CTTATTATCT

9541	CTCCCTACCG TTTGGAGTTT TTAAAATCAT GAATCTAACC CCAGAAAAAA AAGCATTGAG

9601	ATTCTGGGAC ACTCGGGTGG TAGAGAAGAT CATCTGATCC TGTCACCTTT CGGGTACGTC

9661	ACTTTATTAA TCTCTCTGAG ATTCAGTTTC ATCACCTCTG AAGTGGTTTG TGTCGACGTA

9721	CAGTCCTCAG GACTAAGTAA GGCCACTTGG TGGCTGTGCC AAAGCACTGT GTCAGGGACA

9781	CGGCAGATGT CTGACACATC TTGTTAGATT CCTTTTCTGT CCTCCGCTCC CCTACCCCAG

9841	AGGTGGGTAC AGCCCCATGG CACCTCATCT TTAATGGCTT GGGTTTCTTT TCTCCAGCCA

9901	GGAAAGTTGT CGCTTTGGTG ACAGCTATTT TAAGTCAACT GACCTTTCCT GCAAATGATC

9961	CAGATGCCTC TATCTTAGGC TGGTGATGAC GAAGATGGCC TATGACGGGG TTCCTGGGGG

10021	TGTGTTGGGA GGTGGGGCAG GGGTGGGGCC CGGCATTTGT CAGACCCATA TGATCTTCTG

10081	GCTCCCGGGC TCTGCAGATT TCTCCTGCTG GAGATGCCTA CCTGCCAGCA ATCTTGGAGA

10141	AGACAGAAAT AGCAGCTTTG GGTTCCAGGT CCCCTCCTCC CTTTGGCCCA ATGTAGCTAG

10201	AGCTTTGGTT TCCTGCTGCT GTCTTGGTGC CTGGAGCCCT CTCTGGATGG TCATGGAGTC

10261	TTGTCAGAGA AGCAACTTTG GGCTGGCAGA CAGTCATTCC AGAAGACATG ATCTGGAAAA

10321	ACTGCTTCAT CGTTTCCTTC AGAGGCACTG TCCCGAGCCC ATTTCCTTGT CTGGTTCCTG

10381	AAATCTCAGG GATGCCATCA GAAGAAGGTG TTCTTGTGTT TACTTTGGAC ATGGTTTTCT

10441	GTAGTGCAGA CTGCCCTTAA ACTCTACGTA GCTGAAAATG ACCTTGGTCT CCAGACCTCT

10501	TGATCTGTCA GCATCCCTGG GAAATCCAGG GTTCTGTAAT CCTCCCCTCT CACCTTGACT

10561	TACTGTACCA GCATCAAACA TCCTAAACAA ATCCAGTGTT TAGCCAAATA CAGCGGTGCA

10621	TGTCTGTAAT CCCAGCCACC TGGGAAGCCG AGGCAGAAGG ATTAAGGGAG CTGGAGGCCA

10681	GTCTGTGCAA TTTAGCAGGA CTGTCTCAAA ACAAAATTTA ATGGTTAGGG GTGGGCATGT

10741	CATTTATTTG ACTCTTATCA CATGAACACA CCTGTAATCT CATCACGAAA CGACAAGGCA

10801	GGAAAATCAA AAGTTCAAAG TCATCTTTGG CTACATAGCA AGTTCTAACC TGACCTAGGG

10861	TATGTAAGAC CTTGTCTCAA AAGCAAACAA ACAAACCCCA AATAACAACA ACAACAAAAC

10921	AAAAAGCAAA CAAGGAGAGG GTGTGCAGCT AGGGATATAA TTCAATGGGT GAGGGCTTAC

10981	CTCACATGCA CGAGGCCTTG GTTTCAACTT CCAGTTGAAA TGAAGTTTAG TGGTAGAGTT

11041	CTGTGCAAGG CTGTAGTTTC AGCTCTCCAT ACTGCAAACT GGAAAGAACA ACAGTGACAA

11101	ACAGAAACAA AAAACCCCCA CAAACAATGT GCTTTCTCAC TCAATAAAAC CACCTCTTTA

11161	CATACAACTA CAACTGCTAA GAAAGTTCTT CAGTGTTCTA GAGCCTGAGC ACCTCAAATG

11221	GTTTCCATAA AGCTGTATGC AAACACTGAT AAGCCACGAG AAGCAACTGT ACAAAGCACC

11281	CTTTGATTTT CATAGTTTAT CTACACAAGG ATTCTAGGAA AGTGTGCTAG GAAAATTTTA

11341	TGTATCAGCC TTGCGGGTTT GTCCAATAGT TTTAGATTTT GCCAGTGAAG ATTTTCCTTT

11401	CTTTATTTTT TACATGGGAA GGAAGTTTAA TTGGGGGAAG GGACGGGAGT GGGCTTTATT

11461	TTTATTTTTT AATGAGACTA GCATTTGCAT TGGTGGACAT TGAAGGAAAC AGTTTCCCCT

11521	CCCTAATGTG TGTGGGCCTC ACCTAACTCA TTGAAAGTCT TAGATAAAAC TAAGCTGAGT

11581	GAGTGAGTTG GCCCATACCT GTAGATGGAA GGAAAAGGGT CTTGAGTTTT GGTTTATCCT

11641	AGAGAGAACT TGATCCCCCA AACACCAAAC TTTCAAACCA AACCCCAGCC TCCTCAGTGT

11701	GAAGGGATGC TGTTACATGA CCACCTATGG ACTCAGACAA CCTCTCTTCC CTGAGTCTGC

11761	TGGCTTACTC ATCAGAGTCT GGGCTCACGA AGCCGCCACA CATATATGAG CCTCGTTCTC

11821	CCCACTCTTC TCTTGTGGCA CTGAGGTTCA AACCAAGGAC CTCGCACATG ATAGCAAATA

11881	CTGTACTGAA CCATAGAGCC AGCCCTTGTC AGTTTCTTAA CACAAACATA TAGATGTATA

11941	TGTATATGAA TATTTCCATG CTACCAATTC CATTTTCTCA GAGAACCAAA GAATACACCA

12001	AGTAGTCACA CTTGAAATTC TGTTCTGAGA TTGAATAAAA CCTGATCAAA TGTGAATTCG

12061	GTCCCTTCTC CCCCATCCCT GACGCCACCA CGTTGCTATA CAGACCAGGC ACAAACTCTT

12121	CTCCTTGTGA ATGTGTGTAA CACATGTTAC CACTGTGCTT GGCTTTTGTA GTTAGAAGGT

12181	TGGTTGATAT TTAAAAAAAA ACTTTAATAT TTAGTCATTA CTTTTTAGTA AAGATTTGCC

12241	TTGCTTTTAT TTTATTCATG TGCATGTGTG TGTATCTGTG TGAGTGTATG CCACGTGTGT

12301	TTGGGTGCCT CTGGAGATTG GAAAAGAATG TCAAAATCCC AGGACCTGGA GTTCCAGGCA

12361	GTTGTAAACT TCCCAATGTG GGTAATTATA ATGAACTTGG ATCCTCTAAA AGAGCAGAAC

12421	TCACTCTTAA CTGATGAGTT ATCCTTCTAC CCCCAAATTT ATTTGTTTTG TTTATTTGTT

12481	TATTTATTTG AGAGGGTCTC ACTGTGTAGC TCTGACAGTA TTAGAATTTA CTATGTAGAC

12541	CAGACTTGAT AAATGTCTAA CCCTAGAAAA AAATAGTTTT GTTTTGATTT TATGTCTGTG

12601	CCATCCACTC CTTGAACATA TATTTGGTAT CTGTGAAGCC AGTGAAGGCT GTTGGTTCCC

12661	TTAGGACTGG AGTTACAGAT GGCTCTGAGC TACCATGTGC ATGCTGGGAA ACAAACTCAG

12721	GTCCTTTGGA AGAGCAAAAA ATGTCCTTTG ATGGTGGTGG TTTGAATGAG AATTGCCCTA

12781	TCGAGCATAA AAACTTGGCA GCTTTGGCTA CATGGTTCTG GATTAAGAGT CAAGAAGGAT

12841	ACAAGAAAGC GGTTGTGGAA TCATCCCCCA TGGTTAAGGA AAACCACCAA AGCCAGGCTT

12901	GTGGCAGGGG AGTTCCTGCA TGGAGGCCAA GAGAAGCCAC TATGTCAAGC TGTGAAGGTG

12961	AAGCCTGGAT TGTGTTGGAG ACCCAAGCTA CTGGAGATGT AAGAGATGTG AGATAATGCC

13021	CAGGAGAGCT GCAGACAGGG CATGGAATCA GGCCAAGCGA GAGAAGTGTG TTGCAGTCAG

13081	CAGAACTGGG AGGGAAGAGT CATCTAAGTC CTTTGTCATC AGACATAGAG ATACAGGATC

13141	TGAAATTTGC TCTGCTGGGT TTTGGTCTTG ATTTGGCCCA GTACTTCCTA ACTATGTCCC

13201	CTTTTCTCCC TTTTAGAATA CTAATTTATA TTCTGTGCCA TTGCCGGTGG ATCAGGATGG

13261	TTCTCAGATA CTGTTTTAGT TCCATGCCTG TCTACTTCCC GTCATGACAG TCATGCACTA

13321	ACACTCTAAA ACTGTAAGCA AGCTCCCAAT GAAATGTTTT CATTTATAGA GGTGCCTTGA

13381	TCATGCTGTC TCTTCACAGC AATACAACAG TGATTAAGTC AGCTGCTGAG CAATCTCTCT

13441	GGCCCCAGAA GTATGCATGT GTGCAATTGT GTGTGTGTGT GTGTGTGTGT GTGTGTGTGT

13501	GTGTGTGNNN NNNNNNNNNN NNNNNNNNNN NAGGAAATGT CATTCTGTAA ATATGTTTAT

13561	CTTATTGGTT GATGAATAAA ACACTGTTGG CCAATAGGGC AACAAAATAG GTGGGGCCAG

13621	GATATAAGGA GGATTTTGGG AAGTGTAGGC AGAGGGGAAT TGTCATATGA TCCCAGGAAG

13681	AGACATAGAT GGGCAGAAAC TGCCTCTAGC TAACCATAGA GGTCTGGAGG TCTGTACAGA

13741	CAGGCAGGAA GTGATGTAGC TGGAAGAATC AGAATATAAG CAGGAACAAA CAGGAAATCG

13801	AGCTCTTCTT CTCTCTCCAC TTCAGAGATG CTGAACAGTT GAGATGCAGG ATGCCAGAAG

13861	AGTAAGAGGT CCCTGGACCT TTCTCCAGTA AGATAAGACC ATGTGGAAAT AGATTGATAG

13921	AAATGGGTTA GAGATTAAGT CAGAGCTAGC CAATAAGAAG CCGTAGATAT TGGCCAACCG

13981	TTTCATAATT AATATAGCAT CTGTGTATTT ATTTGGGGGA CCTGGTAGAC CAGAAAACTC

14041	GTGTTAGAGA CATCTTATCA AAGTTGAAAA AAGAAAAAAT GTGATAAAGT TAGGAAAAAA

14101	TATAGTAAAT GTTAAAAGCT AAATTCTAAA ACTACAACTT ATTTATCATT TCCTAAATGT

14161	TTAAAAATAT TATTTTATAA TGAAGATACT TAAAATTCAT TTCTCTGTCT TTTGAGACAG

14221	GGTCTCAGTG TCCTGGAACT CATTATATAC AGCAGGCTGG CTTGGAACTC ACAGAGATCC

14281	ACCTGCCTCT GTCTCCTAAA TGCTGGGATT AAAGGTGTGT GCCACCAAGC CTCAATTAAA

14341	ATGCGTTTCT TTTTCTTTCT TTCTTCCTGT CTTTCATTTT TTTGTTTGTT TAGATTTTTT

14401	TTTTTAGACA GGGTTTCTCT GTTAGCATTA GTTGTACTGG AACTCACTCT GTAGACCAGG

14461	CTGGCCATGA ACTGAGAGAT CTGCCTGCCT CTGCCTTCTG AGTGCTAGGA TTAAAGGCAT

14521	GCACCACCAC TGCCAGGCTT AAAATGTATT TCTTTTTTTA ATTTAGAAAT TTATTCTGTT

14581	TAATCCACAC GCTTTATATA GCTTTAGTTA AGAAATAAAA TAAAATGAAA CAGTGAAACC

14641	AAGAGACTAT GTCCAAGTCC AGGTCCTCCC AGCCTGCCAA TGCCAAGAGC TCTTTAGTTC

14701	TGTGTACCAA TTGGAAGAGT AAGAAAAAAA TATGGATGGG AACCACACAG TTTCATAAAA

14761	CAGATTTATG GAACTGAAGG GTCCTTGCTG AGTCTAGCAA ATTGCCTTTA CAAAAGAGAA

14821	AGAAAAAAGG GGGAGGTAGA AAAACAAAAC AAATCAACCC AAAGAGGACA AAATCCCAGA

14881	GTTCTAAATT GACTTAGGAA CCTGTCACAC TGGGACAGAA GCTTCAGCAT CCATGAGCTG

14941	TGCCTCCCCT GCTCTCTAGA GCTGGGATCT CGAGGTGTCA GCAGAGACCC CACAGGTAAC

15001	AGGAGCAAAA ACACTCACTC AGACCTTTGT GGTACTTCAA CAGTGGTCTC ACTTCTGGGC

15061	AAGCTTACAA ACCTATACAA AGTTGAAGGT GTACTTTACA TGAGTGCTAA ACTTCAAGAG

15121	GAAGGAAGAA AAAAAGGGAG GTGGAGGGGA CAGAGAGAGA GAGAAAAAAA CAAAACAAAA

15181	CAAAAACAAC CACCTCAGGA GAGGCAAGGG CATTTAAAGG AACCACAAGA ATGCCAACGA

15241	TATTAAAATG TATTTCTTAA TAGTAAATTT TATGGGAAAA GAGAGTCTCC TCTTCCTCCA

15301	AGTAGGCTAG GTAAGTACCT TGCCACTGAG CTCTATCTAT ACCCTTCAAA GTGGACAAAA

15361	TGACAAAGAT AGTTCATCTC CCCCAAAGGC CCTGTTGGGG TGCTGATTGT CACATCTGGT

15421	GAGATTTCTG TTTTTGTTTT TATTTCAAGA CAGGGCCTCT CTACATAGAT AGTCCTGGCT

15481	GCCCTGGAAC TCACTCTGTA GACCAGGCTG GCCTGGAACT CATAGACCCA CTTGCTTCTG

15541	TCTCCCAAGT GCTGGTGCTA AAGGTGTGCA CTGCCACTCT TTTTAAGTAA CTATGAGTTT

15601	CAAAACAAAT TAAAGAGCAC TGTTAAAGTG GCTTGTTGTG TAAGCCTAGC TTCAAGTCAA

15661	AGGCCCGAGG CTCCCCTACC AACCAGCTGC TATCACCTAG ACACTGTCTG TAGATCTTGC

15721	ACTGACTCAA AACTGTGGCC TAAGGTCAAA ATAATGGTCT TCCTGGATTC TGATGTGAGT

15781	GAGATTGTGT AGGAGGGCTG GCCGCTGGCC TGGCTTGAGT CACTCTCAGC TGGTTTCATC

15841	CCATTCCTGC AACTCTGTGT AAGAGGTGGA TGATCCTTGC TTAACTGATG AAGAAACCAA

15901	AGCTGTAGAA AGGATCATTT GCTTAACTCT TCACAGATGG CAAGAGGCAG AGTCAGGATT

15961	GGCAGAGTCA CTTCTGCCAA CTTCACCCTC CTGCTAACTC CACCCTCCTG CTAACTCCAC

16021	CCTCTTGCTT ATACTTGACA GTGGAGGAAA AGCCACTGAG GGAATTAAAA GTTGTTACTG

16081	GTAATGGTCA GGAAAAAAGC TGAACAAAGG AGATTAGATT CAGGGATCTT TTTCTGAAAA

16141	GAAAGAAAGA AAGGGGGACT ATAGTCTAGA AATGCTGAGA TAAAAGGGTG GATTATCATA

16201	TCTACTCTCA AACTAAAGAA GCAACTACTA GTCTCAAATA CTTTATATTG GTATGGATTT

16261	TTGTGTATTG GTACAAATTT AAGGTTATTT TTGTTATACT GTATATATGT TTTTCTTTCT

16321	TGTTTAAGGT ATTGTACCTG TATAGCTTAT TTAAAAATGC AATGTAAACA TATAGTCCTT

16381	GAAAACTATT TAAGATAATA AAGAAATACA GGTTAATAGT CATCTATAGC AATCAAACTT

16441	ATAGTCATGT TAGGTATGTT TTCAAGGGCA TACAGAAATA AATTTGAGAT AGATAGGTCA

16501	TCTTCAAACA CTCCAGAGAT CTACAGAAAA TGGCATTTAT AAAATGTTTT AATGACATAA

16561	GATTTTTCAT GATAGTGAGA AATGTCTACT CTTGGCAGCA CCAATTTACT TCAAAAATGG

16621	ACAATGGGCA TTGAAGAAAC TCCATGTGGA TTTTGCTTTC TTTGTGGCAA AAATCTAGCT

16681	ATCTGGGCAA GAAACTTCCC TTACCTTGAC TGCTGTCCTA ACTGGACAAG CAGGACATAA

16741	AAGAAATTGA CTGCTGAACT TTGCCAAGAT AGTATACATT AGTCTTTCAA AAATCCCTGC

16801	TTTACAAAAA AGTCTATCAG ATATTCTAAG CTTCTAGGCC AAAGATGGAT GCTTCAATGT

16861	TAACAGAGGA ATCTTCTGTG ACTGATGTTT CTGTCATTTC TATAGTTTTG AAAATTGCTT

16921	GCTCTGTTCT TCCCTGTTTG CTCAGGTAGT ATTATTTCCT TCTTGAGTGT CTAATGGAGT

16981	TAAAGACTAG ATAGTTATAG CTACAGTTTT CCTTGTAACC AAATTCAGAA AAGAAACTCC

17041	CAAAAGAGGT GTAAAAGTAT GAGGCTGAGA AATATAAAAA CTTAAATTTA TCTAAGAAAA

17101	TGTTTTGTTA TCTAAAAAAA AATAATTTTG GGTTAGTAAT ACAAGTTAGG ATAGAAAATG

17161	AATTAGGTAC AAAACTTTGG ACTCATCAAG AAAAAATAGA TAATGGAGTA TTTTCTCTGA

17221	ATTTGCCAAA TACAAATAGA CTGGGTATTG TAAATGTAAT TCTTACTTGA TAATTGTTCT

17281	TATTGTTTAT AGTTTATTAT GTTAGAGTCA AAACCTTTCT TTTTTATTTA GACAAAAAGG

17341	GGGAATGTAG AATATTTCTT TACACTGTGT GAAGATGTAT CACTGTGATT GGTTTAATAA

17401	AGAGCTGAAT AGCCAATAGT TAGGCAGGAA GAGGTTAGGT GAGACTTCTG GGAACAGAAG

17461	TCTCAGGGAA GGAAACAGGC TAGGTCACCA GCTAAATGAA GAGGAAATAG GACACTCAGG

17521	AGGAGAGGTA ACAGCCACAA GCCAAGTGGT GGAATATAGA TGAATGGAAA TGGGTTAATT

17581	TAAGTCATAG GAGCTAGTTA GAAACAAGCC TGAGCTAAAG CTGAGCTGTC ATAACTAAAA

17641	GTGGAGCTTT CATAATTAGT AAGTCTCTGT GTCATGATTT GGGGGCTGAC GGCCCAAAAA

17701	AGCCTGCTAC CCAAGTTCTT TTCAATTTTC AAGTTCTAGG ATTCTGGCCT TTTATTGGAA

17761	AACACTGTCA AGTTTCTATA GAGGTCTGAC TCCACAGTGT TGCCTGTGCA ATGAAATTTA

17821	TTTAATTTAT TCCGAGGCCT TGTGCACTCT GGATAATCAC TGTACCACTT AATCTATATT

17881	CCCATCCTTC ATTATAATTT AAAATGGTCT TATTAATCTG GTCACTTGGC TTTTTTTTTT

17941	TTTTTTTTCT GAGACAGGAT TTCTCTGTGT AGCCTTGGCC ATCCTAGAAC TTGCTCTGTA

18001	GACCAGCCTG GCCTGGAACT CACAGAGATC CACCTGCCTC CCCTCCAGAG TTCTGGGATT

18061	AAAGGCGTGT GCCACCACCT CCCAGTGAGT TTATGTCTTT GCAAATTATA CATGGTTTCA

18121	GTTTTTTTTT CTGTTTGTAA GTCACTTTAT TTCAAATGTA AAGTTTAAAA CAAGAAGCAA

18181	ATTACTATGA ATTTTTGTTA ACAGTCATTT TCCTTAACTA ATAAGTTTTA AATTTTCATT

18241	AATATGTTTT GATCATATTT TTTCCATGCC CCAACACCTC CAAAATCTCC CCACTCATTC

18301	AGTTCTTTCT CTATCTCAAA AAATGAAAAA TCCAAGCAAA CAACCATTAG ACAAAAAATA

18361	ACAAAACAAA ACAAAGCAAA GCAAAATAAA AGCACACGGG CTGGAGAGAT GGCTCAGAGG

18421	TTAAGAGCAC CGACTGCTCT TCCAGAGGTC CTGAGTTCAA TTCCCAGCAA CCACATGGTG

18481	GCTCACAACC ATCTGTAATG AGATCTGGTG CCCTCTTCTG GTGTACAGAT ATACATGGAA

18541	GCAGAATGTT GTATACATAA TAAATAAATA AAATCTAAAA AAAAAAAAGA AAAAAGCACA

18601	CAAAAAACCC AGAGAGTGTG TATTGAGTTG GTTAACCCCT ACTCCTCTGG AGTGTGATTG

18661	ATACAGCCAG TGCCGCTATT GGAGAACACT GATTGTCCCT GTCCTTACAG GTATCAATTG

18721	TGTGTAGCTC CTTGGTTAGG AATGGGGCTT TGTGTGCACT TCCCCTTTCA GCTTTGTAAA

18781	GGGTGTCCGA TTGAAGTTCG TATCTTCTGG GAGAGCATAA AATCAAAAAA AGATAAATGG

18841	ACTCCAGTGA AAAAGGAGCA AGCGGCACCT ATCTTTAAGG TAGAGAGGCA GAGGAGTGTG

18901	GTGTGGCCTG TCACAAACAC CCAATTCCCA ATCAGCTGGC GTCTACCAGG CTGCTTTCAC

18961	TTAGATGAAC CCTGACCTCC ATGTCTCCTT AACATTGCCA TTGTTTAACT GTTAGTGAGT

19021	CTGCCCTCTG TTCACTGAAA GACTTTCAGA AGGTGGTGTC GCCTGCCTTT AATCCTAGCA

19081	CTCGGGAGTC AGAAGCAGGT AGATAGAGCT CTGTGAGTTT GAGGCCAGGC TGGTCTGCAG

19141	AGTTCCAGGA CAGGCTACAG AGTGAAACCC AGTCTCACAA ACACCGCCTC CACCACAAAA

19201	AAAAAAGGAA ACAAGATAGA GTGAACAAAC CCAGCTACCT AGACATCTAT CTGGTAAACT

19261	GACTCATCCC AATCCTCCCT GCCCTCCCAA AGAGCTTGGC TGGCTCACTT CCCCAAATGC

19321	TCTTCCCCTT TAACATTTAA CTAGTTCTTG TCTCTTGTAT GGTTTCCTTT TAACTGTATC

19381	CACCACCCCT ACCTTGACTT TTGTCCTGGT TGGTTTTTAA TTGTAAACTT GACACACAAA

19441	GTCACCTGGG AAAAGGGAAC CTTAATTGAA GAATTGTCTT AGATTGGCCT GTGGGTGTAT

19501	TTATAGGGCA TTGTCTTGAT TGCCAATTGA TTCGGGGTGG GGAGTGGGAG GGTAGGGTGG

19561	GGGTGGGAGC AGCCCACTAT GGGACTCACT TTCCCTAGGC AGATGGCTAT ATTAGAAAGG

19621	TAGCTGAGCC TAAGCCAGCG GGTGAGCCGA GCCAGCAAGT AGCATTCTTC TATGGTTTCT

19681	TTCTTTCTTT TTCTTTTTCT TTTTCTTTTT CTTTTTCTTT TTCTTTTTCT CTTTCTTTTC

19741	TTTTCTTTTT TTTTTTTTCT TCCCGAGACA GGGTTTCTTT GTGTAGCTTT GGAGCCTATC

19801	CTGGCACTCG CTCTGGAGAC CAGGCTGGCC TCAAACTCAC AGAGATCCTC CTGCCTCTGC

19861	CTCCCGAGTG CTGGGATTAA AGGCATGCGT CACCAACGCC CAGCTCTTCT GTGGTTTCTG

19921	CTTCAGATTT CTGCTTTGAG TTCCTGTCTG ACTTCCCTCA ATAATTGTTT GTAACCTAGG

19981	AGTGTAAGAC AAATGAACCC TTTCATCCCC AAGTAGCTAT GGATTTAGAG TGGTTTATCA

20041	CAGCCACAGA GTGAAACCAG AACAACTTTC TAGTAGCCTC TTGTTCTACT CCAGCTGCTC

20101	CTCTGACTAT TCCTAAAAGG TAGTTGGGCT CAGGGAACCA CATCCCGAGA GATTCAGCCC

20161	ATATGAAAAT AGCTCCATTG TGTTGAAGAA ATGTGACCCT CCAGGATTTC AGGCATCAGG

20221	ATTCCATGTT GAAAATGAAA ACAATTATTT TCCTCTCTCT CAAGATTCCT TTAGTCACCT

20281	TCCCTTACCC CAGTTCCTGG CTTTCCTTCT AAACAAATGT TCAGGGAGGT TCAAACAAAC

20341	AGCTGTGAAG AGCAGCATCC CATACCCCCA CCTTCCGACC CAACACTTGC CAGTGCTATA

20401	AGTAGACTGG GATCATCCCT GGACACTGTG TTAAATTACC CATGACCAAC CTTCTAGCAA

20461	GCTCTCCTTT TCAGGATTTT GTTGTTTGTT TGGGTTTGTT TGTTTGTGAC TTGATCTCAT

20521	GTAAGCTGAC CTGGAATTTG CTTAATAGCC AAGGATAGAC TTACAACCTG TGATGCTCCA

20581	GCCTCTGACT CCTGAGTACC AGGGATTACA CATGTGTGGC ATCACAATGA AAGATTTTAG

20641	TTTGCTGAGA GAAAAAGTTT TTAAAGATTT TAGTTCACAG AGAGAATAAG TTTCCCACAG

20701	GCCTTGGTCC AGGACAAGGA AGTTGGTCCC AACCCGAGGG CAGACAAACA ATCCTTTTTG

20761	GGTCACACCT GGCTGGCCAA CAGACAATAA AGGACTTCTC AGGGTACATT CTATGGTTGA

20821	CCACTCTAAC ATGAGATCAT ACTTTGTAAT CAATCACTTT GTGCCCCTTG CCTGTATGCT

20881	GATCTGCGGT TTTTTACAGG CTCCTATATA AGGAGTCTGT AACCCTTGCT GGGGTGTGCA

20941	GCTTCCCCGA TATTGCTGAC ACCCGAATGA GCATTCGTTC AATAAACCCT CTTGCTTTTG

21001	CAGCTCTTGG TCTGGTTTCT GAGTCTTGGG GCCTCCTTGG GATCCTGAGA CCCTTAAGGG

21061	TCTGGGGGTC TTTCAACACT TAACTTTCCT GTTTTTAAGT AGGAAGATCT GAAATCCCAG

21121	ATTCCTGACT CCATTGCACA TTTTCTGTAT TAGAGGCTGT AGCTCTGTAT AGTGGGTTGT

21181	GTGGCTTACA CATGCTCTGA GCTGGAGATT CTAGGGACAC TTAGGGTAAA GTGGAGTGTC

21241	AGCCCCTTTC CCTGCTAGAC TGAGGCCTTT CTGTTCTTTC CTAACTGGGA GGCTGTATAG

21301	CACCCAATGT GTTCATTAAA CTCCATATGT TAGCACTGCA TGGAATCTGA CACACACACA

21361	CACACACACA CACCCTCTAC CACCACCATC ATCAGCACCA CCCCCATCAG CACCACCCTC

21421	ATCCCCCCAC CCCCCACCCT GCCCCNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNC

21481	AACTGGAGGG TAGCATTAGC ACCCAGATGC CATTAATGTG CCAAATATTT GCTTGCTTGC

21541	TTGCTTGTTT GTTCCAGCAT CCTTAGTGAA TGCTCCTGCC CTCCTGGTTA AAGATGGCTT

21601	TGGCATCTCT TGGCATCTTT CTTGTATTCT AGGCCTGAAA TAGGGATGAA TGGTGAAGGG

21661	CAAGGAGCTC AAGTGTCACT TACCACCTGC ACTTGTCCCT TTAAGGGGTT TCCCTAGAAG

21721	CAGTCTACAT TTCATTAGCC AGAGCTTTGT CACCTGGCTA CTTGTGAAGG AGGTGGTGAA

21781	GAAGCCTTAC CTTTGACTCT GCCACTTGGA GCCAAGTCAG GATTCTCTCC CTGGAAAGGA

21841	AATGGAAGAT TAATACCTTG TTGGTTGTTA GACCTAGCCC ATTATGCGCC ATGAGGAAAG

21901	AGAGACAACA GTGGGTCACT GATTGATCAG GGTTACAGGA CAAGGAGCCT TGTTTCTCCT

21961	AACAGCTCTG AGCGGAGACA GAAGTGGAGT ATATAGGCAT AAAATTCACA AACATTTGCT

22021	GCCACGTTAC AGGTACATTT TTTCACCAGT CAGAAATCAA AGATTAGGGA CTTTGCTTGT

22081	GTGTTCCATC ACTGTCAACT GACATACACG GCAAGCCTTT TAGTCCAACC AATCAGAATC

22141	ATTTGTTCCT TCTGTTGTTA GGAGCAGCCA TAATGATTCT AAAGAACTAA CAATGCATAA

22201	TGACTATTTT TGTAGTTTAG GGATGAGGTA TGTCAGCCAT TGGACAGTTC TCAGCTCCCC

22261	TAGGGCTTGG GAACTTGAAC TTTATTTCAT CCTGCATGTA ATGGAGTCTG AAGTCAAAAT

22321	GGCAGTACTT AGGTCAAGGT GCTCGTGCCT GCTGCCTTCA AGGTGGTTTC CCATTCCCAC

22381	CATACCAGAG ACTTCCTACT GCATCTCCAG TCAAGGACAC AAACACTTTT AAGTCCTGAC

22441	TGTTGATTCA ATCTATATAG TTACCAGCAT AGAGGCTAAG AGTCACACTG GCTTGCAGGG

22501	GACTTCTCTA GCATATGTGA AGCCCCGTTT GAATCCTAAA CACAAGAGTC TAAGCTTTGG

22561	AGTCAGAGAC AAGCATGTTC AAATCTGTAC GTCACCACCC TATAGACATA GACAAGTCCC

22621	TTGGGCTCAG TTTTTTCACT ACAGAGAGTA ATTGTTATTT CAGATTCCTA GGGTTGTGGT

22681	AATTAAATAG TTGAAAGATA TAGCCCATGG AACATAAAAA AAACTCAAAA CCAGGCACAG

22741	TGGCACATGT CTTTAATTTC AGCACTCAAG AGACAGAGGC AAGTGGATCT CTGTGAGTTT

22801	GAGGCCAGGC TGGTCTATAT AGAGAGTTCC AGGTCTACAC AGAGAAACAG GCTCAAAACC

22861	AAAGCAAAAG CAAAACCTCA ACTAATGTTC ATAAAATTAT GAAATTGCTG GTACCAGTGA

22921	CATGACTCAT TGGTAAAGAC ACTTGCTAGC AAGTTTAATG ATCTGAGTTT TATCTCCGGG

22981	ATCTACAATG TAGAAGAAGA AAAACAACTC TCAAGAGTTG TCCTCTGATT TCCACTTATG

23041	CAAAATAGCA TGGGAACACA CTTAAGCAGG TAGGTAGGTA GGTAGATAGA TAGATAGATA

23101	GATAGATAGA TAGATAGATA ATAGACATAA TTAAGAACGT TCAGTTGCAG CACAGTTCAT

23161	ACTGAACTGC ATTTGGACAC CTCTGTGAAA AGTCAGGAGC TCTCCTGTCC TCCTGGTGAC

23221	ATTTAAACAT TGAAGGCAAC TATTTTAACT GTCAGTTATA TACAAATCCA CTGGCCTTGT

23281	AAAATTTTAA AACATAACAG AGGAGGCTAA AGTCCTGTTT AACAACCCTC TCCTTTTACC

23341	ATCCCAGGAA GCCAAAATTG TTCACAATTT GTTCTCTTCC CTCAGGCCTT CCATATTTCA

23401	AATACCACAT AAAACACCTA TGGAAAAACA TGAGGTATTA AAAATGTCAC TTGGAAATCC

23461	TTCTTCAAAC AAGCTTGTTC TTTCTTTTTT CTTTTATGTA CAGTGAATGG AATCCAGGAC

23521	CTTTGCAGAT GCTAGGCGAG TCCTTTACCT CATTCCTCTT TCGATTTAAA ACTTTTTCTT

23581	GTTTTGTGGA GACAGGGTTT CTCTGTGTAG CCATAGATGT CCTAGAACTA GCTCTGTAGA

23641	CTAGGCTGGT CTCAAATTCA GAAGCCAGTC TGCCTCTGCC TCGGGAGCGC TAGGATTAAA

23701	GGTGTGGGCA GAGTGCTAGG ATGAAAGGTA TGCACACCAC CACTCCTGGT TGATTTTAAA

23761	AAGATGCTTT TTAAAAAAAA TGATGTGTAG GTAGTGGGGG GAGAGACGGT TTCATGCCTA

23821	AGAGCACTGA CAGCTCTTCT AGAGGACTCA GGTTCAATTC CCAGCACCCA CATGGCAGCT

23881	CATAACCATC TGTAACCCCG GTCCCAGGGA ATCCAACACC CTCTTCTGGT CTCTGTGAAT

23941	GACAGATATG CATGGGATAT ACAAACATAT ACGCAGACAA AACACTGTAT ACATTAAATA

24001	AGTACAAATT TAAAATATGT GTAGGCATGT ATGTCTGCAT GTGGGTATGT GTACACTGAA

24061	TGCAAGTTCA CTTGGAGGCC AGAGATATAT AGATCCCCTG GAGTTGCAGT TACAGATACT

24121	TGCGAGCTGC TGTGAGTGTG CTGGGAACCA AATCCTCTGG AACAGCAGCA AGTGCTCTCA

24181	CCTGCTGAGC CATTTCTTCA CCCGCTTCTT TCTACTTTTT ATTTTGAGAC AAGGTCTTAC

24241	TAAGTTATAT ATTCACTTGG GGCTTGAATT CATTTTGTCA GCAGGCAGAC CATAAACTTG

24301	CCTTCCTCTT GCCTCGGGCT CCTGAGTAGC TGAGACTTCA CCATGAGGTC TGGCTTTGAT

24361	TACATTTTTC TTTGTTTTCT TTTTGGGGGT GGGGCTGATC ATGAACTCTA AATAGCCAAG

24421	GATTGATAGT GAAGTCCAGA TTCCCCCACC TATCACCGGG TGGAATTACA GGTGTGCACT

24481	ACCACACCCA ATTTGGTTTG ATTTTTTTTT TTTTTTTCAG GACAAGCTCT CCTTTTATAG

24541	CTCTGACTGG GTTGGAATTT ACTATGTAGA CTAGGCTAGT GTCAAAATCA CAGAGATCTT

24601	CCTGTCCCTG CTTCCTGAGT ACTGGGATTA AAGGCATGTA CCACCACACC TTCGGGTGTG

24661	GTGATGCACA GCTTTAATCC CAGCACTCAG GCAGGCGAAT CTCTCTGAGT TTGAGGCTAG

24721	CCTAGTCTTC AGAGTGAGTT CCAGAACAGC CAAGGCTACA CAGAGACACT TTGTTTCGAA

24781	AAACAAACAA AAACAAAAGA GGCTAGCCTG AAACTCCTGA TTCTACCAGC ACCTCCCAAG

24841	GGCTGGGATG ACAGGTTGTG GCCCCATGCT CTCTGCCGGG GCCTCTCTTT TCTTTCTTCT

24901	GTTTGAGGTA GAGGCTTACT AGGTTGGCTG GGTGAGTTGT GAACTCACTC TGCAGCCCAC

24961	ACAGGAACTG ATCTTGTGAT CCTCCTGCCT CAGTCTCCCT AGCAGCTAGG ATTGCAGGCC

25021	TGCACCATCA GGCCCATCGT ACACTGTTTT CTGAGTTTGA AAATTGCCTC TGTTGTTGAC

25081	TATAAGGCAT GCTCTCCTCC TAACATTGTC CTTGGTGCCT CTGCCACCCT TTGGGACTAG

25141	AGAGAACAGA TCTTATTCCT ATTTCACATG CTGTGCCAAC CCAGTAACAA ACTCAGATTC

25201	CTGCTTCCGC CCCCACCACC CCCATCTAAT TGTTCAGTGT TTCTGTGAAG ATAAACACGA

25261	TCATCTTTGT GAAAGCCACT TAAGTTCCTT TCAAGGTTGG GATATAAGTT AGAGTGATAG

25321	CTTGTTCCCA GGGTGGGGAG AGCATGTGAA TTCCCCTCTC GCTCAAGTAG GCTATACTAA

25381	TTTTCATTTA GATATTTCTG AGGCAAAGTC TCATGCTGGC CATCCACCTG CCTTAGCTTC

25441	TCAAGTGCTT GGATTACAGG CATGAGCTAC AATATCTGGC TTAGTTTCAA GGTTGTGAAA

25501	ATTATACTGT GTTCTGATGA CCTGAGTTCA ATTCCCTGGA CCTGGGTGAT GGACGGAGAG

25561	GACAGACCCC TGCAGATTGT CCTTTGACCT CCCTGTCACT ATGTGAACAC TCGTGTACAC

25621	ACACACACAC ACACACACAC ACACACACTA AATGAATGTA ATAAAATATA AAAAGGTGTT

25681	CACTAGTTAA TAAGACATGA GAGAAAAAGC TTACCATCCC TAATCAATGG GGAAGCATTG

25741	AATATAAGTG ACTGTGGTCA TGGAAAGCAG TATAGAGGTT CCTCAATAAA CTGGAATATA

25801	GCAGCATATA CTTGTAAGCC TCCCACAACA GGAGAAAGGT AAAGAGGGGC GGCCACTCTG

25861	GAATATTATT AATATCCTGT TTCATAAACA AGTAAATAGA ACAAACCCCT CAACAACAAG

25921	AACCGGTGTG CTGGCACACA CCTGCAATCC CAGCATTTGG GACTTGGAGG CAGCACAATT

25981	GAAGTTCGTT CTTGGTCATC CTCAGCTATG TATGAAATCT GAAGCCTGCC TGGCCTACAG

26041	GAGACCCTGT CTCAAAAAAA TAAACTAAAT AGATTAAAAT GAAAATTAGA AGCAGGTAGT

26101	GTGGAAGTTG AATAAGAATA GCCGCCATGG GCTCATGTAT TTGAATGTTT AGTGGCACAA

26161	CTTGAGTGAG TTAGGAGGTG TGGCCTGTTG GAGTTGTGTG TCACTGGGAG TGAGCTTTGG

26221	GATTTTAGAA GCCCAAGCCA GGCCCAGGGA CTTGCTCTCT TCCTGCGATC TGAGGAACTG

26281	GATGTAGAAC GCTTAGCTAC TTCTTCAGCA CCATGTCTGC CTGCATGCTG CCATGTTCCC

26341	TGTCAAAATG ATAATGGACT GACCCTCTGA AACTTGGTCT CTTTTGGCTG AGGAGTTAGC

26401	AAGGTAAGAG GTGGCTGTGG CTTGCTCTTG TTTCTCTCTC TCTGATCTTT CATCATTTTC

26461	TCCCGTATCT GGCTGTGGGT TTTTATTATT AAGAGTAATT AGAACTCATG TTACAGTGGT

26521	ACATGCATGC CACAGACCCA GTGTGGATGC CAGAGGACAA CATGTGTAAA TTTTTTCTTT

26581	CCTTGTATGT GCGTCCAGGC TAGTTTCAGA CTTGTGGGCT TCTGCTTCAG CCTCCCAAAG

26641	GTGGGGACCA CAGGCTTATA TACCTACACT CACCTCTTTA TTCCCAGTGG ATGTGTGTGT

26701	GTGTGTGTGT GTGTGTGTGT GTGTGTGTGT GTGTTTGTGT GTTTTACACA GACCTGTACC

26761	ACATTCATTT GGTTACTTTT TTTTCCTGCA TTTTGTTTTT AGGTAGGGTC TCACTATGTA

26821	ACCCTGACTG TCCTGGAACA TGCTATTTAG ATTAGACTGA CCTGCTGGTC CCTACCTTCC

26881	GAGTGCTGGG ATTAAAGGTG TGTACTACCA TACCTGGTGA TTAGTTTGTC TTTTGAGACT

26941	GGGTCTCTTG TAGCCCAGGT TGGTCTTGAA CTCCTGGTTT TCCAGACTCT ACCTTCCAAA

27001	TATTGATATT GCAGGTGGTC ACTACCATGT GTGGAATTTA TTTTTGAGCA GTGTTCTGTG

27061	GGTGGATGAT AAGGTCATGT CTATGGTAAA ATTGTTTCTA ATAATGATGA ATAGCTTCAT

27121	GTGTGTATGC ATCTATCAGG TTTGTTCAAC CTGAAGTGTA GGCCTAATAT TTGGATTTAT

27181	TTAGCCAGTG ATAGCTATGA ATTGAGCCCA GAAAAAATCA TAAACTTGAC TAAAACATCT

27241	TAAGAATTTT GTAACTTCTT TTGTAACTCA ACTGTATTGT TTCTGAGCAT GAATGTTGTA

27301	AATGACAATG TCAGCTGCCA TGTCAAAAGG TTGAACATTA CTTGGCAGTG GTGGCACACA

27361	CCTTTAACTC CAACACTCAG GAGGCAGAGG CAGGCAGATC TCTGAGTTAG AGGCCAGCCT

27421	GGTCCACATA GGGAGTTCCA CACCAGCTAA GGTGACAGAG TGAGACCTTG TCTAATTTTT

27481	TTTTAAGGTT GGACATGTAT AATTCCAGAG AATAATTTTT CACTAATCGG AAAAGAGGCA

27541	GTTTCAACTT GGAGTTCACA AGATTTAATC TTTCTTTGAA GATTTATTTA TTTTTAGTTA

27601	TGTGTGTGTA TATATGTATG TATGTATGTA TGTATTGGTG TGTTAAACCC CTGGGGCTGG

27661	AATTACAGGT GGTTGTGAAC CTGATGTTGT AATAAGCTCC CAGACCGTAG CACAAATGAC

27721	TCTATGAAGA AAGTACCATT CAGGCTGTAA AATCCACATA GACAGCACCA CCTGGAAAAA

27781	CTAAAACAAA AATCCAATCC ATCAAACTCC ACAGATCTGG GAAAGTATCT AAATGCACTA

27841	ACCTTGATTT TTGGCTTCTG TAGTTCTGCT TCTGGCTAAC TATTCTTGTT AACTGAAGTA

27901	TGTGAACCCA CAACATGGTT TTTGTGCTTA AAAGTTCTCT GTTCTACAGA ATGAATTCCA

27961	GGACAGCCAG AGCTGCATGG AGAAAATCTG CCTCAAAACA AAACAAACAA ATAAAAACCT

28021	TGAGAAAGGC TCAGGGCTAT ACTGGTATCC CATACACTCA GTGTAGTCGC CAACTGTCAA

28081	AGACTTTTTG TTGACTTAAA CCCATTTCTA AGCAGTATTC TCTTATGGAT ACCCCTTACA

28141	AGTGGGTGCT GGGACTTGAA CTCAGGTCCT CTGGAAAAGC AGAGGATTTC TCACCTGCTG

28201	AGCACCTCTC CAGGCCCATA AGATCTATCT TAAGACAAGA CCTGAGCAGC CTTATGGAGA

28261	TGGCAGTCTG GGGAACCACT GGTGCGCCTT TTCTTCTGCT GGTCACAAAC TGCTGTGGGA

28321	ATTTCCATCT GAAGTTCCTG CCTCTTCTCA CATTCCATGA TATGAGAAAG CTATCAATGT

28381	TCTAAATCTG TTTGCTTTCT GCTTTGCAAG ACCTTTCTCT TTCCTAGGTC ACCCTCCAAG

28441	AGTTCTTGAC CTCAGCCCCG ACTGGTGTCT TGGGATGGGT GACTGGGTTC TGGGGGCTTC

28501	CCTGTGCCTT GGAATATGGT AAAAGAGCAT CTCAGGTATT CACTCAGTAG ATGCTAGTAG

28561	CACTCCCTCC CTCCATTTCT GTCTACAGAT GTTGCTAGCT GGCCCCTATG AGGTAGTCTT

28621	TGCCCCTTTG TTATTGCTGC AGACTCAGAA AAAAGAGGAA ATATAGAACT CCTCGTGGTC

28681	TTCTACTCAA TATCCAAGCA AGGGGGAACA ACTGAGCATC CATACACTGC TGTTTTGGCT

28741	TCTCAATTGC TTGCTTGTAC ATCACCAAGA AGCTTTCATT GGTCAGTGTA AACAAGATCT

28801	GGGAGTTGAT GGTAGAGCAG TTGGATGAGT GACTCTGTCT TTCACCTTTG TTGAGTCATT

28861	TGGTGTGTGC ACATTGTGGG TCCCTGCCTC GCTTCCCATT AAATGTCAAG GTGAACTTTA

28921	TGAGGTTGAA ACTTTTATAT GTAGTGCAAC TGTACTCCTT CCTCTCTATC TCTTCCTTCA

28981	TTTTTCTTCC TTCACCTTCT CTTCCTTTAA AAAAAGAAAA ACTTTAAAAA ATGTGAATCT

29041	GATGTATCCC AGGATGGCCT CAAACTGTTT GCTTTCTCAG AAGATGACCT TGAACTTTCA

29101	ATCCTCCTGC CTCCACCTCC CAAATGCTGG GCTTACAGGA ATTCATCACC ATGCCTGGTT

29161	TTCCTCTCTC CTGGTGAGTG AATCCAGGGC TTCATGCTTG CCAGGCAAGT GTTCTGCTGA

29221	CTGAGTTACA TGCTTAGCCT GTATCCACAT CTTGACTGAG TAATTTCTGC ACCAAAACTT

29281	TAGGTTTCAT CTCAGTGACT CTGCCAATGT GTTTCCATTT TAGAGTGACG ACTGGCCTTA

29341	GAGGAGAGTG TAAGAGAAAT AGAGTCTCTT TCCTTGGTCT GCTTTTTAAA TTTTAATTTC

29401	TTTTTAGACA TCTTATATTT ATTCATGCAT GTGTGTGTAT AACTAGCAGA ACTCAGCTGT

29461	CTCTTTCTAC CACTCAGGTC ACCAGGCTTG GTGGCAGGGA CTCTTACCTG CCTTCGAGCA

29521	GGCTCTGCCC TCCTTTTGGA GAAACTGGTT TGCAGAAGGA AGAGACAGCA CAGCTCAGAA

29581	GACAGCCGTG CTTTCAGATG CCTGAGAATC CTGCCAAGGA CACTGCTGCA TTCTCCTATT

29641	CTTTTGTAAG GGTCCCATCT CTGCTGAGCT AAACTGGGCT TTCTCAGCCC TTCTCCTCTG

29701	ACAGTATTTT AAAACCCTAC CTAAAGGGGG ATGGAGAGAT GGCTCAGCAA TTAGGAGCAT

29761	ATCCTACTCT TCCGGAGACC CCTACTTCTG TTCCCAGCAC CAATGCTGGT CAATTTACAA

29821	CTGTAACTCT GCTCCAGGTC ATCGGATGCT GCTATCCTCC TCAGGCAACT TCACTCATGT

29881	GCACATACAC ATACTTAAAA ACAAAATAAG TCTTTAAAAA TCACCTAAGA AATATAAAGG

29941	CACATATCAT AATTCAGCCT GCTGTGACGT ATAGCTATAG TCCCAGAATT CTGAAGGCAG

30001	AGGCAAGAGG ATCACCTCAA GCTTGGGGCC AGCGTGGTCT ACAGTGAGAC CCTGGAGACT

30061	TTAATCTCAA AATATGTAAC AAAACAAATA TGTAAATAGA CATATATCAC AATTTATATT

30121	TAAGTAAAAT GGGGGGCATT GGAGAGATAG CTTTGTGGTT AAGAGCATGT ACTGTTCTTG

30181	TCAAGGACCC AAGTTTGATT CCCAGTGTCT ACACTGGTTG GTCTCCAACC CAATTCCAAG

30241	AGATCTGCTG CCTTCTTCTC CTCTCTACTG GAACTGCATT CATGTGCAAA TGTCCATATG

30301	CACACACATA CCCACATGCA TACACACAAA CACATACATA CTCATTTTGC CTGACATCGT

30361	GGTAAAGTGG GAAGACTTGT TGCCCTATTA CTTGGTCTTC ATTTGCCTAT GAGCACCATG

30421	TTGGCATGAA CTCATTCATT AATATCTTTC CTGTACAACT CCCCAATAAC CAAGATGACA

30481	CTTGGCACAC ATTAATTGCT AAGTATAATG AAAATTTAGT TTAAATTAGC TAAATAATTT

30541	AAAGTTCCCC CTCAAGCCTC ATGCCTGATT TAAAGTAGTA CTTATTAATG CTGGGCCTGG

30601	TGGCATACAT TTCTAATTCT AACACTTAGG AGGCTGAGGC AGGAGGATGG CCAATTCAAG

30661	GCCAGCTTAG CCAGCTTAGT AAGACCTTGT CTCCAAGCAA ATTACAGCAA AGTCTGAGAT

30721	ATAGTTCAGT AATTAGGGTG TTTGTCTACC ATGTGTGAAG ACCTGAGTTC AGTTTCTAAC

30781	AACAAAACAA AACTAAACAA ACCAGAACCT AGAGGTTATC ATTTATTTTT TTATTTTTAT

30841	TTTTTTTTGG AGTTTATGCC TTTGGATTAT CCATTCTATG TCCAGACATC AGTACTGCCA

30901	TGTTACAGTC AATAAAAGTC TTCCTTCATC ACCCTTAATC TTATCACCAC TAAAGTCTCT

30961	ACTTGACAGA CATGCCATAC ATAATTATAG CTGTTACCTT CTATCATAAA GTAGACATTT

31021	TATTTTATTT GTGTATTCAT TTTCATTTAT TTTGTTGTTG TTGTTGTTTT ATGAGACAGA

31081	GTTTCTCTGT GCAGCCCTGG TTATCCTGGA ACTCACTCTG CAGACCAGGC TGGCTTCAAA

31141	CACACAGAGA TCCACCTGCC TCTGCCTCCT GAGTGCTAAG ATTAAAGGAG TGTGCTGCCA

31201	TCTTCCCAGC AACATTCTAA ATTATTTTTT GTTTATGTTT TGAAATGGTC TAATGTAGCT

31261	GAGGTGGGCC TCAAGCTTGT TATATAGCTG GGGAACCTTG AACTTGTGTT CTTCCTACCT

31321	CTAGAACTCT GGAGTGCTGG AATTACAGGT ATGAACCATC ACATTCCAGT TTTAATCAAA

31381	TCCAGACTTC ATGGGTACTA GGAAAGCACT CTACAAATTA AACTTCACCC CTAGTTCATA

31441	TATATATATG TGTGTGTGTG TCCATGTATG TATGCCTACA TGATTTTATG TGTGCCACAT

31501	GTGTGCAGGT GCTCTTGGAG GTCAGAGGGT GTCAAATCCC CTGGCACCTG AGTTATAGGT

31561	GGTTGTGAGC CACCTGATGT GGATTCTGGG AACTGAACTT TGGTCCTCTG CAGGAGAAGT

31621	CACTGTTCCT CTGAGTGAAC GTTTCTACTT TTTAATATAC TTCCCATTCG AATTAGAAAG

31681	TAGAAGCTCT CGGAGGTTGA GACCTTACCT AAAGTCACCC AACTAGTAAG AAAACTAAAA

31741	TATCAACTTG GTTTTCTGAG TTTTAAATAT TTTTTCCCAA TGTGTAATTA CACAGGAGAA

31801	TTAATGGGGA CACTTCAAGG TAAAACAGAA GCTTTAGACA TAGCAAGGCA TGGTGGCACA

31861	CATCCCATTG AGAGGCAGGA GGATCAGGAG GCCAGCTTTG GCTGCATACT TAAGAGGCAT

31921	CCAGGGCTAC ATGAGGCGCT ACCTAAAAAA ATTAAATTAG GCAGGGCGTT GGTGGCGCAC

31981	GCCTTTAATC CCAGCACTCG GGAGGCAGAG GCAGGCGGAT CTCTGTGAGT TCAAGGCTAG

32041	CCTGGTCTTC AGAGCGAGTG CCAGGATAGG CTCCAAAGCT ACACAGAGAA ACCCTGTCTT

32101	GAAAAACCAA AAAAGCACTG GTCATTGTCA TTTTCTTTCC TAACAGGGCA CTGGAACCCT

32161	GATGTTGGTT GGCTCCTAGA TTTCTTCTCC ACAGCAGAGA GTTCTTGCCC TGTTAGAGCC

32221	AGAAGGATGC TCTGGAGAGT CAGTATATAG CAAAGCAGGG TCATCTGGAG TAGTAAAAAC

32281	CCTCTGGCAC AGTCAGACCT CATTTCCTCT TGTCCTGTGC TCGTGGCTCT AGCATTATGC

32341	AAGGAGAGGC GCAAACAGCA AACAATTTGG AAGGGCTAGC ACTTGAGCAA CTCTTTGTAG

32401	CTTCCTCTTC TCTACTCTTT TGCCCCTGGC TTCTACTGGA ACAGGTGACT TTCCATTGCA

32461	TTGCATTCTC CAAACTCAGA TGATTTTGAG AATGTGGCAC TACTAAAAGT CACATGGACA

32521	TACAAGGTAC AACTAGAACT ATCCCGGGAA ACAGTGATAC ACGATCTAGT TTGAGGCCTT

32581	GAGCCATAGC TTGTCAGAAG CTCAGAAATG ATTGAGTCTC TGGGAGCCCT CACCTCAGCA

32641	TCCCTGCTTG CAAAAGGCTT CTTGAAGTAG TAAAAACTGC TGGGACCTTG TCTAGGCTGG

32701	GTAACCTTGC ATAATTACTC AACCTTACTG AGCTCAGTCC CCTCCTCTAT AAAATAAGTG

32761	CAACAGTATT TACCTTAGTG GCCCACCTGA AAACATCACA GCTGCCATAG CTAGCTCTTG

32821	GCTTTTGTTC TATCTCCTCC TCCCCCTACT TTCTCTTCCC TCCCTCCCTC CCTCCCTCAT

32881	TTTTCTTTAT TCCTTTCTTT GTATTTTTTT CTTTTTTCTT CCTCACACCT CTCCTTATTC

32941	CCCACCCTCC TCTCTCTCTC TCCCTTCCCA CTTCTCTTTC TTTCATGGCA GGATATCATG

33001	TATCCTAGCT ATACTTGAAT TCACTATATA GCTGAAGAGG AGCTTCCAGC CCTTTTGCCT

33061	CTGCCTCCCA AGTGCTGAGA TTATAGGTGT CCACCTCCAC GTCTACTTAT GCTTTGCTAA

33121	GGATCAAACC AGGGCTTTGT ATGTGCATGC TAGGCAAGAG CCAACTACAT CGCCAGACCT

33181	ATATAATACC CCTTTCTCAG CGAAACTGGG GTTGCTGATG GCTGGTGTTG GGGGAAGGCA

33241	CTAAATATTT AGCAGAAGTA TAGGAAAACT CTAGAAGTCT AGAGATCCTC AAAGTAAGTT

33301	TGGAGAGCCT TGGCCTTTTC TTAGTTGAAA GTCATGGTGC CTACTCACTT TGACTGCTCA

33361	AGGAATATCC ATTCACCACC TGGAAATAAG AAAGGAGGGA GAACCAGCTA GGGATGTGAC

33421	TTAGTAGTAG AGCACTTGTC TAGCATGAGC GTGGTCCTGG GTTCAAGCTC CAGTACAAAG

33481	GCTGGGTGGG GGGGTGGAGA AAGGCTTCTT TCCCATGGCG TTCTAGAGAT GGCGGGGAGA

33541	AACCACCAAT CCACATCTAT CTACAACAGT TCAAGTAGAA CTAATCTTGG TGGTATGGCT

33601	ATAGTAGTCC TAATCCCATC TCAGGGATGC TTCTCTTTGC AATTGATACA AAACACATTA

33661	CAGAAAACCA CAGTGAATCA AAATGCAGAG TTGTGGTGCC TAGTTCCAAT GGATGCATCT

33721	ACAGTACAAC TCCCATGCCT AAGGCTCAGG GATCATTGTG GAAGACAAAG ATCCTCCCAG

33781	GAGATCAGGG AGTTTGCTGT CTCCTAGGAA TTTCAGAAAA TACATCTGTA AAGGCTCACC

33841	AACGTGAATT CCTAAACATG AGCTGAACAA GGATGACAAT AGACATGCTA ACAAGGATGG

33901	GAAAAAGCCC TTGAAGCCTC AGACCTACAC AAAGAGCCGC AGTTGATTAA GGAATGCTGA

33961	TTGTGGGAGA AACCATCTTC CCAAATTGTT ATCTAATACC ACATAGTCAG CCCTGAAAAC

34021	ACACATGCAA ATAAGATTAT ACAAAACAAG GGGGTTGTAC ATATGTATTT AGGAATATAT

34081	ATATATATAT ATATATATAT ATATATATAT GTAACAATAA TTAATAGAAA AAGAGACCAT

34141	GAATTTGAAA AAGAACAAGG AGGGGTACAT GGAAGGGTTT AGGATGCTTT GACCCTTTAA

34201	TATAGTTTCT TGTGTTGTGG TGACCCCAAT CATAAAATTA TTTTTGTTGC TAGTTCACAA

34261	CTGTAATTTT GCTGCTGTTA TGAATTGTAA AGTAAATACC TATGGTTTTT GATGATCTTA

34321	GGCAATCCCT GTTAAACTGT CATTCAGTCC CCAAAGGGGT CAAGACCCAC AGGTTGAGAA

34381	CTGCTGATTT AGAGAGAGGA AAGGGAAGGG GGGGTGAAAT GCTGTAATTA TAATTCCAAA

34441	AAAAAATTTT TAAAAATTTC TTAAAGGAAC TGAAGAAAAG AGCTGAACAT TCTAAGCTTA

34501	AGGGGGGAAA GGTTCTGGAA TGTTACATTT TTCTGGTTTC CTTAGTCTCA GCAACAGGCT

34561	CCCAGCCTTC TGTTTGGACA GTGGTTTACA GGCATGTGAG CTCAGGGAAC ACTCTTCCAA

34621	GTGAATCAGA CTTCAGGAGA AGACATTCAG TTCAGGGCCC TGGGGAAAGT AAGGACAGAA

34681	CTCCATTCCT GAGAATTACC AGGTTTGCTC AGAAGATAAA ACTGGTGAGC CCAATGGCTG

34741	TGTGCACAAC CCTGACCTCA GTGTCTAGGA TAGCTGGACT CTAGCTGCTA GAAGATAGTC

34801	AGAGGGCCAT CCTTTCCCTG AGGCTAATCT GTGAATCAAG TAAACTACAG TCAGGAAGGG

34861	AGCTGGAGAT GGGGGCCCAG CAAACAGGTC CCCCTTAAAG CCCAGCACAT AGGTGGGGAA

34921	CCCAACCTCC CATTTTGTCT TCACCCCACC ACCAGGCCTT TACCAAGGCC CGAGGTTGCC

34981	ACTATTTTCA GCTTGCCAGG CTCTTTGCAG TTTTAGGGGG ATGAGGAGGA GATGCTCTGA

35041	GGTGCTGGGA GGCACATGGC GGGTGCTATT TATGGCTTGG GCTGAACTCC GATGTCCTAG

35101	AAAGAGTGTT TCTGACACTT TCTGCCTTCT GGGAATCAGG AGACTCATGA CAAACACTGC

35161	CTGGCAGTGT TTCTTTCTTG TTCACAGCAA GAAGTGTGCA GTCCATGGCA CGAAAGAGGC

35221	CTGAGCAGGG CAAGATGGAC ACGATGACAT CACTGAAGGA GCTTCCCAGG GGCTGTCTTG

35281	ACTGCTTCAT TAACTCATTC ATGCAGTTTA TTCAGCAGCT ATGCCTGTCA GACCCCATTC

35341	TGTCTGCACA AGACACATGG CAACAAAGGA GACTTACTAT TCCCATCTTC ATGGGTTTTA

35401	TGTTCTGGCA AGAGGAAGAT AGTAATAATT TTTAAAAAGT AACCAGTCTT GAGAGCATGA

35461	TAAATATGGT TGATAACAAT ATGCTATATT TTAAAAGTTG TGAGATAGTA TACTTTAAGT

35521	GTTCTCAAAA CAAAATGATG AATATGGGTG ATATAACATG TTAATTGGTT TAATTTAGCC

35581	ATGCCTTTGT GAACATACTG TATCGTGTAT CATAATTGTG CATGACTTTA TTTATGAGCT

35641	AAATAAATGA ATGGAAAAAA AAGTAACCAG TCTTGATGCT TACCTGCCAT CCTGGAAGGA

35701	AATGGAAATA GGATCTGCCG CCGCAGCATT GCCCTATGCT CTTATTTCTT CTCTTGAAGA

35761	GGTAGGGGTG TGTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTTACTAGA GACTGAGCTA

35821	CCGGCCTCAC ACATTCTAGG CAAATGCTCT ACTTTATATT AAACACTTTA TAAAACATTA

35881	AGCCTTTCAG GGTCAGCAAG GTAGCTCAGA GAGTCCGGGC ATTTGCTACC AAGCCTGACA

35941	ACCTGAGTTC GATTGATGAT CCCCCAGACT CACGTGATAG GAGGAAGCTG ACACCTGTGG

36001	GTTGTCCTCT GACTATAGGC ATGCACACAC ATCCCATGAA TAAATATTTA TACATTTTCA

36061	AATCATACTT ATTTTACAAT GATTTTTATT TGTTTGCCTG TCTTTCTGTC TGTGTAGAGA

36121	CAAGGTTTCA TGCAGCTCAG GTTGGCCTCA AACTCACTCT GTGGCAAGGA TGCCTTAACT

36181	TCAGGTCTTC CAGGTCCAGG TAACAAAATG TTCAGGAGGA ACCTGGTACC TCATCATAAC

36241	CGGTTCTAGA TGGTCTTCCC AGGGCTGCTG TAAGAAAGTG CTACACGACG AGTTATTTCA

36301	AACATTCTCA CAGTTCTGGG GATTAGAAGT TTGAAACTAA GGTGCTGAAG AGATTAGTTC

36361	CTTCTGGAAG CTCAGAAGAG CCATCTGGTC CATACTTTTC TCCAGGTTTC TCTTAGTTTT

36421	TGGCAATCCT TGGAATCCCT TGGTTTGTAG ATGCAGCTTC CAAAGCTCAA GATCTCTCTC

36481	CAATGCTGTG TGGCATTTCC CCGTGTTTAT GAGTGTCTAA ATGGCTTTTA AAAACATTTT

36541	TGAGATGTGA AATTCTGGCT GACCCAGAAT ATATAAACCA GGCTGACCTT TGTCTCCCAG

36601	AGATCTCCCT GCCTCTGCTT CCCAAACCTT TTGATTAAAG GTGTGTGTCA AGTGCCCAGA

36661	CCAAATGCCC TTCTTGTAAG GACAACGGTC ATATTGGATT TAGTGTCTAA GTGAGTCCCC

36721	TATGAACTCA TCTCGAACTC AGTTTGCATA GAACACTGTA CCATGCAAAA TAAATGACAC

36781	AGAGACTGAT ATTGGGGTTC ACACTTCAAG CTGAAGGTCA GAAAAGCAAA GCATTGGGCC

36841	ACTAGCTCTT ACCACTACCT CAGGCTGAAC GGGCTGATCC TGCTGCCTCT CCTCAGCATG

36901	GCTGGAGAAT ATCTTCATAT CCTCATTGTG GCTGGAAAAT GAATGCCTGA TATGGAGAAC

36961	TTGCTCCTGT TTTATATAAC TCCCTAATGC TGGGATTAAA GATGTGTGAT CCCAGGTGCT

37021	GAGATCATCT TTGTGTGAGC TGTTTCTCTT TAGGACTGGA TCAATTTTGT GTAGATCTGG

37081	ATGGCTTTGG GCTCACTGAG ATCTATCTAC CTCTTAATCC CTGGTCCTAG GATTAAAGGT

37141	ATGTACCACC ACATCCTAGC TTCTGGCTGC TGGGATTAAA GGTGTATGCC TGGCTTCGAT

37201	GGCTTGTGGC TGACTTTGCT TTCTGAATCC GCAGGCAAGC TTAAAAAAAT CATAAATAAT

37261	ATATCACCAT AGACCACACT TCCAAATAGG CTTCCATTTA GAGGCGCCAG TGGGTGATAA

37321	TGTAGGCGGT TTTACTCAGT TTTGTGCAGA TGGCTGGCGT CCTGTCTGGT GAGTTCAGAT

37381	TTTTTTTTTT TTTTTTTTTA AGTTCAGAAT CTTACCCAGC TCAGCTTTTC AGGCTGCATT

37441	CAGTGTCCGG CTTTTTTCTC ACCGTCTTGA CTTCCTGTCC TGCATCCCAT TTCTCAGCCT

37501	GGACCCTGCC AGTCTATCAG ATAGATAACA TAAACAAAAT TGTACTGGAT TAATGGGAGC

37561	TGTTTGGACA TTTCCTACTT TTGCCTTTTC ACCAATGATT TGCATACTTA AGCCTGCAAC

37621	TACAGCCCCG ATGCAGTAAG CTCAGTCTCT GGCAAGCAAA GGTCTCTCTG GGGTCTTGTT

37681	TAAGAACCAG CTCAGGCTGC TGGCTCTGTT GGCAGTGGAG GTATTTCCTA TAATGGGATG

37741	ATGGGATGGG TTATTCACAC ACATCTCAGT TACTGGGCTA CATGGATCCA AATCAGCCAC

37801	CCAAGGGTTT GCAGTCACAT GTGAGTCACT TAGCACAGAG AAAGAAGCCT GGAGGAGGAG

37861	GGGTCCTCCC AGCTTCAGGA GGGTTTTCCA GGATATAGGC TTCTAGTCTC GTTTTGGATC

37921	AATTTATCAG TTTTGGATTG GGTCTAATAA CTCTTTCCTG AGCCTGGACT GGGCTCAAAG

37981	GCATGAGTAT GTGAGGGGAA TTTACTAGAA TTCACCTGTA GTTTCTGTAT CATTCCTAGA

38041	GAAGGGGAAG TAGAGACACT GGTGATGGGA AATAAAAACA AAACAAAACC TAAATATTGG

38101	GAGCACAGAG GTCCTTGTTC CACAGCTCTT GATAGAAGTC AGGAATGTTA TGTATGTACA

38161	ATTGCCCTTG AAAAGGAAAG GATGTATGAC CTGTTTTTCT GTCCCGAAGG CTGGGAACTG

38221	GGGATGATTA ACAGCCTGTT GATCTGCATT ATCTGAAGGG CTAGGCCATA TCAAGCTCCC

38281	ACAGCTAGCA CTGAAGGAGA ATAGGGCCTT ACAAAGGGAA TTCCCTCTTT GGATCGAACC

38341	TAGGAACATC TTCTGTTTTA CCGCTCTCTC CTTGTTTCAT CTGCAAAGGG AGGAGCTTGG

38401	TAGTGATGTT GAGGCAGGCA CCACTTGTAT TTTTCTAAGC CACAGAGACT GTTTCCCTAC

38461	CTTACAAACA TCCCTGTGCA TCACTGCAGC TCTGTCTCTT ATGGCAGTGT CTCAGTTAGG

38521	GCTTCTATTG CTGCGACTAA ACACCATGAC CAAAAAAGCT CACACTTCCA TACTCCTGTT

38581	CATTATTGAA GAATGTCAGG ACTGGAGCGC AAACAGGGCA GGGTCCTGGA GGCAGGAGCT

38641	GATGCAGAGG TCATGGAGGA AGGCTGCTTA CTGGCTTGCT CTCCATGGCT TGCTCAGCCT

38701	GCTTTCTTAT AGAACCCAGG ACCACCTGCC CAGGGATGAC ACCACCTACA ATGGGCTGGG

38761	CGCTAATATG AGGGATCAAA GAGATGGAGT TGTGGGAGGG ACAGAGGGGG AGAGCAATGA

38821	AAGAGATAAT CTTGATAGAG GGAGCCGTTA TGGGGTTAGG GAGAAACCTG GTGCTAGAGA

38881	AATTCCCAGG AATCCACAAG GAAGACCCCA GCTAAGACTC CTAGCAATAA TGAAGAGGAT

38941	GTCTGAACGG GTCTTCCCCT TTAATCAGAT TAGTGACTAC CCTAATTGTC ATCACAGAAC

39001	CTACATCCAG TAACTGATGG AAGCAGATGC AGTGATCCAC AGCCAAGCAC TGGGCTGAGC

39061	TTCGGGAGTT CAGTTGAAGA GAGAAGGGAT CATGTGAGCA AGGGGGTGGG GGAAGTCAAG

39121	ATCATGATGG GGAAAACCAC AGAGACAGCT GACCCGAGCT AGTGGGAGCT CATGGACTAT

39181	GAAACGCCAG ACGTTGTAGA CTCCCTAAGG AAGGCCTTAC CCCCTCTGAA GAGTGGATGG

39241	GGGGTGGGAA GTGGGGACGC TGGGGGACAG GAGAAAGGGA GGGAGGGGGA ACTGGGTTGG

39301	TTTGTAAAAT GAAAAAATAG ATTTTTTTTA AATAAAAAAA GAAAGTGCTT TACATCTGGA

39361	TTTCATGGAG GCATTTTCTT AACTGAAGCT CCTTCCTCTC TGGCGACTCT AGTTTGTGTC

39421	AAGTTAACAC AGAACCAGCC AGTACAGGCA GCAGAAATAC CTTGCAGAAA TATCTTAGTT

39481	CAGGAGTCCA CGGTGGTCTC AGTCACTTCC TCATGTGCCA CCTGAGTTTA ACATTCCCCA

39541	AAACTTGGAA CACAGGCCAC CACATCATGG AGCCCTGGCT TAAAGCTCAA GTTTTATGGT

39601	ATTTTCTTTT ATCACTGTCT ATAATTCCTA AACATGCTAC AATGTTGTGA GCCCTCACCG

39661	TCTCCTAGGT CCATAGTGAC TTCCTGGCAT TAATAGACTG TGCCCCAAGA GCTCTATGGC

39721	CACGACCACC ACCTGCCATT CCCCTCCCCC TCCATGGTCC CAGCCTCACT TCTTCACTTC

39781	CTGGTCCTTC CGAGCCCAAT GTGCAAACCC ACAGAATCTG TCTGCTTATG TAAGTTTCCT

39841	GGTCACTGAG TGGGGTGACT CAGCACCAAG GTGGTGCCCT GCGATTTCCC AGCCCCAGGC

39901	AGCAGAACAA CTGAAATGGA AAACAAGTCC CGTTAATAGG GTCCAGCTGA GAGCCTCCCT

39961	TTCTCAGGGA GTCTGGCAAA TCTACTCCTC GGGGAACTGC CCTGGGCAGT GGAATTCTCC

40021	AGCTCCCTGC TCATTTCCTA GTTCCTCTTC CCTCTTCTCA CCTTTGGCTG AGGATCAGAA

40081	AGGTTCCCAC TGAGGTCTGC TTTGCCCTGG GCCTGCTCTT TTCAGAGTCC CATTTTTGGA

40141	ATGAATTTTT TTTGTCTCCT ACTTTCAAGT TCACATATTG AAGCCATTAT TGCCAAGGTG

40201	ATGGTATCAG AAGGAGGGAC CTTTGGGAGA TGAATGGATG GATTCCAAGA GGTTATGTGG

40261	GCAGAGCACC CATGATGGGG TTGGTGCCTT CATAGGAAGA AGACACAGTA GAAGGGAAAG

40321	AGATGCCGAC TGAAAAACAG GAAGTCTCCT GGAGTAGGCC ACTCAGCCTA TGACACGCCA

40381	GCACTCAGAT CTCGGACTTC CCATCTCCCA AATGGTGATA AACAAATGCT GTTGTCCAGG

40441	CTGCACAGTC TACGGCATTT TGTTGCAAGG GCCTGGACCA ACCAGGCTCA GGCAGGAAGT

40501	GAATCTAGTG TGGGAGGATG TACAGACTGC CACTCAGTCT GGACACAAAC TGTCCTCAGG

40561	GATCACCTGA GCCACATCTA CCTAAGAATG GCTATTCTTT CCATTTGTTA ACATCAAATG

40621	CCAAGCCCCT ACTGTATGTA GGCTCTTGCT AGCAGTGGAT ATGATGCTAT GTGAGATGGG

40681	AGCAATCCTC TCTGCACAGA ACTATACATA GAACTATGCA TAGAAGACCA ACAGGGAGAC

40741	ATCAGATAAC TATTAACTGT GATAGCTCTG TGGGAGACAA ACAGAATGAG GGAATGGACA

40801	ATGACTTTGA GGAAAAACTA TGATTGAAAA TACTCTATCT GGCTGGGCGG TGGTGGCGCA

40861	TGCCTTTAAT CCCAGCACTT GGGAGGCAGA GGCAGGTAGA TCTCTGTGAG TTCGAGACCA

40921	GCCTGGTCTA TAAGAGCTAG TTCCAGGACA GCCTCCAAAG CCACAGAGAA ACCCTGTCTC

40981	AAAAAAAACA AAACAAACAC ACAAAAAAGA AAATATTCTG TGAGGTAAAC AAGCATCTGG

41041	AAGGGTTGGG AGATAATGCA GGCAAAAATG CATTAGACAG CACACAGTAC AACACAGCAA

41101	TCAAACTTAA TATAAACACA GCAAATGTCA TCTTTGGGCT TTGCCCCATT TCCTGATCTG

41161	ACCATAACAG CCTAGTGTCT GGAAAGCACA CTAAAGCCAT TTACGTCACA CAGGAGTTCA

41221	ATGTTGAGTT CAGAGGGAGG GGGTGGAGGG CAGATTAGCG AGGTACAAGT TCTGGTCCCT

41281	TTGATGAAGT GTTGATGTAC CCATCGACAC CACACAAATA TACCATCATG CTCCATGTTA

41341	GGGTCAGTGA AGGATTGCAT ATGTGACGGT GGCCCACTGG GCTGAGAAAG CCCTATTGCT

41401	TAGTGACATC TGTGATAATG ACATGCGAGC CCTATTGCTT AGTGACATCA CTCTTCTCAT

41461	AGTGTGGGAT CCAATGTGTT TCTTGTACAC TTGTGATAAT GACATGCAAA CAAGTCTATT

41521	GTGCGGCCAG TCACACAAAA AATATATTAT GTGCAGTCAG GAACAGTCCA TAGTACTTGA

41581	TTGGGACAGC ACAAGTCTGT GTTGCTGGTT CACACATTAA TCATTACCAC TGTTTTAGTG

41641	TGCTCCTATA TATATATATT TAAAAATTAC TATAAAATGA TACACCGTGC TGAGCAATAG

41701	CACCTCTTAT ACCTTGTGTT TACTGGATGT ACTCAAGCTA TTTTCTCTTG TGCTTGATTT

41761	ATTTGTATTT GTATTTTTGA GAGAACCTCA TCTAGTCCAT GCTGGCTTCA AACTTGTTAT

41821	AAAGCTGAGG ATGGCTTCGA ACTCCTGATC CCCCAGCCTC TGCCTCCCAA ATGATGAGAT

41881	TACAGGCATA TGCTACCAAA CATGACTTTT ATTTATTTTT ATTACTTAGG TGGTATGGGT

41941	GGTTTGAATG AGACTGTCCC CTTTGGCTTA TATATTTGTA GGTGGACCTT TGGAAAGGTT

42001	TAACAGGTAT GACCATAGTG GAGGCAGTGT GTCAGTAGGG GAGGTCTTTG GGGAACCCAA

42061	TACTCAATCA ATTCCAAGTT AGGGCTGTCT GTCTGTCTGT CCCCTGATTG TGTCACAAGG

42121	CAGAAACTCT CAGCTACTGC TCTAGTTCTA TGCCTACCCA CCTGTTGCCA TGGTCCCTGC

42181	CATGATGGTC ATGTACTTCA ACCCTTTGGA TAGGTGGCCC CCAAATTAAA TGGTTTCTTT

42241	TATAAGTTGC CTTGGTCATG GTGTTTTGTC ATGGCGATAA GAAAGTGACT GAGACAGGTT

42301	TGTTGCTGTT GTTACAAGGT TTAGTCCAGG CATCTGGCAC CACCTCTGGC CTGTGCTTGA

42361	TTCAATCATG TTACCTTTAG AAATAGCAGG CTAAAGGACA TATACCTGTG TACGTATATG

42421	TGTACGTATA TATTAGCTGT ATAGTCTAAG TGTGCACCTG ACTCTAATAT CTAGGTTTGT

42481	GTAAGTAGAC TCCACCAAGC TCACTAAGCA ATGGTATCAC AGTTTTCAGA TAGTGTTCAG

42541	CGATGCTTGG CTGAGTGTTA GTTCTTTTTT TAATATTTTA TTTATTTATT ATGTATACAA

42601	CATTCTGCTT CCATGTATCT CTGCACACCA GAAGAGGACA CCAAATCTCA TAACGGATGG

42661	TTTTGAGCCA CCATGTGGTT GCTGGGAATT GAACTCAGGA CCTCTGGAAG AGCAGTCGGT

42721	GCTCTTAACC TCTGAGCCAT CTCTCCAGCC CCTGAGTGTT TTTAAATCAA GGAAAAAAGC

42781	CTGAGGGAAG GGAGCTCAGG CTGAAGGGGA GGAGTCAAGA CAGTCTGACC CCAAGGCATT

42841	GTGGGACGTA AAGAGTTCTG GGACAAGACT GAGGTCTCTT CCTTCTCAGA GACTGTGGGC

42901	TTCAGTTTCC TTGGTAGCCG GAAGCAAAGC TAATCCATGG CTTAAAATAT AATACTCAGT

42961	GTAACCTTGT GTTGTAGAAG TGACTTGCTT GTCTTCTTCC ATAATTCTAA AACATCTTTA

43021	AGAGCAGGAT CCAGGAAGGG AAAAGGAGAG ATTCTCATCT TCTTCAAAAG GCAGCTTTCC

43081	CTAAAGCATT TTCTGATGAA ATTTAAGTTC TAAAACCAGC AGTGGTATAA TCCCATCATG

43141	AATGGGGATC TCTGAGTTTA AGGCCAGCCT GGTCTACAGA GCAAGTTCCA GGACAGCCAC

43201	GGTTACACAA AGAAATCCTG TCTTAAAACA AAACAAAACC CAAAACAAAC ATAAACAAAA

43261	ACTATCCAAA ACCAACCAAC CCCCCCAACT CAGAAAGAAA GAAAGAAAGA AATCAAGAAA

43321	GAACTGCCCA CCGGGTGTTG GTGGTGCAAG CCTTTAATCC CAGCACTCGG GAGGCAGAGG

43381	CAGGCAGATC TCTGTGAGTT TGAGGCCAAC CTGTTCTCCA GAAAGAGTGC CAGGATAGGC

43441	TCCAAAGCTA CACAGAGAAA CCCTGTCTTG AAAAAAGAAA AGAAAGAACT ACCCATGACC

43501	AAACAGTTCC ATGGCCAGGT AGAGAATGAG GACGCTGAAA GTCACACCTT CTCAGAGTCT

43561	CAAACTGCAC ATCTGGCCTC AAAGTCCAGA AATGAGTGCA AGACCATTAA TGACAGTCTT

43621	TGGAAACAAA CCAGACCAAA GAACATTTGG CTCCTGATAC ATATTCTGAG GGTCACATAG

43681	AAAGAAAGAT CTGCCTTTGG CCACCTCCTT TTGAAGTGGG GAATTTTATT TTCTTCTGCA

43741	TGGAAACTTC ATGTAGGTAT TTGAGAATAC ATACAGACAT GCAGGTGCAC ATGCACGGAC

43801	ATGAACACAC ACATACACCC CGGGTAGGCA GGCAAGAAAG TGTGTGGAAT AACACTTGAA

43861	CTTCCCTTCC AGAACAGAAG CCCTCTGAAG TGTGACATTC ATGCTGGCTG CATGGGGTCT

43921	GATCAGTACT AGTGAGTGGA GGTGGAGGGG TAGGAAACAT GGGGATGATA ATAGGTTGTC

43981	AGGAAAGTGG TGCCCCAGGT AGCACAGAGT AGAAATTTGT CCCCCAAAAT CCTTTTGAAC

44041	CCAGTTGATT TGAATGCCGT GCCCCTGCCA CCCAGGCTTC AGAGCTAAGT GACTTATGTC

44101	TTCAGGTCAG TGATGATTAC CACGGTTGCA GTGCTAACAC AGATGCTTTA TCTACCAGGA

44161	CAGAAACAAG AAAGATGCTC CTTCCCAGGC CCCTTAGCAC TCTCTGGGTG GGGAGGATTG

44221	CCCCACCTTC CAAAAATAGA ATACTGTTTT GGTAAACAGC CACTTTGAGC CCATGAGGAT

44281	ATCTTCATTA GCTATGGAGA CAGGTTTTAG TAAGAAAGCA AGATGAGAGG CTAAAAAACC

44341	CTTGGGGAGC AGGAACTGGG AAGACTGTGG TACCTTGTTC CCAGATCCAC CAGAAACCTT

44401	GCCACCAGAC GATGTGTCCA GGCCCCACAT ATTTCACAAA AAGTTGGATC TGATAACAAT

44461	GAGGATGGAA TCCCGGTCTT AAGGTGGGTT TGGGGTGGGA AGAGGCGGGA TAATGGGTGA

44521	GAGGGTCGGT GGGGACAGGT GAGATGGGGT ATGGTGGGGA GAGGTGGAAT GGGGTGGGGT

44581	GGGTTGAGAT GGAGTATGGT ACAGCGGGGA GGGATAGAAT TGTCTTTTCC CTGTACCACA

44641	GAGAAGTTTG ACTGCTACCC TTGGCAATTA ATCAATTATA GAAAATGCAA CTTTGCTTTT

44701	AAAATGTGTC TATTTCCAAA GGCTTCTTCC CCTCCCCTAC CTAGGGAGAA GGAAAGAATG

44761	GATAATGCTA CTGTAGAGGA GGGTAGCATC ACTATAGAGG CCTCAGTATC TGCCCCAGGG

44821	AGCTGGGAGA GAGTTCTATC ACACAAACAC AGCCCGAGTC ACATACTCAA CAAACCCCAC

44881	AAAACAAAAC AACAATAATG AAGATACAAA ATCTCATTAT GTAGCCCAGG CTAGTCCTAG

44941	ATTTCTGTTT TCTTTTTTTG TTTTTCGAGA CAGGGTTTCT CTGTGTAGCT TTGGAGCCTA

45001	TCCTGGCACT TGCTCTGAAG CCCAGGCTGC CCTCACTCAC AGAGATCCGC CTGCCTCTGT

45061	CTCCAGAGTG CTGGGATTAA AGGCGTGCAC CACTAATGCC TGGCTAGTCC TAGATTTTTT

45121	TATCCTCCTG CCTCAGGCTC CCAACTGTTG GGTTTACTTT TGGGAGTCCA TTTTCTTCCA

45181	GCATGGATTC TTTGAATTGA AATTCAGATT ATCAGGTTTC TGTAGCAATC CCACCAGCCC

45241	ATTTTTTTGT CTGACACTGC TTGTTTTGAG ACACAGTCTC CCACTGCTGT AGCCCAGGCT

45301	GCCCTAGATT TTCTATGTAG CCCAGGCTGG CCTTGAACTC CCAGGAGTCC TCTGGCCTCT

45361	CCCTTTTGAT TACTGGAACT AGAAGAAGTC ACTATGCTTG ACTTGGAACT AATATTAGAA

45421	CAAAATATAT TTTTCATTGA GATTCAACTT TGAAATCCTG ATGCTCCTGC CTCACTCAGG

45481	TCATCAGGGT TGGCAGCAAG AGCCTTTATC CACTGAGTCA TATTGGGCCC TGACCTGCTT

45541	TTAAATTTTG CCTTTAGGGC TGGAGATGTA GCTCGGCTGG TTCAGTGCTT GCCTGGTACC

45601	CACGAAGCCC TGGGTTTGAT CTACAACACA GTATAAGCCA GGCCTGATGG CGTATACATG

45661	TAATCCTAAC ACTTGGGGAG CAAGAGGGAG GCCAAAGCCA TCCTCTGCTA CTTGGTGAGC

45721	TTGAGGCCAG CCTGGGATCC TTGAGACCCT GTTTCAAAAC AATAACAACA AACACAGACT

45781	ACTAAAAAAA ATTAATAAGG GCCAGACTGG GTGGTGTATT CCTTTAATCC AAGCAATGAG

45841	GAGGCAGAGG CAGGCAAATG TCTGTGAGTC TGGGGACAGC CTGGTCTACT GAGCAGCAGG

45901	CCAACTAAGG CTACATAGTG AGACTATCTC AAAAAAAGCA AAATAACAAT AAACAGACCA

45961	GTTCCCCATC TCCTATTTTG CCTTTACCTC CTATTCCCTG CTCAGCAGGT TATTTTTTGT

46021	TCCTGCATCT TGGTTCACTG ATCTGTAAAC TTGTCTGAAT AAGTAGGTAC AGGGTTGTTT

46081	TAAAATTAGA TAATATATTC AATGAGAAGG GCTACCAAGT GCTCAACCAA TGTATGCATA

46141	TGTATGTATG TATGTATGTA TGTATGTATT TATTTTTGTT TTGTTTTTCA AGATAAGGTT

46201	TCTCTGTGTA GTTTTAGAGT CTGTCCTAGA ACTTGCTCTG AAGAGCAGGC TGGTCTTGAA

46261	CTCACAAAGA TCCACCTGTC TCTGCCTCCC AAGTGCTGGG ATTAAAGGCA TGTGACACCA

46321	CCCCCAAAGC CAATGTTCTT ATAGGCATCT TTGATTTTTT TTCTCTTTCT TTGAGTGGAG

46381	TCTGACTAAG TAGCCCATAC TAGCTCTGCA TTTACAATCT GAACACATGG ATAAGAGTGG

46441	TGAAAATTAT CAAGATCATG TTATGCTATG CCTCCTGAGT CACCATGCCC TGCTTCAGAC

46501	TTCTTTGTAT TAAAGAACTG TGTAAAAAAA AAAAAAAAGA CATTTGAAGG CACATAATCA

46561	GAGGAATTTG TCAGTGATTT TTCACATACT GTCTTATTTG TGGCCAAGGT AAGCCTAGAG

46621	AGTATTTCTT AAAATTAAAA ATAGTGGGCA GATTTTGGAG GCGATCTGAT ATGAAAATCC

46681	CTTCCCACCC CAGGTAGTCA TGGGCTGACT ATCAAGGATA CATTCTGAGA CATATATCCT

46741	CAAGCAGTTT CTGCCTTACG CAAATATCAT AGGTCATAGC ACACTGAGAC TATGTGGCAG

46801	TCTATGTGTC TATATACACA TGGTGTGGCC TATTGTTCCC ATGGTCACAA AGAACAAAAC

46861	AACTTTTTCA CAAGGCTTTA CCCCTAGAGG AAGAGCTACA GGCAATCAAT GGTTGCTGAG

46921	AGGAGTATCA GTCTTCTCCA GGGACTTAGC CAATCCCAAG AGGTCAGCCA CGCATAGGAA

46981	CGCTTAGCCA CGCTTGTATA GAACATCTCA AACAACAACC ACCTCAGTGT AAAGCAAGCA

47041	CACAAGGAAC TGATGCAACT AAGAGACAAA GGGCCCGGTG TGTGTGGCCC GTAGCTGTCA

47101	TCCCAGCACT TGAGACTAAG GAAGGAAGGT TGAGAATTTG AGGCCAGCAT GGACTCCACA

47161	GAAAGACCGT TTTCTTTCTC AGAAAAAAGA AGCAAAAACC AAGAACAAGG TGTATGGGAA

47221	TGCTACTGTC TTGGCATATT GTTTATAGAA AACTTTTTTA TATATAAAAG GAATGCACTA

47281	CAAAAATTAT AAACTACTGT AATATTAACT GCATAGATCT ATAACATGGT CATTTATTAT

47341	TGAGTATGAT TATCTATCTA CCCACGCTGC AGGTTTAGAC AGTTGCACTA CAGTAGATCT

47401	GTTTGCAGTA GCATCATTAT TAGACATTTT GGACAAAGCC AAGTGGTAAT GGCACATGCC

47461	TTTAATCCCA GCACTTGGGA AGCAGAGGTA GGCGGATCTC TGTGAGTCAG AGACCAGCCT

47521	GGTCTACAAA GAACTAGTTC CAGGAGAGTC TCCAAGGCCA CAGAGAAACC CTGTCTCGAA

47581	AAACCAAAAG AAAAAAAGAA AACAAAAAAC TAAAAAATAA ATAAATTTGG GGCAATATCT

47641	TGTCCTATGA TGTTACTGGG TAATGGGATT TCCTCCTCTT GTATTATTTT TTCTTTGGGG

47701	GTTTTACTTA TTATTTACTT GAGACAGAGT CTCATTTATG ACAGGCTGGC CTCAAACAGG

47761	AAATGAAGCC AAGGAAGACC TTGAAGACCT AATCCTTCTG TTTCTTCCTC CTATATGGTG

47821	AGTTAAAGGC ATACAGTACC ATGCCCAGTC TATTCACTGC CCAGGGCTTC ATGCATGCTA

47881	GCAAAGCACC AACTGAGCTG CATCCCCACC CCTCCTCCTG GCTTCCATCT CCTTATGTAG

47941	CTAGAAATGA GCCTGTCTGT CTCAAATACT GGGATTATGG GTGTGTGCCA CCACACCTGG

48001	CTTCCTATTA TAGCCTTGTG GGATCACTGT TGTTTACTGA AGCATTGTGA CACACTGCAG

48061	ATTGCTGGAA CAGCGTCTGC CATCATCATG ACACAACTTC AGAGAAAGAG AGAGTTCCCA

48121	ACCAGCCACA CACTTAACTC AATGCCTGTA GCCCTTATTC TGTTAAGACG ATTTCCTGCC

48181	ATCTTACTCA AAGACCCTCT TTAACTCGGT AGGAACATCT GTTACACTGA AAGTCCTGCC

48241	TGTTGCTCCA CTGACCTCCT TCACAAATTA TTATATTTTG GAGCCAATTC TGAACCCAGG

48301	TTTTCTGAGT GACACATTTT AGTATTTTTT TTTTCTTTCT ATTTTCTTTC ATGGAAAGTC

48361	TCTTGTTACT GTTCACATGA CCAAGGATCA CTGCATCATC TTCCAAGGCC AATTTTGGAT

48421	GTTTCAGCAA GGGAGACTGA AGATCCTGAG TCTCAGTGTT GATCTCCTTT AGAATGTCCT

48481	CTGGAGAAGG TAGTGACAAC ACTGCAAGGA TAATAGGTGA ATAAAGGGAA GCCAGAGTGT

48541	CCTCTGGGAT GTGCGGCACT TACATGAAGG ATTCATTTAT AAATTTTAAG TTATGGAGTA

48601	TAATAATAAG ACTAAATATG TAGTGTCGTA ATTTTATAAC TATACATATG TATATAGTAA

48661	ATATAAATTT ATATGTAATG TATTTATAGT AAGTGTACAT AGAATTGAAC ATATGTTACA

48721	TAAATGGCAG AAAGGAATGA TTCTCAATTG CTTTTTTTCT AATTATAATT TCTATTGCTC

48781	TTTGTGGATT TCACACCATG CATTCTGATC CCACTTATCT CCTTGTCTCC TTGCATTTGC

48841	CCTCTGCCCT TGCAACCTCA CCCCCAAATC AAAGCCAAAT TTAAAAAAAA AACCAAAATC

48901	CAAACAAAAC AGAGACAAAA CAAAAATAAA AGCAACAACA AAAAAAGGAG AATCTTGTCA

48961	TGGTAGCTGT AGTGTGGCCT GTTGAATCAC ACAGTATACC CTTTAGTCCA TTCATCTTTT

49021	CTTCCAAGTG TTCATTGATA CAAGTCACGG TCTGGCTCGA GGATTCTGGT TTCTGCTATA

49081	TTACTAATAA TGGGCTCTCA CTGGGGCTCC CCTTGGATAT CCTATTGTCC TGTGTTATGG

49141	AGAGCCTGCT GTTTTGGATA TGTAGGTTTG TCCCCTTCAC ATGCTATAAC AATTCATAAA

49201	TTCAGTGAAT GTTGGGGTGG GCCAACTCAT AGCCCTGGTT CTGGGCTTGG GTGGTATTAT

49261	TAAACCCACT GATGGAGAAT AAGACCACTA CCATAATTTA AAAGCCAAAT TGAAGCAAGT

49321	TTTAATTCAA TACTGCCCAG GTGGACAGGC TCTGGCTAGG TCCATCTCTG AGTTTCCAGG

49381	AGGTGGCCCT GACTCACGGT TTACAGTGGC TTGAGTATTT TCCATAAGGT CCAATCAGGG

49441	GCAAGCATAC ATCCTGATGT ACCTCCAGTC TATATCCAAT CGGGGGCAAG TGTACATCTT

49501	GATGTATTTC CTGCCTGTGA ACCTACTGCC CACATGTGAT CAAGCACATC CGGTGCAGTT

49561	GGGTCAAACA GACTTGTTTA GGGCAATGAA AAACACATGG CTTTTTATCT CCCATAAACA

49621	ATAGCCTCCA GCGGTTCAGG GACTATTTGT CCTTGGGCAA GGAATTTACA GATCCTATAG

49681	GTGAGTCAGG GTCAGCATCC TGCTCTCATG CCCTCAGGGC TGGCTCACTT GTTACCTCCC

49741	CGACCCTCTC TCAACAGGGT CAGCTCTGAG GTGCTGCCCA GGTGGGGTGC AGGGCCTACT

49801	CTTCCGCATG TTGCAGCTGG TCAGGGTTAG TTCTCTCATA TGCCACAGGT GGCAATGGGT

49861	GAAGGGGGAG GGCATGTTTC CCTCATCAAC GCCATTACAT GGGGGGATGG GGTCAGCTCT

49921	CATGCCCTTA GGGTTGGCTC ACCTGCATCC TTGACCATAG GGTCAGCTCT AGTATGCTGC

49981	TCAAGTGAGG CGCACACCTA

SEQ ID NO: 4 (LoxP sequence from bacteriophage P1)

1	ATAACTTCGT ATAGCATACA TTATACGAAG TTAT

SEQ ID NO: 5 (FRT sequence from the 2μ plasmid of the baker's yeast Saccharomyces cerevisiae)

1	GAAGTTCCTA TTCtctagaa aGTATAGGAA CTTC
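
The LoxP and FRT sites listed above can be inspected for the inverted-repeat layout commonly described for recombinase target sites. The following is a minimal sketch, assuming each 34-base site consists of two 13-base arms flanking an 8-base spacer; the function names and the arm length are illustrative assumptions, not part of the listing.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (case-insensitive)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def arm_mismatches(site: str, arm_len: int = 13) -> int:
    """Count positions where the right arm differs from the reverse
    complement of the left arm (arm_len is an assumed value)."""
    site = site.replace(" ", "").upper()
    left, right = site[:arm_len], site[-arm_len:]
    expected_right = reverse_complement(left)
    return sum(a != b for a, b in zip(expected_right, right))


LOXP = "ATAACTTCGT ATAGCATACA TTATACGAAG TTAT"  # SEQ ID NO: 4 as listed
FRT = "GAAGTTCCTA TTCtctagaa aGTATAGGAA CTTC"   # SEQ ID NO: 5 as listed

print("LoxP arm mismatches:", arm_mismatches(LOXP))
print("FRT arm mismatches:", arm_mismatches(FRT))
```

Under these assumptions the LoxP arms are exact reverse complements of each other, while the FRT arms as listed differ at a single position.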

SEQ ID NO: 6 (attB sequence from E. coli)

1	cCTGCTTttT tatActAACT TGa

SEQ ID NO: 7 (Recognition site for the CHO-23/24 meganuclease, 35,699 base pairs downstream of CHO DHFR)

1	TAAGGCCTCA TATGAAAATA TA

SEQ ID NO: 8 (Recognition site for the CHO-51/52 meganuclease, 15,898 base pairs downstream of CHO DHFR)

1	ATAGATGTCT TGCATACTCT AG
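
Sequence lines in this listing carry a leading position number and are grouped in blocks of ten. A minimal sketch of how such lines might be normalized before comparison is shown here; the helper name `parse_listing_line` is hypothetical, and the check simply confirms that the two recognition sites above are 22 bases long and contain only A, C, G and T.

```python
import re


def parse_listing_line(line: str) -> str:
    """Drop position numbers and whitespace from one nucleotide listing
    line and return the bare sequence in upper case."""
    return re.sub(r"[\d\s]", "", line).upper()


# Recognition sites as they appear in the listing (SEQ ID NO: 7 and 8).
SITE_23_24 = parse_listing_line("1\tTAAGGCCTCA TATGAAAATA TA")
SITE_51_52 = parse_listing_line("1\tATAGATGTCT TGCATACTCT AG")

for name, site in (("CHO-23/24", SITE_23_24), ("CHO-51/52", SITE_51_52)):
    only_dna = set(site) <= set("ACGT")
    print(name, site, len(site), only_dna)
```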

SEQ ID NO: 9 (CHO-23/24 meganuclease)

1	MAPKKKRKVH MNTKYNKEFL LYLAGFVDGD GSIKAQIFPN QCYKFKHQLR LRFQVTQKTQ

61	RRWFLDKLVD EIGVGYVTDR GSVSDYMLSQ IKPLHNFLTQ LQPFLKLKQK QANLVLKIIE

121	QLPSAKESPD KFLEVCTWVD QIAALNDSKT RKTTSETVRA VLDSLPGSVG GLSPSQASSA

181	ASSASSSPGS GISEALRAGA GSGTGYNKEF LLYLAGFVDG DGSIIAQIKP GQSYKFKHTL

241	QLVFQVTQKT QRRWFLDKLV DEIGVGYVID RGSASDYRLS EIKPLHNFLT QLQPFLKLKQ

301	KQANLVLKII EQLPSAKESP DKFLEVCTWV DQIAALNDSK TRKTTSETVR AVLDSLSEKK

361	KSSP
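
The amino-terminal residues of SEQ ID NO: 9 can be compared against the opening codons of the PCR template listed as SEQ ID NO: 20. The sketch below assumes the standard genetic code and uses the 33 bases at positions 88 to 120 of SEQ ID NO: 20 as copied from the listing; the names `CODON_TABLE` and `translate` are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring spaces and any
    trailing partial codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Positions 88-120 of SEQ ID NO: 20 as listed.
ORF_START = "ATG GCACCGAAGA AGAAGCGCAA GGTGCATATG"
# First two blocks of SEQ ID NO: 9 as listed.
PROTEIN_START = "MAPKKKRKVH MNTKYNKEFL".replace(" ", "")

peptide = translate(ORF_START)
print(peptide, PROTEIN_START.startswith(peptide))
```

The translated peptide corresponds to the first eleven residues of the listed protein.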

SEQ ID NO: 10 (CHO-51/52 meganuclease)

1	MAPKKKRKVH MNTKYNKEFL LYLAGFVDGD GSIIAQIPPN QSCKFKHQLR LTFQVTQKTQ

61	RRWFLDKLVD EIGVGYVRDR GSVSDYILSE IKPLHNFLTQ LQPFLKLKQK QANLVLKIIE

121	QLPSAKESPD KFLEVCTWVD QIAALNDSKT RKTTSETVRA VLDSLPGSVG GLSPSQASSA

181	ASSASSSPGS GISEALRAGA GSGTGYNKEF LLYLAGFVDG DGSIYAGIAP NQSCKFKHQL

241	RLWFVVSQKT QRRWFLDKLV DEIGVGYVID NGSVSHYRLS EIKPLHNFLT QLQPFLKLKQ

301	KQANLVLKII EQLPSAKESP DKFLEVCTWV DQIAALNDSK TRKTTSETVR AVLDSLSEKK

361	KSSP

SEQ ID NO: 11 (CHO-51/52 donor plasmid with EcoRI site)

1	TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA

61	CAGCTTGTCT GTAAGCGGAT GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG

121	TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA GCAGATTGTA CTGAGAGTGC

181	ACCATATGCG GTGTGAAATA CCGCACAGAT GCGTAAGGAG AAAATACCGC ATCAGGCGCC

241	ATTCGCCATT CAGGCTGCGC AACTGTTGGG AAGGGCGATC GGTGCGGGCC TCTTCGCTAT

301	TACGCCAGCT GGCGAAAGGG GGATGTGCTG CAAGGCGATT AAGTTGGGTA ACGCCAGGGT

361	TTTCCCAGTC ACGACGTTGT AAAACGACGG CCAGTGAATT CGAGCTCGGT ACCCAGAAAC

421	CTTTCAACCA GCTTTTGAGC TAATGATAGA GAGAAGCTCA AGGAATTGGA GCAATGCTTG

481	ACTAGGGATG TCAGAGGGAG GCTATCCAGA GGAGCTTACA ACTGAGGTAA ACTTAAAAGT

541	TAGGGAGTTT GTCAACTTCA ACCCACAGAA TAGAGCAGAG CCAGGAGGAG CTGAGGCTTC

601	TGAGTGTTAT GGTGGAAGCA TCACCCCAAC CCTTGACATC CATATGCCTG AAGAGTCTGG

661	AATGTTATGG TGGAAGTTCC ACCCAAGCCT CCCTTCCCGG TCGCCCTCCA AACCCTGCTA

721	CATCTCAGAA ATCCCACCAA ATGATGACTC CCTCCCCCAG AGATATTCAA GACCACTCCC

781	ACAGGGTATT TAAACTGCCC CCCAACCCCC AGAAAATAGA TGTGTGGTTT TCCAATCTCT

841	CTTTCCTATC ACGTCTCTGG GGAGCTGGCA GGCCATTTGG GAGCATTGTA TCCATTAAAC

901	GACTTCTCAG TGGAGACTCT GAAAGCCAGA AGAGCCTAGA CAGATAGATG TCTTGCGAAT

961	TCTTGCATAC TCTAGAGACT ACAGATGCCG GCCCAGACTA TTATATCCAG CAAAAGTTTC

1021	AAACACCATA CAAAGTCAAA TTTAAACAGT ATCTATCTAC AAATCCAATA TTACAGAAGG

1081	TGCTAGTAGG AAAACTCCAA ACTAAGATTA ACTATACCTG TGAAGACACA GGAAATAATC

1141	TCACACTGGC AAAAGAAGAA AAACCTCTCT CTCTCTCTCC TCTCTCTCTC TCTCTCTCTC

1201	TCTCTCTCTC TCTCTCTCTC TCTCTCTCTC TCACACACAC ACACACACAC ACACACACAC

1261	ACCAACACCA ATACCATGAA CAACAAAATA ACAGGAATTA ACAATAATTG ATGTGTGTGT

1321	ATGTCCCTGT GTGTGTGTCC TTGTGTGTGT CTGTTTGTGT GTCTGTGTAT ATGTTTGTCA

1381	CCTGAGGGGT GGCTCTTCCT TGGTTTGTGA GGTTTCTACC CAAAAGCTTG GCGTAATCAT

1441	GGTCATAGCT GTTTCCTGTG TGAAATTGTT ATCCGCTCAC AATTCCACAC AACATACGAG

1501	CCGGAAGCAT AAAGTGTAAA GCCTGGGGTG CCTAATGAGT GAGCTAACTC ACATTAATTG

1561	CGTTGCGCTC ACTGCCCGCT TTCCAGTCGG GAAACCTGTC GTGCCAGCTG CATTAATGAA

1621	TCGGCCAACG CGCGGGGAGA GGCGGTTTGC GTATTGGGCG CTCTTCCGCT TCCTCGCTCA

1681	CTGACTCGCT GCGCTCGGTC GTTCGGCTGC GGCGAGCGGT ATCAGCTCAC TCAAAGGCGG

1741	TAATACGGTT ATCCACAGAA TCAGGGGATA ACGCAGGAAA GAACATGTGA GCAAAAGGCC

1801	AGCAAAAGGC CAGGAACCGT AAAAAGGCCG CGTTGCTGGC GTTTTTCCAT AGGCTCCGCC

1861	CCCCTGACGA GCATCACAAA AATCGACGCT CAAGTCAGAG GTGGCGAAAC CCGACAGGAC

1921	TATAAAGATA CCAGGCGTTT CCCCCTGGAA GCTCCCTCGT GCGCTCTCCT GTTCCGACCC

1981	TGCCGCTTAC CGGATACCTG TCCGCCTTTC TCCCTTCGGG AAGCGTGGCG CTTTCTCATA

2041	GCTCACGCTG TAGGTATCTC AGTTCGGTGT AGGTCGTTCG CTCCAAGCTG GGCTGTGTGC

2101	ACGAACCCCC CGTTCAGCCC GACCGCTGCG CCTTATCCGG TAACTATCGT CTTGAGTCCA

2161	ACCCGGTAAG ACACGACTTA TCGCCACTGG CAGCAGCCAC TGGTAACAGG ATTAGCAGAG

2221	CGAGGTATGT AGGCGGTGCT ACAGAGTTCT TGAAGTGGTG GCCTAACTAC GGCTACACTA

2281	GAAGAACAGT ATTTGGTATC TGCGCTCTGC TGAAGCCAGT TACCTTCGGA AAAAGAGTTG

2341	GTAGCTCTTG ATCCGGCAAA CAAACCACCG CTGGTAGCGG TGGTTTTTTT GTTTGCAAGC

2401	AGCAGATTAC GCGCAGAAAA AAAGGATCTC AAGAAGATCC TTTGATCTTT TCTACGGGGT

2461	CTGACGCTCA GTGGAACGAA AACTCACGTT AAGGGATTTT GGTCATGAGA TTATCAAAAA

2521	GGATCTTCAC CTAGATCCTT TTAAATTAAA AATGAAGTTT TAAATCAATC TAAAGTATAT

2581	ATGAGTAAAC TTGGTCTGAC AGTTACCAAT GCTTAATCAG TGAGGCACCT ATCTCAGCGA

2641	TCTGTCTATT TCGTTCATCC ATAGTTGCCT GACTCCCCGT CGTGTAGATA ACTACGATAC

2701	GGGAGGGCTT ACCATCTGGC CCCAGTGCTG CAATGATACC GCGAGACCCA CGCTCACCGG

2761	CTCCAGATTT ATCAGCAATA AACCAGCCAG CCGGAAGGGC CGAGCGCAGA AGTGGTCCTG

2821	CAACTTTATC CGCCTCCATC CAGTCTATTA ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT

2881	CGCCAGTTAA TAGTTTGCGC AACGTTGTTG CCATTGCTAC AGGCATCGTG GTGTCACGCT

2941	CGTCGTTTGG TATGGCTTCA TTCAGCTCCG GTTCCCAACG ATCAAGGCGA GTTACATGAT

3001	CCCCCATGTT GTGCAAAAAA GCGGTTAGCT CCTTCGGTCC TCCGATCGTT GTCAGAAGTA

3061	AGTTGGCCGC AGTGTTATCA CTCATGGTTA TGGCAGCACT GCATAATTCT CTTACTGTCA

3121	TGCCATCCGT AAGATGCTTT TCTGTGACTG GTGAGTACTC AACCAAGTCA TTCTGAGAAT

3181	AGTGTATGCG GCGACCGAGT TGCTCTTGCC CGGCGTCAAT ACGGGATAAT ACCGCGCCAC

3241	ATAGCAGAAC TTTAAAAGTG CTCATCATTG GAAAACGTTC TTCGGGGCGA AAACTCTCAA

3301	GGATCTTACC GCTGTTGAGA TCCAGTTCGA TGTAACCCAC TCGTGCACCC AACTGATCTT

3361	CAGCATCTTT TACTTTCACC AGCGTTTCTG GGTGAGCAAA AACAGGAAGG CAAAATGCCG

3421	CAAAAAAGGG AATAAGGGCG ACACGGAAAT GTTGAATACT CATACTCTTC CTTTTTCAAT

3481	ATTATTGAAG CATTTATCAG GGTTATTGTC TCATGAGCGG ATACATATTT GAATGTATTT

3541	AGAAAAATAA ACAAATAGGG GTTCCGCGCA CATTTCCCCG AAAAGTGCCA CCTGACGTCT

3601	AAGAAACCAT TATTATCATG ACATTAACCT ATAAAAATAG GCGTATCACG AGGCCCTTTC

3661	GTC
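
The position of the EcoRI site (GAATTC) relative to the CHO-51/52 recognition sequence (SEQ ID NO: 8) can be read off the donor plasmid directly. The following sketch works on positions 941 to 980 of SEQ ID NO: 11 as copied from the listing; the function name `flank_lengths` is hypothetical, and no claim is made here about the design intent behind the arrangement.

```python
SITE = "ATAGATGTCTTGCATACTCTAG"  # SEQ ID NO: 8
ECORI = "GAATTC"
# SEQ ID NO: 11, positions 941-980 as listed.
DONOR_FRAGMENT = "CAGATAGATGTCTTGCGAATTCTTGCATACTCTAGAGACT"


def flank_lengths(fragment: str, site: str, insert: str):
    """Return (prefix_len, suffix_len): how many leading bases of `site`
    end immediately before `insert`, and how many trailing bases of
    `site` start immediately after it. Returns None if `insert` is absent."""
    pos = fragment.find(insert)
    if pos < 0:
        return None
    left, right = fragment[:pos], fragment[pos + len(insert):]
    n = len(site)
    prefix_len = max(k for k in range(n + 1) if left.endswith(site[:k]))
    suffix_len = max(k for k in range(n + 1) if right.startswith(site[n - k:]))
    return prefix_len, suffix_len


prefix_len, suffix_len = flank_lengths(DONOR_FRAGMENT, SITE, ECORI)
# Bases of the site that appear on both sides of the insert.
shared = SITE[len(SITE) - suffix_len:prefix_len]
print(prefix_len, suffix_len, shared)
```

In this fragment the first 13 bases of the recognition sequence precede the EcoRI site and the last 13 bases follow it, so the four central bases of the 22-base site (TTGC) occur on both sides of the insertion.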

SEQ ID NO: 12 (Recognition site for the CHO-13/14 meganuclease, in Intron 2 of CHO DHFR)

1	TACATGTATG TACAAAATAT AT

SEQ ID NO: 13 (CHO-13/14 meganuclease)

1	MAPKKKRKVH MNTKYNKEFL LYLAGFVDGD GSIFASITPR QCYKFKHELQ LTFVVTQKTQ

61	RRWFLDKLVD EIGVGYVIDQ GSVSHYRLSE IKPLHNFLTQ LQPFLKLKQK QANLVLKIIE

121	QLPSAKESPD KFLEVCTWVD QIAALNDSKT RKTTSETVRA VLDSLPGSVG GLSPSQASSA

181	ASSASSSPGS GISEALRAGA GSGTGYNKEF LLYLAGFVDG DGSIIAQIKP NQSCKFKHQL

241	MLTFTVAQKT QRRWFLDKLV DEIGVGYVID IGSVSEYRLS QIKPLHNFLT QLQPFLKLKQ

301	KQANLVLKII EQLPSAKESP DKFLEVCTWV DQIAALNDSK TRKTTSETVR AVLDSLSEKK

361	KSSP

SEQ ID NO: 14 (Recognition site for the CGS-5/6 meganuclease, in Exon 4 of CHO GS)

1	AAGGCACTCG TGTAAACGGA TA

SEQ ID NO: 15 (CGS-5/6 meganuclease)

1	MAPKKKRKVH MNTKYNKEFL LYLAGFVDGD GSIKAIIRPE QSYKFKHRLR LVFQVTQKTQ

61	RRWFLDKLVD EIGVGYVYDR GSVSDYYLSE IKPLHNFLTQ LQPFLKLKQK QANLVLKIIE

121	QLPSAKESPD KFLEVCTWVD QIAALNDSKT RKTTSETVRA VLDSLPGSVG GLSPSQASSA

181	ASSASSSPGS GISEALRAGA GSGTGYNKEF LLYLAGFVDG DGSIWARIKP GQSYKFKHTL

241	ELVFQVTQKT QRRWILDKLV DEIGVGYVTD AGSASVYRLS EIKPLHNFLT QLQPFLKLKQ

301	KQANLVLKII EQLPSAKESP DKFLEVCTWV DQIAALNDSK TRKTTSETVR AVLDSLSEKK

361	KSSP

SEQ ID NO: 16 (Forward PCR primer for evaluating CHO-23/24 target site)

1	ggagggacat taatctgcat gcagtgatc

SEQ ID NO: 17 (Reverse PCR primer for evaluating CHO-23/24 target site)

1	gtcttggttt gggttgtcta agcaacctc

SEQ ID NO: 18 (Forward PCR primer for evaluating CHO-51/52 target site)

1	CACAGGTGTC CACTCCCAGT TCAATTACAG CTCTTAAGG

SEQ ID NO: 19 (Reverse PCR primer for evaluating CHO-51/52 target site)

1	CGATGGCCCA CTACGTGAAC CATCACC

SEQ ID NO: 20 (PCR template for mRNA encoding CHO-23/24)

1	CACAGGTGTC CACTCCCAGT TCAATTACAG CTCTTAAGGC TAGAGTACTT AATACGACTC

61	ACTATAGGCT AGCCTCGAGC CGCCACCATG GCACCGAAGA AGAAGCGCAA GGTGCATATG

121	GCACCGAAGA AGAAGCGCAA GGTGCATATG AACACCAAGT ACAACAAGGA GTTCCTGCTC

181	TACCTGGCGG GCTTCGTCGA CGGGGACGGC TCCATCAAGG CCCAGATCTT TCCGAACCAG

241	TGCTACAAGT TCAAGCATCA GCTGAGGCTC CGTTTCCAGG TCACCCAGAA GACACAGCGC

301	CGTTGGTTCC TCGACAAGCT GGTGGACGAG ATCGGGGTGG GCTACGTGAC TGACCGCGGC

361	AGCGTCTCCG ACTACATGCT GAGCCAGATC AAGCCTCTGC ACAACTTCCT GACCCAGCTC

421	CAGCCCTTCC TGAAGCTCAA GCAGAAGCAG GCCAACCTCG TGCTGAAGAT CATCGAGCAG

481	CTGCCCTCCG CCAAGGAATC CCCGGACAAG TTCCTGGAGG TGTGCACGTG GGTGGACCAG

541	ATCGCGGCCC TCAACGACAG CAAGACCCGC AAGACGACCT CGGAGACGGT GCGGGCGGTC

601	CTGGACTCCC TCCCAGGATC CGTGGGAGGT CTATCGCCAT CTCAGGCATC CAGCGCCGCA

661	TCCTCGGCTT CCTCAAGCCC GGGTTCAGGG ATCTCCGAAG CACTCAGAGC TGGAGCAGGT

721	TCCGGCACTG GATACAACAA GGAATTCCTG CTCTACCTGG CGGGCTTCGT GGACGGGGAC

781	GGCTCCATCA TCGCCCAGAT CAAGCCGGGT CAGTCCTACA AGTTCAAGCA TACCCTGCAG

841	CTCGTTTTCC AGGTCACGCA GAAGACACAG CGCCGTTGGA TCCTCGACAA GCTGGTGGAC

901	GAGATCGGGG TGGGCTATGT GATCGACCGC GGCAGCGCCT CCGACTACCG CCTGAGCGAG

961	ATCAAGCCTC TGCACAACTT CCTGACCCAG CTCCAGCCCT TCCTGAAGCT CAAGCAGAAG

1021	CAGGCCAACC TCGTGCTGAA GATCATCGAG CAGCTGCCCT CCGCCAAGGA ATCCCCGGAC

1081	AAGTTCCTGG AGGTGTGCAC CTGGGTGGAC CAGATCGCCG CTCTGAACGA CTCCAAGACC

1141	CGCAAGACCA CTTCCGAGAC CGTCCGCGCC GTTCTAGACA GTCTCTCCGA GAAGAAGAAG

1201	TCGTCCCCCT AGACAGTCTC TCCGAGAAGA AGAAGTCGTC CCCCTAGCGG CCGCTTCGAG

1261	CAGACATGAT AAGATACATT GATGAGTTTG GACAAACCAC AACTAGAATG CAGTGAAAAA

1321	AATGCTTTAT TTGTGAAATT TGTGATGCTA TTGCTTTATT TGTAACCATT ATAAGCTGCA

1381	ATAAACAAGT TAACAACAAC AATTGCATTC ATTTTATGTT TCAGGTTCAG GGGGAGATGT

1441	GGGAGGTTTT TTAAAGCAAG TAAAACCTCT ACAAATGTGG TAAAATCGAT AAGATCTTGA

1501	TCCGGGCTGG CGTAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC AGTTGCGCAG

1561	CCTGAATGGC GAATGGACGC GCCCTGTAGC GGCGCATTAA GCGCGGCGGG TGTGGTGGTT

1621	ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCCTTT CGCTTTCTTC

1681	CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG GGGGCTCCCT

1741	TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA TTAGGGTGAT

1801	GGTTCACGTA GTGGGCCATC G

SEQ ID NO: 21 (PCR template for mRNA encoding CHO-51/52)

1	CACAGGTGTC CACTCCCAGT TCAATTACAG CTCTTAAGGC TAGAGTACTT AATACGACTC

61	ACTATAGGCT AGCCTCGAGC CGCCACCATG GCACCGAAGA AGAAGCGCAA GGTGCATatg

121	gCACCGAAGA AGAAGCGCAA GGTGCATATG AACACCAAGT ACAACAAGGA GTTCCTGCTC

181	TACCTGGCGG GCTTCGTGGA CGGGGACGGC TCCATCATCG CCCAGATCCC GCCGAACCAG

241	TCCTGCAAGT TCAAGCATCA GCTGCGCCTC ACCTTCCAGG TCACGCAGAA GACACAGCGC

301	CGTTGGTTCC TCGACAAGCT GGTGGACGAG ATCGGGGTGG GCTACGTGCG CGACCGCGGC

361	AGCGTCTCCG ACTACATCCT GAGCGAGATC AAGCCTCTGC ACAACTTCCT GACCCAGCTC

421	CAGCCCTTCC TGAAGCTCAA GCAGAAGCAG GCCAACCTCG TGCTGAAGAT CATCGAGCAG

481	CTGCCCTCCG CCAAGGAATC CCCGGACAAG TTCCTGGAGG TGTGCACCTG GGTGGACCAG

541	ATCGCCGCTC TGAACGACTC CAAGACCCGC AAGACCACTT CCGAGACTGT CCGCGCCGTT

601	CTAGACAGTC TCCCAGGATC CGTGGGAGGT CTATCGCCAT CTCAGGCATC CAGCGCCGCA

661	TCCTCGGCTT CCTCAAGCCC GGGTTCAGGG ATCTCCGAAG CACTCAGAGC TGGAGCAGGT

721	TCCGGCACTG GATACAACAA GGAATTCCTG CTCTACCTGG CGGGCTTCGT GGACGGGGAC

781	GGCTCCATCT ACGCCGGGAT CGCGCCGAAC CAGTCCTGCA AGTTCAAGCA TCAGCTGCGC

841	CTCTGGTTCG TGGTCAGCCA GAAGACACAG CGCCGTTGGT TCCTCGACAA GCTGGTGGAC

901	GAGATCGGGG TGGGCTACGT GATTGACAAT GGCAGCGTCT CCCATTACCG CCTGAGCGAG

961	ATCAAGCCTC TGCACAACTT CCTGACCCAG CTCCAGCCCT TCCTGAAGCT CAAGCAGAAG

1021	CAGGCCAACC TCGTGCTGAA GATCATCGAG CAGCTGCCCT CCGCCAAGGA ATCCCCGGAC

1081	AAGTTCCTGG AGGTGTGCAC CTGGGTGGAC CAGATCGCCG CTTTGAACGA CTCCAAGACC

1141	CGCAAGACCA CTTCCGAGAC TGTCCGCGCC GTTCTAGACA GTCTCTCCGA GAAGAAGAAG

1201	TCGTCCCCCT AGACAGTCTC TCCGAGAAGA AGAAGTCGTC CCCCTAGCGG CCGCTTCGAG

1261	CAGACATGAT AAGATACATT GATGAGTTTG GACAAACCAC AACTAGAATG CAGTGAAAAA

1321	AATGCTTTAT TTGTGAAATT TGTGATGCTA TTGCTTTATT TGTAACCATT ATAAGCTGCA

1381	ATAAACAAGT TAACAACAAC AATTGCATTC ATTTTATGTT TCAGGTTCAG GGGGAGATGT

1441	GGGAGGTTTT TTAAAGCAAG TAAAACCTCT ACAAATGTGG TAAAATCGAT AAGATCTTGA

1501	TCCGGGCTGG CGTAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC AGTTGCGCAG

1561	CCTGAATGGC GAATGGACGC GCCCTGTAGC GGCGCATTAA GCGCGGCGGG TGTGGTGGTT

1621	ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCCTTT CGCTTTCTTC

1681	CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG GGGGCTCCCT

1741	TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA TTAGGGTGAT

1801	GGTTCACGTA GTGGGCCATC G

SEQ ID NO: 22 (PCR template for mRNA encoding CGS-5/6)

1	CACAGGTGTC CACTCCCAGT TCAATTACAG CTCTTAAGGC TAGAGTACTT AATACGACTC

61	ACTATAGGCT AGCCTCGAGC CGCCACCATG GCACCGAAGA AGAAGCGCAA GGTGCATATG

121	GCACCGAAGA AGAAGCGCAA GGTGCATATG AACACCAAGT ACAACAAGGA GTTCCTGCTC

181	TACCTGGCGG GCTTCGTCGA CGGGGACGGC TCCATCAAGG CCATTATCCG GCCAGAGCAG

241	TCCTACAAGT TCAAGCATCG CCTGCGGCTC GTTTTCCAGG TCACGCAGAA GACACAGCGC

301	CGTTGGTTCC TCGACAAGCT GGTGGACGAG ATCGGGGTGG GCTACGTGTA CGACCGCGGC

361	AGCGTCTCCG ACTACTATCT GAGCGAGATC AAGCCTCTGC ACAACTTCCT GACCCAGCTC

421	CAGCCCTTCC TGAAGCTCAA GCAGAAGCAG GCCAACCTCG TGCTGAAGAT CATCGAGCAG

481	CTGCCCTCCG CCAAGGAATC CCCGGACAAG TTCCTGGAGG TGTGCACGTG GGTGGACCAG

541	ATCGCGGCCC TCAACGACAG CAAGACCCGC AAGACGACCT CGGAGACGGT GCGAGCGGTC

601	CTGGACTCCC TCCCAGGATC CGTGGGAGGT CTATCGCCAT CTCAGGCATC CAGCGCCGCA

661	TCCTCGGCTT CCTCAAGCCC GGGTTCAGGG ATCTCCGAAG CACTCAGAGC TGGAGCAGGT

721	TCCGGCACTG GATACAACAA GGAATTCCTG CTCTACCTGG CGGGCTTCGT GGACGGGGAC

781	GGCTCCATCT GGGCCCGGAT CAAGCCGGGG CAGTCCTACA AGTTCAAGCA TACCCTGGAG

841	CTCGTGTTCC AGGTCACCCA GAAGACACAG CGCCGTTGGA TCCTCGACAA GCTGGTGGAC

901	GAGATCGGGG TGGGCTACGT GACCGACGCC GGCAGCGCCT CCGTCTACCG CCTGAGCGAG

961	ATCAAGCCTC TGCACAACTT CCTGACCCAG CTCCAGCCCT TCCTGAAGCT CAAGCAGAAG

1021	CAGGCCAACC TCGTGCTGAA GATCATCGAG CAGCTGCCCT CCGCCAAGGA ATCCCCGGAC

1081	AAGTTCCTGG AGGTGTGCAC CTGGGTGGAC CAGATCGCCG CTCTGAACGA CTCCAAGACC

1141	CGCAAGACCA CTTCCGAGAC CGTCCGCGCC GTTCTAGACA GTCTCTCCGA GAAGAAGAAG

1201	TCGTCCCCCT AGACAGTCTC TCCGAGAAGA AGAAGTCGTC CCCCTAGCGG CCGCTTCGAG

1261	CAGACATGAT AAGATACATT GATGAGTTTG GACAAACCAC AACTAGAATG CAGTGAAAAA

1321	AATGCTTTAT TTGTGAAATT TGTGATGCTA TTGCTTTATT TGTAACCATT ATAAGCTGCA

1381	ATAAACAAGT TAACAACAAC AATTGCATTC ATTTTATGTT TCAGGTTCAG GGGGAGATGT

1441	GGGAGGTTTT TTAAAGCAAG TAAAACCTCT ACAAATGTGG TAAAATCGAT AAGATCTTGA

1501	TCCGGGCTGG CGTAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC AGTTGCGCAG

1561	CCTGAATGGC GAATGGACGC GCCCTGTAGC GGCGCATTAA GCGCGGCGGG TGTGGTGGTT

1621	ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCCTTT CGCTTTCTTC

1681	CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG GGGGCTCCCT

1741	TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA TTAGGGTGAT

1801	GGTTCACGTA GTGGGCCATC G

SEQ ID NO: 23 (Forward PCR primer for evaluating CGS-5/6 target site)

1	tgacagctct ggccttaagt gcctacgaaa ctag

SEQ ID NO: 24 (Reverse PCR primer for evaluating CGS-5/6 target site)

1	gtctttcctc tttgctgtag ccttggtaga actactgcc

SEQ ID NO: 25 (CHO-23/24 Insertion target sequence donor plasmid)

1	TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA

61	CAGCTTGTCT GTAAGCGGAT GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG

121	TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA GCAGATTGTA CTGAGAGTGC

181	ACCATATGCG GTGTGAAATA CCGCACAGAT GCGTAAGGAG AAAATACCGC ATCAGGCGCC

241	ATTCGCCATT CAGGCTGCGC AACTGTTGGG AAGGGCGATC GGTGCGGGCC TCTTCGCTAT

301	TACGCCAGCT GGCGAAAGGG GGATGTGCTG CAAGGCGATT AAGTTGGGTA ACGCCAGGGT

361	TTTCCCAGTC ACGACGTTGT AAAACGACGG CCAGTGAATT CCATACCCAG GGGAGCTGTA

421	CTGGGCTGCA GCCCTGCGCC ATTCAGCCAT GCACCAGGCT ACTCCCTCCT CTTCCAGCTT

481	TCTCCTTCTG ATGGCCATAG GATTAGAAGA TAAGGGACTC TAGTGCAGGT CAACTGCTGA

541	CCAGTGTGAA AATGCACAGA CTACATGCTG GTAGATCAGC ACTTCAAACT ACTGTTCACC

601	ATCATCTCTG GAATAAGCAC TACATTTACA GGGTTCAAAC CTCAATGAAT ATAAACAAAC

661	AAAACACACC TCCCTTCCTT CACTGTCTCC CATTTCTTTG GTTCCCATCT CCACATAGAA

721	TTTATAATTA AAATTTCTAA GTATCTTTCC AGAAATACTT CACACATGTT ATAAGCAAAT

781	GTGCTTTTAA AGATACTATT TTAAATTATG AAAATGGTTA TATTAGTTGA GATAAAAGAA

841	TAGAATGGGA AGTTCCAGAA TTTAAGGCCT CATATGGATC CCAGCTGTGG AATGTGTGTC

901	AGTTAGGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA AGCATGCATC

961	TCAATTAGTC AGCAACCAGG TGTGGAAAGT CCCCAGGCTC CCCAGCAGGC AGAAGTATGC

1021	AAAGCATGCA TCTCAATTAG TCAGCAACCA TAGTCCCGCC CCTAACTCCG CCCATCCCGC

1081	CCCTAACTCC GCCCAGTTCC GCCCATTCTC CGCCCCATGG CTGACTAATT TTTTTTATTT

1141	ATGCAGAGGC CGAGGCCGCC TCGGCCTCTG AGCTATTCCA GAAGTAGTGA GGAGGCTTTT

1201	TTGGAGGCTA CCATGGAGAA GTTACTATTC CGAAGTTCCT ATTCTCTAGA AAGTATAGGA

1261	ACTTCAAGCT TGGCACTGGG TACCGCCAAG TTGACCAGTG CCGTTCCGGT GCTCACCGCG

1321	CGCGACGTCG CCGGAGCGGT CGAGTTCTGG ACCGACCGGC TCGGGTTCTC CCGGGACTTC

1381	GTGGAGGACG ACTTCGCCGG TGTGGTCCGG GACGACGTGA CCCTGTTCAT CAGCGCGGTC

1441	CAGGACCAGG TGGTGCCGGA CAACACCCTG GCCTGGGTGT GGGTGCGCGG CCTGGACGAG

1501	CTGTACGCCG AGTGGTCGGA GGTCGTGTCC ACGAACTTCC GGGACGCCTC CGGGCCGGCC

1561	ATGACCGAGA TCGGCGAGCA GCCGTGGGGG CGGGAGTTCG CCCTGCGCGA CCCGGCCGGC

1621	AACTGCGTGC ACTTCGTGGC CGAGGAGCAG GACTGACACC CGAGCGAAAA CGGTCTGCGC

1681	TGCGGGACGC GCGAATTGAA TTATGGCCCA CACCAGTGGC GCGGCGACTT CCAGTTCAAC

1741	ATCAGCCGCT ACAGTCAACA GCAACTGATG GAAACCAGCC ATCGCCATCT GCTGCACGCG

1801	GAAGAAGGCA CATGGCTGAA TATCGACGGT TTCCATATGG GGATTGGTGG CGACGACTCC

1861	TGGAGCCCGT CAGTATCGGC GGAATTCCAG CTGAGCGCCG GTCGCTACCA TTACCAGTTG

1921	GTCTGGTGTC AAAAATAATA ATAACCGGGC AGGGGGGATC TGCATGGATC TTTGTGAAGG

1981	AACCTTACTT CTGTGGTGTG ACATAATTGG ACAAACTACC TACAGAGATT TAAAGCTCTA

2041	AGGTAAATAT AAAATTTTTA AGTGTATAAT GTGTTAAACT ACTGATTCTA ATTGTTTGTG

2101	TATTTTAGAT TCCAACCTAT GGAACTGATG AATGGGAGCA GTGGTGGAAT GCCTTTAATG

2161	AGGAAAACCT GTTTTGCTCA GAAGAAATGC CATCTAGTGA TGATGAGGCT ACTGCTGACT

2221	CTCAACATTC TACTCCTCCA AAAAAGAAGA GAAAGGTAGA AGACCCCAAG GACTTTCCTT

2281	CAGAATTGCT AAGTTTTTTG AGTCATGCTG TGTTTAGTAA TAGAACTCTT GCTTGCTTTG

2341	CTATTTACAC CACAAAGGAA AAAGCTGCAC TGCTATACAA GAAAATTATG GAAAAATATT

2401	CTGTAACCTT TATAAGTAGG CATAACAGTT ATAATCATAA CATACTGTTT TTTCTTACTC

2461	CACACAGGCA TAGAGTGTCT GCTATTAATA ACTATGCTCA AAAATTGTGT ACCTTTAGCT

2521	TTTTAATTTG TAAAGGGGTT AATAAGGAAT ATTTGATGTA TAGTGCCTTG ACTAGAGATC

2581	ATAATCAGCC ATACCACATT TGTAGAGGTT TTACTTGCTT TAAAAAACCT CCCACACCTC

2641	CCCCTGAACC TGAAACATAA AATGAATGCA ATTGTTGTTG TTAACTTGTT TATTGCAGCT

2701	TATAATGGTT ACAAATAAAG CAATAGCATC ACAAATTTCA CAAATAAAGC ATTTTTTTCA

2761	CTGCATTCTA GTTGTGGTTT GTCCAAACTC ATCAATGTAT CTTATCATGT CTGGATCCCC

2821	AGGAAGCTCC TCTGTGTCCT CATAAACCCT AACCTCCTCT ACTTGAGAGG ACATTCCAAT

2881	CATAGGCTGC CCATCCACCC TACTAGTATA TGAAAATATA AAGCGCTTTC TCTTTTAAGT

2941	CTAGGGTAGG TGTACTAGAT CAGCGCTCAG CTCCATACCA TGAAGCCATC CAGGAGTCAG

3001	ACCTCTCTGA CAGCCCTGCC ATTGTCACAG AGAAGTTTCT GTCACCAGTG CTCATGCTGT

3061	CAGAGGAGCG AAGGAGAAAA GATGTGAGAC CTCCCAAGTC AAAGTCATCT ATGGATAAAA

3121	CCTTAGTTGC ATGGCACACC AGTGTTAGGG AGTCGGGGAA ACACAGCCAT AGCCCAGCTT

3181	CCTCTCTGTT CTTGCTCTTA TTACCACCAG AAAGAGGTTG CTTAGACAAC CCAAACCAAG

3241	ACACAGGGCT CTGTGGGAGG GAATCAGTCC CAGGCTTCTG GCACATGCTA TGTCACCGGA

3301	AAGCCCCAGC CCTACTCCGA ATCCCCACAA GTACAGCAAA TATCAGATTA TAGCATTTAA

3361	AGGGGCACTC TTGCCAAAGA GAAGCACCAT TGGAATAGCC ATGCTTGAGA ACTAAGCTTG

3421	GCGTAATCAT GGTCATAGCT GTTTCCTGTG TGAAATTGTT ATCCGCTCAC AATTCCACAC

3481	AACATACGAG CCGGAAGCAT AAAGTGTAAA GCCTGGGGTG CCTAATGAGT GAGCTAACTC

3541	ACATTAATTG CGTTGCGCTC ACTGCCCGCT TTCCAGTCGG GAAACCTGTC GTGCCAGCTG

3601	CATTAATGAA TCGGCCAACG CGCGGGGAGA GGCGGTTTGC GTATTGGGCG CTCTTCCGCT

3661	TCCTCGCTCA CTGACTCGCT GCGCTCGGTC GTTCGGCTGC GGCGAGCGGT ATCAGCTCAC

3721	TCAAAGGCGG TAATACGGTT ATCCACAGAA TCAGGGGATA ACGCAGGAAA GAACATGTGA

3781	GCAAAAGGCC AGCAAAAGGC CAGGAACCGT AAAAAGGCCG CGTTGCTGGC GTTTTTCCAT

3841	AGGCTCCGCC CCCCTGACGA GCATCACAAA AATCGACGCT CAAGTCAGAG GTGGCGAAAC

3901	CCGACAGGAC TATAAAGATA CCAGGCGTTT CCCCCTGGAA GCTCCCTCGT GCGCTCTCCT

3961	GTTCCGACCC TGCCGCTTAC CGGATACCTG TCCGCCTTTC TCCCTTCGGG AAGCGTGGCG

4021	CTTTCTCATA GCTCACGCTG TAGGTATCTC AGTTCGGTGT AGGTCGTTCG CTCCAAGCTG

4081	GGCTGTGTGC ACGAACCCCC CGTTCAGCCC GACCGCTGCG CCTTATCCGG TAACTATCGT

4141	CTTGAGTCCA ACCCGGTAAG ACACGACTTA TCGCCACTGG CAGCAGCCAC TGGTAACAGG

4201	ATTAGCAGAG CGAGGTATGT AGGCGGTGCT ACAGAGTTCT TGAAGTGGTG GCCTAACTAC

4261	GGCTACACTA GAAGAACAGT ATTTGGTATC TGCGCTCTGC TGAAGCCAGT TACCTTCGGA

4321	AAAAGAGTTG GTAGCTCTTG ATCCGGCAAA CAAACCACCG CTGGTAGCGG TGGTTTTTTT

4381	GTTTGCAAGC AGCAGATTAC GCGCAGAAAA AAAGGATCTC AAGAAGATCC TTTGATCTTT

4441	TCTACGGGGT CTGACGCTCA GTGGAACGAA AACTCACGTT AAGGGATTTT GGTCATGAGA

4501	TTATCAAAAA GGATCTTCAC CTAGATCCTT TTAAATTAAA AATGAAGTTT TAAATCAATC

4561	TAAAGTATAT ATGAGTAAAC TTGGTCTGAC AGTTACCAAT GCTTAATCAG TGAGGCACCT

4621	ATCTCAGCGA TCTGTCTATT TCGTTCATCC ATAGTTGCCT GACTCCCCGT CGTGTAGATA

4681	ACTACGATAC GGGAGGGCTT ACCATCTGGC CCCAGTGCTG CAATGATACC GCGAGACCCA

4741	CGCTCACCGG CTCCAGATTT ATCAGCAATA AACCAGCCAG CCGGAAGGGC CGAGCGCAGA

4801	AGTGGTCCTG CAACTTTATC CGCCTCCATC CAGTCTATTA ATTGTTGCCG GGAAGCTAGA

4861	GTAAGTAGTT CGCCAGTTAA TAGTTTGCGC AACGTTGTTG CCATTGCTAC AGGCATCGTG

4921	GTGTCACGCT CGTCGTTTGG TATGGCTTCA TTCAGCTCCG GTTCCCAACG ATCAAGGCGA

4981	GTTACATGAT CCCCCATGTT GTGCAAAAAA GCGGTTAGCT CCTTCGGTCC TCCGATCGTT

5041	GTCAGAAGTA AGTTGGCCGC AGTGTTATCA CTCATGGTTA TGGCAGCACT GCATAATTCT

5101	CTTACTGTCA TGCCATCCGT AAGATGCTTT TCTGTGACTG GTGAGTACTC AACCAAGTCA

5161	TTCTGAGAAT AGTGTATGCG GCGACCGAGT TGCTCTTGCC CGGCGTCAAT ACGGGATAAT

5221	ACCGCGCCAC ATAGCAGAAC TTTAAAAGTG CTCATCATTG GAAAACGTTC TTCGGGGCGA

5281	AAACTCTCAA GGATCTTACC GCTGTTGAGA TCCAGTTCGA TGTAACCCAC TCGTGCACCC

5341	AACTGATCTT CAGCATCTTT TACTTTCACC AGCGTTTCTG GGTGAGCAAA AACAGGAAGG

5401	CAAAATGCCG CAAAAAAGGG AATAAGGGCG ACACGGAAAT GTTGAATACT CATACTCTTC

5461	CTTTTTCAAT ATTATTGAAG CATTTATCAG GGTTATTGTC TCATGAGCGG ATACATATTT

5521	GAATGTATTT AGAAAAATAA ACAAATAGGG GTTCCGCGCA CATTTCCCCG AAAAGTGCCA

5581	CCTGACGTCT AAGAAACCAT TATTATCATG ACATTAACCT ATAAAAATAG GCGTATCACG

5641	AGGCCCTTTC GTC

SEQ ID NO: 26 (reverse PCR primer in the SV40 early promoter)

1	AGATGCATGC TTTGCATACT TCTGCCTGC

SEQ ID NO: 27 (donor plasmid for inserting GFP into FRT Insertion target sequence)

1	GACGGATCGG GAGATCTCCC GATCCCCTAT GGTGCACTCT CAGTACAATC TGCTCTGATG

61	CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG

121	CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC

181	TTAGGGTTAG GCGTTTTGCG CTGCTTCGCG ATGTACGGGC CAGATATACG CGTTGACATT

241	GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA

301	TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC

361	CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC

421	ATTGACGTCA ATGGGTGGAG TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT

481	ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT

541	ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA

601	TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG

661	ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC

721	AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG

781	GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA

841	CTGCTTACTG GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTGGCTAGC

901	GTTTAAACTT AAGCTTAGCC ACCaTGGTGA GCAAGGGCGA GGAGCTGTTC ACCGGGGTGG

961	TGCCCATCCT GGTCGAGCTG GACGGCGACG TAAACGGCCA CAAGTTCAGC GTGTCCGGCG

1021	AGGGCGAGGG CGATGCCACC TACGGCAAGC TGACCCTGAA GTTCATCTGC ACCACCGGCA

1081	AGCTGCCCGT GCCCTGGCCC ACCCTCGTGA CCACCCTGAC CTACGGAGTG CAGTGCTTCA

1141	GCCGCTACCC CGACCACATG AAGCAGCACG ACTTCTTCAA GTCCGCCATG CCCGAAGGCT

1201	ACGTCCAGGA GCGCACCATC TTCTTCAAGG ACGACGGCAA CTACAAGACC CGCGCCGAGG

1261	TGAAGTTCGA GGGCGACACC CTGGTGAACC GCATCGAGCT GAAGGGCATC GACTTCAAGG

1321	AGGACGGCAA CATCCTGGGG CACAAGCTGG AGTACAACTA CAACAGCCAC AACGTCTATA

1381	TCATGGCCGA CAAGCAGAAG AACGGCATCA AGGTGAACTT CAAGATCCGC CACAACATCG

1441	AGGACGGCAG CGTGCAGCTC GCCGACCACT ACCAGCAGAA CACCCCCATC GGCGACGGCC

1501	CCGTGCTGCT GCCCGACAAC CACTACCTGA GCACCCAGTC CGCCCTGAGC AAAGACCCCA

1561	ACGAGAAGCG CGATCACATG GTCCTGCTGG AGTTCGTGAC CGCCGCCGGG ATCACTCTCG

1621	GCATGGACGA GCTGTACAAG TAAGGATCCA CTAGTCCAGT GTGGTGGAAT TCTGCAGATA

1681	TCCAGCACAG TGGCGGCCGC TCGAGTCTAG AGGGCCCGTT TAAACCCGCT GATCAGCCTC

1741	GACTGTGCCT TCTAGTTGCC AGCCATCTGT TGTTTGCCCC TCCCCCGTGC CTTCCTTGAC

1801	CCTGGAAGGT GCCACTCCCA CTGTCCTTTC CTAATAAAAT GAGGAAATTG CATCGCATTG

1861	TCTGAGTAGG TGTCATTCTA TTCTGGGGGG TGGGGTGGGG CAGGACAGCA AGGGGGAGGA

1921	TTGGGAAGAC AATAGCAGGC ATGCTGGGGA TGCGGTGGGC TCTATGGCTT CTGAGGCGGA

1981	AAGAACCAGC TGGGGCTCTA GGGGGTATCC CCACGCGCCC TGTAGCGGCG CATTAAGCGC

2041	GGCGGGTGTG GTGGTTACGC GCAGCGTGAC CGCTACACTT GCCAGCGCCC TAGCGCCCGC

2101	TCCTTTCGCT TTCTTCCCTT CCTTTCTCGC CACGTTCGCC GGCTTTCCCC GTCAAGCTCT

2161	AAATCGGGGG CTCCCTTTAG GGTTCCGATT TAGTGCTTTA CGGCACCTCG ACCCCAAAAA

2221	ACTTGATTAG GGTGATGGTT CACGTACCTA GAAGTTCCTA TTCCGAAGTT CCTATTCTCT

2281	AGAAAGTATA GGAACTTCCT TGGCCAAAAA GCCTGAACTC ACCGCGACGT CTGTCGAGAA

2341	GTTTCTGATC GAAAAGTTCG ACAGCGTCTC CGACCTGATG CAGCTCTCGG AGGGCGAAGA

2401	ATCTCGTGCT TTCAGCTTCG ATGTAGGAGG GCGTGGATAT GTCCTGCGGG TAAATAGCTG

2461	CGCCGATGGT TTCTACAAAG ATCGTTATGT TTATCGGCAC TTTGCATCGG CCGCGCTCCC

2521	GATTCCGGAA GTGCTTGACA TTGGGGAATT CAGCGAGAGC CTGACCTATT GCATCTCCCG

2581	CCGTGCACAG GGTGTCACGT TGCAAGACCT GCCTGAAACC GAACTGCCCG CTGTTCTGCA

2641	GCCGGTCGCG GAGGCCATGG ATGCGATCGC TGCGGCCGAT CTTAGCCAGA CGAGCGGGTT

2701	CGGCCCATTC GGACCGCAAG GAATCGGTCA ATACACTACA TGGCGTGATT TCATATGCGC

2761	GATTGCTGAT CCCCATGTGT ATCACTGGCA AACTGTGATG GACGACACCG TCAGTGCGTC

2821	CGTCGCGCAG GCTCTCGATG AGCTGATGCT TTGGGCCGAG GACTGCCCCG AAGTCCGGCA

2881	CCTCGTGCAC GCGGATTTCG GCTCCAACAA TGTCCTGACG GACAATGGCC GCATAACAGC

2941	GGTCATTGAC TGGAGCGAGG CGATGTTCGG GGATTCCCAA TACGAGGTCG CCAACATCTT

3001	CTTCTGGAGG CCGTGGTTGG CTTGTATGGA GCAGCAGACG CGCTACTTCG AGCGGAGGCA

3061	TCCGGAGCTT GCAGGATCGC CGCGGCTCCG GGCGTATATG CTCCGCATTG GTCTTGACCA

3121	ACTCTATCAG AGCTTGGTTG ACGGCAATTT CGATGATGCA GCTTGGGCGC AGGGTCGATG

3181	CGACGCAATC GTCCGATCCG GAGCCGGGAC TGTCGGGCGT ACACAAATCG CCCGCAGAAG

3241	CGCGGCCGTC TGGACCGATG GCTGTGTAGA AGTACTCGCC GATAGTGGAA ACCGACGCCC

3301	CAGCACTCGT CCGAGGGCAA AGGAATAGCA CGTACTACGA GATTTCGATT CCACCGCCGC

3361	CTTCTATGAA AGGTTGGGCT TCGGAATCGT TTTCCGGGAC GCCGGCTGGA TGATCCTCCA

3421	GCGCGGGGAT CTCATGCTGG AGTTCTTCGC CCACCCCAAC TTGTTTATTG CAGCTTATAA

3481	TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT TTTCACTGCA

3541	TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGTA TACCGTCGAC

3601	CTCTAGCTAG AGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA ATTGTTATCC

3661	GCTCACAATT CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT GGGGTGCCTA

3721	ATGAGTGAGC TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA

3781	CCTGTCGTGC CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT

3841	TGGGCGCTCT TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG

3901	AGCGGTATCA GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG GGGATAACGC

3961	AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA AGGCCGCGTT

4021	GCTGGCGTTT TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG

4081	TCAGAGGTGG CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC

4141	CCTCGTGCGC TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG CCTTTCTCCC

4201	TTCGGGAAGC GTGGCGCTTT CTCATAGCTC ACGCTGTAGG TATCTCAGTT CGGTGTAGGT

4261	CGTTCGCTCC AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC GCTGCGCCTT

4321	ATCCGGTAAC TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC CACTGGCAGC

4381	AGCCACTGGT AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA

4441	GTGGTGGCCT AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA

4501	GCCAGTTACC TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGCTGG

4561	TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA

4621	AGATCCTTTG ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG

4681	GATTTTGGTC ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA ATTAAAAATG

4741	AAGTTTTAAA TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT ACCAATGCTT

4801	AATCAGTGAG GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG TTGCCTGACT

4861	CCCCGTCGTG TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA GTGCTGCAAT

4921	GATACCGCGA GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC AGCCAGCCGG

4981	AAGGGCCGAG CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT CTATTAATTG

5041	TTGCCGGGAA GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG TTGTTGCCAT

5101	TGCTACAGGC ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA GCTCCGGTTC

5161	CCAACGATCA AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG TTAGCTCCTT

5221	CGGTCCTCCG ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA TGGTTATGGC

5281	AGCACTGCAT AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG TGACTGGTGA

5341	GTACTCAACC AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT CTTGCCCGGC

5401	GTCAATACGG GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA TCATTGGAAA

5461	ACGTTCTTCG GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA GTTCGATGTA

5521	ACCCACTCGT GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG TTTCTGGGTG

5581	AGCAAAAACA GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC GGAAATGTTG

5641	AATACTCATA CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT ATTGTCTCAT

5701	GAGCGGATAC ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT

5761	TCCCCGAAAA GTGCCACCTG ACGTC

SEQ ID NO: 28 (reverse PCR primer in the hygromycin-resistance gene)

1	CAGAAACTTC TCGACAGACG TCGCGGTGAG

SEQ ID NO: 29 (CHOX-45/46 amino acid sequence)

1	MAPKKKRKVH MNTKYNKEFL LYLAGFVDGD GSICASIRPE QERKFKHRLV LRFEVTQKTQ

61	RRWFLDKLVD EIGVGYVYDS GSVSRYYLSQ IKPLHNFLTQ LQPFLKLKQK QANLVLKIIE

121	QLPSAKESPD KFLEVCTWVD QIAALNDSKT RKTTSETVRA VLDSLPGSVG GLSPSQASSA

181	ASSASSSPGS GISEALRAGA GSGTGYNKEF LLYLAGFVDG DGSIFATICP RQQYKFKHQL

241	RLRFEVDQKT QRRWFLDKLV DEIGVGYVYD LGSVSRYGLS EIKPLHNFLT QLQPFLKLKQ

301	KQANLVLKII EQLPSAKESP DKFLEVCTWV DQIAALNDSK TRKTTSETVR AVLDSLSEKK

361	KSSP

SEQ ID NO: 30 (CHOX-45/46 recognition site sequence)

1	CAGCACGTCT CACCCCACCC CT

SEQ ID NO: 31 (CHOX-45/46 forward screening primer)

1	GGAATCTGAC TGTGGTAAGC CTGTACAC

SEQ ID NO: 32 (CHOX-45/46 reverse screening primer)

1	CAGCACTCAG GAGGTAGAGG CAGG

SEQ ID NO: 33 (artificial splice acceptor)

1	TCTTACTGAC ATCCACTTTG CCTTTCTCTC CACAGG

SEQ ID NO: 34 (SV40 polyadenylation signal)

1	ACTTGTTTAT TGCAGCTTAT AATGGTTACA AATAAAGCAA TAGCATCACA AATTTCACAA

61	ATAAAGCATT TTTTTCACTG CATTCTAGTT GTGGTTTGTC CAAACTCATC AATGTATCTT

121	ATCATGTCTG

SEQ ID NO: 35 (BGH polyadenylation signal)

1	CTGTGCCTTC TAGTTGCCAG CCATCTGTTG TTTGCCCCTC CCCCGTGCCT TCCTTGACCC

61	TGGAAGGTGC CACTCCCACT GTCCTTTCCT AATAAAATGA GGAAATTGCA TCGCATTGTC

121	TGAGTAGGTG TCATTCTATT CTGGGGGGTG GGGTGGGGCA GGACAGCAAG GGGGAGGATT

181	GGGAAGACAA TAGCAGGCAT GCTGGGGATG CGGTGGGCTC TATGG
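
The entries above are printed as a position index followed by blocks of ten residues. The following is a minimal sketch, assuming that layout, of how an entry could be flattened into a single string and how a reverse primer could be located on a template by its reverse complement. The function names are hypothetical and are not part of the application. The example uses SEQ ID NO: 19 and the final two printed lines of SEQ ID NO: 20.

```python
import re

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def flatten_listing(lines):
    """Join listing lines into one upper-case sequence.

    Assumes each non-empty line starts with a position index
    (e.g. "1801") followed by whitespace and space-separated blocks.
    """
    residues = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        body = re.sub(r"^\d+\s+", "", line)  # drop the position index
        residues.append(body.replace(" ", ""))  # drop block spacing
    return "".join(residues).upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_primer_site(template, primer):
    """Return the 0-based top-strand index where a reverse primer
    would anneal, or -1 if its reverse complement is not found."""
    return template.find(reverse_complement(primer))


# Final two printed lines of SEQ ID NO: 20.
template_tail = flatten_listing([
    "1741\tTTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA TTAGGGTGAT",
    "",
    "1801\tGGTTCACGTA GTGGGCCATC G",
])
# SEQ ID NO: 19.
reverse_primer = flatten_listing(["1\tCGATGGCCCA CTACGTGAAC CATCACC"])

site = reverse_primer_site(template_tail, reverse_primer)
print(site)
```

A return value of -1 would mean the primer's reverse complement does not occur in the supplied fragment. Exact string matching is a simplification, since real primer design tools also tolerate mismatches.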

Claims

1-19. (canceled)

20. A method for inserting an exogenous sequence into an amplifiable locus of a mammalian cell comprising:

(a) providing a mammalian cell having an endogenous target site proximal to a selectable gene within the amplifiable locus, wherein the endogenous target site comprises:

(i) a recognition sequence for an engineered meganuclease;

(ii) a 5′ flanking region 5′ to the recognition sequence; and

(iii) a 3′ flanking region 3′ to the recognition sequence; and

(b) introducing a double-stranded break between the 5′ and 3′ flanking regions of the endogenous target site;

(i) a donor 5′ flanking region homologous to the 5′ flanking region of the endogenous target site;

(ii) an exogenous sequence; and

(iii) a donor 3′ flanking region homologous to the 3′ flanking region of the endogenous target site;

whereby the donor 5′ flanking region, the exogenous sequence and the donor 3′ flanking region are inserted between the 5′ and 3′ flanking regions of the endogenous target site by homologous recombination to provide a modified cell.

21. The method of claim 20, further comprising growing the modified cell in the presence of a compound that inhibits the function of the selectable gene to amplify the copy number of the selectable gene.

22. The method of claim 20, wherein the exogenous sequence comprises a gene of interest.

23. The method of claim 20, wherein the endogenous target site is downstream from the 3′ regulatory region of the selectable gene.

24. The method of claim 23, wherein the endogenous target site is 0 to 100,000 base pairs downstream from the 3′ regulatory region of the selectable gene.

25. The method of claim 20, wherein the endogenous target site is upstream from the 5′ regulatory region of the selectable gene.

26. The method of claim 25, wherein the endogenous target site is 0 to 100,000 base pairs upstream from the 5′ regulatory region of the selectable gene.

27. The method of claim 20, wherein the selectable gene is glutamine synthetase (GS) and the locus is methionine sulphoximine (MSX) amplifiable.

28. The method of claim 20, wherein the selectable gene is dihydrofolate reductase (DHFR) and the locus is Methotrexate (MTX) amplifiable.

29. The method of claim 20, wherein the selectable gene is selected from the group consisting of Dihydrofolate Reductase, Glutamine Synthetase, Hypoxanthine Phosphoribosyltransferase, Threonyl tRNA Synthetase, Na,K-ATPase, Asparagine Synthetase, Ornithine Decarboxylase, Inosine-5′-monophosphate dehydrogenase, Adenosine Deaminase, Thymidylate Synthetase, Aspartate Transcarbamylase, Metallothionein, Adenylate Deaminase (1,2), UMP-Synthetase and Ribonucleotide Reductase.

30. The method of claim 29, wherein the selectable gene is amplifiable by selection with a selection agent selected from the group consisting of Methotrexate (MTX), Methionine sulphoximine (MSX), Aminopterin, hypoxanthine, thymidine, Borrelidin, Ouabain, Albizziin, Beta-aspartyl hydroxamate, alpha-difluoromethylornithine (DFMO), Mycophenolic Acid, Adenosine, Alanosine, 2′ deoxycoformycin, Fluorouracil, N-Phosphonacetyl-L-Aspartate (PALA), Cadmium, Adenine, Azaserine, Coformycin, 6-azauridine, pyrazofuran, hydroxyurea, motexafin gadolinium, fludarabine, cladribine, gemcitabine, tezacitabine and triapine.

31-54. (canceled)

55. A recombinant meganuclease comprising a polypeptide having at least 75%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 9.

56. The recombinant meganuclease of claim 55, having the sequence of the meganuclease of SEQ ID NO: 9.

57. A recombinant meganuclease which recognizes and cleaves a recognition site having at least 75%, 85%, 90%, 95%, 97%, 98% or 99% sequence identity to SEQ ID NO: 7.

58. The recombinant meganuclease of claim 57, wherein the meganuclease recognizes and cleaves a recognition site of SEQ ID NO: 7.

59-70. (canceled)
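
Claims 55 and 57 recite percent sequence identity thresholds. The following is a minimal sketch of one way such a percentage could be computed. It assumes a simple ungapped, position-by-position comparison of equal-length sequences, which may differ from the alignment method the specification defines. The function names and the variant sequence are hypothetical. SEQ ID NO: 7 is not reproduced in this part of the listing, so the 22 base pair recognition site of SEQ ID NO: 30 is used purely as a stand-in.

```python
# Thresholds recited in claims 55 and 57.
THRESHOLDS = (75, 85, 90, 95, 97, 98, 99)


def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences.

    Assumption: no alignment is performed; positions are compared directly.
    """
    if len(a) != len(b):
        raise ValueError("ungapped comparison requires equal-length sequences")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def thresholds_met(a, b):
    """List the recited thresholds that the pair satisfies."""
    pid = percent_identity(a, b)
    return [t for t in THRESHOLDS if pid >= t]


site = "CAGCACGTCTCACCCCACCCCT"     # SEQ ID NO: 30, used as a stand-in
variant = "CAGCACGTCTCACCCCACCCCA"  # hypothetical: last base T changed to A

print(round(percent_identity(site, variant), 2))
print(thresholds_met(site, variant))
```

With a 22 base pair site, a single substitution leaves 21 of 22 positions identical, so each mismatch costs roughly 4.5 percentage points under this ungapped measure.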
