"Long before it's in the papers"
January 27, 2015


Most “junk” DNA not junk, studies find

Sept. 5, 2012
Courtesy of Nature
The University of Washington
and World Science staff

Far from being junk, the vast majority of our DNA is active in at least one type of cell, according to biologists who made a vast set of new results public Sept. 5.

The scientists, participating in a project known as the Encyclopedia of DNA Elements (ENCODE), published the work in a set of 30 research papers in the journals Nature, Science and Genome Research.

The traditional definition of a gene is a region of genetic code that provides the blueprint for production of a particular molecule, or protein, within the body. DNA that lies outside those regions was, in the early 2000s, considered “junk” DNA with no known function. The view of large zones of DNA as useless has been changing in the past decade, though, and the new findings imply this picture of things may have to be all but abandoned.

DNA formerly called “junk” is involved in important activities called transcription factor association, chromatin structure and histone modification, according to the biologists. These functions ultimately involve influencing the activity of traditional genes.

In an overview paper in Nature, members of the project consortium declared that 80 percent of the human genome has at least one “biochemical activity” assigned to it in at least one cell type. In addition, 99 percent of the genome lies relatively close to a place on the genome where some such activity takes place, suggesting some level of participation.

“The first phase of the Human Genome Project provided the primary genome sequence, and a basic catalog of genes, which occupy only two percent of the genome,” explained John A. Stamatoyannopoulos, a geneticist at the University of Washington, who led several major studies associated with the project.

“Every cell in the body has the same genes, but different kinds of cells, such as liver or heart, switch on different combinations of genes,” he went on. “When cells become unhealthy, these combinations change. Understanding how genes turn on and off is therefore vital to deciphering their role in both normal health and disease. The instructions for how genes are controlled are contained in small DNA ‘switches’ that are scattered around the 98 percent of the genome that does not contain genes.

“Mapping and decoding these instructions is a central mission of the ENCODE project,” he added. “Data generated in this project so far have already shown, for example, that common DNA variations in the gene-controlling switches can affect the risk of developing different common diseases. This finding, together with the emerging wealth of information about the basic mechanisms of gene control, is opening new vistas on preventing, diagnosing, and treating disease.”

Researchers located millions of DNA “switches” that dictate how, when, and where in the body different genes turn on and off. These switches, or regulatory DNA, contain small chains of DNA “words” that make up docking sites for protein molecules. The molecules that dock there are called regulatory proteins because they help control the activity of genes, in particular whether they are turned on or off.

Often the switches are far from the genes they control, Stamatoyannopoulos said. And of the millions of regulatory DNA regions, only a small fraction, around 200,000, are active in any given cell type. This fraction is almost unique to each type of cell, a sort of molecular bar code of its identity. The regulatory “program” of most genes is now thought to have more than a dozen switches.

To find the DNA “words” recognized by the regulatory molecules, researchers said they used a simple, powerful trick to study all the proteins at once. Instead of trying to see proteins directly, they looked for their “footprints” on the DNA. They discovered that over 90 percent of the protein docking sites were slight variants of about 680 different DNA words.

The genome senses and responds to signals received from other parts of the cell and from the environment by changing the activity of regulatory proteins, Stamatoyannopoulos explained. Scientists mapped all of the connections between regulatory protein genes to create a central wiring diagram for the cell. Using powerful computers, they created wiring diagrams of how 475 regulatory protein genes were connected to each other, and how those connections changed across 41 different types of human cells. Even though individual connections between regulatory proteins differed among cell types, the overall layout of the connections was found to be nearly the same in all cell types.

When compared to the best-studied biological network (the map of all connections between nerve cells in the worm brain, created by Nobel Prize winner Sydney Brenner), the layout is almost identical, Stamatoyannopoulos said. So nature seems to have settled on an ideal “brain-like” architecture to process complex biological information; this plan can be found in the genomic wiring of every living cell.

Hundreds of studies have attempted to map the genes causing common diseases and physical traits. Frustratingly, most of these studies have pointed to regions of the genome that don’t contain gene sequences that make protein.

The scientists set out to chart a global map of the relationship between disease-associated genetic changes and the gene-controlling switches scattered around the genome. With support from the U.S. National Institutes of Health, researchers collected regulatory DNA maps from 349 tissue samples covering all major organ systems in adults and stages of human development. Using powerful computers, they crossed these maps with data from genetic studies of over 400 common diseases and clinical traits.

Instead of isolated instances, they found that most disease-associated genetic changes occurred within gene-regulating switches, often located far away from the genes they control. Most changes affected circuits active during early human development, when body tissues are most vulnerable. Extensive blueprints of control circuitry revealed previously hidden connections between diverse diseases, may explain common clinical features, and will open new avenues for developing diagnostics and treatments, the researchers said.

