Thank you Tyra and good morning everyone.
It’s a pleasure to see you all here and
it’s fun to be here to talk about a topic
that’s near and dear to my heart.
So I’m going to cover a fair amount of ground.
I’m glad to answer questions, but I guess most of them are usually at the end.
But if I’m not making myself clear and you
just can’t wait, go ahead.
So first of all my disclosures.
I have no relevant financial relationships.
I do -- I disclose that I’m a real enthusiast
about genetics so I hope that that comes through
It’s a wonderful field if you’re thinking
about going into it.
It underlies all of biology.
It is basically a hunting license to do whatever
you want in biomedicine so I urge you to think
about that career if you’re at that stage
of your career.
So I’m going to talk about some features
of Mendelian disease and then review the rapidly
evolving field of clinical DNA sequencing.
And then I’m going to talk about disease
gene discovery results and tools, and I’ll
focus particularly on the Baylor Hopkins Center
for Mendelian Genomics which is one of now
four centers around the country that are charged
with finding as -- the genes responsible for
as many Mendelian phenotypes as possible.
So we customarily think of Mendelian disease
as being quite rare and yet it is becoming
I see that this slide says this month.
This is actually I think from January 2015
so I apologize for that error.
But the point is in any month, if you look
at all four issues of the New England Journal,
you will see a lot about Mendelian disease.
In this particular month, there were -- let
me see here.
You can see that there were -- can you hear
You can see that, you know, there were typical
Mendelian disorders with onset in childhood,
but look at here.
Here’s one that’s adult coronary artery
disease and if you looked in the editorial
section, there was even an article about ethical
issues about screening for genetic -- monogenic
So there’s a lot of interest about Mendelian
disease throughout the biomedical community
right now and we have -- I’ll talk in a
minute about why that might be so.
I’m going to talk right now about why that
might be so.
So first of all the Genome Project obviously
provided a reference sequence so that made
finding the relevant disease genes much easier.
Obviously the availability of new sequencing
technology, that dramatically decreases costs
and increases throughput, also gave us many
new avenues for finding genes responsible
for Mendelian disease.
And things like the HapMap project and the
1000 Genome Project gave us an appreciation
of the extent of “normal” human genetic
variation, not only in North America and Northern
Europe, but from populations around the world.
So that turns out to be a tremendous resource.
And lastly, there has been the development
of genomic and genetic strategies to identify
responsible variants in genes.
So the first thing you might say is well when
should I think of a Mendelian disorder if
I’m a physician seeing a patient, or I’m
somewhere in the healthcare profession?
For many Mendelian disorders, not all but
for many, the phenotype includes multiple
systems that are not easily related one to
Many Mendelian disorders, but not all, have
relatively early age at onset, often in the
first decade of life.
There are of course -- the recessive ones
are of course increased with consanguineous
unions, and if you find multiple affected
sibs and/or generations then obviously that’s
a pretty key clue.
And if you think about it, there are sort
of inescapable rules of biology about how
genes are transmitted from one generation
to the next, and we use those so called Mendelian
rules to really help us evaluate candidates
for Mendelian disorders.
And it’s one sort of fundamental bedrock
of genetics that whatever you find pretty
much has to be put in this context.
So although we do think about Mendelian disorders
as having their onset in childhood, I would
submit that there are many Mendelian disorders
that present in adult age and that our colleagues
in internal medicine, and I confess I’m
a pediatrician, our colleagues in internal
medicine have to be more alert to the possibility
of Mendelian disorders.
So I just want to make that point by presenting
two families to you that we’ve seen in the
last few years.
So the first is a man who was 34 years old
and he presented to Johns Hopkins Hospital
actually about two and a half years ago.
And he had a fever, 10 day history of pretty
high fever, really bad pharyngitis, and he’d
been treated by his personal physician with
antibiotics -- actually two different antibiotics,
And so the physician, for reasons not known
to me, treated him with a large dose of steroids
Following that intervention, now 10 days into
his illness, the man began -- or now eight
days into his illness, the man began to develop
confusion and that led to him being taken
to a local emergency room where the doctors
were smart enough to think about hyperammonemia.
And they measured his ammonia and it was 10
times normal, 280 micromolar, and he had a
mild respiratory alkalosis which in the presence
of hyperammonemia suggests a urea cycle disorder
because there’s no accumulation of organic
acids and ammonia’s a stimulant for the
central respiratory centers.
So he was rapidly transferred to the Johns
Hopkins Hospital medical intensive care unit.
By the time he arrived two hours later, he
was in the early stages of coma and CT scan
showed mild cerebral edema, and his ammonia
had already risen to 420 micromolar.
For those of you that are not physicians,
he had about one foot and maybe three toes
in the grave at this point.
So he -- the emergency room docs or the MICU
docs did their thing.
One of the things they did is they called
genetics and so I happened to be the attending
and I went with one of our residents, Hans
Bjornsson, to see this man.
So we saw him about 20 minutes after he hit
the MICU, and so like any good geneticist,
one of the first questions we asked was, well, what
is the family history?
We asked this of a fourth year medical student
who was involved in the case, and he said
what is I think the most common response to
that question which is negative.
So I would submit one important take home
lesson from this lecture is unless the person
is adopted and knows nothing about their family,
the family history is never negative.
You may have some pertinent negative results
that help you eliminate certain things, but
the family history always tells you information.
But when you get the family history, you have
to get it and think at the same time, and
sometimes as you think, you’ll come up with
So you have to be willing to go back and forth
with the family as new ideas, new hypotheses
for the diagnosis enter your mind.
So this is the information we got from the
So I said, “What do you mean negative?
Go out and ask the family for more detail.”
The family was assembling in the MICU waiting
So he went out and he came back and it turned
out the family was a little bit more extensive
and so there -- here’s the second version
of the family history.
So what you can see is that the proband indicated
by the red arrow had a brother who died and
he had two male twins -- identical twins or
no, fraternal twins who also died.
Now it turns out the twins died in childbirth
and almost certainly had something else.
The brother that the medical student -- I
said, “What do you mean negative?”
And the brother -- the medical student said,
“Well not to worry, the brother died of
drowning when he was 14 years old.”
So what did I say?
So I said, “Well why did a 14 year old boy
Go back out there and find out.”
So he went back out there and the story was
that the 14 year old brother was on an Outward
Bound-like experience and he developed an upper
respiratory illness and was sick.
And then his campmates reported that he began
to be confused, and by confused they noted
a time when he couldn’t find his hiking
boots and they were right in front of him.
And that night, the night before he died,
they all went into their tents to go to bed.
They were camped by a lakeside and in the
morning when they woke up, they found him
floating off the end of the dock drowned.
So they theorized that he got up in the middle
of the night in his confused state and walked
out on the end of the dock and fell off and
And we actually got the autopsy because he
-- because it was an acute death, he had to
be autopsied out in West Virginia someplace.
And the local coroner said, “You know, the
strange thing about this drowning is I mean
the boy clearly drowned, but he had cerebral
And that’s a phenotype that you never see
with drowning because drowning takes place
I -- to cut to the chase, we got a baby tooth
of this boy and he also had the same urea
cycle disorder that his brother presented
So it turns out that this is late onset ornithine
transcarbamylase deficiency and the -- we’ve
studied this molecularly.
One of our graduate students, Ted Hahn,
found a promoter variant never before
seen in a four base highly conserved element
that’s important for binding of a particular
hepatic -- a liver specific transcription
So we theorized, and he -- Ted actually showed
that it reduces the activity of OTC in reporter
assays and so forth.
So we theorized that this is a promoter mutation,
regulatory mutation that reduced the function
of OTC that the -- both of these boys had
enough OTC activity to get through early years
of their life, but under conditions of severe
stress this genetic vulnerability was brought
out and in both cases led to their death.
Now of course the geneticists in the room
will say well it looks like the mother must
be a carrier.
She’s had two affected sons, and we tested
her and she was a carrier.
And then we tested her sister and she was
also a carrier.
And we wanted to test these two boys, each
of whom is at a one in two risk of having
Both of them are young adults.
Both of them are underachieving in comparison
to their family.
This is a quite sophisticated family and both
of them refused.
They live out in the Midwest and they both
refused to be tested.
So I don’t know what they have, but I’m
suspicious that they might have the same thing
just based on their sort of performance.
So here’s a Mendelian disease lurking in
an adult patient.
The patient is just more vulnerable to particularly
severe environmental stress, namely this bad
infection and a dose of steroids perhaps contributing
Now in case you think that that’s just a
one off example, a few months later Hilary
Vernon, one of my colleagues, was asked to
see this man who is a 54 year old man who
presented to the cardiology clinic with severe
And they noticed that he seemed to have a
-- some features of early onset dementia and
the theory -- going theory was that perhaps
because his congestive heart failure was so
bad that this may just be some low level chronic
CNS insult, but they were worried about his
B12 status and they sent homocysteine level
and the methylmalonic acid level, and both
of them were elevated.
And it turns out that this man has a cobalamin
C form of combined methylmalonic acidemia
He died of his -- shortly thereafter he died
of his cardiac -- his congestive heart failure,
but it turns out his sister also has this
She’s also middle aged.
She’s also intellectually not doing as well
as you might expect for the family.
So here’s another late onset Mendelian disorder.
They’re out there, just have to look for
them, think about them.
So that’s all I’m going to say about the
prominence of Mendelian disorders.
Now I want to talk briefly about finding the
responsible variants in genes.
So I think probably everybody in the audience
knows that geneticists -- human geneticists
since there have been human geneticists have
been interested in finding the genes and variants
responsible for Mendelian phenotypes.
So Archibald Garrod in 1902 reported patients
with alkaptonuria and noticed that the distribution
of affected individuals within families was
entirely consistent with what Gregor Mendel
had described in 1865, and he hypothesized
at that time that maybe alkaptonuria, a disorder
in tyrosine degradation, was in fact one of
these Mendelian disorders that this monk described
more than 30 years earlier in pea plants.
And then the number of recognized human disorders
began to grow and the geneticists, once we
understood that the factors that were responsible
were actually encoded in the DNA, we began
to look for the genes and variants responsible.
And typically we used really tedious strategies,
linkage with collecting large families and
doing linkage analysis or searching for a
chromosomal aberration that pointed to a particular
region in the genome where we might find that
But things changed with the Genome Project
as I said and with the development of next
And I refer you if you’re interested in
this to two papers.
The one on top in particular is really a seminal
paper in this field.
So this was a paper from our colleagues at
the University of Washington, most notably
Mike Bamshad, Debbie Nickerson, and Jay Shendure.
And they were working on the development of
so called next generation sequencing and genomic
-- studying the human genome and they did
a simple experiment really, but it’s very
They said, you know, we’re able now to sequence
the genome and particularly the exome which
was about 1.5 percent of the total genome,
the exome being the coding sequences -- the
protein coding sequences, we’re able to sequence
that pretty well and we have this reference.
So what would be the chance that we -- if
we get a patient with a particular Mendelian
disorder, we could simply do a whole exome
sequence and recognize the variant or variants
that were responsible for the -- for the phenotype?
So that sounds like a straightforward hypothesis,
but the problem is when you do a whole exome
sequence, the problem is fundamentally that
each of us differs by about 3 million single
nucleotide variants from the reference genome.
So you have to find which of those three million
variants, usually a single one of the three
million variants, is really responsible for
Now if you focus on the exome, you cut that
number way down, maybe 25,000 variants from
the reference sequence in someone’s exome,
but you’re still a long ways from figuring
out what the responsible gene is.
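To put rough numbers on that narrowing, here is a minimal Python sketch that uses only the approximate counts quoted in the talk (about 3 million variants genome-wide, about 25,000 in the exome). The variable names are illustrative, and both counts are ballpark figures rather than measurements.

```python
# Approximate counts as quoted in the talk; both are round figures.
genome_variants = 3_000_000   # single nucleotide variants vs. the reference genome
exome_variants = 25_000       # variants left when only the exome is considered

# How much smaller the search space becomes by focusing on the exome.
fold_reduction = genome_variants / exome_variants
remaining_pct = 100 * exome_variants / genome_variants

print(f"Fold reduction: {fold_reduction:.0f}x")
print(f"Variants remaining: {remaining_pct:.2f}% of the genome-wide count")
```

Even after that reduction, tens of thousands of candidates is far too many to evaluate one at a time, which is why further filtering is needed.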
So what they did and I’m not going to dwell
on it, but they took a set of patients with
a very well characterized Mendelian phenotype,
namely Freeman-Sheldon syndrome.
The disease gene was already known, MYH3,
and they said let’s sequence one patient
with Freeman-Sheldon syndrome and see if we
can find the variants, and they found actually
-- they looked only for severe loss of function
variants, indels, splice site changes, and
And sequencing that one patient, they had
several hundred variants that might be candidates
for this particular disease.
So then they said well okay, let’s get another
unrelated patient and we’ll do the same
thing and we’ll look for genes that are
affected in both of these two unrelated individuals.
The hypothesis being since we see that they
both have Freeman-Sheldon syndrome, they should
have a variant in the same gene.
Notice they left out the problem of locus
heterogeneity which would have killed this
experiment, but they did very careful clinical
So they sequenced the second one and they
looked only for genes that had a loss of function
variant in both individuals.
And I forget the exact numbers, but they were
down to about 100 genes at that point.
Then they said,
well, that looks good.
Let’s do another one.
They did another one and they were down to
I think something like seven or eight genes,
and they did a fourth and they only -- there
was only one gene that had loss of function
variants in all four individuals, and that
was MYH3, the gene that they already knew
was responsible for this phenotype.
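The narrowing described here is essentially a repeated set intersection: keep only the genes that carry a severe variant in every unrelated patient. The sketch below illustrates that idea in Python under stated assumptions. The gene sets are made up for illustration (every name except MYH3 is a placeholder), the set sizes are far smaller than the real candidate lists, and the function names are hypothetical, not taken from the original study.

```python
from functools import reduce

# Hypothetical candidate sets: genes carrying a severe variant in each
# unrelated patient. Every name except MYH3 is a placeholder.
patient_candidates = [
    {"MYH3", "GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    {"MYH3", "GENE_A", "GENE_B", "GENE_E"},
    {"MYH3", "GENE_A", "GENE_F"},
    {"MYH3", "GENE_G"},
]

def shared_candidates(gene_sets):
    """Return the genes present in every patient's candidate set."""
    return reduce(set.intersection, gene_sets)

def narrowing_trace(gene_sets):
    """Size of the shared candidate list after adding each patient in turn."""
    shared = set(gene_sets[0])
    sizes = [len(shared)]
    for genes in gene_sets[1:]:
        shared &= genes
        sizes.append(len(shared))
    return sizes

print(narrowing_trace(patient_candidates))
print(shared_candidates(patient_candidates))
```

Note that this only works if all patients really share one causal gene; with locus heterogeneity the intersection could come back empty, which is the caveat raised in the talk.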
So that said unambiguously that you could
use genomic technology based on next generation
sequencing and what we know about the reference
human genome to find the variants responsible
for human disease, and you don’t have to
do big, time-consuming linkage studies or anything
You just have to find some well characterized
patients and sequence those patients either
as singletons in their family or depending
on the inheritance modes, you might want to
take a few other people from the family and
use those Mendelian segregation rules to help
you sort through the variants as well as comparing
patients from one family to the next.
So that said okay guys, this is a new age.
Let’s go get them.
We did a paper shortly thereafter which I
like to think contributed a little bit to
this effort, and that’s the reference below.
And for those of you that are students in
the room, I think this is a very illustrative
We were -- had a speaker at Hopkins, David
Goldstein, a great human geneticist and he
was having lunch with students as we often
-- as often happens.
And he said he was working on whole genome
sequencing in this case and he was looking
to see if he could solve an unsolved Mendelian
disorder using whole genome sequencing.
Now, you know, many of us, myself included,
if we were sitting around the lunch table
and we heard that, we would say great and
then we would forget it a couple hours later
and that would be the end of it.
Fortunately Nara Sobreira, the lead author
on this paper who was at that time a human
genetics graduate student, is quite persistent.
And two days later, she called up David Goldstein
and she said she had a family and she’d
send him the DNA, so she did.
The family was provided by Julie Hoover-Fong,
one of my clinical colleagues, and the family
had something called metachondromatosis and
David did the whole genome sequence in about
two and a half weeks actually.
Now if you do whole genome sequence as I said,
you’re going to find three million single
nucleotide variants compared to the reference
sequence, and you’ll find some structural
variants as well.
So he said wow, this is really a difficult
What can we do to help us?
And so the only reason I mentioned this paper
is because we then went back to genetics.
So this first paper is all genomics.
We used genetics.
We said okay, we actually -- this family was
not big enough to do convincing linkage analysis,
that is to find a region of the genome that
unambiguously harbored the responsible variant.
But recall that linkage is actually very powerful
at eliminating regions of the genome that
can’t possibly have the thing -- have the
So we did some quick SNP linkage
panels on a few other family members, very
cheap compared to whole genome sequencing,
and we looked -- we quickly found six regions
of the genome that could potentially harbor
the responsible gene.
So certainly we hadn’t narrowed it down
dramatically, but actually those six regions
only comprised two percent of the whole genome.
So we eliminated 98 percent of the genome
using that simple genetic trick so I think
of this as combining genomics with classical
And sure enough, under the second linkage
peak that we looked at, there was the responsible
gene with an unambiguous loss of function
mutation and we were able to find another
family with the same phenotype that had a
nonsense mutation in the same gene, PTPN11
And that whole exercise took about six weeks
so at that time that was going pretty fast.
So genomics and particularly genomics combined
with genetics offers powerful reagents or
tools to get at these disorders -- these genes.
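The linkage trick described above can be thought of as a positional filter: discard every variant that falls outside the regions the linkage data still allow. Here is a minimal Python sketch of that filter. The chromosome names, coordinates, gene names, and genome size are all toy values chosen so that the allowed regions cover 2 percent of the toy genome; none of them come from the actual study.

```python
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive

class Variant(NamedTuple):
    chrom: str
    pos: int
    gene: str

def in_any_region(variant, regions):
    """True if the variant lies inside at least one allowed region."""
    return any(
        r.chrom == variant.chrom and r.start <= variant.pos <= r.end
        for r in regions
    )

def filter_by_linkage(variants, regions):
    """Keep only variants in regions not excluded by linkage."""
    return [v for v in variants if in_any_region(v, regions)]

def fraction_of_genome(regions, genome_size):
    """Fraction of the genome covered by the allowed regions."""
    return sum(r.end - r.start + 1 for r in regions) / genome_size

# Toy data: a 1,000,000-base "genome" with two allowed regions.
TOY_GENOME_SIZE = 1_000_000
allowed = [Region("chrA", 1, 10_000), Region("chrB", 5_001, 15_000)]
variants = [
    Variant("chrA", 500, "GENE_X"),     # inside the first region
    Variant("chrB", 100, "GENE_Y"),     # right chromosome, wrong position
    Variant("chrC", 42, "GENE_Z"),      # chromosome fully excluded
    Variant("chrB", 12_000, "GENE_W"),  # inside the second region
]

kept = filter_by_linkage(variants, allowed)
covered = fraction_of_genome(allowed, TOY_GENOME_SIZE)
print([v.gene for v in kept], f"{covered:.0%} of genome retained")
```

The filter is cheap because it needs only coarse genotype data from a few relatives, while the expensive whole genome sequence is done once.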
Okay, so that was a few years ago and with
that sort of stimulus and because of all the
other reasons that I’ve already enumerated,
one of the things that’s going on in the
last few years is what I call the rise of
clinical DNA sequencing.
So those of you that see patients know that
increasingly it’s possible to use molecular
diagnostic tools to make -- to search for
a precise molecular diagnosis in your patient.
So I just want to review that because I find
that people don’t really -- have not really
thought through all of the approaches and
what they mean.
So I organized sequencing -- clinical sequencing
So the first is a very focused search and
that’s a single disease gene, think BRCA1.
And so you have a patient who let’s say
has breast cancer, maybe a positive family
history, and you want to find out if that
patient has breast cancer because they have
a pathological variant in BRCA1.
So you look at that single gene, one of 20,000
Now a second strategy is what’s come to
be called the disease gene panel.
I mentioned the cardiomyopathy patient so
we know of on the order of 25 to 30 genes
that when -- that when certain variants occur
in those genes, the patient will present at
different age ranges with dilated cardiomyopathy.
So there’s a -- several panels that one
can send such patients’ DNA samples and
get tested for all of those 25 or 30 genes.
So it’s a collection of genes, each known
to be responsible for particular disease,
and you’re asking which of these genes if
any is responsible for my patient’s problem.
Then, whole exome sequencing, I’ve already
referred to this.
It’s sequencing the entire exome together
with the splice sites flanking each exon.
And we, by back of the envelope calculations
which I believe have withstood the test of
time, estimated early on that whole exome
sequencing -- that about 85 percent of Mendelian
variants would be found in the exome and in
the flanking intronic splice sites.
And I won’t go into how that comes, but
there’s actually fairly good evidence for
So this is a pretty -- you essentially only
have to sequence one and a half percent of
the genome, but you have a very high chance
of finding the genes that are responsible
for your patient’s problem.
And then there’s whole genome sequencing
that I also mentioned, the sequencing the
entire genome, exons, introns, regulatory
sequences, you know, 1.5 percent of the genome
If you look at what fraction of the genome
is highly conserved evolutionarily, it’s
about five to 10 percent, maybe seven percent.
That means that evolution seems to really
care about seven percent of the genome so
you’re still sequencing a lot of the genome
that perhaps is not really very important
when you do a whole genome sequence.
And obviously we’re much, much, much less
sophisticated in interpreting the results
of variants that we discover in the non-coding
part of the genome as compared to the coding
part of the genome.
So let me diverge briefly to just make sure
everyone’s clear on the difference between
clinical and research whole exome sequencing.
So research whole exome sequencing typically
you have a clinical diagnosis, but you don’t
know what gene is responsible and you want
to find that gene for this particular phenotype,
the gene that’s responsible for this particular
So you typically sequence multiple members
of a family, maybe two affected and one unaffected
or maybe the proband and the two parents,
depending on the inheritance model, and then
what samples are available.
Speed is typically not that critical so it
may be months going on here.
Surveys all 20 or 21,000 protein coding genes.
It requires validation once you find some
candidate genes and variants, and we do that
validation by segregation within the family
to the extent that we have family members
and depending on what we think the inheritance
pattern is, and functional studies of the
candidate genes to make sure that the variants
do what we think they do.
Then there’s clinical whole exome sequencing
and there are a number now of commercial companies
that provide this service.
So typically a physician sees a patient in
a clinic and doesn’t know the diagnosis
and says I’m -- rather than sort of spend
a lot of time working up this -- doing various
sort of classical workups, I’m just going
to send the clinical whole exome test and
see what this tells me.
And you typically send the patient, and for
some of the commercial operations you also
send the parents or some other family member,
and they may do the proband’s clinical whole
And then if they find something that looks
interesting, they may look at that particular
variant in other members of the family or
they may not.
And this is the key point: for most clinical
whole exome services, the genes they look
at are the known disease genes.
So in other words, right now, I’ll show
you later, there are about 3,500 known disease
genes out of the 20,000 in your genome.
So those, although they sequenced the whole
exome, they focus on those known disease genes.
So in that sense the efforts for the Mendelian
centers and other investigators doing research,
and finding and validating disease genes through
research whole exomes then provide the knowledge
for the commercial clinical services to offer
So I didn’t make a slide of it, but I was
looking at a company in Europe last night
on their website and it says on the website,
you open it up and it says clinical whole
So I read that and I think okay, they’re
surveying 20,000 genes, but then they say
we give you results on 2,800 genes.
So that’s -- they’re really only giving
you results on maybe 10 or 15 percent of the
genes in the genome.
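The arithmetic behind that estimate, and the idea of reporting only on a list of known disease genes, can be sketched in a few lines of Python. The gene counts are the approximate figures quoted in the talk; the function names, the variant records, and the placeholder gene names are assumptions made purely for illustration and do not describe any particular company's pipeline.

```python
# Counts quoted in the talk (approximate).
TOTAL_GENES = 20_000
KNOWN_DISEASE_GENES = 3_500
COMPANY_REPORTED_GENES = 2_800

def percent_of_all_genes(n_genes, total=TOTAL_GENES):
    """Percentage of all protein-coding genes covered by a reporting list."""
    return 100 * n_genes / total

def restrict_to_reportable(variants, reportable_genes):
    """Keep only variants that fall in genes on the reporting list."""
    return [v for v in variants if v["gene"] in reportable_genes]

# Hypothetical example: gene names are placeholders, not real findings.
reportable = {"GENE_A", "GENE_B"}
called = [
    {"gene": "GENE_A", "change": "variant 1"},
    {"gene": "GENE_X", "change": "variant 2"},  # sequenced, but not on the list
]
reported = restrict_to_reportable(called, reportable)

print(percent_of_all_genes(COMPANY_REPORTED_GENES))  # share for the 2,800-gene list
print(percent_of_all_genes(KNOWN_DISEASE_GENES))     # share for ~3,500 known genes
print(reported)
```

The variant in the gene that is not on the list is still sequenced; it simply never reaches the report, which is the practical difference between clinical and research whole exome sequencing.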
Now eventually as the research progresses,
they’ll give a higher and higher fraction,
but that’s the relationship between research
whole exome sequencing and clinical whole
Now one -- two other things that you have
to be clear on when you order these kinds
There are some unanticipated or if you think
about it, they’re actually anticipated,
consequences of large scale DNA sequencing.
The first is our state of knowledge right
now, it’s imperfect so you will find -- you
absolutely if you cast a broad enough net,
you will absolutely find variants that you’re
not sure how to interpret, and they’ve come
to be called variants of unknown significance
And I’ll tell you how many you find in a
minute, and then you also may find incidental
findings of great medical consequence.
So you may, you know, be let’s say doing
a clinical whole exome on a child who has
a developmental defect or something like that
and so you want to find the gene that is responsible
for the developmental defect.
And you do that whole exome sequencing and
you find a well-known pathological variant
Now when the family gave permission to do
that test, they were not thinking about BRCA1
and that variant almost certainly had nothing
to do with why the exome was ordered, but
now you have a piece of information that may
be very relevant to that individual’s long-term
And that variant may also be in other members
of the family.
So you found out some information, not only
about your patient, but also about family
So the best way to deal with this possibility
is to discuss it with the family before you
send the test so that everybody is -- got
their eyes wide open about what you’re doing.
Medicine has always picked up incidental findings.
You know, you send -- you think the patient
is anemic, you send a CBC and you discover
that they have leukemia or something like
But what’s different about these findings
is that they may predict illness way down
the road that have absolutely nothing to do
with why you ordered the test.
And they also may provide information that’s
relevant to other family members who are not
even your patients.
Okay, so let’s look back at those four classes
of sequencing approaches.
So starting on the first row up here, single
gene testing -- BRCA1 already mentioned -- costs
several hundred dollars to a few thousand
-- actually a few thousand.
It’s less expensive, or it’s relatively
inexpensive if you’re correct.
That is maybe you spend $2,500, but you get
You have fewer variants of unknown significance
because you’re only looking at the variants
in a particular gene.
And very often those genes have been pretty
well studied, so you find relatively small
You’ll find occasional, but relatively small
numbers of variants of unknown significance,
and no incidental findings because you’re
only thinking about this particular gene.
The second category is some sort of disease
I mentioned cardiomyopathy -- maybe 25 -- depending
on when you did the test, the numbers going
Cost is quite similar, actually, several hundred
to a few thousand dollars.
It’s a broader net.
It’s less expensive on a per-gene basis.
And, but you will find more variants of unknown
You won’t find incidental findings because
you’re really just looking at the cardiomyopathy
genes; you’re not looking beyond that.
Now what about a Whole Exome Sequence?
A so-called clinical Whole Exome Sequence?
So currently you can get them for around $5,000.
It’s a much broader net, a bargain on the
per-gene basis, right?
But you will find, absolutely, many variants
of unknown significance.
So you’ll need to counsel the family about
those variants of unknown significance or
you will have to build in some approach that
you’ve agreed beforehand to set those aside.
And you’ll find incidental findings.
I think most groups now are reporting, if
you just consider these so-called 56 American
College of Medical Genetics genes where a
panel of experts decided that we -- that there
were reportable and actionable incidental
And you say, “How often do you find variants
in those 56 genes which seem to be significant?”
Most people who are doing a lot of Whole Exome
Sequencing are finding on the order of one
to three percent of the people they do Whole
Exome Sequencing on will have incidental findings;
and that small number, 56, very solidly known
And then a whole genome sequence.
Largely a research tool at this time, but
several companies are beginning to suggest
It’s a broader net, still.
It’s the broadest net we can currently cast;
although RNA-Seq will be coming down the pike.
And it’s much, much, much harder to interpret.
You will find variants of unknown significance
and incidental findings galore.
So one take home message is that if you’re
going to use this outside of the research
setting, you should -- we think -- build in
a good bit of genetic counseling time for
those subjects that have this to explain all
Now what is -- as I’ve indicated clinical
-- particularly clinical whole exome panels,
genes, and clinical whole exome is a growing
So what have been the outcomes?
So we’re beginning to see publications now
that are looking to see what has been the
consequence of this.
So the first publication, I think, of any
size was from Baylor College of Medicine that
very quickly opened a commercial lab associated
with their genetics group to provide clinical
Whole Exome Sequencing.
So they reported in this reference on the
first 2,000 samples they did; 88 percent were
in the pediatric age range.
They made a molecular diagnosis in 25 percent
of these patients.
So that’s a pretty good return on a diagnostic
test rate, right?
And, interestingly, 58 percent of the diagnostic
mutations had not previously been reported.
That is to say, they found a loss of function
allele in a gene that was known to cause a
phenotype when it had loss of function.
And so this is just a new loss of function
variant in this known disease gene.
The frequencies of the various inheritance patterns
are shown there for the solved cases.
A key thing is that 30 percent of the diagnoses
involved a disease gene that was identified
in the last three years.
So this gets back to this -- the research
community, particularly, research whole exomes
pumping in new disease genes and those new
disease genes then can add to the list of
genes that the clinical WES can interpret
So it’s really going up like a rocket right
And one interesting feature, which has been
found over and over again now, is that 23
of the patients for which they got an answer,
or 4.6 percent, actually had what they call
“the blended phenotype” from two different
So in medicine -- you know, it’s sort of
an Occam’s razor approach -- and you’re
trying always to find a diagnosis that will
explain everything about your patient.
So one of the reasons that these patients
were difficult to diagnose is because they
actually had two diseases -- two rare diseases
-- in one.
And the phenotype had features of both of
And so clinical geneticists were not able
to recognize what it was.
So very interesting.
Now GeneDx, another private laboratory service
here in Rockville, Maryland, does excellent
work, very shortly thereafter reported 3,040
consecutive probands nearly all in the pediatric
They made a molecular diagnosis in 851 or
28.8 percent, roughly the same as the Baylor
lab had found.
And, again, 28 of the patients or 3.3 percent
had two or three Mendelian disorders.
And this graph, which I won’t say much about,
but shows the test yield in terms of percentage
of positive results by the particular systems
that were involved.
So actually the highest system is hearing
loss, which is already known to have a huge
contribution of genetic causation to isolated
So those two studies were largely pediatric.
Baylor recently reported 486 consecutive adult
patients 18 or older, and they made a molecular
diagnosis a little bit less often in this
older group: 17.5 percent.
And they found six or seven percent with two
And this graph shows the diagnostic rate with
the age of the patient in years.
So the older the patient got, the less chance
they had of finding a straight-forward Mendelian
And this is a plot much like the GeneDx plot
and it shows the success rate by indication
and the overall diagnostic rate of 17.5 percent.
So even in an adult population, at least young
and middle aged adults suspected of having
a Mendelian disease, this turns out to be
a very high yield diagnostic service.
Now, for those of you that are not physicians
in the room, let me just emphasize some values
for having a precise diagnosis.
So physicians are trying to explain the phenotype
of the clinical problem of their patients
so they can have a continuous diagnostic work
up until they get the answer.
So this stops that diagnostic work up; it
It ends the uncertainty of the diagnostic
This is the term that’s been given to families
or patients that keep coming back to medical
attention and trying over and over again to
find out what in the world is their problem.
It turns out that if you have a child with
a problem, or you, yourself, have a problem,
there’s a -- for most affected individuals
-- there’s a strong urge to find out exactly
what you have.
And that you’re not -- when you go to your
doctor and say, “I’ve got this problem
and that problem,” you’re not crazy, you
actually have some problem.
And it provides a biological explanation for
So over and over again, those of you that
have been to a genetics clinic, if you talk
to parents who have a child with some genetic
disorder, the parents will say things like,
“Well, you know, I thought, actually, you
know, three months into this pregnancy I fell
on the ice.
I took a bad fall.
And I always thought that the reason that
my baby had this problem was because I fell
And you say, “No, actually, this is a straight-forward
The fact that you fell down, or that you had
a glass of wine, or you had a cold or something
like that, is irrelevant to this problem.”
It puts the focus on patient management.
And it focuses the patient management; now
you know what you’re dealing with.
And so you can draw from experience with other
people with that problem.
And it informs the family of the recurrence
In other words, you know, if it’s a recessive
disorder, they have a one in four or 25 percent
chance of having another.
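The one-in-four figure can be checked by enumerating the allele combinations two carrier parents can pass on. The following is only a minimal sketch: the allele labels "A" (working copy) and "a" (disease allele) and the function name are illustrative assumptions, not anything from the talk.

```python
from fractions import Fraction
from itertools import product

def offspring_genotype_probs(parent1, parent2):
    """Enumerate the equally likely allele pairs a child can inherit."""
    counts = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted((a, b)))  # "Aa" and "aA" are the same genotype
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

# Assumption: both parents are unaffected carriers of a recessive allele "a".
carrier = ("A", "a")
probs = offspring_genotype_probs(carrier, carrier)
recurrence_risk = probs["aa"]  # child inherits the disease allele from both parents
print(f"recurrence risk per pregnancy: {recurrence_risk}")
```

Each pregnancy is an independent draw, so the risk is the same for every subsequent child regardless of how earlier children turned out.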
And I certainly have been in the -- I’ve
had the unfortunate experience -- I remember
a case of Hurler’s syndrome which is a very
high burden lysosomal storage disease.
Patient was referred relatively late so the
patient was about 18-months-old and the family
came in with this 18-month-old boy that from
down the hall you could tell had Hurler’s
But they had a three-month-old child sitting
on the mother’s knee.
And I could tell that that three-month-old
child also probably had that disorder.
And they’ve now -- both of those kids have
But if the diagnosis had been made quickly,
and the family informed, then they would not
have had to go through six or eight years
of very high burden chronic illness with those
So that’s a big benefit.
So I’ll give you this one example.
This is a patient that I’ve been following
for 36 years.
He’s 39 right now.
In fact, I’m scheduled to see him two weeks
He had recurrent episodes of lactic acidosis
from early childhood.
He had diminished intellectual function for
his family with an I.Q. of 65, and cortical
atrophy on his CNS imaging studies.
He had mild to moderate cardiomyopathy, and
he had prominent dysfunction of his autonomic
nervous system, constipation, postural hypotension,
other such things as that.
And he would come in with these episodes of
recurrent lactic acidosis.
We would say over and over again, “This
is -- something is wrong with the function
of your mitochondria.
This is some sort of mitochondrial misfunction.
But we’re not sure what it is.”
Several years ago, we finally were able to
get money together to do -- to sequence his
And I told the mother that, “You know, if
his problem was as I suspected, mitochondrial,
it could either be in the mitochondrial genome
or the nuclear genome.
At least we could check out the mitochondrial
And it turned out to be normal.
So I had to go back to her and say, “Well,
the mitochondrial genome is normal, so I’m
thinking it’s probably a mitochondrial -- a
gene that encodes the mitochondrial protein
in the nuclear genome.”
At that point it was out of the question,
not only for that family, but just in general,
to sequence, let’s say, a whole exome.
But, eventually, about two years ago, we -- she
got financial resources and insurance to actually
pay for a clinical Whole Exome Sequence, and
he has a homozygous nonsense mutation in a
gene called FBXL4 -- never heard of it before
until the test was done.
But it is a previously described three or
four other patients mitochondrial DNA depletion
The encoded protein is necessary for proper
replication of mitochondrial DNA.
And so if you lack that protein, your mitochondria
don’t have as many mitochondrial genomes
as they should.
The end result is your mitochondria don’t
So I had the pleasure, actually, of telling
the mother that after 36 years, I finally
had a diagnosis.
The mother was incredibly relieved, actually,
to know exactly what this is.
I couldn’t -- I said, “You know, I don’t
-- I really don’t -- there’s nothing I
can do about this.”
So it’s not that it’s going to lead to
a better treatment.
Maybe down the road it will, but not right
But at least we know.
And the relief of just having the knowledge
of exactly what was the etiology of this boy’s
problem was palpable for this woman.
Okay, so then I want to turn to one other
publication about clinical Whole Exome Sequencing
which just came out.
It’s a prospective evaluation of Whole Exome
Sequencing as a first-tier molecular test
in infants with suspected monogenic disorders.
It’s from the Murdoch Institute in Australia.
That’s the first author in the reference.
And they did some sort of thoughtful modifications
of the sort of -- rather than the sort of
shotgun exome -- clinical whole exome diagnostic
So they considered using this test in 119
infants -- unrelated infants -- that met a
set of criteria.
They had a well-defined phenotype.
Some of them had a positive family history
and so forth.
Of those 119 families, 80 agreed to participate.
They did a single clinical Whole Exome Sequence.
That is, they didn’t do any other family
members and they examined in that clinical
exome 2,830 of the 20,000 genes.
And they excluded to get rid of the problem
with late-onset incidental findings.
They said, “We’re not going to look at
We’re not going to look at certain genes
that have those incidental findings.”
So they excluded 122 genes.
They didn’t analyze those genes.
Of the 80 infants that were sequenced, 46
or 57 percent yielded a molecular -- a precise
And of these -- of the 46 -- 32 percent had
a significant management change based on this
new diagnostic information.
So it turned out to be of quite important
medical significance to about a third of the
patients at this point followed for a few
months or a year.
And, additionally, 28 couples -- 28 of the
80 couples that participated -- received high
-- either 25 percent or 50 percent recurrence
rates -- so they could use that information
to avoid the scenario that I discussed earlier.
So it will be interesting to follow these
studies now and to ask, “What does this
mean for the sort of medical economic issues?”
Was this initial investment at a rather expensive
Does it not only improve the medical care,
but does it reduce medical cost as the families
I suspect strongly that it will.
But some medical economic experts need to
look at this in detail so that we can get
these data -- we desperately need those data.
Okay, so that’s all I’m going to say about
And its value and its aspects that need to
be managed carefully if you’re going to
use it in your clinic or with your patients.
But the growth of the ability to detect Mendelian
disease genes and the value of detecting them
led the Genome Institute to issue an RFA to
develop centers for Mendelian genomics that
would use the technologies that I talked to
you about -- genomics and genetics -- to try to
find as many genes responsible for Mendelian
disorders as possible.
And in the initial four-year funding period,
three centers were funded: U-Dub, the University
of Washington in Seattle, with Debbie Nickerson
and Mike Bamshad as P.I.s; Yale, with the P.I.
And we partnered with Baylor College of Medicine
to form what we call the Baylor-Hopkins Center
for Mendelian Genomics.
And I’m a P.I. along with Jim Lupski down
So it’s a real team effort.
And there’s our website: www.mendeliangenomics.org.
And you’ll see me refer to it as BHCMG,
Baylor-Hopkins Center for Mendelian Genomics.
We just started our second four-year funding
And I just was at a meeting Monday and Tuesday
of this week.
We were sort of tooling up again for the next
four year run at this.
So it’s interesting to say, “Well, what
is the current state of the art?”
So we keep track of how many Mendelian disease
genes have been identified by using the data
in Online Mendelian Inheritance in Man or
OMIM which was started by my colleague, now
deceased, Victor McKusick and currently managed
by my colleague Ada Hamosh at Hopkins.
And currently OMIM as of late last night,
lists about 7,500 Mendelian phenotypes.
It lists 3,543 disease genes; that’s about
18 percent of the total.
You’ll notice that the number of phenotypes
is greater than the number of disease genes,
that’s because, in part, that -- well, the
next column is explained phenotypes 5,722.
That number is bigger than 3,543.
And that’s because some disease genes cause
two different -- or sometimes more -- discrete
phenotypes that clinically we would have never
imagined were caused by mutations in the same
There are some genes, LMNA for example, that
account for 13 or 14 discrete clinical phenotypes.
So the average is about 1.8 phenotypes per
disease gene, right now.
And there’s still 1,800 unexplained phenotypes
in OMIM and you have to realize that there
are new phenotypes coming into OMIM all
They come in at a rate of about 300 new phenotypes
So there’s lots of Mendelian disease out
there that we have not yet recognized as being
Mendelian disease or we have not given a name
to or an OMIM number yet.
So 18 percent of the total number of
genes in the genome have been tagged as
Mendelian disease genes so we have a long
way to go.
If -- depending on your view of how many genes
in the genome can cause a Mendelian phenotype.
So let me talk about that for a minute.
How many Mendelian disease genes are there
in our genome?
And how close is that 3,500 to saturation?
So, first of all, how would I define a Mendelian
And I would define it as, “Those genes in
which some fraction of variants in that gene
produce highly penetrant phenotypes.”
That’s sort of “genetics-speak.”
“Penetrance” means that you manifest a
phenotype when you have the genetic variant.
And if I ask my colleagues, “How high does
the penetrance have to be to call something
a Mendelian disease?”
There’s no unanimity.
So I arbitrarily take the -- set the penetrance
level at .7.
So that means that if you have the variant
-- the genetic variant -- your chance
of getting the phenotype is 70 percent or
For example, the standard disease variants
in BRCA1, many of them have penetrances in
the range of 70 percent.
So that means you’re highly likely to get
the phenotype, but it also means that some
And the ones that don’t get the phenotype,
geneticists refer to as, “non-penetrant.”
We’ll talk more about that in a minute.
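The penetrance cutoff described above can be written down as a small calculation. This is a sketch only: the function names and the cohort counts are hypothetical, and the 0.7 cutoff is the value the speaker says he sets arbitrarily.

```python
def penetrance(affected_carriers, total_carriers):
    """Fraction of variant carriers who actually show the phenotype."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    return affected_carriers / total_carriers

# Cutoff quoted in the talk; the speaker stresses there is no consensus value.
MENDELIAN_PENETRANCE_CUTOFF = 0.7

def is_highly_penetrant(affected_carriers, total_carriers,
                        cutoff=MENDELIAN_PENETRANCE_CUTOFF):
    """True when the observed penetrance meets or exceeds the cutoff."""
    return penetrance(affected_carriers, total_carriers) >= cutoff

# Hypothetical cohort: 100 carriers of a variant, 80 of whom are affected.
carriers, affected = 100, 80
non_penetrant = carriers - affected  # carriers who never show the phenotype
print(penetrance(affected, carriers), is_highly_penetrant(affected, carriers))
```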
So with that sort of background, then you
say, “Well, one way I might be able to get
at how many Mendelian disease genes there
are, is to count the phenotypes.”
Well, it turns out it’s a lot harder to
count phenotypes than it is to count genes.
So, as I said, OMIM currently lists 7,500
with about 1.8 phenotypes per disease gene,
and 1,800 unexplained phenotypes.
So that predicts maybe 900 more disease genes;
a pretty small number, actually.
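The back-of-the-envelope estimate can be reproduced from the figures quoted in the talk. This is a rough sketch, not an OMIM query: the counts are the speaker's rounded numbers, and the total of about 20,000 genes is the figure mentioned earlier in the talk, used here as an assumption.

```python
# Figures as quoted in the talk (rounded by the speaker).
disease_genes = 3543
unexplained_phenotypes = 1800
phenotypes_per_gene = 1.8
# Assumption: roughly 20,000 genes in the genome.
total_genes = 20000

# Share of all genes already tagged as Mendelian disease genes.
tagged_fraction = disease_genes / total_genes
# Extra genes needed to explain the remaining phenotypes at the current ratio.
extra_genes = unexplained_phenotypes / phenotypes_per_gene

print(f"tagged so far: {tagged_fraction:.1%}")
print(f"additional genes implied: about {extra_genes:.0f}")
```

With these rounded inputs the division comes out near 1,000, in the same range as the "maybe 900" quoted by the speaker. Either way the count is small compared with the untagged remainder of the genome.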
But we know that many phenotypes are conditional
and dependent on environmental variables.
So think of G6PD deficiency, people with G6PD
deficiency are typically entirely asymptomatic
unless they happen to chow down on a plate
of fava beans, in which case they will have
massive hemolysis and become jaundiced and
perhaps severely anemic.
So that’s -- we all think that G6PD deficiency is a
Mendelian disease, but if you avoid all of
the environmental triggers that cause the
hemolysis, you’ll never know that you have
that Mendelian phenotype.
There are many other phenotypes of this nature.
So the point is that to define all Mendelian
disease genes, and all variants that cause
-- in those genes that cause Mendelian disease,
you have to sort of challenge the population
with a variety of environmental triggers to
see what brings out the clinical phenotype.
Easy to do in a mouse, a little bit harder
to do in a person.
The other thing is that there are a vast number
of unrecognized phenotypes.
Remember, I said that 300 come in -- new phenotypes
come in to OMIM each year.
Obviously, they’re not new phenotypes; they’ve
been there all along.
We’re just recognizing them and getting
them into medical attention.
And there are vast swaths of the populations
of Homo sapiens around the world that don’t
even sort of get access to this kind of service.
So I’ve recently visited the Middle East
and my host showed me just one family
after another that had genetic things that
I had never seen before, but clearly based
on the Mendelian segregation in the family,
were clearly Mendelian.
So they’re just waiting to be explained.
So there’s sort of two schools of thought
about how many Mendelian disease genes are
in the genome.
Here’s one that says that the number of
genes in the genome that when they have a
certain variant could cause a highly-penetrant
phenotype is substantial, but limited.
So let’s say, arbitrarily here, I put at
Now there’s another school of thought that
says, “actually if you look carefully enough
across the entire population of Homo sapiens,
you’ll find that a large fraction -- 90
percent or more -- of genes can produce a
Mendelian phenotype when they have a particular
class of variants in that gene.”
And the answer to this question is not known,
I’m -- obviously I guess you would predict
I’m greatly in favor of the red curve.
I think if we look carefully enough and long
enough, we’ll find Mendelian phenotypes
for almost every gene in the genome.
Now let me just give you a couple reasons
why I think that’s true.
So the biggest one is this, it’s evolutionary
If those genes are not important for something,
evolution would get rid of them, right?
There’s a constant mutation rate.
All DNA segments in the genome accumulate
mutation, and if that mutation occurs in genes
with important function then selection eliminates
So the set of genes that we have right now,
have stood the test of time by evolutionary
And so it’s true that some of them may have
been more valuable in earlier socio-economic
cultural conditions of our species.
But, nevertheless, the vast majority of them
are there because evolution cares about them.
So, well, then you could ask, “Well, okay,
Valle, if you think the fraction of Mendelian
genes is so large, why are they so difficult
So the first answer that most people give
is, “Well, maybe a substantial fraction
of the genes in our genome are so important
that when there is a significant variation
in those genes, it leads to early-onset developmental
And so those fetuses are only known in terms
of spontaneous abortions.
And it is true, that there are a fraction
of our genes in the genome that are very,
very, highly conserved.
And by very, very, highly conserved I mean
the nucleotide sequence and the amino acid
sequence of the encoded
protein are highly, highly conserved and that
suggests that they are intolerant to variation.
And we know that our species has a high-frequency
of spontaneous first trimester spontaneous
A large fraction of those are chromosomal
abnormalities, but there are other spontaneous
first trimester abortions in which the karyotype
So why -- what’s going on there?
So how many of those might be Mendelian disorders
that affect some gene that is absolutely important
for early embryonic development.
And that remains to be seen.
We also know that 30 percent -- and this statistic
is often used -- 30 percent of the genes in
the mouse genome, when made homozygous for
a null allele a true knockout, lead to spontaneous
fetal losses, okay?
Either perinatal or earlier in embryo.
So that says, indeed a fraction of the mouse
genes -- 30 percent of the mouse genes are
absolutely necessary for normal development.
So you’re not going to see -- so then the
logic is, “Well, you won’t see those genes
causing medical problems in later life.”
So I would argue that’s not the case because
we know that every gene we’ve looked at
-- there’s a spectrum of mutational events
from those mutations that cause a complete
loss of function to those mutations that moderately
decrease function, to those mutations that
only mildly decrease function.
So somewhere in that spectrum of functional
consequence there will be some alleles for
these very genes that only reduce the function
of the protein product by some fraction and
that allows for successful -- leads to viable
in utero development.
And then will make itself known either in
infancy or later in life, depending on how
the biology of that gene and the severity
of that mutation.
So, you know, we used to say, for example,
that Rett syndrome’s only seen in females.
And it’s an X-linked gene, and that it must
be a developmental lethal for males.
But once the gene was cloned, we did find
a small number of males that survived embryonic
development and have mutations in MECP2.
So those are variants that are hypomorphs
that make it to extrauterine life.
So I think that this mouse-knockout experience
and the human-knockout experience will show
us genes that are really critical, but it
doesn’t mean that they won’t present with
Mendelian disease depending on the allele.
Now another reason that they’re difficult
-- Mendelian disease genes are difficult to
identify, is that our phenotyping is incomplete
So the phenotyping in humans is largely a
standard medical exam and then if it’s a
research project we may do some other kind
of fancier testing.
But basically that phenotyping is routine
phenotyping or maybe what I would call, “uninformed
You’re not thinking about a particular system
when you do the phenotyping.
And it’d be better to have directed phenotyping
that is where we’re thinking about particular
biological systems that might account for
this patient problem.
Or iterative -- even better or in addition
to -- iterative phenotyping where we go back
to the patient over and over again as we learn
more about their condition and look for more
Another problem with phenotyping is that we
have technological limitations.
We only measure certain things.
And there are big, whole systems -- biological
systems of unequivocal importance -- that we
don't really measure.
So the one I like to think of is protein turnover
If you're in the clinic, you can't send a
ubiquitin level, you don't look at ubiquitinated
proteins, we don't really assess the protein
We know, now, from whole genome, or whole
exome sequencing, or other genomic approaches,
that mutations in those pathways do cause
For example, certain forms of Parkinsonism.
So, in contrast to serum sodium or liver enzymes,
we just don't measure that biological system
So if we're not measuring it, we're not going
to find those phenotypes.
Except until we come at it through the genomic
And then the thing I've already mentioned,
the conditional nature of some phenotypes,
a patient, a person can be apparently completely
normal, and then when exposed to particular
environmental stress, like the man we described
at the beginning of this talk, the phenotype
And the last explanation is that biological
gene products don't work in isolation.
The protein products of genes don't work in
They work in complex biological systems.
And those systems have evolved to have buffering,
that is, the ability to maintain homeostasis
when perturbed, either by environmental variables
or genetic variables.
And much of that buffering and robustness
comes -- or some of it, at least, comes from
redundancy of biological systems.
So you have two biological systems and they
do much the same thing.
Now, I would submit, if you look carefully,
most cases of redundancy are what I would
call incomplete redundancy.
They sort of cross cover one another, but
you can find conditions where only one of
the two systems really handles it, and other
conditions where the other system handles
So that means that there will be times when
you find phenotypes in there.
So, I already alluded to this mouse experience,
about 30 percent are lethals.
But all viable mice, nearly all viable mice,
do have phenotypic features.
And the point is that these mouse knockouts
are essentially almost all 100 percent null
We don't know much about other model systems.
There's a spectrum of phenotypic consequences,
depending on the allele and the genotype,
as well exemplified by mutations in a gene
There, with homozygous loss of function, you get
a developmental lethal, basically.
If you're homozygous for only moderate loss
of functions, you may have a skeletal dysplasia.
But you live a full lifespan.
And, for heterozygotes, for certain alleles,
all you have is an abnormality in the morphology
of the nucleus of polymorphonuclear leukocytes,
called Pelger-Huët Anomaly.
So a whole span of phenotypic severity, all
due to different mutations at that particular
So, if we want to find all the Mendelian genes,
we have to figure out ways of casting a wide
net, lots of people, lots of phenotyping,
and looking carefully and rigorously.
Another system, I'd just mention this, that
is undoubtedly important, but we don't phenotype
it at all, is the olfactory system.
We have about 1,000 olfactory receptor genes
in our genome.
We all know that some people have exquisitely
sensitive ability to smell, and other people
can't smell anything.
The men in the room probably have been told
by their wife, "Can't you smell that?"
And you say, "I can't smell that."
And so it turns out that the olfactory receptor
collection is highly polymorphic.
And if you study peoples' olfactory capabilities,
you find wide, wide, wide variations.
We just don't phenotype it.
We sort of think, well, it's really not that
But it actually has been shown to influence
mate selection, it influences certain things
that you do in your life.
And if you look at other species, other than
us, it's critically important.
For example, in mice, blind mice function
perfectly well, and they can't function if
they have no olfactory ability.
So they live in an olfactory world, rather
than a visual world.
And there are a few Mendelian disorders of olfaction
that have been discovered, but largely it's
a whole swath of the genome we don't pay
any attention to.
And I'll just emphasize this point about conditional
phenotypes with this one disorder I've already
mentioned, G6PD deficiency.
Here's a boy that presented with seizures,
hypoglycemia, and hyperammonemia, 36 hours
into an episode of viral gastroenteritis when
he was 18 months old.
He ultimately turned out to have something
called medium-chain acyl-CoA dehydrogenase
We actually screen for it now in the neonates.
He was born before the screening, in a state
where the screening program was not in place.
The point of importance here, though, is that
this is an inborn error in the beta-oxidation
And it only comes to medical attention when
you put stress on the beta-oxidation pathway.
So typically for children, that happens when
they get their first bout of viral gastroenteritis.
The baby doesn't feel good, doesn't eat well.
The parents put the baby to bed without eating
About 4:00 a.m. in the morning, now having
fasted for 14 hours, the longest the kid has
fasted in their entire life, they wake up
seizing and hyperammonemic, and in metabolic
Before the screening program, 25 percent of
the children with this disorder, that first
episode was fatal.
If you simply make the diagnosis and you avoid
In other words, you avoid stressing the beta-oxidation
system, these people do fine.
And he's not had any difficulty since the
diagnosis was made.
And he's now a young man with two children
of his own.
We, of course, did what?
We checked his, once we made the diagnosis,
we checked his siblings.
And it turns out his older brother also has
MCAD deficiency and had had one unexplained,
nearly fatal illness in childhood.
It was called anicteric hepatitis.
But he had, that was an episode of MCAD problem.
So you only see it when the environment is
stressed, it leads to stress on this system.
So we're going to learn a lot about this,
I think, from the undiagnosed disease network
project that really got started here by the
work of Bill Gahl, and in the mice knockout
And it shows the tremendous value of education.
If you --It's difficult to treat the genetic
disorder -- correct the genetic disorder,
but simple education of the patient, the family,
and the primary care physician, can really
make a difference between life and death in
And there are other disorders.
And then the buffering and the robustness
really goes back to this man we heard about
at the beginning of the lecture.
He was able, through the robustness of biological
systems and waste nitrogen excretion, turns
out the urea cycle has tremendous buffering
So that an 80 percent reduction in OTC functioning
probably leaves you, under most conditions,
to be just fine.
It's only when you have tremendous periods
of protein breakdown as he did stimulated
by his illness, that you overwhelm that system.
So, how many Mendelian disease genes?
My hypothesis is that if you look carefully
and across a large population, nearly all
the genes in our genome will have Mendelian
Not all of my colleagues agree with this,
so we'll see.
Now, let's come back to the Centers for Mendelian
Genomics, who are tasked with finding all
of these Mendelian disease genes.
So, the overall strategy is to find well-phenotyped
cases and families, perform whole exome sequencing
on relevant family members, use family relationships,
allele frequency data, functional predictions,
model organism results, functional studies
to identify the responsible genes and variants.
It's really a lot of fun.
Very interesting, very exciting when you get
And some things you solve right away and everyone
jumps for joy.
And then other things you plug away at for
years and we don't solve.
And in the case of Centers for Mendelian Genomics,
we have an online web tool that anybody in
the world can submit their cases.
And once we analyze the sequence and get an
answer, if we do get an answer, we give the
information back to the submitter and ask
them to write the paper.
So they get all that for free.
You can get that service at that website.
So if I'm looking for all the Mendelian disease
genes, I would liken it to this.
I think of the world, or the entire population
of Homo sapiens, as our, sort of, petri dish,
if you will.
And we're looking around the world to try
to find those families and those individuals
who have very rare disorders that represent
the phenotype of a particular genetic mutation.
So this is sort of the ultimate genotype-to-phenotype
And we're doing pretty well on that.
These are the Baylor-Hopkins CMG data.
As of April 2016, we've got 9,000 consented
samples and we've got samples from 29 countries
around the world.
There's still big swaths of the population
of Homo sapiens, namely India and China, that
we're really not getting much access into.
We've developed, as I said, a web-based tool
to make it easier for a health-care professional
anywhere around the world to submit a candidate
case or family or cohort.
These papers describe that tool.
It's called PhenoDB.
And you can look at PhenoDB, it's free, you
just go in, you register with your name and
your email and so forth.
And then if you have a family or cohort or
whatever, you can enter them in there, as
long as they're consented appropriately and
And then we have a committee that meets every
two weeks, looks at the submissions.
The PhenoDB takes all the clinical data in
a very ordered fashion, so we can quickly
review the cases and ask, "Is this a good
family to carry through to the sequencing
and analysis part of the effort?"
This is the home page of PhenoDB.
We have users from many, many different countries
around the world.
The Baylor-Hopkins instance of PhenoDB, we
have data on more than 4,000 projects in there,
including 53 cohorts ranging from 5 to 295
subjects, phenotypic data on more than 10,000
We have whole exome sequence data on more
than 6,000 samples.
And there's an analysis tool for
analyzing the results of your whole exome
sequencing in PhenoDB.
So it's very convenient because you can go
from the analysis back to the phenotype, back
And we're continually improving it.
And the -- we here is largely Nara Sobreira,
the same woman that
took David Goldstein up on his offer to do
whole genome sequence.
And Ada Hamosh who's the director of OMIM.
And then a really accomplished program person
named François Schiettecatte.
This is when you're analyzing your data, this
is the starting page.
So you'll see here, in this case, you have
a proband and you're going to look at the
sequence of his/her parents.
So you start with three ANNOVAR files, you've
At this stage, you're putting down the inheritance
model that you want to analyze the data.
And you pick filters, the frequency of the
alleles that you expect.
You want to eliminate common variants that
are present in these databases.
You may want to fine-tune the size of indels
that you're looking for, and a variety of
other variables that you can dial in.
And then you just push the button and out
comes a list of candidate genes and variants.
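The filtering steps described here (drop common variants, then keep what fits an inheritance model in the trio) can be sketched in a few lines. This is an illustration under stated assumptions, not PhenoDB's actual code or data model: the record layout, gene names, function names, and the 1 percent frequency cutoff are all hypothetical, and the dominant model assumes unaffected parents, so it looks for de novo variants.

```python
from collections import defaultdict

# Hypothetical, hand-made variant records (not PhenoDB's real format).
# Genotypes are alternate-allele counts: 0 = ref/ref, 1 = het, 2 = hom alt.
variants = [
    {"gene": "GENE_A", "id": "v1", "af": 0.00001, "proband": 1, "mother": 0, "father": 0},
    {"gene": "GENE_B", "id": "v2", "af": 0.0002, "proband": 2, "mother": 1, "father": 1},
    {"gene": "GENE_C", "id": "v3", "af": 0.0005, "proband": 1, "mother": 1, "father": 0},
    {"gene": "GENE_C", "id": "v4", "af": 0.0001, "proband": 1, "mother": 0, "father": 1},
    {"gene": "GENE_D", "id": "v5", "af": 0.15, "proband": 2, "mother": 1, "father": 1},
]

def rare(variants, max_af):
    """Drop variants that are common in population databases."""
    return [v for v in variants if v["af"] <= max_af]

def de_novo_dominant(variants):
    """Heterozygous in the proband, absent from both parents."""
    return [v for v in variants
            if v["proband"] == 1 and v["mother"] == 0 and v["father"] == 0]

def homozygous_recessive(variants):
    """Homozygous in the proband, heterozygous in both parents."""
    return [v for v in variants
            if v["proband"] == 2 and v["mother"] == 1 and v["father"] == 1]

def compound_heterozygous(variants):
    """Genes with at least one maternal-only and one paternal-only het variant."""
    by_gene = defaultdict(lambda: {"maternal": [], "paternal": []})
    for v in variants:
        if v["proband"] != 1:
            continue
        if v["mother"] == 1 and v["father"] == 0:
            by_gene[v["gene"]]["maternal"].append(v)
        elif v["father"] == 1 and v["mother"] == 0:
            by_gene[v["gene"]]["paternal"].append(v)
    hits = []
    for sides in by_gene.values():
        if sides["maternal"] and sides["paternal"]:
            hits.extend(sides["maternal"] + sides["paternal"])
    return hits

filtered = rare(variants, max_af=0.01)  # assumed cutoff; tune per study
candidates = {
    "autosomal_dominant_de_novo": [v["id"] for v in de_novo_dominant(filtered)],
    "autosomal_recessive_homozygous": [v["id"] for v in homozygous_recessive(filtered)],
    "autosomal_recessive_compound_het": [v["id"] for v in compound_heterozygous(filtered)],
}
print(candidates)
```

Applying the frequency filter first keeps the inheritance checks cheap, and it is why the common homozygous variant in GENE_D never reaches the recessive list.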
These ANNOVAR files are created as you upload
the VCF, so that's done automatically.
And three standard analyses, autosomal-dominant,
autosomal-homozygous -- autosomal-recessive
homozygous and compound heterozygous -- are
And it automatically creates a file for pathogenic
or likely pathogenic, incidental findings
in the ACMG 56 reportable genes.
So it automatically goes to ClinVar and asks,
"Has this variant been seen and how is it
And then it comes back and gives us a date and time
for the issue of whether or not there are
And on the consent form, they check whether
they want incidental findings or not.
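The incidental-findings step can be sketched roughly as follows. This is a sketch under stated assumptions, not the real pipeline: the lookup table is an in-memory stand-in for a ClinVar query, the gene set is only an illustrative subset of the ACMG list, and the variant identifiers are placeholders.

```python
from datetime import datetime, timezone

# Illustrative subset only; the real ACMG list is longer.
ACMG_GENES = {"BRCA1", "BRCA2", "MYH7"}

# Hypothetical stand-in for a ClinVar lookup: (gene, variant id) -> classification.
CLINVAR_STUB = {
    ("BRCA1", "var-001"): "Pathogenic",
    ("MYH7", "var-002"): "Likely pathogenic",
    ("BRCA2", "var-003"): "Benign",
}

REPORTABLE = {"Pathogenic", "Likely pathogenic"}


def incidental_findings(variants, wants_incidental, lookup=CLINVAR_STUB, now=None):
    """Return reportable findings, stamped with when the lookup was done.

    variants is a list of (gene, variant id) pairs. If the consent form says
    the family does not want incidental findings, nothing is reported.
    """
    checked_at = (now or datetime.now(timezone.utc)).isoformat()
    findings = []
    if wants_incidental:
        for gene, var_id in variants:
            if gene not in ACMG_GENES:
                continue  # only genes on the reportable list are considered
            classification = lookup.get((gene, var_id))
            if classification in REPORTABLE:
                findings.append(
                    {"gene": gene, "variant": var_id, "classification": classification}
                )
    return {"checked_at": checked_at, "findings": findings}


sample = [("BRCA1", "var-001"), ("BRCA2", "var-003"),
          ("TTN", "var-004"), ("MYH7", "var-002")]
report = incidental_findings(sample, wants_incidental=True)
declined = incidental_findings(sample, wants_incidental=False)
print([f["gene"] for f in report["findings"]])
print(declined["findings"])
```

Recording the lookup time matters because a variant's classification can change, so a report is only as current as the date it was checked.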
It utilizes the phenotypical info in OMIM
and the OMIM algorithm to suggest possible
diagnoses when the phenotypes are entered,
and to flag, once you get to candidate genes,
if those candidate genes have been connected
to one of the phenotypes that it suggested,
it will make that connection for you.
And there's an API that's sort of the back-end,
what people call the back-end, that transfers
the final results, gene names, genomic coordinates
and features, to GeneMatcher.
And I'll say a word about that in a minute.
It's completely searchable on phenotypic features
and genotype features.
And one additional tool, so the issue of,
how do you declare victory?
What's the evidence that the variant you have
found in a particular gene is responsible
for this phenotype?
And it turns out that one of the most potent
ways to do this is to find other patients,
or model organisms that have variation in
the same gene and a similar phenotype.
So Nada and Francois and Ada developed this,
another tool called GeneMatcher, also free
to anyone who wants to use it.
And there's the website.
And it's designed to connect investigators.
So anybody can go in and put in their favorite
And if someone else has entered that gene
into GeneMatcher, then both of you will get
And then it's up to you what you want to do
with that connection.
All the data is de-identified, so IRB is not
It's automated and continuous matching.
Once you put it in there, it's in there until
you take it out.
And so, if someone matches you six months
later, you'll just get an email that says,
"You got a match."
And it'll give you the contact information.
And you can choose to collaborate or not.
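The matching behavior described here can be sketched as a small in-memory registry. This is only an illustration of the idea, not GeneMatcher's implementation: the class, its method names, and the email addresses are hypothetical, and the "email" is just a message appended to a list.

```python
from collections import defaultdict


class GeneMatchRegistry:
    """Entries stay in the registry until withdrawn, so a match can
    arrive long after the original submission."""

    def __init__(self):
        self._by_gene = defaultdict(list)  # gene symbol -> submitter emails
        self.outbox = []                   # (recipient, message) pairs

    def submit(self, email, gene):
        gene = gene.upper()
        matches = [e for e in self._by_gene[gene] if e != email]
        for other in matches:
            # Both sides are notified and given the other's contact details.
            self._notify(email, other, gene)
            self._notify(other, email, gene)
        if email not in self._by_gene[gene]:
            self._by_gene[gene].append(email)
        return matches

    def withdraw(self, email, gene):
        gene = gene.upper()
        if email in self._by_gene[gene]:
            self._by_gene[gene].remove(email)

    def _notify(self, recipient, contact, gene):
        self.outbox.append(
            (recipient, f"You got a match on {gene}. Contact: {contact}")
        )


registry = GeneMatchRegistry()
first = registry.submit("lab-a@example.org", "PCYT1A")   # nobody there yet
second = registry.submit("lab-b@example.org", "pcyt1a")  # matches lab A
registry.withdraw("lab-a@example.org", "PCYT1A")
third = registry.submit("lab-c@example.org", "PCYT1A")   # only lab B remains
print(first, second, third)
print(len(registry.outbox))
```

The registry only stores a gene symbol and a contact, which is in the spirit of the point that the data is de-identified and that what happens after a match is left to the two investigators.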
And also we've added -- although initially
it was just matching on genes, you can click
a box and decide to match on phenotypic features
That was put in place in October 2015.
And it's connected to this Matchmaker Exchange
program, which I'll describe in a second.
But this is the page in GeneMatcher for your
By gene match, which is required; by disease match,
which you can ignore or you can say, I want
Or the location in the genome, that's optional.
Phenotypic match, that's optional.
Here's the data from a couple days ago.
We had 4,247 genes in GeneMatcher.
And there have been more than 2,000 matches.
Now, we don't know which genes are being matched.
And we don't know who matches.
So I can't tell you how many of these matches,
you know, turned out to lead to productive
I do know that in Baylor-Hopkins, we certainly
have solved a huge number of cases by this
We have a strong candidate but we only have
And we find other cases with similar phenotype
and similar kinds of variants in the same
So currently, more than 1,500 people are using
this, from 51 countries.
Now, GeneMatcher -- this is the Matchmaker Exchange
diagram of all these groups interested in
rare phenotypes around the world.
So we built an API that connects GeneMatcher
to Decipher, and connects GeneMatcher to PhenomeCentral.
PhenomeCentral is part of the Care4Rare program
in Canada, which covers rare diseases in Canada.
Decipher does rare diseases in the U.K. and
throughout Europe and the world.
And so, if you click a button in GeneMatcher,
you will not only look in GeneMatcher, but
you'll look at PhenomeCentral and Decipher.
So you get that added bang for your buck if
you just click it there.
The rest of these, some of these others are
planning to come in, they just haven't made
it happen yet.
And, as of April, through GeneMatcher, we've
made 81 matches into PhenomeCentral, and 74
matches into Decipher.
So those pipelines are working well.
Now, I just thought you might be -- I'm near
the end here -- but what's the Baylor-Hopkins
summary data at four years?
We've had 9,000 and change consented samples,
we've studied 776 phenotypes, 56 percent of
those were judged to be novel.
We've done 6,769 exomes, we've found a total
of 468 disease genes.
Of those, 222 were novel disease genes, that
is, they had not previously been connected
And 246 were known disease genes.
Now we try, in our evaluation of candidate
samples, to not do things that look like they
have a disease that's already well explained.
But you have to realize that for many of these
Mendelian phenotypes, there are only two or
three people in the literature, and so the
breadth of the phenotype has not been really
So the clinicians may look and say, well this
doesn't look like the same thing.
Once we find that it's the same gene, then
we often see the overlap and realize it's
what we call 'phenotypic expansion,' and we're
just fleshing out the full breadth of the
So, of these known disease genes, 55 percent
of them, the patients that we studied, had
additional phenotypic features that were not
described in the entity so far.
And it's led to 124 publications currently.
Now, finding disease genes.
Some immediate consequences?
It connects the gene to a phenotype, something
geneticists have been interested in since
there have been geneticists, connects the
phenotype to a biological system, and it tells
you something about how that system works,
both under normal circumstances and under
So it's quite powerful.
It unravels locus heterogeneity, which turns
out to be extensive.
It enables precise diagnosis and counseling,
all the stuff we talked about earlier.
It's the first step in the path towards informed
At least you know precisely what the problem
is, so now you can begin to use rational approaches
to try to find a way around it.
And it's a tremendous research stimulus, from
bench to bedside.
So it's very powerful.
Now, some long term consequences?
Suppose -- and this gets to our long-term
goals -- suppose we had phenotypes for more
than 50 percent of the genes in our genome?
Remember I said right now, it's about 17 or
What questions could we ask?
I sort of think of this as a classic forest
and trees analogy, right?
So right now we're getting very good at finding
a particular tree in the forest.
And we're going all around it, seeing how
many branches it has and how tall it is, and
all its individual variations.
And it's very exciting, every tree is interesting.
But what I'd like to see us do, as we get
farther along this, is to be able to stand
back and say, "Each of these trees is in a
forest, that is, in a human being.
What are the principles we are learning?
Not from this individual genetic disorder,
but from looking at large numbers of well-explained
Can we see new principles about how disease
What genes are important, what variants are
allowed/tolerated and so forth?"
So that's just, sort of, a long-term goal
that's a broader goal.
And actually, we've been interested in this
for some time.
There's a paper we published in collaboration
with László Barabási nearly 10 years ago.
And we just simply tried to look for interactions
between all the genes that were known to be responsible
for Mendelian disorders, looking for interactions,
for patterns, and so forth.
Makes a nice diagram but didn't really get
us too far along.
And we wanted to know, are there unappreciated
principles of disease?
And if so, what are they?
And what do they mean for how we think about
At the time, we did this study, we had 1,700
So now we're a little over two times that
And I think studies such as this will be redone
over and over again, until we begin to really
get a sense of how it all works.
Here's an example of the kinds of questions
you would like to answer.
So, just about networks.
We talked about biological systems and networks
as buffering disease.
You could ask, "Are all networks equally vulnerable
And if not, what are the rules?"
We don't really have any rules for this question,
as far as I know.
Or we could ask, "Are all components of a
system equally vulnerable?
Or if not, what are the rules that make some
components more vulnerable than others?"
Can we predict the consequences of variation
in one component on the behavior of the biological
In other words, if there are 30 proteins in
a biological system, what happens if this
particular one is reduced in function by 50
Does the system still work pretty well?
Or is it completely crippled by that change?
So, here's two systems.
One is the RAS-MAP kinase pathway, here is
the peroxisome biogenesis pathway.
Each of these systems involves about 30 genes.
So they're roughly, in terms of that parameter,
the same size.
This pathway has more than 15 discrete phenotypes.
This pathway really has one to two phenotypes.
No gene predominates in this pathway.
In other words, every gene in the pathway
has been pegged with mutations causing particular
In this pathway, actually, about 65 percent
of the patients with defects in this pathway
have it in one gene, a gene called PEX1.
So those are two biological systems of roughly
the same genetic size.
But yet, they are quite different in terms
of the size of the mutational target.
Or the target that yields a phenotype.
And the kinds of phenotypes that they produce.
And we don't really know enough about it now
to look at any given biological system and
be able to make meaningful predictions of
We should be able to and we will be able to,
I would predict, as we go forward with this
So, I have no idea where I am here, time wise.
So, we're done?
There's just some quick samples here.
So I want to just say one word about this.
I hope you get the sense of the incredible
power of Mendelian disease, as predicting
biological things that we just didn't notice.
So this is a disorder, spondylometaphyseal
dysplasia, and the patients have two features.
They have a cone rod dystrophy.
That is a severe visual impairment.
And they're short-statured with skeletal dysplasia,
looks sort of like achondroplasia.
Now I'm sure all of you sitting in the audience
will immediately say, "Well, it's obvious
what would cause that phenotype."
Connect those two different biological systems?
I mean, when I looked at this, I said, "You
know, I don't have a clue what could possibly,
what gene would possibly bring these two systems
It's a rare autosomal recessive trait, there's
the macular degeneration.
The gene turns out to be a gene called PCYT1A.
Never heard about it until we did the study.
But it encodes an enzyme called phosphocholine
Which is the enzyme that is the rate-limiting
step in the major pathway for phosphatidylcholine
That's a major component of plasma membranes.
In some cells, it makes up about 50 to 60 percent
of the structural lipid in the plasma membrane.
So it's clearly an important molecule.
So, I still don't know the answer to why those
two systems are affected, but I do know that
there are cells in both those systems that
have a tremendous demand on membrane biogenesis.
The photoreceptors in the retina make a lot
of membrane every day.
And actually, the osteoblasts make a lot of
They enlarge from their, sort of, resting
size by about 30X.
That means they need a 10X increment in membrane.
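One way to see where a figure of about 10X could come from, assuming "size" here means cell volume and that the cell keeps roughly the same shape as it enlarges (both are assumptions, not something stated in the talk): surface area scales as volume to the two-thirds power.

```latex
\[
\frac{A_{\text{final}}}{A_{\text{rest}}}
  = \left(\frac{V_{\text{final}}}{V_{\text{rest}}}\right)^{2/3}
  = 30^{2/3}
  \approx 9.7
  \approx 10
\]
```

So a 30-fold increase in volume would need roughly a 10-fold increase in membrane area under those assumptions.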
And so both of those cell types have a huge
demand on membrane biogenesis.
So maybe that's why they're the ones that
show the phenotype.
There's an alternative pathway, but apparently
that alternative pathway is not adequate,
at least not for these two cell types with
a big demand.
So those are things we didn't think about
until this, sort of, predictive thing came
Here's another disorder.
This is in press right now, TELO2.
It's worked on by a student in my lab, Jing
And it tags a complex called the TTT complex,
never heard of it before.
But it's involved in interacting with HSP90
and the R2TP complex.
And does maturation of six enzymes that are
very important in the central metabolism of
In fact, those so-called PIKK genes have already
been tagged with Mendelian disorders.
So again, we're getting more information about
a central biological pathway that's going
to be really important, in terms of understanding
brain function and other functions.
The Baylor group – the Baylor part of Baylor-Hopkins
has published this paper.
They looked at 128 consanguineous families
and found roughly -- I can't remember the
exact numbers, but -- it's about 48 known
disease genes in a subset
of these families.
And another 40 or so high-level candidates
in the others.
They put this all together and they asked,
"When are these genes expressed?"
Some are expressed in the early embryonic
life, some are expressed in fetal life, and
So you're, sort of, again, beginning to put
these biological systems together.
And these biological systems are very important
for the normal morphology and functioning
of the brain.
So we're beginning to move from an individual
tree in the forest, to stand back and beginning
to understand the size and the shape and the
behavior of the forest itself.
At least in this case, in the development
of the brain.
The last example, also from Jim, is looking
at peripheral neuropathies.
For Charcot-Marie-Tooth neuropathies,
we now know of about 65 genes, tremendous
And monogenic disorders in each of those
65 genes can cause a Charcot-Marie-Tooth-like
It turns out, however, that there is a sort
of, a genetic burden principle.
So we think of these as monogenic disorders,
but what Jim did is score the genotype of
all 65 genes in each of these patients.
So it shows, in red is the distribution of
loss-of-function alleles in these patients
versus the normal population.
They looked at two different populations, and these
data hold up.
And what they say is that you have your genotype
at the disease gene locus,
and that causes
the Mendelian disease, but it's also affected
by the genotype at all these other 65 genes.
And if you have additional variants at some
of the other genes, that will make this phenotype
more severe or less severe.
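The burden idea can be sketched as a simple count. This is a toy illustration, not the method or data from the study: the gene panel, the genotypes, and the function names are all made up, and it only shows what "scoring the genotype across all the panel genes" might look like.

```python
from statistics import mean

# Hypothetical three-gene panel standing in for the ~65 neuropathy genes.
PANEL = ["GENE1", "GENE2", "GENE3"]


def burden_score(genotypes, panel=PANEL):
    """Total loss-of-function alleles a person carries across the panel.

    genotypes maps gene -> number of loss-of-function alleles (0, 1 or 2);
    genes outside the panel are ignored.
    """
    return sum(genotypes.get(gene, 0) for gene in panel)


def burden_distribution(cohort, panel=PANEL):
    """Map each burden score to the number of people who have it."""
    counts = {}
    for person in cohort:
        score = burden_score(person, panel)
        counts[score] = counts.get(score, 0) + 1
    return dict(sorted(counts.items()))


# Toy cohorts, invented for illustration only.
patients = [
    {"GENE1": 1, "GENE2": 1},
    {"GENE1": 2, "OTHER": 1},
    {"GENE1": 1, "GENE2": 1, "GENE3": 1},
]
controls = [{}, {"GENE2": 1}, {}]

patient_dist = burden_distribution(patients)
control_dist = burden_distribution(controls)
print("patients:", patient_dist, "mean", mean(burden_score(p) for p in patients))
print("controls:", control_dist, "mean", mean(burden_score(c) for c in controls))
```

Comparing the two distributions is the point: a shift toward higher scores in patients is what a burden effect on top of the primary disease locus would look like.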
So you get the sense of genetic burden, the
architecture of genetic disease, from these
So I'll finish with unexpected and emerging
ideas coming out of this project so far.
The first is, it shouldn't have been unexpected,
but the extent and distribution of genetic
variation is just really enormous.
We still haven't enumerated all of it.
We're finding tremendous locus heterogeneity.
We also knew about locus heterogeneity for
certain phenotypes, RP and hearing loss, stuff
But we're finding it everywhere we look.
There are many examples of phenotypic expansion.
That is, we're fleshing out or expanding our
understanding of the phenotype for particular
And this turns out to be very powerful.
We always, medicine, over and over again,
forgets that you describe something in three
patients, and then we start thinking that
the phenotype, the aggregate phenotype of
those three patients is that disease.
And if we describe another 100 patients, we'll
find out that it's actually quite different
from the phenotype of just three.
We're finding an unexpectedly large role for
copy number variants and de novo mutations
in a lot of Mendelian disease, and a relatively
high frequency of two diseases occurring in
the same difficult-to-diagnose individual.
And we're learning, as I showed you on the
last slide, a lot about the genetic architecture
and genetic burden for disease.
If you want to read about it, the project
was described in sort of the first three and
a half years in this paper, published in late
And thanks for your attention.
This is the Baylor-Hopkins team.
And I’m glad to answer questions.
We're at the end of the time, so if you want
to just -- you can come up and see me, I'm
glad to talk.
[end of transcript]