Review
Copy Number Variation in Domestication
Zoe N. Lye1 and Michael D. Purugganan1,2,*
Domesticated plants have long served as excellent models for studying evolution. Many genes and mutations underlying important domestication traits
have  been  identified,  and  most  causal  mutations  appear  to  be  SNPs.  Copy
number  variation  (CNV)  is  an  important  source  of  genetic  variation  that  has  been
largely  neglected  in  studies  of  domestication.  Ongoing  work  demonstrates  the
importance  of  CNVs  as  a  source  of  genetic  variation  during  domestication,  and
during  the  diversification  of  domesticated  taxa.  Here,  we  review  how  CNVs
contribute  to  evolutionary  processes  underlying  domestication,  and  review
examples  of  domestication  traits  caused  by  CNVs.  We  draw  from  examples
in  plant  species,  but  also  highlight  cases  in  animal  systems  that  could  illuminate
the  roles  of  CNVs  in  the  domestication  process.
Domestication  is  a  Coevolutionary  Process
Domestication  is  an  evolutionary  process  that  arises  from  coevolutionary  interactions  where
one  species  controls  the  reproduction  and  dispersal  of  another  species  for  the  benefit  of  the
former. Human-associated domestication as an evolutionary process began in the Paleolithic and continued into the Neolithic, with the shift of hunter-gatherers to pastoralists and farmers beginning ~12 000 years ago, leading to the evolution of hundreds of crop plant species [1].
Moreover, domestication also occurred in animals, and there are dozens of known domesticated livestock and pet species [2]. It is now generally thought that domestication was a protracted process that unfolded over thousands of years [3,4] and that it was during this period that genetic changes led to adaptation to agricultural environments and differentiation from wild ancestors.
The early evolution of domesticated species occurs in two distinct phases: (i) initial domestication, where control over reproduction and dispersal is established, resulting in the origin of the new domesticated species; and (ii) diversification and/or improvement, where the domesticated species develops local or population-specific adaptations to different environments or cultural preferences as it spreads from its center of origin [3–5]. Many of the adaptive traits arising during this process may have evolved under the process termed ‘unconscious selection’, which acts similarly to natural selection because incipient domesticates adapt to living in human-associated environments [1,2]. Nevertheless, many key traits, particularly those associated with diversification, may have evolved under more intense selection.
Most  studies  on  the  evolutionary  genetics  of  domestication  have  used  SNPs  to  examine
population  relationships  and  to  identify  causal  genetic  variants  often  through  genetic  mapping
and  genome-wide  association  studies.  The  role  of  CNVs
in  the  evolution  of  domesticated
species  is  not  as  well  appreciated.  In  recent  years,  as  whole-genome  sequencing  methods
have  allowed  the  genome-wide  characterization  of  CNVs,  they  have  become  the  subject  of
increased  interest,  broadening  our  understanding  of  the  genetic  basis  of  evolution.  Here,  we
review  the  role  of  CNVs  in  domestication,  focusing  primarily  on  plant  species,  but  also  providing
Trends  in  Plant  Science,  April  2019,  Vol.  24,  No.  4
©  2019  Elsevier  Ltd.  All  rights  reserved.
https://doi.org/10.1016/j.tplants.2019.01.003
Highlights
Whole-genome resequencing, pan-genomics, and developing computational methods have allowed characterization of CNVs in diverse species.
Loss-of-function CNVs can cause some of the critical domestication traits in plants, whereas other CNVs are associated with postdomestication diversification traits, such as environmental adaptation, disease resistance, fruit size, and cultural preferences.
An exhaustive table of characterized CNVs associated with domestication phenotypes in plant and animal systems is included.
1Center  for  Genomics  and  Systems
Biology,  12  Waverly  Place,  New  York
University,  New  York,  NY  10003,  USA
2Center  for  Genomics  and  Systems
Biology,  New  York  University  Abu
Dhabi,  Saadiyat  Island,  Abu  Dhabi,
United  Arab  Emirates
*Correspondence:
mp132@nyu.edu  (M.D.  Purugganan).
Glossary
Amplification: the same sequence of DNA is duplicated multiple times, typically in tandem.
Chimeric gene: a gene comprising coding sequences derived from two or more other genes.
Experimental evolution: the use of laboratory or controlled field experiments to investigate the processes of evolution. Typically, organisms with short generation times are used to simulate processes that would take longer in larger organisms.
Fixation: increase in frequency of a genetic variant, eventually resulting in all members of a population sharing the same variant at a locus.
Fixation index (FST): a measure of genetic differentiation between two populations.
Microhomology: identical short DNA sequences, 1–4 bp in length.
Pan-genome: the entire gene set contained within a species, taking into account presence–absence variation (PAV) between individuals in a species. Not all individuals carry all of the genes in the pan-genome.
Photoperiod: day length; many plants use day length as a signal to enter various stages of the life cycle.
Purifying selection: selection against disadvantageous alleles.
Tandem array: cluster of genes created by repeated duplications.
Vernalization: induction of flowering by prolonged exposure to cold (i.e., winter).
examples  from  domesticated  animal  species  that  could  point  to  contrasting  patterns  between
these  two  groups.
Copy  Number  Variation
CNVs  are  polymorphisms  within  species  in  which  sections  of  a  genome  differ  in  copy  number
between  individuals,  and  include  deletions,  duplications,  or  amplifications  (see  Glossary)  of
DNA  sequence.  Originally,  CNVs  were  only  thought  of  in  terms  of  copy  changes  in  functional
genetic  features.  Today,  many  researchers  adopt  a  more  expansive  definition  in  light  of  the
ability  to  discover  gains  and  losses  of  genomic  material  in  an  unbiased  genome-wide  manner
(Box 1). This can then include transposable elements and noncoding sequences. The definition of a CNV continues to be somewhat arbitrary and is often conflated with the terms ‘segmental duplication’ or ‘structural variant’. The defined minimum length of a CNV is typically 1 kb, although many studies include smaller variants of as few as 50 base pairs (bp) [6,7]. Nevertheless, CNVs that include functional sequences continue to be of most interest to researchers.
Box 1. CNV Detection Methods
For  reviews  on  major  methods  for  CNV  detection  applied  to  domesticated  species,  see  [7].  For  reviews  of  CNV  detection
from  next-generation  sequencing  data,  see  [103,104].
Array comparative genome hybridization (aCGH) is based on the comparison of fluorescence signals of a test and reference sample hybridized to a microarray of tiled probes covering an entire genome. The use of smaller probes increases the specificity of CNV detection in this method; however, aCGH is more accurate in detecting deletions than duplications [7,105]. SNP microarrays are also applied to CNV detection by comparing probe intensities across samples. They are also able to distinguish CNV alleles because they can use allele-specific probes [7].
Next-generation sequencing (NGS)-based methods fall into three major categories: read-depth (RD), read-pair (RP), and split-read (SR) methods [7,103]. RD methods detect CNVs by comparing normalized read depth from short-read sequence data aligned to a reference genome. Low or zero RD is interpreted as a deletion and increased RD is interpreted as an increase in copy number. RP methods are based on the idea that read pairs should map to a reference separated by approximately the same distance as the insert size. If read pairs map farther away from each other than expected, a deletion is detected; if they are too close together, an insertion is detected. SR methods use paired-end reads and detect CNVs by aberrant mapping to a reference genome. For example, when only half of a read pair maps to a genome, a CNV breakpoint is identified. Whole-exome sequencing data are also applied to CNV discovery using an RD approach to identify CNVs [104]. Additionally, local realignments are also used to refine the identification of CNV breakpoints and infer structure of CNVs [7].
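The RD and RP signals described above can be illustrated with a minimal Python sketch. It is not any published caller: the per-window depths, the median normalization, and the thresholds (`del_ratio`, `dup_ratio`, `n_sd`) are assumptions chosen only to make the logic concrete.

```python
from statistics import median

def call_rd_windows(depths, del_ratio=0.5, dup_ratio=1.5):
    """Label fixed-size genomic windows by normalized read depth.

    depths: per-window read counts for one sample (hypothetical input).
    del_ratio / dup_ratio: illustrative thresholds, not values from the review.
    """
    baseline = median(depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalize")
    calls = []
    for d in depths:
        ratio = d / baseline  # depth relative to the sample-wide median
        if ratio <= del_ratio:
            calls.append("deletion")      # low or zero depth
        elif ratio >= dup_ratio:
            calls.append("duplication")   # elevated depth
        else:
            calls.append("normal")
    return calls

def classify_read_pair(observed_span, insert_mean, insert_sd, n_sd=3.0):
    """Compare the mapped span of a read pair with the expected insert size."""
    if observed_span > insert_mean + n_sd * insert_sd:
        return "deletion"    # mates map farther apart than expected
    if observed_span < insert_mean - n_sd * insert_sd:
        return "insertion"   # mates map closer together than expected
    return "concordant"

# Toy data: two low-depth windows and two high-depth windows.
depths = [30, 31, 29, 2, 0, 30, 62, 60, 28]
calls = call_rd_windows(depths)
```

In practice, RD callers also correct for GC content and mappability before thresholding; those steps are omitted here for brevity.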
Each  method  comes  with  a  different  set  of  biases.  RP  methods  are  less  effective  in  repetitive  regions  and  their  accuracy
is  dependent  on  the  size  of  the  insert  [103,104].  SR  methods  are  biased  to  detect  smaller  CNVs  [103].  RD  methods
typically  have  higher  false  positive  rates  and  are  biased  towards  detecting  large  variants  [7].  The  effectiveness  of  these
methods  is  also  dependent  on  sample  read  depth.  Due  to  these  shortcomings,  CNV  studies  using  NGS  data  typically
combine  multiple  computational  approaches  to  minimize  false  positives  [7].
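One simple way to combine callers, sketched below under stated assumptions, is to keep only calls supported by a second method. The 50% reciprocal-overlap criterion, the tuple layout `(chrom, start, end, kind)`, and the function names are illustrative choices, not a procedure prescribed by the studies cited here.

```python
def reciprocal_overlap(a, b):
    """Overlap length divided by the longer interval (half-open coordinates)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])

def consensus_calls(calls_a, calls_b, min_ro=0.5):
    """Keep calls from caller A that caller B supports.

    Support means: same chromosome, same CNV type, and reciprocal
    overlap of at least min_ro (an assumed threshold).
    """
    kept = []
    for chrom, start, end, kind in calls_a:
        for chrom_b, start_b, end_b, kind_b in calls_b:
            same_locus = chrom == chrom_b and kind == kind_b
            if same_locus and reciprocal_overlap((start, end), (start_b, end_b)) >= min_ro:
                kept.append((chrom, start, end, kind))
                break
    return kept

# Hypothetical calls from a read-depth and a read-pair method.
rd_calls = [("chr1", 1000, 5000, "DEL"), ("chr1", 20000, 21000, "DUP"), ("chr2", 500, 1500, "DEL")]
rp_calls = [("chr1", 1200, 5200, "DEL"), ("chr1", 20900, 30000, "DUP"), ("chr2", 500, 1500, "DUP")]
consensus = consensus_calls(rd_calls, rp_calls)
```

Requiring agreement lowers false positives at the cost of sensitivity, which is the trade-off the paragraph above describes.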
There do not appear to be clear methodological standards in the field of CNV discovery. Most, if not all, CNV discovery methods were developed for use in humans, and can be benchmarked against gold standard sets of known human variants. In domesticated species, gold standard CNV sets do not exist to evaluate the efficacy of different methodologies. Rather, researchers rely on simulations to benchmark methods or simply take existing methods at face value. There are more than 50 published methods for detecting CNVs from NGS data. Selecting an appropriate method for a given data set and species is a challenge to anyone designing a CNV study. As multiple high-quality reference genomes are created for domesticated species and third-generation long-read sequencing becomes available, we expect to see an increase in CNV studies and the development of more novel methodologies. Long-read sequences have already been used to resolve CNV in tandem repeats where traditional methods are limited [106]. It is critical that new methodologies are accurately compared to existing methods to ensure that research is comparable across platforms.
[Figure 1 panels: (A) NAHR; (B) SSA; (C) transposon excision; (D) retrogene formation (labels: mRNA, DNA, retrotransposase).]
Figure 1. Mechanisms of Copy Number Variation (CNV) Formation. (A) Nonallelic homologous recombination (NAHR; unequal crossing over): during a recombination-based double-strand break (DSB) repair, a direct repeat, represented in green, is used as homology and incorrectly pairs during crossing over; this causes a reciprocal deletion and duplication of sequence between the repeats (purple). In this scenario, the resulting CNV break point is flanked by tracts of homologous sequence. (B) Single-strand annealing (SSA). During double-strand break repair, the 5′ strands are resected to expose complementary sequences either side of the break (green). Although this is similar to the microhomology-mediated end joining repair pathway, SSA requires longer tracts of homology, typically >30 base pairs (bp). This can result in significant deletions of intervening sequence (purple). (C) Transposon excision. Transposons (pink ovals) flank a unique sequence (purple). Both transposons excise simultaneously, removing the unique sequence with them, and can result in a deletion. (D) Retrogene formation. Retrotransposon activity causes insertion of a coding sequence into the genome (gene is shown in green with white boxes representing introns). mRNA (red) from the gene is reverse transcribed to DNA. This DNA can occasionally be inserted into the genome and become a retrogene, a copy of the original gene lacking introns (green box). These genes can be inserted into another gene, creating a chimeric gene, or come under the control of different promoter sequences and take on a new expression regime.
CNVs  are  formed  through  a  variety  of  genetic  mechanisms  (reviewed  in  [6]).  A  key  mechanism  is
nonallelic  homologous  recombination  (NAHR)  or  unequal  crossing  over,  which  results  from
aberrant  homology  recognition  during  homology-based  DNA  repair  or  meiosis  [6]  (Figure  1A).
CNVs  formed  by  this  mechanism  are  characterized  by  tracts  of  homology  on  either  side  of  the
CNV.  NAHR  is  common  in  repetitive  regions  and  an  important  source  of  tandem  duplications
and deletions [8]. Another mechanism is single-strand annealing (SSA), which is a double-strand break repair process where broken ends are joined by annealing at homologies >30 bp in length, which can result in significant deletions [6] (Figure 1B).
Transposable  elements  are  also  a  source  of  CNVs;  they  can  result  in  copy  number  change  by
capturing  DNA  segments  during  excision  and  moving  or  deleting  DNA  segments  [6]  (Figure  1C).
Retrotransposon  activity  can  also  create  CNVs  through  retrogenes;  these  are  DNA  insertions
into  the  genome  resulting  from  reverse-transcribed  mRNA  that  might  take  on  a  new  function  or
form  a  chimeric  gene  [9]  (Figure  1D).  For  example,  the  sun  locus  in  tomato  is  a  retrotrans-
poson-mediated  gene  duplication  that  places  the  SUN  gene  under  a  different  regulatory
element,  altering  fruit  development  to  result  in  an  oval  fruit  [10].
CNVs can also arise following polyploidization: when a genome doubles, all genes are duplicated, and subsequent deletions in either of the subgenomes lead to a change in copy number. Fractionation of the maize genome has contributed to high intraspecific variation in copy number and presence–absence variation (PAV) [11,12].
Other proposed mechanisms include microhomology-mediated break-induced replication (MMBIR), whereby, during meiosis, a replication fork stalls and the lagging strand anneals to a different replication fork in the vicinity as a result of microhomology, which can lead to complex rearrangements and duplications [13]. Undoubtedly, there are other stochastic and poorly understood causes of CNV formation, especially ones that involve larger CNVs (for a comprehensive review of these mechanisms, see [6]).
CNVs  Are  Generally  Deleterious  .  .  .
CNVs have become the subject of increased interest, broadening our understanding of the genetic basis of evolution. CNVs are thought to be generally deleterious and subject to purifying selection and, thus, affect coding sequences less frequently than noncoding sequences [14–16]. Deletion CNVs can lead to loss of function (LOF), whereas duplication CNVs affecting entire protein-coding genes can be deleterious if they affect dosage-sensitive genes [14,17]. Simulation of the effects of genic CNVs in regulatory networks demonstrated that increases in gene copy number by one or two copies can have large effects on overall expression patterns due to regulatory feedbacks [18]. CNVs have been identified as expression quantitative trait loci (eQTLs), further demonstrating their role in altering gene expression [19,20].
Given the generally deleterious effects of CNVs, it would be expected that CNVs that do affect gene expression should be restricted to functional classes that can tolerate expression changes without costs to fitness. Thus, most genic CNVs are predicted to occur in lowly expressed genes at the periphery of gene regulatory and gene interaction networks, where change in copy number is less impactful. Dopman and Hartl measured this in Drosophila and found significantly lower representation of deletion CNV genes among genes with known interactions [21]. They also measured the ratio between nonsynonymous (Ka) and synonymous site (Ks) mutations in open reading frames of CNV genes and found that CNV genes had a higher ratio than did non-CNV genes, suggesting that CNV genes are under relaxed selective constraint [21]. Keel et al. extended the investigation of CNVs in interaction networks to cows by quantifying the number of interactions of CNV genes in a protein–protein interaction network [22]. They demonstrated that CNV genes were likely to have fewer network connections than were non-CNV genes, supporting the prediction that CNV genes are functionally constrained and tend to occur at the periphery of interaction networks.
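The network comparison described above can be sketched as follows. This is a toy illustration, not the analysis of the cited studies: the edge list, the gene names, and the use of the median as a summary statistic are all assumptions made for the example.

```python
from statistics import median

def degree_by_gene(edges):
    """Count distinct interaction partners per gene from undirected pairs."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {gene: len(p) for gene, p in partners.items()}

def median_degree(genes, degree):
    """Median number of partners; genes absent from the network count as 0."""
    return median(degree.get(g, 0) for g in genes)

# Hypothetical protein-protein interaction network and CNV gene set.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
degree = degree_by_gene(edges)
cnv_genes = {"E", "F"}
non_cnv_genes = {"A", "B", "C", "D"}

cnv_median = median_degree(cnv_genes, degree)          # peripheral genes
non_cnv_median = median_degree(non_cnv_genes, degree)  # better-connected genes
```

A real analysis would test the difference between the two degree distributions statistically rather than compare medians alone.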
CNVs May Have a Role in Rapid Adaptation under Strong Selective Pressure
While CNVs are generally deleterious, they also appear to be a key mechanism that can enable adaptation during a period of strong selection. This phenomenon is observed in experimental evolution of microbes under nutrient limitation, where spontaneous duplication of nutrient transporters repeatably occurs, conferring adaptation to the nutrient-limited environment. In yeast under glucose-limited conditions, for example, amplifications of the HXT6 and HXT7 genes, encoding high-affinity glucose transporters, were observed; under sulfate limitation, there was amplification of SUL1, which encodes a high-affinity sulfate transporter [23].
This  effect  has  also  been  observed  in  multicellular  systems.  An  experimental  evolution  study  of
Arabidopsis  thaliana  grown  under  stress  conditions  of  high  heat  and  salicylic  acid  showed
increased  CNV  formation  [24].  Similar  results  were  found  in  an  experimental  evolution  study  in
Caenorhabditis  elegans  that  selected  for  recovered  fecundity  following  inbreeding  and  muta-
gen  application,  where  an  increase  in  the  frequency  of  copy  number  change  was  observed
during  the  adaptive  recovery  stage  [25].  Interestingly,  CNVs
in  replicate  populations  were
identified  at  the  same  genome  regions  but  had  different  breakpoints,  suggesting  recurrent
adaptation  [25].
In the natural environment, adaptive CNVs are also found in response to strong selective pressures. For example, amplifications of P450 genes have conferred insecticide resistance in aphids and multiple disease vector mosquito species [26–28]. Given that many species undergo a period of strong selective pressure during domestication, typically in the postdomestication diversification stage, CNVs could have been an important source of genetic diversity underlying adaptation during domestication.
CNVs Are Widespread in Domesticated Species
The results of experimental evolution studies suggest that CNVs contribute to the rapid adaptation associated with domestication and during population expansion of the domesticated species. Advances in detection methodologies (summarized in Box 1), reduced sequencing costs, and proliferation of sequencing data have expanded CNV studies, and CNVs have been described in most major crop plant species, including rice, maize, potato, soybean, barley, cucumber, melon, apple, and grapevine (Table 1, Key Table) [29–37]. Not surprisingly, they have also been examined in domesticated animal species, such as silkworm, sheep, goat, pig, chicken, cow, horse, and dog (Table 1) [22,38–43]. These studies demonstrate that CNVs are a pervasive source of genetic variation in domesticated taxa, and examples from both plants and animals serve to highlight both common and contrasting features of CNVs in both groups.
Early  studies  of  CNVs  in  domesticated  species  used  few  samples,  although  they  nevertheless
provided  key  insights.  In  rice  (Oryza  sativa),  for  example,  whole-genome  comparisons  of  two
cultivars found 641 CNVs ranging in size from 1.1 kb to 180.7 kb [44]. An analysis of two inbred lines in maize (Zea mays) found ~400 genomic regions exhibiting duplications and pervasive PAV affecting more than 700 genes [45].
However, increasing sample sizes resulted in more thorough catalogs of CNVs and other structural variants. Later analysis of 11 maize and 14 wild relative teosinte individuals, for example, found 3889 CNVs, most of which were segregating in both maize and teosinte [46]. In the case of rice, a recent study of 3010 rice varieties identified thousands of deletions and hundreds of duplications affecting between 100 bp and 1 Mb [47]. Indeed, there are often significant inconsistencies in the results of CNV analyses from different studies within the same species due to differences in sample sizes, breeds used, and methodologies of CNV detection (Box 1) [22,48,49]. This was highlighted in a recent analysis, albeit in a domesticated animal species, which characterized CNVs in European cattle (Bos taurus) populations and compared the results to 18 previous studies [39]. Prior studies had identified between 27 and 3438 CNV regions (CNVRs) and, of those data sets, 6–46% of CNVRs overlapped with CNVRs discovered in the present study [39]. Altogether, this analysis indicated that CNVs may affect as much as 63 Mb of the genome, and that cattle have a higher level of CNV diversity than a single study would predict [39].
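The between-study comparison above rests on a simple quantity: the share of CNVRs from one study that overlap any CNVR from another. A minimal sketch follows, assuming half-open `(chrom, start, end)` intervals and counting any overlap of at least 1 bp; the criterion actually used in [39] may differ, and the coordinates are invented.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(previous, current):
    """Fraction of CNVRs in a previous study that overlap the current study."""
    if not previous:
        return 0.0
    hits = sum(1 for p in previous if any(overlaps(p, c) for c in current))
    return hits / len(previous)

# Hypothetical CNVR sets from two studies of the same species.
previous = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200), ("chr3", 0, 50)]
current = [("chr1", 150, 300), ("chr2", 180, 400)]
shared = fraction_overlapping(previous, current)
```

For large call sets, an interval tree or sorted sweep would replace the quadratic scan used here.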
The role that CNVs have in the evolution of domesticated taxa is becoming clear. Over the past three decades, considerable effort has been applied to identify the causal mutations and genes
associated  with  domestication  and  diversification  traits  [1,3].  We  compiled  an  exhaustive  list  of
genes  from  the  literature  and  found  39  examples  where  CNVs  appear  to  have  a  role  in  trait
evolution  in  plant  and  animal  domesticated  species  (Table  1).
The size of these CNVs associated with domestication and diversification ranged from ~1 kb to ~1 Mb. Plant domestication CNVs affected both domestication and diversification traits, whereas animal CNVs were all associated with postdomestication diversification traits. Plant
Key Table
Table 1. Examples of CNVs Affecting Domestication Traits^a
Species | Locus | Type | Phenotype | Description | Trait type | Refs
Wheat | Ppd-B1 | mCNV: ~25 kb | Early flowering | Pseudo-response regulator (Ppd-B1) | Diversification | [62]
Wheat | Ppd-D1a | Deletion: 2 kb | Photoperiod insensitivity (short day growth) | Pseudo-response regulator (Ppd-D1a) | Diversification | [79]
Wheat | Vrn-A1 | mCNV: ~30 kb | Increased vernalization requirement | MADS-box transcription factor | Diversification | [62]
Wheat | Fr-A2 | mCNV^b | Frost resistance | Transcription factor, C-repeat binding factor (CBF-A14) | Diversification | [80]
Wheat | Rht-D1c | Duplication: >1 Mb | Dwarf phenotype; increased yield | DELLA protein, gibberellic acid insensitive | Diversification | [81]
Rice | Sub1a | Insertion^b | Submergence tolerance | Ethylene receptor | Diversification | [74]
Rice | GL7 | Duplication: 17.1 kb | Grain length | Uncharacterized gene function, homologous to LONGIFOLIA in Arabidopsis | Domestication/Diversification | [68]
Rice | qSW5/GSE5 | Deletion: 2 alleles: 950 and 1212 bp | Grain width | GSE5, plasma membrane-associated protein | Domestication/Diversification | [82]
Rice | SNORKEL1, SNORKEL2 | Insertion: 20.9 kb | Submergence tolerance | Ethylene response factor; transcription factor | Diversification | [84]
Rice | q-AG-9-2 | Insertion^b | Anaerobic germination tolerance | Trehalose-6-phosphate phosphatase (OsTPP7), sugar signaling and metabolism | Diversification | [83]
Rice | Pup1 | Insertion: 90 kb | Low phosphorus tolerance | PSTOL receptor-like cytoplasmic kinase | Diversification | [85]
African rice | sh1 | Deletion: 30 kb | Shattering | YABBY transcription factor | Domestication | [86]
Sorghum | Sh1 | Deletion: 2.2 kb | Seed shattering | YABBY-like transcription factor | Domestication | [52]
Soybean | Rhg1-b | mCNV: 31 kb | Resistance to cyst nematode disease | Multiple genes: alpha-SNAP involved in SNARE membrane traffic, wound-inducible protein 12 (WI12), a predicted amino acid transporter | Diversification | [87]
Maize | Rp1 | mCNV^b | Resistance to leaf rust disease | Cluster of leucine-rich repeat high CN haplotypes | Diversification | [88]
Maize | ZmWAK | Insertion: 147 kb | Resistance to head smut disease | Multiple receptor-like kinase alleles | Diversification | [89]
Maize | MATE1 | mCNV: 30 kb | Aluminum toxicity resistance | Multidrug and toxic compound extrusion 1 (MATE1) | Diversification | [67]
Maize | Tunicate (TU) | Duplication: ~1.5 kb | Pod corn | ZMM19 MADS-box gene | Diversification | [69]
Barley | Bot1 | mCNV^b | Boron toxicity resistance | Boron efflux transporter (Bot1) | Diversification | [66]
Barley | HvFT1 | mCNV^b: ~6 kb | Flowering time | Mobile florigen signaling protein | Diversification | [90]
Barley | VRN-H1 | Duplication: 22 kb | Freezing tolerance | C-repeat binding factors (CBF2A–CBF4B) | Diversification | [91]
Cucumber | Female (F) locus | Duplication: 30.2 kb | Gynoecy | Multiple genes, all likely flowering regulatory: aminocyclopropane-1-carboxylic acid synthase gene (ACS1), ethylene synthesis; truncated myb transcription factor (Csa6G496960); branched-chain amino acid aminotransferase (Csa6G496970) | Diversification | [29]
Tomato | SUN | Retrogene insertion: 24.7 kb | Elongated fruit shape | IQ67 domain-containing family, function uncharacterized | Diversification | [10]
Tomato | CSR-D | Deletion: 14 kb | Fruit weight | Truncated cell size regulator (CSR-D), uncharacterized protein | Domestication/Diversification | [92]
Common bean | PvTFL1y | Deletion allele: 5840 bp; insertion allele: 4171 bp | Determinate growth | Transcription factor controlling the switch from vegetative to flowering state | Domestication | [53]
Sheep | Ovine ASIP, AHCY | Duplication: >100 kb | White coat | Agouti signaling protein (ASIP), S-adenosylhomocysteine (AHCY), and itchy homolog E3 ubiquitin protein ligase promoter (ITCH) | Diversification | [54]
Goat | ASIP, AHCY | Duplication: 190 kb | White coat | ASIP, AHCY | Diversification | [55]
Pig | KIT | Duplication: 450 kb | Coat color | Mast/stem cell growth factor receptor linked to tyrosine kinase receptor genes | Diversification | [93]
Cattle | KIT | Duplication: ~480 kb | White coat | Mast/stem cell growth factor receptor | Diversification | [94]
Cattle | CYP4A11 | mCNV: ~1562 bp | Body fat | Major lauric acid (medium-chain fatty acid) omega hydroxylase, lipogenesis | Diversification | [95]
Dog | FGF3, FGF4, FGF19, ORAOV1 | Duplication: ~133 kb | Ridgeback | FGF, embryonic development; oral cancer overexpressed (ORAOV1), function uncharacterized | Diversification | [56]
Dog | fgf4 | Retrogene insertion: 5 kb | Chondrodysplasia (short legs) | FGF4 retrogene | Diversification | [96]
Dog | AMY2B | mCNV^b | Starch diet | AMY2B, pancreatic amylase | Diversification | [77]
Dog | Intergenic region | Duplication: 98 kb | Blue eyes | Intergenic region adjacent to Hox gene ALX4, which has a role in eye development | Diversification | [97]
Chicken | SOX5 | Amplification: 3.2 kb | Peacomb (cold tolerance) | Intron 1 of SOX5, a member of the SRY-related HMG box family of transcription factors | Diversification | [98]
Chicken | PRLR and SPEF2 | Duplication: 176 kb | Late feathering | Prolactin receptor (PRLR), inhibits follicle activation; sperm flagellar protein 2 (SPEF2), thought to be involved in signal transmission | Diversification | [99]
Chicken | EDN3 | Duplication: 130 kb | Fibromelanosis (pigmentation) | Endothelin 3 gene (EDN3), receptor, melanoblast/melanocyte mitogen | Diversification | [100]
Chicken | Duplex-comb | Duplication: 20 kb | Comb shape | Duplication upstream of eomesodermin (EOMES), a T-box transcription factor | Diversification | [101]
Silkworm | CBP | mCNV^b | Cocoon color | Carotenoid-binding protein (CBP); cytosolic transporter of carotenoid pigments | Diversification | [41]
^a Insertion and deletion alleles are distinguished by the state that is associated with the phenotype.
^b Size of CNV varies, is not precisely known, or amplification units are of variable size.
CNVs found in 14 genes were associated with duplications or amplifications, while 11 genes had CNV insertions or deletions; by contrast, of the 14 animal genes identified, all but one were sequence duplications and/or amplifications. Of the duplications and/or amplifications, 22 were tandem duplications. The prevalence of tandem duplications in known crop CNVs may be because previous QTL-mapping techniques made it easier to find tandem rather than dispersed duplications. Plants also tend to have higher genetic redundancy and, thus, may be more robust to deletion and/or PAV mutations; thus, they may be able to better tolerate the deleterious nature of most CNVs.
There  were  also  some  differences  in  the  types  of  domestication  and/or  diversification  genes
affected  by  CNVs  in  plants  versus  animals.  Crop  CNVs  had  a  more  diverse  array  of  functions,
with  CNV  mutations  found  in  transcription  factors  related  to  photoperiod  signaling,  develop-
ment,  stress  tolerance,  and  resistance  genes  (R-genes).  By  contrast,  animal  CNVs  were  largely
found  in  genes  that  encode  growth  factors  and  receptors,  and  genes  related  to  development.
CNVs and Domestication Traits and Genes
Domestication traits are those that distinguish a domesticated species from its wild progenitors, and are the requisite traits for cohabitation with human societies. In crops, common domestication traits include seed nonshattering or suppression of seed dormancy and, in animals, critical changes occur in behavioral traits [1–3]. The genes underlying domestication traits, or ‘domestication genes’, are thought to have arisen early during the evolution of crop and livestock species (or may even be present in the wild ancestor at low frequencies) [3,4].
Therefore,  examining  genetic  differentiation  between  wild  ancestors  and  domesticates  has  long
been  a  strategy  to  discern  the  underlying  genetic  causes  of  domestication.  These  approaches
are  also  SNP-centric,  and  CNVs  between  wild  species  and  domesticates  could  further  illumi-
nate  the  genetic  basis  of  domestication.
Domestication traits are common to all members of a domesticated species