Introduction

MEData is a dataset of underwater visual fish census surveys conducted across the Mediterranean Sea. It includes environmental variables as well as conservation information on marine protected areas (MPAs).

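library(tidyverse)  # provides read_rds(), the pipe, and the dplyr/tidyr/ggplot2 functions used below
medata <- read_rds("data/medata.Rds")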

Variables

variable type comments
data.origin fct azz_asi = Azzurro; azz_malt = Azzurro; Belmaker = Belmaker; Claudet = Claudet; Sala - PEW = Enric Sala; Periklis_MER = Periklis Kleitou, MER Lab
country fct Country where site is located
season fct season in which the survey was conducted; levels: autumn, spring, summer
lon num longitude (decimal degrees); approximate for Linosa, Kornati
lat num latitude (decimal degrees); approximate for Linosa, Kornati
site fct site name
trans int transect number
unique_trans_id chr unique identifier for each transect (unique site + transect + coordinates + depth)
protection lgl inside MPA/unfished (TRUE) or outside MPA/fished (FALSE); linked to enforcement level
enforcement fct level of enforcement: 1 = minimal enforcement; 2 = medium enforcement; 3 = fully protected (strongest enforcement). The data also contain a level 0, whose count (18,648) matches the observations with protection = FALSE. Complements protection
total.mpa.ha num total MPA area in hectares; Source for Ustica; Source for Marettimo
size.notake num area size of no-take zone (in ha); Source for Ustica
yr.creation fct MPA establishment year
age.reserve.yr fct age of the MPA in years at the time of survey (corresponds to yr.creation)
depth num depth of survey in meters
tmean num annual mean sea surface temperature in °C (Source: Bio-ORACLE)
sal_mean num annual mean salinity (Source: Bio-ORACLE)
pp_mean num annual mean primary productivity (Source: Bio-ORACLE)
species chr species scientific name (format: Genus.species); some taxa are identified only to genus level (‘.spp’ suffix, e.g. Atherina.spp) or family level (‘-dae’ suffix, e.g. Labridae)
sp.n int fish count - how many individuals of this species were observed?
sp.length num length of fish in cm
a num species length-weight relationship constant (Source: FishBase; Type = TL; using method ‘Type I linear regression’)
b num species length-weight relationship constant (Source: FishBase; Type = TL; using method ‘Type I linear regression’)
family fct taxonomic family
exotic lgl whether the species is native (indigenous, FALSE) or introduced (Lessepsian migrant, TRUE)
FoodTroph num trophic level of the species. Extracted from FishBase
FoodSeTroph num standard error of the trophic level estimate (FoodTroph). Extracted from FishBase
  • Data types: num = decimal number; int = integer; chr = character; fct = factor; lgl = logical
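
The a and b constants are the parameters of the standard length-weight relationship W = a × L^b (W in grams, L in cm), so they can be combined with sp.length and sp.n to estimate biomass. The following is a minimal sketch on made-up toy rows, not on medata itself; the column names ind_weight_g and biomass_g are hypothetical and do not exist in the dataset.

```r
# Toy rows mimicking the medata columns used here (all values are made up)
toy <- data.frame(
  species   = c("Coris.julis", "Diplodus.sargus", "Labridae"),
  sp.n      = c(4, 2, 1),         # number of individuals observed
  sp.length = c(10, 20, 12),      # length in cm
  a         = c(0.01, 0.02, NA),  # length-weight constant a (NA when unavailable)
  b         = c(3, 3, NA)         # length-weight constant b (NA when unavailable)
)

# Weight of a single individual in grams: W = a * L^b
toy$ind_weight_g <- toy$a * toy$sp.length^toy$b

# Biomass of the whole observation: individual weight times count
toy$biomass_g <- toy$ind_weight_g * toy$sp.n

print(toy[, c("species", "sp.n", "sp.length", "biomass_g")])
```

Rows without a and b (for example, taxa identified only to family level) yield NA biomass, so it is worth deciding explicitly how to handle them before aggregating.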

Overview

summary(medata)
##        data.origin       country         season           lon        
##  azz_asi     : 3875   Italy  :13380   autumn:18275   Min.   : 1.159  
##  azz_malt    : 4179   Israel :10324   spring: 5464   1st Qu.: 8.349  
##  Belmaker    :25679   Greece : 6424   summer:21015   Median :15.751  
##  Sala - PEW  : 9111   France : 3617                  Mean   :19.375  
##  Periklis_MER: 1910   Croatia: 3508                  3rd Qu.:34.073  
##                       Spain  : 2959                  Max.   :35.076  
##                       (Other): 4542                                  
##       lat                 site           trans      unique_trans_id   
##  Min.   :32.42   asinara_add: 3033   Min.   :   1   Length:44754      
##  1st Qu.:34.97   gdor       : 2941   1st Qu.: 239   Class :character  
##  Median :36.74   achziv     : 2900   Median :1432   Mode  :character  
##  Mean   :37.63   shikmona   : 2373   Mean   :1232                     
##  3rd Qu.:41.05   habonim    : 2110   3rd Qu.:2000                     
##  Max.   :44.94   malta      : 1582   Max.   :2392                     
##                  (Other)    :29815   NA's   :35                       
##  protection      enforcement   total.mpa.ha        size.notake     
##  Mode :logical   0   :18648   Min.   :    15.95   Min.   :    0.0  
##  FALSE:18648     1   : 7780   1st Qu.:   191.00   1st Qu.:  167.7  
##  TRUE :25784     2   : 9745   Median :   785.00   Median :  519.2  
##  NA's :322       3   : 8259   Mean   :  5487.56   Mean   : 2950.9  
##                  NA's:  322   3rd Qu.:  2375.00   3rd Qu.: 2651.0  
##                               Max.   :207000.00   Max.   :15000.0  
##                               NA's   :16093       NA's   :19618    
##   yr.creation    age.reserve.yr      depth            tmean      
##  2002   : 8879   1      : 4030   Min.   : 1.000   Min.   :16.78  
##  1962   : 3617   55     : 3617   1st Qu.: 5.200   1st Qu.:18.66  
##  1960   : 3508   57     : 3508   Median :10.000   Median :20.09  
##  1986   : 3126   9      : 3193   Mean   : 9.365   Mean   :20.08  
##  2004   : 2941   32     : 3126   3rd Qu.:11.000   3rd Qu.:22.11  
##  (Other):11316   (Other):15913   Max.   :29.000   Max.   :23.00  
##  NA's   :11367   NA's   :11367   NA's   :16376    NA's   :1366   
##     sal_mean        pp_mean         species               sp.n         
##  Min.   :33.99   Min.   :0.0004   Length:44754       Min.   :    0.00  
##  1st Qu.:37.67   1st Qu.:0.0005   Class :character   1st Qu.:    1.00  
##  Median :37.89   Median :0.0008   Mode  :character   Median :    1.00  
##  Mean   :38.08   Mean   :0.0029                      Mean   :   16.64  
##  3rd Qu.:38.79   3rd Qu.:0.0062                      3rd Qu.:    4.00  
##  Max.   :39.25   Max.   :0.0096                      Max.   :10000.00  
##  NA's   :1366    NA's   :1366                                          
##    sp.length                family        exotic          FoodTroph    
##  Min.   :  0.00   Labridae     :19351   Mode :logical   Min.   :2.000  
##  1st Qu.:  8.00   Sparidae     :10028   FALSE:41157     1st Qu.:3.240  
##  Median : 10.00   Serranidae   : 4182   TRUE :3549      Median :3.340  
##  Mean   : 11.66   Pomacentridae: 2956   NA's :48        Mean   :3.271  
##  3rd Qu.: 15.00   Siganidae    : 2403                   3rd Qu.:3.500  
##  Max.   :150.00   Scaridae     : 1341                   Max.   :4.500  
##  NA's   :842      (Other)      : 4493                   NA's   :827    
##   FoodSeTroph           a               b        
##  Min.   :0.0000   Min.   :0.001   Min.   :2.429  
##  1st Qu.:0.4100   1st Qu.:0.011   1st Qu.:2.892  
##  Median :0.4300   Median :0.015   Median :3.042  
##  Mean   :0.4228   Mean   :0.017   Mean   :3.017  
##  3rd Qu.:0.4700   3rd Qu.:0.020   3rd Qu.:3.122  
##  Max.   :0.9100   Max.   :0.062   Max.   :3.482  
##  NA's   :827      NA's   :8444    NA's   :8444
skimr::skim(medata)
Data summary
Name medata
Number of rows 44754
Number of columns 27
_______________________
Column type frequency:
character 2
factor 8
logical 2
numeric 15
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
unique_trans_id 0 1 11 55 0 2269 0
species 0 1 8 28 0 135 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
data.origin 0 1.00 FALSE 5 Bel: 25679, Sal: 9111, azz: 4179, azz: 3875
country 0 1.00 FALSE 9 Ita: 13380, Isr: 10324, Gre: 6424, Fra: 3617
season 0 1.00 FALSE 3 sum: 21015, aut: 18275, spr: 5464
site 0 1.00 FALSE 84 asi: 3033, gdo: 2941, ach: 2900, shi: 2373
enforcement 322 0.99 FALSE 4 0: 18648, 2: 9745, 3: 8259, 1: 7780
yr.creation 11367 0.75 FALSE 16 200: 8879, 196: 3617, 196: 3508, 198: 3126
age.reserve.yr 11367 0.75 FALSE 20 1: 4030, 55: 3617, 57: 3508, 9: 3193
family 0 1.00 FALSE 41 Lab: 19351, Spa: 10028, Ser: 4182, Pom: 2956

Variable type: logical

skim_variable n_missing complete_rate mean count
protection 322 0.99 0.58 TRU: 25784, FAL: 18648
exotic 48 1.00 0.08 FAL: 41157, TRU: 3549

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
lon 0 1.00 19.38 11.59 1.16 8.35 15.75 34.07 35.08 ▅▇▂▅▇
lat 0 1.00 37.63 3.85 32.42 34.97 36.74 41.05 44.94 ▇▇▃▅▅
trans 35 1.00 1231.51 866.81 1.00 239.00 1432.00 2000.00 2392.00 ▇▁▅▅▇
total.mpa.ha 16093 0.64 5487.56 20976.18 15.95 191.00 785.00 2375.00 207000.00 ▇▁▁▁▁
size.notake 19618 0.56 2950.91 5209.93 0.00 167.70 519.20 2651.00 15000.00 ▇▁▁▁▂
depth 16376 0.63 9.37 5.36 1.00 5.20 10.00 11.00 29.00 ▆▇▂▁▁
tmean 1366 0.97 20.08 2.02 16.78 18.66 20.09 22.11 23.00 ▅▆▇▂▇
sal_mean 1366 0.97 38.08 0.86 33.99 37.67 37.89 38.79 39.25 ▁▁▁▇▇
pp_mean 1366 0.97 0.00 0.00 0.00 0.00 0.00 0.01 0.01 ▇▂▁▃▁
sp.n 0 1.00 16.64 152.29 0.00 1.00 1.00 4.00 10000.00 ▇▁▁▁▁
sp.length 842 0.98 11.66 7.69 0.00 8.00 10.00 15.00 150.00 ▇▁▁▁▁
FoodTroph 827 0.98 3.27 0.40 2.00 3.24 3.34 3.50 4.50 ▁▁▇▁▁
FoodSeTroph 827 0.98 0.42 0.13 0.00 0.41 0.43 0.47 0.91 ▁▁▇▁▁
a 8444 0.81 0.02 0.01 0.00 0.01 0.01 0.02 0.06 ▅▇▁▁▁
b 8444 0.81 3.02 0.14 2.43 2.89 3.04 3.12 3.48 ▁▁▇▆▁

Countries

Data was collected from 9 countries:

medata %>% distinct(country) %>% arrange(country) %>% print(n = Inf)
## # A tibble: 9 × 1
##   country
##   <fct>  
## 1 Croatia
## 2 France 
## 3 Greece 
## 4 Israel 
## 5 Italy  
## 6 Malta  
## 7 Spain  
## 8 Turkey 
## 9 Cyprus

Sites

Data was collected from 72 sites:

Species

The dataset contains 135 taxa, of which 124 are identified to species level:

medata %>% distinct(species) %>% count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1   135
# taxa identified to species level: drop family-level ("...dae") and genus-level (".spp") entries
medata %>% distinct(species) %>% arrange(species) %>%
  filter(!(grepl("dae", species) | grepl(".spp", species))) %>%
  print(n = Inf)
## # A tibble: 124 × 1
##     species                     
##     <chr>                       
##   1 Abudefduf.saxatilis         
##   2 Anthias.anthias             
##   3 Apogon.imberbis             
##   4 Atherina.boyeri             
##   5 Balistes.capriscus          
##   6 Belone.belone               
##   7 Boops.boops                 
##   8 Bothus.podas                
##   9 Caranx.crysos               
##  10 Cheilodipterus.novemstriatus
##  11 Chelon.labrosus             
##  12 Chromis.chromis             
##  13 Conger.conger               
##  14 Coris.julis                 
##  15 Ctenolabrus.rupestris       
##  16 Dactylopterus.volitans      
##  17 Dasyatis.pastinaca          
##  18 Dentex.dentex               
##  19 Dicentrarchus.labrax        
##  20 Dicentrarchus.punctatus     
##  21 Diplodus.annularis          
##  22 Diplodus.cervinus           
##  23 Diplodus.puntazzo           
##  24 Diplodus.sargus             
##  25 Diplodus.vulgaris           
##  26 Epinephelus.aeneus          
##  27 Epinephelus.caninus         
##  28 Epinephelus.costae          
##  29 Epinephelus.marginatus      
##  30 Euthynnus.alletteratus      
##  31 Fistularia.commersonii      
##  32 Gobius.auratus              
##  33 Gobius.bucchichi            
##  34 Gobius.cobitis              
##  35 Gobius.cruentatus           
##  36 Gobius.geniporus            
##  37 Gobius.incognitus           
##  38 Gobius.paganellus           
##  39 Gobius.vittatus             
##  40 Gobius.xanthocephalus       
##  41 Herklotsichthys.punctatus   
##  42 Labrus.merula               
##  43 Labrus.mixtus               
##  44 Labrus.viridis              
##  45 Lagocephalus.sceleratus     
##  46 Lichia.amia                 
##  47 Lithognathus.mormyrus       
##  48 Liza.aurata                 
##  49 Mugil.cephalus              
##  50 Mullus.barbatus             
##  51 Mullus.surmuletus           
##  52 Muraena.helena              
##  53 Mycteroperca.rubra          
##  54 Myliobatis.aquila           
##  55 Oblada.melanura             
##  56 Oedalechilus.labeo          
##  57 Pagellus.acarne             
##  58 Pagrus.caeruleostictus      
##  59 Pagrus.pagrus               
##  60 Parablennius.gattorugine    
##  61 Parablennius.incognitus     
##  62 Parablennius.pilicornis     
##  63 Parablennius.rouxi          
##  64 Parablennius.sanguinolentus 
##  65 Parablennius.tentacularis   
##  66 Parablennius.zvonimiri      
##  67 Parupeneus.forsskali        
##  68 Pempheris.rhomboidea        
##  69 Pempheris.vanicolensis      
##  70 Phycis.phycis               
##  71 Plotosus.lineatus           
##  72 Pomadasys.incisus           
##  73 Pomatomus.saltatrix         
##  74 Pomatoschistus.quagga       
##  75 Pseudocaranx.dentex         
##  76 Pteragogus.pelycus          
##  77 Pteragogus.trispilus        
##  78 Pterois.miles               
##  79 Sardina.pilchardus          
##  80 Sargocentron.rubrum         
##  81 Sarpa.salpa                 
##  82 Scartella.cristata          
##  83 Scarus.ghobban              
##  84 Sciaena.umbra               
##  85 Scorpaena.maderensis        
##  86 Scorpaena.notata            
##  87 Scorpaena.porcus            
##  88 Scorpaena.scrofa            
##  89 Seriola.dumerili            
##  90 Serranus.cabrilla           
##  91 Serranus.hepatus            
##  92 Serranus.scriba             
##  93 Siganus.luridus             
##  94 Siganus.rivulatus           
##  95 Sparisoma.cretense          
##  96 Sparus.aurata               
##  97 Sphyraena.chrysotaenia      
##  98 Sphyraena.sphyraena         
##  99 Sphyraena.viridensis        
## 100 Spicara.maena               
## 101 Spicara.smaris              
## 102 Spondyliosoma.cantharus     
## 103 Stephanolepis.diaspros      
## 104 Symphodus.cinereus          
## 105 Symphodus.doderleini        
## 106 Symphodus.mediterraneus     
## 107 Symphodus.melanocercus      
## 108 Symphodus.ocellatus         
## 109 Symphodus.roissali          
## 110 Symphodus.rostratus         
## 111 Symphodus.tinca             
## 112 Synodus.saurus              
## 113 Taeniura.grabata            
## 114 Thalassoma.pavo             
## 115 Torquigener.flavimaculosus  
## 116 Trachinotus.ovatus          
## 117 Trachinus.draco             
## 118 Trachurus.mediterraneus     
## 119 Tripterygion.delaisi        
## 120 Tripterygion.melanurus      
## 121 Tripterygion.tripteronotus  
## 122 Trisopterus.minutus         
## 123 Upeneus.pori                
## 124 Xyrichtys.novacula

NOTE: The dataset contains 135 taxa, but 11 of them are identified only to genus level (Atherina.spp, Symphodus.spp etc.) or family level (Labridae, Clupeidae, Belonidae etc.) rather than to species level.
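
If the taxonomic resolution matters for an analysis, each taxon name can be labelled by its suffix. The sketch below uses a small hand-written vector of names taken from this document and assumes that family-level names always end in "dae" and genus-level names always end in ".spp"; the variable name taxon_rank is hypothetical.

```r
# A few taxon names in the dataset's naming format
taxa <- c("Coris.julis", "Symphodus.spp", "Atherina.spp", "Labridae", "Boops.boops")

# Assumed convention: "...dae" = family level, "....spp" = genus level, otherwise species level
taxon_rank <- ifelse(grepl("dae$", taxa), "family",
              ifelse(grepl("\\.spp$", taxa), "genus", "species"))

# Cross-tabulate the names against their assigned rank
print(data.frame(taxa, taxon_rank))
table(taxon_rank)
```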

Methods

Data collection

Visual fish census surveys were carried out between 2009 and 2019 by teams of skilled SCUBA divers at multiple locations across the Mediterranean Sea (figure 1). Locations comprised sites within MPAs of varying size, age and enforcement level, and unprotected sites adjacent to these MPAs. To date, the database consists of 44,754 observations from 2,269 unique transects, covering 135 fish taxa (124 identified to species level), and includes mostly abundance data.

Figure 1. Survey sites and temperatures (in degrees Celsius).

The sampling protocol is described in Frid et al. (2022).

Data acquired from other resources

Environmental data (temperature, salinity and primary productivity) were acquired from Bio-ORACLE using the sdmpredictors package.

Layers:

  • tmean = “BO_sstmean”
  • sal_mean = “BO2_salinitymean_ss”
  • pp_mean = “BO2_ppmean_ss”

(See “R/bio-oracle extraction code.R” for full extraction code).

Data wrangling notes

  • Linosa and Kornati did not have specific coordinates, so an approximate location (lat-lon) was assigned to each. If your analysis requires precise locations, you might want to omit these sites.

  • Data from the asinara_add site are presence-absence only.
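
Both notes translate into simple row filters. The sketch below runs on a made-up toy data frame; the site labels "linosa" and "kornati" are assumed spellings (check the actual levels of site before using them), while "asinara_add" is the label shown in the summary output.

```r
# Toy data with one row per site (counts are made up)
toy <- data.frame(
  site = c("linosa", "kornati", "asinara_add", "gdor", "achziv"),
  sp.n = c(3, 1, 1, 12, 7)
)

approx_sites <- c("linosa", "kornati")  # assumed labels of the sites with approximate coordinates
pa_sites     <- "asinara_add"           # presence-absence only

# Keep only sites with precise coordinates
precise <- toy[!toy$site %in% approx_sites, ]

# Keep only sites with true abundance counts
abundance <- toy[!toy$site %in% pa_sites, ]

print(precise)
print(abundance)
```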

Transformations

Many analyses require a species matrix in which each row is a sampling unit (here, a transect) and each column is a species.

Here’s the code to create such a matrix from this dataset:

# Create a species matrix in which rows are transects and columns are species
# (id_cols could be shortened, but listing every metadata column keeps it explicit)
med_mat <- medata %>%
  pivot_wider(id_cols = c(data.origin, country, season, lon, lat, site, trans, unique_trans_id,
                          protection, enforcement, total.mpa.ha, size.notake, yr.creation,
                          age.reserve.yr, depth, tmean, sal_mean, pp_mean),
              names_from = species, values_from = sp.n,
              values_fn = function(x) sum(x, na.rm = TRUE), values_fill = 0)

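# Convert the abundance data to pseudo presence-absence (1 = observed, 0 = not observed)
# Species columns are taken to run from "Atherina.boyeri" to the last column
first_species <- which(colnames(med_mat) == "Atherina.boyeri")
species_cols <- first_species:ncol(med_mat)
pres_abs_mat <- med_mat
pres_abs_mat[species_cols] <- lapply(pres_abs_mat[species_cols], function(x) as.integer(x > 0))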

Note that the leftmost columns (up to the first species) hold metadata. Some ecological analyses require a true matrix containing a single data type (the abundance of each species); in that case, separate the metadata from the species matrix first.
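
One way to do this split is sketched below on a tiny made-up wide table. It follows the same convention as the code above (species columns start at "Atherina.boyeri" and run to the last column); the object names toy_wide, meta and sp_mat are hypothetical.

```r
# Toy wide table: two metadata columns followed by three species columns (values are made up)
toy_wide <- data.frame(
  site            = c("gdor", "achziv", "malta"),
  depth           = c(5, 10, 12),
  Atherina.boyeri = c(0, 3, 1),
  Boops.boops     = c(2, 0, 0),
  Coris.julis     = c(5, 1, 0)
)

# Locate the species block
first_species <- which(colnames(toy_wide) == "Atherina.boyeri")
species_cols  <- first_species:ncol(toy_wide)

# Metadata stays a data frame; species counts become a numeric matrix
meta   <- toy_wide[, -species_cols, drop = FALSE]
sp_mat <- as.matrix(toy_wide[, species_cols])
rownames(sp_mat) <- toy_wide$site

print(meta)
print(sp_mat)
```

Keeping meta and sp_mat in the same row order means the metadata can still be used as grouping or explanatory variables alongside the matrix.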

View the data

Species richness

# med_map is assumed to be an sf object of the Mediterranean coastline, loaded beforehand
col_scale <- fishualize::fish(n = 9, option = "Thalassoma_pavo")
medata %>%
  distinct(country, lon, lat, species) %>%         # one row per taxon per location
  count(country, lon, lat, name = "richness") %>%  # number of taxa per location
  ggplot() +
  geom_sf(data = med_map$geometry) +
  geom_point(aes(x = lon, y = lat, size = richness, col = country), pch = 21, alpha = 0.4) +
  scale_colour_manual(values = col_scale, name = "Country") +
  labs(x = "", y = "", size = "Richness")

Lessepsian migrants

medata %>% filter(!is.na(exotic)) %>% 
  ggplot() + aes(x = exotic) + geom_bar(fill = c("#62a1c7", "#d53748")) +
  labs(x = "Exotic (Lessepsian migrant)", y = "Total observations count")

medata %>% filter(!is.na(exotic)) %>% distinct(species, exotic) %>% 
  ggplot() + aes(x = exotic) + geom_bar(fill = c("#62a1c7", "#d53748")) +
  labs(x = "Exotic (Lessepsian migrant)", y = "Total species count")

References

Froese, R. and D. Pauly. Editors. 2021. FishBase. World Wide Web electronic publication. www.fishbase.org (06/2021).

Boettiger C, Temple Lang D, Wainwright P (2012). “rfishbase: exploring, manipulating and visualizing FishBase data from R.” Journal of Fish Biology

Assis J, Tyberghein L, Bosch S, Verbruggen H, Serrão E, De Clerck O, Tittensor D (2018). “Bio-ORACLE v2.0: Extending marine data layers for bioclimatic modelling.” Global Ecology and Biogeography, 27(3), 277-284. doi: 10.1111/geb.12693 (https://doi.org/10.1111/geb.12693).

Tyberghein L, Verbruggen H, Pauly K, Troupin C, Mineur F, De Clerck O (2012). “Bio-ORACLE: a global environmental dataset for marine species distribution modelling.” Global Ecology and Biogeography, 21(2), 272-281. doi: 10.1111/j.1466-8238.2011.00656.x (https://doi.org/10.1111/j.1466-8238.2011.00656.x).