## Useful when sourcing script from Quarto doc.
if (!require(here)) install.packages("here"); library(here)
source(here("R/00-load-packages.R"))
source(here("R/fct-nfi-aggregate3-new.R"))Proceedings of the
Workshop on Data analysis with R for Lao PDR’s fourth National Forest Inventory cycle
01-05 Sep 2025, FIPD office, Vientiane, Lao PDR
Context
A new training course: “Training workshop on Data analysis with R for Lao PDR’s fourth National Forest Inventory cycle”, was organised for FIPD from 01 to 05 September 2025, in FIPD office in Vientiane (Lao PDR), in order to continue capacity building on national forest inventory data analysis and to present the calculation chain for the cycle 4 updated method.
The training objective was to:
- Continue build capacities of FIPD in forest data analysis with R and Rstudio.
- Present the code base used for analyzing the 4th National Forest Inventory of Lao PDR, implemented in 2024 and 2025.
- Explore opportunities to build a R community with research institutions and the civil society to help each other build analytical skills and deal with complex data analysis in the forest, ecology and environment sectors.
The workshop agenda and participants list are shown in Figure 1.
This document contains all the code presented and the exercises with solutions during the training.
Initial Setup
The presentations, data, R projects as ZIP files or links to R projects on Github are available in the Google Classroom set for the training at:
https://classroom.google.com/c/NzU5NDMzMTU1MDI3?cjc=6jrq673a
NOTE: For security purpose the link is only valid for registered participants.
Before loading the data and perform the analysis with R and Rstudio, we need to prepare the environment by loading packages and setting a few options. Here we don’t load directly packages, but we source a script that load them.
1 Session 01: Objectives
Session 1 presented the detailed objectives of the training, as a mixture of basic(refresher) and advanced analysis in R. The technical analysis objectives were to cover: Basic calculations, Aggregation with simple sampling, Ratio estimators, and Double sampling for post-stratification.
The institutional objective was to explore a pool of resources available in other institutions to improve analytical capacities of FIPD. This includes academia and research institutions in Lao PDR to bring them together for broader technical exchange in R and exploring their potential support to FIPD.
Finally, the workshop aimed at defining a work plan of continuous training to bring in additional pools and calculations.
To achieve the above objectives, the training focused on:
Read data into R and visualize trees characteristics
Manipulate tables (merge, split, rename)
Detect outliers and correct the data
Aggregate from tree to plot level
Aggregate plot level data to LC types
Aggregate to sub-population and national level
2 Session 02: Updated calculation chain for the NFI cycle 4 carbon analysis
The NFI cycle 4 of LAO PDR had an updated sampling methodology: two-phases sampling (or double sampling) for stratification with ratio estimators. The plot design also slightly changed, the subplots were positioned 60 m apart center to center on a L shape.
2.1 Sampling design
The updated sampling design consisted of two sampling frames and sizes:
Phase 1: a dense 6 x 6 km grid, on which all grid points were visually interpreted to identify the land cover class and stratum.
Phase 2: an initial systematic subset of the phase 1, a 12 x 18 km grid, for field survey. Then not all plots on the second grid could be visited due to budget constraints, so around half of grid point were visited.
Important advantages of this design:
The final estimates takes into account both phases when producing means and totals.
Mis-classifications in phase 1 are corrected with phase 2 plots.
2-phases designs produce both areas and forest characteristics, which makes final estimates generally more robust than NFIs with one sampling and independent land cover maps (Westfall et al. 2019).
2.2 Plot design
The plot design was also updated from the NFI previous cycles, where the subplots where placed randomly within a designated area. In the cycle 4 design, plots are composed by 4 circular subplots in a L shape, with 60 m between subplot centers. Subplots are nested circles of 8 and 16 m radius for trees between 10 and 30 cm DBH (30 excluded) and trees bigger or equal to 30 cm DBH respectively. Seedling and saplings are counted in a 2m radius circle positioned 6 m east of the subplot centers and lying deadwood are measured on a line transect of 32 m.
A new specificity of the NFI cycle 4 was to observe land cover on 5 points located at the subplot centers and 12 m north, east, south and west from the center. Following equidistance, or Thiessen polygons, land cover sections (LCS) were defined as areas of:
12 m side square at the center,
a quarter of the difference between the 16 m radius outer subplot circle and the 12 m side square for the other LCSs. The diagonals (NW-SE and SW-NE) were used to delineate the quarters.
2.3 Core measurements
The NFI cycle 4 primary remained Carbon stock estimation but it expanded to multi-purpose with additional measurements of tree location for time series and non-timber forest products (NTFPs). The full list of environmental, trees and other measurements can be found in the field manual. Here is a summary of core measurements used for the carbon stock analysis:
Phase 1:
Administrative zone
Land cover at plot center (subplot A) and stratum (natural forest, planted forest, regenerating vegetation, non-forest)
Subplot:
- Land cover at 5 points
Tree:
Tree distance and azimuth from subplot center
Tree DBH
Tree species
Standing deadwood:
- Diameter at base, DBH and diameter at top (for short standing deadwood).
Stump:
Stump height
Stump diameters (2 perpendicular measurements)
Lying deadwood:
Section length
inner and outer ring in case of hollow stem.
2.4 Analysis steps
The data analysis was performed with R and R studio and structured around 4 main steps:
Setup the R environment (inc. loading NFI and ancillary data)
Prepare clean data (clean, correct and join)
Calculate entity level variables
Aggregate measurements to minimal measurement area, plot, strata and totals
Each of the main steps was sub-divided into the following actions, noting that actions in bold with the symbol (*) were the focus of this training:
The initial input data from the field work was stored into an ONA server and downloaded manually as a master CSV. Splitting the master CSV and renaming columns was relatively complex and not the focus of this training. Instead “semi”-cleaned tables tree, subplot and lcs tables were prepared for the participants.
Training practice was also limited to live trees aboveground biomass, as participants should be more familiar with this this carbon pool than deadwood, stump, sapling and lying deadwood.
An overview on how the detailed actions were handled in specific R scripts was also presented:
2.5 Core results
Core results were shown from the result XLSX file that contained the following tables:
Phase 1 data: strata, subpopulations
Phase 2 subplot level: subplot x LCS values for core variables (tree AGB, tree BGB, sapling AGB, DW, stump AGB, LDW, Ctotal)
Plot summary: aggregate subplots to partial plots
Plot unique: for each plot with mixed LC keeps the section with max AGB, if equal, keeps the smallest LC code
Subpop x stratum: aggregate variables for each land cover (i.e domain d)
Subpop: aggregate strata
Totals: aggregate sub-populations
The initial result file was prepared with ERPA vs non-ERPA zone as subpopulations but the final estimates required a few corrections (see final report):
Correction of phase 1 land cover class passed to phase 2 plots. Initially based on FTM2022, these need to be back to Phase 1 land cover.
Correction Factor was missing in EG and DD allometric equations.
Subplot level AGB for root-to-shoot ratio was using wrong weights,
ERPA subpopulations influenced slightly the final estimates vs a scenario with no subpopulation.
3 Session 03: Discussion and remarks on the updated calculations
There were no questions on the calculation chain at the time of the presentation.
4 Session 04: Refresher on R for forest data analysis
This time the refresher didn’t cover very basics of R but went back to a session of a previous training where local data were used to calculate the volume of a forest stand.
The project and it’s correction are available in the training Google Classroom.
The main output were the tree density and volume per hectare for each forest zone (1-5) and for the whole forest stand.
5 Session 05: data preparation
Goals
Load initial tables and visualize core data
Correct errors in subplot table (duplicates in subplot_id)
5.1 Load harmonized tables and ancillary data
5.1.1 Load subplot, lcs, tree tables from NFI
We can read tables or CSV files from the computer with the function read_csv(). We can store the initial table in objects with the suffix ‘_init’ to keep the initial version in the environment.
Example: read the table ‘training_subplot.csv’ from the computer into the object subplot_init:
subplot_init <- read_csv("data/training_subplot.csv", show_col_types = FALSE)Load the tables ‘training_tree.csv’ and ‘training_lcs.csv’ in the Exercise 1.
Answer to Exercise 1:
## !!! Solution
lcs_init <- read_csv("data/training_lcs.csv", show_col_types = FALSE)
tree_init <- read_csv("data/training_tree.csv", show_col_types = FALSE)5.2 Load ancillary data
Ancillary data are not collected in the field but useful for the analysis of field data.
In the training we use plot level environment factor (E) from Chave et al. 2014 and the NFI Phase 1 plot data. Ancillary data are loaded into a list to keep the environment tidy.
anci <- list()
anci$ph1 <- read_csv("data/training_anci_phase1.csv", show_col_types = FALSE)
anci$plot_E <- read_csv("data/training_anci_plotE.csv", show_col_types = FALSE)5.3 Visualize tree locations
We visualize tree locations from the tree_init table with ggplot() and geom_point() to see if any potential error.
Step-by-step for one subplot:
Define
circ16as tibble of 100 rows to draw a circular plot in ggplot, based on a radius and an angle “theta” \(\theta\) in radian.Draw a ggplot with trees from the table
tree_initin plot 631 and subplot “C”, usingfilter(),Use
tree_xandtree_yto show trees based on their location from the subplot center,Add plot boundary with
circ16.
circ16 <- tibble(
theta = seq(0, 2*pi, length = 100),
x = 16 * cos(theta),
y = 16 * sin(theta)
)
tree_init |>
filter(plot_no == 631, subplot_no == "C") |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point() +
geom_path(data = circ16, aes(x = x, y = y)) +
coord_fixed()Adapt the previous code to make a tree location map for the subplot ‘1A’ and add different colors based on tree DBH size in Exercise 2.
Answer to Exercise 2:
## !!! Solution
circ08 <- tibble(
theta = seq(0, 2*pi, length = 100),
x = 8 * cos(theta),
y = 8 * sin(theta)
)
tree_init |>
filter(plot_no == 1, subplot_no == "A") |>
mutate(tree_dbh_cat = if_else(tree_dbh < 30, "small", "big")) |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(aes(color = tree_dbh_cat)) +
geom_path(data = circ08, aes(x = x, y = y)) +
geom_path(data = circ16, aes(x = x, y = y)) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)", color = "DBH cat.")Visualizations can also be used to detect outliers. We use the example of small trees outside the nested subplot circle of 8m radius as guided exercise in Exercise 3.
Type here answers to Exercise 3 part1:
## !!! Solution
tree_init |>
filter(tree_dbh <= 30) |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(col = "darkred") +
geom_path(data = circ08, aes(x = x, y = y)) +
geom_path(data = circ16, aes(x = x, y = y)) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)")Type here answers to Exercise 3 part2:
## !!! Solution
outliers <- tree_init |> filter(tree_dbh <= 30, tree_distance > 8)
tree_init |>
filter(tree_dbh <= 30) |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(alpha = 0.2) +
geom_point(data = outliers, col = "red", shape = 23, size = 4) +
geom_path(data = circ08, aes(x = x, y = y)) +
geom_path(data = circ16, aes(x = x, y = y)) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)")Type here answers to Exercise 3 part3:
## !!! Solution
tree_init |>
filter(tree_dbh < 30) |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(col = "darkred") +
geom_path(data = circ08, aes(x = x, y = y)) +
geom_path(data = circ16, aes(x = x, y = y)) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)")Conclusion of Exercise 3: the trees found outside the small circle were trees with DBH = 30 cm. They were not measurement errors, they were part of the big trees and therefore could be found outside the 8m radius circle.
5.4 Find and correct errors
One of the main issue found in the raw data was the entry errors of subplot and tree IDs. It is very important in Database Management Systems (DBMS) to have unique IDs for all tables. In NFI it is particularly important for joining information from one table to the other. For example, land cover information is recorded at subplot level and needs to be passed on to tree level to apply the correct allometric equations. These errors can be detected programatically but often have to be solved manually.
This section introduces how to use group_by() and summarise() to find duplicates in subplot IDs.
5.4.1 Identify duplicates in subplot IDs
Here we want to check within subplots if there are duplicates in subplot IDs (subplot_id) and plot numbers (plot_no) in the subplot_init table. The objective is to create a vector of subplot Ids that are duplicated. If there are no duplicate the vector will be of length 0.
step-by-step:
Create a vector
vec_dupand assign it the following code sequence.Group the subplot table by subplot ID with
group_by()and usesummarise()to count the number of subplots that have each ID inside a new columncount. Use the functionn()to count.Filter all the subplot IDs that have a count bigger than 1 with
filter().Pull the subplot IDs.
vec_dup <- subplot_init |>
group_by(subplot_id) |>
summarise(count = n(), .groups = "drop") |>
filter(count > 1) |>
pull(subplot_id)
vec_dup[1] "553D" "631C" "632C"
Notice the results as 3 subplots have duplicate subplot_id.
Now practice the same code sequence to check if there are duplicates in plot_no in the Exercise 4. Note that each plot ID should appears 4 times in the subplot table, one for each subplot, so duplicates are when count > 4.
Answer to Exercise 4:
## !!! Solution
vec_dup2 <- subplot_init |>
group_by(plot_no) |>
summarise(count = n(), .groups = "drop") |>
filter(count > 4) |>
pull(plot_no)
vec_dup2numeric(0)
There should be no duplicates of plot_no.
5.4.2 Correct the subplot ID issues
To correct the duplicates in subplot_id, we filter the subplots from one of the duplicated IDs and visually check what is wrong. In case one ID is missing and one is duplicated we can use time stamps to identify which of the duplicates should be renamed.
Then we can use mutate() and case_when() to apply correction in R.
Step-by-step:
- Filter the subplots of one plot with duplicates: 631C.
- Visualize the table (run
View(tt)or clickttin the Environment tab (top-right window in Rstudio).- In the case of ‘631C’ there are 2 subplots C but no B. we can use time stamps to identify that
ONA_index109 should actually be for subplot B.
- In the case of ‘631C’ there are 2 subplots C but no B. we can use time stamps to identify that
- Perform the correction using
case_when()function for subplot_id 631C. At this stage onlyONA_indexis unique so it can be used to correctly identify the subplot to correct.
tt <- subplot_init |> filter(plot_no == 631)
# View(tt)
subplot <- subplot_init |>
mutate(
subplot_no = case_when(
subplot_id == "631C" & ONA_index == 109 ~ "B",
## ADD more corrections,
TRUE ~ subplot_no
)
)You can now copy/paste the above R chunk and complete the correction for the 2 other subplot IDs with duplicates in Exercise 5.
Note: we correct subplot_no now, but we will need to remake subplot_id once subplot_no is correct.
Solution to Exercise 5:
## !!! Solution
subplot <- subplot_init |>
mutate(
subplot_no = case_when(
subplot_id == "631C" & ONA_index == 109 ~ "B",
subplot_id == "632C" & ONA_index == 113 ~ "B",
subplot_id == "553D" & ONA_index == 265 ~ "C",
TRUE ~ subplot_no
)
)Now we can recreate subplot_id from the corrected subplot_no. Since subplot_id is a text field, we need to make it consistent for number of characters to ensure correct sorting of the rows. For this purpose we include texts “0” and “00” to make the number of characters consistent.
subplot <- subplot |>
mutate(
subplot_id = case_when(
plot_no < 10 ~ paste0("00", plot_no, subplot_no),
plot_no < 100 ~ paste0("0", plot_no, subplot_no),
TRUE ~ paste0(plot_no, subplot_no)
)
)Now if we re-run the vec_dup sequence with subplot instead of subplot_init, vec_dup should be of length 0.
vec_dup <- subplot |>
group_by(subplot_id) |>
summarise(count = n(), .groups = "drop") |>
filter(count > 1) |>
pull(subplot_id)
vec_dupcharacter(0)
5.4.3 Check outliers in tree table
The tree table we can check for outliers in numeric columns with the summary() function.
For example with DBH:
summary(tree_init$tree_dbh) Min. 1st Qu. Median Mean 3rd Qu. Max.
10.00 13.50 19.80 25.79 34.20 199.00
There are no trees with DBH < 10 cm or unrealistically big DBH, all good.
We can check for outliers in tree distance and azimuth in Exercise 6.
Solution to Exercise 6:
summary(tree_init$tree_azimuth) Min. 1st Qu. Median Mean 3rd Qu. Max.
0 90 179 179 268 360
summary(tree_init$tree_distance) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 4.335 6.280 6.785 7.900 16.000
5.5 Assigning clean data to objects
After resolving all errors, save the tables to the entity name without “_init” suffix. This way we have both the initial and corrected version in the environment. If tehre were no errors the inital tables can be passed directly to the corrected tables:
tree <- tree_init
lcs <- lcs_initFinally we can remove the temporary objects that won’t be used later.
rm(vec_dup, vec_dup2, tt, outliers)NOTE: use of temporary object list tmp.
In the main analysis the temporary objects are often stored in a list called tmp. This avoids cluttering the environment with temporary objects and makes it easy to remove them with rm(tmp).
## Create empty list
tmp <- list()
## Populate the list (example)
tmp$outliers <- tree_init |> filter(tree_dbh <= 30, tree_distance > 8)
## Remove temporary objects
rm(tmp)6 Session 06: data joins
Goals:
Assign LCS number to trees
Join land cover and Chave E to trees
6.1 Assign LCS number to trees
Since the Land Cover Sections (LCS) are a mix of center square and outside disc portions, we need both tree position as distance \(d\) and azimuth \(az\) and as coordinates \((x, y)\).
See the classroom presentation: “main calculations, TIP02” for explanations on converting distance \(d\) and angle \(\theta\) to coordinates \((x, y)\) for the subplot center, and converting distance \(d\) and azimuth \(az\) to coordinates \((x, y)\).
The R implementation is as follows:
tree <- tree |>
mutate(
tree_x = cos((90 - tree_azimuth) * pi/180) * tree_distance,
tree_y = sin((90 - tree_azimuth) * pi/180) * tree_distance,
)Once we have tree coordinates, we can assign LCS with case_when(), identifying trees in the center square first (with their coordinates), then the outer disc portion based on azimuth:
To be inside the center square, trees must have their coordinates \((x, y)\) both between -6 and 6 meters, or their absolute value smaller or equal to 6:
\[ -6 \le x \le 6 \Leftrightarrow |x| \le 6 \\ -6 \le y \le 6 \Leftrightarrow |y| \le 6 \]
To be in one of the outer disc portions, trees must not be in the center square and be respectively positioned between NW and NE lines, NE and SE, SE and SW or SW and NW for the north, east, south and west disc portions. These lines’ azimuth are 315, 45, 135, 215 degrees respectively for NW, NE, SE and SW lines.
R implementation:
tree <- tree |>
mutate(
lcs_no = case_when(
abs(tree_x) <= 6 & abs(tree_y) <= 6 ~ 1,
tree_azimuth > 315 | tree_azimuth <=45 ~ 2,
tree_azimuth > 45 & tree_azimuth <=135 ~ 3,
tree_azimuth > 135 & tree_azimuth <=225 ~ 4,
tree_azimuth > 225 & tree_azimuth <=315 ~ 5,
TRUE ~ NA_integer_
)
)We can now check if the code implementation worked with a visual check in Exercise 7.
Answer to Exercise 7:
## !!! Solution
tree |>
filter(plot_no == 123, subplot_no == "B") |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(aes(color = as.character(lcs_no))) +
geom_path(data = circ16, aes(x = x, y = y)) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_text_repel(aes(label = tree_azimuth), min.segment.length = 0) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)", color = "LCS")We can observe that azimuths have been well converted to \((x, y)\) since azimuth labels are in their correct quadrants (0-90, 90-180, 180-270 ad 270-360 degree respectively).
6.2 Join tables
6.2.1 Prepare tables to join
We want to join 2 tables to the tree data:
Chave et al. 2014 environment factor \(E\) from the table
anci$plot_E,Land cover code from the table
lcs,
Sometimes we don’t want all the columns from a tables to be passed on to tree. In this case we create a temporary tables with only the key(s) = the matching column(s) between two tables to join, and the information to pass.
For example, we don’t want lcs_name from lcs to go to trees, we already have lcs_no as key between the two tables. We can either remove the undesired column or keep all other columns. In this case, removing is faster, but for sustainability, it is preferable to write the columns you want to keep so that when you re-open this document in a few weeks you see directly what is kept.
tmp_lcs <- lcs |> select(-lcs_name)
## Same but preferable:
tmp_lcs <- lcs |> select(plot_no, subplot_no, lcs_no, lc_no, lc_code)6.2.2 Make the joins
- With changing object
We join LCS (tmp_lcs) with the tree table using their common keys. In this case there are three keys: plot_no, subplot_no and lcs_no.
tree_join <- tree |>
left_join(tmp_lcs, by = join_by(plot_no, subplot_no, lcs_no))It is recommended to change object name when using join to avoid undesired suffixes. Undesired suffixes happen when we join the same tables twice or more by mistake. The second time we join two tables, since the new tables are already in the joined table, they will be passed again but this time with suffixes .x and .y. Observe undesired suffix in the Exercise 8.
Answers to Exercise 8:
## !!! Solution
tree_err <- tree |>
left_join(tmp_lcs, by = join_by(plot_no, subplot_no, lcs_no))
tree_err <- tree_err |>
left_join(tmp_lcs, by = join_by(plot_no, subplot_no, lcs_no))
names(tree_err) [1] "plot_no" "subplot_no" "lcs_no"
[4] "tree_no" "tree_stem_no" "tree_dbh"
[7] "tree_species_code" "tree_species_binomial" "tree_distance"
[10] "tree_azimuth" "tree_x" "tree_y"
[13] "lc_no.x" "lc_code.x" "lc_no.y"
[16] "lc_code.y"
- With keeping the same object name
Often time we don’t want to change object names for a join if we have a long sequence of modifications ahead. We can join and keep the same object names, but in this case we have to control the suffixes.
Step-by-step:
Identify the new column passed from the table to join.
Create these columns in the receiving table with NAs using
mutate(),Join the tables and add suffixes “_rm” to the receiving table with NAs and “” to the joined columns,
Remove the columns that end with “_rm” (since they contains NAs).
## 1. Columns to be passed are 'lc_no' and 'lc_code'
tree <- tree |>
## 2. Create these columns with NAs
mutate(lc_no = NA, lc_code = NA) |>
## 3. join with controlled suffix names
left_join(tmp_lcs, by = join_by(plot_no, subplot_no, lcs_no), suffix = c("_rm", "")) |>
## 4. remove initial columns with NAs
select(-ends_with("_rm"))Practice ‘joins with controlled suffix’ to join Chave E at plot level from anci$plot_E with the Exercise 9.
Answers to Exercise 9:
## Your code
# names(anci$plot_E)
tree <- tree |>
mutate(plot_E = NA) |>
left_join(anci$plot_E, by = join_by(plot_no), suffix = c("_rm", "")) |>
select(-ends_with("_rm"))
summary(tree$plot_E) Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.08457 0.19793 0.24559 0.23280 0.28740 0.41069
Now that we have land cover information at tree level we can make a figure of tree location with color based on land cover code in Exercise 10.
Exercise 10 (~ (optional) Figure of tree location with land cover)
Remake the figure of Exercise 7 with these differences:
Change the color of trees based on lc_code
Remove geom_label_repel() to stop showing the azimuths
Remove hline and vline
Add geom_abline() with intercept = 0 and slope = 1
Add geom_abline() with intercept = 0 and slope = -1
Check visually that different land cover are in different quadrants
Type here answers to Exercise 10:
## !!! Solution
tree |>
filter(plot_no == 123, subplot_no == "B") |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(aes(color = lc_code)) +
geom_path(data = circ16, aes(x = x, y = y)) +
geom_abline(intercept = 0, slope = 1) +
geom_abline(intercept = 0, slope = -1) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)", color = "land cover")Subplot ‘123B’ is split between two land covers, Mixed deciduous (MD) in the land cover sections (LCS) 1, 2 and 5, and agriculture in LCS 3 and 4.
6.2.3 (optional) Final graph
We can add the square center LCS with a specific table
center_sq <- tibble(x = c(-6, 6, 6, -6, -6), y = c(6, 6, -6, -6, 6))
tree |>
filter(plot_no == 123, subplot_no == "B") |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(aes(color = lc_code)) +
geom_path(data = circ16, aes(x = x, y = y)) +
geom_abline(intercept = 0, slope = 1) +
geom_abline(intercept = 0, slope = -1) +
geom_path(data= center_sq, aes(x = x, y = y)) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)", color = "LCS")6.2.4 Bonus: Tree DBH true to size on figures with the ggforce package.
Graphs with ggplot() don’t natively intend to represent points true to size with geom_point(), as size aesthetic is optimized for visibility. Here we represent the same figure as above but with point size change based on tree DBH
tree |>
filter(plot_no == 123, subplot_no == "B") |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(aes(color = lc_code, size = tree_dbh)) +
geom_path(data = circ16, aes(x = x, y = y)) +
geom_abline(intercept = 0, slope = 1) +
geom_abline(intercept = 0, slope = -1) +
geom_path(data= center_sq, aes(x = x, y = y)) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)", color = "LCS", size = "DBH (cm)")Now if we want to see trees’ DBH true to size, with the plot area, we can use geom_circle() from the package ggforce.
tree |>
filter(plot_no == 123, subplot_no == "B") |>
ggplot(aes(x = tree_x, y = tree_y)) +
geom_point(aes(fill = lc_code), shape = 21, size = 3, stroke = 0) +
geom_circle(aes(x0 = tree_x, y0 = tree_y, r = tree_dbh/200, fill = lc_code)) +
geom_path(data = circ16, aes(x = x, y = y)) +
geom_abline(intercept = 0, slope = 1) +
geom_abline(intercept = 0, slope = -1) +
geom_path(data= center_sq, aes(x = x, y = y)) +
coord_fixed() +
labs(x = "X (m)", y = "Y (m)", fill = "LC code", size = "DBH (cm)")Here we kept the points as the trees are small and wouldn’t be visible otherwise.
7 Session 07: tree weights and basal area
Goals
Calculate tree weights for ratio estimators and for subplot per ha values
Calculate tree basal area
7.1 Tree weights for nested circles adjustment
7.1.1 Tree weight for ratio estimator
In the context of ratio estimator, tree weight converts small size trees’ variables to their equivalent if small trees were measured in the same area as the bigger sized trees.
For Lao PDR, the ratio between small and big trees is 4:
tree <- tree |> mutate(tree_weight = if_else(tree_dbh < 30, 4, 1))Let’s calculate where this value 4 comes from in Exercise 11.
Type here answers to Exercise 11:
## Solution
Asmall <- pi * 8^2
Abig <- pi * 16^2
Abig / Asmall [1] 4
7.1.2 Tree weight for subplot per ha values
To get per ha at the subplot level, we first assign to each tree its subplot nested circle area, then the weight is the inverse of this area
tree <- tree |>
mutate(
tree_spha = if_else(tree_dbh < 30, pi * 16^2 / 10000, pi * 16^2 / 10000),
tree_weight_spha = 1 / tree_spha
)There might be an error in the above code snippet, let’s solve it in Exercise 12.
Type here answers to Exercise 12:
## !!! Solution
tree <- tree |> mutate(
tree_spha = if_else(tree_dbh < 30, pi * 8^2 / 10000, pi * 16^2 / 10000),
tree_weight_spha = 1 / tree_spha
)7.1.3 Calculate tree basal area
The basal area of a tree is its stem area footprint based on the measurement of it diameter at breast height.
tree <- tree |> mutate(tree_ba = pi * (tree_dbh/200)^2)Let’s check subplot level basal area per land cover class in Exercise 13.
Type here answers to Exercise 13:
## !!! Solution
tmp_sp <- subplot |> select(plot_no, subplot_no, subplot_lc_class_center)
tree |>
group_by(plot_no, subplot_no) |>
summarise(subplot_ba = sum(tree_ba * tree_weight_spha), .groups = "drop") |>
left_join(tmp_sp, by = join_by(plot_no, subplot_no)) |>
ggplot(aes(x = subplot_lc_class_center, y = subplot_ba)) +
geom_boxplot() +
geom_jitter(alpha = 0.1, shape = 21)BONUS: The function geom_jitter() is a nice addition to boxplot figures to see all the subplot values on top of their distribution.
8 Session 08: Tree aboveground biomass
GOALS:
add AGB at tree level for each forest type
Add AGB from chave 05 and 14 without height
Make a figure for natural forest and one for plantations, showing forest type AGB, and chave models
8.1 List of models
"EG" ~ 0.3112 * tree_dbh^2.2331,"MD" ~ 0.523081 * tree_dbh^2,"DD" ~ 0.2137 * tree_dbh^2.2575,"CF" ~ 0.1277 * tree_dbh^2.3944,"MCB" ~ 0.1277 * tree_dbh^2.3944,"P_AC" ~ 0.1173 * tree_dbh^2.454,"P_EC" ~ 0.199 * tree_dbh^2.185,"P_RB" ~ 0.0082 * (pi*tree_dbh)^2.5623, ## Rubber model uses circumference"P_TK" ~ 0.077 * tree_dbh^2.546,"P_OTH" ~ 0.3112 * tree_dbh^2.2331,"RV" ~ 0.6 * exp(-1.499 + 2.148 * log(tree_dbh) + 0.207 * (log(tree_dbh))^2 - 0.0281*(log(tree_dbh))^3),"B" ~ 0.6 * exp(-1.499 + 2.148 * log(tree_dbh) + 0.207 * (log(tree_dbh))^2 - 0.0281*(log(tree_dbh))^3),Chave14 = round(exp(-1.803 - 0.976plot_E + 0.976log(0.6) + 2.673log(tree_dbh) -0.0299(log(tree_dbh))^2)Chave05 = round(0.6 * exp(-1.499 + 2.148log(tree_dbh) + 0.207(log(tree_dbh))^2 - 0.0281*(log(tree_dbh))^3)
8.2 AGB per forest type
We can assign different AGB models to trees for different forest types with mutate() and case_when():
#table(tree$lc_code)
tree <- tree |>
mutate(
tree_agb_final = case_when(
lc_code == "EG" ~ 0.3112 * tree_dbh^2.2331,
## ADD OTHER equations here
TRUE ~ 0
)
)Let’s fill in the other cases in the Exercise 14.
Type here answers to Exercise 14:
## !!! Solution
tree <- tree |>
mutate(
tree_agb_final = case_when(
lc_code == "EG" ~ 0.3112 * tree_dbh^2.2331,
lc_code == "MD" ~ 0.523081 * tree_dbh^2,
lc_code == "DD" ~ 0.2137 * tree_dbh^2.2575,
lc_code == "CF" ~ 0.1277 * tree_dbh^2.3944,
lc_code == "MCB" ~ 0.1277 * tree_dbh^2.3944,
lc_code == "P_AC" ~ 0.1173 * tree_dbh^2.454,
lc_code == "P_EC" ~ 0.199 * tree_dbh^2.185,
lc_code == "P_RB" ~ 0.0082 * (pi*tree_dbh)^2.5623,
## > Rubber model uses circumference
lc_code == "P_TK" ~ 0.077 * tree_dbh^2.546,
lc_code == "P_OTH" ~ 0.3112 * tree_dbh^2.2331,
lc_code == "RV" ~ 0.6 * exp(-1.499 + 2.148 * log(tree_dbh) + 0.207 * (log(tree_dbh))^2 - 0.0281*(log(tree_dbh))^3),
lc_code == "B" ~ 0.6 * exp(-1.499 + 2.148 * log(tree_dbh) + 0.207 * (log(tree_dbh))^2 - 0.0281*(log(tree_dbh))^3),
TRUE ~ 0
)
)
# tree |>
# ggplot(aes(x = tree_dbh, y = tree_agb_final)) +
# geom_point()
# tree |>
# ggplot(aes(x = tree_dbh, y = tree_agb_final)) +
# geom_line(aes(color = lc_code))8.3 AGB from Chave 2005 and 2014
Let’s add 3 models to all trees, regardless of their forest type to compare with the forest type based models, in Exercise 15.
Type here answers to Exercise 15:
## !!! Solution
tree <- tree |>
mutate(
tree_agb_chave05 =
0.6 * exp(
-1.499 + 2.148*log(tree_dbh) +
0.207*(log(tree_dbh))^2 - 0.0281*(log(tree_dbh))^3
),
tree_agb_chave14 =
exp(
-1.803 - 0.976*plot_E + 0.976*log(0.6) +
2.673*log(tree_dbh) - 0.0299*(log(tree_dbh))^2
),
tree_agb_EG = 0.3112 * tree_dbh^2.2331
)8.4 Compare AGB models
8.4.1 Natural forest
Let’s compare our forest type specific tree aboveground with the Chave’s models and the model for Evergreen forests. Since there are many forest type let’s look at natural forests first with filter().
tree |>
filter(tree_agb_final != 0, lc_no <= 29) |>
ggplot(aes(x = tree_dbh, y = tree_agb_final)) +
geom_point() +
geom_line(aes(y = tree_agb_chave05), color = "grey60") +
geom_line(aes(y = tree_agb_chave14), color = "darkred") +
geom_line(aes(y = tree_agb_EG), color = "darkgreen") +
facet_wrap(~lc_no)We can observe in this figure that the forest type specific models are quite conservative, since our AGB final model predicts smaller AGB for the same DBH compared to both Chave’s models.
8.4.2 (optional) Planted forest
Let’s produce the same figure for planted forests in Exercise 16.
Exercise 16 (~ Compare models for plantations)
- Make a similar graph than 3.1 but for planted forest only
Type here answers to the Exercise 16:
## Your code here
## !!! SOL
tree |>
filter(tree_agb_final != 0, lc_no >= 160) |>
ggplot(aes(x = tree_dbh, y = tree_agb_final)) +
geom_point() +
geom_line(aes(y = tree_agb_chave05), color = "grey60") +
geom_line(aes(y = tree_agb_chave14), color = "darkred") +
geom_line(aes(y = tree_agb_EG), color = "darkgreen") +
facet_wrap(~lc_code)## !!!Plantation models selected for AGB final are also conservative in general, except for the rubber model.
9 Session 09: Aggregate tree to forest type AGB
GOALS:
- Calculate the mean aboveground biomass for all forest types using plot level land cover and systematic sampling.
- Calculate the mean aboveground biomass for all forest types using ratio estimators and 2-phases sampling for post-stratification.
- Compare estimates.
9.1 AGB with plot level land cover and systematic sampling
Step-by-step
Sum tree AGB to subplots.
Make plot majority LC class.
Create plot level AGB based on mean subplot AGB.
Create Forest type AGB as the mean of plot AGB per forest type.
Calculate sampling error.
9.1.1 Sum tree AGB to subplot
We will use group_by() and summarise() to bring tree characteristics to the subplot level.
subplot_agb <- tree |>
group_by(plot_no, subplot_no) |>
summarise(
sp_agb = sum(tree_agb_final * tree_weight_spha / 1000),
sp_ba = sum(tree_ba * tree_weight_spha),
sp_dens = sum(tree_weight_spha),
tree_meas = n(),
.groups = "drop"
)9.1.2 Load plot level main land cover
Plot level land cover was generated, from the subplots center LC, with the following rules:
- For plots with one unique LC, they were directly assigned to plot level.
- For plots with multiple land cover, we used the majority LC for the whole plot.
- If some plots have equal importance of two land covers, we used to minimum land cover class.
plot_lc <- read_csv("data/training_plot_lc.csv", show_col_types = FALSE)9.1.3 Aggregate subplot to plot
We take the mean of subplot variables (except tree measured count).
plot_agb <- subplot_agb |>
group_by(plot_no) |>
summarise(
plot_agb = mean(sp_agb),
plot_ba = mean(sp_ba),
plot_dens = mean(sp_dens),
plot_tree_meas = sum(tree_meas)
) |>
left_join(plot_lc, by = join_by(plot_no))Let’s look at plot AGB per land cover in Exercise 17.
Type here the answers to Exercise 17:
## !!! Solution
plot_agb |>
ggplot(aes(x = plot_lc, y = plot_agb)) +
geom_boxplot() +
geom_jitter(alpha = 0.1, size = 0.8)9.1.4 Aggregate to forest type AGB
Now we can take averages of plot AGB for each forest type. For the uncertainty, we also need standard deviation of AGB and the number of plots.
ftype_agb <- plot_agb |>
group_by(plot_lc) |>
summarise(
plot_count = n(),
total_tree_meas = sum(plot_tree_meas),
ftype_dens = mean(plot_dens),
ftype_ba = mean(plot_ba),
ftype_agb = mean(plot_agb),
ftype_agb_sd = sd(plot_agb)
)We can calculate the uncertainty for systematic design with this formula:
\[ U\% = t^{1 - \alpha/2}_{n-1} \times \frac{\sigma}{\sqrt{n}} \times \frac{1}{\mu} \times 100 \]
The student’s law function in R is qt(), and uncertainty can be calculated from the AGB mean and standard deviation as follows:
ftype_agb <- ftype_agb |>
mutate(
ftype_agb_se = ftype_agb_sd / sqrt(plot_count),
ftype_agb_me = qt(0.975, plot_count - 1) * ftype_agb_se,
ftype_agb_U = ftype_agb_me / ftype_agb * 100,
ftype_agb_cilower = ftype_agb - ftype_agb_me,
ftype_agb_ciupper = ftype_agb + ftype_agb_me
)Warning: There was 1 warning in `mutate()`.
ℹ In argument: `ftype_agb_me = qt(0.975, plot_count - 1) * ftype_agb_se`.
Caused by warning in `qt()`:
! NaNs produced
We can then make a plot of forest type AGB with sampling uncertainty. We added a limit on the y-axis to avoid uncertainty of Coniferus Forest (CF, code 14) condensing the whole figure with ylim(0, 600).
ftype_agb |>
ggplot(aes(x = plot_lc, y = ftype_agb)) +
geom_col(aes(fill = plot_lc), col = "black") +
geom_errorbar(aes(ymin = ftype_agb_cilower, ymax = ftype_agb_ciupper)) +
theme(legend.position = "none") +
ylim(0, 600)9.2 AGB with ratio estimator and 2-phases sampling for post-stratification
Step-by-step:
Prepare subplot x LCS level table
Use the aggregate function ‘nfi_aggregate3()’ to get all aggregation levels
Compare results with equal plot statistiscs (table ‘ftype’)
9.2.1 Prepare the subplot x LCS table for aggregation
9.2.1.1 Initial data preparation
For the second approach, all subplots must be accounted for, visited or not, accessible or not. This table has been prepared and can be loaded into the ancillary data list anci.
anci$ph2 <- read_csv("data/training_anci_phase2.csv", show_col_types = F)
ph2_subplot <- anci$ph2
# format(nrow(anci$ph2) / 20, big.mark = ",")We can observe that anci$ph2 contains 1,067 (1,067) plots, as initially planned in the 12 x 18 km grid.
To generate this table, instead of starting from tree aggregates or from the subplot table, we start from the vector of planned Phase 2 plots and we recreate all combinations of subplot x LCS with expand_grid() .
vec_ph2 <- anci$ph1 |>
filter(!is.na(plot_id)) |>
pull(plot_id) |>
sort()
test_ph2 <- expand_grid(
plot_id = vec_ph2,
subplot_no = c("A", "B", "C", "D"),
lcs_no = 1:5
) |>
mutate(subplot_id = paste0(subplot_no, lcs_no))We can check that this table is identical to anci$ph2 with all.equal():
tmp_ph2 <- anci$ph2 |>
select(plot_id, subplot_no, lcs_no, subplot_id)
all.equal(tmp_ph2, test_ph2)[1] TRUE
We then need to add accessibility from subplot and subpopulation and strata from phase 1 data with left_join() in Exercise 18.
Exercise 18 (~ (optional) Join core info to phase 2 table)
Prepare ‘tmp_sp’ a table containing
plot_no,subplot_noandsubplot_accessfrom the subplot table and renameplot_nointoplot_id.Prepare ‘tmp_ph1’ a table containing
plot_id,subpop,stratum,ph1_prov_noandph1_prov_namefrom the tableanci$ph1and renameph1_prov_nointoprov_noandph1_prov_nameintoprov_nameJoin these 2 tables to
ph2_subplotwith suffix control method.
Type here answers to Exercise 18:
## !!! Solution
tmp_sp <- subplot |>
select(plot_id = plot_no, subplot_no, subplot_access)
tmp_ph1 <- anci$ph1 |>
select(plot_id, subpop, stratum, prov_no = ph1_prov_no, prov_name = ph1_prov_name)
ph2_subplot <- ph2_subplot |>
mutate(
subplot_access = NA,
subpop = NA,
stratum = NA,
prov_no = NA,
prov_name = NA
) |>
left_join(tmp_ph1, by = join_by(plot_id), suffix = c("_rm", "")) |>
left_join(tmp_sp, by = join_by(plot_id, subplot_no), suffix = c("_rm", "")) |>
select(-ends_with("_rm"))In the final preparation phase 2 subplot table, correction for shifted plots was implemented beforehand.
9.2.1.2 (NOTE) Investigating phase 2 plots in stratum 4: non-forest
The Phase 2 focused on forest and RV land cover, but some plots from Phase 1 and planned for phase 2 were non-forest in the Phase 1 interpretation and still visited.
tmp_ph2_nonforest <- anci$ph1 |>
filter(!is.na(plot_id), stratum == 4)
vec_ph2nf <- tmp_ph2_nonforest |> pull(ph1_plot_no)
tmp_ph2nf_visited <- subplot |>
filter(plot_no %in% vec_ph2nf, subplot_no == "A")
## Save the tables in a new folder
if (!"results" %in% list.files()) dir.create("results")
write_csv(tmp_ph2_nonforest, "results/ph2_nonforest_planned.csv")
write_csv(tmp_ph2nf_visited, "results/ph2_nonforest_visited.csv")9.2.1.3 Re-coding accessibility
Accessibility plays an important role in the final estimates. Since not all the initial 1,067 plots were visited, there is missing information on many non-forest plots and some forest plots. To simplify the calculations, it was considered that:
all non-visited forest plots would be considered as non-accessible,
all non-visited non-forest plots would be considered as accessible with biomass 0.
This is important to keep in mind in case biomass of trees outside forest becomes relevant for future NFI cycles.
Code for re-coding accessibility:
ph2_subplot <- ph2_subplot |>
mutate(
access = case_when(
plot_id <= 636 & subplot_access == "accessible" ~ TRUE,
plot_id <= 636 & subplot_access != "accessible" ~ FALSE,
plot_id <= 636 & is.na(subplot_access) ~ FALSE,
stratum %in% 1:3 ~ FALSE,
stratum == 4 ~ TRUE,
TRUE ~ NA
)
)9.2.1.4 Subplot largest area
For aggregation with ratio estimator we need to keep track of the areas where trees are measured. Since we have a tree weight for small trees, we are only interested in the larger area:
12 m side square of a quarter
difference between the 16m radius circle and the 12m side square.
ph2_subplot <- ph2_subplot |>
mutate(
sp_area = if_else(lcs_no == 1, 12^2, (pi*16^2 - 12^2)/4) / 10000
)9.2.1.5 Aggregate tree to subplot x LCS
We can now aggregate trees to the subplot x LCS level and join with the phase 2 subplot table.
ph2_sp_tree <- tree |>
select(plot_id = plot_no, subplot_no, lcs_no, tree_weight, tree_ba, tree_agb_final) |>
mutate(
subplot_id = paste0(subplot_no, lcs_no)
) |>
group_by(plot_id, subplot_id) |>
summarise(
dens = sum(tree_weight),
ba = sum(tree_ba * tree_weight),
agb = sum(tree_agb_final * tree_weight) / 1000, ## Convert kg to tons
.groups= "drop"
)ph2_subplot <- ph2_subplot |>
mutate(
dens = NA,
ba = NA,
agb = NA
) |>
left_join(ph2_sp_tree, by = join_by(plot_id, subplot_id), suffix = c("_rm", "")) |>
select(-ends_with("_rm")) |>
mutate(
dens = if_else(!is.na(dens), n_tree, 0),
ba = if_else(!is.na(ba), ba, 0),
agb = if_else(!is.na(agb), agb, 0)
)9.2.2 Run the aggregation function
The aggregation function 3, for 2-phases sampling with ratio estimator, takes phase 1 and phase 2 data as input as well as the attributes \(y\) and \(x\) for the ratio. \(y\) can change based on what attribute we want to aggregate, while \(x\) is always the minimum measured area.
Let’s see an example with tree density and no subpopulation.
ph2_data <- ph2_subplot |> mutate(subpop = 1)
res3_dens <- nfi_aggregate3(
.ph1_df = anci$ph1,
.ph2_sp = ph2_data,
.class_d = lc_no,
.attr_y = dens,
.attr_x = sp_area,
.aoi_area = 23680000
)
res3_dens_tot <- res3_dens$totals_shortThe function output is a list with different levels of data aggregation.
Practice now the function nfi-aggregate3() to get basal area and aboveground biomass in Exercise 19.
Exercise 19 (~ Aggregate basal area and AGB)
(optional) Create
res3_baas the output of the aggregation function with basal area as.attr_y.Create
res3_agbas the output of the aggregation function with aboveground biomass as.attr_y.Save the totals_short output from res3_agb to a new object
res3_agb_tot.Make a barplot with AGB per ha against land cover with their error bars, for natural forest only.
Type here answers to Exercise 19:
## !!! Solution
ph2_data <- ph2_subplot |> mutate(subpop = 1)
res3_agb <- nfi_aggregate3(
.ph1_df = anci$ph1,
.ph2_sp = ph2_data,
.class_d = lc_no,
.attr_y = agb,
.attr_x = sp_area,
.aoi_area = 23680000
)
res3_agb_tot <- res3_agb$totals_short
res3_agb_tot |>
filter(lc_no < 20) |>
ggplot(aes(x = lc_no, y = Rd)) +
geom_col(aes(fill = lc_no), col = "grey80") +
theme(legend.position = "none") +
geom_errorbar(aes(ymin = Rd - Rd*Rd_mep/100, ymax = Rd + Rd*Rd_mep/100)) +
labs(
x = "Land cover code",
y = "AGB (ton (DM)/ha)"
)We can use an additional package: kableExtra, to customize tables with adding extra headers and even colors. However this package focuses on HTML and PDF outputs, for DOCX, we stick to simple kable() function. Let’s show the totals for AGB:
tab_agb <- res3_agb$totals_d |>
select(lc_no, Xd, Xd_mep, Xtot, Yd, Yd_mep, Rd, Rd_mep, Ytot) |>
filter(lc_no <= 22 | lc_no >= 160)
if (knitr::is_html_output() | knitr::is_latex_output()){
kbl(
tab_agb,
col.names = c(
"Land cover", "prop.", "U (perc)", "tot.", "prop.", "U (perc)",
"ton (DM)/ha", "U (perc)", "tot."
),
format.args = list(big.mark = ",", digits = 2),
booktabs = F
) |>
kable_styling(
bootstrap_options = c("bordered", "condensed", "responsive"),
full_width = FALSE
) |>
add_header_above(
c(" " = 1, "Area" = 3, "AGB" = 5), border_left = T, border_right = T
) |>
column_spec(1, border_left = T) |>
column_spec(9, border_right = T)
} else {
kable(
tab_agb,
col.names = c(
"Land cover", "Area prop.", "Area U (perc)", "Area tot. (ha)", "AGB prop.",
"AGB prop. U (perc)", "AGB (ton DM/ha)", "AGB U (perc)", "AGB tot. (tons)"
),
format.args = list(big.mark = ",", digits = 2)
)
}| Land cover | prop. | U (perc) | tot. | prop. | U (perc) | ton (DM)/ha | U (perc) | tot. |
|---|---|---|---|---|---|---|---|---|
| 11 | 0.00525 | 47.4 | 124,260 | 1.297 | 60.0 | 247 | 35.2 | 3.1e+07 |
| 12 | 0.15406 | 6.2 | 3,648,029 | 15.750 | 8.1 | 102 | 5.5 | 3.7e+08 |
| 13 | 0.01993 | 23.9 | 471,951 | 0.845 | 35.3 | 42 | 22.6 | 2.0e+07 |
| 14 | 0.00107 | 101.4 | 25,424 | 0.133 | 141.4 | 124 | 69.8 | 3.2e+06 |
| 15 | 0.00456 | 45.9 | 107,889 | 0.601 | 49.5 | 132 | 12.0 | 1.4e+07 |
| 21 | 0.02898 | 18.7 | 686,166 | 1.354 | 29.2 | 47 | 23.0 | 3.2e+07 |
| 22 | 0.07202 | 10.7 | 1,705,431 | 1.495 | 21.1 | 21 | 18.3 | 3.5e+07 |
| 161 | 0.00041 | 116.5 | 9,725 | 0.017 | 143.5 | 42 | 73.6 | 4.1e+05 |
| 162 | 0.00159 | 84.0 | 37,725 | 0.029 | 164.7 | 18 | 125.0 | 6.9e+05 |
| 164 | 0.00736 | 34.5 | 174,181 | 1.025 | 40.6 | 139 | 22.5 | 2.4e+07 |
| 165 | 0.00035 | 105.5 | 8,177 | 0.028 | 101.2 | 81 | 11.7 | 6.6e+05 |
| 169 | 0.00132 | 84.8 | 31,237 | 0.041 | 132.9 | 31 | 78.8 | 9.7e+05 |
9.3 Compare AGB estimates from the 2 aggregation methods
Check AGB from 2 methods
tmp_agb1 <- ftype_agb |>
filter(plot_lc %in% c("11", "12", "13", "14", "15", "21", "22")) |>
mutate(
plot_lc = as.numeric(plot_lc),
method = "simple"
) |>
select(method, lc_no = plot_lc, agb = ftype_agb, agb_U = ftype_agb_U)
tmp_agb2 <- res3_agb$totals_short |>
filter(lc_no <= 22) |>
mutate(method = "ratio estimator") |>
select(method, lc_no, agb = Rd, agb_U = Rd_mep)
agb_compa <- tmp_agb1 |> bind_rows(tmp_agb2)
agb_compa |>
ggplot(aes(x = as.character(lc_no), y = agb)) +
geom_col(aes(fill = method), col = "grey30", position = position_dodge()) +
geom_errorbar(aes(group = method, ymin = agb - agb*agb_U/100, ymax = agb + agb*agb_U/100), position = position_dodge()) +
ylim(0, 600)The ratio estimator method had big impact on EG and RV. Also big influence on uncertainty of CF.
10 Session 10: Building a R community of practice in Lao PDR
The idea behind this sessions was to understand who else than FIPD is using R and R studio for forestry data analysis and if it would be of interest to pool resources together to increase capacities and share knowledge on these analyses.
For this meeting, FIPD participants were joined by representatives from: NuOL, Lao National Statistic Office, NAFRI, DALAM, WWF and WCS.
Participants presented scientific and project studies, for which the data analysis involved R codes, but in most cases the analysis were carried out by international experts.
Participants expressed interest using R for forestry data analysis, but the capacity are overall relatively low. They also expressed interest in building a group to share studies, and maybe a private group where non-public data could also be shared.
FIPD, F-REDD and FAO will discuss ion the feasibility to host a communication channel and how other institutions could be involved in further training on R for forestry data analysis.
Conclusion
The training aimed to continue FIPD capacity building to handle the NFI data analysis with R. This training session focused on core parts of full calculation chain for the NFI cycle 4. The trainings elements were:
Solving subplot coding errors manually based on time stamps
Joining subplot level and phase 1 information to trees
Calculating tree weights, basal area and aboveground biomass
Aggregating tree aboveground biomass to domain average AGB per ha based on equal area plots and simple averages
Practicing a custom function to get areas and AGB estimates per ha and totals following double sampling for post-stratification with ratio estimator.
This cycle adopted a more complex method than previous cycles with the introduction of ratio estimators and the inclusion of the Phase 1 observations in the estimation of domain estimates of core variables’ averages and totals. This method is more sensitive to stratum assignment and few plots from stratum 4 (non-forest) were found to be forested and have tree records. This may explain the large differences between simple averages and weighted averages for two-phases sampling, in particular for evergreen forest and regenerating vegetation and requires further investigation. These plots may be shifted plots where the land cover was re-assigned but stratum was not.
In terms of capacity building, FIPD team expressed interest in adopting R for forest management and small scale forest estimations, which are more frequent requests for them, in order to have more opportunities to practice. Additional training was also suggested to share and practice the calculations for other pools.
ANNEX: Agenda
ANNEX: participants list
Note: Participants below FAO joined only the last morning session on building a community of practice.