Sustainability Journal (MDPI)
2009 | 1,010,498,008 words
Sustainability is an international, open-access, peer-reviewed journal focused on all aspects of sustainability—environmental, social, economic, technical, and cultural. Publishing semimonthly, it welcomes research from natural and applied sciences, engineering, social sciences, and humanities, encouraging detailed experimental and methodological r...
Creating Sustainable Flood Maps Using Machine Learning and Free Remote...
Héctor Leopoldo Venegas-Quiñones
Hydrology and Water Resources Department, University of Arizona, 1133 E James E. Rogers Way, Tucson, AZ 85721, USA
Pablo García-Chevesich
Department of Civil and Environmental Engineering, Colorado School of Mines, 1500 Illinois St., Golden, CO 80401, USA
Rodrigo Valdés-Pineda
Hydrology and Water Resources Department, University of Arizona, 1133 E James E. Rogers Way, Tucson, AZ 85721, USA
Ty P. A. Ferré
Hydrology and Water Resources Department, University of Arizona, 1133 E James E. Rogers Way, Tucson, AZ 85721, USA
Hoshin Gupta
Hydrology and Water Resources Department, University of Arizona, 1133 E James E. Rogers Way, Tucson, AZ 85721, USA
Derek Groenendyk
Hydrology and Water Resources Department, University of Arizona, 1133 E James E. Rogers Way, Tucson, AZ 85721, USA
Juan B. Valdés
Hydrology and Water Resources Department, University of Arizona, 1133 E James E. Rogers Way, Tucson, AZ 85721, USA
John E. McCray
Department of Civil and Environmental Engineering, Colorado School of Mines, 1500 Illinois St., Golden, CO 80401, USA
Laura Bakkensen
School of Government and Public Policy, University of Arizona, 1145 S Campus Drive, Tucson, AZ 85721, USA
Year: 2024 | Doi: 10.3390/su16208918
Copyright (license): Creative Commons Attribution 4.0 International (CC BY 4.0) license.
[Full title: Creating Sustainable Flood Maps Using Machine Learning and Free Remote Sensing Data in Unmapped Areas]
[[[ p. 1 ]]]
[Summary: This page provides citation information for the study and lists the authors and their affiliations. It includes the abstract, which summarizes the study's use of a Random Forest model to predict flood hazard in several states. Keywords related to the research are also listed, along with an introduction highlighting the increasing impact of floods.]
Citation: Venegas-Quiñones, H.L.; García-Chevesich, P.; Valdés-Pineda, R.; Ferré, T.P.A.; Gupta, H.; Groenendyk, D.; Valdés, J.B.; McCray, J.E.; Bakkensen, L. Creating Sustainable Flood Maps Using Machine Learning and Free Remote Sensing Data in Unmapped Areas. Sustainability 2024, 16, 8918. https://doi.org/10.3390/su16208918
Academic Editor: Xander Wang
Received: 14 August 2024; Revised: 10 October 2024; Accepted: 11 October 2024; Published: 15 October 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Creating Sustainable Flood Maps Using Machine Learning and Free Remote Sensing Data in Unmapped Areas
Héctor Leopoldo Venegas-Quiñones 1,*, Pablo García-Chevesich 2,3, Rodrigo Valdés-Pineda 1, Ty P. A. Ferré 1, Hoshin Gupta 1, Derek Groenendyk 1, Juan B. Valdés 1, John E. McCray 2 and Laura Bakkensen 4
1 Hydrology and Water Resources Department, University of Arizona, 1133 E James E. Rogers Way, Tucson, AZ 85721, USA
2 Department of Civil and Environmental Engineering, Colorado School of Mines, 1500 Illinois St., Golden, CO 80401, USA
3 Intergovernmental Hydrological Programme, United Nations Educational, Scientific, and Cultural Organization, Montevideo 11200, Uruguay
4 School of Government and Public Policy, University of Arizona, 1145 S Campus Drive, Tucson, AZ 85721, USA
* Corresponding author
Abstract: This study leverages a Random Forest model to predict flood hazard in Arizona, New Mexico, Colorado, and Utah, focusing on enhancing sustainability in flood management.
Utilizing the National Flood Hazard Layer (NFHL), an intricate flood map of Arizona was generated, with the Random Forest Classification algorithm assessing flood hazard for each grid cell. Weather variable predictions from TerraClimate were integrated with NFHL classifications and Digital Elevation Model (DEM) analyses, providing a comprehensive understanding of flood dynamics. The research highlights the model’s capability to predict flood hazard in areas lacking NFHL classifications, thereby supporting sustainable flood management by elucidating weather’s influence on flood hazard. This approach aligns with sustainable development goals by aiding in resilient infrastructure design and informed urban planning, reducing the impact of floods on communities. Despite recognizing constraints such as input data precision and the model’s potential limitations in capturing complex variable interactions, the methodology offers a robust framework for flood hazard evaluation in other regions. Integrating diverse data sources, this study presents a valuable tool for decision-makers, supporting sustainable practices, and enhancing the resilience of vulnerable regions against flood hazards. This integrated approach underscores the potential of advanced modeling techniques in promoting sustainability in environmental hazard management.
Keywords: machine learning; flood hazard assessment; remote sensing; random forest model; flood mapping
1. Introduction
Flood occurrences and intensity are rising, significantly impacting various regions [1]. Changes in rainfall and land use amplify flood hazards [2], with long-lasting effects such as soil erosion, water pollution, and disease spread [3,4]. Understanding flood trends and causes is crucial for developing effective hazard reduction and adaptation strategies [5]. Floods can have severe economic and human impacts [6].
A Stanford study found that increased rainfall caused a third of the USD 199 billion in flood damages in the U.S. from 1988 to 2017 [7]. Floods also result in fatalities, physical harm, and psychological distress [7], with 1336 flood-related deaths reported in the U.S. from 2000 to 2010 [8]. Demographic factors like age and income influence flood mortality rates [9]. Accurate, current flood maps are essential for effective flood hazard management and decision-making at all levels [10,11].
Sustainability 2024, 16, 8918. https://doi.org/10.3390/su16208918 https://www.mdpi.com/journal/sustainability
[[[ p. 2 ]]]
[Summary: This page discusses the importance of accurate flood maps for land use planning, infrastructure development, and emergency response. It contrasts flood hazard maps with flood risk maps and mentions FEMA's role in creating flood maps. The page also highlights the cost of creating and maintaining these maps and the challenges faced by developing countries.]
These maps inform land use planning, infrastructure development, and emergency response. They also raise public awareness and promote proactive measures [11,12]. The rising frequency and severity of floods globally, driven by climate change, urbanization, and deforestation, underscore the need for reliable flood maps [13]. Advances in technology and data collection have led to more detailed and accurate flood maps, using tools like remote sensing and GIS mapping [14]. These maps are crucial for saving lives, protecting property, and guiding decision-making.
In the U.S., the Federal Emergency Management Agency (FEMA) produces flood maps to manage flood hazards [11,15], including Flood Insurance Rate Maps (FIRMs), which use diverse data sources to identify flood hazard areas (Figure 1) [16,17]. A flood hazard map illustrates areas potentially affected by flood events, highlighting regions prone to flooding based on historical data, topography, and hydrological models. Conversely, a flood risk map integrates hazard information with socio-economic factors, assessing the potential impact on populations, infrastructure, and assets. It considers not only the likelihood of flooding but also the vulnerability and exposure of the affected area, providing a more comprehensive view of the potential consequences. FEMA classifies flood hazard areas into various zones to help communities and individuals understand their risk and make informed decisions [16].
The cost of creating and maintaining flood maps is significant, influenced by factors like area size, detail required, and data availability [18]. Developing these maps involves substantial data collection, modeling, and expert analysis [16,19]. FEMA’s national FIRM creation could cost up to USD 11.8 billion, with annual maintenance costs between USD 107 million and USD 480 million [20].
Approximately 80% of those at risk from river floods live in 15 developing countries, including India, Bangladesh, China, and Vietnam, which often lack resources to mitigate flood impacts [21]. In these countries, the absence of flood hazard maps exacerbates disaster readiness and response challenges.
Figure 1. Geographic Distribution of FEMA Flood Insurance Rate Maps in the United States.
Machine Learning, Random Forest Classification, Research, and Results
Advanced technologies like GIS, cloud computing, and machine learning (ML) are revolutionizing flood hazard map creation, making it quicker, more cost-effective, and highly accurate [22]. ML uncovers hidden patterns in data, enhancing flood forecasting, estimating maximum flood depth, and providing alerts [23,24]. The Google Flood Hub exemplifies ML’s potential, offering near-instantaneous flood forecasts worldwide by
[[[ p. 3 ]]]
[Summary: This page details how machine learning (ML) and Random Forest (RF) models are revolutionizing flood hazard map creation. It mentions the limitations of current models due to data availability. The page states the goal of using RF in flood hazard mapping is to create cost-effective, fast models applicable worldwide. It also introduces the methodology used in the study.]
combining ML algorithms with satellite data [25]. However, integrating FEMA flood maps with ML for global applicability remains a challenge.
Random Forest (RF) models, a type of ML, predict flood events by combining results from multiple regression trees based on sampled data [26]. RF has been used in remote sensing to classify land cover and to map or predict phenomena such as forest fires [27], urban areas [28], agricultural land [29], landslides [30], and floods [31,32]. Recent research focuses on using RF with remote sensing and national data to model floodplains and predict flood damage accurately.
Despite progress, the global application of these models is limited by data availability. For example, Collins and Sanchez’s work relies on U.S. national data, which restricts its applicability outside the U.S. [33]. Woznicki and Baynes developed RF models for U.S. watersheds, showing promise for floodplain mapping but limited by the same data constraints [17].
The goal of using RF in flood hazard mapping is to create cost-effective, fast models applicable worldwide. By analyzing diverse data sets, RF aims to produce accurate flood maps for data-scarce regions, improving flood hazard management and safety. This research, focusing on Arizona, provides accurate flood maps to help local communities prepare for floods. All data are publicly accessible through an ArcGIS Pro 3.3.2 interactive map, encouraging further research and aiding those in flood-prone areas. Thus, this paper focuses solely on the creation of flood hazard maps, emphasizing the identification and characterization of areas at risk of flooding. By doing so, it provides a critical foundation for understanding flood dynamics and informing subsequent risk assessments and mitigation efforts.
2. Methodology
The methodological process is depicted in Figure 2.
Initially, topographic and weather data from Arizona are used to calibrate the model and create a grid of cells with designated National Flood Hazard Layer (NFHL) values. The RF Classification algorithm estimates the NFHL for each cell, and this calibrated model is then applied to generate NFHL values for counties in New Mexico, Colorado, and Utah. The final product is an interactive ArcGIS map available for public use.
2.1. Current Data in Arizona, Utah, Colorado, and New Mexico
This study employs the NFHL as the principal data source for flood map analysis in Arizona. The NFHL, a geospatial database developed by FEMA, offers current information on flood hazards across the United States, including Arizona. FEMA’s flood zone classifications are essential in understanding flood risks across different regions. Zone A represents areas with a 1% annual chance of flooding, also known as the 100-year floodplain, where no detailed flood elevations are provided. Zone AE also falls within the 100-year floodplain but includes specific base flood elevations, offering more precise risk assessments. Zone AH indicates areas with shallow flooding, typically ponding, with flood depths ranging from 1 to 3 feet. Zone AO is designated for areas with shallow flooding, usually sheet flow, with average depths of 1 to 3 feet. Zone D covers areas with possible but undetermined flood hazards, where no analysis has been conducted. Finally, Zone X represents areas of minimal flood risk, outside the 100-year and 500-year floodplains, where the likelihood of flooding is low.
To initiate the analysis, the latest version of the Arizona NFHL was first procured. This database encompasses information on flood zones, base flood elevations, and floodway boundaries, among other flood hazard-related data. This data was extracted and processed to construct a high-resolution flood map of Arizona. In Arizona, the flood zone classifications encompass A, AE, AH, AO, D, X, and an “area not included” designation.
Note that all counties in Arizona have been assessed and classified using the NFHL. Conversely, certain counties in the State of New Mexico have not yet undergone flood hazard analysis. Specifically, the counties of Catron, Hidalgo, Sierra, Torrance, Guadalupe, De Baca, Quay, Mora, Harding, and Union currently lack an NFHL classification (see Figure 3). Similar to the State of Colorado, Utah also has counties
[[[ p. 4 ]]]
[Summary: This page outlines the methodology, starting with topographic and weather data from Arizona to calibrate the model. It describes the use of USGS 3DEP Standard DEMs for detailed elevation data and FEMA flood zone classifications. The page also specifies which counties in New Mexico, Colorado, and Utah lack NFHL classifications.]
that have not been classified by the National Flood Hazard Layer (NFHL). Specifically, the counties of Rich, Juab, Millard, Beaver, Iron, Kane, Garfield, Piute, Sevier, Emery, Wayne, San Juan, Grand, Duchesne, and Daggett currently lack an NFHL classification (Figure 3).
Figure 2. Flowchart showing the development of an RF-based model for estimating NFHL values in New Mexico, Colorado, and Utah using topographic and weather data.
2.2. Topographic Information
To precisely forecast flood hazard in regions without NFHL classifications, the highly detailed USGS 3D Elevation Program (3DEP) Standard DEMs were used. These DEMs supply crucial elevation data that were used to inform the RF model by estimating terrain parameters such as elevation, slope, aspect, hill shade, curvature, flow direction, and flow accumulation. Figure 4 illustrates the DEM analysis for Arizona.
2.3. Weather Information: TerraClimate (Remote Sensing Data)
To enhance our flood hazard analysis, estimations of weather variables from the TerraClimate dataset were integrated. These variables, including precipitation and temperature, can exert a significant influence on flood hazard in a specific area. TerraClimate offers gridded estimates of monthly climate and water balance variables at an approximate resolution of four kilometers.
This data is compiled using a blend of weather station data, remote sensing, and other climate datasets, resulting in a comprehensive and precise depiction of climate patterns in a specific area. By integrating this data into our flood hazard analysis, a deeper understanding of the relationship between weather patterns and flood hazard in each area can be gained. It is crucial to highlight that only the 2020 weather information from TerraClimate was used, coinciding with the most recent update of the NFHL map of Arizona. This ensures that the obtained flood hazard analysis is grounded in the most recent and relevant data available.
[[[ p. 5 ]]]
[Summary: This page discusses the integration of weather variables from the TerraClimate dataset, including precipitation and temperature, to enhance flood hazard analysis. It emphasizes the use of 2020 weather data and the calculation of the annual sum of each weather variable. The page references a figure showing weather variables for Arizona.]
To encapsulate the total impact of weather on flood hazard in each cell, the annual sum of each weather variable obtained from TerraClimate was calculated. This provided a holistic view of the overall influence of weather on flood hazard in each area. Figure 5 shows some of the weather variables obtained from TerraClimate for Arizona.
Figure 3. (A) National Flood Hazard Layer (NFHL) Analysis in Arizona, New Mexico, Colorado, and Utah: Extracting and Processing Data to Create High-Resolution Flood Maps. (B) NFHL Classification for Arizona. (C) NFHL-Based Flood Hazard Analysis in Arizona Using Grid Cell Classification for Arizona (worst-case threshold).
Thus, the topographic variables incorporated in this study encompass elevation, slope, aspect, hill shade, curvature, flow direction, and flow accumulation, as previously mentioned.
The model inputs also included a range of meteorological variables sourced from TerraClimate, such as total annual 2020 precipitation, runoff, soil moisture, snow water equivalent, actual evapotranspiration, climate water deficit, downward surface shortwave radiation, maximum and minimum temperatures, vapor pressure, wind speed, vapor pressure deficit, and the Palmer Drought Severity Index.
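As an illustration of how one row of model input might be assembled for a single grid cell, the following minimal sketch sums twelve monthly values into an annual total for each weather variable and merges the totals with the topographic attributes. It follows the annual-sum aggregation described in Section 2.3; the function names, the dictionary keys (`ppt`, `q`, `elevation`, and so on), and all numeric values are hypothetical and chosen only for illustration, not taken from the study.

```python
from typing import Dict, List


def annual_sum(monthly_values: List[float]) -> float:
    """Sum 12 monthly values into one annual total for a single grid cell."""
    if len(monthly_values) != 12:
        raise ValueError("expected exactly 12 monthly values")
    return sum(monthly_values)


def build_feature_row(
    topo: Dict[str, float], monthly: Dict[str, List[float]]
) -> Dict[str, float]:
    """Merge topographic attributes with the annual total of each weather variable."""
    row = dict(topo)  # copy so the caller's dictionary is not modified
    for name, values in monthly.items():
        row[f"{name}_annual"] = annual_sum(values)
    return row


# Made-up values for one cell (hypothetical keys and numbers).
cell_topo = {"elevation": 340.0, "slope": 2.5, "flow_accumulation": 1200.0}
cell_weather = {
    "ppt": [20.0, 18.0, 15.0, 5.0, 2.0, 1.0, 25.0, 30.0, 15.0, 10.0, 12.0, 22.0],
    "q": [1.0] * 12,
}

row = build_feature_row(cell_topo, cell_weather)
print(row)
```

Each resulting row holds the predictors for one cell; stacking the rows for all cells would give the feature table passed to the classifier.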
[[[ p. 6 ]]]
[Summary: This page presents a figure showing the Digital Elevation Model (DEM) analysis of Arizona. It lists the topographic variables incorporated in the study, including elevation, slope, and flow accumulation. The page also mentions the meteorological variables sourced from TerraClimate, such as precipitation, runoff, and soil moisture.]
Figure 4. Digital Elevation Model (DEM) analysis of Arizona for flood hazard prediction using USGS 3D Elevation Program (3DEP).
2.4. Grid Creation
To incorporate the weather variable estimation from TerraClimate into the flood hazard analysis, a grid for Arizona that aligns with the resolution of TerraClimate was built.
This facilitated the spatial alignment of the climate data with other information sources such as the NFHL classifications and DEM analysis. The grid that was developed consists of cells with a uniform spatial resolution, enabling the computation of climate variables for each cell. Employing a consistent grid standardized the data across the entire region, simplifying the analysis and comparison of different areas. As an integral part of the flood hazard analysis in Arizona, the NFHL classifications were used to ascertain the flood hazard for each cell in the grid. The highest NFHL classification within each cell is designated as the representation of the flood hazard. To evaluate flood hazard for each cell, six different thresholds were applied: the worst-case scenario, 5%, 10%, 25%, 50%, and 75%, based on the area of the classification in each cell. Under the worst-case scenario threshold, if any portion of a cell has a flood hazard level of A, the entire cell is deemed to have a flood hazard level of A (Figure 3C). For the 25% threshold, if the area of the cell with a flood
[[[ p. 7 ]]]
[Summary: This page explains the grid creation process, aligning with TerraClimate's resolution. It describes how NFHL classifications are used to determine flood hazard for each cell, applying different thresholds. The flood hazard assessment involved an evaluation of results obtained using various thresholds, aiming to identify the most suitable threshold for each grid in the study area.]
hazard level of A encompasses more than 25% of the cell, the entire cell is deemed to have a flood hazard level of A. This process was repeated with 5%, 10%, 50%, and 75% thresholds.
Figure 5. TerraClimate weather variables for Arizona.
The flood hazard assessment involved an evaluation of results obtained using various thresholds, aiming to identify the most suitable threshold for each grid in the study area. The chosen threshold plays a crucial role in enhancing the accuracy of flood hazard analysis and supporting decision-making in disaster risk management. The application of different thresholds enables the identification of optimal flood hazard levels within each cell, thereby improving the precision of flood hazard predictions. Subsequently, we combined the A and AE categories, resulting in the creation of a new merged category labelled “A”. This consolidation serves the purpose of streamlining the flood hazard analysis process and facilitating a clearer understanding of the flood hazard associated with the areas under examination. Going forward, the “A” classification was used to represent areas with a 1% annual chance of flooding, regardless of the presence of base flood elevations. It was anticipated that this approach would contribute to a more efficient and comprehensive analysis of flood hazard in the affected regions. For instance, consider a cell in the grid situated in Phoenix, Arizona. This cell has a resolution of approximately 4 km × 4 km and encompasses multiple NFHL classifications, including zones A and D. Using the
[[[ p. 8 ]]]
[Summary: This page describes the Random Forest (RF) configuration used in the study, including the partitioning of data into training and testing sets. It explains the use of Scikit-Learn’s RandomizedSearchCV for hyperparameter tuning and the Gini index for feature importance. It also describes how the calibrated RF model was used to predict NFHL for New Mexico, Colorado, and Utah.]
methodology of designating the highest NFHL classification within each cell, this cell would be assigned a flood hazard level of A, as that is the most severe flood hazard classification within the cell. This guarantees that the most critical flood hazard within the cell is captured and used when formulating the flood hazard predictions under the various thresholds.

2.5. Random Forest Configuration

In this research, an RF classification methodology was employed to estimate the NFHL class for each cell in Arizona. Topographic and weather data for each cell were used as inputs to the model. The dataset was partitioned into a training set comprising 70% of the data and a testing set comprising the remaining 30%. Hyperparameters were tuned with Scikit-Learn’s RandomizedSearchCV, which conducts a random search over a specified range for each hyperparameter. Specifically, 10 combinations of hyperparameters were randomly sampled from distributions for the number of estimators (ranging from 10 to 500) and the maximum tree depth (ranging from 2 to 20). A cross-validation approach with 10 folds was employed to assess the performance of each set of hyperparameters.

Additionally, the Gini index was used to evaluate the importance of features and identify which input variables were most influential in predicting flood hazard. The Gini index is a metric used in decision trees to evaluate the quality of a split. It measures the degree of disorder or impurity in a dataset, where 0 represents perfect purity (all elements belong to a single class) and higher values indicate more impurity. Mathematically, the Gini index is expressed as:

$$\mathrm{Gini} = 1 - \sum_{i=1}^{n} p_i^{2}$$

Here, $p_i$ denotes the proportion of items classified as class $i$ within the node, while $n$ signifies the total number of classes.
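As a small worked illustration of the Gini index defined above, the following Python sketch computes the impurity of a node from its class labels. The function name `gini_impurity` and the example labels are illustrative assumptions and are not taken from the study's code.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2) over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here for an empty node (assumption)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical nodes holding NFHL classes of grid cells.
pure_node = ["A", "A", "A", "A"]    # a single class: perfectly pure
mixed_node = ["A", "A", "X", "X"]   # two classes in equal proportion
print(gini_impurity(pure_node))     # 0.0
print(gini_impurity(mixed_node))    # 0.5
```

A split that sends cells of mostly one class to each child node lowers this value, which is why features that produce such splits receive higher Gini-based importance.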
By identifying the features that drive the model’s predictive performance, we aim to improve the accuracy and efficacy of flood hazard assessments, thereby contributing to more informed decision-making in flood management and mitigation strategies.

2.6. Predicting NFHL for New Mexico, Colorado, and Utah

Once the RF model had been calibrated with the data from Arizona, it was employed to estimate the NFHL class for each cell in the counties of New Mexico, Colorado, and Utah that lacked flood hazard information. First, the topographic and weather variables for each cell were estimated; these were then fed as inputs into the trained RF model to generate the NFHL class.

The resulting estimates were then integrated into an interactive ArcGIS map to improve public understanding. This tool provides a straightforward means of visualizing and analyzing flood hazard across Arizona, New Mexico, Colorado, and Utah. The map, which is publicly accessible, serves as a resource for local authorities, emergency management agencies, and residents in these regions for planning and preparedness purposes.

3. Results

3.1. Grid

In this study, Arizona was divided into 16,999 standardized cells, each with an area of 17.24 km². A flood hazard assessment was then conducted under a worst-case scenario to identify the most flood-prone areas. Under this scenario, 5906 cells were classified as A zones, no cells as AH zones, and only six cells as AO zones. Additionally, 3263 cells were classified as X zones, while the remaining 7824 cells fell under D zones, following the column order of Table 1. These findings, along with the other threshold scenarios, are detailed in Table 1. Notably, the majority of the cells classified as A zones were identified under the worst-case scenario, underscoring the potential flood vulnerability of these areas under the strictest assessment. In contrast,
[[[ p. 9 ]]]
[Summary: This page presents the results, starting with the grid creation in Arizona, dividing it into 16,999 cells. It details the flood hazard assessment under a worst-case scenario and lists the number of cells classified into different flood zones. The page also refers to a figure illustrating the flood hazard assessment outcomes for Arizona.]
only 98 cells were classified as A zones under the 75% threshold evaluation, suggesting a comparatively lower flood hazard in these areas.

Table 1. Thresholds and Flood Hazard Classification Results for Arizona.

| Threshold | A (A + AE) | AH | AO | X | D |
|---|---|---|---|---|---|
| Worst-case scenario | 5906 | 0 | 6 | 3263 | 7824 |
| 5% | 2888 | 3 | 23 | 5192 | 8893 |
| 10% | 1677 | 3 | 25 | 6109 | 9185 |
| 25% | 597 | 4 | 29 | 6994 | 9375 |
| 50% | 205 | 4 | 31 | 7321 | 9438 |
| 75% | 98 | 4 | 31 | 7321 | 9545 |
| Sum of cells | 11,371 | 18 | 145 | 36,200 | 54,260 |

Figure 6 shows the flood hazard assessment outcomes for Arizona under the various threshold settings. It illustrates how the classification of different regions within the State changes with the analytical criteria employed, and it offers detailed insight into the geographical distribution of flood hazard across the region.

Figure 6. Flood hazard analysis results for Arizona: spatial distribution across different threshold settings.
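The threshold rule behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the severity ordering in `SEVERITY_ORDER`, the fallback to the zone with the largest area when no zone exceeds the threshold, and the function name `classify_cell` are all hypothetical choices, since the source only states that the most severe class is kept in the worst case and that a class covering more than the threshold fraction of a cell is assigned to the whole cell.

```python
# Assumed severity ranking, most severe first (hypothetical; not given in the source).
SEVERITY_ORDER = ["A", "AH", "AO", "X", "D"]


def classify_cell(area_fractions, threshold=0.0):
    """Assign one NFHL class to a grid cell.

    area_fractions: dict mapping NFHL zone -> fraction of the cell it covers.
    threshold: minimum covered fraction (0.0 reproduces the worst-case rule,
               where any presence of a zone counts).
    """
    # Merge A and AE into the single category "A", as described in the text.
    merged = {}
    for zone, fraction in area_fractions.items():
        key = "A" if zone in ("A", "AE") else zone
        merged[key] = merged.get(key, 0.0) + fraction

    # Keep the most severe zone whose coverage exceeds the threshold.
    for zone in SEVERITY_ORDER:
        if merged.get(zone, 0.0) > threshold:
            return zone

    # Assumed fallback: the zone covering the largest share of the cell.
    return max(merged, key=merged.get)


# Hypothetical cell, loosely modelled on the Phoenix example (zones A and D).
cell = {"A": 0.10, "AE": 0.20, "D": 0.70}
print(classify_cell(cell))                  # worst case -> A
print(classify_cell(cell, threshold=0.25))  # A + AE cover 30% -> A
print(classify_cell(cell, threshold=0.50))  # only D exceeds 50% -> D
```

Raising the threshold moves cells out of the A category, which is consistent with the decreasing A counts from the worst-case scenario to the 75% threshold in Table 1.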
[[[ p. 10 ]]]
[Summary: This page describes the merging of grid and TerraClimate information to construct a predictive model for flood hazard. It explains that topographic and meteorological data were used as features in the ML RF model. The page states the RF Classifier exhibited superior performance, with a Training Accuracy of 1.0 and a testing sample accuracy surpassing 0.8 for each threshold setting.]
3.2. Grid and TerraClimate Information Merge

To construct a predictive model for flood hazard, topographic and meteorological data were gathered for each flood hazard classification cell throughout Arizona. These data served as features in the RF model, while the flood hazard classification of each cell was used as the target variable. This methodology produced a flood hazard predictive model customized to the distinct attributes of Arizona’s various regions. Integrating topographic and meteorological data allowed the model to consider a broad spectrum of factors influencing flood hazard, thereby enhancing the accuracy of the flood hazard predictions across the State. The total feature count for each cell was 19, inclusive of the aforementioned topographic and meteorological variables.

3.3. Random Forest

An RF Classifier was used to predict the flood hazard category of each cell in Arizona. The model was calibrated using the RandomizedSearchCV method, which selects the best hyperparameters through repeated cycles of random parameter sampling. The outcomes are detailed in Table 2. The RF Classifier performed well, with a training accuracy of 1.0 and a testing accuracy of at least 0.80 for each threshold setting. Regarding the performance metrics, accuracy is the ratio of correctly classified instances to the total number of instances, precision is the ratio of true positives to all instances predicted as positive, and recall is the ratio of true positives to all instances that are genuinely positive. All of these metrics were 0.8 or higher.

Table 2. Performance metrics of RF Classifier for flood hazard classification in Arizona.

| Scenario | Accuracy | Precision | Recall | Nº Estimators | Max Depth |
|---|---|---|---|---|---|
| Worst-case scenario | 0.8 | 0.8 | 0.8 | 404 | 18 |
| 5% | 0.82 | 0.82 | 0.82 | 201 | 18 |
| 10% | 0.84 | 0.84 | 0.84 | 468 | 18 |
| 25% | 0.88 | 0.88 | 0.88 | 125 | 18 |
| 50% | 0.88 | 0.88 | 0.88 | 318 | 19 |
| 75% | 0.85 | 0.85 | 0.85 | 271 | 19 |

Accuracy, Precision, and Recall are performance measures; Nº Estimators and Max Depth are the selected hyperparameters.

Furthermore, a Confusion Matrix is provided for each threshold setting in Table 3, offering an in-depth evaluation of the model’s performance. Table 4 displays the accuracy for the A zone classification under the different threshold settings, from the worst-case scenario to more forgiving criteria such as the 75% threshold. The worst-case scenario yielded the peak accuracy value of 84.59%, signifying that the RF Classifier correctly categorized a substantial number of cells into the A zone category, which indicates the areas with the highest flood hazard. As the threshold setting becomes more relaxed, the accuracy diminishes, with the minimum value of 28% observed at the 75% threshold.

The Gini index was used to gauge the significance of variables across the various scenarios (refer to Figure 7). The findings revealed that the Palmer Drought Severity Index consistently emerged as the most significant variable across all scenarios. Elevation, downward surface shortwave radiation, and wind speed were also identified as key variables in most scenarios. Precipitation, climate water deficit, and slope held moderate significance in certain scenarios, while variables such as soil moisture, actual evapotranspiration, and vapor pressure deficit were of low significance in most scenarios. In some scenarios, topography-related variables such as curvature, flow accumulation, and aspect held greater
[[[ p. 11 ]]]
[Summary: This page continues presenting the results, including a Confusion Matrix for each threshold setting and accuracy metrics for the A zone classification. It identifies the Palmer Drought Severity Index as the most significant variable across all scenarios. It includes tables with confusion matrix information and accuracy of flood hazard classification.]
significance. The snow water equivalent was consistently the least significant variable across all scenarios. Collectively, these results suggest that climate-related variables and topographical features are pivotal in predicting the target variable across different scenarios, as evidenced by their high Gini index values.

Table 3. Confusion Matrix for each threshold setting, testing sample.

| Worst-case scenario | D | X | AO | AH | A (A + AE) |
|---|---|---|---|---|---|
| D | 2032 | 45 | 0 | - | 213 |
| X | 217 | 576 | 0 | - | 251 |
| AO | 0 | 0 | 0 | - | 1 |
| AH | - | - | - | - | - |
| A (A + AE) | 191 | 78 | 0 | - | 1477 |

| 5% | D | X | AO | AH | A (A + AE) |
|---|---|---|---|---|---|
| D | 2464 | 168 | 0 | 0 | 58 |
| X | 234 | 1133 | 0 | 0 | 157 |
| AO | 1 | 1 | 0 | 0 | 4 |
| AH | 0 | 0 | 0 | 0 | 2 |
| A (A + AE) | 114 | 183 | 0 | 0 | 562 |

| 10% | D | X | AO | AH | A (A + AE) |
|---|---|---|---|---|---|
| D | 2558 | 180 | 0 | - | 20 |
| X | 284 | 1442 | 0 | - | 57 |
| AO | 24 | 4 | 0 | - | 2 |
| AH | - | - | - | - | - |
| A (A + AE) | 68 | 202 | 0 | - | 262 |

| 25% | D | X | AO | AH | A (A + AE) |
|---|---|---|---|---|---|
| D | 2556 | 207 | 0 | 0 | 8 |
| X | 297 | 1816 | 0 | 0 | 7 |
| AO | 2 | 5 | 1 | 0 | 0 |
| AH | 0 | 0 | 0 | 0 | 1 |
| A (A + AE) | 30 | 84 | 0 | 0 | 67 |

| 50% | D | X | AO | AH | A (A + AE) |
|---|---|---|---|---|---|
| D | 2601 | 240 | 0 | 0 | 3 |
| X | 285 | 1879 | 0 | 0 | 2 |
| AO | 1 | 9 | 0 | 0 | 0 |
| AH | 0 | 1 | 0 | 0 | 0 |
| A (A + AE) | 9 | 31 | 0 | 0 | 20 |

| 75% | D | X | AO | AH | A (A + AE) |
|---|---|---|---|---|---|
| D | 2667 | 232 | 0 | 0 | 1 |
| X | 274 | 1865 | 1 | 0 | 0 |
| AO | 3 | 5 | 0 | 0 | 0 |
| AH | 0 | 1 | 0 | 0 | 0 |
| A (A + AE) | 11 | 12 | 0 | 0 | 9 |

Table 4. Accuracy of flood hazard classification for A zones across different thresholds.

| Scenario | Percentage Accuracy for A (A + AE) |
|---|---|
| Worst-case scenario | 85 |
| 5% | 65 |
| 10% | 49 |
| 25% | 37 |
| 50% | 33 |
| 75% | 28 |

3.4. New Flood Hazard Maps

The calibrated RF Classifier, originally trained on the flood hazard map of Arizona, was used to generate new flood hazard maps for Utah, Colorado, and New Mexico (refer to Figure 8). The 25% scenario was chosen as the threshold setting for classification due to its consistently strong performance relative to the other scenarios. These flood hazard maps offer crucial insights to local authorities, empowering them to make well-informed decisions pertaining to flood hazard management and emergency readiness.
It is noteworthy that the threshold setting employed in the creation of these maps was scenario-specific, and local authorities may opt for different configurations based on their unique requirements and priorities. By leveraging the RF Classifier and the flood hazard map of Arizona as a calibration instrument, we have showcased the potential of this methodology to be extended to other regions, offering a flexible and scalable approach for flood hazard evaluation and management.
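The A-zone accuracies in Table 4 can be reproduced from the confusion matrices in Table 3. The sketch below does this for the worst-case scenario in plain Python. It assumes that the rows of Table 3 are the reference classes and the columns are the predicted classes (the source does not state the orientation, but this reading matches Table 4), and it leaves out the AH class, which has no entries in this scenario. The function names are illustrative.

```python
CLASSES = ["D", "X", "AO", "A"]  # AH omitted: no AH entries in the worst-case matrix

# Worst-case scenario from Table 3; rows = reference class (assumed), columns = predicted.
WORST_CASE = [
    [2032, 45, 0, 213],   # D
    [217, 576, 0, 251],   # X
    [0, 0, 0, 1],         # AO
    [191, 78, 0, 1477],   # A (A + AE)
]


def overall_accuracy(matrix):
    """Correctly classified cells divided by all cells."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix, k):
    """True positives of class k divided by all reference cells of class k."""
    row_total = sum(matrix[k])
    return matrix[k][k] / row_total if row_total else float("nan")


def precision(matrix, k):
    """True positives of class k divided by all cells predicted as class k."""
    column_total = sum(row[k] for row in matrix)
    return matrix[k][k] / column_total if column_total else float("nan")


a = CLASSES.index("A")
print(round(overall_accuracy(WORST_CASE), 2))     # 0.8
print(round(100 * recall(WORST_CASE, a), 2))      # 84.59
print(round(100 * precision(WORST_CASE, a), 2))   # 76.06
```

Under this reading, the "accuracy for A" reported in Table 4 corresponds to the recall of the A class, while the 0.8 in Table 2 corresponds to the overall accuracy of the worst-case model.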
[[[ p. 12 ]]]
[Summary: This page describes the creation of new flood hazard maps for Utah, Colorado, and New Mexico using the calibrated RF Classifier. It mentions the selection of the 25% scenario as the threshold setting and highlights the insights provided to local authorities. This page shows the potential of this methodology to be extended to other regions.]
Figure 7. Variable importance based on Gini Index for different scenarios.

Figure 8. New flood hazard maps for Utah, Colorado, and New Mexico based on RF classification using Arizona flood hazard map calibration.
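The calibrate-then-transfer workflow behind these maps can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names, the random seed, the `accuracy` scoring choice, and the use of `scipy.stats.randint` to express the search ranges are assumptions. The 70/30 split, the ranges of 10 to 500 estimators and 2 to 20 for maximum depth, the 10 sampled combinations, the 10-fold cross-validation, and the Gini criterion follow the description in Section 2.5.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split


def tune_random_forest(X, y, n_iter=10, cv=10, seed=0):
    """Tune an RF classifier on labelled cells (e.g. Arizona) by random search.

    X: array of shape (n_cells, n_features) with topographic and weather features.
    y: array of NFHL classes, one per cell.
    Returns the refitted best model, its hyperparameters, and the test accuracy.
    """
    # 70% of the cells for training, 30% held out for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )
    param_distributions = {
        "n_estimators": randint(10, 501),  # integers from 10 to 500
        "max_depth": randint(2, 21),       # integers from 2 to 20
    }
    search = RandomizedSearchCV(
        RandomForestClassifier(criterion="gini", random_state=seed),
        param_distributions=param_distributions,
        n_iter=n_iter,        # number of sampled hyperparameter combinations
        cv=cv,                # number of cross-validation folds
        scoring="accuracy",   # assumed scoring metric
        random_state=seed,
    )
    search.fit(X_train, y_train)
    test_accuracy = search.score(X_test, y_test)
    return search.best_estimator_, search.best_params_, test_accuracy


def predict_unmapped(model, X_unmapped):
    """Apply the calibrated model to cells that have no NFHL classification."""
    return np.asarray(model.predict(X_unmapped))
```

In use, `tune_random_forest` would be called once on the labelled Arizona cells, and `predict_unmapped` would then be applied to the feature table of cells in Utah, Colorado, or New Mexico, provided the columns are in the same order as during training. The Gini-based importances of the fitted model are available through its `feature_importances_` attribute.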
[[[ p. 13 ]]]
[Summary: This page details the availability of the study's results through an ArcGIS interactive map, providing public access to flood hazard information for Arizona, New Mexico, Colorado, and Utah. The page transitions into the discussion section, emphasizing the study's comprehensive approach and the integration of diverse data sources. ]
3.5. Maps Website

The results of this study, which estimated the NFHL class for each cell in Arizona, New Mexico, Colorado, and Utah using a calibrated RF model with topographic and weather data, are available to the public through an ArcGIS interactive map. The map allows users to easily visualize and analyze the flood hazard in their area and can be used for planning and preparedness purposes by local authorities, emergency management agencies, and residents in these regions. The map is accessible to the public at the following link: www.bit.ly/43w6PyO (accessed on 14 October 2024).

4. Discussion

While a plethora of machine learning configurations exists for flood hazard analysis, the focus of this paper is to showcase a comprehensive approach that integrates diverse data sources and modeling techniques to provide an effective solution. The findings presented in this study offer insights into the application of an RF classification methodology for flood hazard analysis using a combination of topographic and weather data. Through the analysis and modelling presented here, we have demonstrated the potential of this approach to provide accurate flood hazard assessments, not only for Arizona but also for the neighbouring regions of New Mexico, Colorado, and Utah.

One of the key strengths of this new methodology lies in the integration of diverse data sources and the flexibility that it offers for customization based on stakeholder needs. The study demonstrates one realization among many possibilities, and stakeholders must define which machine learning approach, datasets, and tuning strategies are best suited to their specific requirements. This investigation highlights that by training machine learning models with flood hazard maps and using public data such as the NFHL, 3DEP DEMs, and TerraClimate, flood hazard prediction can be achieved economically and efficiently.
This integration allows for a comprehensive understanding of flood hazard dynamics, enhancing the accuracy of our predictive models and potentially enabling the creation of flood hazard maps on a global scale. Moreover, this methodology demonstrated scalability and flexibility by extending flood hazard analysis from Arizona to neighbouring states. This highlights the potential for our approach to be adapted and applied to diverse geographical regions, offering a versatile tool for flood hazard assessment.

The feature importance analysis conducted as part of this study revealed several key variables that significantly influenced flood hazard estimation using the RF classification model. Among these variables, the Palmer Drought Severity Index (PDSI), elevation, downward surface shortwave radiation, wind speed, precipitation, and slope emerged as the most influential factors. The PDSI serves as an indicator for flood hazard assessment by capturing long-term drought conditions, which can exacerbate flooding by drying out soil and reducing its water absorption capacity. Elevation also influences flood hazard: higher areas experience lower hazard as water flows away, while low-lying regions are more susceptible to inundation. Downward surface shortwave radiation affects flood hazard through its impact on evaporation rates and soil moisture levels, while wind speed intensifies precipitation patterns, particularly in coastal areas. Precipitation itself is a significant predictor of flood hazard, as it can overwhelm drainage systems and lead to widespread flooding, especially during intense rainfall events. Lastly, slope influences surface runoff: steeper slopes increase erosion and flash flood hazard in mountainous terrain, while flatter areas raise the hazard of localized flooding due to water accumulation.

However, our study also has its limitations.
Although our analysis benefited from multiple data sources, the spatial resolution and coverage of the available datasets remain limiting factors. Improving the resolution and coverage of topographic and weather data could enhance the precision of flood hazard predictions. Additionally, the calibration of the RF model relied on flood hazard data from Arizona, which may not fully capture the unique characteristics of regions in New Mexico, Colorado, and Utah,
[[[ p. 14 ]]]
[Summary: This page discusses the study's limitations, including spatial resolution and data coverage. It acknowledges potential biases from using Arizona data to predict flood hazards in other states. The page also notes the sensitivity of flood hazard cell classification to threshold settings and states how to mitigate the limitations of the model.]
introducing biases and limitations in the accuracy of flood hazard assessments for these areas. Moreover, the classification of flood hazard cells was sensitive to threshold settings, which affected the distribution of hazard categories across the different scenarios. To mitigate these limitations, one can integrate flood hazard maps from other states and countries and incorporate additional public remote sensing information to strengthen the relationship between input and target variables, thereby improving the overall accuracy and reliability of the flood hazard assessments.

Ultimately, this new approach relies on the accuracy of the input data. While the NFHL and DEM data are highly detailed and accurate, the weather variable estimations from TerraClimate may contain errors or uncertainties. It is important to note that, although TerraClimate contains errors, these errors are assumed to be consistent across the entire world and to apply uniformly in the future. Because the machine learning model is trained on data carrying these constant errors, it incorporates them, which is beneficial when estimating the flood risk map: each measurement carries some degree of error, but errors that remain constant need not degrade the overall estimation of flood risk. Another limitation is that this approach is based on a Random Forest (RF) model, which may not capture all the complex interactions between the input variables and flood hazard. Despite these limitations, this approach provides a valuable tool for estimating flood hazard in areas lacking NFHL classifications and can be useful for decision-makers in areas prone to flooding.

One notable advantage of this new approach lies in its applicability to regions lacking NFHL classifications. Take New Mexico, for instance, where certain counties have yet to receive NFHL designations. Through the use of DEMs and estimations of weather variables, it was possible to gauge flood hazard in these areas.
Additionally, this method offers a holistic view of the impacts of weather on flood hazard by aggregating the weather variables sourced from TerraClimate over the year. This facilitates a deeper understanding of the interplay between weather patterns and flood susceptibility in each locale. It also allows the methodology for generating flood hazard maps to be reconsidered so that it focuses on the most significant features identified by ML algorithms, thereby optimizing resource allocation. Ultimately, establishing a reliable model enables its deployment across diverse geographical regions worldwide.

Looking towards future research directions, several avenues can be identified. Improved data integration could involve incorporating additional data sources, such as high-resolution remote sensing imagery and hydrological models, to further enhance the accuracy and granularity of flood hazard assessments. Validation and sensitivity analyses could provide insights into the robustness of predictive models and help identify sources of uncertainty. Moreover, incorporating socioeconomic factors into flood hazard models can provide a more comprehensive understanding of the potential impacts of flooding on communities, supporting more effective hazard management and adaptation strategies. Exploring advanced ML techniques such as deep learning algorithms could offer opportunities to further improve the accuracy and efficiency of flood hazard prediction models, ultimately contributing to the continued evolution of flood hazard assessment methodologies and supporting more resilient and sustainable communities in the face of increasing flood hazards.

5. Conclusions

In conclusion, this study presents an RF-based model for estimating flood hazard in Arizona, New Mexico, Colorado, and Utah using topographic and weather data.
The study illustrates a step-by-step methodology, which includes obtaining the most recent version of the NFHL, creating a high-resolution flood map, and applying the RF Classification algorithm to estimate the NFHL for each cell. The study also demonstrates the advantages of this new approach including its ability to estimate flood hazard in areas lacking NFHL classifications, providing a comprehensive picture of the overall impact of weather on flood hazard in each area. However, the study acknowledges some limitations including the
[[[ p. 15 ]]]
[Summary: This page concludes the discussion by highlighting the advantages of the new approach, particularly its applicability to regions lacking NFHL classifications. It emphasizes the holistic view of weather impacts on flood hazard and suggests future research directions. It discusses machine learning as a powerful tool that can greatly enhance the stakeholder’s process for flood hazard management.]
accuracy of the input data and the potential inability of the RF model to capture all the complex interactions between the input variables and flood hazard.

This research has important implications for decision-makers in areas prone to flooding, as it provides a valuable tool for estimating flood hazard and can aid in planning and preparedness efforts. The availability of the results through an ArcGIS interactive map makes the data accessible to the public and can help local authorities, emergency management agencies, and residents in these regions make informed decisions about flood hazard. Further research can be conducted to refine and improve the accuracy and robustness of this new approach, and to explore other ML algorithms that may provide even better results. The methodology presented in this study also provides a useful framework for assessing flood hazard in other regions and could be extended to areas outside of Arizona, New Mexico, Colorado, and Utah. The study highlights the potential for integrating different types of data, such as topographic and weather data, to provide a comprehensive picture of flood hazard and could pave the way for future research in this area.

Machine learning is a powerful tool that can greatly enhance stakeholders’ processes for flood hazard management. By leveraging ML algorithms, one can estimate flood hazard in areas lacking accurate data, while providing decision-makers with a comprehensive picture of the overall impact of weather on flood hazard in each region. This can help local authorities, emergency management agencies, and residents in flood-prone regions to plan and prepare for future flood events. Moreover, ML-based flood hazard estimation can be made available to the public through interactive maps, which can be accessed for free by anyone.
As ML algorithms continue to evolve, one can expect to see further improvements in the accuracy and robustness of flood hazard estimation models. The availability of high-quality data and the continued development of machine learning techniques will allow users to refine models and improve their understanding of the complex interactions between topography, weather patterns, and flood hazard. As the technology continues to advance and data becomes more readily available, one can expect to see significant advancements in this field. By making these tools accessible to the public and continually refining obtained models, we can work together to better understand and prepare for future flood events.

Author Contributions: Conceptualization, H.L.V.-Q.; Methodology, H.L.V.-Q.; Investigation, H.L.V.-Q.; Writing – original draft, H.L.V.-Q., P.G.-C., R.V.-P. and T.P.A.F.; Writing – review & editing, J.E.M. and L.B.; Visualization, P.G.-C.; Supervision, P.G.-C., R.V.-P., T.P.A.F., H.G., D.G., J.B.V., J.E.M. and L.B. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: Data is contained within the article.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Tellman, B.; Sullivan, J.A.; Kuhn, C.; Kettner, A.J.; Doyle, C.S.; Brakenridge, G.R.; Erickson, T.A.; Slayback, D.A. Satellite imaging reveals increased proportion of population exposed to floods. Nature 2021, 596, 80–86.
2. Li, J.; Bortolot, Z.J. Quantifying the impacts of land cover change on catchment-scale urban flooding by classifying aerial images. J. Clean. Prod. 2022, 344, 130992.
3. Prettenthaler, F.; Kortschak, D.; Albrecher, H.; Köberl, J.; Stangl, M.; Swierczynski, T. Can 7000 Years of flood history inform actual flood risk management? A case study on Lake Mondsee, Austria. Int. J. Disaster Risk Reduct. 2022, 81, 103227.
4. Khosravi, K.; Rezaie, F.; Cooper, J.R.; Kalantari, Z.; Abolfathi, S.; Hatamiafkoueieh, J. Soil water erosion susceptibility assessment using deep learning algorithms. J. Hydrol. 2023, 618, 129229.
5. Jongman, B. Effective adaptation to rising flood risk. Nat. Commun. 2018, 9, 1986.
6. Dottori, F.; Szewczyk, W.; Ciscar, J.-C.; Zhao, F.; Alfieri, L.; Hirabayashi, Y.; Bianchi, A.; Mongelli, I.; Frieler, K.; Betts, R.A.; et al. Increased human and economic losses from river flooding with anthropogenic warming. Nat. Clim. Chang. 2018, 8, 781–786.
[[[ p. 16 ]]]
[Summary: This page provides a list of references used in the study, citing various research papers and reports related to flood risk management, data analysis, and machine learning applications.]
[[[ p. 17 ]]]
[Summary: This page contains the final references and the publisher's disclaimer.]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
