- Local Setup
- Architecture Overview
- Code Structure
- Key Components
- Data Processing Pipeline
- Earth Engine Integration
- Visualization Components
- Adding New Features
- Common Issues and Debugging
## Local Setup

To run the Streamlit app locally, you'll need to install the required dependencies, including Streamlit:

- Create a virtual environment however you prefer (e.g., `python3 -m venv venv`)
- Install dependencies from requirements.txt (e.g., `pip install -r requirements.txt`)
Example local installation with a virtual environment:

```shell
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Note: If you installed the requirements in a virtual environment, you will have to run the following commands from that virtual environment.
Get a Google Earth Engine authentication token if you don't have one already:

```shell
earthengine authenticate
```

This will open a page in your browser asking you to grant your Google account permission to use Google Earth Engine.
Create a project in Google Cloud:
- Go to https://console.cloud.google.com/
- Create a new project and make a note of the project ID (which can be different from the project name)
Copy `config.yaml.example` to `config.yaml`:

```shell
cp config.yaml.example config.yaml
```

Add your project ID to the config.yaml file.
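For reference, the resulting config.yaml might look something like the sketch below. The key name and value here are assumptions; check `config.yaml.example` for the exact field names your copy of the repository expects.

```yaml
# Hypothetical config.yaml contents — field name taken from
# config.yaml.example, value is your own GCP project ID.
project_id: "my-beaver-project-123456"
```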
Enable the Earth Engine API for your project:
- Go to https://console.cloud.google.com/apis/library/earthengine.googleapis.com
- Select your project and click "Enable".
Register your project ID with Earth Engine: https://code.earthengine.google.com/register?project=your-gcp-project-id
Run the Streamlit app locally:

```shell
streamlit run app.py
```

Streamlit will start a local development server. By default, it opens in your browser at http://localhost:8501.
## Architecture Overview

The Beaver Impacts Tool is built on:
- Streamlit: For the web interface
- Google Earth Engine (GEE): For satellite imagery processing
- Pandas/NumPy: For data manipulation
- Seaborn/Matplotlib: For visualization
The application follows a step-by-step workflow where users:
- Upload dam locations
- Select waterway datasets
- Validate dam locations
- Generate or upload non-dam locations
- Create buffered analysis zones
- Analyze and visualize environmental metrics
Each step involves interactions between the frontend (Streamlit) and backend processing using Earth Engine's Python API.
## Code Structure

```
beaver-impacts-tool/
├── pages/                           # Streamlit pages
│   ├── Exports_page.py              # Main analysis workflow
│   └── [other pages...]             # Additional functionality
├── service/                         # Core business logic
│   ├── Sentinel2_functions.py       # Sentinel-2 image processing
│   ├── Export_dam_imagery.py        # Image export functionality
│   ├── Visualize_trends.py          # Visualization and metrics computation
│   ├── Negative_sample_functions.py # Non-dam point generation
│   ├── Parser.py                    # Data parsing and input handling
│   ├── Data_management.py           # Data management utilities
│   └── Validation_service.py        # Validation logic
├── assets/                          # Static assets
├── app.py                           # Main application entry point
├── README.md                        # This documentation
└── requirements.txt                 # Dependencies
```
## Key Components

```python
credentials_info = {
    "type": st.secrets["gcp_service_account"]["type"],
    "project_id": st.secrets["gcp_service_account"]["project_id"],
    # Other credentials
}
credentials = service_account.Credentials.from_service_account_info(
    credentials_info,
    scopes=["https://www.googleapis.com/auth/earthengine"]
)
ee.Initialize(credentials, project="ee-beaver-lab")
```

This establishes the connection to Earth Engine using service account credentials.
```python
# Initialize session state variables
if "Positive_collection" not in st.session_state:
    st.session_state.Positive_collection = None
# More state variables...
```

The application uses Streamlit's session state to maintain state between user interactions.
Each analysis step is implemented as an expandable section:

```python
with st.expander("Step 1: Upload Dam Locations", expanded=not st.session_state.step1_complete):
    # Step 1 implementation
```

## Data Processing Pipeline

The application implements a data processing pipeline that transforms user inputs into actionable insights. The pipeline consists of six steps:
### Step 1: Upload and Standardize Points

Input: CSV/GeoJSON files containing dam/non-dam locations

Processing:

```python
# Upload and standardize dam points
feature_collection = upload_points_to_ee(uploaded_file, widget_prefix="Dam")
feature_collection = feature_collection.map(set_id_year_property)

# For non-dam points
negative_points = sampleNegativePoints(positive_dams_fc, hydroRaster, innerRadius, outerRadius, samplingScale)
negative_points = negative_points.map(set_id_negatives)
```

This step:

- Validates spatial data (coordinates)
- Standardizes date formats
- Assigns unique identifiers (P1, P2, ... for dams; N1, N2, ... for non-dams)
- Sets properties like dam status (positive/negative)

Output: Earth Engine FeatureCollection with standardized points
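The ID-numbering scheme can be sketched in plain Python. This is an illustrative stand-in, not the actual `set_id_year_property` code, which sets the equivalent property on Earth Engine features via `map()`:

```python
def assign_point_ids(points, prefix):
    """Assign sequential IDs like P1, P2, ... (dams) or N1, N2, ... (non-dams).

    `points` is any sequence of dicts; the real code attaches the same kind
    of id property to each feature in an Earth Engine FeatureCollection.
    """
    return [dict(p, id_property=f"{prefix}{i}")
            for i, p in enumerate(points, start=1)]
```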
### Step 2: Buffer Creation and Elevation Masking

Input: Standardized FeatureCollection of points

Processing:

```python
Buffered_collection = Merged_collection.map(add_dam_buffer_and_standardize_date)
```

This function:

- Creates circular buffers (default: 150 m radius)
- Applies elevation masking (±3 m from point elevation)
- Preserves original point geometry as a property
- Sets date-related properties for time series analysis

Output: FeatureCollection with polygon geometries (buffers) constrained by elevation
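The ±3 m elevation constraint can be illustrated with a plain-Python sketch. The real code applies the equivalent mask to a DEM image inside each buffer in Earth Engine; this stand-in just shows the keep/drop rule per cell:

```python
def elevation_mask(dem_values, point_elevation, tolerance=3.0):
    """Return True for DEM cells within +/- tolerance metres of the
    point's elevation, False otherwise (None = missing data)."""
    return [v is not None and abs(v - point_elevation) <= tolerance
            for v in dem_values]
```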
### Step 3: Sentinel-2 Imagery Acquisition

Input: Buffered FeatureCollection

Processing:

```python
# For combined analysis
S2_cloud_mask_batch = ee.ImageCollection(S2_Export_for_visual(dam_batch_fc))
# For upstream/downstream analysis
S2_IC_batch = S2_Export_for_visual_flowdir(dam_batch_fc, waterway_fc)
```

These functions:

- Determine the time window (±6 months from the survey date)
- Apply a spatial filter (using buffer geometries)
- Apply cloud masking using QA bands
- Select the least cloudy image for each month (cloud coverage < 20%)
- Standardize band names and properties

Output: Earth Engine ImageCollection with filtered monthly Sentinel-2 imagery
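The per-month selection rule can be sketched in plain Python. This mirrors only the selection logic described above (least cloudy image per month, subject to the < 20% threshold); the real code operates on Earth Engine ImageCollections:

```python
def least_cloudy_per_month(scenes, max_cloud=20.0):
    """Pick the least-cloudy scene per month from (month, cloud_pct, scene_id)
    tuples, skipping months where every scene is at or above the threshold."""
    best = {}
    for month, cloud, scene_id in scenes:
        if cloud >= max_cloud:
            continue  # too cloudy to use at all
        if month not in best or cloud < best[month][0]:
            best[month] = (cloud, scene_id)
    return {month: scene_id for month, (cloud, scene_id) in best.items()}
```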
### Step 4: LST and ET Enrichment

Input: Sentinel-2 ImageCollection

Processing:

```python
S2_with_LST_batch = S2_ImageCollection_batch.map(add_landsat_lst_et)
```

This function:

- Acquires synchronous Landsat 8 thermal data for each Sentinel-2 image
- Applies radiometric calibration to the thermal bands
- Calculates Land Surface Temperature (LST) using NDVI-based emissivity
- Retrieves monthly evapotranspiration (ET) data from OpenET
- Handles edge cases using median values when multiple images exist
- Provides a fallback value (99) when data is unavailable

Output: Enhanced ImageCollection with LST and ET bands added
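The median/fallback behaviour described above can be sketched as a small helper. This is an illustrative stand-in for the edge-case handling, not the actual `add_landsat_lst_et` code:

```python
from statistics import median

def summarize_band(values, fallback=99):
    """Median when several co-located Landsat/OpenET values exist for a
    month, the single value when one exists, and the sentinel fallback
    (99) when none is available."""
    if not values:
        return fallback
    return median(values)
```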
### Step 5: Environmental Metric Computation

Input: Enhanced ImageCollection with LST and ET

Processing:

```python
# For combined analysis
results_fc_lst_batch = S2_with_LST_batch.map(compute_all_metrics_LST_ET)
# For upstream/downstream analysis
results_batch = S2_with_LST_ET.map(compute_all_metrics_up_downstream)
```

These functions calculate:

- NDVI (Normalized Difference Vegetation Index): (NIR - Red) / (NIR + Red)
- NDWI (Normalized Difference Water Index): (Green - NIR) / (Green + NIR)
- LST statistics (mean temperature in the buffer area)
- ET statistics (mean evapotranspiration in the buffer area)
- For upstream/downstream analysis: separate metrics for areas above and below dam points

Output: FeatureCollection with calculated environmental metrics
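The two index formulas are simple band arithmetic. A plain-Python version (the real code computes these per pixel with Earth Engine's `normalizedDifference`):

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); ranges from -1 to 1."""
    return (nir - red) / (nir + red)

def ndwi_green(green, nir):
    """NDWI (green variant) = (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir)
```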
### Step 6: Visualization and Export

Input: FeatureCollection with calculated metrics

Processing:

```python
# Convert to DataFrame
df_batch = geemap.ee_to_df(results_fc_lst_batch)
df_list.append(df_batch)
df_lst = pd.concat(df_list, ignore_index=True)

# Data preparation
df_lst['Image_month'] = pd.to_numeric(df_lst['Image_month'])
df_lst['Image_year'] = pd.to_numeric(df_lst['Image_year'])
df_lst['Dam_status'] = df_lst['Dam_status'].replace({'positive': 'Dam', 'negative': 'Non-dam'})

# Visualization
fig, axes = plt.subplots(4, 1, figsize=(12, 18))
for ax, metric, title in zip(axes, metrics, titles):
    sns.lineplot(data=df_lst, x="Image_month", y=metric, hue="Dam_status", style="Dam_status",
                 markers=True, dashes=False, ax=ax)
```

This step:

- Converts Earth Engine data to DataFrame format
- Standardizes data types (numeric months and years)
- Applies proper labeling for visualization
- Creates time series plots with confidence intervals (95% by default)
- Computes statistical significance between dam and non-dam areas
- Generates exportable visualizations and data tables

Output: Interactive visualizations and downloadable CSV data
## Earth Engine Integration

The application extensively uses Google Earth Engine for geospatial analysis. Key integration points include:
One of the most critical patterns is batch processing to manage memory:
```python
total_count = Dam_data.size().getInfo()
batch_size = 10
num_batches = (total_count + batch_size - 1) // batch_size

for i in range(num_batches):
    # Get current batch
    dam_batch = Dam_data.toList(batch_size, i * batch_size)
    dam_batch_fc = ee.FeatureCollection(dam_batch)
    # Process batch
    # ...
```

This pattern:

- Divides large collections into manageable batches
- Processes each batch independently
- Combines results after processing
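The ceil-division batching arithmetic can be checked in isolation. This helper mirrors the `toList(batch_size, offset)` slicing used above:

```python
def batch_ranges(total_count, batch_size=10):
    """Return (offset, size) slices covering total_count items, using the
    same ceil-division formula as the batch loop above."""
    num_batches = (total_count + batch_size - 1) // batch_size
    return [(i * batch_size, min(batch_size, total_count - i * batch_size))
            for i in range(num_batches)]
```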
The Land Surface Temperature calculation demonstrates complex Earth Engine operations:

```python
def robust_compute_lst(filtered_col, boxArea):
    # Excerpt: `img`, `thermal`, `ndvi_min`, and `ndvi_max` come from
    # earlier steps of the full function.
    # Compute NDVI
    ndvi = img.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')
    # Calculate vegetation fraction
    fv = ndvi.subtract(ndvi_min).divide(ndvi_max.subtract(ndvi_min)).pow(2).rename('FV')
    # Calculate emissivity
    em = fv.multiply(0.004).add(0.986).rename('EM')
    # Apply radiative transfer equation
    lst = thermal.expression(
        '(TB / (1 + (0.00115 * (TB / 1.438)) * log(em))) - 273.15',
        {'TB': thermal, 'em': em}
    ).rename('LST')
    return lst
```

Cloud masking is essential for reliable analysis:

```python
def cloud_mask(image):
    qa = image.select('QA_PIXEL')
    mask = qa.bitwiseAnd(1 << 3).eq(0).And(
        qa.bitwiseAnd(1 << 5).eq(0))
    return image.updateMask(mask)
```

## Visualization Components

The application creates several visualization types:
```python
fig, axes = plt.subplots(4, 1, figsize=(12, 18))
metrics = ['NDVI', 'NDWI_Green', 'LST', 'ET']
titles = ['NDVI', 'NDWI Green', 'LST (°C)', 'ET']
for ax, metric, title in zip(axes, metrics, titles):
    sns.lineplot(data=df_lst, x="Image_month", y=metric, hue="Dam_status",
                 style="Dam_status", markers=True, dashes=False, ax=ax)
    ax.set_title(f'{title} by Month', fontsize=14)
    ax.set_xticks(range(1, 13))
```

For upstream/downstream comparisons, the data is reshaped to long format before plotting:

```python
def melt_and_plot(df, metric, ax):
    melted = df.melt(['Image_year', 'Image_month', 'Dam_status'],
                     [f"{metric}_up", f"{metric}_down"],
                     'Flow', metric)
    melted['Flow'] = melted['Flow'].replace({f"{metric}_up": 'Upstream',
                                             f"{metric}_down": 'Downstream'})
    sns.lineplot(data=melted, x='Image_month', y=metric,
                 hue='Dam_status', style='Flow',
                 markers=True, ax=ax)
```
markers=True, ax=ax)To add new features to the application:
-
Add new Earth Engine functions:
- Create functions in the appropriate service module
- Ensure proper error handling
- Test processing on small datasets first
-
Add new UI components:
- Add new sections to the appropriate Streamlit page
- Use
st.session_stateto maintain state - Follow the step pattern of existing code
-
Add new metrics:
- Modify the
compute_all_metrics_LST_ETfunction - Add processing code for the new metric
- Update visualization code to include the new metric
- Modify the
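As a worked illustration of adding a metric, a hypothetical moisture index could start from its band math. NDMI is not part of the tool; the function name and band choice here are assumptions, and the real change would go inside `compute_all_metrics_LST_ET` as Earth Engine band arithmetic:

```python
def ndmi(nir, swir1):
    """Hypothetical new metric: Normalized Difference Moisture Index,
    (NIR - SWIR1) / (NIR + SWIR1), e.g. Sentinel-2 bands B8 and B11."""
    return (nir - swir1) / (nir + swir1)
```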
## Common Issues and Debugging

The most common issue is memory limits in Earth Engine:
```python
# Use batch processing
total_count = Dam_data.size().getInfo()
batch_size = 10  # Adjust this value based on data complexity
num_batches = (total_count + batch_size - 1) // batch_size

for i in range(num_batches):
    # Process in batches
    dam_batch = Dam_data.toList(batch_size, i * batch_size)
    # ...
```

Always implement proper error handling:

```python
try:
    # Process data
    # ...
except Exception as e:
    st.warning(f"Error processing batch {i+1}: {e}")
    # Continue with the next batch
    continue
```

Use cloud masking and select the least cloudy images:
```python
def get_monthly_least_cloudy_images(Collection):
    months = ee.List.sequence(1, 12)

    def get_month_image(month):
        monthly_images = Collection.filter(
            ee.Filter.calendarRange(month, month, 'month'))
        return ee.Image(monthly_images.sort('Cloud_coverage').first())

    monthly_images_list = months.map(get_month_image)
    return ee.ImageCollection.fromImages(monthly_images_list)
```

Happy coding!