I am currently looking for research fellows to participate in my summer 2025 undergraduate forestry data science research group (UFDS) at Bucknell University. This opportunity is open to any Bucknell undergraduate students who are not graduating seniors. The eight-week program will tentatively run from Monday, June 2nd to Friday, July 25th at Bucknell’s Dominguez Center for Data Science in Taylor Hall. Students will be paid $4,500 for full-time work and are eligible for on-campus housing at no cost. Both the stipend and the housing benefit are subject to income tax and FICA tax withholding.
The summer 2025 UFDS application can be found here. The application is due no later than Friday, January 31st, 2025.
Students will work on problems generated by the US Forest Inventory and Analysis Program (FIA) and will collaborate directly with Research Statisticians and Foresters at FIA. FIA is responsible for monitoring the status and trends in forested ecosystems throughout the US. With the rise of new data sources, such as satellite imagery and large-scale photography, and with the explosion of new statistical learning tools, a wealth of estimation techniques is available to consider. But with these new methods also come a load of important statistical questions related to robustness, bias, efficiency, and more!
Over the eight-week period, students will work on approximately two projects in groups of 1-3. The overarching theme of the projects will be the development, evaluation, and distribution of statistical methods and tools for improving estimation of forest attributes. However, individual projects will vary from exploratory analyses to methodology comparisons to software and dashboard development. Here’s a rough timeline of the summer research process:
Random Forests are a powerful predictive tool, but in the remote sensing and environmental literature they are getting a bad rap for being prone to overfitting. The problem is that people are unknowingly feeding their models cluster-correlated data (i.e., data collected in spatially or temporally correlated clusters) which leads to highly exaggerated accuracies, amongst other things. Through a series of simulations, we will clearly illustrate this problem and demonstrate potential solutions.
In recent years, FIA has experienced a greater need for estimates of forest parameters over smaller geographic regions. For example, the Forest Service manages wildfires and tries to estimate the impact of these fires on important forest attributes. This area of research is called “small area estimation.” This project will explore the utility of several different estimators for estimating forest attributes over small areas.
A previous fellow and I wrote the R data package pdxTrees using data collected by the Portland Parks and Rec Department. I use this package extensively in my teaching and it is featured in Modern Data Science with R. Now it is time to package up tree data in Pennsylvania! In this project, we will create R packages and tutorial materials with these data.
A: I have so many answers that I wrote a whole article about this question! Here’s the shortened version:
Learning by doing data science: Practicing data science can really help you develop as a data scientist.
Communication skills: You will have multiple opportunities to share your work (both in writing and orally) with your peers, your mentor (me!), the stakeholders, and novices. I will give you feedback to help you hone your communication skills.
Professional identity and belonging: Research can help strengthen your connection to the discipline of statistics.
Graduate school and career preparation/clarity: The experience will demystify what research is, helping you decide if you want to pursue an advanced degree. And, grad school or not, the tools and skills learned will help prepare you for your professional life after undergrad.
And, it is fun: The data are messy! The questions are vague! The answers are unknown! What more could you want?
A: At the start of a project, it is very difficult to predict whether or not it will result in a publication. And, for some projects, a journal article may not be the most useful final product. So, I can’t say with any certainty whether or not your work will be published but I can say that we will find ways for you to share the work. For example, the group will present their findings to FIA researchers and will be expected to participate in any relevant campus research presentation events. I will also strongly encourage you to submit the final technical report to the Undergraduate Statistics Research Project Competition and/or a video presentation to the Electronic Undergraduate Statistics Research Conference (eUSR). One of the 2022 projects won “Best Video Presentation” at this year’s eUSR! On top of all that, we will also look for relevant statistics and data science conferences for you to share the work.
Deliverables from previous projects have included journal articles, peer-reviewed technical reports, dashboards, and software development (links include an example of each).
A: The work will be highly collaborative. Three to four days a week we will have a team meeting in the morning where everyone presents their progress, discusses issues, and talks through their next steps. For the rest of the day, your time will likely be split between your two projects and will be a mix of coding, writing, problem-solving, and dealing with merge conflicts in GitHub.
A: All our work will be done using R/RStudio and git/GitHub. Previous experience with R is required but previous experience with git is not.
A: The projects will vary in terms of the computational and statistical skills needed but each research fellow should have prior experience coding in R and building statistical models. Useful courses to have taken include an intro stats course (such as one of MATH 216, MATH 227, ANOP 102, PSYC 215, or ENGR 226), a coding course (such as one of MATH 230, CSCI 203, or ANOP 203), and a modeling course (such as one of MATH 217, CSCI 349, ANOP 330, or MECH 484). If you haven’t taken these courses, you are still encouraged to apply but should address your level of proficiency in R and your experience with statistical modeling in your application. If you don’t have prior experience in R but can code in another language (such as Python), make sure to mention this in your application.
A: No! While some of my research students have been statistics majors, others have majored in other disciplines such as Economics or English. Coming from a different field often brings a very valuable and unique perspective!
During the summer of 2022, one of my research students, Jing Shang, created an artistic rendering of every team member’s favorite tree.