Hello world!

This is the first of many updates I plan to post to track my PhD journey. My goal with these is two-fold: (1) to document and keep track of the work I’ve done over time, and (2) to make my thoughts publicly available so I can get early feedback. We’ll see how it goes, but my aim is to post an update roughly every two weeks.

It’s been fantastic to finally be back here in Cambridge! To set the scene: after I finished my Master’s in Machine Learning and Machine Intelligence here back in 2022, I spent a few years in industry as a Machine Learning Engineer at Aerobotics in Cape Town, and I am now starting my PhD in Computer Science under the supervision of Professor Anil Madhavapeddy in the Energy and Environment Group (EEG).

My goal for my first few weeks has just been to start exploring initial research directions I could take for my PhD. You may be wondering: did I not need to have already decided on a topic when I applied? Kind of. Whilst I did have to submit a proposal for my application (see my proposal on automated bird call classification), this was primarily to demonstrate my ability to (a) review the current literature in a domain of interest, (b) identify research gaps and (c) present a carefully thought-out research plan. My plan for my actual PhD, however, was always to iterate towards a topic for impactful research once I arrived, after chatting with various people in the space. The umbrella goal is to combine my background in computer science and machine learning with my passion for nature and sustainability, and so contribute towards tackling the global biodiversity crisis.

And what better place to do this than as part of the Cambridge Conservation Initiative (CCI), housed at the David Attenborough Building and home to world-leading interdisciplinary Cambridge researchers alongside globally important conservation organisations, including the United Nations Environment Programme World Conservation Monitoring Centre (UNEP-WCMC), the International Union for Conservation of Nature (IUCN) and BirdLife International. With that in mind, over the past couple of weeks I’ve had meetings with various people at the EEG and CCI.

It started with a fantastic first meeting with Anil in his office at Pembroke, ahead of welcome drinks at the pub with a couple of his PhD students. After he gave me some initial PhD advice and we talked about advancements in AI, we discussed initial directions I could take. Firstly, Anil mentioned there’s a potential collaboration with the IUCN that he thought I could be a good fit for. They’ve recently received funding from Google.org to help with mapping data-deficient plant species. If I understand correctly, the IUCN guys want to see if there’s data hidden in the depths of the scientific literature that could be retrieved at scale using AI. This would involve using multi-modal LLMs to scan both text and graphs. There may be a particular focus on the Fynbos biome in South Africa’s Western Cape, which of course is close to my heart, having studied and lived in Cape Town for several years. I still need to find out more details here, but I hope to have a meeting with some of the IUCN guys next week. To get an initial feel for what scanning through this literature would entail, though, I downloaded the PLOS corpus and explored its XML and how one would access the figures.

The other direction I could take is to build on the group’s incredibly exciting new geospatial foundation model: TESSERA. One potential exciting application for this is in habitat mapping (validated by our trip to search for hedgehog habitats that went viral). In this vein, my next meeting involved going to the David Attenborough Building (DAB) for the first time to meet with Professor David Coomes and Dr James Ball (who, funnily enough, was my cricket captain at Magdalene back in 2022!) from the Plant Sciences department. They’re leading the habitat mapping work with TESSERA and are working towards a vision of a global habitat map of the Earth at varying levels of granularity, enabled by geospatial foundation models. David outlined their access to a great dataset of field data and aircraft-captured LiDAR scans in the Cairngorms, which he’d love to analyse using TESSERA. I think this could be a great first project for me. (Side note: the DAB is an extremely inspiring place, with a David Attenborough quote beneath the 17 m high living wall as you walk in, saying “There are few things more important in the world today than what you are doing here”.)


I also joined in on a call between third-year PhD student Jovana Knezevic and David Ball, where Jovana took us through the exploratory work she’s been doing comparing TESSERA and AlphaEarth for change detection. Lastly, Sadiq invited me to meet with the Conservation Evidence (CE) team, where I met some of the amazing people working in this space, including Sam Reynolds, Alec Christie and Bill Sutherland. They’ve been building AI pipelines to help scale out CE’s hugely impactful work. This could also be an interesting area to get involved in at some point. Initially, though, I think it’ll be useful for my PhD to stay within the EEG to get more of a lab feel, and I could perhaps pivot here at a later stage.

Coming away from these meetings, I’m planning to start by looking at applications of TESSERA, for a few reasons. One, it looks like it’ll be an enjoyable, collaborative environment to be part of, with Frank (second-year PhD), Jovana (third-year PhD) and James (second-year post-doc) all doing research in this space, in contrast to the Conservation Evidence space, where there aren’t other PhD students working currently. Two, I like the direct real-world problems this work could contribute towards. The IUCN’s Red List is one of the world’s most important resources for conservation, and the IUCN’s ability to do assessments is severely bottlenecked by time and resources. TESSERA could be an incredibly powerful tool to help scale out the mapping of habitats and plant distributions.

For the next week or two, my plan is to get my hands dirty by actually playing around with TESSERA, and to that end I’ll start by asking David and James for more details about the Cairngorms dataset. I’ll also be meeting with Dr Michael Dales, who has done really amazing work with habitat maps for LIFE, built fantastic Python libraries in yirgacheffe and aoh-calculator, and worked closely with the IUCN, so I am very excited to pick his brain. I might also look at doing some experiments on tree species distribution mapping in Cambridge using iNaturalist data. Finally, it’s also the big IUCN conference this week, which I’m considering attending virtually…

Lots of exciting things to explore. Onwards and upwards!