A geosciences paper is fraudulent? I’m not surprised.

I usually try to steer clear of anything political on this blog because, frankly, I find the subject pretty distasteful, especially as it’s handled on the Internet. They say politics makes strange bedfellows, which is true, but they don’t mention the corollary that writing about politics makes a bunch of smelly, weird strangers jump into your bed. I’d rather avoid the attention.

However, I just read a piece via Twitter that’s making me bend my rules slightly, because it intersects with my personal experience. In this piece, published in The Free Press, a geoscientist discusses how he reframed his research on wildfires to focus on global warming as a cause, even though he didn’t think that was the most honest framing. He explains that shifting his paper’s focus to global warming, instead of the other wildfire-related factors he thought were more important, allowed him to publish in a high-impact journal (in this case, Nature) instead of a low-impact one, and he says explicitly that he thinks this invisible editorial pressure to foreground global warming above all else was politically motivated.

Now, I don’t want to get into this guy’s research specifically[1], but I will say that his experience does not surprise me. I got my undergraduate degree in geosciences, and I did a fair amount of research in the field, at least for an undergrad. Overall, I was not impressed. Finding out that a Nature paper in geosciences was edited for political purposes is about as surprising to me as finding out that two professional wrestlers who hate each other in the ring are friends in real life. There are a lot of shenanigans in that field.

Geosciences is in a weird place as a field. First of all, it’s way too big. It covers meteorology, geology, oceanography, geography, microbiology, ecology, and physics. There are way too many niches people can carve out, and a lot of room for people to establish their own little citation kingdoms, especially when you get the giant 100-person papers. Nobody is incentivized to go against the consensus.

Like, when I was an undergrad, I took a research class on gas exchange and ecology, which is basically the study of how different biomes create their own microclimates (e.g., forests release a lot of humidity through transpiration, which makes them more humid than the tundra). One of the papers we read was a collaborative effort in which basically every North American university with an arboretum measured the gas output of its forest, the idea being to see whether deciduous-dominant forests release different gases than coniferous-dominant forests. Not brilliant, but whatever.

Anyway, when I was going through the paper, I dug into their dataset and found that it was simply mislabeled. Literally a third of the forests that were supposed to be deciduous were labeled as coniferous. It was a very obvious spreadsheet error, and it had huge impacts on the paper’s conclusions. But when I brought this up to the professor, he brushed me off.
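Catching this kind of error doesn’t take anything sophisticated. Here’s a minimal sketch of the sanity check, with entirely hypothetical file and column names (the real dataset wasn’t structured like this, but the idea is the same): cross-check each site’s forest-type label against its own species data.

```python
import pandas as pd

# Hypothetical layout: one row per site, with the forest-type label the paper
# used for its analysis alongside the site's own species-composition data.
df = pd.read_csv("forest_sites.csv")  # hypothetical file

# Infer the type each site *should* have from its own species data...
inferred = df["deciduous_fraction"].gt(0.5).map(
    {True: "deciduous", False: "coniferous"}
)

# ...and compare against the label actually used in the analysis.
mismatched = df[df["forest_type"] != inferred]
print(f"{len(mismatched)} of {len(df)} sites mislabeled "
      f"({len(mismatched) / len(df):.0%})")
print(mismatched[["site_id", "forest_type", "deciduous_fraction"]])
```

If the label and the species data disagree for a third of the sites, something upstream has gone badly wrong.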

At the time, I was mad, but now I understand better. Yes, it probably is problematic that a paper about “deciduous vs. coniferous forests” was confused about which forests were which. But my professor was up for tenure, and literally every major university in North America was on this paper. It had to be, in order to get the dataset. How was he going to go against the consensus?

This ties into the second problem: datasets. Geosciences is the study of the Earth. The Earth is big. Experiments are almost impossible to run. So, there are really only three types of papers: 

a) We got a bunch of data, looked at the data, and saw something weird.

b) We got a bunch of data, backfit a model to it, and then predicted some wild things in the future.

c) We got a bunch of data, backfit a model to it, and then extrapolated some things into the past.
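To see why papers of type (b) and (c) are so slippery, here’s a toy illustration with synthetic data (not drawn from any real study): two models can backfit the same observations almost equally well and still disagree wildly once you extrapolate beyond the observed window, and nothing in the data can arbitrate between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "observations": 30 years of a slow linear trend plus noise.
years = np.arange(1990, 2020)
obs = 0.02 * (years - 1990) + rng.normal(0, 0.05, years.size)

# Backfit two models to the same data: a line and a cubic.
linear = np.polynomial.Polynomial.fit(years, obs, deg=1)
cubic = np.polynomial.Polynomial.fit(years, obs, deg=3)

# Both fit the observed window about equally well...
for name, model in [("linear", linear), ("cubic", cubic)]:
    rmse = np.sqrt(np.mean((model(years) - obs) ** 2))
    print(f"{name}: in-sample RMSE = {rmse:.3f}")

# ...but extrapolated a century out, they tell very different stories,
# and neither can be checked until the future actually arrives.
print(f"linear model, year 2120: {linear(2120):+.2f}")
print(f"cubic model,  year 2120: {cubic(2120):+.2f}")
```

Neither extrapolation is falsifiable until the future (or newly unearthed past data) actually shows up.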

None of these are easily falsifiable. The closest you can usually come to falsifying one of these papers is by finding new data, like when the entire field of Antarctic ice sheet research had to grapple with its predictions of ice sheet collapse clashing with new data showing the Antarctic ice sheet was actually growing. But new data is really hard to come by. Everyone in the Antarctic ice sheet research community was working off the same satellite measurements as everyone else, and making the same mistakes as everyone else.

This leads to some major problems. People are so desperate for data that they badly misuse the datasets they do have. It’s a classic streetlight effect. One of the major research projects I did as an undergrad was showing that using tidal gauge datasets to measure the intensity of historic hurricanes was a mistake. Tidal gauges are present in all major harbors, and because they’re an accurate, hourly measure of local sea level that, in some cases, goes back centuries, hurricane researchers had been using them as proxies for hurricane intensity (i.e., a higher recorded water level means a more intense hurricane). But, as I found, tidal gauges really weren’t built for that sort of thing, and in fact they break during intense storms, registering ludicrously low water levels even in hurricanes like Sandy. So papers using these tidal gauge datasets were measuring the intensity of hurricanes with broken instruments, leading them to false conclusions. There’s no way to fix this; tidal gauge data just shouldn’t be used to measure the intensity of hurricanes.
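Screening for this failure mode is straightforward once you suspect it. Here’s a rough sketch of the kind of check I mean, with hypothetical file and column names: during a major hurricane, storm surge should push a gauge’s reading well above its everyday baseline, so a storm “peak” at or below that baseline points to a broken instrument, not a mild storm.

```python
import pandas as pd

# Hypothetical hourly gauge records: one row per gauge per hour.
df = pd.read_csv("gauge_hourly.csv", parse_dates=["timestamp"])

# Rough window around Sandy's landfall (late October 2012).
in_storm = df["timestamp"].between("2012-10-28", "2012-10-31")

for gauge_id, g in df.groupby("gauge_id"):
    mask = in_storm.loc[g.index]
    baseline = g.loc[~mask, "water_level_m"].median()  # everyday water level
    storm_peak = g.loc[mask, "water_level_m"].max()

    # Surge from a major hurricane should push the peak well ABOVE baseline.
    # A storm "peak" at or below the everyday median means a broken gauge,
    # not a mild storm.
    if storm_peak <= baseline:
        print(f"gauge {gauge_id}: storm peak {storm_peak:.2f} m <= "
              f"baseline {baseline:.2f} m -- likely instrument failure")
```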

But in geosciences, as in most fields, you can’t publish a paper that says, “Don’t use these datasets to study this subject,” especially if you don’t have anything to replace them with. Nobody wants to throw out their carefully digitized spreadsheets and just admit defeat on the topic in question. So, without any better dataset, people keep using the same flawed ones and keep producing more bogus studies.

Like, for example, this study, published in 2020 with 9 citations, using tidal gauges in an effort to quantify flooding during Hurricane Sandy; or this one from 2021, with 10 citations, on tidal gauges and Hurricane Harvey; or this one from 2023, with 4 citations, on tidal gauges and the past century of East Coast hurricanes. Given that these people have access to satellite data and can literally check their flooding figures against pictures (I had to use newspaper reports for the historic hurricanes), they probably should take the step of realizing, like I did, that tidal gauges can’t be used to measure the intensity of hurricanes. But there’d be no second step. They’d have to just stop writing about a topic they’ve spent so much time and effort on. So, instead, they all keep using the datasets that exist and protect them from any attack.

I saw this protectiveness over existing datasets again and again. Like, I took a class with our department head, and one of her big research projects was looking at the metabolism of bacteria in Antarctic lakes. The way she did this was by going to Antarctica, sticking electric probes in the water, and then extrapolating, from changes in the voltage readouts, what sort of chemical reactions the bacteria were driving.

The trip to Antarctica she had taken to get the latest round of data had proven super fruitful for her and her collaborators; they had gotten a whole series of papers out of it. Unfortunately, as I found when she assigned her own paper for us to analyze as homework, her electric probes were not meant to be used in extreme conditions. In fact, the manufacturer’s publicly accessible user’s manual said that using those probes in extreme conditions could lead to anomalous readouts.

So, those changes in the voltage readouts could have just been, well, anomalous readouts caused by using the instruments in extreme conditions. Whoops! But, when I pointed this out to her, she brushed me off and told me to forget it. Because, you know, she had gone all the way to Antarctica to get this data. What was she going to do, go back and replicate it on the off chance she had to retract like 5 papers?
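The galling part is how cheap the fix would have been at data-collection time: just compare the readings against the instrument’s rated operating range before building papers on them. A hypothetical sketch, with made-up spec-sheet numbers and column names:

```python
import pandas as pd

# Hypothetical probe log, plus made-up rated operating range from a spec sheet.
df = pd.read_csv("probe_readings.csv")
RATED_MIN_C, RATED_MAX_C = 0.0, 40.0  # hypothetical manufacturer limits

out_of_spec = ~df["water_temp_c"].between(RATED_MIN_C, RATED_MAX_C)
print(f"{out_of_spec.mean():.0%} of readings were taken outside the probe's "
      f"rated temperature range")

# Voltage anomalies recorded out of spec can't be attributed to bacterial
# metabolism; at minimum, those rows need to be excluded or revalidated.
suspect = df[out_of_spec]
print(suspect[["timestamp", "water_temp_c", "voltage_mv"]].head())
```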

There were so, so many problems like this. When you combine niches dominated by incestuous citation rings with limited datasets, and then add a whole sector of the country that thinks one of the field’s foundational theories, global warming, is totally fraudulent and is determined to prove it, correcting the field becomes basically impossible.

And that’s where the field still is today! I have a friend now who’s an active researcher in geosciences, and she says there are still whole sectors of geosciences that she has to avoid interacting with because they’re so bad. They’re dominated by researchers who all cite each other and write papers with each other, whose predictions can’t be falsified (pro tip: if you say something has a 30% chance of happening 100 years in the future, it is literally impossible to prove you wrong), and whose only opposition is Republicans, ironically strengthening these researchers’ dominance because nobody wants to get in bed with Mitch McConnell. 
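That parenthetical is worth unpacking. A single probabilistic forecast that resolves a century from now can’t be scored either way; you only get traction against a forecaster over many resolved predictions. A quick simulation with made-up numbers:

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)

# A forecaster always claims a 30% chance; suppose the true rate is 60%.
claimed, true_rate = 0.30, 0.60

for n_forecasts in (1, 10, 100):
    hits = int(rng.binomial(n_forecasts, true_rate))
    p = binomtest(hits, n_forecasts, claimed).pvalue
    print(f"{n_forecasts:>3} resolved forecasts: {hits} events occurred, "
          f"p-value against the claimed 30% = {p:.3f}")
```

With one resolved forecast, either outcome is consistent with “30%”; only a long track record of resolved predictions could falsify the forecaster, and a 100-year horizon guarantees there won’t be one.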

My friend works in a small subfield that nobody pays attention to, where she feels she can be intellectually honest. Her subfield also suffers from very limited datasets, but she thinks its mathematical and modeling foundations are strong enough that she can make real scientific progress without resorting to bullshit. I hope she’s right.

Meanwhile, as for the rest of geosciences, I hope that articles like the one linked above will scare the field into behaving better. However, I have a bad feeling that they won’t.

[1] Although if I did, I’d talk about how much I dislike machine learning research papers like his, for exactly the reason he describes. From the abstract, it seems like they just took a bunch of datasets, ran a bunch of correlations, and then wrote a just-so story about one correlation they found: in this case, between temperature, aridity, and wildfire. I think the relationship between these papers and “The morning routines of billionaires that can also make you a billionaire” blog posts is closer than anyone likes to admit.