…everyone. One short announcement: for next week, our speaker will be Men-Andrin Meier from Caltech, who will discuss earthquake early warning: potential for damage mitigation versus physical and technical limitations.

And today it's my pleasure to introduce Mark McClure as our speaker. Mark received his B.S. in chemical engineering, his M.S. in petroleum engineering, and his Ph.D. in energy resources engineering, all from Stanford. After graduating, he took a position at the University of Texas at Austin as an assistant professor in the Department of Petroleum and Geosystems Engineering, where he worked for the next two years. Recently, Mark moved back to Palo Alto to found McClure Geomechanics, to develop modeling tools for problems that combine fluid flow and geomechanics in the subsurface. And I think he'll talk a little bit more about that journey today, maybe?

- A little bit.

- Today we'll be hearing about some of his research on potentially induced seismicity in Oklahoma and California. So welcome.

- Thanks, Josie, for the introduction, and thanks, everybody, for having me. First I want to mention my co-authors on this work; I have three co-authors. I also want to start with a disclaimer that this is not quite finished, so it's not yet even submitted; maybe I'll submit in the next few weeks. I'm a little nervous presenting some not-quite-finished results, but it's close enough that I'm comfortable presenting it. I just want to mention that as a disclaimer to start off.

So, my three co-authors. Riley Gibson is an undergraduate research assistant who worked for me for a year and a half at the University of Texas on this project. Kit-Kwan Chiu was a master's student who worked on this project for one semester. And Rajesh Ranganath is finishing his Ph.D. this year in the department of computer science at Princeton. He is actually a statistician. I am not a statistician, so we've joked on this project that he's been my adviser.
Because I've been the person driving the work that's been happening, but he's the real expert on statistics that I've been going to and talking to, and really all the good ideas in this project are from him. Me and the two students have been the ones doing the work, basically.

So this is about identifying potentially induced seismicity and assessing statistical significance in Oklahoma and California. I want to start off with an abstract of what I'm going to say. We looked at 34 years of wastewater disposal data and seismicity data in California and 14 years in Oklahoma. That's all of the wastewater disposal and all of the earthquakes above a magnitude of completeness. We developed a statistical method to infer whether the wastewater disposal, or associated activities, have induced seismicity, and to quantify that with a p value. I really want to emphasize the importance of assessing the possibility of a coincidental association. The vast majority of the thinking we've done on this project has been to make sure we do that very carefully and in a way that is robust.

I also said wastewater disposal or associated activities. Specifically, we look for associations between wastewater disposal injection and earthquakes. But there is, of course, a correlation between wastewater disposal and other activities, for example oil and gas depletion. So if depletion is causing seismicity, our algorithm might still pick that up because it's correlated with injection. That's something to be aware of.

And just to get to the results: in Oklahoma, which is a place where we already would have expected to find evidence of induced seismicity, the overall result is a p value of about one in a million, meaning that if the recent increase in seismicity in Oklahoma were not induced by injection, there would be about a one in a million chance of seeing what we observed by coincidence. Then we looked at California, where we didn't know what we would find. We maybe found some weak evidence looking at the entire state as a whole. It's not a low enough statewide p value to be considered conclusive, but it's maybe suggestive. We've also identified a few areas in the state that look like there might have been induced seismicity, but we can't say for sure.
Those areas could perhaps be the subject of more careful site-specific study in the future. I think the primary contribution of this work is actually the methodology: the algorithm we developed to do the data analysis and to assess the p values. And I'm excited to present this because I hope other people will use this technique and apply it in different ways. You could not only apply it to other data sets and other areas, you could also do things like look for risk factors: are there different things that increase the probability of induced seismicity? So that's what I'll talk about today.

When I've presented this work to people and explained it, sometimes I get asked: it seems awfully complicated, why is this so complicated? I'm going to spend quite a bit of time explaining why I think this is a very difficult problem and why you have to do something that's a little bit complicated, because it's a challenging statistical problem.

Okay, a quick outline of the whole talk. I'm going to give some intro about who I am and my background, review the statistical challenges for quantifying induced seismicity, describe the approach, give the results, and then discuss some opportunities for future work. Personally, I am not planning to work on this in the future. I just left, and I'm trying to start a company, and all the time I've been putting into this has been a diversion from what I should have been doing. But I've really believed in this project, so I worked hard to make sure it came to completion, and I hope other people will follow along on this work.

So, my background; thanks for the introduction. I was an assistant professor at the University of Texas for almost three years. Then last fall I moved back here to Palo Alto to start this software company, McClure Geomechanics LLC. I'm a petroleum engineer, although I've worked a lot on geothermal energy, hydraulic fracturing, and induced seismicity modeling. My company is making a hydraulic fracturing simulator for oil and gas. I'm still working on it, so it's not out yet.
The reason I live in Palo Alto, California, is that my wife, who is right here, is a resident at Stanford Hospital. That's why I'm starting an oil and gas software company in Palo Alto, which wouldn't really make sense otherwise. I also have a small position as a consulting professor at Stanford, so I teach one class a year there, Introduction to Petroleum Engineering.

Here are some of the people I've worked with over the years. Down here in the lower right is my geothermal research group with my former adviser, Roland Horne; these are some of the people I worked with at Stanford. And this is my research group up here from the University of Texas.

Just to mention some of the work I've done in the past: you might note that what I'm talking about today is nothing like any of the other research I've done before. I generally do computational modeling, and some of my early work on induced seismicity was coupling rate-and-state friction earthquake simulation with fluid simulation. In fact, I talked about that research in this room at the USGS several years ago. I've done a lot of work with discrete fracture networks, coupling fluid flow and the stresses induced by fracture opening and sliding. That used to be 2D, and recently we've done a lot of work with big, three-dimensional discrete fracture networks. So that's a lot of the work I've done in the past. This project today is all about statistics, so it's really different from what I've done, and that's why it's been so important to collaborate with the statistician, Rajesh. As I said, he's basically been my adviser. This has been one of those projects that just goes on forever; I've been working on it for about three and a half years, and finally I think we're going to be submitting a paper and publishing it soon.

Okay. So here are some of the challenges in statistically quantifying induced seismicity. There are a lot of earthquakes, and there are a lot of injection wells. How can we know if a temporal or spatial correlation is actually induced seismicity or if it's a coincidence? Some correlations are very strong, to the point where it's totally unambiguous and there's no question about it.
For example, there are a lot of enhanced geothermal systems projects where there's not really any seismicity going on, then they drill a well and start injecting at a really high rate, and thousands of earthquakes occur right around the well immediately when they start injecting. There's no question that that was induced seismicity. But there are a lot of other cases that are on the border, and it's not as clear.

Davis and Frohlich and Frohlich et al. are two recent papers that listed a checklist of the properties of what we think induced seismicity might look like. I 100% think that's a good approach, but the limitation is that it's qualitative, so it's still a subjective interpretation. We might say, well, this appears to have some of the properties we would expect from induced seismicity. But how do we really know it wasn't a coincidence?

We can do a detailed site-specific study and integrate geomechanical data and all sorts of other reservoir data. That's going to be expensive and time-consuming and require a lot of data, and you might still end up with something that's not a 100% airtight case. The Hornbach et al. paper from last year was very interesting; I thought it was pretty persuasive in arguing that the seismicity near Azle, Texas, appeared to be induced. But then the Texas Railroad Commission, the regulatory agency, took a look at it, had a hearing, allowed the companies that were injecting to rebut, and concluded that they were not willing to ask, or order, those companies to shut in the well. So they found that it wasn't 100% persuasive. Even with the gathering of all that data, it can still turn out to be subjective.

So the approach I'm trying to use is to apply a very large data set to address the question on at least a statistical basis. Number one, just by looking at a lot of data, we can perhaps identify areas where there might have been induced seismicity that we haven't previously noticed. But number two, we want to assess the overall statistical significance: are there so many associations that it's a lot more than we would have expected to happen by chance?

So, problem number one: earthquakes and injection wells are not located at random.
The first problem is that this could create the danger of a spurious correlation. That's a problem with a cross-sectional study design, where you look at differences between places. For example, in California, wastewater disposal tends to be located in the Central Valley, where there are not a lot of earthquakes. There's an underlying causal process: the Central Valley is an extensional basin, which is where oil has tended to accumulate, and that also tends to mean there are not a lot of earthquakes there. So if you just made a plot of number of earthquakes versus injection volume, you would see an inverse correlation, and in a cross-sectional study design you might conclude that injection prevents earthquakes. The problem is that it's a spurious correlation, because there's an underlying causal process creating the correlation; the correlation is not a causal relationship. So we have to worry about that and think about how we can analyze the data in a way that accounts for that possibility.

The second issue is that correlated observations weaken statistical significance. For example, if two wells are close together, the observations at those two wells are correlated: the probability that two adjacent wells will both be near an earthquake is almost 100%, so they'll have the same observation. When you're calculating statistical significance, if you don't account for that, you'll overestimate the statistical significance. For example, the Weingarten et al. paper from last year, looking at induced seismicity across the central United States, calculated p values assuming that all the wells were independent observations. But they're not independent observations, and that would have led to an overestimation of the statistical significance.

You also have to assess the risk of false positives, taking into account how many hypotheses are tested. If you're looking at a huge number of wells and a huge number of earthquakes, then just by coincidence there's a 5% chance that any given p value is going to be below 0.05. So you've got to be really careful about how you do that. The Goebel et al. paper from last year looked at induced seismicity across California. They looked at a lot of wells, they looked at a lot of earthquakes, and they found some earthquakes that happened near injection wells.
There did seem to be an association. And they did, I think, attempt to calculate p values, but I felt the discussion in that paper about how they did it was too short; there were two sentences in the paper explaining how they calculated the p values. I think that's the crux of the entire problem, and if you don't do it carefully, it really is a significant issue.

And finally, earthquakes occur in fat-tailed distributions, so they cluster very strongly in time and space. You have to make sure you use a flexible enough statistical model. As a trivial example, in a linear regression, if you have one data point that's very different from the other data points, that one data point dominates the outcome of the entire regression. So you have to make sure you do something that can handle that.

Here is an example of the difference between a cross-sectional study design and what we're using, which is a longitudinal study design. It's a trivial, made-up example. Site A has had 30 earthquakes in the last 10 years, and at that site, over those 10 years, a million barrels of wastewater have been injected. Site B has had 10 earthquakes in the past 10 years, and there have been 10 million barrels. So site A has more earthquakes and less injection. If we used a cross-sectional study design and just looked at cumulative volume injected versus number of earthquakes, we would see an inverse relationship, and we might conclude that the injection is preventing earthquakes. Now, that could be a coincidence, because we only have two data points, so it's not very statistically significant. Or there could be an underlying causal process causing it, like in California, where the injection wells are near oil reservoirs, which tend to be in places that don't have natural seismicity.

An alternative is a longitudinal study design. In that study design, you look at changes over time within individuals in a population. So now we can look again at site A and site B, and what we'll see is that, in more recent years, injection has increased at site A, and the number of earthquakes has increased at site A. At site B, injection has been decreasing over time, and so has seismicity.
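As a rough sketch of that comparison, here is a minimal Python example. The yearly series below are invented purely for illustration; only the 10-year totals and the increasing/decreasing trends come from the made-up example described above.

```python
# Toy contrast between a cross-sectional and a longitudinal comparison.
import numpy as np

quakes = {
    "A": np.array([1, 1, 1, 2, 2, 3, 4, 5, 5, 6]),   # 30 total, rising over time
    "B": np.array([2, 2, 2, 1, 1, 1, 1, 0, 0, 0]),   # 10 total, falling over time
}
injection_mbbl = {  # annual injection, millions of barrels (illustrative numbers)
    "A": np.array([0.02, 0.03, 0.05, 0.06, 0.08, 0.10, 0.12, 0.15, 0.18, 0.21]),  # ~1 total
    "B": np.array([1.6, 1.5, 1.3, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6, 0.4]),            # ~10 total
}

# Cross-sectional view: compare totals *between* sites.
for site in ("A", "B"):
    print(f"site {site}: {injection_mbbl[site].sum():.1f} Mbbl total, "
          f"{quakes[site].sum()} quakes total")
# More total injection goes with fewer total quakes: an inverse, spurious-looking relation.

# Longitudinal view: compare changes over time *within* each site.
for site in ("A", "B"):
    r = np.corrcoef(injection_mbbl[site], quakes[site])[0, 1]
    print(f"site {site}: within-site correlation of injection vs. quakes = {r:+.2f}")
# Both sites show a positive within-site association between injection and quakes.
```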
So looking at the data in this way, longitudinally, it actually looks like there might be induced seismicity in this data set, the opposite of the conclusion we would have gotten from the cross-sectional study design.

If you want to establish causality from correlation, it's important to establish that the treatment variable is random with respect to the observation. In this case, the treatment variable is the injection. Now, where humans inject wastewater is not random with respect to where earthquakes occur. However, the timing of when humans inject wastewater is basically random with respect to natural seismicity; there's no underlying causal relationship that could cause people to start injecting in the year when there was already going to be a natural earthquake. So if we look at changes and associations over time, we can confidently state that there's a causal relationship. But it has to be a time series, and that's why we're using this longitudinal study design. This is certainly not the only study where people have looked at changes over time, but I really want to emphasize how important I think that is.

Next, the magnitude of completeness. The quality of the seismic network is a nonrandom function of time and space. Results can only be robust if you consider events above the completeness magnitude over the entire study area and duration. Again, I'm sorry to pick on the Weingarten paper; I think it was a really good paper, and I don't necessarily disagree with their conclusions. But one issue was that one of their results was that more seismicity occurs near very high-rate wells. In some of their results they only used events above the magnitude of completeness, but in that particular result they used all of the earthquakes in the seismic catalog. That automatically opens you up to the possibility of a spurious correlation due to some other underlying process. For example, what if the quality of the seismic array has been improving over time? Surely it has been; that's generally the case. And it also just so happens that very high-rate wells have been more common in recent years.
If those two things were true, then even if there were no causal relationship between high-rate injection and seismicity, you would observe that there tend to be more earthquakes in the seismic catalog in the vicinity of high-rate wells. So it's not to say that that result was not correct, but it's open to question because they didn't account for the magnitude of completeness in that particular result.

Okay. So what we're doing is a multilevel longitudinal study design. Multilevel means we're looking at statistics within members of a population, and then we're looking at the distribution of those statistics across the full population. And it's longitudinal because we're looking at each member of the population over time.

So we gridded up the study area. We could have looked at things in terms of wells, but the problem with that, and why we eventually decided not to do it, is that the wells are clustered, so well observations are not independent. Instead, we gridded up the states. We made blocks that were 0.1 degrees latitude by 0.1 degrees longitude, which is roughly 14 miles by 11 miles. In each grid block, we looked at relationships between injection and seismicity over time, and we fit a statistical model to those observations. We assume that the natural seismicity and the induced seismicity hazard are constant over time in a grid block; what I mean by that is that the relationship between injection and seismicity, if it exists, is parameterized by a variable that is constant over time. But it's different in each block, obviously, because seismic hazard is a function of space.

Okay. We considered only injection wells classified as water disposal wells. We did not include improved oil recovery wells or secondary oil recovery wells, and we did not include geothermal injection wells. Certainly those might induce seismicity, but I think they're a different category; the hazard from those types of injection is different, and we wanted to keep this focused on one thing. So we only looked at wastewater disposal wells.
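Here is a rough sketch of the gridding and per-block, per-year counting just described. The data-frame column names and the loading of the catalogs are hypothetical; only the 0.1-degree block size, the half-block diagonal offset, and the per-block annual counts and cumulative volumes come from the talk.

```python
import numpy as np
import pandas as pd

BLOCK_DEG = 0.1  # 0.1 x 0.1 degree grid blocks

def to_block(lat, lon, offset=0.0):
    """Map coordinates to a grid-block id; offset=0.05 gives the diagonally shifted grid."""
    i = np.floor((np.asarray(lat) + offset) / BLOCK_DEG).astype(int)
    j = np.floor((np.asarray(lon) + offset) / BLOCK_DEG).astype(int)
    return list(zip(i, j))

def block_year_table(quakes, wells, mag_complete, max_depth_km=None, offset=0.0):
    """quakes: columns [lat, lon, year, mag, depth_km]; wells: columns [lat, lon, year, volume_bbl]."""
    q = quakes[quakes.mag >= mag_complete]
    if max_depth_km is not None:          # depth cut was applied in California only
        q = q[q.depth_km <= max_depth_km]
    q = q.assign(block=to_block(q.lat, q.lon, offset))
    w = wells.assign(block=to_block(wells.lat, wells.lon, offset))

    counts = q.groupby(["block", "year"]).size().rename("n_quakes")
    volume = w.groupby(["block", "year"])["volume_bbl"].sum().rename("annual_bbl")
    table = pd.concat([counts, volume], axis=1).fillna(0.0).sort_index()
    # Cumulative disposal volume within each block (the injection covariate).
    table["cum_bbl"] = table.groupby(level="block")["annual_bbl"].cumsum()
    return table

# e.g. table_ca = block_year_table(quakes_ca, wells_ca, mag_complete=3.0, max_depth_km=7.0)
```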
We used only earthquakes above the magnitude of completeness for the study area and time period, which was 3.0 for California and 2.9 for Oklahoma. In California, we only considered earthquakes shallower than 7 kilometers, under the idea that it's unlikely that injection is going to induce very deep seismicity, although I'll mention later that there may be a complication with that.

You might also point out that there could be grid effects, because we're just looking at seismicity and injection within blocks. What if there's an injection well at the edge of a block, and it induces seismicity in the adjacent block? Our analysis would miss that. To test for that effect, we actually did the study twice: we offset the grid diagonally by half a grid block, so that if an injection well was on the edge of a block in the first analysis, in the second analysis it would be right in the center of a block. We performed the analysis twice and compared the results. They were qualitatively pretty similar, but not exactly the same. Still, I think that is an area where we might have improved the study, doing a little more work to deal with the block issue. One reason the block issue wasn't that important is that we used pretty big blocks; if we had used smaller blocks and were looking at a smaller spatial region, it would have been more important to handle it better. For example, if injection occurs on the edge of a block, we could have counted some of the injection in one block and some in the other, or something like that. But we didn't do that.

Okay. For the data, we used the California earthquake data from the USGS website, the ANSS comprehensive catalog. The California well data is from the DOGGR website. The Oklahoma Geological Survey website gave us the Oklahoma earthquake data. And for the Oklahoma well data, we used the data set assembled by Matthew Weingarten, which is very valuable; it's very difficult to get that data off the Oklahoma website, and it wasn't very well organized, so I really appreciate him sharing it.

We did not de-cluster the seismic catalog. I'm not necessarily opposed to doing that, but the problem is that there are embedded assumptions when you de-cluster.
De-clustering is based on the idea that there's a main shock and aftershocks. So what if there's an earthquake swarm? Suddenly you're getting into subjective decisions about how to deal with things, and we felt more comfortable just using the data directly without changing it at all.

Okay. Here is an example of what the data might look like. Specifically, we looked at the number of earthquakes in each block each year, and the cumulative volume of wastewater disposal in each block each year. So here are four locations; I'll explain.

Up here, this is the northeast part of Oklahoma City. This is actually the region that the Keranen et al. paper in 2014 looked at; they concluded that there was really strong evidence of induced seismicity in this region. If you look at exactly that region, this plot brings that out: in this part of Oklahoma, there's been an increase in wastewater disposal in recent years, and that has tracked really well with the increase in earthquakes. So this is a block where our algorithm gave a very low p value; it looks like it's probably induced seismicity because there's this really strong temporal association.

This block is interesting: it's actually one of the lowest p values in California, and it's Hermosa Beach, California. I think this would be a place where somebody ought to go do a more careful, site-specific study. There are only four earthquakes in this data series, so it's not that much seismicity, and they're pretty small. But in this 34-year period, there were only four earthquakes above the magnitude of completeness and shallower than 7 kilometers in this region, and they coincided with the six-year period when there was wastewater disposal occurring in that block. The algorithm notes that and sees that it's an unlikely coincidence, and that comes out of the analysis. So these are examples of low-p-value blocks. Now, I don't think this is enough to conclude that it's induced seismicity, but it's probably worth taking a look at. Maybe it was.

These are examples of seismicity that was not induced. Here is a block near Gilroy, California. Here's the number of earthquakes per year.
It's pretty random. Here's the wastewater disposal; we can see there was a big increase in this year, and there's no apparent effect on the seismicity. So the analysis does not find significance in this block. And then this block I've chosen to show how difficult the data is: it includes the 1994 Northridge earthquake, and note that this is only aftershocks and events shallower than 7 kilometers. You can see there's one year with something like 100 events, and then everything else is just a few. So we have to make sure we use a statistical analysis that isn't dominated and confused by data that looks like that, and we did do that.

Okay. So I'm going to go through the details of the study. Here's the overall study design. First, we define a statistical model that can reasonably describe the seismicity observations and also relates injection to seismicity. It doesn't have to be a perfect model, but it has to be reasonable. Second, as a quality control on that statistical model, we used Gibbs sampling to perform a posterior predictive check to confirm that the model can reasonably represent the observations, both qualitatively and quantitatively; that's just a quality control. Then, as the actual study, in each block we find the maximum likelihood estimate for model parameters under two models: one that assumes injection might cause seismicity, so there's a coefficient beta relating the two, and the null hypothesis model, which assumes there is not any relationship. We can do maximum likelihood estimates for both of those models. Then we calculate the likelihood of the observations given those two estimates and look at the ratio of the likelihoods. This is effectively what's called a likelihood ratio test, and we can relate it to a test statistic, D. That test statistic, D, measures how much better the model fit becomes if we include the parameter for induced seismicity in the statistical model. So in these two cases, including a parameter that says seismicity increases with injection volume will cause the statistical model to match the data better; in these two blocks, it won't have any effect. The statistic D accounts for that.
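In symbols, D is defined a little later in the talk as twice the natural log of the ratio of the two maximized likelihoods. A trivial helper makes the convention explicit (written with log-likelihoods, which is how it would normally be computed):

```python
def likelihood_ratio_statistic(loglik_alt: float, loglik_null: float) -> float:
    """D = 2 * ln(L_alt / L_null) = 2 * (loglik_alt - loglik_null).

    loglik_alt  : maximized log-likelihood with the injection coefficient beta free
    loglik_null : maximized log-likelihood with beta fixed at zero
    Because the null model is nested in the alternative, D >= 0, and larger D
    means that allowing induced seismicity improves the fit more.
    """
    return 2.0 * (loglik_alt - loglik_null)
```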
Now, theoretically, in the asymptotic case of large data, D as we defined it would be distributed according to a chi-square distribution. So at first we were going to get p values by comparing to a chi-square distribution. However, when we started looking at it empirically, we realized we didn't have enough data, and that wasn't always the case; I'll explain. Instead, we used resampling to effectively bootstrap out confidence intervals for D in each block, and we used those to make the p value calculations. I'll explain how to do that.

Then, finally, we have a p value in each block. There are around 50 blocks in California and 50 to 60 blocks in Oklahoma that have both injection and seismicity that we can use in this analysis, and each of those is going to have a p value. On average, 5% of them will have a p value below 0.05, which means that, even by chance, two to three blocks in each state should be expected to have a p value below 0.05. So we can't just look at the low p values; you have to look at all of them. We want to combine them into one big statewide p value that gives the overall state result. So we do that, and we performed the analysis in each state, and we also offset the grid and repeated the analysis.

Okay. So here's the statistical equation that we used. The overall study design I'm describing doesn't have to use this equation; it could use any equation. We picked this one because it has a few properties that were desirable. Number one, we're counting the number of earthquakes per year, so we need a model that returns a non-negative integer value. I'll show in a moment that this model can represent the data qualitatively and quantitatively. It includes a term for temporal correlation. And specifically, it assumes a linear relationship between cumulative volume injected and induced seismicity. I'm not wedded to this equation; it's the one we chose to use, but you could use this overall workflow with any. So, to go through the equation real fast: y-ij is the number of earthquakes in block i in time step, or year, j, and we're going to draw that number from a Poisson distribution.
But this is not just a Poisson distribution, because the rate parameter inside the Poisson distribution is itself the sum of two random numbers; I guess you'd call it a compound distribution. This term is the exponential of a normal distribution with mean zero and a variance of sigma-i. That sigma-i expresses how variable the natural seismicity is from year to year; if natural seismicity is pretty consistent from year to year, you have a lower sigma-i. The mu-i parameter is related to the background rate of seismicity. Then x-ij is the amount of wastewater disposal volume in block i in year j, and we multiply it by a coefficient, beta-i. That coefficient beta expresses the relationship between injection and induced seismicity; what we're trying to see is whether including this beta in the model makes the model a lot more predictive. Then we have a temporal correlation term, which accounts for temporal correlation from one year to the next. For example, if there was a really big earthquake on December 31st, there would be a lot of aftershocks in the following year; that's what we're trying to account for. If there's longer-term temporal correlation over the course of many years, I'm not sure that is to be expected from natural seismicity, and we didn't include it in the model.

Okay. I'll skip some of the details of exactly how we parameterized that temporal correlation term, but we did a pretty involved calculation to choose values of a and this sigma value here that were the maximum likelihood estimates based on basically the entire state of California over the whole time period. And I do want to emphasize again that, even though we are looking specifically for correlations between wastewater disposal volume and seismicity, because wastewater disposal is correlated with other activities, like extraction, we're implicitly also looking at those.

Okay. So we used an optimization algorithm in each block to calculate the maximum likelihood estimate for this mu term, which is the background rate of seismicity, and this beta, which is the relationship between injection and seismicity. We also have to integrate over the sigma term.
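The slide equation itself is not reproduced in the transcript, so as a rough illustration only, here is a forward-simulation sketch of a model of the kind described above. The exact way the log-normal background term, the beta-times-injection term, and the year-to-year correlation combine is an assumption here, not the authors' published form.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_block(x, mu, beta, sigma, a, n_reps=1):
    """Simulate yearly earthquake counts for one block.

    x     : injection volumes per year (the x_ij covariate)
    mu    : background-rate parameter for the block
    beta  : coefficient linking injection to seismicity (beta = 0 is the null model)
    sigma : year-to-year variability of the natural rate
    a     : strength of the one-year temporal correlation (assumed AR(1)-like form)
    """
    n_years = len(x)
    sims = np.empty((n_reps, n_years), dtype=int)
    for r in range(n_reps):
        eps_prev = 0.0
        for j in range(n_years):
            eps = a * eps_prev + rng.normal(0.0, sigma)   # assumed correlation structure
            rate = np.exp(mu + eps) + beta * x[j]          # assumed combination of terms
            sims[r, j] = rng.poisson(rate)
            eps_prev = eps
    return sims

# e.g. a block with rising injection and a nonzero beta:
x = np.linspace(0.0, 5.0, 14)   # hypothetical injection volumes over 14 years
print(simulate_block(x, mu=-1.0, beta=0.5, sigma=0.5, a=0.3, n_reps=3))
```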
We didn't want to find a maximum likelihood estimate for sigma, so we integrated over a prior for it, just a very wide uniform distribution from zero to 10. That encompasses all the possible values, so it's an uninformative prior that doesn't affect the result. We integrate over that, this is the likelihood, and then we find the maximum likelihood: what's the best model fit? Within that, there are two numerical integrations that have to take place. First, we numerically integrate over the prior for sigma. And second, even if we know sigma, what's the likelihood of the data as a function of all the model parameters? That requires a numerical integration over this equation. So it's actually reasonably computationally intensive to do one block: you're doing two nested numerical integrations, and it takes 30 minutes. It takes a fair amount of CPU effort.

For the posterior predictive check, I won't go into the details, but the idea is that we used a Gibbs sampler to generate the posterior distribution of the model parameters, which are the sigma, the mu, and the beta. Then we draw from that posterior distribution, put the draws into the model, use them to simulate data, and compare the simulations to the real data to make sure the simulated data looks like the real data.

First we looked at that in the aggregate. We drew hundreds of different samples from the posterior distribution, did about 10 forward simulations for each, and aggregated it all together. This is across the entire state of California, and we look at the frequency of observations; this is a log scale. The fraction of block-years with zero earthquakes is around 90%: about 90% of the blocks, in any given year, didn't have an earthquake. About 8% of the time there was one, and so on. And what we see is that the red is the simulation, the blue is the real data, and the red is matching the data very well. So at the big overall scale, the model is capable of representing the frequency distribution of the observations.

Then, looking at it qualitatively, taking that block from Oklahoma: this is a log scale, so it looks different, but it's the same Oklahoma block we looked at before.
There was an increase in injection in later years and an increase in seismicity, and these are just four forward simulations of the seismicity. I'd say the relationship is not quite as strong in the actual data; we do see some seismicity occurring before this large increase in injection. But nevertheless, the simulations look qualitatively like the data. So that just tells us that we can trust the model to at least be reasonable; it doesn't have to be perfect.

All right, so the likelihood ratio test. As I said, we calculate the maximum likelihood estimate for mu, assuming that beta is zero. Then we do it again, jointly for mu and beta, not assuming that beta is zero. We compare the likelihoods of the two models, and the test statistic, D, is two times the natural log of the ratio of the likelihoods. Theoretically, for a large amount of data, D would asymptotically be distributed according to a chi-square distribution, so you could take that D and directly infer a p value. However, when we looked at it empirically, we realized we didn't have enough data, and it's not chi-square. So instead we computed resampling-based confidence intervals for D.

Here's what we did. We took all the blocks in a state. We left the seismicity data unchanged, but then we went to some other randomly chosen block in the state and pulled in that block's injection data; we just swapped the injection data between blocks. And then we also offset that injection data by a randomly chosen number of years. The point is that we're now applying a random injection history to the block. That injection is qualitatively like real injection, because we've actually just taken it from a different block, but there's no relationship between the injection and the seismicity anymore. So we've generated data for which the null hypothesis is true: there can't possibly be any association between the injection and the seismicity, because we pulled in random injection from some other area and offset it temporally. So we do that, and then we re-run our analysis: we recalculate the likelihood including the possibility of induced seismicity, we recalculate the likelihood excluding the possibility of induced seismicity, and we look at that likelihood ratio statistic, D.
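A minimal sketch of one such null resample is below. It assumes a per-block table like the one from the earlier gridding sketch and a hypothetical fit_block(y, x) function that returns the two maximized log-likelihoods; both are illustrative stand-ins rather than the authors' code, and the circular year shift is just one way to implement the random temporal offset.

```python
import numpy as np

rng = np.random.default_rng(1)

def null_resample_D(block_id, blocks, fit_block):
    """One resample under the null: real seismicity, injection borrowed at random.

    blocks    : dict mapping block id -> {"quakes": yearly counts, "injection": yearly volumes}
    fit_block : function (y, x) -> (loglik_alt, loglik_null), the MLE fits described above
    """
    y = np.asarray(blocks[block_id]["quakes"])
    # Pull the injection history from a different, randomly chosen block...
    others = [b for b in blocks if b != block_id]
    x = np.asarray(blocks[others[rng.integers(len(others))]]["injection"], dtype=float)
    # ...and shift it by a random number of years, destroying any real association.
    x = np.roll(x, rng.integers(1, len(x)))
    loglik_alt, loglik_null = fit_block(y, x)
    return 2.0 * (loglik_alt - loglik_null)
```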
And then we do that 100 times in each block, so that within each block we have an empirical distribution of what D would be by chance.

Here's an example of how that works, for two different blocks; these are the CDFs of the D values from the resampled data. Looking at the one on the right first: in this particular block, about 20% of the resamples had a D value above 2. That's drawn from data we've designed to have no association between injection and seismicity, so it lets us assess the probability of a given D. If we look at the real data and get a D value of 2, we can say there was a 20% chance that would have happened by coincidence. In this particular block, the actual D value is comparable to the largest D value we saw in our entire randomized data set; the actual D value is around 8.7, so the p value is less than 0.02. So this is a block where we have high confidence that there was a strong association.

Now, this block is pretty different. Here the actual D value is only about 0.15, a lot lower, and in about 25% of the randomized data the D value was that large or larger. So this is how we get the p values, and we have to do it with resampling because the analytical method depends on having a large data set, which we don't have.

So here is a summary of the results. This is a CDF of the block p values in Oklahoma and California, with the original grid and the offset grid, compared to the uniform distribution; this orange line here is the uniform distribution. What that means is that if we genuinely had no association at all, no relationship between the injection and the seismicity, then we would expect the CDF of p values to fall approximately along the uniform distribution. If we look at the Oklahoma analyses, they fall far below the uniform distribution. What this says, for example, is that in both of the Oklahoma analyses, 20% of the blocks had a p value below 0.05, whereas you would expect 5% of the blocks to have a p value below 0.05. So there are far more blocks with low p values than you'd expect by chance.
00:34:34.659 --> 00:34:39.579 Looking at it some more, maybe taking a 0.2 p value, 00:34:39.579 --> 00:34:41.950 half of the blocks had a p value below 0.2. 00:34:41.950 --> 00:34:46.059 You’d expect only 20% to have a p value below 0.2. 00:34:46.059 --> 00:34:49.600 So the distribution of the p values in Oklahoma is very, very strongly 00:34:49.600 --> 00:34:52.869 suggesting – and I’ll show you how to quantify that in a moment – that, 00:34:52.869 --> 00:34:56.950 across the state, there is statistically significant induced seismicity. 00:34:56.950 --> 00:35:01.440 Now, looking at Oklahoma – and also, up here at the top, it kind of flattens out. 00:35:01.440 --> 00:35:04.269 If anyone wants to ask a question about it, I can explain why that occurred, 00:35:04.269 --> 00:35:06.490 but it’s not particularly important. 00:35:06.490 --> 00:35:10.099 It doesn’t have a big impact on the results. 00:35:10.099 --> 00:35:12.319 Looking at Oklahoma – or, sorry, looking at California, 00:35:12.319 --> 00:35:14.640 on the original grid, this is the line for California. 00:35:14.640 --> 00:35:18.019 So it absolutely is falling below the uniform distribution. 00:35:18.019 --> 00:35:20.150 With the shifted grid, it doesn’t – it also falls below 00:35:20.150 --> 00:35:22.619 the uniform distribution, but not as much. 00:35:22.619 --> 00:35:25.579 They’re pretty similar down here, actually. 00:35:25.579 --> 00:35:28.900 Okay, so these results are going to change because we’re actually 00:35:28.900 --> 00:35:33.099 still doing more calculations to improve our empirical p values. 00:35:33.099 --> 00:35:39.399 So this is not the final figure, but this is, I think, in the ballpark of the final figure. 00:35:39.410 --> 00:35:42.310 Okay. And so I’ll show in a minute how to take this CDF and 00:35:42.310 --> 00:35:45.670 actually turn that into one number – a statewide p value. 00:35:46.760 --> 00:35:49.779 Okay. So each state has between 50 and 60 blocks with both injection 00:35:49.779 --> 00:35:53.380 and seismicity that we’re able to run this analysis on. 00:35:53.380 --> 00:35:56.680 So as I said, on average, 5% should have a p value below 0.05. 00:35:56.680 --> 00:35:59.119 That’s about two to three blocks in each – in each state. 00:35:59.119 --> 00:36:02.550 So we have to look at that overall distribution of p values. 00:36:02.550 --> 00:36:05.070 So if the null hypothesis is true, the p values should be 00:36:05.070 --> 00:36:07.900 uniformly distributed between zero and 1. 00:36:07.900 --> 00:36:11.650 So it turns out – this is a statistics trivia fact – that the negative 00:36:11.650 --> 00:36:14.849 of the natural log of the product of N numbers that are uniformly 00:36:14.849 --> 00:36:18.800 distributed between zero and 1, which is true of the p values 00:36:18.800 --> 00:36:21.670 if the null hypothesis is true, is distributed according to 00:36:21.670 --> 00:36:25.580 the gamma distribution with a shape parameter of N and a scale parameter of 1. 00:36:25.580 --> 00:36:28.410 So we can use that property to calculate the overall p values. 00:36:28.410 --> 00:36:31.480 We can take the product of those p values, we can take the 00:36:31.480 --> 00:36:33.620 negative natural log, and then we can compare to 00:36:33.620 --> 00:36:36.730 the gamma distribution and infer a statewide p value. 00:36:36.730 --> 00:36:40.369 Okay.
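As a sketch of that combination step, under the stated assumption that the per-block p values are independent and uniform on (0, 1) when the null hypothesis holds, the negative natural log of their product can be compared against a gamma distribution with shape N and scale 1 via its survival function. The function name and the example p values are illustrative only.

    import numpy as np
    from scipy.stats import gamma

    def statewide_p_value(block_p_values):
        p = np.asarray(block_p_values, dtype=float)
        stat = -np.sum(np.log(p))            # negative natural log of the product of the p values
        # P(Gamma(shape=N, scale=1) >= stat); larger stat means smaller block p values.
        return gamma.sf(stat, len(p))

    # Example with made-up p values for a handful of blocks:
    print(statewide_p_value([0.01, 0.2, 0.75, 0.04, 0.5, 0.33]))

This is equivalent to Fisher's method for combining p values, since a chi-square with 2N degrees of freedom is just a gamma distribution with shape N and scale 2.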
So looking at Oklahoma first, the overall p value with the 00:36:40.369 --> 00:36:43.400 original grid was around 10 to the minus 6, saying that 00:36:43.400 --> 00:36:45.940 if the null hypothesis is true, that there’s no induced seismicity, 00:36:45.940 --> 00:36:49.109 then it’s about a 1 in a million chance 00:36:49.109 --> 00:36:51.550 that what we observed would have happened by coincidence. 00:36:51.550 --> 00:36:54.730 So that’s virtually certain statistical significance. 00:36:54.730 --> 00:36:56.771 The lowest p value, or the highest value of D, 00:36:56.771 --> 00:36:59.480 the highest likelihood ratio, is that block northeast of 00:36:59.480 --> 00:37:00.940 Oklahoma City, which is the same one 00:37:00.940 --> 00:37:04.640 that Keranen already identified in a publication. 00:37:04.640 --> 00:37:07.119 Looking at a map – so up here, this is a map of 00:37:07.119 --> 00:37:08.650 the cumulative locations of earthquakes. 00:37:08.650 --> 00:37:12.280 You can see the red here is clustered around Oklahoma City. 00:37:12.280 --> 00:37:14.290 So that’s where the most seismicity is occurring. 00:37:14.290 --> 00:37:19.410 And remember that this data is only up through, I think, 2013 – I think. 00:37:19.410 --> 00:37:22.170 So I don’t have the last few years of data in this. 00:37:22.170 --> 00:37:25.089 Here’s cumulative injection volume. 00:37:25.089 --> 00:37:27.260 So this kind of crescent is where the highest injection volume is, 00:37:27.260 --> 00:37:29.619 but it’s really distributed across the whole state. 00:37:29.619 --> 00:37:31.750 We look at the p values. We’re getting low p values 00:37:31.750 --> 00:37:36.609 east of Oklahoma City and up here near the border with Kansas. 00:37:36.609 --> 00:37:39.170 Again, this is also an area that people already identified 00:37:39.170 --> 00:37:42.230 as having probably been induced seismicity. 00:37:42.230 --> 00:37:44.019 And then actually, down here in the southern part of the state, 00:37:44.019 --> 00:37:45.349 we’re mostly not getting low p values. 00:37:45.349 --> 00:37:47.210 So maybe a few blocks over here have 00:37:47.210 --> 00:37:50.309 low p values – significance. 00:37:50.309 --> 00:37:53.750 So it turns out that, down here where there is a little bit of seismicity 00:37:53.750 --> 00:37:56.549 south of Oklahoma City, the algorithm is not really 00:37:56.549 --> 00:37:59.730 seeing evidence of induced seismicity as much. 00:37:59.730 --> 00:38:02.670 And when we used the shifted grid, actually the results are extremely similar. 00:38:02.670 --> 00:38:05.870 The p value was also around 10 to the minus 6, 00:38:05.870 --> 00:38:08.960 and the locations of the low-p-value blocks are the same. 00:38:08.960 --> 00:38:14.250 So in this case, the shifted grid really had a very minimal impact. 00:38:14.250 --> 00:38:17.579 Looking here at Okla- – or, sorry, California, so here is injection volume. 00:38:17.579 --> 00:38:21.940 So we see a lot of injection in the Central Valley and also along the coast. 00:38:21.940 --> 00:38:24.789 Here are the cumulative earthquakes, and the areas with the most red 00:38:24.789 --> 00:38:26.610 are actually areas where there’s been some of the 00:38:26.610 --> 00:38:29.970 biggest earthquakes in California in the past 34 years. 00:38:29.970 --> 00:38:32.640 And again, remember this is only 7 kilometers or shallower. 00:38:32.640 --> 00:38:35.700 If we look at the p values, this is where it gets interesting.
00:38:35.700 --> 00:38:37.350 One area is clustered around Coalinga. 00:38:37.350 --> 00:38:41.400 There’s definitely kind of a spatial cluster around Coalinga. 00:38:41.400 --> 00:38:44.839 And the other really low one is that one I’ve already shown near Hermosa 00:38:44.839 --> 00:38:49.089 Beach, which is only four events, all around magnitude 4. 00:38:49.089 --> 00:38:53.369 But there’s that really nice temporal correlation with the injection. 00:38:53.369 --> 00:38:56.519 So Coalinga is something that’s already been published about – 00:38:56.519 --> 00:39:00.829 Segall and Art McGarr have both written papers about the idea 00:39:00.829 --> 00:39:05.200 that there might have been induced seismicity near Coalinga in the 1980s. 00:39:05.200 --> 00:39:07.700 The reason our algorithm is picking that up is there was a lot more 00:39:07.700 --> 00:39:10.030 wastewater disposal in the vicinity of Coalinga 00:39:10.030 --> 00:39:12.900 in the ’80s and early ’90s than there was after that. 00:39:12.900 --> 00:39:14.760 And that’s also when there was most of the seismicity. 00:39:14.760 --> 00:39:21.029 So the algorithm is just picking up on the fact that that’s what the data shows. 00:39:21.029 --> 00:39:25.230 So I think what the algorithm is showing is that we 00:39:25.230 --> 00:39:28.349 could look more carefully at those sites, and it’s definitely picking up on that. 00:39:28.349 --> 00:39:31.069 We already knew about Coalinga as a possibility. 00:39:31.069 --> 00:39:32.750 We didn’t know about the Hermosa Beach thing. 00:39:32.750 --> 00:39:34.770 It’s worth taking a look at. 00:39:34.770 --> 00:39:37.290 Okay. And the shifted grid – oh, so the overall p value 00:39:37.290 --> 00:39:40.260 for the state of California – and this is, I think, subject to change because 00:39:40.260 --> 00:39:43.540 I’m still running more simulations so that I have more simulation results 00:39:43.540 --> 00:39:48.810 to make those empirical distributions of D – but the current result is 00:39:48.810 --> 00:39:53.230 around 0.1, saying that there’s about a 10% chance that it’s by chance. 00:39:53.230 --> 00:39:58.099 So 0.1 is kind of an unsatisfying result because it’s not really 00:39:58.099 --> 00:40:00.859 low enough that you’re really confident, but it’s maybe suggestive. 00:40:00.859 --> 00:40:03.150 So I think the conclusion that I would draw from that is 00:40:03.150 --> 00:40:09.030 it does look like there’s probably been some seismicity in California that’s 00:40:09.030 --> 00:40:12.420 been induced by injection or associated activity such as depletion. 00:40:12.420 --> 00:40:15.559 Because actually, what people have speculated about Coalinga 00:40:15.559 --> 00:40:17.730 is that it was depletion that could have induced seismicity. 00:40:17.730 --> 00:40:20.039 The reason that would have a correlation with injection 00:40:20.039 --> 00:40:23.320 is because depletion is associated with injection. 00:40:23.320 --> 00:40:26.079 So it’s not very strong, but it does suggest it might have happened. 00:40:26.079 --> 00:40:28.519 If it is there, it’s definitely not jumping out of the data. 00:40:28.519 --> 00:40:31.400 There’s not a ton of stuff coming out of the data. 00:40:31.400 --> 00:40:33.369 So if there is, it’s not a lot. 00:40:34.280 --> 00:40:37.319 In the shifted grid, the p value was actually higher at 0.3, 00:40:37.319 --> 00:40:39.510 which is really not significant.
00:40:39.510 --> 00:40:41.440 So that also kind of undermines it a little bit. 00:40:41.440 --> 00:40:44.869 But again, those exact numbers, I think, might change a little bit. 00:40:44.869 --> 00:40:48.130 So my conclusion from this is that, while there does seem to be 00:40:48.130 --> 00:40:50.789 a little bit more association than you’d expect by chance, 00:40:50.789 --> 00:40:55.109 implying there probably is some in the state, it’s not a big thing. 00:40:55.109 --> 00:40:57.729 That’s the overall result. 00:40:58.680 --> 00:41:00.619 So limitations – I only considered 00:41:00.619 --> 00:41:04.309 temporal correlation on a one-year period. 00:41:04.309 --> 00:41:07.800 So if there’s temporal correlations in seismicity over multiple-year periods, 00:41:07.800 --> 00:41:10.210 that’s not in the statistical model. 00:41:10.210 --> 00:41:12.359 And I’ve mentioned also, there could have been, perhaps, 00:41:12.359 --> 00:41:14.460 ways to deal with this grid block issue better 00:41:14.460 --> 00:41:16.599 since we really only looked at things within grid blocks 00:41:16.599 --> 00:41:21.089 and didn’t look at earthquakes that could have occurred in adjacent grid blocks. 00:41:21.089 --> 00:41:22.690 The opportunities here – first, this algorithm 00:41:22.690 --> 00:41:25.609 could be applied readily anywhere where this data is available. 00:41:25.609 --> 00:41:27.600 It’ll pick out immediately the places where it looks like 00:41:27.600 --> 00:41:28.750 there might be induced seismicity. 00:41:28.750 --> 00:41:30.349 Maybe we know about them. Maybe we don’t. 00:41:30.349 --> 00:41:33.089 They can just sift through the data really quickly. 00:41:33.089 --> 00:41:37.759 I think another really neat potential application is to look for risk factors, 00:41:37.759 --> 00:41:40.350 like maybe injection rate or depth matter. 00:41:40.350 --> 00:41:41.960 That was sort of the idea when I started this project, 00:41:41.960 --> 00:41:44.230 but I’ve kind of run out of time. 00:41:44.230 --> 00:41:45.960 But what you could do is you could take the statistical model, 00:41:45.960 --> 00:41:48.910 you could put in another term that’s parameterized by a new variable 00:41:48.910 --> 00:41:52.470 that is related to something like depth, and then you can do this likelihood 00:41:52.470 --> 00:41:56.200 ratio test to see if including that extra term in the – in the model 00:41:56.200 --> 00:41:59.480 improves the fit sufficiently that it’s statistically significant. 00:41:59.480 --> 00:42:01.970 You could apply the algorithm to other types of data, 00:42:01.970 --> 00:42:05.980 such as depletion or injection for secondary recovery. 00:42:05.980 --> 00:42:08.652 If you don’t like the statistical equation that I used to describe the number 00:42:08.652 --> 00:42:11.109 of earthquakes per year, you could just take it out and put in whatever 00:42:11.109 --> 00:42:15.520 equation you want to put in, and you could follow the same workflow. 00:42:16.760 --> 00:42:19.539 Yeah. If you wanted to apply the algorithm to a more localized data set – 00:42:19.539 --> 00:42:23.190 so the grid block thing was not as big of an issue because 00:42:23.190 --> 00:42:27.519 we used really big grid blocks – 10 by – you know, 11 by 14 miles. 00:42:27.519 --> 00:42:29.830 If you wanted to use it on a smaller data set, you’d have to 00:42:29.830 --> 00:42:33.560 do more work to think about relationships between blocks. 00:42:34.940 --> 00:42:38.859 Okay. 
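As a rough sketch of that risk-factor idea, keeping the same assumed Poisson rate model as in the earlier sketches, one could add an extra term tied to a candidate variable such as a normalized depth factor and compute D for the model with and without that term. The functional form, names, and toy numbers here are hypothetical, and D would still need to be compared against a resampled null rather than a chi-square table, for the same small-data reason discussed above.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import gammaln

    def neg_log_lik_ext(params, counts, injection, risk_factor):
        # Illustrative extended rate model: lambda_t = mu + beta*v_t + gamma*risk_t*v_t.
        mu, beta, gam = params
        lam = mu + beta * injection + gam * risk_factor * injection
        if np.any(lam <= 0):
            return np.inf  # guard; the bounds below keep lambda positive anyway
        return -np.sum(counts * np.log(lam) - lam - gammaln(counts + 1))

    def lr_statistic_extra_term(counts, injection, risk_factor):
        """D comparing 'with the extra term' vs. 'without it'; gamma is kept nonnegative,
        mirroring the constraint on beta."""
        x0 = [max(counts.mean(), 1e-3), 0.0]
        base = minimize(lambda p: neg_log_lik_ext([p[0], p[1], 0.0],
                                                  counts, injection, risk_factor),
                        x0=x0, bounds=[(1e-6, None), (0.0, None)], method="L-BFGS-B")
        full = minimize(neg_log_lik_ext, x0=x0 + [0.0],
                        args=(counts, injection, risk_factor),
                        bounds=[(1e-6, None), (0.0, None), (0.0, None)], method="L-BFGS-B")
        return max(0.0, 2.0 * (base.fun - full.fun))

    # Toy usage: made-up yearly counts, injection volumes, and a normalized depth factor.
    counts = np.array([0, 1, 0, 2, 1, 4, 6, 9.0])
    injection = np.array([0.1, 0.1, 0.2, 0.3, 0.5, 1.2, 1.8, 2.5])
    depth_factor = np.array([0.2, 0.2, 0.3, 0.4, 0.6, 0.9, 1.0, 1.0])
    print(lr_statistic_extra_term(counts, injection, depth_factor))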
So in conclusion, extremely high statistical significance 00:42:38.859 --> 00:42:40.640 of induced seismicity in Oklahoma. 00:42:40.640 --> 00:42:44.549 In California, it does seem probable that there has been some seismicity 00:42:44.549 --> 00:42:47.349 induced by wastewater disposal or other associated activities 00:42:47.349 --> 00:42:49.540 somewhere in the state. 00:42:49.540 --> 00:42:52.619 Evidence is not strong and not strong enough to be completely conclusive. 00:42:52.619 --> 00:42:56.150 We have identified a few places of interest that would be worth more 00:42:56.150 --> 00:43:00.270 careful study, one of which was Coalinga, which we already knew about. 00:43:00.270 --> 00:43:04.700 The overall study design overcomes a lot of challenging difficulties, 00:43:04.700 --> 00:43:08.500 including the nonrandom location of earthquakes in time and space. 00:43:08.500 --> 00:43:10.940 The difficulty in getting confidence intervals. 00:43:10.940 --> 00:43:14.710 And so using a longitudinal study design, using a likelihood ratio test, 00:43:14.710 --> 00:43:16.510 bootstrapping confidence intervals for that, have been the way 00:43:16.510 --> 00:43:19.240 that we’ve dealt with that. 00:43:19.240 --> 00:43:22.720 And most of this research has not been funded, but the Southern California 00:43:22.720 --> 00:43:25.900 Earthquake Center did provide a small seed grant that helped out a little bit. 00:43:25.900 --> 00:43:28.539 And also, I’ve gotten some really helpful comments from Jef Caers 00:43:28.539 --> 00:43:31.880 and Norm Sleep when I previously presented this earlier this year. 00:43:31.880 --> 00:43:34.200 So thanks very much, guys. 00:43:34.200 --> 00:43:39.400 [ Applause ] 00:43:41.680 --> 00:43:44.880 - All right. Do we have any questions? 00:43:47.120 --> 00:43:49.720 Make sure this is on. [laughs] 00:43:51.440 --> 00:43:54.000 - Hey there, Mark. Thanks for the presentation. 00:43:54.002 --> 00:43:58.049 I really appreciate the rigor you went through to be very – 00:43:58.049 --> 00:44:00.800 you know, to get at this question. 00:44:00.800 --> 00:44:04.030 Can I – I want to play devil’s advocate. - Mm-hmm. 00:44:04.030 --> 00:44:09.000 - Can you effectively get any answer you want by changing the size of the grids? 00:44:10.100 --> 00:44:13.920 - Well, I think that the result should be insensitive to the size of the grid. 00:44:13.930 --> 00:44:16.619 So I do think that if we shrunk the grid, 00:44:16.619 --> 00:44:19.380 at some point, that would start to affect the result. 00:44:19.380 --> 00:44:22.950 So the grid needs to be big enough that it doesn’t have a big impact. 00:44:22.950 --> 00:44:26.140 And that’s why we shifted the grid. So by shifting the grid, 00:44:26.140 --> 00:44:29.519 and we got kind of the same answer, that gave us a lot more confidence. 00:44:29.519 --> 00:44:31.200 Not only was the overall p value the same, 00:44:31.200 --> 00:44:33.799 but the locations on the map were the same. 00:44:33.799 --> 00:44:37.769 Gave us a lot of confidence that the grid wasn’t having a big impact. 00:44:37.769 --> 00:44:41.619 But I do think if you had a – and that’s just because it’s 11 miles by 14 miles. 00:44:41.619 --> 00:44:44.960 So it’s less likely that injection is going to cause an earthquake 00:44:44.960 --> 00:44:47.350 in an adjacent block, although it definitely could. 
00:44:47.350 --> 00:44:50.309 If we used really small blocks, and that would become likely, 00:44:50.309 --> 00:44:53.420 you’d – I think – what I think the best solution would be would just be 00:44:53.420 --> 00:44:55.759 to distribute the injection across, not just one block, 00:44:55.759 --> 00:44:59.580 but kind of surrounding blocks kind of proportionally, something like that. 00:44:59.580 --> 00:45:03.600 But, no, I don’t think it’s ultrasensitive to the grid, and we checked for that. 00:45:03.600 --> 00:45:05.780 - Okay. - Thanks. 00:45:07.240 --> 00:45:13.860 [ Silence ] 00:45:14.700 --> 00:45:18.960 - I had two questions. First was about sort of the statistical 00:45:18.970 --> 00:45:20.900 problem that you’re showing with the cumulative density functions. 00:45:20.900 --> 00:45:23.540 I was just curious what was happening there. 00:45:23.540 --> 00:45:27.520 But the more relevant question is, is – I’m a little 00:45:27.520 --> 00:45:32.270 worried about the 7-kilometer depth limit that you’re using. 00:45:32.270 --> 00:45:34.720 Certainly we’ve seen induced earthquakes in Oklahoma 00:45:34.720 --> 00:45:37.680 at 10 or 12 kilometers, and, you know, natural seismicity 00:45:37.680 --> 00:45:39.690 here in California occurs much deeper. 00:45:39.690 --> 00:45:42.529 And so it seems – you know, obviously you’re going to 00:45:42.529 --> 00:45:46.650 have to make choices, but – and this is computationally intensive, 00:45:46.650 --> 00:45:48.849 but I would wonder what would happen if you put it at 00:45:48.849 --> 00:45:51.389 7 kilometers, 10 kilometers, 15 kilometers. 00:45:51.389 --> 00:45:53.080 Would that change the result? 00:45:53.090 --> 00:45:56.819 Because I think depth is an important issue here. 00:45:57.540 --> 00:45:59.020 - Yeah. I don’t know. Because we didn’t test it. 00:45:59.029 --> 00:46:01.549 - [laughs] - And that’s my point is I feel like 00:46:01.549 --> 00:46:04.319 I could just go on and on with this and do so much and not – 00:46:04.319 --> 00:46:07.240 I’m actually done working on this, so I’m presenting the method, 00:46:07.240 --> 00:46:11.300 and anybody can follow it and test anything you want. [laughs] 00:46:11.900 --> 00:46:13.920 - Could you explain what was happening with the cumulative 00:46:13.930 --> 00:46:15.630 density function also? - Yeah. 00:46:15.630 --> 00:46:18.400 So it’s – I mean, this one or these, or which one? 00:46:18.400 --> 00:46:19.789 - Yeah. The next one. - This one? 00:46:19.789 --> 00:46:21.430 - Yeah. - Oh, you mean this thing up here? 00:46:21.430 --> 00:46:23.599 - Just the spike and the flattening out. 00:46:23.599 --> 00:46:27.200 - Yeah, the flattening out, sure. So what happens is, the really – 00:46:27.200 --> 00:46:29.780 okay, so we’re putting a constraint on beta. 00:46:29.780 --> 00:46:32.130 The beta is what relates injection to seismicity. 00:46:32.130 --> 00:46:33.619 We’re saying it can’t be negative. 00:46:33.619 --> 00:46:36.349 So when we did the optimization, we said – you’re not allowed 00:46:36.349 --> 00:46:38.019 to say that beta is negative. Because if beta was negative, 00:46:38.019 --> 00:46:40.329 it would mean that injection prevents earthquakes. 00:46:40.329 --> 00:46:44.680 So the problem there is, if you have data for which the null hypothesis is true, 00:46:44.680 --> 00:46:47.780 and there’s just – it’s random, then half of the betas that would be 00:46:47.780 --> 00:46:51.200 inferred from an unconstrained optimization would be negative. 
00:46:51.200 --> 00:46:56.050 And so this flattening out represents the area where it’s trying to 00:46:56.050 --> 00:46:59.230 get a negative beta, and it’s not allowed to get a negative beta. 00:46:59.230 --> 00:47:01.070 And so actually what happens is, 00:47:01.070 --> 00:47:02.819 we don’t really have an ability to get a p value. 00:47:02.819 --> 00:47:06.160 It’s some p value between 0.5 and 1, roughly. 00:47:06.160 --> 00:47:09.580 But we don’t exactly know what it is. And there’s – it’s not – there’s not 00:47:09.580 --> 00:47:12.940 anything coming out of the model that would allow us to go between 0.5 and 1. 00:47:12.950 --> 00:47:16.819 So in the absence of any other way to make a kind of principled decision, 00:47:16.819 --> 00:47:21.939 we just took the average, and we used 0.75. [laughs] Yeah. 00:47:28.880 --> 00:47:31.700 - That looks very encouraging. 00:47:31.700 --> 00:47:37.960 I was wondering how locally you can apply this analysis. 00:47:37.960 --> 00:47:42.990 I mean, often, after an important earthquake sequence that people think 00:47:42.990 --> 00:47:45.999 might be related to, say, wastewater disposal, 00:47:45.999 --> 00:47:50.190 there’s a lot of debate as to whether the earthquakes were natural or not. 00:47:50.190 --> 00:47:55.390 And the big example of that is the Prague, Oklahoma, earthquakes. 00:47:55.390 --> 00:47:58.720 I was wondering if your analysis could bear on that. 00:47:58.720 --> 00:48:01.740 I know you were talking about the Azle earthquakes earlier. 00:48:01.740 --> 00:48:04.039 - Yeah. - And I’m not sure what your 00:48:04.039 --> 00:48:09.419 conclusion was. I’m not sure I believe the railroad commission, either. [laughs] 00:48:09.420 --> 00:48:11.359 But in any case, in Prague, yeah. - I don’t – I don’t think so, either. 00:48:11.359 --> 00:48:13.029 But I was trying to be agnostic about it. [laughs] 00:48:13.029 --> 00:48:14.820 - Right, yeah. - Or, rather I was – rather, mostly 00:48:14.820 --> 00:48:17.039 persuaded by their paper, but anyway, I’m not – 00:48:17.039 --> 00:48:19.119 I haven’t looked at it in detail, so I don’t know. 00:48:19.120 --> 00:48:21.260 - Okay. - Yeah. 00:48:21.260 --> 00:48:25.150 Yeah. I think you could absolutely apply this to more localized data sets. 00:48:25.150 --> 00:48:28.800 I think you’d need to modify it. If you have smaller blocks, 00:48:28.800 --> 00:48:30.690 it would be critical to account for the fact that injection 00:48:30.690 --> 00:48:32.559 could cause seismicity somewhere else. 00:48:32.559 --> 00:48:34.749 It would also be critical to account for the location uncertainty 00:48:34.749 --> 00:48:36.430 of the – of the earthquakes. 00:48:36.430 --> 00:48:39.290 And so you’d have to do something to deal with that. 00:48:39.290 --> 00:48:42.390 The simplest approach would be to kind of proportionally distribute it out. 00:48:42.390 --> 00:48:46.230 So if you have an earthquake, and the hypocenter is here, 00:48:46.230 --> 00:48:48.329 but we know that there’s this kind of radius of uncertainty, 00:48:48.329 --> 00:48:50.619 then maybe we could kind of proportionally distribute out 00:48:50.619 --> 00:48:53.420 the earthquake in the blocks around it. 00:48:53.420 --> 00:48:56.430 You know, so put kind of 20% of the earthquake in this block 00:48:56.430 --> 00:48:58.299 and 20% in this block, and so on. 00:48:58.299 --> 00:49:00.040 Or maybe you could use some other approach. 00:49:00.040 --> 00:49:05.059 And Rajesh had some ideas that were more kind of rigorously statistical.
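One minimal way to realize the idea of proportionally distributing an earthquake across the blocks around it, as described in this answer, is to weight nearby block centers with a Gaussian kernel whose width is set by the location uncertainty. The Gaussian choice, the function name, and the numbers below are illustrative assumptions, not something prescribed in the talk.

    import numpy as np

    def smear_event(epicenter, sigma_km, block_centers):
        """Return fractional event counts per block, summing to 1.
        Weights fall off with distance from the reported epicenter according to a
        2-D Gaussian with standard deviation sigma_km."""
        centers = np.asarray(block_centers, dtype=float)            # shape (n_blocks, 2), km coordinates
        d2 = np.sum((centers - np.asarray(epicenter, dtype=float)) ** 2, axis=1)
        weights = np.exp(-0.5 * d2 / sigma_km ** 2)
        return weights / weights.sum()

    # Example: one event with 5 km location uncertainty near the corner of four ~18 km blocks.
    centers = [(0, 0), (18, 0), (0, 18), (18, 18)]
    print(smear_event(epicenter=(8, 8), sigma_km=5.0, block_centers=centers))

The fractional counts could then feed the same yearly-count model, in place of whole events assigned to a single block.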
00:49:05.059 --> 00:49:07.120 So I think it would be possible. I think you’d have to 00:49:07.120 --> 00:49:09.620 do a little bit of work. - Okay, thank you. 00:49:09.620 --> 00:49:10.640 - Thanks. 00:49:16.119 --> 00:49:18.960 - I want to go back to one of the questions you asked at the beginning. 00:49:18.960 --> 00:49:22.220 Why is this so complicated? [laughter] 00:49:22.220 --> 00:49:25.240 - Sure, yeah. Let’s see. 00:49:28.529 --> 00:49:30.310 I think one reason it’s complicated is because 00:49:30.310 --> 00:49:32.040 there’s a lot of injection wells and a lot of earthquakes. 00:49:32.040 --> 00:49:35.539 So you have to be really careful about getting statistical confidence 00:49:35.539 --> 00:49:37.990 in making sure that, when you’re calculating a p value, 00:49:37.990 --> 00:49:40.690 that means you’re calculating to some – against some – 00:49:40.690 --> 00:49:43.390 you’re comparing against some null hypothesis. 00:49:43.390 --> 00:49:45.609 This is – this is what the data should look like 00:49:45.609 --> 00:49:47.519 if there’s no induced seismicity. 00:49:47.519 --> 00:49:51.609 You have to be really careful about defining the null hypothesis 00:49:51.609 --> 00:49:54.329 to make sure that you’re getting a reasonable p value, number one. 00:49:54.329 --> 00:49:57.789 Number two, it’s very challenging because the events – 00:49:57.789 --> 00:50:01.279 and injection is not located randomly in space. 00:50:01.279 --> 00:50:06.440 So you can’t just tie direct connections between injection and seismicity 00:50:06.440 --> 00:50:08.690 because there can be these underlying causal processes 00:50:08.690 --> 00:50:10.859 that are causing things to cluster together. 00:50:10.859 --> 00:50:13.259 And when they cluster together, you have correlated observations, 00:50:13.259 --> 00:50:15.160 and that affects statistical confidence. 00:50:15.160 --> 00:50:17.720 For example, if you have – if you have two injection wells 00:50:17.720 --> 00:50:22.299 that are exactly in the same location, then they have a 100% probability 00:50:22.299 --> 00:50:25.789 of having the same association with seismicity, for example. 00:50:25.789 --> 00:50:28.070 And so that clustering and the possibility that you could have 00:50:28.070 --> 00:50:31.710 an underlying causal process that kind of puts things in the same place or pushes 00:50:31.710 --> 00:50:37.180 them apart without a causal relationship, all those things make it pretty tricky. 00:50:37.180 --> 00:50:40.400 And – yeah, yeah. 00:50:40.400 --> 00:50:43.150 And we did quite a bit of iterating through different things 00:50:43.150 --> 00:50:45.339 over the last three years. [laughs] 00:50:45.339 --> 00:50:47.170 And this is kind of what we’ve converged to finally 00:50:47.170 --> 00:50:50.050 that we think is the best we could come up with. 00:50:56.900 --> 00:51:02.920 - Following up on Art’s question of trying to apply this to a smaller region, 00:51:02.930 --> 00:51:09.269 can you somehow marry the statistics with the physics that you’ve developed 00:51:09.269 --> 00:51:14.269 through your previous work to approach it that way so that, you know, 00:51:14.269 --> 00:51:18.499 rather than just spreading out the earthquakes and injection into 00:51:18.499 --> 00:51:25.519 neighboring blocks, somehow combine the physics in there? 00:51:25.519 --> 00:51:28.230 - Yeah, absolutely. 
So if you have an idea about 00:51:28.230 --> 00:51:32.349 how fluid pressure might be spreading out into a formation during injection, 00:51:32.349 --> 00:51:35.770 that could help inform you. Kind of, okay, we’re injecting here, 00:51:35.770 --> 00:51:39.000 but yeah, we could relate that to seismicity better. 00:51:39.000 --> 00:51:40.779 So for example, maybe you could use a model that, 00:51:40.779 --> 00:51:44.369 instead of using cumulative volume, uses delta-p. 00:51:44.369 --> 00:51:47.859 And then you have some physical model that describes how delta-p should be 00:51:47.859 --> 00:51:52.250 changing spatially and temporally, and you could do something like that. 00:51:52.250 --> 00:51:55.520 So, yeah, I think you could absolutely pull in some physics. 00:51:55.520 --> 00:51:58.091 But, yeah, generally this is really different from my previous work. 00:51:58.091 --> 00:52:02.471 Previous work was totally mechanistic, and this is totally statistical. [laughs] 00:52:02.860 --> 00:52:04.300 Yeah. 00:52:06.140 --> 00:52:07.140 Yeah. 00:52:09.160 --> 00:52:11.120 - Are there any other questions? 00:52:14.160 --> 00:52:17.040 All right. Well, let’s thank Mark one more time and we’re … 00:52:17.050 --> 00:52:19.150 - Thanks, guys. - … going to go to lunch, 00:52:19.150 --> 00:52:23.070 probably at the patio, meeting downstairs in about 10 minutes. 00:52:23.640 --> 00:52:28.920 [ Applause ] 00:52:29.980 --> 00:52:53.840 [ Silence ]
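As a sketch of pulling in that kind of physics, one textbook option for a delta-p covariate is the Theis line-source solution for the pressure change around a constant-rate well, which could then replace the cumulative-volume covariate in the rate model. The property values and rates below are made up for illustration, and nothing here is the method from the talk.

    import numpy as np
    from scipy.special import exp1

    def delta_p(r_m, t_s, rate_m3_per_s, transmissivity_m2_per_s, storativity):
        """Head change (meters) at radius r and time t for a constant-rate well,
        using the Theis solution: dh = Q/(4*pi*T) * W(u), with u = r^2*S/(4*T*t)
        and W the well function (exponential integral E1)."""
        u = (r_m ** 2) * storativity / (4.0 * transmissivity_m2_per_s * t_s)
        return rate_m3_per_s / (4.0 * np.pi * transmissivity_m2_per_s) * exp1(u)

    # Example: head change 2 km from a well injecting 0.02 m^3/s, evaluated at yearly times,
    # which could then be used as the covariate instead of cumulative injected volume.
    years = np.arange(1, 9)
    print(delta_p(2000.0, years * 3.156e7, 0.02, 1e-4, 1e-4))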