k, the ANOVA decomposition terms tend to 0. □

3.6. Bias of the Jackknife Variance Estimator. Let

\delta_j^2 = V[h_j(X_1, · · · , X_j)], 1 ≤ j ≤ k, (3.24)

denote the variances of the canonical Hoeffding components. Then, based on the correspondence between the ANOVA decomposition and the Hoeffding decomposition proved in the previous section, we have

V[A_1] = \binom{k}{1}^2 \binom{n}{1}^{-1} \delta_1^2,

and more generally, for 1 ≤ m ≤ k,

V[A_m] = \binom{k}{m}^2 \binom{n}{m}^{-1} \delta_m^2. (3.25)

Based on the ANOVA expansion of \hat{\theta} in Equation (3.9), the expected squared deviation of each delete-one statistic U_{n-1}^{(i)} (computed on a sample of size n − 1) from U_n can be expressed as

E[(U_{n-1}^{(i)} − U_n)^2] = \sum_{j=1}^{k} \binom{k}{j}^2 \left[ \frac{j^2}{n^2} \binom{n-1}{j}^{-1} + \frac{j}{n} \binom{n}{j}^{-1} \right] \delta_j^2. (3.26)

Thus, the jackknife variance estimator has expected value

E[\widehat{VAR}_J(U_n)] = (n − 1) E[(U_{n-1}^{(1)} − U_n)^2] = V[A_1] + \frac{2(n-1)}{n-2} V[A_2] + \frac{3(n-1)}{n-3} V[A_3] + · · · (3.27)

Substituting the variances of the ANOVA-type terms with the Hoeffding-type terms using the relationship in (3.25), we have

E[\widehat{VAR}_J(U_n)] = \sum_{j=1}^{k} \frac{j(n-1)}{n-j} \binom{k}{j}^2 \binom{n}{j}^{-1} \delta_j^2. (3.28)

Since the true variance of U_n can be expressed as

V[U_n] = \sum_{j=1}^{k} \binom{k}{j}^2 \binom{n}{j}^{-1} \delta_j^2

by Equation (3.20), the bias of the jackknife variance estimator is

Bias_Jack = E[\widehat{VAR}_J(U_n)] − V[U_n] = \sum_{j=2}^{k} \frac{n(j-1)}{n-j} \binom{k}{j}^2 \binom{n}{j}^{-1} \delta_j^2. (3.29)

3.7. Bias Comparison between Jackknife Variance Estimator and Linearly Extrapolated Variance Estimators. Table 3 compares the decomposition terms of the jackknife variance estimator, the linearly extrapolated variance estimators, and the true variance of a U-statistic U_n. We observe that:
i. All the estimators are first-order unbiased.
ii. The second-order bias is of order 1/n^2.
iii. All the methods are biased upwards.
iv. The delete-one jackknife has the least bias among all.
v. Among the proposed linearly extrapolated variance estimators, the lowest bias is achieved by taking subsamples of size n/2.
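The upward bias of the delete-one jackknife variance estimator can be checked by simulation. The sketch below is our own illustration (not part of the thesis's derivations): it treats the sample variance as a U-statistic of degree k = 2 with kernel h(x, y) = (x − y)^2/2, and compares the Monte Carlo mean of the jackknife estimate against the exact value Var(s^2) = 2σ^4/(n − 1) for normal data. The function name is hypothetical.

```python
import numpy as np

def jackknife_var_of_variance(x):
    """Delete-one jackknife estimate of Var(s^2), s^2 the unbiased sample variance."""
    n = len(x)
    total, sumsq = x.sum(), (x ** 2).sum()
    # Delete-one sample variances computed from sufficient statistics.
    loo_mean = (total - x) / (n - 1)
    loo_var = (sumsq - x ** 2 - (n - 1) * loo_mean ** 2) / (n - 2)
    return (n - 1) / n * ((loo_var - loo_var.mean()) ** 2).sum()

rng = np.random.default_rng(0)
n, n_sim = 20, 8000
vj = np.array([jackknife_var_of_variance(rng.standard_normal(n))
               for _ in range(n_sim)])
mean_vj = vj.mean()
true_var = 2.0 / (n - 1)   # Var(s^2) = 2*sigma^4/(n-1) for N(0, sigma^2) data
print(mean_vj, true_var)   # the Monte Carlo mean sits above the true variance
```

The relative excess of the Monte Carlo mean over the truth is small (the bias is of order 1/n^2), but systematically positive.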
Although the delete-one jackknife has the lowest bias among all, the proposed construction of a general class of linearly extrapolated variance estimators is of practical interest, particularly in situations such as cross-validation and half sampling, when disjoint subsamples are designed to study the statistic of interest and the subsample size is m ≤ n/2.

TABLE 3. Exact Bias of Linearly Extrapolated Variance Estimators for U-statistics

Estimator              \delta_1^2 term   \delta_2^2 term              \delta_3^2 term                        general \delta_j^2 term
True Parameter         k^2/n             k^2(k-1)^2 / (2n(n-1))       k^2(k-1)^2(k-2)^2 / (6n(n-1)(n-2))     (k!)^2 (n-j)! / ([(k-j)!]^2 j! n!)
Delete-One Jackknife   k^2/n             k^2(k-1)^2 / (n(n-2))        k^2(k-1)^2(k-2)^2 / (2n(n-2)(n-3))     (n-1)(k!)^2 (n-j)! / ((n-j)[(k-j)!]^2 (j-1)! n!)
Extrapolated w/ n/2    k^2/n             k^2(k-1)^2 / (n(n-2))        2k^2(k-1)^2(k-2)^2 / (3n(n-2)(n-4))    (k!)^2 (n/2-j)! / (n[(k-j)!]^2 j! (n/2-1)!)
Extrapolated w/ m      k^2/n             k^2(k-1)^2 / (2n(m-1))       k^2(k-1)^2(k-2)^2 / (6n(m-1)(m-2))     (k!)^2 (m-j)! / (n[(k-j)!]^2 j! (m-1)!)

3.8. Empirical Comparisons of Linearly Extrapolated Methods. We conducted simulations to evaluate the performance of the delete-one jackknife, the linearly extrapolated method with half samples, and the linearly extrapolated method with quarter samples on the normal, t, bimodal, and chi-squared distributions. For the newly proposed linearly extrapolated methods, we take 1000 subsamples from the original data set to generate the estimator. The simulation results (as summarized in Table 4 and Figures 6, 7, 8, and 9) show that:
i. The bias of the linearly extrapolated methods decreases as the sample size increases.
ii. Although the linearly extrapolated variance estimators have larger bias than the jackknife variance estimator (in the second-order term), we observe very similar or sometimes smaller MSE values in the simulations. Our numerical study thus indicates that some of the linearly extrapolated variance estimators might have a superior variance property. Further theoretical verification is required to confirm this conjecture.
iii.
The linearly extrapolated estimator with half samples seems competitive to the jackknife when we consider the trade-off between bias and variance.

TABLE 4. Estimates of the Standard Error of the Sample Variance
[Table 4 reports, for samples of size 10, 30, 50, and 100 from the bimodal and chi-squared (df = 2) distributions, the population value of the parameter together with the mean estimates from the jackknife and the linearly extrapolated method with subsamples of size n/2.]
Note: this table shows the mean of the parameter estimates in each case, and shows the standard deviation of the estimates in parentheses below each reported mean.
Number of Simulations = 2000
Number of Subsamples for Linearly Extrapolated Estimators = 1000
Parameter of Interest (θ): the standard error of the sample variance

FIGURE 6. MSE of Linearly Extrapolated Variance Estimators on Standard Normal Distribution
Note: this figure shows the mean squared error of using each of the resampling methods to estimate the standard error of the sample variance based on samples of size 10, 30, 50, or 100 from the standard normal distribution. Method "Half Sample" means the linearly extrapolated variance estimator based on subsamples of size ⌊n/2⌋, and method "Quarter Sample" means the linearly extrapolated variance estimator based on subsamples of size ⌊n/4⌋.
Number of Simulations = 2000
Number of Subsamples for Linearly Extrapolated Estimators = 1000
Parameter of Interest (θ): the standard error of the sample variance

FIGURE 7. MSE of Linearly Extrapolated Variance Estimators on t-Distribution
Note: this figure shows the mean squared error of using each of the resampling methods to estimate the standard error of the sample variance based on samples of size 10, 30, 50, or 100 from the t-distribution.
Method "Half Sample" means the linearly extrapolated variance estimator based on subsamples of size ⌊n/2⌋, and method "Quarter Sample" means the linearly extrapolated variance estimator based on subsamples of size ⌊n/4⌋.
Number of Simulations = 2000
Number of Subsamples for Linearly Extrapolated Estimators = 1000
Parameter of Interest (θ): the standard error of the sample variance

FIGURE 8. MSE of Linearly Extrapolated Variance Estimators on Bimodal Distribution
Note: this figure shows the mean squared error of using each of the resampling methods to estimate the standard error of the sample variance based on samples of size 10, 30, 50, or 100 from the bimodal distribution. Method "Half Sample" means the linearly extrapolated variance estimator based on subsamples of size ⌊n/2⌋, and method "Quarter Sample" means the linearly extrapolated variance estimator based on subsamples of size ⌊n/4⌋.
Number of Simulations = 2000
Number of Subsamples for Linearly Extrapolated Estimators = 1000
Parameter of Interest (θ): the standard error of the sample variance

FIGURE 9. MSE of Linearly Extrapolated Variance Estimators on Chi-Squared Distribution
Note: this figure shows the mean squared error of using each of the resampling methods to estimate the standard error of the sample variance based on samples of size 10, 30, 50, or 100 from the chi-squared distribution. Method "Half Sample" means the linearly extrapolated variance estimator based on subsamples of size ⌊n/2⌋, and method "Quarter Sample" means the linearly extrapolated variance estimator based on subsamples of size ⌊n/4⌋.
Number of Simulations = 2000
Number of Subsamples for Linearly Extrapolated Estimators = 1000
Parameter of Interest (θ): the standard error of the sample variance

3.9. Summary. In this section, we proposed a general class of linearly extrapolated variance estimators as an extension of the delete-one jackknife variance estimator.
This class of estimators estimates the sampling variance of a statistic computed on samples of size n by first estimating the sampling variance of the statistic for samples of size m ≤ n/2 and then linearly extrapolating this estimate from sample size m to sample size n. We studied the theoretical properties of this class of estimators by deriving their exact bias when estimating the variance of a general U-statistic and comparing it with the theoretical bias of the jackknife variance estimator. The linearly extrapolated variance estimators are first-order unbiased and second-order biased, with bias of order 1/n^2. Within the class of linearly extrapolated variance estimators, the estimator based on statistics computed for subsamples of size n/2 has the smallest theoretical bias, and the bias increases as the subsample size decreases. We also conducted simulations comparing several representatives of the newly proposed class of estimators with the jackknife variance estimator in estimating the standard error of the sample variance. Our empirical study indicates that, despite having greater bias on average, the newly constructed class of linearly extrapolated variance estimators might have a superior variance property. Further work is required to provide theoretical backing for this observation.

4. RESAMPLING FOR DEPENDENT DATA

In the previous sections, we focused on resampling methods for i.i.d. data. For the rest of this thesis, we consider the case when the observations in the original sample are not independent. This situation often occurs in real-life problems, especially in time series analysis. Resampling from dependent data is more complicated because the population is not characterized entirely by the one-dimensional marginal distribution F alone, but requires knowledge of the joint distribution of the whole sequence X_1, X_2, · · · , X_n. Classical resampling methods can fail in the presence of dependency, which motivates specially designed bootstrapping techniques for dependent data. A prime example appears in the seminal paper by Singh (1981).
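Singh's point can be previewed numerically before working through the details. The sketch below is our own illustration (the variable names are ours): it builds a 1-dependent series X_i = ε_i + ε_{i+1}, for which √n(X̄_n − μ) has limiting variance Var(X_1) + 2Cov(X_1, X_2) = 4, while the i.i.d. bootstrap distribution of √n(X̄*_n − X̄_n) concentrates around the marginal variance Var(X_1) = 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
eps = rng.standard_normal(n + 1)
x = eps[:-1] + eps[1:]          # 1-dependent (m = 1) series with mean 0

# i.i.d. bootstrap distribution of T* = sqrt(n) * (mean(X*) - mean(X))
B = 1000
idx = rng.integers(0, n, size=(B, n))
t_star = np.sqrt(n) * (x[idx].mean(axis=1) - x.mean())
boot_var = t_star.var()

true_lrv = 4.0                  # Var(X1) + 2*Cov(X1, X2) = 2 + 2*1
print(boot_var, true_lrv)       # boot_var is near 2, well below 4
```

The bootstrap variance stays close to 2 no matter how large n grows, so the i.i.d. bootstrap misses half of the limiting variance here.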
Suppose X_1, X_2, · · · is a sequence of m-dependent random variables with E[X_1] = μ and E[X_1^2] < ∞. Recall that {X_n}_{n≥1} is called m-dependent for some integer m ≥ 0 if (X_1, X_2, · · · , X_k) and (X_{k+m+1}, X_{k+m+2}, · · ·) are independent for all k ≥ 1 (i.e., an i.i.d. sequence of random variables {ε_n}_{n≥1} is 0-dependent). Let

σ^2 = V[X_1] + 2 \sum_{i=1}^{m-1} Cov(X_1, X_{1+i}), m ≥ 2, (4.1)

and

\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i. (4.2)

If σ ∈ (0, ∞), then by the CLT for m-dependent variables,

\sqrt{n}(\bar{X}_n − μ) →_d N(0, σ^2),

where →_d denotes convergence in distribution. Suppose that we estimate the sampling distribution of the random variable T_n = \sqrt{n}(\bar{X}_n − μ) using the i.i.d. bootstrap. For simplicity, assume that the resample size equals the sample size, and an equal number of bootstrap variables X_1^*, X_2^*, · · · , X_n^* are generated. Then, the bootstrap version T_n^* of T_n is given by

T_n^* = \sqrt{n}(\bar{X}_n^* − \bar{X}_n), (4.4)

where \bar{X}_n^* = n^{-1} \sum_{i=1}^{n} X_i^*. The conditional distribution of T_n^* under the i.i.d. bootstrap method still converges to a normal distribution, but with the wrong variance. The mean squared error in this case tends to a nonzero number in the limit, and the bootstrap estimator of P(T_n ≤ x) is not consistent. Thus, we need a resampling method that preserves the covariance structure of the data sample. One option is to fit a parametric model and then resample from the residuals, but this requires assuming an underlying covariance structure, and the parametric assumptions may be violated in practice.

4.2. Block Bootstrap. Künsch (1989) and Liu and Singh (1992) independently formulated a resampling scheme, called the moving block bootstrap (MBB), that is applicable to dependent data. Instead of drawing one single observation at a time as in the i.i.d. bootstrap, the MBB method draws one block of consecutive observations at a time. As a result, an MBB sample better resembles the original data set, since the dependence structure of the original observations is preserved within each block.

4.2.1. Moving Block Bootstrap.
Let 𝒳_n = (X_1, X_2, · · · , X_n) denote a set of observations from a sequence of stationary random variables. We define the MBB version of estimators of the form \hat{\theta} = θ(F_n), where F_n denotes the empirical distribution function of X_1, X_2, · · · , X_n. It can be shown that when the data are generated by a weakly dependent process, the MBB reproduces the underlying dependence structure of the process asymptotically.

4.2.2. Nonoverlapping Block Bootstrap. Carlstein (1986) defines a block bootstrap method which uses nonoverlapping segments of the data to define the blocks (NBB). Let l = l_n ∈ [1, n] be an integer and b ≥ 1 be the largest integer such that lb ≤ n. Then, define the blocks

B_i = (X_{(i-1)l+1}, · · · , X_{il}), i = 1, · · · , b. (4.8)

FIGURE 11. Nonoverlapping Block Bootstrap (NBB)

Note that in the NBB, if the choice of block length l does not divide the sample size n, then the last n − bl observations in the original sample will never appear in the resampled data. Other than the blocking rules, the rest of the implementation of the NBB is exactly the same as that for the MBB.

4.2.3. Circular Block Bootstrap. Both the MBB and NBB resampling schemes suffer from an undesirable boundary effect. The MBB assigns less weight to the observations toward the beginning and the end of the data set than to the middle part (Lahiri, 2003). A similar problem exists under the NBB. As mentioned in Section 4.2.2, when n is not a multiple of the block length l, the last n − bl observations can never appear in the bootstrap sample, where b stands for the largest integer for which bl ≤ n. In addition, the NBB does not guarantee that the bootstrap samples have the same number of observations as the original sample. The circular block bootstrap (CBB), introduced in Politis and Romano (1992, 1994), is a simple way out of this boundary problem. The idea is to wrap the data around a circle and form additional blocks using the "circularly defined" observations.
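The wrap-around idea can be sketched in a few lines. Below is a minimal illustration of a CBB resampler (our own code; `cbb_sample` is a hypothetical name, not one from the literature): blocks of length l start at uniformly random positions, are read with modular indexing so they may wrap past the end of the series, and the concatenation is trimmed back to length n.

```python
import numpy as np

def cbb_sample(x, block_len, rng):
    """One circular block bootstrap resample of the series x."""
    n = len(x)
    n_blocks = -(-n // block_len)               # ceil(n / block_len)
    starts = rng.integers(0, n, size=n_blocks)  # every start index is allowed (circular)
    # Read each block with wrap-around (modular) indexing, then trim to length n.
    idx = (starts[:, None] + np.arange(block_len)) % n
    return x[idx].ravel()[:n]

rng = np.random.default_rng(2)
x = np.arange(20.0)
xb = cbb_sample(x, block_len=5, rng=rng)
print(len(xb))                                  # 20
```

Because every observation starts exactly l of the circular blocks, each observation receives equal weight, which removes the boundary effect of the MBB and NBB.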
Given the variables 𝒳_n = {X_1, · · · , X_n}, first we define a new time series Y_{n,i}, i ≥ 1, by periodic extension: for any i ≥ 1, there are integers k_i ≥ 0 and j_i ∈ [1, n] such that i = k_i n + j_i, and we set Y_{n,i} = X_{j_i}.

4.2.4. Blocks of Blocks Bootstrap. The block bootstrap estimators considered above work well for many commonly used estimators, such as the sample mean. However, they are not sufficiently rich for applications in the time series context. This is primarily because the formulation of \hat{\theta}_n discussed above depends only on the one-dimensional marginal empirical distribution, and hence does not cover standard statistics like the sample lag correlations or the spectral density estimators (Lahiri, 2003). A more general version of the MBB that covers such statistics is the blocks of blocks bootstrap method. Given the observations 𝒳_n, let F_{p,n} denote the p-dimensional empirical measure

F_{p,n} = (n − p + 1)^{-1} \sum_{j=1}^{n-p+1} \delta_{Y_j}, (4.10)

where Y_j = (X_j, · · · , X_{j+p-1}) and, for any y ∈ ℝ^p, \delta_y denotes the probability measure on ℝ^p putting unit mass on y. Künsch (1989) extends the original moving block bootstrap and formulates the blocks of blocks bootstrap, which, instead of resampling blocks of X-values, resamples blocks of Y-values, each of which is a block of X-values. Note that the blocks of blocks technique can be viewed as an extra layer that can be added on top of any of the aforementioned block bootstrap methods: MBB, NBB, or CBB. For example, a blocks of blocks bootstrap method implemented on top of the NBB would resample blocks of data blocks defined with the NBB blocking rules.

4.2.5. The Choice of the Block Size. Hall et al. (1995) show that the optimal block size depends significantly on context, and they advise setting the block length to n^{1/3} for variance or bias estimation, n^{1/4} for the estimation of a one-sided distribution function, and n^{1/5} for the estimation of a two-sided distribution function, where n is the length of the dependent series.
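The blocks of blocks idea for a lag statistic can be sketched as follows. This is our own minimal illustration (function and variable names are hypothetical): form the p-dimensional rows Y_j = (X_j, X_{j+lag}), MBB-resample blocks of those rows, and evaluate the lag autocovariance on each resample; the block length follows the n^{1/3} rule for variance estimation quoted above.

```python
import numpy as np

def blocks_of_blocks_lag_cov(x, lag, rng, n_boot=200):
    """MBB over rows Y_j = (X_j, X_{j+lag}); bootstrap the lag autocovariance."""
    y = np.column_stack([x[:-lag], x[lag:]])   # the Y-values: blocks of X-values
    m = len(y)
    block_len = max(1, round(m ** (1 / 3)))    # n^(1/3) rule for variance estimation
    n_blocks = -(-m // block_len)              # ceil(m / block_len)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, m - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:m]
        yb = y[idx]
        stats[b] = np.mean(yb[:, 0] * yb[:, 1]) - yb[:, 0].mean() * yb[:, 1].mean()
    return stats

rng = np.random.default_rng(3)
x = rng.standard_normal(200)
boot = blocks_of_blocks_lag_cov(x, lag=3, rng=rng)
se = boot.std()        # bootstrap standard error of the lag-3 autocovariance
print(se)
```

Resampling whole rows keeps each (X_j, X_{j+lag}) pair intact, which is exactly what the one-dimensional block bootstrap fails to guarantee for such statistics.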
In Section 4.3, we show that our empirical results confirm this theoretical optimal choice of block length for variance estimation.

4.3. Numerical Study of Block Bootstrap Methods. We conducted simulations to estimate the autocovariance of an ARMA(3, 4) process using the aforementioned block bootstrap designs as well as the i.i.d. bootstrap method. Let {X_{0i}}_{i∈ℤ} be an ARMA(3, 4) process specified by

X_{0i} − 0.4X_{0(i-1)} − 0.2X_{0(i-2)} − 0.1X_{0(i-3)} = ε_i + 0.2ε_{i-1} + 0.3ε_{i-2} + 0.2ε_{i-3} + 0.1ε_{i-4}, (4.11)

where {ε_i}_{i∈ℤ} is a sequence of i.i.d. N(0, 1) random variables. Let

γ(k) = Cov(X_{0i}, X_{0(i+k)}), i, k ∈ ℤ, (4.12)

be the autocovariance function of {X_{0i}}_{i∈ℤ}. For 0 ≤ k ≤ n, an estimator of γ(k) based on observations (X_{01}, · · · , X_{0n}) of size n is given by

\hat{γ}(k) = (n − k)^{-1} \sum_{i=1}^{n-k} X_{0i}X_{0(i+k)} − \bar{X}_{0(n-k)}^2, (4.13)

where

\bar{X}_{0(n-k)} = (n − k)^{-1} \sum_{i=1}^{n-k} X_{0i}. (4.14)

We simulate 1000 random samples of size n = 102 from the ARMA(3, 4) process. For each simulated sample, we implement various resampling schemes to estimate the autocovariance of lag 3, γ(3), of the underlying population distribution. The true parameter value of γ(3) for this ARMA(3, 4) process is 2.14. Figure 13 shows that the i.i.d. bootstrap estimates are centered far from the true parameter value, illustrating its inadequacy for dependent data. For the block bootstrap methods, we use the blocks-of-blocks technique and conduct simulations using the MBB, NBB, and CBB blocking rules with different block lengths (l = 4, 6, 8, 10, 15, and 20). 800 bootstrap samples are drawn for each estimation. The number of bootstrap samples required to obtain a good estimate in this case is much greater than the number required for the standard error estimations in Section 2.4, since the statistic in this situation is more convoluted. Table 5 summarizes the simulation results of the block bootstrap methods. We report for each method the average estimate from 1000 simulations using a given block length, and in parentheses below each average the standard deviation of the estimates across the 1000 simulations.
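The simulation setup can be reproduced in outline. The sketch below is our own code (not the thesis's implementation): it simulates the ARMA(3, 4) recursion (4.11) with a burn-in period and evaluates the plug-in estimator (4.13) at lag 3; averaging over replications should land near the mean plug-in value of 1.83 reported below, somewhat beneath the true γ(3) = 2.14.

```python
import numpy as np

AR = [0.4, 0.2, 0.1]
MA = [0.2, 0.3, 0.2, 0.1]

def simulate_arma34(n, rng, burn=200):
    """Simulate X_i - 0.4 X_{i-1} - 0.2 X_{i-2} - 0.1 X_{i-3} = MA part (Eq. 4.11)."""
    total = n + burn
    eps = rng.standard_normal(total + 4)
    x = np.zeros(total + 3)
    for t in range(3, total + 3):
        e = t + 1                      # epsilon index shifted so four lags exist
        x[t] = (AR[0] * x[t-1] + AR[1] * x[t-2] + AR[2] * x[t-3]
                + eps[e] + MA[0] * eps[e-1] + MA[1] * eps[e-2]
                + MA[2] * eps[e-3] + MA[3] * eps[e-4])
    return x[-n:]

def acov_hat(x, k):
    """Plug-in estimator (4.13): mean lagged product minus squared mean."""
    n = len(x)
    xbar = x[:n - k].mean()
    return np.mean(x[:n - k] * x[k:]) - xbar ** 2

rng = np.random.default_rng(4)
est = [acov_hat(simulate_arma34(102, rng), 3) for _ in range(300)]
mean_est = np.mean(est)
print(mean_est)   # close to the reported mean plug-in estimate of 1.83
```

The downward drift of the plug-in mean at n = 102 reflects the finite-sample bias from estimating the process mean on a short, strongly dependent series.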
Overall, we observe that:
• The three methods, MBB, NBB, and CBB, produce very similar estimates in this simulation setup.
• Longer block lengths result in smaller autocovariance estimates across all three methods.
• All the results are reasonably close to the true value of the population parameter. The ones with shorter block lengths appear to be more accurate, but with slightly greater standard deviation.
• A block length of 4 appears to provide the best estimates for all three methods, followed closely by the estimates with block length 6. This empirical observation agrees with the theoretical optimal block length discussed in Section 4.2.5, which would be n^{1/3} = 102^{1/3} ≈ 4.67 in this case.

Figure 14 also shows that the three block bootstrap methods, MBB, NBB, and CBB, have similar performance in every case in this set of simulations. The block bootstrap implementations with shorter block lengths such as 4 or 6 (close to the optimal block length choice proposed by Hall et al. (1995)) provide more accurate estimates; more specifically, more accurate than the average plug-in estimates observed in the simulations.

TABLE 5. Autocovariance Estimates with Block Bootstrap Methods

Block Length      4       6       8       10      15      20
MBB             2.095   2.059   2.027   2.003   1.969   1.953
NBB             2.102   2.056   2.032   2.027   1.968   1.995
CBB             2.103   2.073   2.051   2.032   2.012   2.006

Note: this table shows the mean of the parameter estimates in each case. The standard deviations of the estimates across the 1000 simulations range from roughly 0.87 to 0.94.
Mean Plug-In Estimate = 1.83
Standard Deviation of Plug-In Estimates = 0.930
Number of Simulations = 1000
Number of Bootstrap Samples (B) = 800

FIGURE 14. Comparison of Block Bootstrap Methods

4.4. Summary. In this section, we pointed out the inadequacy of the i.i.d.
bootstrap for autocorrelative data, and we examined resampling methods designed specifically for one-dimensional dependent data. We implemented the moving block bootstrap, nonoverlapping block bootstrap, and circular block bootstrap methods using the blocks of blocks technique proposed by Künsch (1989) to estimate the lag-three autocovariance of samples from an ARMA(3, 4) process. Our empirical results show that the three methods have similar performance in this simulation context and, with the optimal block length choice, all perform better than the plug-in estimator, let alone the i.i.d. bootstrap method. Our empirical study also confirms the optimal choice of block length proposed by Hall et al. (1995).

5. RESAMPLING FOR SPATIAL DATA

Spatial data are data that are geographically referenced. They commonly appear in scientific research, such as research in environmental studies, geoscience, or marine science. Spatial data sets are very rarely i.i.d., which means that a simple i.i.d. bootstrap sample of the original data would usually not be representative of the population. Moreover, understanding the underlying covariance structure of a spatial process is often one of the most important objectives in spatial data analysis. Spatial models rely heavily on assumptions about the covariance structure, and the underlying relationship may be highly complex. Yet with careful design, we can extend the one-dimensional block bootstrap approach and resample subsets of higher dimension for spatial resampling. In this section, we study some extensions of the one-dimensional block bootstrap mechanisms for higher-dimensional data analysis and apply the spatial resampling techniques to the estimation of variogram parameters of both simulated and real spatial data.

5.1. Spatial Block Bootstrap on a Regular Grid. We first consider a spatial bootstrapping technique for a spatial process indexed by the integer grid ℤ^d.
We thus assume that we only have (or are only going to use) observations that lie at the intersection points of an integer grid. Below we start with the overlapping block bootstrap method for spatial data (spatial MBB) on a regular grid. Let R_n denote the sampling region. Suppose that {Z(s) | s ∈ ℤ^d} is a stationary spatial process that is observed at finitely many locations S_n = {s_1, · · · , s_{N_n}}, given by the part of the integer grid ℤ^d that lies inside R_n, i.e.,

S_n = R_n ∩ ℤ^d, (5.1)

and

𝒵_n = {Z(s_1), · · · , Z(s_{N_n})} (5.2)

is the set of observations on the regular grid. Let U = [0, 1)^d be the unit cube in ℝ^d, and let β be the block length (similar to l in the block bootstrap methods for one-dimensional data). We first partition the sampling region R_n using cubes of volume β^d. Let

K_n = {k ∈ ℤ^d | β(k + U) ∩ R_n ≠ ∅}

be the index set of all cubes of the form β(k + U) that have nonempty intersections with the sampling region R_n, and let

R_n(k) = R_n ∩ β(k + U), k ∈ K_n. (5.5)

Define

I_n = {i ∈ ℤ^d | i + βU ⊂ R_n}, (5.6)

the index set of all cubes of volume β^d contained in R_n with starting points i ∈ ℤ^d; thus

{i + βU | i ∈ I_n} (5.7)

represents the set of cubic subregions (blocks) that are overlapping and contained in R_n. For all i ∈ I_n, we say that

{Z(s) | s ∈ ℤ^d ∩ (i + βU)} (5.8)

is complete, in the sense that the Z(·)-process is observed at every point of the integer grid in the subregion i + βU. In the accompanying figure, the dashed background grid represents the regular integer grid, and the solid background grid represents the partitioning of the original integer grid with β = 5. The thick-edged black block is an example of a block resulting from the partition of the sampling region (indexed by the set K_n). The two blue blocks are in the index set I_n from which we draw bootstrap samples. Each example block, whether indexed by K_n or by I_n, has two solid sides and two dashed sides. The observations on the two dashed sides of a block are not included in that block.
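The resampling step can be sketched concretely for the simple case of a rectangular R_n whose side lengths are multiples of β (our own illustration; the function name is hypothetical): each partition cell R_n(k) is refilled with a congruent β × β block copied from a uniformly chosen complete block i + βU.

```python
import numpy as np

def spatial_mbb(z, beta, rng):
    """Spatial moving block bootstrap for a 2-D field z on a regular grid.

    Assumes both side lengths of z are multiples of beta, so every
    partition cell R_n(k) is a full beta-by-beta cube."""
    n1, n2 = z.shape
    zb = np.empty_like(z)
    # I_n: start points of complete (overlapping) beta-blocks inside R_n.
    max1, max2 = n1 - beta + 1, n2 - beta + 1
    for k1 in range(0, n1, beta):          # loop over the partition cells K_n
        for k2 in range(0, n2, beta):
            i1 = rng.integers(0, max1)     # uniform draw from I_n
            i2 = rng.integers(0, max2)
            zb[k1:k1+beta, k2:k2+beta] = z[i1:i1+beta, i2:i2+beta]
    return zb

rng = np.random.default_rng(5)
z = rng.standard_normal((20, 20))
zb = spatial_mbb(z, beta=5, rng=rng)
print(zb.shape)                            # (20, 20)
```

Because each copied block is a translate by an integer vector, the resampled field has exactly as many observations as the original, mirroring property iii below.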
For any set A ⊂ ℝ^d, define

𝒵_n(A) = {Z(s) | s ∈ A ∩ S_n}. (5.9)

So 𝒵_n(R_n) is the entire sample 𝒵_n = {Z(s_1), · · · , Z(s_{N_n})}, and 𝒵_n(R_n(k)) is the subsample lying in the subregion R_n(k) for k ∈ K_n. For each k ∈ K_n, we resample one block at random from the collection {i + βU | i ∈ I_n} independently. Let K denote the size of K_n, and let

I_1, · · · , I_K (5.10)

be a collection of K i.i.d. random variables with common distribution

P(I_1 = i) = 1/|I_n|, i ∈ I_n. (5.11)

We generate a spatial moving block bootstrap sample by selecting a random sample of size K with replacement from all overlapping cubes {i + βU | i ∈ I_n} and defining a version of the Z(·)-process on each subregion R_n(k) by considering all data values that lie on a congruent part of the resampled cube. For k ∈ K_n, define the spatial MBB replicate of 𝒵_n(R_n(k)) to be

𝒵_n^*(R_n(k)) = 𝒵_n((I_k + βU) ∩ {R_n(k) − kβ + I_k}). (5.12)

Note that with this definition, the resampled blocks have several properties:
i. R_n(k) and (I_k + βU) ∩ {R_n(k) − kβ + I_k} have the same shape;
ii. the resampled observations retain the same spatial dependence structure as the original process 𝒵_n(R_n(k)) over the subregion R_n(k);
iii. since the mapping of the shape is done by a translation by integer vectors, the number of resampled observations in 𝒵_n^*(R_n(k)) equals the number of original observations in R_n(k) for all k ∈ K_n.

With the last property, we are guaranteed that the bootstrap sample has the same number of observations as the original sample. The nonoverlapping block bootstrap (spatial NBB) on a regular grid is very similar to the spatial MBB; the only difference is that the index set of all cubes in the nonoverlapping case is defined to be

J_n = {j ∈ ℤ^d | β(j + U) ⊂ R_n}. (5.14)

We then generate K i.i.d. random variables

J_1, · · · , J_K (5.15)

with common distribution

P(J_1 = j) = 1/|J_n|, j ∈ J_n. (5.16)

The accompanying figure illustrates the mechanism of the spatial NBB in two dimensions. The dashed background grid represents the regular integer grid, and the solid background grid represents the partitioning of the original integer grid with β = 5.
The thick-edged black block is an example of a block resulting from the partition of the sampling region (indexed by the set K_n). The two blue blocks are in the index set J_n from which we draw bootstrap samples. Each example block, whether indexed by K_n or by J_n, has two solid sides and two dashed sides. The observations on the two dashed sides of a block are not included in that block. For k ∈ K_n, define the spatial NBB replicate of 𝒵_n(R_n(k)) to be

𝒵_n^*(R_n(k)) = 𝒵_n((βJ_k + βU) ∩ {R_n(k) − kβ + βJ_k}). (5.17)

5.2. Isotropic Intrinsically Stationary Process and Its Variogram. In the spatial data analysis context, the underlying covariance structure of the spatial process is often the most valuable information. There are various functions used to represent the covariance structure, depending on the associated spatial process, such as the variogram and the covariogram. In our study, we start with some of the examples explained in Lahiri (2003) and focus specifically on applying the spatial block bootstrap to variogram estimation for an isotropic intrinsically stationary process. Define a spatial process Z(·) to be

{Z(s) | s ∈ D}, (5.18)

where D is a fixed subset of d-dimensional Euclidean space. The spatial process is said to be intrinsically stationary if

μ(s) = μ (constant) (5.19)

and

E[Z(s + h) − Z(s)]^2 = Var(Z(s + h) − Z(s)) = 2γ(h), (5.20)

where γ(h) is a function from ℝ^d to ℝ. For an intrinsically stationary process, 2γ(h) is called the variogram, and γ(h) is called the semivariogram. When γ(h) depends only on the magnitude and not the direction of the vector h, the spatial process is said to be isotropic. In scientific studies, we often see observations collected at locations geographically close to each other share more properties than ones collected at locations much farther apart. An isotropic intrinsically stationary process is thus a good candidate when one needs to make assumptions about the underlying structure of a spatial data set.
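Before fitting a parametric model such as the exponential variogram introduced below, the semivariogram is usually estimated empirically. The sketch below is our own illustration of the classical moment (Matheron-type) estimator, which bins the squared increments by pairwise distance: γ̂(h) = (1/(2|N(h)|)) Σ_{(i,j)∈N(h)} (Z(s_i) − Z(s_j))^2, where N(h) collects the pairs whose separation falls in the distance bin around h.

```python
import numpy as np

def empirical_semivariogram(coords, values, bins):
    """Moment estimator of the semivariogram on distance bins."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)      # count each pair once
    dist, sq = d[iu], sq[iu]
    gamma = np.empty(len(bins) - 1)
    for b in range(len(bins) - 1):
        in_bin = (dist >= bins[b]) & (dist < bins[b + 1])
        gamma[b] = sq[in_bin].mean() / 2 if in_bin.any() else np.nan
    return gamma

# Toy check on a 10 x 10 grid of independent noise: the semivariogram
# should be roughly flat near the process variance (here 1) at every lag.
rng = np.random.default_rng(6)
g = np.arange(10)
coords = np.array([(i, j) for i in g for j in g], dtype=float)
values = rng.standard_normal(100)
gamma = empirical_semivariogram(coords, values, bins=np.array([0.5, 1.5, 2.5, 3.5]))
print(gamma)
```

For a process with spatial dependence, γ̂(h) instead rises from near the nugget at short distances toward the sill, which is the shape the parametric fits below try to capture.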
For isotropic intrinsically stationary processes, the covariance structure is fully controlled by the variogram function. Therefore, methods that can effectively estimate the parameters in a variogram function are of great practical interest. In our study, we focus on one example of an isotropic variogram, the exponential variogram:

γ(h; ψ, κ, φ) = ψ + κ(1 − e^{−‖h‖/φ}) for ‖h‖ > 0, and γ(0; ψ, κ, φ) = 0, (5.21)

where ψ ≥ 0 is the nugget, κ > 0 is the partial sill, and φ > 0 is the range parameter.

To study spatial bootstrapping on irregularly spaced data, we generate sampling locations from a Poisson point process. A Poisson point process S with intensity measure Λ satisfies that the number of points of S that belong to a bounded region R follows a Poisson(Λ(R)) distribution. Conditional on the number of points of S that belong to R, the points X_R = {X_1, · · · , X_{N(R)}} in the bounded region are i.i.d., each uniformly distributed in the region R. For a homogeneous Poisson process, the intensity function is constant. The Poisson process is a model for complete spatial randomness, since X_A and X_B are independent for all disjoint regions A and B. Figure 18 shows examples of Poisson point processes for three irregularly spaced Gaussian processes with the same three exponential variograms used in the simulation studies in Section 5.3. In each case, we draw three different samples of Gaussian processes with the given exponential variogram using the Poisson point process, and resample B = 200 bootstrap samples from the original data set using the spatial MBB or spatial NBB with block size β = 5 or 8. Again, B = 200 is a rather small number of bootstrap replications in this situation. We then fit a variogram to the 200 bootstrap samples using two methods implemented in the R package geoR, LS and WLS. We repeat this process 100 times in the study. As explained in Section 5.3, the maximum likelihood method is extremely time-consuming, and it only seemed to work well when ψ, the parameter for the nugget effect, is 0 or close to 0. We thus do not implement maximum likelihood in this part of the study. We again observe some extreme values in the resulting estimates, and the mean estimate reported in Tables 6, 7, and 8 for each case is again obtained by taking the average of all the estimates generated after removing the values more than three times the interquartile range above or below the original mean. The simulation results for spatial bootstrapping on the irregular grid show that:
• Unlike the regular grid cases, we do observe some of the mean WLS estimates greatly affected by outliers in the bootstrap sample statistics.
It is reasonable, since the spatial bootstrap for data on an irregular grid is an even more complicated process than that for data on a regular grid; in particular, we do not have the same number of observations in the bootstrap samples as in the original sample.
• The simple variogram fitting seems to have more variation in performance and relies heavily on correctly capturing the nugget effect in order to produce reasonable estimates.

TABLE 9. Variogram Parameter Estimation with Spatial Bootstrap Methods on an Irregularly Spaced Sample (θ = (0, 2, 1))

                              LS                         WLS
                        ψ      κ        φ          ψ      κ      φ
Sample 1  Simple       0.00   2.04     0.95       0.00   1.97   0.84
          MBB (β = 5)  0.53   1.66     1.27       0.49   1.64   1.27
          NBB (β = 5)  0.72   167.61   15947.65   0.52   1.73   0.93
          MBB (β = 8)  0.51   1.70     1.41       0.44   1.69   1.08
          NBB (β = 8)  0.00   1.92     0.89       0.00   1.95   0.84
Sample 2  Simple       1.39   1.06     3.59       1.67   0.93   7.26
          MBB (β = 5)  0.66   1.76     2.51       0.62   1.77   1.54
          NBB (β = 5)  0.63   2.18     1.39       0.11   2.23   0.88
          MBB (β = 8)  0.86   161.34   15594.26   0.87   1.48   1.78
          NBB (β = 8)  0.69   1.30     0.85       0.96   1.14   1.33
Sample 3  Simple       0.00   1.75     0.66       0.00   1.84   0.77
          MBB (β = 5)  0.61   1.51     1.69       0.56   1.46   1.16
          NBB (β = 5)  0.60   1.33     1.56       0.63   1.26   1.43
          MBB (β = 8)  0.64   1.53     1.69       0.61   1.42   1.16
          NBB (β = 8)  0.00   1.79     0.77       0.00   1.84   0.82

Note: this table shows the truncated mean of the parameter estimates in each case (truncated at three times the interquartile range from the original mean).
Number of Simulations = 100
Number of Bootstrap Replications (B) = 200

TABLE 10. Variogram Parameter Estimation with Spatial Bootstrap Methods on an Irregularly Spaced Sample (θ = (1, 1, 1))