Documentation
Mondrian trees
MondrianForests.MondrianTree — TypeA Mondrian tree is determined by:
id: a string to identify the treelambda: the non-negative lifetime parameterlower: the lower coordinate of the root cellupper: the upper coordinate of the root cellcreation_time: the time when the root cell was created during samplingis_split: whether the root cell is splitsplit_axis: the direction in which the root cell is split, if anysplit_location: the location onsplit_axisat which the root cell is split, if anytree_left: the left child tree of the root cell, if anytree_right: the right child tree of the root cell, if any
MondrianForests.MondrianTree — MethodMondrianTree(d::Int, lambda::Float64)Sample a Mondrian tree $\mathcal{M}([0,1]^d, \lambda)$.
Examples
d = 2
lambda = 3.0
tree = MondrianTree(d, lambda);MondrianForests.MondrianTree — MethodMondrianTree(id::String, lambda::Float64, lower::NTuple{d,Float64},
upper::NTuple{d,Float64}, creation_time::Float64) where {d}Sample a Mondrian tree with a given id, lifetime, lower and upper cell coordinates, and creation time. To be used in internal construction methods.
Examples
id = ""
lambda = 3.0
lower = ntuple(i -> 0.2, d)
upper = ntuple(i -> 0.7, d)
creation_time = 0.0
tree = MondrianTree(id, lambda, lower, upper, creation_time)Base.show — MethodBase.show(tree::MondrianTree{d}) where {d}Show the recursive structure of a Mondrian tree.
MondrianForests.are_in_same_leaf — Methodare_in_same_leaf(x1::NTuple{d,Float64}, x2::NTuple{d,Float64}, tree::MondrianTree)Check if two points are in the same leaf cell of a Mondrian tree.
Examples
d = 2
tree = MondrianTree(d, 0.0)
x1 = ntuple(i -> 0.2, d)
x2 = ntuple(i -> 0.7, d)
are_in_same_leaf(x1, x2, tree)
# output
trueMondrianForests.count_leaves — Methodcount_leaves(tree::MondrianTree{d}) where {d}Count the leaves of a Mondrian tree.
Examples
tree = MondrianTree(2, 3.0)
count_leaves(tree)MondrianForests.get_center — Methodget_center(tree::MondrianTree{d}) where {d}Get the center point of the root cell of a Mondrian tree.
Examples
tree = MondrianTree(2, 3.0)
get_center(tree)
# output
(0.5, 0.5)MondrianForests.get_common_refinement — Methodget_common_refinement(trees::Vector{MondrianTree{d}}) where {d}Get the common refinement of several Mondrian trees, whose leaf cells are the intersections of all leaf cells in trees. Preserves the split times and returns a new equivalent MondrianTree.
Examples
trees = [MondrianTree(2, 3.0) for _ in 1:3]
refined_tree = get_common_refinement(trees)MondrianForests.get_leaf_containing — Methodget_leaf_containing(x::NTuple{d,Float64}, tree::MondrianTree{d}) where {d}Get the leaf of a Mondrian tree containing a point x.
Examples
d = 2
x = ntuple(i -> 0.2, d)
tree = MondrianTree(d, 3.0)
get_leaf_containing(x, tree)MondrianForests.get_leaves — Methodget_leaves(tree::MondrianTree{d}) where {d}Get a list of the leaves in a Mondrian tree.
Examples
tree = MondrianTree(2, 3.0)
get_leaves(tree)MondrianForests.get_subtrees — Methodget_subtrees(tree::MondrianTree{d}) where {d}Get a list of the subtrees contained in a Mondrian tree.
Examples
tree = MondrianTree(2, 3.0)
get_subtrees(tree)MondrianForests.get_volume — Methodget_volume(tree::MondrianTree{d}) where {d}Get the d-dimensional volume of the root cell of a Mondrian tree.
Examples
tree = MondrianTree(2, 3.0)
get_volume(tree)
# output
1.0MondrianForests.is_in — Methodis_in(x::NTuple{d,Float64}, tree::MondrianTree{d}) where {d}Check if a point x is contained in the root cell of a Mondrian tree.
Examples
x = ntuple(i -> 0.2, 2)
tree = MondrianTree(2, 3.0)
is_in(x, tree)
# output
trueMondrianForests.restrict — Methodrestrict(tree::MondrianTree{d}, time::Float64) where {d}Restrict a Mondrian tree to a stopping time.
Examples
tree = MondrianTree(2, 3.0)
restrict(tree, 2.0)Mondrian random forests
MondrianForests.MondrianForest — TypeA Mondrian random forest is determined by:
lambda: the non-negative lifetime parametern_trees: the number of Mondrian trees in the forestn_data: the number of data pointsn_evals: the number of evaluation pointsx_evals: the evaluation pointssignificance_level: the significance level for confidence intervalsX_data: the covariate dataY_data: the response datatrees: a list of the trees used in the forestmu_hat: the estimated regression functionsigma2_hat: the estimated residual variance functionSigma_hat: the estimated forest variance functionconfidence_band: a confidence band for the regression function
MondrianForests.MondrianForest — MethodMondrianForest(lambda::Float64, n_trees::Int, x_evals::Vector{NTuple{d,Float64}},
X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
estimate_var::Bool=false, significance_level::Float64=0.05) where {d}Fit a Mondrian random forest to data. If estimate_var is false, do not estimate the variance or construct confidence bands. This can speed up computation significantly.
Examples
lambda = 3.0
n_trees = 20
x_evals = [(0.5, 0.5), (0.2, 0.8)]
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
estimate_var = true
significance_level = 0.05
forest = MondrianForest(lambda, n_trees, x_evals, X_data, Y_data,
estimate_var, significance_level)Base.show — MethodBase.show(forest::MondrianForest{d}) where {d}Show a Mondrian random forest.
Debiased Mondrian random forests
MondrianForests.DebiasedMondrianForest — TypeA debiased Mondrian random forest is determined by:
lambda: the non-negative lifetime parametern_trees: the number of Mondrian trees in the forestn_data: the number of data pointsn_evals: the number of evaluation pointsx_evals: the evaluation pointsdebias_order: the order of debiasing to applysignificance_level: the significance level for confidence intervalsX_data: the covariate dataY_data: the response datadebias_scaling: the lifetime scaling valuesdebias_coeffs: the debiasing coefficientstrees: a list of the trees used in the forestmu_hat: the estimated regression functionsigma2_hat: the estimated residual variance functionSigma_hat: the estimated forest variance functionconfidence_band: a confidence band for the regression function
MondrianForests.DebiasedMondrianForest — MethodDebiasedMondrianForest(lambda::Float64, n_trees::Int, x_evals::Vector{NTuple{d,Float64}},
debias_order::Int,
X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
estimate_var::Bool=false,
significance_level::Float64=0.05) where {d}Fit a debiased Mondrian random forest to data. If estimate_var is false, do not estimate the variance or construct confidence bands. This can speed up computation significantly.
Examples
lambda = 3.0
n_trees = 20
x_evals = [(0.5, 0.5), (0.2, 0.8)]
debias_order = 1
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
estimate_var = true
significance_level = 0.05
forest = DebiasedMondrianForest(lambda, n_trees, x_evals, debias_order, X_data, Y_data,
estimate_var, significance_level)Base.show — MethodBase.show(forest::DebiasedMondrianForest{d}) where {d}Show a debiased Mondrian random forest.
Lifetime parameter selection
MondrianForests.select_lifetime_polynomial — Methodselect_lifetime_polynomial(X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
debias_order::Int=0) where {d}Select the lifetime parameter for a (debiased) Mondrian random forest using polynomial estimation.
Examples
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
debias_order = 0
lambda = select_lifetime_polynomial(X_data, Y_data, debias_order)MondrianForests.get_gcv — Methodget_gcv(lambda::Float64, n_trees::Int,
X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
debias_order::Int, n_subsample::Int) where {d}Get the generalized cross-validation score of a lifetime parameter lambda.
Examples
lambda = 3.0
n_trees = 20
debias_order = 0
n_subsample = 40
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
lambda = get_gcv(lambdas, n_trees, X_data, Y_data, debias_order, n_subsample)MondrianForests.select_lifetime_gcv — Methodselect_lifetime_gcv(lambdas::Vector{Float64}, n_trees::Int,
X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
debias_order::Int, n_subsample::Int) where {d}Select the lifetime parameter for a (debiased) Mondrian random forest using generalized cross-validation.
Examples
lambdas = collect(range(0.5, 10.0, step=0.5))
n_trees = 20
debias_order = 0
n_subsample = 40
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
lambda = select_lifetime_gcv(lambdas, n_trees, X_data, Y_data, debias_order, n_subsample)Data generation
MondrianForests.generate_data — Methodgenerate_data(n::Int, X_dist::Distribution, eps_dist::Distribution,
mu::Function, sigma2::Function)Generate sample data for Mondrian forest estimation.
Draws n independent samples from $Y = \mu(X) + \sigma(X) \varepsilon$, with $X \sim$ X_dist and $\varepsilon \sim$ eps_dist.
Examples
n = 20
X_dist = product_distribution([Uniform(0, 1) for _ in 1:2])
eps_dist = Uniform(-1, 1)
mu = (x -> x[1]^2)
sigma2 = (x -> 1 + x[2]^4)
generate_data(n, X_dist, eps_dist, mu, sigma2)MondrianForests.generate_uniform_data_normal_errors — Methodgenerate_uniform_data_normal_errors(d::Int, n::Int)Generate uniform sample data with normal errors for Mondrian forest estimation.
Draws n independent samples from $Y = \varepsilon$, with $X \sim \mathcal{U}[0, 1]$ and $\varepsilon \sim \mathcal{N}(0, 1)$.
Examples
d = 3
n = 20
generate_uniform_data_normal_errors(d, n)MondrianForests.generate_uniform_data_uniform_errors — Methodgenerate_uniform_data_uniform_errors(d::Int, n::Int)Generate uniform sample data with uniform errors for Mondrian forest estimation.
Draws n independent samples from $Y = \varepsilon$, with $X \sim \mathcal{U}[0, 1]$ and $\varepsilon \sim \mathcal{U}\big[-\sqrt 3, \sqrt 3\big]$.
Examples
d = 3
n = 20
generate_uniform_data_uniform_errors(d, n)