Documentation

Mondrian trees

MondrianForests.MondrianTreeType

A Mondrian tree is determined by:

  • id: a string to identify the tree
  • lambda: the non-negative lifetime parameter
  • lower: the lower coordinate of the root cell
  • upper: the upper coordinate of the root cell
  • creation_time: the time when the root cell was created during sampling
  • is_split: whether the root cell is split
  • split_axis: the direction in which the root cell is split, if any
  • split_location: the location on split_axis at which the root cell is split, if any
  • tree_left: the left child tree of the root cell, if any
  • tree_right: the right child tree of the root cell, if any
source
MondrianForests.MondrianTreeMethod
MondrianTree(d::Int, lambda::Float64)

Sample a Mondrian tree $\mathcal{M}([0,1]^d, \lambda)$.

Examples

d = 2
lambda = 3.0
tree = MondrianTree(d, lambda);
source
MondrianForests.MondrianTreeMethod
MondrianTree(id::String, lambda::Float64, lower::NTuple{d,Float64},
             upper::NTuple{d,Float64}, creation_time::Float64) where {d}

Sample a Mondrian tree with a given id, lifetime, lower and upper cell coordinates, and creation time. To be used in internal construction methods.

Examples

id = ""
lambda = 3.0
lower = ntuple(i -> 0.2, d)
upper = ntuple(i -> 0.7, d)
creation_time = 0.0
tree = MondrianTree(id, lambda, lower, upper, creation_time)
source
Base.showMethod
Base.show(tree::MondrianTree{d}) where {d}

Show the recursive structure of a Mondrian tree.

source
MondrianForests.are_in_same_leafMethod
are_in_same_leaf(x1::NTuple{d,Float64}, x2::NTuple{d,Float64}, tree::MondrianTree)

Check if two points are in the same leaf cell of a Mondrian tree.

Examples

d = 2
tree = MondrianTree(d, 0.0)
x1 = ntuple(i -> 0.2, d)
x2 = ntuple(i -> 0.7, d)
are_in_same_leaf(x1, x2, tree)

# output
true
source
MondrianForests.count_leavesMethod
count_leaves(tree::MondrianTree{d}) where {d}

Count the leaves of a Mondrian tree.

Examples

tree = MondrianTree(2, 3.0)
count_leaves(tree)
source
MondrianForests.get_centerMethod
get_center(tree::MondrianTree{d}) where {d}

Get the center point of the root cell of a Mondrian tree.

Examples

tree = MondrianTree(2, 3.0)
get_center(tree)

# output
(0.5, 0.5)
source
MondrianForests.get_common_refinementMethod
get_common_refinement(trees::Vector{MondrianTree{d}}) where {d}

Get the common refinement of several Mondrian trees, whose leaf cells are the intersections of all leaf cells in trees. Preserves the split times and returns a new equivalent MondrianTree.

Examples

trees = [MondrianTree(2, 3.0) for _ in 1:3]
refined_tree = get_common_refinement(trees)
source
MondrianForests.get_leaf_containingMethod
get_leaf_containing(x::NTuple{d,Float64}, tree::MondrianTree{d}) where {d}

Get the leaf of a Mondrian tree containing a point x.

Examples

d = 2
x = ntuple(i -> 0.2, d)
tree = MondrianTree(d, 3.0)
get_leaf_containing(x, tree)
source
MondrianForests.get_leavesMethod
get_leaves(tree::MondrianTree{d}) where {d}

Get a list of the leaves in a Mondrian tree.

Examples

tree = MondrianTree(2, 3.0)
get_leaves(tree)
source
MondrianForests.get_subtreesMethod
get_subtrees(tree::MondrianTree{d}) where {d}

Get a list of the subtrees contained in a Mondrian tree.

Examples

tree = MondrianTree(2, 3.0)
get_subtrees(tree)
source
MondrianForests.get_volumeMethod
get_volume(tree::MondrianTree{d}) where {d}

Get the d-dimensional volume of the root cell of a Mondrian tree.

Examples

tree = MondrianTree(2, 3.0)
get_volume(tree)

# output
1.0
source
MondrianForests.is_inMethod
is_in(x::NTuple{d,Float64}, tree::MondrianTree{d}) where {d}

Check if a point x is contained in the root cell of a Mondrian tree.

Examples

x = ntuple(i -> 0.2, 2)
tree = MondrianTree(2, 3.0)
is_in(x, tree)

# output
true
source
MondrianForests.restrictMethod
restrict(tree::MondrianTree{d}, time::Float64) where {d}

Restrict a Mondrian tree to a stopping time.

Examples

tree = MondrianTree(2, 3.0)
restrict(tree, 2.0)
source

Mondrian random forests

MondrianForests.MondrianForestType

A Mondrian random forest is determined by:

  • lambda: the non-negative lifetime parameter
  • n_trees: the number of Mondrian trees in the forest
  • n_data: the number of data points
  • n_evals: the number of evaluation points
  • x_evals: the evaluation points
  • significance_level: the significance level for confidence intervals
  • X_data: the covariate data
  • Y_data: the response data
  • trees: a list of the trees used in the forest
  • mu_hat: the estimated regression function
  • sigma2_hat: the estimated residual variance function
  • Sigma_hat: the estimated forest variance function
  • confidence_band: a confidence band for the regression function
source
MondrianForests.MondrianForestMethod
MondrianForest(lambda::Float64, n_trees::Int, x_evals::Vector{NTuple{d,Float64}},
               X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
               estimate_var::Bool=false, significance_level::Float64=0.05) where {d}

Fit a Mondrian random forest to data. If estimate_var is false, do not estimate the variance or construct confidence bands. This can speed up computation significantly.

Examples

lambda = 3.0
n_trees = 20
x_evals = [(0.5, 0.5), (0.2, 0.8)]
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
estimate_var = true
significance_level = 0.05
forest = MondrianForest(lambda, n_trees, x_evals, X_data, Y_data,
                        estimate_var, significance_level)
source
Base.showMethod
Base.show(forest::MondrianForest{d}) where {d}

Show a Mondrian random forest.

source

Debiased Mondrian random forests

MondrianForests.DebiasedMondrianForestType

A debiased Mondrian random forest is determined by:

  • lambda: the non-negative lifetime parameter
  • n_trees: the number of Mondrian trees in the forest
  • n_data: the number of data points
  • n_evals: the number of evaluation points
  • x_evals: the evaluation points
  • debias_order: the order of debiasing to apply
  • significance_level: the significance level for confidence intervals
  • X_data: the covariate data
  • Y_data: the response data
  • debias_scaling: the lifetime scaling values
  • debias_coeffs: the debiasing coefficients
  • trees: a list of the trees used in the forest
  • mu_hat: the estimated regression function
  • sigma2_hat: the estimated residual variance function
  • Sigma_hat: the estimated forest variance function
  • confidence_band: a confidence band for the regression function
source
MondrianForests.DebiasedMondrianForestMethod
DebiasedMondrianForest(lambda::Float64, n_trees::Int, x_evals::Vector{NTuple{d,Float64}},
                       debias_order::Int,
                       X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
                       estimate_var::Bool=false,
                       significance_level::Float64=0.05) where {d}

Fit a debiased Mondrian random forest to data. If estimate_var is false, do not estimate the variance or construct confidence bands. This can speed up computation significantly.

Examples

lambda = 3.0
n_trees = 20
x_evals = [(0.5, 0.5), (0.2, 0.8)]
debias_order = 1
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
estimate_var = true
significance_level = 0.05
forest = DebiasedMondrianForest(lambda, n_trees, x_evals, debias_order, X_data, Y_data,
                                estimate_var, significance_level)
source
Base.showMethod
Base.show(forest::DebiasedMondrianForest{d}) where {d}

Show a debiased Mondrian random forest.

source

Lifetime parameter selection

MondrianForests.select_lifetime_polynomialMethod
select_lifetime_polynomial(X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
                           debias_order::Int=0) where {d}

Select the lifetime parameter for a (debiased) Mondrian random forest using polynomial estimation.

Examples

data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
debias_order = 0
lambda = select_lifetime_polynomial(X_data, Y_data, debias_order)
source
MondrianForests.get_gcvMethod
get_gcv(lambda::Float64, n_trees::Int,
        X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
        debias_order::Int, n_subsample::Int) where {d}

Get the generalized cross-validation score of a lifetime parameter lambda.

Examples

lambda = 3.0
n_trees = 20
debias_order = 0
n_subsample = 40
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
lambda = get_gcv(lambdas, n_trees, X_data, Y_data, debias_order, n_subsample)
source
MondrianForests.select_lifetime_gcvMethod
select_lifetime_gcv(lambdas::Vector{Float64}, n_trees::Int,
                    X_data::Vector{NTuple{d,Float64}}, Y_data::Vector{Float64},
                    debias_order::Int, n_subsample::Int) where {d}

Select the lifetime parameter for a (debiased) Mondrian random forest using generalized cross-validation.

Examples

lambdas = collect(range(0.5, 10.0, step=0.5))
n_trees = 20
debias_order = 0
n_subsample = 40
data = generate_uniform_data_uniform_errors(2, 50)
X_data = data["X"]
Y_data= data["Y"]
lambda = select_lifetime_gcv(lambdas, n_trees, X_data, Y_data, debias_order, n_subsample)
source

Data generation

MondrianForests.generate_dataMethod
generate_data(n::Int, X_dist::Distribution, eps_dist::Distribution,
              mu::Function, sigma2::Function)

Generate sample data for Mondrian forest estimation.

Draws n independent samples from $Y = \mu(X) + \sigma(X) \varepsilon$, with $X \sim$ X_dist and $\varepsilon \sim$ eps_dist.

Examples

n = 20
X_dist = product_distribution([Uniform(0, 1) for _ in 1:2])
eps_dist = Uniform(-1, 1)
mu = (x -> x[1]^2)
sigma2 = (x -> 1 + x[2]^4)
generate_data(n, X_dist, eps_dist, mu, sigma2)
source
MondrianForests.generate_uniform_data_normal_errorsMethod
generate_uniform_data_normal_errors(d::Int, n::Int)

Generate uniform sample data with normal errors for Mondrian forest estimation.

Draws n independent samples from $Y = \varepsilon$, with $X \sim \mathcal{U}[0, 1]$ and $\varepsilon \sim \mathcal{N}(0, 1)$.

Examples

d = 3
n = 20
generate_uniform_data_normal_errors(d, n)
source
MondrianForests.generate_uniform_data_uniform_errorsMethod
generate_uniform_data_uniform_errors(d::Int, n::Int)

Generate uniform sample data with uniform errors for Mondrian forest estimation.

Draws n independent samples from $Y = \varepsilon$, with $X \sim \mathcal{U}[0, 1]$ and $\varepsilon \sim \mathcal{U}\big[-\sqrt 3, \sqrt 3\big]$.

Examples

d = 3
n = 20
generate_uniform_data_uniform_errors(d, n)
source