Package ‘rare’
October 14, 2022
Type Package
Title Linear Model with Tree-Based Lasso Regularization for Rare
Features
Version 0.1.1
Author Xiaohan Yan [aut, cre], Jacob Bien [aut, cre]
Maintainer Xiaohan Yan <*****************>
Description Implementation of an alternating direction method of multipliers algorithm for fitting a linear model with tree-based lasso regularization,
which is proposed in Algorithm 1 of Yan and Bien (2018) <arXiv:1803.06675>.
The package allows efficient model fitting on the entire 2-dimensional
regularization path for large datasets. The complete set of functions
also makes the entire process of tuning regularization parameters and
visualizing results hassle-free.
Depends R (>= 3.2.1)
Imports Matrix, glmnet, Rcpp
Suggests knitr, dendextend, rmarkdown
License GPL-3
Encoding UTF-8
LazyData true
VignetteBuilder knitr
RoxygenNote 6.1.0
LinkingTo Rcpp, RcppArmadillo
URL github/yanxht/rare
BugReports github/yanxht/rare/issues
NeedsCompilation yes
Repository CRAN
Date/Publication 2018-08-03 16:50:09 UTC
R topics documented:
rare-package
data.dtm
data.hc
data.rating
find.leaves
group.plot
rarefit
rarefit.cv
rarefit.predict
tree.matrix
Index
rare-package Model path for tree-based lasso framework for selecting rare features
Description
The package fits the linear model with tree-based lasso regularization proposed in Yan and Bien (2018)
using the alternating direction method of multipliers (ADMM). The ADMM algorithm is proposed in Algorithm 1 of the same paper. The package also provides tools for tuning regularization parameters, making predictions from the fitted model, and visualizing recovered groups of the covariates in a dendrogram.
Details
Its main functions are rarefit, rarefit.cv, rarefit.predict and group.plot.
Author(s)
Xiaohan Yan <*****************>, Jacob Bien
References
Yan, X. and Bien, J. (2018). Rare Feature Selection in High Dimensions, arXiv:1803.06675.
data.dtm Document-term matrix for adjectives in TripAdvisor hotel reviews
Description
A 500-by-200 document-term matrix for 200 adjectives appearing in 500 TripAdvisor reviews. The document-term matrix is in sparse format.
Usage
data.dtm
Format
An object of class dgCMatrix with 500 rows and 200 columns.
See Also
data.rating, data.hc.
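To make the sparse format concrete, the following minimal sketch builds a tiny document-term matrix of class dgCMatrix with the Matrix package. The adjectives and counts are made up for illustration and are not taken from data.dtm.

```r
library(Matrix)

# Toy corpus: 3 documents and 4 hypothetical adjectives (not the package data)
terms <- c("clean", "friendly", "noisy", "great")

# Triplets (document, term, count) for the nonzero entries only
doc   <- c(1, 1, 2, 3, 3)
term  <- c(1, 4, 3, 2, 4)
count <- c(2, 1, 1, 1, 3)

dtm <- sparseMatrix(i = doc, j = term, x = count,
                    dims = c(3, length(terms)),
                    dimnames = list(NULL, terms))

class(dtm)      # column-compressed sparse matrix class
dim(dtm)        # documents by terms
colSums(dtm)    # total count of each adjective across documents
```

Only the nonzero counts are stored, which is what keeps a large document-term matrix compact when most adjectives are absent from most reviews.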
data.hc Hierarchical clustering tree for adjectives in TripAdvisor data set
Description
An hclust tree for the 200 adjectives appearing in the TripAdvisor reviews. The tree was generated with 100-dimensional word embeddings pre-trained by GloVe (Pennington et al., 2014) on the Gigaword 5 and Wikipedia 2014 corpora for the adjectives.
Usage
data.hc
Format
An object of class hclust of length 7.
Source
Embeddings available at nlp.stanford.edu/data/glove.6B.zip
References
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
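As a rough illustration of how a tree like data.hc can arise, the sketch below clusters random stand-in vectors with base R. The random "embeddings", the Euclidean distance, and the average linkage are all assumptions made for this example; the manual does not state which distance or linkage was used for data.hc.

```r
set.seed(1)

# Hypothetical stand-in for word embeddings: 6 "adjectives" in 100 dimensions
emb <- matrix(rnorm(6 * 100), nrow = 6,
              dimnames = list(paste0("adj", 1:6), NULL))

# Distance and linkage are illustrative choices, not the authors' documented ones
hc <- hclust(dist(emb), method = "average")

class(hc)       # "hclust"
length(hc)      # number of list components in an hclust object
names(hc)       # merge, height, order, labels, call, method, dist.method
dim(hc$merge)   # (p - 1) rows and 2 columns for p leaves
```

The length of 7 reported under Format corresponds to these seven list components, not to the number of leaves.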
data.rating TripAdvisor hotel review ratings
Description
A length-500 vector of TripAdvisor review ratings on a scale of 1 to 5.
Usage
data.rating
Format
An object of class integer of length 500.
Source
TripAdvisor Data Set used in www.cs.virginia.edu/~hw5x/paper/rp166f-wang.pdf
find.leaves Find all descendant leaves of a node in an hclust tree
Description
The function recursively finds all leaves that are descendants of a node in an hclust tree.
Usage
find.leaves(ind, merge)
Arguments
ind Index of the tree node. For an hclust tree of p leaves, -j denotes the j-th leaf and k denotes the interior node formed at the k-th merging in constructing the tree. The range of ind is {-1, ..., -p, 1, ..., p-1}, where p-1 is the number of interior nodes.
merge A (p-1)-by-2 matrix that encodes the order of mergings in constructing the tree. merge uses the same notation for nodes and mergings as in an hclust object. See hclust for details.
Value
Returns a sequence of indices for descendant leaves in the leaf set {1, ..., p}. Unlike the notation used in ind, positive integers denote leaves here.
Examples
## Not run:
hc <- hclust(dist(USArrests), "ave")
# Descendant leaves of the 10th leaf (should be itself)
find.leaves(-10, hc$merge)
# Descendant leaves of the 10th interior node
find.leaves(10, hc$merge)
# Descendant leaves of the root (should be all leaves)
ind_root <- nrow(hc$merge)
all.equal(find.leaves(ind_root, hc$merge), hc$order)
## End(Not run)
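The recursion described for find.leaves can be sketched in base R as follows. The function find_leaves_sketch is a hypothetical re-implementation written for illustration only; it is not the package's source code, and it relies only on the hclust merge convention (negative entries are leaves, positive entries refer to earlier mergings).

```r
# Hypothetical re-implementation for illustration; not the package's source code
find_leaves_sketch <- function(ind, merge) {
  if (ind < 0) {
    return(-ind)   # a leaf: report it as a positive index
  }
  # an interior node: collect the leaves under both children of the ind-th merging
  c(find_leaves_sketch(merge[ind, 1], merge),
    find_leaves_sketch(merge[ind, 2], merge))
}

hc <- hclust(dist(USArrests), "ave")
p <- nrow(hc$merge) + 1                     # number of leaves

find_leaves_sketch(-10, hc$merge)           # the 10th leaf itself
root_leaves <- find_leaves_sketch(p - 1, hc$merge)
setequal(root_leaves, seq_len(p))           # the root covers every leaf
```

Because each merging row has exactly two children, the recursion visits every node under ind once, so its cost is linear in the size of the subtree.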
group.plot Visualize groups by coloring branches and leaves of an hclust tree
Description
The function plots an hclust tree with branches and leaves colored based on group membership.
The groups span the covariate indices {1, ..., nvars}. Covariates from the same group share an equal coefficient (beta), and sibling groups have different coefficients. The function determines groups based on the sparsity in gamma. In an hclust tree with beta[i] on the i-th leaf, the branch and leaf are colored blue, red or gray according to whether beta[i] is positive, negative or zero, respectively.
The larger the magnitude of beta[i], the darker the color. Branches and leaves from the same group therefore have the same color.
Usage
group.plot(beta, gamma, A, hc, nbreaks = 20)
Arguments
beta Length-nvars vector of covariate coefficients.
gamma Length-nnodes vector of latent variable coefficients. Note that rarefit returns NA as the gamma value when alpha is zero, in which case the problem becomes the lasso on beta.
A nvars-by-nnodes binary matrix encoding ancestor-descendant relationships between leaves and nodes in the tree.
hc An hclust tree of nvars leaves where each leaf corresponds to a covariate.
nbreaks Number of breaks in binning beta elements (the positive part and the negative part are binned separately). Each bin is associated with a color based on the magnitude and positivity/negativity of beta elements in the bin.
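To illustrate what the A argument encodes, the sketch below builds a binary leaf-by-node matrix from an hclust tree in base R. The helper build_A_sketch and its column layout (leaves first, then interior nodes in merging order, so nnodes = 2p - 1) are assumptions made for this example; the package provides tree.matrix for this purpose, and its exact layout may differ. The last lines assume the parameterization beta = A %*% gamma from Yan and Bien (2018) to show how sparsity in gamma produces groups with equal coefficients.

```r
# Hypothetical helper; assumed layout: columns 1..p are leaves,
# columns (p + 1)..(2p - 1) are interior nodes in merging order
build_A_sketch <- function(hc) {
  merge <- hc$merge
  p <- nrow(merge) + 1
  A <- matrix(0L, nrow = p, ncol = 2 * p - 1)
  A[cbind(1:p, 1:p)] <- 1L                  # each leaf is its own node
  leaves <- vector("list", p - 1)
  for (k in seq_len(p - 1)) {
    # negative children are leaves; positive children are earlier mergings
    leaves[[k]] <- unlist(lapply(merge[k, ], function(i) {
      if (i < 0) -i else leaves[[i]]
    }))
    A[leaves[[k]], p + k] <- 1L             # mark descendants of node k
  }
  A
}

hc <- hclust(dist(USArrests[1:6, ]), "ave")
A <- build_A_sketch(hc)
dim(A)                                      # nvars by nnodes

# A gamma that is nonzero only at the root puts all leaves in one group
gamma <- numeric(ncol(A))
gamma[ncol(A)] <- 0.5
beta <- drop(A %*% gamma)
beta
```

Under this assumed parameterization, each beta[i] is the sum of the gamma entries on the path from the root to leaf i, so zeroing the gamma entries below a node forces all leaves under that node to share one coefficient.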
