This post is actually linked (through an "iframe" tag) to a Google Doc that I keep updating as I work on projects that use new mining algorithms, so it is a work in progress. I realize that a blog is probably not the best way to publish live text, but it is the easiest one for me.
This first part gathers basic topics from statistics that are hard to classify under a precise subject. It should mostly serve as a refresher for people in this domain.
Definitions:
instances = the data objects observed and analysed (sometimes referred to as objects, data points...)
variables = the characteristics measured (for continuous variables) or observed (for categorical variables) on each instance
Notation:
N = number of data objects (sample size)
X denotes a generic input variable. When X is a vector, its j-th component variable is written with a subscript: Xj
x denotes an observed instance. When we have p variables, x1 .. xp are the real values of variables 1 .. p measured on that particular object or instance.
xk(i) is the measurement of variable Xk on the i-th data object, where i runs over 1 .. N.
x (in bold) is the vector of the N observations of a single variable.
X (capital, in bold) is the N x p matrix containing the N input p-vectors x(1) .. x(N), one per row.
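To make the notation concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the toy values are invented, and the helper names (instance, measurement, variable_column) are hypothetical, not part of any library. Indices are 1-based in the helpers to match the notation.

```python
# Toy data set; the values are invented purely for illustration.
# N = 4 instances (rows), p = 3 variables (columns).
X = [
    [5.1, 3.5, 1.4],  # x(1)
    [4.9, 3.0, 1.4],  # x(2)
    [6.2, 2.9, 4.3],  # x(3)
    [5.9, 3.0, 5.1],  # x(4)
]

N = len(X)     # sample size
p = len(X[0])  # number of variables


def instance(X, i):
    """Return the p-vector x(i); i is 1-based, as in the notation."""
    return X[i - 1]


def measurement(X, k, i):
    """Return xk(i): the value of variable Xk on the i-th instance (both 1-based)."""
    return X[i - 1][k - 1]


def variable_column(X, k):
    """Return the vector of the N observations of variable Xk (k is 1-based)."""
    return [row[k - 1] for row in X]


print(N, p)                   # 4 3
print(instance(X, 2))         # [4.9, 3.0, 1.4]
print(measurement(X, 3, 4))   # 5.1
print(variable_column(X, 2))  # [3.5, 3.0, 2.9, 3.0]
```

Reading a row gives one instance across all p variables, while reading a column gives one variable across all N instances.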
