Tuesday, July 29, 2014

Star Plot



A star plot is a way of displaying multivariate data in which each star represents one observation. Every star is drawn with the same set of variables, one spoke per variable, and the length of each spoke shows where that observation falls within the variable's range. Because every star uses the same variables and the same ranges, a large number of objects can be compared across many variables at once. The image above is an example of a star plot showing sixteen cars, each drawn as a star with the same nine variables, so the particular advantages and disadvantages of each car can be compared at a glance.
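To make the idea of spokes concrete, here is a minimal sketch of how the points of one star could be computed. The function name `star_vertices` and the sample numbers are made up for illustration, and scaling each variable to the range 0 to 1 is just one common choice.

```python
import math

def star_vertices(values, minimums, maximums):
    """Return one (x, y) spoke endpoint per variable for a single star."""
    n = len(values)
    points = []
    for i, (v, lo, hi) in enumerate(zip(values, minimums, maximums)):
        # Scale the value to 0..1 so every spoke uses the same length scale.
        r = (v - lo) / (hi - lo) if hi != lo else 0.0
        # Spokes are spaced evenly around the center of the star.
        angle = 2 * math.pi * i / n
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

# Hypothetical observation with four variables, each ranging from 0 to 10.
example_star = star_vertices([10, 5, 0, 5], [0, 0, 0, 0], [10, 10, 10, 10])
```

Joining the returned points in order gives the outline of the star for that observation.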

Correlation Matrix


(Table 8.1 near bottom of page)

A correlation matrix shows the correlation between every pair of variables in a data set. It is basically a table of correlation coefficients: the entry in a given row and column is the coefficient for that pair of variables. The coefficients can be computed directly from the original variables, but I personally do not know how to explain the calculation. The image above is an example of a correlation matrix of economic variables: however complex each variable is on its own, its relationship with every other variable is reduced to a single number in the table.
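For anyone curious about the calculation, one widely used coefficient is Pearson's, and a table of them can be sketched in a few lines. This is only an illustration: the source table may have used a different coefficient, and the variable names and numbers below are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Build a table (dict of dicts) with one coefficient per pair of variables."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up numbers, for illustration only.
data = {
    "income": [1.0, 2.0, 3.0, 4.0],
    "spending": [2.0, 4.0, 6.0, 8.0],
    "savings": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(data)
```

In this toy data, spending rises exactly in step with income and savings falls exactly in step, so those coefficients come out at the two extremes of the scale, 1 and -1.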

Similarity Matrix

(bottom page)

A similarity matrix is a matrix of scores that represent how similar each pair of data points is. Similarity matrices are used in clustering to find groups of data points that belong together, and in sequence alignment to score how well sequences of DNA line up. Because every object is compared with every other object, the whole pattern of similarity can be read from a single table. The image shown above is an example of a similarity matrix in which every individual, from Charley to Ron, is compared with every other on their baseball skill ratings.
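As a rough sketch of how such a table could be filled in, the code below scores each pair of players by turning the distance between their ratings into a similarity. The scoring rule, 1 / (1 + distance), is only one of many possible choices and is not necessarily the one used in the source table; the player names and ratings are made up.

```python
import math

def similarity(a, b):
    """Similarity between two rating lists: 1.0 when identical, smaller when further apart."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)

def similarity_matrix(ratings):
    """Compare every individual with every other one (and with themselves)."""
    names = list(ratings)
    return {p: {q: similarity(ratings[p], ratings[q]) for q in names} for p in names}

# Hypothetical skill ratings (for example batting, fielding, running).
ratings = {
    "player_a": [7, 8, 6],
    "player_b": [7, 8, 6],
    "player_c": [3, 4, 9],
}
sim = similarity_matrix(ratings)
```

The resulting table is symmetric, since comparing A with B gives the same score as comparing B with A, and every individual scores 1.0 against themselves.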

Stem and Leaf Plot



A stem and leaf plot is a device for presenting quantitative data in a graphical format so that the shape of the distribution can be seen. Stem and leaf plots retain the original data to at least two significant digits and put the data in order. Because every data point remains visible, the median, mode and range can be read off quickly, and the mean can still be calculated from the plot. The image shown above is an example of a stem and leaf plot in which a large, disorganized set of numbers (boxes bought) is put in order.
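Building one is simple enough to sketch in code. The version below assumes non-negative whole numbers, uses the tens as the stem and the last digit as the leaf, and runs on made-up counts of boxes bought.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by stem (tens) and keep the last digit as the leaf."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

def render(plot):
    """Format the plot as text, one stem per line."""
    lines = []
    for stem, leaves in sorted(plot.items()):
        lines.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
    return "\n".join(lines)

# Made-up data: number of boxes bought.
boxes = [12, 15, 21, 23, 23, 30, 8]
plot = stem_and_leaf(boxes)
text = render(plot)
```

For this data the stem 2 row holds the leaves 1, 3 and 3, which stand for the original values 21, 23 and 23.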

Box Plot

(bottom of source page)

A box plot is a graphical way of depicting groups of numerical data through their quartiles: the box spans the lower quartile to the upper quartile, with a line inside it marking the median. A box plot may also have lines extending from the box, the "whiskers", which indicate variability outside the upper and lower quartiles. Box plots display variation in samples of a statistical population without making any assumptions about the underlying statistical distribution. The spacing between the different parts of the box indicates the degree of dispersion and skew in the data, and outliers can be plotted as individual points. Box plots can be drawn either horizontally or vertically. The image above is an example of a box plot of daily mean temperature in Fahrenheit for November 1940 in Madison, Wisconsin.
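The numbers behind a box plot can be sketched as follows. This uses Python's `statistics.quantiles` with the "inclusive" method and treats anything more than 1.5 times the interquartile range beyond the box as an outlier. Both are common conventions rather than the only ones, and the data is made up.

```python
import statistics

def box_plot_summary(values):
    """Compute the quartiles, whisker ends and outliers for one box."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Made-up data with one value far from the rest.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

Here the box runs from 3 to 7 with the median at 5, the whiskers reach 1 and 8, and 30 is flagged as an outlier.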

Histogram



A histogram is a graphical representation of how frequently values occur within a distribution of data. The values are grouped into ranges, and the height of each bar shows how many values fall within that range. Histograms are useful for looking at a data set as a whole to see where values concentrate, to find patterns, and to expose outliers that may otherwise be difficult to pinpoint. The image shown above is an example of a histogram showing that the most common length of a Greek tragedy is between 7,000 and 8,000 words.
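The counting behind a histogram can be sketched like this. The function name, the bin width of 1,000 and the word counts are all made up for illustration and are not the data from the image.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall in each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up word counts for a handful of plays.
lengths = [6500, 7200, 7400, 7900, 8100, 9300]
bins = histogram(lengths, 1000)
most_common_bin = max(bins, key=bins.get)
```

With these numbers the bin starting at 7,000 holds three values, more than any other, so it would be the tallest bar.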

Parallel Coordinate Graph

(mid-top of source page)

A parallel coordinate graph is a common way of visualizing high-dimensional geometry and multivariate data in a single, much simpler graph. Each variable gets its own axis, the axes are drawn parallel to one another, and each observation is drawn as a line connecting its values across the axes. Parallel coordinate graphs are difficult to construct well because the result depends on the scaling of the axes, the ordering of the variables, and the rotation of the axes. They are also easy to draw or read inaccurately, which may be why they are not widely used outside research fields, such as product testing, that rely on them often. A typical use is to compare two subjects on a set of characteristics over multiple tests, with each subject drawn in a different color. The image shown above is an example of a parallel coordinate graph that compares two vehicles over a large sample of tests.
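The scaling problem mentioned above is easier to see in code. A minimal sketch, assuming simple min-max scaling so that every axis runs from 0 to 1 (other scalings are possible), with a made-up function name and made-up test results:

```python
def scale_axes(rows):
    """Rescale each variable (column) to 0..1 so all axes share one height."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A variable with no spread is placed at the middle of its axis.
            (v - lo) / (hi - lo) if hi != lo else 0.5
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled

# Two hypothetical test results, each with three variables.
tests = [
    [10, 200, 3.0],
    [20, 100, 3.0],
]
lines = scale_axes(tests)
```

Each inner list gives the heights at which one line crosses the axes. Choosing a different scaling or reordering the columns changes how the lines cross, which is part of why these graphs are easy to misread.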