Overview
This page builds on the Principal Component Analysis (PCA) page and the Exploratory Factor Analysis (EFA) page on this website. It is intended to serve as a guide for researchers considering PCA or EFA as a data reduction technique. The resources described below are intended to supplement existing resources on the method-specific pages.
Description
Theoretical/statistical foundations and comparisons
These two publications compare the two methods and have opposing views on whether EFA and PCA should be used on the same dataset.
Principal Component Analysis vs. Exploratory Factor Analysis. D. Suhr. SAS Working Paper 203-30: http://www2.sas.com/proceedings/sugi30/203-30.pdf
“Determine the appropriate statistical analysis to answer research questions a priori... It is not appropriate to perform PCA and EFA on your data. PCA involves correlated variables with the goal of reducing the number of variables and explaining the same amount of variance with fewer variables (principal components). EFA estimates factors, underlying constructs that cannot be measured directly.”
Jolliffe IT, Morgan BJ. Principal component analysis and exploratory factor analysis. Statistical Methods in Medical Research 1992;1:69-95.
“Despite their different formulations and objectives, it can be instructive to look at the results of both techniques on the same dataset. Each technique offers different insights into the structure of the data, with PCA focusing on consideration of diagonal elements and factor analysis of off-diagonal elements of the covariance matrix, both of which can be useful.”
There are a number of other books and resources cited on the Advanced Epidemiology page for each method. Many resources cover both techniques but do not necessarily compare and contrast them. The online resources at the end of this page provide introductory material and a comparison of the two methods.
The general purpose of this guide is to provide resources for a researcher to navigate through the following decision tree nodes and to share literature that has compared the use of PCA, EFA, and other data reduction techniques.
Readings
Methodological Articles
The following articles are reviews of the use of PCA, EFA, and other data reduction techniques in the health and public health literature.
This article is more theoretical and reviews the underlying theory for PCA, EFA (and their connection) along with structural equation modeling and MIMIC using welfare and poverty indices as a case study.
Krishnakumar J, Nagar AL. On Exact Statistical Properties of Multidimensional Indices Based on Principal Components, Factor Analysis, MIMIC, and Structural Equation Models. Social Indicators Research 2008;86:481-496.
Systematic review of the classification systems of major depressive disorders and the statistical methods used to identify symptom dimensions or latent classes. Based on 20 articles with 34 analyses, the authors found an equal number of factor analyses and PCAs performed, often with the same scales and measures or on the same sample.
van Loo HM, de Jonge P, Romeijn JW, Kessler RC, Schoevers RA. Data-driven subtypes of major depressive disorders: a systematic review. BMC Med 2012;10:156.
This article reviews 47 studies using PCA and compares methods, challenges, and pitfalls of using PCA for composite health measures. The article suggests replicating the analysis across samples and using complementary methods such as factor analysis.
Coste J, Bouee S, Ecosse E, Leplege A, Pouchot J. Methodological issues in determining the dimensionality of composite health measures using principal component analysis: case illustration and suggestions for practice. Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care, and Rehabilitation 2005;14:641-54.
This article describes common errors and pitfalls in the use of EFA, based on a review of 60 studies in psychological journals. It provides useful suggestions for improving practices related to the use of EFA and its reporting in journals.
Henson RK, Roberts JK. Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement 2006;66:393-416.
This article provides an overview of the use of EFAs and key decisions when conducting EFAs (review of 28 articles from top nursing journals). Results reported that PCA was used more frequently than EFA (61% vs. 39%), although no article explained why PCA was preferred to EFA. The document outlines practical recommendations for addressing flawed and outdated "rules of thumb" for using PCA and EFA.
Gaskin CJ, Happell B. On exploratory factor analysis: a review of recent evidence, an assessment of current practice, and recommendations for future use. International Journal of Nursing Studies 2014;51:511-21.
Application Articles
PCA
Nutritional epidemiology comparison of reduced rank regression, partial least squares regression, and PCA.
DiBello JR, Kraft P, McGarvey ST, Goldberg R, Campos H, Baylin A. Comparison of 3 methods for identifying dietary patterns associated with disease risk. American Journal of Epidemiology 2008;168:1433-43.
Example from social epidemiology. The authors concluded that using a single variable can be just as good as developing principal components.
Hurtado D, Kawachi I, Sudarsky J. Social capital and self-rated health in Colombia: the good, the bad and the ugly. Social Science & Medicine 2011;72:584-90.
Analysis of the built environment and development of the neighborhood deprivation index using PCA.
Messer LC, Laraia BA, Kaufman JS, et al. The development of a standardized neighborhood deprivation index. Journal of Urban Health: Bulletin of the New York Academy of Medicine 2006;83:1041-62.
EFA
Nutritional epidemiology study of dietary patterns and their association with laryngeal cancer. It compares the dietary patterns with their individual components to see which better explains the determinants.
De Stefani E, Boffetta P, Ronco AL, Deneo-Pellegrini H, Acosta G, Mendilaharsu M. Dietary patterns and risk of laryngeal cancer: an exploratory factor analysis in Uruguayan men. International Journal of Cancer 2007;121:1086-91.
Nutritional epidemiology study comparing two EFA-generated dietary patterns (“traditional cooking” and “fruit and vegetable” patterns) with a hypothesis-based Dietary Approaches to Stop Hypertension (DASH) pattern. No significant trends were found when comparing the three patterns, although women in the third tertile of the DASH pattern had a lower risk than those in the first tertile.
Schulze MB, Hoffmann K, Kroke A, Boeing H. Risk of hypertension in women in the EPIC-Potsdam study: comparison of relative risk estimates for exploratory and hypothesis-based dietary patterns. American Journal of Epidemiology 2003;158:365-73.
Social epidemiology article that uses PCA and EFA as synonyms: the authors write that they "performed an exploratory factor analysis using principal component analysis". The EFA produced two factors that reflect perceived and enacted sexual stigma among LBQ women (based on items from a sexual stigma scale).
Logie CH, Earnshaw V. Adapting and validating a scale to measure sexual stigma among lesbian, bisexual and queer women. PLoS One 2015;10:e0116198.
Built environment study examining environmental contributions to drug abuse using 32 variables for census tracts. Four factors (accounting for 55.8% of the variance) were identified. The authors noted that EFA may be more policy relevant because it helps distinguish the influence of economic well-being, violence, and social disorganization (3 of the factors).
Bell DC, Carlson JW, Richard AJ. The social ecology of drug use: a factor analysis of an urban setting. Subst Use Misuse 1998;33:2201-17.
Courses
Short course on PCA and EFA by José Manuel Roche at the Oxford Poverty and Human Development Initiative, with video lectures, slides, practice files, a reading list, and links to other resources. Available here: http://www.ophi.org.uk/principal-components-analysis-and-factor-analysis-2010
Two introductory lectures on PCA and EFA, by Mike Clark, PhD, from the University of North Texas and Elizabeth Root from the University of Colorado. They explain the difference in variance between the two methods. These lectures also provide a helpful explanation of the scales of factor analysis, as well as guidance on which variables should be included in the analysis:
http://www.unt.edu/rss/class/mike/6810/Principal%20Components%20Analysis.pdf
http://www.colorado.edu/geography/class_homepages/geog_4023_s11/Lecture18_PCA.pdf
A resource page on EFA and PCA from the University of Wisconsin Department of Psychology: http://psych.wisc.edu/henriques/pca.html
5 introductory videos and tutorials (2 hours) on EFA and PCA from Econometrics Academy (by Ani Katchova). Note that the example performs EFA and PCA on the same dataset. https://www.youtube.com/playlist?list=PLRW9kMvtNZOjaStLK9ldf_Yc8MB6TkCUx
Other Econometrics Academy resources are available here: https://sites.google.com/site/econometricsacademy/
Lecture on principal component analysis from Bill Press' Opinionated Lessons in Statistics, University of Texas. It includes a strong warning against over-interpreting the meaning of the components. https://www.youtube.com/watch?v=frWqIUpIxLg&index=43&list=PLUAHeOPjkJseXJKbuk9-hlOfZU9Wd6pS0
A written tutorial on principal component analysis by Lindsay I Smith, February 26, 2002. Accessed March 15, 2015. Available at https://courses.cs.washington.edu/courses/cse528/09sp/pca.pdf
Short tutorial on exploratory (and confirmatory) factor analysis by Jamie DeCoster at the University of Alabama. "Overview of Factor Analysis". Accessed March 16, 2005. Available at: http://stat-help.com/factor.pdf
FAQs
Principal component analysis or exploratory factor analysis? ›PCA looks to identify dimensions that are composites of the observed predictors, with the aim of explaining as much of the cumulative variance in the predictors (or variables) as possible. Factor analysis explicitly presumes that latent variables (or factors) exist in the given data.
What is the difference between factor analysis and principal component analysis? ›The mathematics of factor analysis and principal component analysis (PCA) are different. Factor analysis explicitly assumes the existence of latent factors underlying the observed data. PCA instead seeks to identify variables that are composites of the observed variables.
What is principal component analysis and factor analysis in accounting research? ›Principal component analysis (PCA) and factor analysis (FA) are statistical techniques used to represent a set of observed variables in terms of a smaller number of variables (i.e., “components” or “factors”).
How does factor analysis differ from principal components analysis? ›Principal component analysis involves extracting linear composites of observed variables. Factor analysis is based on a formal model predicting observed variables from theoretical latent factors.
When would you use PCA over EFA? ›PCA aims to explain the maximum amount of the total variance in the variables by analysing all of the observed variance, while in EFA, only the shared covariance between the variables is analysed (Schneeweiss & Mathes, 1995). PCA is undertaken when there is sufficient correlation among the original variables.
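The total-versus-shared variance distinction can be illustrated with a minimal NumPy sketch. It assumes a hypothetical simulated one-factor dataset and uses a single step of principal-axis factoring (squared multiple correlations on the diagonal) as a stand-in for EFA. This is one common extraction approach chosen for illustration, not the method of any resource cited on this page.

```python
import numpy as np

def pca_loadings(R, k):
    """Loadings from the full correlation matrix R (unit diagonal, so total variance)."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(vals[order])

def paf_loadings(R, k):
    """One step of principal-axis factoring: the diagonal of R is replaced by
    squared multiple correlations (initial communality estimates), so only
    variance shared with the other variables is analysed."""
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)
    vals, vecs = np.linalg.eigh(R_reduced)
    order = np.argsort(vals)[::-1][:k]
    # Clip tiny negative eigenvalues that a reduced matrix can produce
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Hypothetical data: four indicators driven by one common factor plus unique noise
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
X = factor @ np.array([[0.8, 0.7, 0.6, 0.5]]) + 0.6 * rng.normal(size=(500, 4))
R = np.corrcoef(X, rowvar=False)

pca_L = pca_loadings(R, 1)
paf_L = paf_loadings(R, 1)
pca_var = float((pca_L ** 2).sum())  # variance attributed to the first component
paf_var = float((paf_L ** 2).sum())  # shared variance attributed to the first factor
print("PCA:", round(pca_var, 3), "PAF:", round(paf_var, 3))
```

Because the factor solution sets aside each variable's unique variance, its loadings are typically smaller than the component loadings on the same data.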
What is better than PCA? ›For labeled classification data, linear discriminant analysis (LDA) is often more effective than PCA because LDA reduces the dimensionality of the data by maximizing class separability. It is easier to draw decision boundaries for data with maximum class separability.
What is the best explanation of principal component analysis? ›Principal component analysis, or PCA, is a statistical procedure that allows you to summarize the information content in large data tables by means of a smaller set of “summary indices” that can be more easily visualized and analyzed.
What is the relationship between factor analysis and Principal Component Analysis? ›PCA aims to define new variables ordered by the amount of variance they explain. FA aims to define new variables that can be understood and interpreted in a business or practical manner.
What is the alternative to factor analysis? ›Composite variable analysis: A simple and transparent alternative to factor analysis.
What is a drawback of factor analysis? ›If important attributes are missed, the value of the procedure is reduced accordingly. Naming the factors can also be difficult, because multiple attributes can be highly correlated for no apparent reason.
What tool is used for principal component analysis? ›Principal Component Analysis (PCA) is one of the most popular statistical methods in data mining. PCA can be run in Excel using the XLSTAT statistical software.
When should PCA not be used? ›PCA should be used mainly for variables that are strongly correlated. If the relationships between variables are weak, PCA does not work well to reduce the data. Refer to the correlation matrix to check this. In general, if most of the correlation coefficients are smaller than 0.3, PCA will not help.
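The correlation-matrix check described above can be sketched in a few lines of NumPy. The function name `share_of_weak_correlations` and both simulated datasets are hypothetical, introduced only for illustration. The 0.3 cut-off is the rule of thumb quoted in the answer, not a formal test.

```python
import numpy as np

def share_of_weak_correlations(X, threshold=0.3):
    """Fraction of distinct variable pairs whose absolute correlation is below threshold."""
    R = np.corrcoef(X, rowvar=False)
    upper = np.abs(R[np.triu_indices_from(R, k=1)])  # each pair counted once
    return float(np.mean(upper < threshold))

rng = np.random.default_rng(1)

# Hypothetical dataset 1: six unrelated variables
independent = rng.normal(size=(300, 6))

# Hypothetical dataset 2: six variables sharing one common source
shared = rng.normal(size=(300, 1))
correlated = shared + 0.5 * rng.normal(size=(300, 6))

weak_indep = share_of_weak_correlations(independent)
weak_corr = share_of_weak_correlations(correlated)
print("independent:", weak_indep, "correlated:", weak_corr)
```

A share close to 1 suggests there is little common structure for PCA to summarize, while a share close to 0 suggests the variables overlap enough for data reduction to be worthwhile.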
When not to use PCA analysis? ›While it is technically possible to use PCA on discrete variables, or on categorical variables that have been one-hot encoded, you should not. Simply put, if your variables don't belong on a coordinate plane, then do not apply PCA to them.
What is the disadvantage of using PCA? ›The drawbacks of PCA are that it is difficult to estimate the covariance matrix accurately, and that it fails to capture even the simplest invariance unless the information is explicitly provided in the training data.
Why is PCA preferable? ›Principal Component Analysis (PCA) is very useful for speeding up computation by reducing the dimensionality of the data. In addition, when you have high-dimensional data with variables that are highly correlated with one another, PCA can improve the accuracy of a classification model.
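In which of the following cases will PCA work better? ›PCA tends to work well when the variables are correlated and the structure in the data is approximately linear. The claim that PCA always performs better than t-SNE on smaller datasets is too strong: PCA is linear and deterministic, whereas t-SNE is nonlinear and stochastic, so which works better depends on the data.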
What is a real life example of principal component analysis? ›Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing power allocation across communication channels. It is a feature extraction technique, so it keeps the components that carry the most variance and drops the least informative ones.
What is principal component analysis (PCA) for dummies? ›Principal components analysis attempts to capture most of the information in a dataset by identifying the principal components that maximize the variance between observations. It works from the covariance matrix, a symmetric matrix with rows and columns equal to the number of dimensions in the data.
What is PC1 and PC2 in principal component analysis? ›These axes that represent the variation are “Principal Components”, with PC1 representing the most variation in the data and PC2 representing the second most variation in the data. If we had three samples, then we would have an extra direction in which we could have variation.
What is factor analysis in simple terms? ›Factor analysis is a statistical technique that reduces a set of variables by extracting all their commonalities into a smaller number of factors. It can also be called data reduction.
Why not use factor analysis? ›One of the main disadvantages of factor analysis is that it assumes the input features are linearly related to one another. That means it may not perform well on sets of features that are not linearly related.
What type of factor analysis should I use? ›There are two types of factor analysis, exploratory and confirmatory. Exploratory Factor Analysis should be used when you need to develop a hypothesis about a relationship between variables. Confirmatory Factor Analysis should be used to test a hypothesis about the relationship between variables.
Why is factor analysis controversial? ›Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.
Is ANOVA the same as factor analysis? ›No. One-factor analysis of variance (Snedecor and Cochran, 1989) is a special case of analysis of variance (ANOVA) for one factor of interest, and a generalization of the two-sample t-test, which is used to decide whether two groups (levels) of a factor have the same mean. In ANOVA a factor is an observed grouping variable, not a latent construct as in factor analysis.
What are some pros and cons of using factor analysis? ›Advantages: there is flexibility in naming and using dimensions, and the method is not extremely difficult to do, is inexpensive, and is accurate. Disadvantages: naming the factors can be difficult, because multiple attributes can be highly correlated for no apparent reason.
When would you use a principal components analysis? ›PCA is particularly useful for processing data where multicollinearity exists between the features/variables. PCA can be used when the input features are high-dimensional (e.g. a lot of variables). PCA can also be used for denoising and data compression.
What is the core of the principal component analysis method? ›The eigenvectors and eigenvalues of a covariance (or correlation) matrix represent the “core” of a PCA: The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude.
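The eigenvector/eigenvalue "core" described above can be sketched directly with NumPy. The function name `pca_eig` and the simulated data are hypothetical, and standardizing the variables first (that is, working from the correlation matrix) is an assumption made here for illustration.

```python
import numpy as np

def pca_eig(X, use_correlation=True):
    """PCA by eigendecomposition of the covariance (or correlation) matrix."""
    Xc = X - X.mean(axis=0)
    if use_correlation:
        Xc = Xc / Xc.std(axis=0, ddof=1)  # standardize, so C below is the correlation matrix
    C = np.cov(Xc, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigh suits symmetric matrices
    order = np.argsort(eigenvalues)[::-1]          # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = Xc @ eigenvectors                     # data expressed on the new axes
    explained = eigenvalues / eigenvalues.sum()    # share of total variance per component
    return eigenvalues, eigenvectors, scores, explained

# Hypothetical data: three variables, the first two strongly related
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.3 * rng.normal(size=(200, 1)),
    base + 0.3 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

eigenvalues, eigenvectors, scores, explained = pca_eig(X)
print("explained variance ratio:", np.round(explained, 3))
```

The columns of `eigenvectors` give the directions of the new axes, and each eigenvalue equals the variance of the scores along the matching axis.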
Do you need cross validation for PCA? ›In principal component analysis (PCA), it is crucial to know how many principal components (PCs) should be retained in order to account for most of the data variability. A class of “objective” rules for finding this quantity is the class of cross-validation (CV) methods.
What are the 3 factors in PCA? ›Litterman and Scheinkman (1991) use a principal component analysis (PCA) and find that US bond returns are mainly determined by three factors: level, steepness, and curvature movements in the term structure.
What is the minimum sample size for PCA? ›Generally speaking, a minimum of 150 cases, or 5 to 10 cases per variable, has been recommended as a minimum sample size. There are a few methods to detect sampling adequacy: (1) the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy for the overall data set; and (2) the KMO measure for each individual variable.
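The KMO measure mentioned above compares squared correlations with squared partial correlations. The sketch below is a minimal NumPy version under that standard definition. The function name `kmo` and both simulated datasets are hypothetical, and the partial correlations are taken from the inverse of the correlation matrix.

```python
import numpy as np

def kmo(X):
    """Kaiser-Meyer-Olkin measure: overall value and one value per variable."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)            # partial correlations (off-diagonal entries)
    off_diag = ~np.eye(R.shape[0], dtype=bool)   # mask that excludes the diagonal
    r2 = np.where(off_diag, R ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return float(overall), per_variable

rng = np.random.default_rng(3)

# Hypothetical data with one common source behind five variables
shared = rng.normal(size=(300, 1))
factor_data = shared + 0.5 * rng.normal(size=(300, 5))

# Hypothetical data with no common structure
noise_data = rng.normal(size=(300, 5))

kmo_factor, kmo_factor_vars = kmo(factor_data)
kmo_noise, kmo_noise_vars = kmo(noise_data)
print("with common factor:", round(kmo_factor, 3), "pure noise:", round(kmo_noise, 3))
```

Values near 1 indicate that correlations are largely shared among variables rather than confined to pairs, which is the situation in which PCA or EFA is expected to be useful.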
What are the strengths of PCA? ›PCA's key advantages are its low noise sensitivity, decreased requirements for capacity and memory, and increased efficiency because processing takes place in fewer dimensions. A further advantage is the lack of redundancy in the data, given the orthogonal components [19, 20].
Why is PCA bad for feature selection? ›The problem with using PCA is that (1) measurements from all of the original variables are used in the projection to the lower dimensional space, (2) only linear relationships are considered, and (3) PCA or SVD-based methods, as well as univariate screening methods (t-test, correlation, etc.), do not take into account ...
Are factors and components the same thing? ›Components are linear sums of variables and do not necessarily say anything about the correlations between the variables. Factors are latent variables thought to explain the correlations or covariances between observed variables.
What is the difference between canonical correlation and principal component analysis and factor analysis? ›Where PCA focuses on finding linear combinations that account for the most variance in one data set, Canonical Correlation Analysis focuses on finding linear combinations that account for the most correlation between two data sets.
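To make the two-data-set idea concrete, the sketch below computes canonical correlations with NumPy using one common numerical route (a QR decomposition of each centered data set followed by an SVD). The function name `canonical_correlations` and the simulated data are hypothetical, and the approach assumes each data set has full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two data sets measured on the same rows."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis for the column space of Xc
    Qy, _ = np.linalg.qr(Yc)   # orthonormal basis for the column space of Yc
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))

# Hypothetical second data set: one column is an exact linear combination of X,
# the other is unrelated noise
Y = np.column_stack([
    X @ np.array([1.0, -1.0, 0.5]),
    rng.normal(size=200),
])

rho = canonical_correlations(X, Y)
print("canonical correlations:", np.round(rho, 3))
```

The first canonical correlation is essentially 1 because one column of `Y` is fully predictable from `X`, while the second reflects only chance association with the noise column.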