By AOLYTIX Research Desk 16 min read · Data Analysis · R for Research
A PhD candidate came to us with a dataset — 1,200 responses, 47 variables, all named Q1 through Q47. No labels, no value descriptions, no codebook. Just numbers in a spreadsheet with column headers that meant nothing without the original questionnaire.
We spent the first hour just renaming variables and building a data dictionary. Not glamorous work. But it's the kind of thing that separates analysis that holds up under examination from analysis that falls apart the moment someone asks "what does Q23 measure?"
We did that project in R.
Not because R is always the answer. But because for that dataset — large, complex, headed for journal submission — R was the right tool. The analysis was clean, reproducible, and the visualisations were good enough to go straight into the manuscript.
This guide explains what R is, what it can do for your research, and how to get started — even if you've never programmed before.
R is a free, open-source programming language built specifically for statistical computing and data visualisation. It has been a standard tool in epidemiology, ecology, psychology, economics, and public health research for over two decades. If you want to publish in a serious international journal, there's a reasonable chance the reviewer on the other side of your submission uses R.
But beyond prestige — here's the practical case:
It's free. No licence fee, no institutional restriction, no subscription that expires the day before your submission.
It has broader statistical capability than any other single tool. Everything SPSS does, R does. It also handles advanced techniques that SPSS supports only partially or through separate add-ons: structural equation modelling, multilevel modelling, survival analysis, Bayesian analysis, spatial statistics. As your research ambitions grow, R grows with you.
Its visualisations are in a different class. The ggplot2 package produces charts that look like they belong in The Lancet or Nature. SPSS charts look like they belong in 2004.
It's fully reproducible. Your entire analysis lives in a script. Anyone — your supervisor, a journal reviewer, a co-author — can run it and get the same results. That's increasingly expected in serious academic work.
You need two things. Both are free.
Step 1: Install R from cran.r-project.org. Choose the version for your operating system.
Step 2: Install RStudio from posit.co/download/rstudio-desktop. The free Desktop version is everything you need.
Always open RStudio, not R itself. RStudio gives you a proper workspace: a script editor, a console, an environment panel showing your loaded data, and a plot viewer. R on its own gives you little more than a bare console.
When you open RStudio, you'll see four panels. The most important habit to build immediately: write your code in the Script Editor (top-left), not in the Console (bottom-left). The Console runs code. The Script Editor saves it. If you do your analysis in the Console and then close RStudio, it's gone.
Base R handles a lot. But R's real capability comes from packages — free, community-built add-ons that extend what R can do. Install them once, load them at the start of each session.
# Install a package (only once)
install.packages("tidyverse")
# Load it for use (every session)
library(tidyverse)
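Because install.packages() only needs to run once, a script that other people will rerun can first check which packages are absent. The sketch below is one way to do that in base R. The helper name `missing_packages` is hypothetical, not part of R, and the install line is left commented out so nothing is downloaded without your say-so.

```r
# Hypothetical helper: returns the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tidyverse", "psych")
to_install <- missing_packages(needed)

# Report what is absent; install only if you choose to
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install
}
```

After that check, load each package with library() as usual.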
The packages every researcher needs:
| Package | What it does |
|---|---|
| tidyverse | A collection covering data cleaning, manipulation, and visualisation; install this one and you get ggplot2, dplyr, readr, and more |
| ggplot2 | Publication-quality visualisation |
| dplyr | Data manipulation: filter, summarise, group, join |
| psych | Descriptive stats, reliability analysis (Cronbach's alpha), psychometrics |
| car | Regression diagnostics, ANOVA |
| lme4 | Multilevel / mixed effects models |
| lavaan | Structural Equation Modelling |
| corrplot | Visualising correlation matrices |
| readxl | Import Excel files directly |
For most dissertation-level work, tidyverse and psych combined with base R will cover everything you need.
library(readxl)
library(readr)
# From Excel
data <- read_excel("survey_data.xlsx")
# From CSV
data <- read_csv("survey_data.csv")
# Preview
head(data)
# Check structure and variable types
str(data)
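Once the data is loaded, it can help to count missing values before deciding how to handle them. This is a minimal base R sketch. The small data frame is made up and simply stands in for your imported file.

```r
# Illustrative data standing in for an imported survey file
data <- data.frame(
  Q1 = c(4, 5, NA, 3),
  Q2 = c(2, NA, NA, 4),
  Q3 = c(1, 2, 3, 4)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(data))
missing_per_column

# Number of rows with no missing values at all
complete_rows <- sum(complete.cases(data))
complete_rows
```

If the complete-row count is much smaller than the total, dropping every incomplete row will cost you a large share of your sample, and that is worth knowing before you clean.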
library(dplyr)
# Remove every row that has a missing value in any column
data_clean <- data %>% na.omit()
# Filter to a specific subset
females <- data %>% filter(gender == "Female")
# Rename variables (new_name = old_name)
data <- data %>% rename(
  job_satisfaction = Q1,
  motivation = Q2,
  performance = Q3
)
# Recode a categorical variable
# (the dplyr:: prefix avoids a clash with car::recode if car is also loaded)
data$education <- dplyr::recode(data$education,
  "1" = "Primary",
  "2" = "Secondary",
  "3" = "Tertiary"
)
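Renaming three variables by hand is fine. For a dataset like the Q1 to Q47 example, a codebook table that drives the renaming is easier to maintain and doubles as your data dictionary. The sketch below uses base R only. The `codebook` data frame, its column names, and the labels are all hypothetical.

```r
# Illustrative data with uninformative column names
data <- data.frame(Q1 = c(4, 5), Q2 = c(3, 2), Q3 = c(5, 4))

# Hypothetical codebook: one row per variable
codebook <- data.frame(
  old_name = c("Q1", "Q2", "Q3"),
  new_name = c("job_satisfaction", "motivation", "performance"),
  label    = c("Overall job satisfaction", "Work motivation", "Self-rated performance"),
  stringsAsFactors = FALSE
)

# Rename every column that appears in the codebook; leave the rest untouched
idx <- match(names(data), codebook$old_name)
names(data)[!is.na(idx)] <- codebook$new_name[idx[!is.na(idx)]]

names(data)
```

Keeping the codebook in a CSV next to your script means anyone can look up what a variable measures.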
library(psych)
# Summary stats for numerical variables
describe(data[, c("age", "years_experience", "satisfaction_score")])
# Frequency table
table(data$gender)
# Mean by group
data %>%
  group_by(department) %>%
  summarise(mean_satisfaction = mean(job_satisfaction, na.rm = TRUE))
If you used a multi-item Likert scale, run a reliability analysis (Cronbach's alpha) before any other analysis. It checks that your scale is internally consistent, and examiners will ask for it if you don't provide it.
library(psych)
scale_items <- data[, c("Q1", "Q2", "Q3", "Q4", "Q5")]
# The psych:: prefix makes sure ggplot2's alpha() is not called by mistake
psych::alpha(scale_items)
The key number to report is raw_alpha. Values of 0.70 or above are conventionally treated as acceptable in social science research. Below that, you need to either revise your scale or explicitly justify its use.
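To see what that number measures, you can compute Cronbach's alpha by hand from the standard formula: k / (k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score), where k is the number of items. The function below is a hypothetical helper written for illustration, not part of psych, and the responses are made up.

```r
# Hypothetical helper: Cronbach's alpha from the standard formula,
# using only complete rows (listwise deletion)
cronbach_alpha <- function(items) {
  items <- na.omit(as.data.frame(items))
  k <- ncol(items)
  item_variances <- sapply(items, var)
  total_variance <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
}

# Illustrative responses from six participants on four Likert items
scale_items <- data.frame(
  Q1 = c(4, 5, 3, 4, 2, 5),
  Q2 = c(4, 4, 3, 5, 2, 5),
  Q3 = c(3, 5, 2, 4, 1, 4),
  Q4 = c(5, 5, 3, 4, 2, 4)
)
cronbach_alpha(scale_items)
```

With complete data this should agree with the raw_alpha value from psych. The formula also shows why items that move together push alpha up: the total score then varies far more than the items do individually.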
# Pearson correlation between two variables
cor.test(data$training_hours, data$performance_score, method = "pearson")
# Correlation matrix for multiple variables
cor_matrix <- cor(data[, c("motivation", "satisfaction", "performance")], use = "complete.obs")
# Visualise
library(corrplot)
corrplot(cor_matrix, method = "color", addCoef.col = "black", tl.cex = 0.8)
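# Independent-samples t-test comparing two groups
t.test(performance_score ~ gender, data = data)
Report the t-statistic, degrees of freedom, p-value, and the means of both groups. That's what goes in your dissertation table. Note that t.test() runs Welch's version by default, which does not assume equal variances, so the degrees of freedom are usually not a whole number. Add var.equal = TRUE if you need the classic Student's t-test.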
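You don't have to copy those values from the console by hand. The object returned by t.test() stores each one, as this sketch shows. The data is simulated purely for illustration, and the variable names mirror the example above.

```r
set.seed(42)  # make the simulated example repeatable

# Simulated data standing in for a real survey (values are illustrative)
data <- data.frame(
  gender = rep(c("Female", "Male"), each = 30),
  performance_score = c(rnorm(30, mean = 72, sd = 8),
                        rnorm(30, mean = 68, sd = 8))
)

# Welch's t-test by default; add var.equal = TRUE for Student's version
result <- t.test(performance_score ~ gender, data = data)

# Pull out the values for the results table
t_value     <- unname(result$statistic)
deg_freedom <- unname(result$parameter)
p_value     <- result$p.value
group_means <- result$estimate

round(c(t = t_value, df = deg_freedom, p = p_value), 3)
group_means
```

Extracting values this way keeps the numbers in your table tied to the script, which removes transcription errors.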
# ANOVA across three or more groups
model <- aov(satisfaction_score ~ department, data = data)
summary(model)
# Post-hoc test to identify which specific groups differ
TukeyHSD(model)
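Many departments also expect an effect size alongside the F-test. One common choice is eta squared, the between-groups sum of squares divided by the total sum of squares. The sketch below computes it from the ANOVA table using PlantGrowth, a dataset that ships with R, in place of your own data. Confirm with your supervisor which effect size your department expects.

```r
# One-way ANOVA on a built-in dataset with three groups
model <- aov(weight ~ group, data = PlantGrowth)

# The ANOVA table is the first element of the summary
anova_table <- summary(model)[[1]]
sums_of_squares <- anova_table[["Sum Sq"]]  # group first, then residuals

# Eta squared: share of total variation explained by group membership
eta_squared <- sums_of_squares[1] / sum(sums_of_squares)
eta_squared
```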
# Build the model
model <- lm(performance_score ~ training_hours + experience + motivation, data = data)
# Full output
summary(model)
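You'll get R-squared and adjusted R-squared for the model, plus unstandardised coefficients (B), standard errors, t-values, and p-values for each predictor. All of this belongs in your results chapter. Standardised betas are not part of this output; if your department expects them, compute them separately.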
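The coefficients that summary() prints for an lm() model are unstandardised. If your results table needs standardised betas or confidence intervals as well, one approach is sketched below with the built-in mtcars dataset. The helper `standardised_betas` is a hypothetical name, and it assumes every predictor is numeric and untransformed.

```r
# Hypothetical helper: standardised betas for a model whose predictors
# are all numeric (no factors, no transformations in the formula)
standardised_betas <- function(model) {
  mf <- model.frame(model)   # outcome in column 1, predictors after it
  b <- coef(model)[-1]       # drop the intercept
  b * sapply(mf[-1], sd) / sd(mf[[1]])
}

model <- lm(mpg ~ wt + hp, data = mtcars)

coef(summary(model))       # unstandardised B, standard error, t, p
confint(model)             # 95% confidence intervals for each B
standardised_betas(model)  # betas on a common scale
```

Standardised betas let you compare predictors measured in different units, which is why examiners often ask for them.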
This is the section that makes R users never go back to SPSS charts.
library(ggplot2)
# Bar chart
ggplot(data, aes(x = department, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(title = "Staff Distribution by Department and Gender",
       x = "Department", y = "Count") +
  theme_minimal()
# Scatter plot with regression line
ggplot(data, aes(x = training_hours, y = performance_score)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", colour = "steelblue", se = TRUE) +
  labs(title = "Training Hours vs. Performance Score") +
  theme_classic()
# Box plot
ggplot(data, aes(x = gender, y = satisfaction_score, fill = gender)) +
  geom_boxplot() +
  labs(title = "Job Satisfaction by Gender") +
  theme_minimal()
# Save at publication quality (ggsave() saves the most recent plot by default;
# width and height are in inches)
ggsave("performance_chart.png", dpi = 300, width = 8, height = 5)
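When a script draws several charts, relying on "the last plot" makes it easy to save the wrong one. A safer pattern is to store each plot in an object and pass it to ggsave() explicitly. This sketch uses the built-in mtcars dataset and writes to a temporary folder so it is safe to run. The object and file names are illustrative.

```r
library(ggplot2)

# Store the plot in an object so ggsave() does not depend on draw order
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, colour = "steelblue") +
  theme_classic()

# tempdir() is used only to keep the example tidy; use your own path
out_file <- file.path(tempdir(), "performance_chart.png")
ggsave(out_file, plot = scatter, dpi = 300, width = 8, height = 5, units = "in")

file.exists(out_file)
```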
Every chart is fully customisable — colours, fonts, labels, gridlines, themes. Once you learn the ggplot2 grammar, you can produce any chart type cleanly.
| | SPSS | R | Python |
|---|---|---|---|
| Cost | Paid | Free | Free |
| Learning curve | Low | Moderate | Moderate–High |
| Statistical range | High | Highest | High |
| Visualisation | Basic | Excellent | Very good |
| Advanced modelling (SEM, multilevel) | Limited | Excellent | Good |
| Reproducibility | Low | Excellent | Excellent |
| Used in top journals | Common | Very common | Growing |
| Nigerian university acceptance | Very high | Growing (esp. PhD) | Growing |
R is the most powerful statistical tool of the three. It's also the one most likely to impress a journal reviewer or international collaborator. The tradeoff is a steeper initial learning curve than SPSS.
If you're a 400-level student: R is probably not worth learning specifically for your final year project unless you're already comfortable with it. SPSS will meet your requirements. Come back to R when you're in a master's programme, planning to publish, or targeting institutions with stronger quantitative research cultures.
Will Nigerian examiners accept R output? At PhD level, and increasingly at master's level in departments with international research exposure — yes. But confirm with your supervisor before committing.
When presenting results:
- Reformat tables for APA style — don't paste raw R console output
- Export ggplot2 charts at 300 dpi using ggsave()
- Cite R and packages in your methodology: "Data analysis was conducted using R (version 4.3.2; R Core Team, 2023) and the tidyverse package (Wickham et al., 2019)."
- Include your full R script as an appendix if required
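For the citation itself, R can give you the exact version and reference for your installation, so you don't need to guess. The sketch below uses only base R. The package lookups use "stats", which ships with R, so the example runs anywhere; swap in "tidyverse" or "psych" once they are installed.

```r
# Version string for the methodology section
R.version.string

# Reference for R itself, printed as plain text
r_citation <- citation()
print(r_citation, style = "text")

# Reference and version for a specific package
citation("stats")
as.character(packageVersion("stats"))
```

Run this at the end of your analysis rather than the start, so the versions you cite are the ones that produced your results.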
The examiner is evaluating your analysis and interpretation — not your code. Present results as cleanly as you would from any other tool.
You don't need to master R. You need enough fluency to handle your dissertation analysis confidently. Six weeks of focused practice gets you there.
Week 1: R and RStudio basics — objects, vectors, data frames, importing data. Start with R for Data Science by Hadley Wickham, free at r4ds.hadley.nz.
Week 2: Data cleaning with dplyr — filter, select, mutate, summarise, group_by
Week 3: Visualisation with ggplot2 — the grammar of graphics, the chart types you need
Week 4: Statistical analysis — descriptive stats with psych, correlation, T-test, ANOVA, regression
Week 5–6: Apply to your actual research dataset, format outputs, interpret results
The swirl package (an interactive R tutorial that runs in the R console, including inside RStudio) is excellent for beginners who learn better by doing than reading.
If your dataset is complex, your deadline is close, or you simply want your analysis done with the rigour that stands up to examination and peer review — AOLYTIX Group provides professional R data analysis for Nigerian researchers.
We deliver data cleaning and preparation, descriptive and inferential statistical analysis, advanced modelling where needed, publication-quality ggplot2 visualisations, and interpreted results ready for your dissertation or manuscript.
Bringing us a dataset named Q1 through Q47 with no codebook? We've seen it before. We can work with it.
Talk to the AOLYTIX Research Desk →
AOLYTIX Research Desk is the publishing arm of AOLYTIX Group — a Nigerian academic research and consulting firm supporting postgraduate students, researchers, and organisations across Africa with data analysis, dissertation support, and research consulting.