
How to Use R for Research Data Analysis: The Complete Guide for Nigerian Postgraduate Students

By AOLYTIX Research Desk · 16 min read · Data Analysis · R for Research


A PhD candidate came to us with a dataset — 1,200 responses, 47 variables, all named Q1 through Q47. No labels, no value descriptions, no codebook. Just numbers in a spreadsheet with column headers that meant nothing without the original questionnaire.

We spent the first hour just renaming variables and building a data dictionary. Not glamorous work. But it's the kind of thing that separates analysis that holds up under examination from analysis that falls apart the moment someone asks "what does Q23 measure?"

We did that project in R.

Not because R is always the answer. But because for that dataset — large, complex, headed for journal submission — R was the right tool. The analysis was clean, reproducible, and the visualisations were good enough to go straight into the manuscript.

This guide explains what R is, what it can do for your research, and how to get started — even if you've never programmed before.


What Is R and Why Should You Care?

R is a free, open-source programming language built specifically for statistical computing and data visualisation. It's been the standard tool in epidemiology, ecology, psychology, economics, and public health research globally for over two decades. If you want to publish in a serious international journal, there's a reasonable chance the reviewer on the other side of your submission uses R.

But beyond prestige — here's the practical case:

It's free. No licence fee, no institutional restriction, no subscription that expires the day before your submission.

It has a wider statistical range than any other single tool. Almost everything SPSS does, R does too. It also handles techniques that SPSS covers only partly, or only through separately licensed add-ons such as AMOS: structural equation modelling, multilevel modelling, survival analysis, Bayesian analysis, spatial statistics. As your research ambitions grow, R grows with you.

Its visualisations are in a different class. The ggplot2 package produces charts that look like they belong in The Lancet or Nature. SPSS charts look like they belong in 2004.

It's fully reproducible. Your entire analysis lives in a script. Anyone — your supervisor, a journal reviewer, a co-author — can run it and get the same results. That's increasingly expected in serious academic work.


Setting Up: R and RStudio

You need two things. Both are free.

Step 1: Install R from cran.r-project.org. Choose the version for your operating system.

Step 2: Install RStudio from posit.co/download/rstudio-desktop. The free Desktop version is everything you need.

Always open RStudio — not R itself. RStudio gives you a proper workspace: a script editor, a console, an environment panel showing your loaded data, and a plot viewer. Raw R is just a blank command line.

When you open RStudio, you'll see four panels. The most important habit to build immediately: write your code in the Script Editor (top-left), not in the Console (bottom-left). The Console runs code. The Script Editor saves it. If you do your analysis in the Console and then close RStudio, it's gone.


R Packages: Where the Power Lives

Base R handles a lot. But R's real capability comes from packages — free, community-built add-ons that extend what R can do. Install them once, load them at the start of each session.

# Install a package (only once)
install.packages("tidyverse")

# Load it for use (every session)
library(tidyverse)

The packages every researcher needs:

| Package | What it does |
| --- | --- |
| tidyverse | A collection covering data cleaning, manipulation, and visualisation; install this one and you get ggplot2, dplyr, readr, and more |
| ggplot2 | Publication-quality visualisation |
| dplyr | Data manipulation: filter, summarise, group, join |
| psych | Descriptive stats, reliability analysis (Cronbach's alpha), psychometrics |
| car | Regression diagnostics, ANOVA |
| lme4 | Multilevel / mixed effects models |
| lavaan | Structural Equation Modelling |
| corrplot | Visualising correlation matrices |
| readxl | Import Excel files directly |

For most dissertation-level work, tidyverse and psych combined with base R will cover everything you need.
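
The "install once, load every session" routine can be wrapped in a small helper so that a script only installs what is missing. This is a minimal sketch, not a standard recipe: the function names missing_packages() and install_if_missing() are hypothetical, and the package list in the commented call is only an example.

```r
# Return the packages from `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the missing ones (needs an internet connection)
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Packages that ship with R are always present, so nothing is reported
missing_packages(c("stats", "utils"))

# Example call, left commented out because it downloads from CRAN:
# install_if_missing(c("tidyverse", "psych", "readxl"))
```

Putting a call like this at the top of a script means a supervisor or co-author can run it on a fresh machine without installing each package by hand.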


Core Research Tasks in R

Loading Your Data

library(readxl)
library(readr)

# From Excel
data <- read_excel("survey_data.xlsx")

# From CSV
data <- read_csv("survey_data.csv")

# Preview
head(data)

# Check structure and variable types
str(data)
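
What to look for after importing is easier to see on a tiny example. The sketch below writes a made-up four-column survey file to a temporary location and reads it back. It uses base R's read.csv() as a stand-in for read_csv() so that it runs without extra packages; the file contents and the name survey are invented for illustration.

```r
# Write a tiny made-up survey file so the example runs on its own
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,gender,age,Q1",
  "1,Female,34,4",
  "2,Male,41,5",
  "3,Female,NA,3"
), csv_path)

# Base R reader used here as a stand-in for readr::read_csv()
survey <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(survey)              # number of rows and columns
sapply(survey, class)    # the type R assigned to each column
colSums(is.na(survey))   # missing values per column

# Text categories are usually easier to analyse as factors
survey$gender <- factor(survey$gender)
levels(survey$gender)
```

If a column you expect to be numeric shows up as character, that usually points to stray text in the spreadsheet (for example "N/A" typed into a cell), which is worth fixing before any analysis.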

Cleaning and Preparing Your Data

library(dplyr)

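# Remove every row that has a missing value in any column
# (compare nrow(data) and nrow(data_clean) to see how many rows were dropped)
data_clean <- data %>% na.omit()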

# Filter to a specific subset
females <- data %>% filter(gender == "Female")

# Rename variables
data <- data %>% rename(
  job_satisfaction = Q1,
  motivation = Q2,
  performance = Q3
)

# Recode a categorical variable
# (the dplyr:: prefix avoids a clash with car::recode() if car is also loaded)
data$education <- dplyr::recode(data$education,
  "1" = "Primary",
  "2" = "Secondary",
  "3" = "Tertiary"
)
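
For a questionnaire with dozens of items, typing every rename by hand gets tedious. One option is to keep the data dictionary as a small table and rename from it. This is a sketch in base R under stated assumptions: the codebook table and its columns old_name, new_name and label are a hypothetical layout, and the responses are made up.

```r
# Made-up responses with questionnaire-style column names
survey <- data.frame(
  Q1 = c(4, 5, NA, 2),
  Q2 = c(3, 4, 4, NA),
  Q3 = c(5, 5, 3, 1)
)

# A hypothetical codebook: one row per variable
codebook <- data.frame(
  old_name = c("Q1", "Q2", "Q3"),
  new_name = c("job_satisfaction", "motivation", "performance"),
  label    = c("Overall job satisfaction (1-5)",
               "Motivation at work (1-5)",
               "Self-rated performance (1-5)")
)

# Rename every column that appears in the codebook
idx <- match(codebook$old_name, names(survey))
found <- !is.na(idx)
names(survey)[idx[found]] <- codebook$new_name[found]

names(survey)

# Count missing values per variable before deciding how to handle them
colSums(is.na(survey))
```

The same codebook table doubles as the data dictionary for your appendix, so the labels only have to be written once.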

Descriptive Statistics

library(psych)
library(dplyr)  # needed for %>%, group_by() and summarise() below

# Summary stats for numerical variables
describe(data[, c("age", "years_experience", "satisfaction_score")])

# Frequency table
table(data$gender)

# Mean by group
data %>%
  group_by(department) %>%
  summarise(mean_satisfaction = mean(job_satisfaction, na.rm = TRUE))
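
Dissertation tables usually report the group size and standard deviation next to each mean. The sketch below produces all three per group using only base R. The data frame staff and the helper describe_group() are made up for illustration.

```r
# Made-up data: one missing satisfaction score in HR
staff <- data.frame(
  department = c("Finance", "Finance", "HR", "HR", "HR"),
  job_satisfaction = c(4, 2, 5, 3, NA)
)

# Helper returning n, mean and SD while ignoring missing values
describe_group <- function(x) {
  c(n    = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE))
}

# Split the scores by department, summarise each group, one row per group
summary_table <- t(sapply(
  split(staff$job_satisfaction, staff$department),
  describe_group
))

round(summary_table, 2)
```

Note that n counts only the non-missing responses, which is the number the mean and SD are actually based on.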

Reliability Analysis (Cronbach's Alpha)

If you used a multi-item Likert scale, run this before any other analysis. It estimates how internally consistent your scale is, and examiners will ask for it if you don't provide it.

library(psych)

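# Select the items that belong to one scale (use your own column names)
scale_items <- data[, c("Q1", "Q2", "Q3", "Q4", "Q5")]
# The psych:: prefix avoids a clash with ggplot2::alpha()
psych::alpha(scale_items)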

The key number to report is raw_alpha. A common rule of thumb in social science research treats values of 0.70 and above as acceptable. Below that, you need to either revise your scale or explicitly justify its use.
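
To see what the coefficient measures, it can be computed by hand from the textbook formula: alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below does that in base R on made-up responses. The function cronbach_alpha() is a hypothetical helper written for this illustration, not part of the psych package, and with complete data it should correspond to the raw alpha that psych reports.

```r
# Cronbach's alpha from the textbook formula (complete cases only)
cronbach_alpha <- function(items) {
  items <- na.omit(items)
  k <- ncol(items)                    # number of items in the scale
  item_vars <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of the summed scale score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Made-up responses from six participants on a three-item scale
items <- data.frame(
  Q1 = c(4, 5, 3, 2, 4, 5),
  Q2 = c(4, 4, 3, 2, 5, 5),
  Q3 = c(5, 5, 2, 1, 4, 4)
)

cronbach_alpha(items)
```

The formula shows why alpha rises when items move together: the more the items covary, the larger the variance of the total score is relative to the sum of the individual item variances.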

Correlation Analysis

# Pearson correlation between two variables
cor.test(data$training_hours, data$performance_score, method = "pearson")

# Correlation matrix for multiple variables
cor_matrix <- cor(data[, c("motivation", "satisfaction", "performance")], use = "complete.obs")

# Visualise
library(corrplot)
corrplot(cor_matrix, method = "color", addCoef.col = "black", tl.cex = 0.8)
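
The object returned by cor.test() holds everything needed for a results sentence, so the numbers do not have to be copied from the console by hand. A sketch on simulated data follows. The variables are generated with a fixed seed purely for illustration, and the sentence layout is one common style rather than a required format.

```r
set.seed(42)  # made-up data; the seed just makes the example repeatable
training_hours <- runif(40, min = 0, max = 20)
performance_score <- 50 + 1.5 * training_hours + rnorm(40, sd = 8)

res <- cor.test(training_hours, performance_score, method = "pearson")

r   <- unname(res$estimate)    # correlation coefficient
dof <- unname(res$parameter)   # degrees of freedom, n - 2
p   <- res$p.value
ci  <- res$conf.int            # 95% confidence interval by default

# Assemble a results sentence (for very small p-values, write p < .001 instead)
sprintf("r(%d) = %.2f, p = %.3f, 95%% CI [%.2f, %.2f]",
        dof, r, p, ci[1], ci[2])
```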

T-Test

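# Independent samples t-test comparing two groups
# (gender must have exactly two levels; R runs Welch's test by default,
#  add var.equal = TRUE for the classic Student's t-test)
t.test(performance_score ~ gender, data = data)

Report the t-statistic, degrees of freedom, p-value, and the means of both groups. That's what goes in your dissertation table. Because R defaults to Welch's test, the degrees of freedom are often not a whole number; report them as printed.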
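
The pieces to report can be pulled straight out of the object that t.test() returns. The sketch below runs both the default Welch test and the equal-variance version on made-up scores, so the difference in degrees of freedom is visible. The data frame scores is invented for illustration.

```r
# Made-up performance scores for two groups of six
scores <- data.frame(
  gender = rep(c("Female", "Male"), each = 6),
  performance_score = c(72, 68, 75, 80, 66, 71,
                        65, 70, 62, 74, 60, 67)
)

# Default: Welch's test, which does not assume equal variances
welch <- t.test(performance_score ~ gender, data = scores)

# Classic Student's t-test, which assumes equal variances
student <- t.test(performance_score ~ gender, data = scores, var.equal = TRUE)

welch$parameter     # Welch degrees of freedom, usually not a whole number
student$parameter   # n1 + n2 - 2

# The values that go into the results table
welch$statistic     # t
welch$p.value       # p
welch$estimate      # mean of each group
```

Whichever version you report, say which one it is in the methodology so the degrees of freedom make sense to the examiner.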

One-Way ANOVA

# ANOVA across three or more groups
# (the grouping variable should be a factor)
data$department <- factor(data$department)
model <- aov(satisfaction_score ~ department, data = data)
summary(model)

# Post-hoc test to identify which specific groups differ
TukeyHSD(model)
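
The F-value, p-value and sums of squares can be read from the ANOVA table programmatically. If your department also asks for an effect size, eta squared is simply the between-groups sum of squares divided by the total. A sketch on made-up ratings follows; the data frame ratings is invented, and eta squared is only one of several effect sizes in use.

```r
# Made-up satisfaction ratings for three departments
ratings <- data.frame(
  department = factor(rep(c("Finance", "HR", "Operations"), each = 5)),
  satisfaction_score = c(3, 4, 3, 5, 4,
                         4, 5, 5, 4, 5,
                         2, 3, 2, 3, 3)
)

fit <- aov(satisfaction_score ~ department, data = ratings)
tab <- summary(fit)[[1]]   # the ANOVA table as a data frame

f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
tab[["Df"]]                # between-groups and residual degrees of freedom

# Eta squared: share of total variance explained by department
ss <- tab[["Sum Sq"]]
eta_squared <- ss[1] / sum(ss)
eta_squared
```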

Multiple Regression

# Build the model
model <- lm(performance_score ~ training_hours + experience + motivation, data = data)

# Full output
summary(model)

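You'll get R-squared, adjusted R-squared, the overall F-statistic, and for each predictor the unstandardised coefficient (B), standard error, t-value, and p-value. Note that summary() does not print standardised beta coefficients. All of this belongs in your results chapter.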
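
If your results table needs standardised beta coefficients of the kind SPSS prints, one simple approach is to convert every variable to z-scores and refit the model. The sketch below does this on simulated data. The numbers are generated with a fixed seed for illustration only, and refitting on scaled variables is one of several ways to obtain standardised coefficients.

```r
set.seed(1)  # made-up data; the seed just makes the example repeatable
n <- 100
staff <- data.frame(
  training_hours = runif(n, 0, 20),
  experience     = runif(n, 0, 15),
  motivation     = rnorm(n, mean = 3.5, sd = 0.8)
)
staff$performance_score <- 40 + 1.2 * staff$training_hours +
  0.8 * staff$experience + 5 * staff$motivation + rnorm(n, sd = 6)

# Unstandardised coefficients (B), in the original units
raw_fit <- lm(performance_score ~ training_hours + experience + motivation,
              data = staff)
coef(raw_fit)

# Convert every variable to z-scores, then refit for standardised betas
staff_z <- as.data.frame(scale(staff))
std_fit <- lm(performance_score ~ training_hours + experience + motivation,
              data = staff_z)
round(coef(std_fit), 3)
```

The t-values and p-values for the predictors are the same in both models; only the scale of the coefficients changes, which makes predictors measured in different units easier to compare.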


Data Visualisation with ggplot2

This is the section that makes R users never go back to SPSS charts.

library(ggplot2)

# Bar chart
ggplot(data, aes(x = department, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(title = "Staff Distribution by Department and Gender",
       x = "Department", y = "Count") +
  theme_minimal()

# Scatter plot with regression line
ggplot(data, aes(x = training_hours, y = performance_score)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", colour = "steelblue", se = TRUE) +
  labs(title = "Training Hours vs. Performance Score") +
  theme_classic()

# Box plot
ggplot(data, aes(x = gender, y = satisfaction_score, fill = gender)) +
  geom_boxplot() +
  labs(title = "Job Satisfaction by Gender") +
  theme_minimal()

# Save at publication quality
# (saves the most recently displayed plot; width and height are in inches)
ggsave("performance_chart.png", dpi = 300, width = 8, height = 5)

Every chart is fully customisable — colours, fonts, labels, gridlines, themes. Once you learn the ggplot2 grammar, you can produce any chart type cleanly.


R vs. SPSS vs. Python: The Honest Picture

| Criterion | SPSS | R | Python |
| --- | --- | --- | --- |
| Cost | Paid | Free | Free |
| Learning curve | Low | Moderate | Moderate–High |
| Statistical range | High | Highest | High |
| Visualisation | Basic | Excellent | Very good |
| Advanced modelling (SEM, multilevel) | Limited | Excellent | Good |
| Reproducibility | Low | Excellent | Excellent |
| Used in top journals | Common | Very common | Growing |
| Nigerian university acceptance | Very high | Growing (esp. PhD) | Growing |

R is the most powerful statistical tool of the three. It's also the one most likely to impress a journal reviewer or international collaborator. The tradeoff is a steeper initial learning curve than SPSS.

If you're a 400-level student: R is probably not worth learning specifically for your final year project unless you're already comfortable with it. SPSS will meet your requirements. Come back to R when you're in a master's programme, planning to publish, or targeting institutions with stronger quantitative research cultures.


Presenting R Results in Your Dissertation

Will Nigerian examiners accept R output? At PhD level, and increasingly at master's level in departments with international research exposure — yes. But confirm with your supervisor before committing.

When presenting results:

- Reformat tables for APA style; don't paste raw R console output
- Export ggplot2 charts at 300 dpi using ggsave()
- Cite R and packages in your methodology: "Data analysis was conducted using R (version 4.3.2; R Core Team, 2023) and the tidyverse package (Wickham et al., 2019)."
- Include your full R script as an appendix if required
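
The version number and reference for the methodology sentence should match what you actually ran, and R can report both. A short sketch using functions that ship with R; the stats package is queried here only because it is always installed.

```r
# The R version to quote in your methodology
R.version.string

# The recommended reference for R itself
citation()

# The same reference in BibTeX form, for reference managers
toBibtex(citation())

# Version of an installed package
packageVersion("stats")

# For add-on packages, once they are installed:
# citation("tidyverse")
# packageVersion("tidyverse")
```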

The examiner is evaluating your analysis and interpretation — not your code. Present results as cleanly as you would from any other tool.


A Focused Six-Week Learning Path

You don't need to master R. You need enough fluency to handle your dissertation analysis confidently. Six weeks of focused practice gets you there.

Week 1: R and RStudio basics (objects, vectors, data frames, importing data). Start with R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, free at r4ds.hadley.nz.

Week 2: Data cleaning with dplyr — filter, select, mutate, summarise, group_by

Week 3: Visualisation with ggplot2 — the grammar of graphics, the chart types you need

Week 4: Statistical analysis — descriptive stats with psych, correlation, T-test, ANOVA, regression

Weeks 5–6: Apply to your actual research dataset, format outputs, interpret results

The swirl package (an interactive tutorial that runs in the R console, including inside RStudio) is excellent for beginners who learn better by doing than reading.


Professional R Analysis From AOLYTIX Group

If your dataset is complex, your deadline is close, or you simply want your analysis done with the rigour that stands up to examination and peer review — AOLYTIX Group provides professional R data analysis for Nigerian researchers.

We deliver data cleaning and preparation, descriptive and inferential statistical analysis, advanced modelling where needed, publication-quality ggplot2 visualisations, and interpreted results ready for your dissertation or manuscript.

Bringing us a dataset named Q1 through Q47 with no codebook? We've seen it before. We can work with it.

Talk to the AOLYTIX Research Desk →


AOLYTIX Research Desk is the publishing arm of AOLYTIX Group — a Nigerian academic research and consulting firm supporting postgraduate students, researchers, and organisations across Africa with data analysis, dissertation support, and research consulting.

