📘 Statistics & Probability · Chapter MN24 · 🎯 NDA Level: High Priority
Statistics is the science of collecting, organising, and analysing numerical data. For NDA, the focus is on three areas: how data is presented (frequency tables, histograms, ogives), measures of central tendency (mean, median, mode), and measures of dispersion (standard deviation, variance, coefficient of variation). Questions are calculation-based — the student who knows the exact formula and applies it carefully will always score.
📌 What to expect in NDA (based on 2022–2024 papers): (1) Direct calculation of mean for ungrouped data; (2) Mean by assumed mean or step deviation method for grouped data; (3) Median for grouped data using the interpolation formula; (4) Mode for grouped data: modal class and formula; (5) Empirical relation Mode = 3 Median − 2 Mean; (6) Standard deviation and variance calculation; (7) Coefficient of variation CV = (σ/x̄)×100; (8) Reading histograms, identifying modal class, and ogive reading.
Topics at a Glance
① Data Representation
Frequency table, histogram, ogive, polygon
② Mean
Direct, assumed mean, step deviation
③ Median & Mode
Grouped/ungrouped; empirical relation
④ Dispersion
MD, variance σ², SD σ, CV
⑤ Correlation
Karl Pearson r, Spearman rank
⑥ Regression
Lines of regression, byx, bxy
1. Data Representation
1.1
Frequency Distribution, Histogram & Ogive
Understand each graph type — NDA asks you to read from histograms and ogives
Frequency Table Terms
Class interval: e.g. 10–20
Class width (h): upper−lower limit
Class mark (xi): (upper+lower)/2
Frequency (fi): count in that class
Cumulative freq: running total of f
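The frequency-table columns above can be checked with a short Python sketch. It assumes classes are passed as (lower, upper) pairs with a uniform width. The class limits 10–20 to 40–50 and frequencies 3, 7, 5, 5 are illustrative: they give the class marks 15, 25, 35, 45 used in the step-deviation working in Section 2. The function name is hypothetical.

```python
from itertools import accumulate


def frequency_table(classes, freqs):
    """Return one row per class: (class, width h, class mark x_i, f_i, cumulative f)."""
    cumulative = list(accumulate(freqs))          # running total of f
    rows = []
    for (lower, upper), f, cf in zip(classes, freqs, cumulative):
        width = upper - lower                     # class width h = upper - lower limit
        mark = (upper + lower) / 2                # class mark x_i = (upper + lower)/2
        rows.append(((lower, upper), width, mark, f, cf))
    return rows


# Illustrative data (hypothetical class limits)
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 5, 5]
for row in frequency_table(classes, freqs):
    print(row)
```

For these numbers the cumulative-frequency column reads 3, 10, 15, 20, so N = 20.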
Graph Types & Uses
Histogram: bars for frequency, no gaps. Width = class interval
Freq polygon: midpoints of the bar tops joined by straight lines
Ogive: cumulative curve; read median at N/2
Modal class: class with highest bar
Ogive & Quartiles
Less-than ogive: plot (upper limit, CF)
More-than ogive: plot (lower limit, reverse CF)
Intersection of the two ogives: its x-coordinate is the median
Q1 at N/4, Q2 at N/2, Q3 at 3N/4
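Here is a sketch of the less-than and more-than ogives for the same illustrative classes. It assumes the plotted points are joined by straight line segments; a freehand curve would give slightly different readings. `read_ogive` is a hypothetical helper that returns the x-value where the less-than ogive reaches a given cumulative frequency. Calling it at N/4, N/2 and 3N/4 gives Q1, Q2 (the median) and Q3.

```python
def less_than_ogive(classes, freqs):
    """Points (upper limit, CF), starting at (first lower limit, 0)."""
    points = [(classes[0][0], 0)]
    running = 0
    for (_, upper), f in zip(classes, freqs):
        running += f
        points.append((upper, running))
    return points


def more_than_ogive(classes, freqs):
    """Points (lower limit, number of observations at or above that limit)."""
    remaining = sum(freqs)
    points = []
    for (lower, _), f in zip(classes, freqs):
        points.append((lower, remaining))
        remaining -= f
    points.append((classes[-1][1], 0))
    return points


def read_ogive(points, target):
    """x where a less-than ogive (straight segments) reaches height `target`."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= target <= y1:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is outside the ogive's range")


classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 5, 5]
lt = less_than_ogive(classes, freqs)
N = sum(freqs)
for label, pos in (("Q1", N / 4), ("Q2", N / 2), ("Q3", 3 * N / 4)):
    print(label, round(read_ogive(lt, pos), 2))
print(more_than_ogive(classes, freqs))
```

For these data (N = 20), both ogives pass through (30, 10). The curves therefore cross at x = 30, which is also the median read from the less-than ogive at N/2. With straight segments, this reading is exactly the median interpolation formula of Section 3.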
Fig 1: Left — Histogram with modal class (darkest bar, 10–20) and frequency polygon (red dashed). Right — Less-than Ogive; N/2 line meets curve and drops to x-axis giving the median.
2. Mean — Three Methods
2.1
Direct, Assumed Mean & Step Deviation Methods
Step deviation is fastest for grouped data with uniform class width — preferred in NDA
⚡ Mean — All Three Methods
Notation: x_i = class mark, f_i = frequency, N = total frequency
METHOD 1 - DIRECT:
Mean = Sum(f_i * x_i) / N
METHOD 2 - ASSUMED MEAN (A = assumed mean, d_i = x_i - A):
Mean = A + Sum(f_i * d_i) / N
METHOD 3 - STEP DEVIATION (h = class width, u_i = (x_i - A)/h):
Mean = A + h * [Sum(f_i * u_i) / N] [Fastest for uniform class width]
UNGROUPED (n raw values):
Mean = Sum(x_i) / n
COMBINED MEAN (two groups n1, x-bar1 and n2, x-bar2):
Combined Mean = (n1*x-bar1 + n2*x-bar2) / (n1 + n2)
EFFECT OF TRANSFORMATIONS:
All values + k : new mean = old mean + k
All values x k : new mean = k * old mean
Choose A as the class mark closest to the estimated mean. Step deviation always gives the same answer as the direct method, but with much smaller numbers in the working.
Worked example (class marks 15, 25, 35, 45 with frequencies 3, 7, 5, 5; N = 20, A = 25, h = 10):
u values: (15−25)/10=−1, (25−25)/10=0, (35−25)/10=1, (45−25)/10=2.
f×u: 3(−1)=−3, 7(0)=0, 5(1)=5, 5(2)=10. Sum f×u = 12.
Mean = 25 + 10×(12/20) = 25 + 6 = 31.
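The worked example can be reproduced with all three mean formulas in a small Python sketch. Using `fractions.Fraction` is a choice made here, not something the method requires. It keeps the arithmetic exact, so the direct, assumed-mean and step-deviation results compare equal. The function names are hypothetical, and integer class marks are assumed.

```python
from fractions import Fraction


def mean_direct(x, f):
    # Mean = Sum(f_i * x_i) / N
    return Fraction(sum(fi * xi for xi, fi in zip(x, f)), sum(f))


def mean_assumed(x, f, A):
    d = [xi - A for xi in x]                      # d_i = x_i - A
    return A + Fraction(sum(fi * di for di, fi in zip(d, f)), sum(f))


def mean_step_deviation(x, f, A, h):
    u = [Fraction(xi - A, h) for xi in x]         # u_i = (x_i - A)/h
    return A + h * sum(fi * ui for ui, fi in zip(u, f)) / sum(f)


x, f = [15, 25, 35, 45], [3, 7, 5, 5]
print(mean_direct(x, f), mean_assumed(x, f, 25), mean_step_deviation(x, f, 25, 10))
```

Changing A to 35 gives u = −2, −1, 0, 1 and Σfu = −8, so Mean = 35 + 10×(−8/20) = 31 again. The choice of A changes only the working, never the answer.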
📝 TOPIC-WISE PYQ
Mean — NDA-Pattern Questions
Q1. The mean of 5 observations is 4.4 and variance is 8.24. If three of the observations are 1, 2 and 6, find the other two.
(a) 4 and 9 (b) 5 and 8 (c) 4 and 8 (d) 5 and 9
Answer: (a) 4 and 9
Sum = 5×4.4 = 22. Known sum = 1+2+6 = 9, so the remaining two values sum to 13. Options with sum 13: (4, 9) and (5, 8).
Check variance with 1, 2, 4, 6, 9: Σx² = 1+4+16+36+81 = 138. σ² = 138/5 − (4.4)² = 27.6 − 19.36 = 8.24 ✓. With 5 and 8 instead: Σx² = 130, σ² = 26 − 19.36 = 6.64 ✗. Answer: (a) 4 and 9.
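Checking all four options of Q1 by brute force is a quick way to confirm the answer. The sketch below uses hypothetical helper names and the population variance in its shortcut form, "mean of squares minus square of mean" (Section 4). It again uses exact fractions.

```python
from fractions import Fraction


def pop_variance(values):
    """Population variance: mean of squares minus square of mean."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return Fraction(sum(v * v for v in values), n) - mean ** 2


def matching_options(known, options, n, mean, variance):
    """Labels of the options whose pair gives the stated mean and variance."""
    hits = []
    for label, pair in options.items():
        data = known + list(pair)
        if Fraction(sum(data), n) == mean and pop_variance(data) == variance:
            hits.append(label)
    return hits


KNOWN = [1, 2, 6]
OPTIONS = {"a": (4, 9), "b": (5, 8), "c": (4, 8), "d": (5, 9)}
print(matching_options(KNOWN, OPTIONS, 5, Fraction(22, 5), Fraction(824, 100)))
```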
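Q2. A class of 30 students (mean = 60) is combined with a class of 20 students (mean = 50). Find the combined mean.
Answer: 56
Combined mean = (30×60 + 20×50)/(30 + 20) = (1800 + 1000)/50 = 2800/50 = 56.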
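The combined-mean formula extends to any number of groups as a weighted average. Here is a minimal sketch, assuming each group is passed as a (size, mean) pair; the function name is hypothetical.

```python
def combined_mean(groups):
    """Weighted mean of groups given as (n_i, mean_i) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


print(combined_mean([(30, 60), (20, 50)]))  # Q2: 2800 / 50 = 56.0
```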
Q3. For data 5,7,9,11,13 (mean=9), if each value is multiplied by 2, the new mean is:
(a) 9 (b) 11 (c) 18 (d) 4.5
Answer: (c) 18
Multiplying each value by k multiplies the mean by k. New mean=9×2=18.
3. Median & Mode
3.1
Median by Interpolation & Mode by Formula
Find the median/modal class first, then apply the formula
⚡ Median & Mode — Grouped Data
MEDIAN (grouped data):
Median = L + [(N/2 - CF) / f] x h
L = lower boundary of median class
N = total frequency
CF = cumulative frequency of class BEFORE median class
f = frequency of median class
h = class width
Median class = first class where CF exceeds N/2
MEDIAN (ungrouped, sorted):
n odd: middle value = ((n+1)/2)-th observation
n even: average of (n/2)-th and (n/2+1)-th observations
MODE (grouped data):
Mode = L + [(f1 - f0) / (2f1 - f0 - f2)] x h
L = lower boundary of modal class (highest frequency)
f1 = frequency of modal class
f0 = frequency of class before modal class
f2 = frequency of class after modal class
h = class width
EMPIRICAL RELATION:
Mode = 3 Median - 2 Mean
Mean = (3 Median - Mode) / 2
Median = (2 Mean + Mode) / 3
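The ungrouped median rule and the grouped mode formula above translate directly into Python. This sketch uses hypothetical function names and assumes equal-width classes. On a tie for the highest frequency it picks the first such class. It takes f0 or f2 as 0 when the modal class is the first or last class; that is a common textbook convention, stated here as an assumption.

```python
def median_ungrouped(values):
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]              # ((n+1)/2)-th observation (1-indexed)
    return (s[n // 2 - 1] + s[n // 2]) / 2      # average of (n/2)-th and (n/2+1)-th


def mode_grouped(classes, freqs):
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class (first on ties)
    L, upper = classes[i]
    h = upper - L
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # class before (0 if none)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0       # class after (0 if none)
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * h


classes = [(10, 20), (20, 30), (30, 40), (40, 50)]    # illustrative
freqs = [3, 7, 5, 5]
print(median_ungrouped([9, 2, 7, 4, 5]), median_ungrouped([8, 3, 5, 1]))
print(round(mode_grouped(classes, freqs), 2))
```

For the illustrative table, the modal class is 20–30, with f1 = 7, f0 = 3 and f2 = 5. This gives Mode = 20 + 4/6 × 10 ≈ 26.67.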
The empirical relation is tested directly: given any two of the three measures, find the third. It holds only approximately, for moderately skewed distributions. Write it as Mode = 3 Median - 2 Mean; the mnemonic is "3 median, 2 mean".
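Given any two of the three measures, the third follows from rearranging the empirical relation. Below is a sketch with a hypothetical function name. Because the relation is an approximation, the returned value is an estimate, not an exact result.

```python
def empirical_third(mean=None, median=None, mode=None):
    """Return the missing measure using Mode = 3 Median - 2 Mean."""
    if [mean, median, mode].count(None) != 1:
        raise ValueError("pass exactly two of mean, median, mode")
    if mode is None:
        return 3 * median - 2 * mean
    if mean is None:
        return (3 * median - mode) / 2
    return (2 * mean + mode) / 3          # median is the missing one


print(empirical_third(mean=40, median=38))   # illustrative values
```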
📌 Median Steps for Grouped Data (memorise this sequence):
(1) Find N = Σf. (2) Compute N/2. (3) Build CF column. (4) Median class = first class where CF > N/2. (5) Read L, CF (of previous class), f, h. (6) Apply formula.
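The six-step sequence maps one-to-one onto a short function. The sketch below uses a hypothetical name and assumes classes are given as (lower boundary, upper boundary) pairs with no gaps between them. The second data set is purely illustrative.

```python
def median_grouped(classes, freqs):
    N = sum(freqs)                            # step 1: N = sum of f
    half = N / 2                              # step 2: N/2
    cf = 0                                    # step 3: running cumulative frequency
    for (L, upper), f in zip(classes, freqs):
        if cf + f > half:                     # step 4: first class whose CF exceeds N/2
            h = upper - L                     # step 5: L, CF of previous class, f, h
            return L + (half - cf) / f * h    # step 6: apply the formula
        cf += f
    raise ValueError("no class has CF > N/2")


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]   # illustrative
freqs = [5, 8, 12, 6, 4]
print(median_grouped(classes, freqs))
```

Here N = 35 and N/2 = 17.5. The median class is 20–30 (CF before it = 13, f = 12), so Median = 20 + (17.5 − 13)/12 × 10 = 23.75.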
🤯 T1. Mean of n observations is x-bar. Effect of (a) adding k to all, (b) multiplying all by k.
(a) Adding k: new mean = x-bar + k. Mean shifts by same constant.
(b) Multiplying by k: new mean = k times x-bar. Mean scales by same factor.
For SD: adding k leaves SD unchanged; multiplying by k gives new SD = |k| times sigma. Critical exam trap: adding a constant NEVER changes variance or SD; only multiplying (or dividing) does.
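These transformation rules can be confirmed numerically with Python's standard `statistics` module, using `mean` and the population SD `pstdev`. The sketch uses the data 5, 7, 9, 11, 13 from Q3; k = 2 is an illustrative constant.

```python
import math
import statistics

data = [5, 7, 9, 11, 13]
k = 2
m, sd = statistics.mean(data), statistics.pstdev(data)

# Adding k: mean shifts by k, SD unchanged
added = [x + k for x in data]
assert math.isclose(statistics.mean(added), m + k)
assert math.isclose(statistics.pstdev(added), sd)

# Multiplying by a factor: mean scales by the factor, SD by its absolute value
for factor in (k, -k, 1 / 5):          # 1/5 means "divide by 5", as in Q7
    scaled = [x * factor for x in data]
    assert math.isclose(statistics.mean(scaled), factor * m)
    assert math.isclose(statistics.pstdev(scaled), abs(factor) * sd)

print(m, round(sd, 3))
```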
🤯 T2. Mean of 20 values is 45. One value was misread as 34 instead of 43. Correct mean?
Incorrect sum=20×45=900. Correct sum=900−34+43=909.
Correct mean=909/20=45.45. Method: compute sum from mean, adjust for the error, recompute.
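The recompute-the-sum method for a misread observation fits in a few lines. A sketch with a hypothetical helper name:

```python
def corrected_mean(n, reported_mean, wrong_value, right_value):
    """Recover the sum from the mean, swap the misread value, recompute."""
    total = n * reported_mean              # incorrect sum
    total += right_value - wrong_value     # remove the wrong value, add the right one
    return total / n


print(corrected_mean(20, 45, 34, 43))  # 909 / 20 = 45.45
```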
4. Measures of Dispersion
4.1
Variance, Standard Deviation & Coefficient of Variation
SD = sqrt(Variance); CV = (SD/Mean) x 100 — most tested dispersion formulas in NDA
⚡ Dispersion — Complete Formula Set
MEAN DEVIATION about Mean:
MD = Sum|x_i - mean| / n (ungrouped)
MD = Sum f_i|x_i - mean| / N (grouped)
VARIANCE (population):
sigma^2 = Sum(x_i - mean)^2 / n (ungrouped)
SHORTCUT: sigma^2 = (Sum x_i^2)/n - mean^2 ["mean of squares minus square of mean"]
Grouped: sigma^2 = Sum f_i(x_i - mean)^2 / N = (Sum f_i x_i^2)/N - mean^2
STANDARD DEVIATION:
sigma = sqrt(sigma^2) (always non-negative)
STEP DEVIATION VARIANCE (class width h):
u_i = (x_i - A)/h
sigma^2 = h^2 * [(Sum f_i u_i^2)/N - ((Sum f_i u_i)/N)^2]
COEFFICIENT OF VARIATION:
CV = (sigma / mean) x 100 (as percentage)
Lower CV = more homogeneous/consistent data
KEY PROPERTIES:
Adding constant k to all values: sigma unchanged, sigma^2 unchanged
Multiplying all by k: new sigma = |k|*sigma, new sigma^2 = k^2*sigma^2
All values equal: sigma = 0
First n natural numbers: sigma = sqrt((n^2-1)/12)
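The whole formula set can be cross-checked on one grouped table, because the definition, the shortcut and the step-deviation form of the variance should all agree. This Python sketch uses the illustrative class marks 15, 25, 35, 45 with frequencies 3, 7, 5, 5 (A = 25, h = 10). The function name and the keys of the returned dictionary are hypothetical.

```python
import math


def grouped_dispersion(x, f, A, h):
    """Mean deviation, variance (three ways), SD and CV for grouped data."""
    N = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / N
    var_definition = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / N
    var_shortcut = sum(fi * xi ** 2 for xi, fi in zip(x, f)) / N - mean ** 2
    u = [(xi - A) / h for xi in x]                      # u_i = (x_i - A)/h
    mean_u = sum(fi * ui for ui, fi in zip(u, f)) / N
    mean_u2 = sum(fi * ui ** 2 for ui, fi in zip(u, f)) / N
    var_step = h ** 2 * (mean_u2 - mean_u ** 2)         # step-deviation variance
    sd = math.sqrt(var_definition)
    return {
        "mean": mean,
        "MD": sum(fi * abs(xi - mean) for xi, fi in zip(x, f)) / N,
        "var_definition": var_definition,
        "var_shortcut": var_shortcut,
        "var_step": var_step,
        "SD": sd,
        "CV%": sd / mean * 100,
    }


stats = grouped_dispersion([15, 25, 35, 45], [3, 7, 5, 5], A=25, h=10)
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

For these data, all three variance forms give 104 (SD ≈ 10.20, CV ≈ 32.9%), and MD about the mean is 9.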
Shortcut formula sigma^2 = (Sum x^2)/n - mean^2 is almost always faster than the definition. Memorise it as "mean of squares minus square of mean". For grouped data, replace Sum x^2 with Sum f*x^2.
Q7. If each observation is divided by 5, the SD of the new set is:
(a) 5σ (b) σ/5 (c) σ (d) σ−5
Answer: (b) σ/5
Dividing by 5 = multiplying by 1/5. New SD = σ/5.
Q8. Variance of 1,2,3,4,5 is:
(a) 1 (b) 2 (c) √2 (d) 4
Answer: (b) 2
Mean=3. Sum x²=55. Variance=55/5−9=11−9=2.
Q9. σ²=16, mean=8. Coefficient of variation is:
(a) 50% (b) 25% (c) 2% (d) 12.5%
Answer: (a) 50%
σ=4. CV=(4/8)×100=50%.
🔥 TRICKY QUESTIONS
Dispersion — Classic NDA Traps
🤯 T3. Two sets of 50 obs each: same mean 16, SD 4 and 6. Find variance of combined 100 obs.
For each set: Sum x^2 = n(sigma^2 + mean^2).
Set 1: 50(16+256)=50*272=13600. Set 2: 50(36+256)=50*292=14600.
Combined Sum x^2 = 28200. Combined variance = 28200/100 - 256 = 282-256 = 26.
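The T3 method also works when the group means differ: rebuild Σx and Σx² for each group, pool them, then apply "mean of squares minus square of mean". Below is a sketch with a hypothetical function name. Each group is given as (n, mean, SD), with population SDs.

```python
def combined_variance(groups):
    """Pooled population variance of groups given as (n, mean, sd)."""
    N = sum(n for n, _, _ in groups)
    total = sum(n * m for n, m, _ in groups)                       # Sum x = n * mean
    total_sq = sum(n * (s ** 2 + m ** 2) for n, m, s in groups)    # Sum x^2 = n(sigma^2 + mean^2)
    mean = total / N
    return total_sq / N - mean ** 2


print(combined_variance([(50, 16, 4), (50, 16, 6)]))   # T3
```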
🤯 T4. Find the SD of the first 11 natural numbers using SD = sqrt((n^2-1)/12).
n = 11: sigma^2 = (121 - 1)/12 = 120/12 = 10, so SD = sqrt(10) ≈ 3.16.
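Here is a quick numerical check of the first-n-natural-numbers formula against Python's `statistics.pstdev` (population SD). The case n = 5 also reproduces the variance of 2 from Q8. The function name is hypothetical.

```python
import math
import statistics


def sd_first_n(n):
    """Population SD of 1, 2, ..., n via sqrt((n^2 - 1)/12)."""
    return math.sqrt((n * n - 1) / 12)


for n in (5, 11):
    direct = statistics.pstdev(list(range(1, n + 1)))
    print(n, round(sd_first_n(n), 4), round(direct, 4))
```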
5. Correlation & Regression
NDA tests the range, sign and interpretation of r, and the relation r² = b_yx × b_xy
⚡ Correlation & Regression
KARL PEARSON CORRELATION COEFFICIENT (r):
r = Sum[(x_i - x-bar)(y_i - y-bar)] / sqrt[Sum(x_i-x-bar)^2 * Sum(y_i-y-bar)^2]
Shortcut: r = [n*Sum(xy) - Sum(x)*Sum(y)] /
sqrt{[n*Sum(x^2) - (Sum x)^2] * [n*Sum(y^2) - (Sum y)^2]}
RANGE: -1 <= r <= 1
r = +1: perfect positive; r = -1: perfect negative; r = 0: no linear correlation
SPEARMAN RANK CORRELATION:
r_s = 1 - [6 * Sum(d^2)] / [n(n^2 - 1)]
d = difference between ranks of corresponding pairs
REGRESSION LINES:
y on x: (y - y-bar) = b_yx (x - x-bar) b_yx = r * (sigma_y / sigma_x)
x on y: (x - x-bar) = b_xy (y - y-bar) b_xy = r * (sigma_x / sigma_y)
RELATION BETWEEN REGRESSION COEFFICIENTS:
r^2 = b_yx * b_xy => r = ±sqrt(b_yx * b_xy), taking the common sign of b_yx and b_xy
Both regression lines pass through the point (x-bar, y-bar).
r² = b_yx × b_xy is the most tested fact. b_yx can exceed 1 in magnitude; only r must stay in [-1,1]. Both regression lines always intersect at the point of means (x-bar, y-bar).
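This Python sketch implements the shortcut formula for r, the two regression coefficients and Spearman's formula. It also confirms numerically that r² = b_yx × b_xy. The data are illustrative and the function names hypothetical. The Spearman helper assumes ranks with no ties, since the formula above needs a correction term otherwise.

```python
import math


def pearson_r(x, y):
    """Karl Pearson r via the shortcut (n*Sum xy - Sum x*Sum y) form."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def regression_coefficients(x, y):
    """Return (b_yx, b_xy) from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / syy        # b_yx = r*sy/sx, b_xy = r*sx/sy


def spearman_rs(rank_x, rank_y):
    """r_s = 1 - 6*Sum(d^2) / (n(n^2 - 1)), assuming no tied ranks."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


X = [1, 2, 3, 4, 5]                    # illustrative data
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)
b_yx, b_xy = regression_coefficients(X, Y)
print(round(r, 4), b_yx, b_xy, round(r * r, 4), round(b_yx * b_xy, 4))
```

For these data, b_yx = 0.6 and b_xy = 1.0, and both r² and the product equal 0.6.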
🤯 T5. Two regression coefficients are 1.6 and 0.4. Is this valid? Find r.
r^2 = 1.6*0.4 = 0.64. r = +0.8 (both positive). Valid since |r|=0.8 <= 1.
b_yx = 1.6 > 1 is perfectly allowed. Only r must be in [-1,1]. r = 0.8. Strong positive correlation. Common trap: students assume both regression coefficients must be <= 1. Wrong.
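The T5 reasoning can be written as a small validity check: the two coefficients must share a sign, their product must be at most 1, and r takes their common sign. The function name is hypothetical.

```python
import math


def r_from_regression(b_yx, b_xy):
    """r from the two regression coefficients, with a validity check."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("b_yx and b_xy must have the same sign")
    if product > 1:
        raise ValueError("invalid pair: r^2 = b_yx * b_xy cannot exceed 1")
    r = math.sqrt(product)
    return -r if b_yx < 0 else r       # r takes the common sign of b_yx and b_xy


print(r_from_regression(1.6, 0.4))     # T5
```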
This material is for personal NDA exam preparation only.
Unauthorised reproduction or distribution is prohibited.
All rights reserved. · ODEA.Classes@gmail.com