Olive Defence
Mathematics

Statistics

📘 Statistics & Probability · Chapter MN24 🎯 NDA Level : High Priority

Statistics is the science of collecting, organising, and analysing numerical data. For NDA, the focus is on three areas: how data is presented (frequency tables, histograms, ogives), measures of central tendency (mean, median, mode), and measures of dispersion (standard deviation, variance, coefficient of variation). Questions are calculation-based, so marks depend on knowing the exact formula and applying it carefully.

📌 What to expect in NDA (based on 2022–2024 papers):
(1) Direct calculation of mean for ungrouped data;
(2) Mean by assumed mean or step deviation method for grouped data;
(3) Median for grouped data using the interpolation formula;
(4) Mode for grouped data: modal class and formula;
(5) Empirical relation Mode = 3 Median − 2 Mean;
(6) Standard deviation and variance calculation;
(7) Coefficient of variation CV = (σ/x̄)×100;
(8) Reading histograms, identifying modal class, and ogive reading.

Topics at a Glance

① Data Representation
Frequency table, histogram, ogive, polygon
② Mean
Direct, assumed mean, step deviation
③ Median & Mode
Grouped/ungrouped; empirical relation
④ Dispersion
MD, variance σ², SD σ, CV
⑤ Correlation
Karl Pearson r, Spearman rank
⑥ Regression
Lines of regression, byx, bxy

1. Data Representation

1.1
Frequency Distribution, Histogram & Ogive
Understand each graph type — NDA asks you to read from histograms and ogives

Frequency Table Terms

  • Class interval: e.g. 10–20
  • Class width (h): upper−lower limit
  • Class mark (xi): (upper+lower)/2
  • Frequency (fi): count in that class
  • Cumulative freq: running total of f

Graph Types & Uses

  • Histogram: bars for frequency, no gaps. Width = class interval
  • Freq polygon: midpoints of the bar tops joined by straight lines
  • Ogive: cumulative curve; read median at N/2
  • Modal class: class with highest bar

Ogive & Quartiles

  • Less-than ogive: plot (upper limit, CF)
  • More-than ogive: plot (lower limit, reverse CF)
  • x-coordinate of the intersection of the two ogives = Median
  • Q1 at N/4, Q2 at N/2, Q3 at 3N/4
Fig 1: Left: histogram (x against f) with modal class (darkest bar, 10–20) and frequency polygon (red dashed). Right: less-than ogive (x against CF); the N/2 line meets the curve and drops to the x-axis, giving the median.
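The plotting rules for the two ogives can be sketched in Python. The class data here is illustrative only (it is not taken from an NDA paper), and `ogive_points` is a hypothetical helper name.

```python
# Illustrative grouped data: (lower limit, upper limit, frequency)
classes = [(10, 20, 3), (20, 30, 7), (30, 40, 5), (40, 50, 5)]

def ogive_points(classes):
    """Return the points to plot for the less-than and more-than ogives."""
    N = sum(f for _, _, f in classes)
    less_than, more_than = [], []
    cf = 0  # running (less-than) cumulative frequency
    for lower, upper, f in classes:
        more_than.append((lower, N - cf))  # count at or above this lower limit
        cf += f
        less_than.append((upper, cf))      # count below this upper limit
    return less_than, more_than

less_than, more_than = ogive_points(classes)
print(less_than)  # [(20, 3), (30, 10), (40, 15), (50, 20)]
print(more_than)  # [(10, 20), (20, 17), (30, 10), (40, 5)]
```

For this data both curves pass through (30, 10), and 10 = N/2, so the median read off the x-axis is 30.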

2. Mean — Three Methods

2.1
Direct, Assumed Mean & Step Deviation Methods
Step deviation is fastest for grouped data with uniform class width — preferred in NDA
⚡ Mean — All Three Methods
Notation: x_i = class mark, f_i = frequency, N = total frequency

METHOD 1 - DIRECT:
  Mean = Sum(f_i * x_i) / N

METHOD 2 - ASSUMED MEAN (A = assumed mean, d_i = x_i - A):
  Mean = A + Sum(f_i * d_i) / N

METHOD 3 - STEP DEVIATION (h = class width, u_i = (x_i - A)/h):
  Mean = A + h * [Sum(f_i * u_i) / N]   [Fastest for uniform class width]

UNGROUPED (n raw values):
  Mean = Sum(x_i) / n

COMBINED MEAN (two groups n1, x-bar1 and n2, x-bar2):
  Combined Mean = (n1*x-bar1 + n2*x-bar2) / (n1 + n2)

EFFECT OF TRANSFORMATIONS:
  All values + k : new mean = old mean + k
  All values × k : new mean = k * old mean
Choose A = the class mark closest to the estimated mean. Step deviation always gives the same answer as direct method but with much smaller numbers in the working.
Worked Example — Step Deviation Method

Classes: 10–20(f=3), 20–30(f=7), 30–40(f=5), 40–50(f=5). Find mean.

h=10, A=25 (class mark of 20–30). N=20.

u values: (15−25)/10=−1, (25−25)/10=0, (35−25)/10=1, (45−25)/10=2.

f×u: 3(−1)=−3, 7(0)=0, 5(1)=5, 5(2)=10. Sum f×u = 12.

Mean = 25 + 10×(12/20) = 25 + 6 = 31.
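As a check on the worked example, the three methods can be run side by side in Python. This is a minimal sketch that assumes a uniform class width; `grouped_means` is a hypothetical helper name.

```python
# Grouped data from the worked example: (lower limit, upper limit, frequency)
classes = [(10, 20, 3), (20, 30, 7), (30, 40, 5), (40, 50, 5)]

def grouped_means(classes, A):
    """Mean by the direct, assumed-mean and step-deviation methods."""
    h = classes[0][1] - classes[0][0]                    # uniform class width assumed
    marks = [(lo + hi) / 2 for lo, hi, _ in classes]     # class marks x_i
    freqs = [f for _, _, f in classes]
    N = sum(freqs)

    direct = sum(f * x for f, x in zip(freqs, marks)) / N
    assumed = A + sum(f * (x - A) for f, x in zip(freqs, marks)) / N
    step = A + h * sum(f * (x - A) / h for f, x in zip(freqs, marks)) / N
    return direct, assumed, step

direct, assumed, step = grouped_means(classes, A=25)
print(direct, assumed, step)  # 31.0 31.0 31.0
```

All three methods agree on 31; only the size of the numbers in the working differs.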

📝 TOPIC-WISE PYQ
Mean — NDA-Pattern Questions
Q1. The mean of 5 observations is 4.4 and variance is 8.24. If three of the observations are 1, 2 and 6, find the other two.
  • (a) 4 and 9    (b) 5 and 8    (c) 4 and 8    (d) 5 and 9
Answer: (a) 4 and 9
Sum=5×4.4=22. Known sum=1+2+6=9. Remaining sum=13. Pair with sum 13: (4,9) or (5,8).
Check variance: using 1,2,4,6,9: mean=22/5=4.4. Σx²=1+4+16+36+81=138. σ²=138/5−(4.4)²=27.6−19.36=8.24 ✓. Answer: (a) 4 and 9.
Q2. Class of 30 (mean=60) combined with class of 20 (mean=50). Combined mean:
  • (a) 55    (b) 56    (c) 58    (d) 54
Answer: (b) 56
(30×60+20×50)/50=(1800+1000)/50=2800/50=56.
Q3. For data 5,7,9,11,13 (mean=9), if each value is multiplied by 2, the new mean is:
  • (a) 9    (b) 11    (c) 18    (d) 4.5
Answer: (c) 18
Multiplying each value by k multiplies the mean by k. New mean=9×2=18.
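The option-checking approach used in Q1 can be sketched in Python with the standard `statistics` module. The function name `matching_options` is a hypothetical helper; the data and options are those of Q1.

```python
from math import isclose
from statistics import mean, pvariance

known = [1, 2, 6]  # the three given observations from Q1
options = {"a": (4, 9), "b": (5, 8), "c": (4, 8), "d": (5, 9)}

def matching_options(known, options, target_mean, target_var):
    """Return the option labels whose pair reproduces both the mean and the variance."""
    hits = []
    for label, pair in options.items():
        data = known + list(pair)
        # pvariance is the population variance (divide by n), as used in these notes
        if isclose(mean(data), target_mean) and isclose(pvariance(data), target_var):
            hits.append(label)
    return hits

print(matching_options(known, options, 4.4, 8.24))  # ['a']
```

Option (b) matches the mean but gives variance 6.64, which is why the variance check is needed.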

3. Median & Mode

3.1
Median by Interpolation & Mode by Formula
Find the median/modal class first, then apply the formula
⚡ Median & Mode — Grouped Data
MEDIAN (grouped data):
  Median = L + [(N/2 - CF) / f] x h
  L  = lower boundary of median class
  N  = total frequency
  CF = cumulative frequency of class BEFORE median class
  f  = frequency of median class
  h  = class width
  Median class = first class where CF exceeds N/2

MEDIAN (ungrouped, sorted):
  n odd:  middle value = ((n+1)/2)-th observation
  n even: average of (n/2)-th and (n/2+1)-th observations

MODE (grouped data):
  Mode = L + [(f1 - f0) / (2f1 - f0 - f2)] x h
  L  = lower boundary of modal class (highest frequency)
  f1 = frequency of modal class
  f0 = frequency of class before modal class
  f2 = frequency of class after modal class
  h  = class width

EMPIRICAL RELATION:
  Mode   = 3 Median - 2 Mean
  Mean   = (3 Median - Mode) / 2
  Median = (2 Mean + Mode) / 3
The empirical relation is tested directly: given any two measures, find the third. Always write it as Mode = 3M(edian) - 2M(ean). The mnemonic: "3 median, 2 mean".
📌 Median Steps for Grouped Data (memorise this sequence):
(1) Find N = Σf.   (2) Compute N/2.   (3) Build CF column.   (4) Median class = first class where CF > N/2.   (5) Read L, CF (of previous class), f, h.   (6) Apply formula.
Worked Example — Median for Grouped Data

Classes: 0–10(f=5), 10–20(f=8), 20–30(f=12), 30–40(f=7), 40–50(f=3). N=35.

N/2=17.5. CF: 5, 13, 25, 32, 35. Median class = 20–30 (CF first exceeds 17.5).

L=20, CF=13, f=12, h=10. Median = 20+[(17.5−13)/12]×10 = 20+3.75 = 23.75.
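The same steps can be sketched in Python, with the grouped mode formula added for the same table. The helper names are hypothetical, and treating a missing neighbouring class as frequency 0 is an assumption made here for edge cases, not something stated in these notes.

```python
# Grouped data from the worked example: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 7), (40, 50, 3)]

def grouped_median(classes):
    """Median = L + [(N/2 - CF) / f] * h, CF taken from the class before the median class."""
    N = sum(f for _, _, f in classes)
    cf_before = 0
    for lower, upper, f in classes:
        if cf_before + f > N / 2:  # first class whose CF exceeds N/2
            return lower + (N / 2 - cf_before) / f * (upper - lower)
        cf_before += f

def grouped_mode(classes):
    """Mode = L + [(f1 - f0) / (2*f1 - f0 - f2)] * h for the class with highest frequency."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # assumption: no class before -> 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: no class after -> 0
    lower, upper, _ = classes[i]
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

print(grouped_median(classes))          # 23.75
print(round(grouped_mode(classes), 2))  # 24.44
```

For the mode, the modal class is 20–30 with f1 = 12, f0 = 8, f2 = 7, so Mode = 20 + (4/9)×10 ≈ 24.44.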

📝 TOPIC-WISE PYQ
Median, Mode & Empirical Relation — NDA-Pattern Questions
Q4. Median of 3,5,7,9,11,13,15 is:
  • (a) 7    (b) 9    (c) 11    (d) 8
Answer: (b) 9
n=7 (odd). Median = 4th value = 9.
Q5. Mean=26, Mode=23. Find median.
  • (a) 24    (b) 25    (c) 26    (d) 27
Answer: (b) 25
23=3 Median−2(26). 3 Median=75. Median=25.
Q6. Mean=24.6, Median=26.1. Find mode.
  • (a) 29.1    (b) 28.5    (c) 30    (d) 27
Answer: (a) 29.1
Mode=3(26.1)−2(24.6)=78.3−49.2=29.1.
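The "given any two, find the third" pattern of Q5 and Q6 can be written as one small Python function. This is a sketch only; `empirical_third` is a hypothetical name, and the relation itself is approximate for real data.

```python
def empirical_third(mean=None, median=None, mode=None):
    """Use Mode = 3*Median - 2*Mean to find whichever measure is passed as None."""
    if [mean, median, mode].count(None) != 1:
        raise ValueError("supply exactly two of mean, median, mode")
    if mode is None:
        return 3 * median - 2 * mean
    if median is None:
        return (2 * mean + mode) / 3
    return (3 * median - mode) / 2  # mean is the missing one

print(empirical_third(mean=26, mode=23))                   # Q5: 25.0
print(round(empirical_third(mean=24.6, median=26.1), 2))   # Q6: 29.1
```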
🔥 TRICKY QUESTIONS
Central Tendency — Classic NDA Traps
🤯 T1. Mean of n observations is x-bar. Effect of (a) adding k to all, (b) multiplying all by k.
(a) Adding k: new mean = x-bar + k. Mean shifts by same constant.
(b) Multiplying by k: new mean = k times x-bar. Mean scales by same factor.
For SD: adding k leaves SD unchanged; multiplying by k gives new SD = |k| times sigma.
Critical exam trap: adding a constant NEVER changes variance or SD. Only multiplying does.
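A quick numerical check of T1 in Python, using the data from Q3. The shift of 3 and the factor k = -2 are arbitrary choices for illustration; the negative factor is used to show why the SD rule needs |k|.

```python
from statistics import mean, pstdev  # pstdev = population SD (divide by n)

data = [5, 7, 9, 11, 13]  # data from Q3, mean 9
k = -2                    # illustrative negative factor

shifted = [x + 3 for x in data]  # add a constant to every value
scaled = [k * x for x in data]   # multiply every value by k

print(mean(data), pstdev(data))        # mean 9, SD = sqrt(8)
print(mean(shifted), pstdev(shifted))  # mean 12, SD unchanged
print(mean(scaled), pstdev(scaled))    # mean -18, SD doubled (|k| = 2)
```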
🤯 T2. Mean of 20 values is 45. One value was misread as 34 instead of 43. Correct mean?
Incorrect sum=20×45=900. Correct sum=900−34+43=909.
Correct mean=909/20=45.45.
Method: compute sum from mean, adjust for the error, recompute.

4. Measures of Dispersion

4.1
Variance, Standard Deviation & Coefficient of Variation
SD = sqrt(Variance); CV = (SD/Mean) x 100 — most tested dispersion formulas in NDA
⚡ Dispersion — Complete Formula Set
MEAN DEVIATION about Mean:
  MD = Sum|x_i - mean| / n        (ungrouped)
  MD = Sum f_i|x_i - mean| / N    (grouped)

VARIANCE (population):
  sigma^2 = Sum(x_i - mean)^2 / n    (ungrouped)
  SHORTCUT: sigma^2 = (Sum x_i^2)/n - mean^2   ["mean of squares minus square of mean"]
  Grouped: sigma^2 = Sum f_i(x_i - mean)^2 / N = (Sum f_i x_i^2)/N - mean^2

STANDARD DEVIATION:
  sigma = sqrt(sigma^2)   (always non-negative)

STEP DEVIATION VARIANCE (class width h):
  u_i = (x_i - A)/h
  sigma^2 = h^2 * [(Sum f_i u_i^2)/N - ((Sum f_i u_i)/N)^2]

COEFFICIENT OF VARIATION:
  CV = (sigma / mean) x 100   (as percentage)
  Lower CV = more homogeneous/consistent data

KEY PROPERTIES:
  Adding constant k to all values: sigma unchanged, sigma^2 unchanged
  Multiplying all by k: new sigma = |k|*sigma, new sigma^2 = k^2*sigma^2
  All values equal: sigma = 0
  First n natural numbers: sigma = sqrt((n^2-1)/12)
Shortcut formula sigma^2 = (Sum x^2)/n - mean^2 is almost always faster than the definition. Memorise it as "mean of squares minus square of mean". For grouped data, replace Sum x^2 with Sum f*x^2.
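The step deviation variance formula can be tried on the grouped data from the step deviation worked example in Section 2. This is a sketch assuming uniform class width; `step_deviation_variance` is a hypothetical helper name.

```python
from math import sqrt

# Grouped data from the Section 2 worked example: (lower, upper, frequency)
classes = [(10, 20, 3), (20, 30, 7), (30, 40, 5), (40, 50, 5)]

def step_deviation_variance(classes, A):
    """sigma^2 = h^2 * [Sum(f*u^2)/N - (Sum(f*u)/N)^2], with u = (x - A)/h."""
    h = classes[0][1] - classes[0][0]  # uniform class width assumed
    N = sum(f for _, _, f in classes)
    sum_fu = sum_fu2 = 0.0
    for lower, upper, f in classes:
        u = ((lower + upper) / 2 - A) / h  # step deviation of the class mark
        sum_fu += f * u
        sum_fu2 += f * u * u
    return h ** 2 * (sum_fu2 / N - (sum_fu / N) ** 2)

variance = step_deviation_variance(classes, A=25)
print(round(variance, 2), round(sqrt(variance), 2))  # 104.0 10.2
```

By hand: Sum f×u = 12 and Sum f×u² = 28, so σ² = 100 × (28/20 − 0.6²) = 100 × 1.04 = 104.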
Worked Example — Variance & SD

Data: 2, 4, 6, 8, 10. Find variance and SD.

Mean = 30/5 = 6.   Sum x² = 4+16+36+64+100 = 220.

Variance = 220/5 − 36 = 44 − 36 = 8.   SD = √8 = 2√2 ≈ 2.83.
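Both routes to the variance can be compared in a few lines of Python. The function names are hypothetical; both compute the population variance (dividing by n), as in these notes.

```python
from math import sqrt

def variance_definition(xs):
    """sigma^2 = Sum(x - mean)^2 / n."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

def variance_shortcut(xs):
    """sigma^2 = (Sum x^2)/n - mean^2, 'mean of squares minus square of mean'."""
    n = len(xs)
    m = sum(xs) / n
    return sum(x * x for x in xs) / n - m * m

data = [2, 4, 6, 8, 10]
print(variance_definition(data), variance_shortcut(data))  # 8.0 8.0
print(round(sqrt(variance_shortcut(data)), 2))             # 2.83
```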

Worked Example — Coefficient of Variation

Factory A: mean=5000, SD=500. Factory B: mean=4000, SD=360. More consistent?

CV(A)=(500/5000)×100=10%.   CV(B)=(360/4000)×100=9%.

CV(B) < CV(A) → Factory B is more consistent.
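The comparison can be expressed as a short Python sketch; `coefficient_of_variation` is a hypothetical helper name, and the factory figures are those of the worked example.

```python
def coefficient_of_variation(mean, sd):
    """CV = (sd / mean) * 100, as a percentage."""
    return 100 * sd / mean

factories = {"A": (5000, 500), "B": (4000, 360)}  # name: (mean, SD)
cvs = {name: coefficient_of_variation(m, s) for name, (m, s) in factories.items()}

most_consistent = min(cvs, key=cvs.get)  # lowest CV = most consistent
print(cvs, most_consistent)  # {'A': 10.0, 'B': 9.0} B
```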

📝 TOPIC-WISE PYQ
Dispersion — NDA-Pattern Questions
Q7. If each observation is divided by 5, the SD of the new set is:
  • (a) 5σ    (b) σ/5    (c) σ    (d) σ−5
Answer: (b) σ/5
Dividing by 5 = multiplying by 1/5. New SD = σ/5.
Q8. Variance of 1,2,3,4,5 is:
  • (a) 1    (b) 2    (c) √2    (d) 4
Answer: (b) 2
Mean=3. Sum x²=55. Variance=55/5−9=11−9=2.
Q9. σ²=16, mean=8. Coefficient of variation is:
  • (a) 50%    (b) 25%    (c) 2%    (d) 12.5%
Answer: (a) 50%
σ=4. CV=(4/8)×100=50%.
🔥 TRICKY QUESTIONS
Dispersion — Classic NDA Traps
🤯 T3. Two sets of 50 obs each: same mean 16, SD 4 and 6. Find variance of combined 100 obs.
For each set: Sum x^2 = n(sigma^2 + mean^2).
Set 1: 50(16+256)=50*272=13600. Set 2: 50(36+256)=50*292=14600.
Combined Sum x^2 = 28200. Combined variance = 28200/100 - 256 = 282-256 = 26.
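The method of T3 generalises to groups with different sizes and means. The sketch below rebuilds Sum x and Sum x² for each group and then applies the shortcut formula; `combined_variance` is a hypothetical helper name.

```python
def combined_variance(groups):
    """groups: list of (n, mean, sd). Returns the population variance of the pooled data."""
    total_n = sum(n for n, _, _ in groups)
    total_sum = sum(n * m for n, m, _ in groups)                   # Sum x over all groups
    total_sq = sum(n * (sd ** 2 + m ** 2) for n, m, sd in groups)  # Sum x^2 = n(sigma^2 + mean^2)
    combined_mean = total_sum / total_n
    return total_sq / total_n - combined_mean ** 2

print(combined_variance([(50, 16, 4), (50, 16, 6)]))  # 26.0
```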
🤯 T4. SD of first 11 natural numbers using formula SD = sqrt((n^2-1)/12).
SD = sqrt((121-1)/12) = sqrt(120/12) = sqrt(10) = 3.162...
Verify: mean=6, Sum x^2=506, variance=506/11-36=46-36=10, SD=sqrt(10) ✓.
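The closed-form result can be compared against a direct computation in Python. `sd_first_n_naturals` is a hypothetical helper name; `statistics.pstdev` is the standard-library population SD.

```python
from math import sqrt
from statistics import pstdev

def sd_first_n_naturals(n):
    """SD of 1, 2, ..., n using sigma = sqrt((n^2 - 1) / 12)."""
    return sqrt((n * n - 1) / 12)

n = 11
formula = sd_first_n_naturals(n)
direct = pstdev(list(range(1, n + 1)))  # population SD computed from the raw values
print(round(formula, 3), round(direct, 3))  # 3.162 3.162
```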

5. Correlation & Regression (Basics)

5.1
Karl Pearson’s r & Lines of Regression
NDA tests range, sign and interpretation of r, and the relation r² = byx × bxy
⚡ Correlation & Regression
KARL PEARSON CORRELATION COEFFICIENT (r):
  r = Sum[(x_i - x-bar)(y_i - y-bar)] / sqrt[Sum(x_i-x-bar)^2 * Sum(y_i-y-bar)^2]
  Shortcut: r = [n*Sum(xy) - Sum(x)*Sum(y)] / sqrt{[n*Sum(x^2) - (Sum x)^2] * [n*Sum(y^2) - (Sum y)^2]}

RANGE: -1 <= r <= 1
  r = +1: perfect positive; r = -1: perfect negative; r = 0: no linear correlation

SPEARMAN RANK CORRELATION:
  r_s = 1 - [6 * Sum(d^2)] / [n(n^2 - 1)]
  d = difference between ranks of corresponding pairs

REGRESSION LINES:
  y on x: (y - y-bar) = b_yx (x - x-bar)    b_yx = r * (sigma_y / sigma_x)
  x on y: (x - x-bar) = b_xy (y - y-bar)    b_xy = r * (sigma_x / sigma_y)

RELATION BETWEEN REGRESSION COEFFICIENTS:
  r^2 = b_yx * b_xy  =>  r = sqrt(b_yx * b_xy)   [with sign of b values]
  Both regression lines pass through the point (x-bar, y-bar).
r² = b_yx × b_xy is the most tested fact. b_yx can exceed 1 in magnitude; only r must stay in [-1,1]. Both regression lines always intersect at the point of means (x-bar, y-bar).
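The product rule can be seen numerically with a small Python sketch. The paired data below is hypothetical, chosen only for easy arithmetic, and `correlation_and_regression` is a hypothetical helper name.

```python
from math import sqrt

# Hypothetical paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def correlation_and_regression(xs, ys):
    """Return (r, b_yx, b_xy) using the deviation-from-mean formulas."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    sigma_x, sigma_y = sqrt(sxx / n), sqrt(syy / n)
    b_yx = r * sigma_y / sigma_x  # regression coefficient of y on x
    b_xy = r * sigma_x / sigma_y  # regression coefficient of x on y
    return r, b_yx, b_xy

r, b_yx, b_xy = correlation_and_regression(xs, ys)
print(round(r, 4), round(b_yx, 4), round(b_xy, 4))
print(round(r * r, 4), round(b_yx * b_xy, 4))  # the two values agree
```

Here the sums of deviations are Sxy = 6, Sxx = 10, Syy = 6, so r = 6/√60 ≈ 0.775, b_yx = 0.6, b_xy = 1, and r² = 0.6 = b_yx × b_xy.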

Interpreting r

  • 0.9 to 1.0: very high positive
  • 0.7 to 0.9: high positive
  • 0.4 to 0.7: moderate positive
  • 0 to 0.4: weak positive
  • Negative r: inverse relationship
  • r = 0: no linear correlation

Key Regression Facts

  • Both lines pass through (x-bar, y-bar)
  • r² = byx × bxy
  • Both b coefficients have same sign
  • |byx| may be > 1; |r| must be ≤ 1
  • Angle between lines → 0 as |r| → 1
📝 TOPIC-WISE PYQ
Correlation & Regression — NDA-Pattern Questions
Q10. byx=0.8, bxy=0.2. Find r.
  • (a) 0.4    (b) 0.16    (c) 0.04    (d) 1.0
Answer: (a) 0.4
r=√(0.8×0.2)=√0.16=0.4.
Q11. r=0.7. Coefficient of determination is:
  • (a) 0.49    (b) 0.7    (c) 0.3    (d) 0.51
Answer: (a) 0.49
Coefficient of determination = r² = 0.7² = 0.49.
🔥 TRICKY QUESTIONS
Correlation & Regression — Conceptual Traps
🤯 T5. Two regression coefficients are 1.6 and 0.4. Is this valid? Find r.
r^2 = 1.6*0.4 = 0.64. r = +0.8 (both positive). Valid since |r|=0.8 <= 1.
b_yx = 1.6 > 1 is perfectly allowed. Only r must be in [-1,1].
r = 0.8. Strong positive correlation.
Common trap: students assume both regression coefficients must be <= 1. Wrong.
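The validity check in T5 can be written out in Python. This is a sketch; `r_from_regression` is a hypothetical name, and the two rejections encode the facts that both coefficients share a sign and that r² cannot exceed 1.

```python
from math import copysign, sqrt

def r_from_regression(b_yx, b_xy):
    """Recover r from the two regression coefficients, or raise if the pair is invalid."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("b_yx * b_xy = r^2 cannot exceed 1")
    return copysign(sqrt(product), b_yx)  # r takes the common sign of the coefficients

print(round(r_from_regression(1.6, 0.4), 2))    # 0.8  (T5)
print(round(r_from_regression(0.8, 0.2), 2))    # 0.4  (Q10)
print(round(r_from_regression(-1.6, -0.4), 2))  # -0.8
```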
🤯 T6. Spearman rank: 5 students, d values 1,-2,3,-1,-1. Find r_s.
Sum d^2 = 1+4+9+1+1 = 16. n=5.
r_s = 1 - 6*16/(5*24) = 1 - 96/120 = 1 - 0.8 = 0.2. Weak positive.
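The Spearman formula applied to a list of rank differences, as a short Python sketch. `spearman_from_differences` is a hypothetical helper name, and the formula assumes no tied ranks.

```python
def spearman_from_differences(d_values):
    """r_s = 1 - 6*Sum(d^2) / [n(n^2 - 1)], where d = difference between paired ranks."""
    n = len(d_values)
    sum_d2 = sum(d * d for d in d_values)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

d_values = [1, -2, 3, -1, -1]  # from T6; rank differences always sum to 0
print(sum(d_values), round(spearman_from_differences(d_values), 2))  # 0 0.2
```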

📝 Master Formula Sheet — MN24 Statistics

All critical formulae for rapid pre-exam revision.

● Mean (Grouped)
  • Direct: mean = Sum(fx)/N
  • Assumed mean: A + Sum(fd)/N; d = x-A
  • Step deviation: A + h*Sum(fu)/N; u=(x-A)/h
  • Combined: (n1*m1+n2*m2)/(n1+n2)
  • Adding k: new mean = mean+k; multiplying: k*mean
▲ Median & Mode
  • Median = L+[(N/2-CF)/f]*h
  • Mode = L+[(f1-f0)/(2f1-f0-f2)]*h
  • Mode = 3 Median - 2 Mean
  • Ungrouped odd n: middle value
  • Ungrouped even n: avg of two middles
σ Variance & SD
  • Var = Sum(x-mean)^2/n = Sum(x^2)/n - mean^2
  • Grouped: Sum f(x-mean)^2/N
  • SD = sqrt(variance); always non-negative
  • Add k: SD unchanged; multiply by k: new SD = |k|*SD
  • First n naturals: SD=sqrt((n^2-1)/12)
📈 CV & Mean Deviation
  • CV = (sigma/mean) x 100 %
  • Less CV = more consistent data
  • MD(mean) = Sum|x-mean|/n (ungrouped)
  • MD(mean) = Sum f|x-mean|/N (grouped)
∞ Correlation
  • r = [n*Sum(xy) - Sum(x)*Sum(y)] / sqrt[...]
  • -1 <= r <= 1 always
  • r^2 = b_yx * b_xy
  • Spearman: r_s = 1 - 6*Sum(d^2)/[n(n^2-1)]
📊 Regression
  • y on x: y-y-bar = b_yx(x-x-bar)
  • x on y: x-x-bar = b_xy(y-y-bar)
  • Both lines pass through (x-bar, y-bar)
  • |r| = sqrt(b_yx * b_xy); r takes the common sign of b_yx and b_xy

⚡ Quick Revision Booster — MN24 Statistics

● Mean Quick
  • Step deviation: fastest for grouped
  • Choose A = middle class mark
  • Mean = A + h*(Sum fu/N)
  • Combined: weighted average
  • Adding k shifts mean by k
▲ Median Steps
  • Find N, compute N/2
  • Build cumulative frequency (CF)
  • Median class: CF first exceeds N/2
  • Apply L+[(N/2-CF)/f]*h
  • Ogive: read off at N/2 on y-axis
▲ Mode & Empirical
  • Modal class = highest frequency bar
  • Mode=L+[(f1-f0)/(2f1-f0-f2)]*h
  • Mode = 3 Median - 2 Mean
  • Given any two, find the third
  • Normal distribution: Mean=Median=Mode
σ Variance
  • Var = Sum(x^2)/n - mean^2 (shortcut)
  • SD = sqrt(variance)
  • Add constant: SD unchanged
  • Multiply by k: SD becomes |k|*SD
  • CV = sigma/mean x 100; lower = more consistent
∞ Correlation
  • -1 <= r <= 1 always
  • r^2 = b_yx * b_xy (product rule)
  • Both regression lines through (mean_x, mean_y)
  • Spearman: 1 - 6*Sum(d^2)/[n(n^2-1)]
  • b_yx can exceed 1; r cannot
🚨 Critical Traps
  • CF in median formula = PREVIOUS class CF
  • Mode = 3Median - 2Mean (not reversed)
  • Adding constant does NOT change SD/variance
  • CV compares datasets; less CV = more consistent
  • r^2, not r, equals the product of the regression coefficients
  • Median class: CF first EXCEEDS N/2 (not equals)
This material is for personal NDA exam preparation only.
Unauthorised reproduction or distribution is prohibited.
All rights reserved.  ·  ODEA.Classes@gmail.com