Olive Defence

📘 Statistics & Probability · Chapter MN24 🎯 NDA Level : High Priority

Statistics is the science of collecting, organising, and analysing numerical data. For NDA, the focus is on three areas: how data is presented (frequency tables, histograms, ogives), measures of central tendency (mean, median, mode), and measures of dispersion (standard deviation, variance, coefficient of variation). Questions are calculation-based, so knowing the exact formula and applying it carefully is usually enough to score.

📌 What to expect in NDA (based on 2022–2024 papers):
(1) Direct calculation of mean for ungrouped data;
(2) Mean by assumed mean or step deviation method for grouped data;
(3) Median for grouped data using the interpolation formula;
(4) Mode for grouped data: modal class and formula;
(5) Empirical relation Mode = 3 Median − 2 Mean;
(6) Standard deviation and variance calculation;
(7) Coefficient of variation CV = (σ/x̄) × 100;
(8) Reading histograms, identifying modal class, and ogive reading.

Topics at a Glance

① Data Representation

Frequency table, histogram, ogive, polygon

② Mean

Direct, assumed mean, step deviation

③ Median & Mode

Grouped/ungrouped; empirical relation

④ Dispersion

MD, variance σ², SD σ, CV

⑤ Correlation

Karl Pearson r, Spearman rank

⑥ Regression

Lines of regression, b_yx, b_xy

1. Data Representation

1.1 Frequency Distribution, Histogram & Ogive

Understand each graph type — NDA asks you to read from histograms and ogives

Frequency Table Terms

Class interval: e.g. 10–20
Class width (h): upper−lower limit
Class mark (x_i): (upper+lower)/2
Frequency (f_i): count in that class
Cumulative freq: running total of f

Graph Types & Uses

Histogram: bars for frequency, no gaps. Width = class interval
Freq polygon: midpoints of the bar tops (class marks) joined
Ogive: cumulative curve; read median at N/2
Modal class: class with highest bar

Ogive & Quartiles

Less-than ogive: plot (upper limit, CF)
More-than ogive: plot (lower limit, reverse CF)
Intersection of the two ogives: its x-coordinate = Median
Q₁ at N/4, Q₂ at N/2, Q₃ at 3N/4

Fig 1: Left — Histogram with modal class (darkest bar, 10–20) and frequency polygon (red dashed). Right — Less-than Ogive; N/2 line meets curve and drops to x-axis giving the median.

2. Mean — Three Methods

2.1 Direct, Assumed Mean & Step Deviation Methods

Step deviation is fastest for grouped data with uniform class width — preferred in NDA

⚡ Mean — All Three Methods

Notation: x_i = class mark, f_i = frequency, N = total frequency

METHOD 1 - DIRECT:
Mean = Sum(f_i * x_i) / N

METHOD 2 - ASSUMED MEAN (A = assumed mean, d_i = x_i - A):
Mean = A + Sum(f_i * d_i) / N

METHOD 3 - STEP DEVIATION (h = class width, u_i = (x_i - A)/h):
Mean = A + h * [Sum(f_i * u_i) / N]   [Fastest for uniform class width]

UNGROUPED (n raw values):
Mean = Sum(x_i) / n

COMBINED MEAN (two groups n1, x-bar1 and n2, x-bar2):
Combined Mean = (n1*x-bar1 + n2*x-bar2) / (n1 + n2)

EFFECT OF TRANSFORMATIONS:
All values + k: new mean = old mean + k
All values x k: new mean = k * old mean

Choose A = the class mark closest to the estimated mean. Step deviation always gives the same answer as direct method but with much smaller numbers in the working.

Worked Example — Step Deviation Method

Classes: 10–20(f=3), 20–30(f=7), 30–40(f=5), 40–50(f=5). Find mean.

h=10, A=25 (class mark of 20–30). N=20.

u values: (15−25)/10=−1, (25−25)/10=0, (35−25)/10=1, (45−25)/10=2.

f×u: 3(−1)=−3, 7(0)=0, 5(1)=5, 5(2)=10. Sum f×u = 12.

Mean = 25 + 10×(12/20) = 25 + 6 = 31.
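
As a cross-check of the worked example, the following minimal Python sketch computes the grouped mean by all three methods and shows that they agree. The function names (`mean_direct`, `mean_assumed`, `mean_step`) are illustrative choices, not part of any library.

```python
def mean_direct(marks, freqs):
    """Direct method: Sum(f*x) / N."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(marks, freqs)) / n

def mean_assumed(marks, freqs, a):
    """Assumed mean method: A + Sum(f*d) / N, with d = x - A."""
    n = sum(freqs)
    return a + sum(f * (x - a) for x, f in zip(marks, freqs)) / n

def mean_step(marks, freqs, a, h):
    """Step deviation method: A + h * Sum(f*u) / N, with u = (x - A) / h."""
    n = sum(freqs)
    return a + h * sum(f * ((x - a) / h) for x, f in zip(marks, freqs)) / n

# Class marks and frequencies of 10-20, 20-30, 30-40, 40-50 from the example
marks = [15, 25, 35, 45]
freqs = [3, 7, 5, 5]

m1 = mean_direct(marks, freqs)
m2 = mean_assumed(marks, freqs, a=25)
m3 = mean_step(marks, freqs, a=25, h=10)
print(m1, m2, m3)
```

All three calls should give 31, matching the hand calculation; only the size of the intermediate numbers differs.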

📝 TOPIC-WISE PYQ

Mean — NDA-Pattern Questions

Q1. The mean of 5 observations is 4.4 and variance is 8.24. If three of the observations are 1, 2 and 6, find the other two.

(a) 4 and 9 (b) 5 and 8 (c) 4 and 8 (d) 5 and 9

Answer: (a) 4 and 9
Sum=5×4.4=22. Known sum=1+2+6=9. Remaining sum=13. Pair with sum 13: (4,9) or (5,8).
Check variance: using 1,2,4,6,9: mean=22/5=4.4. Σx²=1+4+16+36+81=138. σ²=138/5−(4.4)²=27.6−19.36=8.24 ✓. Answer: (a) 4 and 9.

Q2. Class of 30 (mean=60) combined with class of 20 (mean=50). Combined mean:

(a) 55 (b) 56 (c) 58 (d) 54

Answer: (b) 56
(30×60+20×50)/50=(1800+1000)/50=2800/50=56.
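
The combined mean is just a weighted average, which a short Python sketch makes explicit. The helper name `combined_mean` is an illustrative assumption; it accepts any number of groups, not only two.

```python
def combined_mean(groups):
    """groups: list of (size, mean) pairs; returns the mean of the pooled data."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# Q2: class of 30 with mean 60, class of 20 with mean 50
result = combined_mean([(30, 60), (20, 50)])
print(result)
```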

Q3. For data 5,7,9,11,13 (mean=9), if each value is multiplied by 2, the new mean is:

(a) 9 (b) 11 (c) 18 (d) 4.5

Answer: (c) 18
Multiplying each value by k multiplies the mean by k. New mean=9×2=18.

3. Median & Mode

3.1 Median by Interpolation & Mode by Formula

Find the median/modal class first, then apply the formula

⚡ Median & Mode — Grouped Data

MEDIAN (grouped data):
Median = L + [(N/2 - CF) / f] x h
L = lower boundary of median class
N = total frequency
CF = cumulative frequency of class BEFORE median class
f = frequency of median class
h = class width
Median class = first class where CF exceeds N/2

MEDIAN (ungrouped, sorted):
n odd: middle value = ((n+1)/2)-th observation
n even: average of (n/2)-th and (n/2+1)-th observations

MODE (grouped data):
Mode = L + [(f1 - f0) / (2f1 - f0 - f2)] x h
L = lower boundary of modal class (highest frequency)
f1 = frequency of modal class
f0 = frequency of class before modal class
f2 = frequency of class after modal class
h = class width

EMPIRICAL RELATION:
Mode = 3 Median - 2 Mean
Mean = (3 Median - Mode) / 2
Median = (2 Mean + Mode) / 3

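The empirical relation is tested directly: given any two measures, find the third. Strictly it is an approximation that holds for moderately skewed distributions, but exam questions of this type apply it as an exact formula. Always write it as Mode = 3 Median - 2 Mean. The mnemonic: "3 median, 2 mean".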

📌 Median Steps for Grouped Data (memorise this sequence):
(1) Find N = Σf. (2) Compute N/2. (3) Build CF column. (4) Median class = first class where CF > N/2. (5) Read L, CF (of previous class), f, h. (6) Apply formula.

Worked Example — Median for Grouped Data

Classes: 0–10(f=5), 10–20(f=8), 20–30(f=12), 30–40(f=7), 40–50(f=3). N=35.

N/2=17.5. CF: 5, 13, 25, 32, 35. Median class = 20–30 (CF first exceeds 17.5).

L=20, CF=13, f=12, h=10. Median = 20+[(17.5−13)/12]×10 = 20+3.75 = 23.75.
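
The six median steps can be sketched in Python as follows. This is a minimal illustration that assumes equal-width classes given by their lower boundaries; the function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_bounds, freqs, h):
    """Median = L + ((N/2 - CF) / f) * h for equal-width classes."""
    n = sum(freqs)
    half = n / 2
    cf_before = 0  # cumulative frequency of all classes before the current one
    for lower, f in zip(lower_bounds, freqs):
        if cf_before + f > half:  # first class whose CF exceeds N/2
            return lower + (half - cf_before) / f * h
        cf_before += f
    raise ValueError("frequencies must sum to a positive total")

# Classes 0-10, 10-20, 20-30, 30-40, 40-50 from the worked example
median = grouped_median([0, 10, 20, 30, 40], [5, 8, 12, 7, 3], h=10)
print(median)
```

The loop keeps `cf_before` as the CF of the previous class, which is exactly the value the formula needs.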

📝 TOPIC-WISE PYQ

Median, Mode & Empirical Relation — NDA-Pattern Questions

Q4. Median of 3,5,7,9,11,13,15 is:

(a) 7 (b) 9 (c) 11 (d) 8

Answer: (b) 9
n=7 (odd). Median = 4th value = 9.

Q5. Mean=26, Mode=23. Find median.

(a) 24 (b) 25 (c) 26 (d) 27

Answer: (b) 25
23=3 Median−2(26). 3 Median=75. Median=25.

Q6. Mean=24.6, Median=26.1. Find mode.

(a) 29.1 (b) 28.5 (c) 30 (d) 27

Answer: (a) 29.1
Mode=3(26.1)−2(24.6)=78.3−49.2=29.1.
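
The grouped mode formula and the empirical relation can be checked with a small Python sketch. The function names are illustrative, and treating a missing neighbour of the modal class as frequency 0 is an assumption made here for the edge classes. The frequency table reused below is the one from the median worked example.

```python
def grouped_mode(lower_bounds, freqs, h):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for equal-width classes."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # assumption: no class before -> 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: no class after -> 0
    return lower_bounds[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h

def mode_from_empirical(mean, median):
    """Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

def median_from_empirical(mean, mode):
    """Median = (2 Mean + Mode) / 3."""
    return (2 * mean + mode) / 3

mode = grouped_mode([0, 10, 20, 30, 40], [5, 8, 12, 7, 3], h=10)
q5_median = median_from_empirical(mean=26, mode=23)
q6_mode = mode_from_empirical(mean=24.6, median=26.1)
print(mode, q5_median, q6_mode)
```

For the table above the modal class is 20–30 with f1 = 12, f0 = 8, f2 = 7, so the mode is 20 + (4/9) × 10, roughly 24.44.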

🔥 TRICKY QUESTIONS

Central Tendency — Classic NDA Traps

🤯 T1. Mean of n observations is x-bar. Effect of (a) adding k to all, (b) multiplying all by k.

(a) Adding k: new mean = x-bar + k. Mean shifts by same constant.
(b) Multiplying by k: new mean = k times x-bar. Mean scales by same factor.
For SD: adding k leaves SD unchanged; multiplying by k gives new SD = |k| times sigma.
Critical exam trap: adding a constant NEVER changes variance or SD. Only multiplying does.
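
These transformation rules are easy to confirm numerically. The sketch below uses Python's standard `statistics` module on the data from Q3; the constant `k = -3` is an arbitrary illustrative choice (negative on purpose, to show why the SD scales by |k|).

```python
import statistics

data = [5, 7, 9, 11, 13]  # data from Q3, mean 9
k = -3                    # illustrative constant, not from the text

shifted = [x + k for x in data]  # add k to every value
scaled = [k * x for x in data]   # multiply every value by k

mean0 = statistics.mean(data)
sd0 = statistics.pstdev(data)    # population SD (divides by n)

print(statistics.mean(shifted), statistics.pstdev(shifted))  # mean shifts, SD unchanged
print(statistics.mean(scaled), statistics.pstdev(scaled))    # mean scales by k, SD by |k|
```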

🤯 T2. Mean of 20 values is 45. One value was misread as 34 instead of 43. Correct mean?

Incorrect sum=20×45=900. Correct sum=900−34+43=909.
Correct mean=909/20=45.45.
Method: compute sum from mean, adjust for the error, recompute.
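
The same three-step method can be written as a tiny Python helper. The name `corrected_mean` and its parameter names are illustrative assumptions.

```python
def corrected_mean(n, wrong_mean, wrong_value, right_value):
    """Rebuild the sum from the mean, swap the misread value, recompute the mean."""
    wrong_sum = n * wrong_mean
    right_sum = wrong_sum - wrong_value + right_value
    return right_sum / n

# T2: 20 values, reported mean 45, value 43 misread as 34
fixed = corrected_mean(n=20, wrong_mean=45, wrong_value=34, right_value=43)
print(fixed)
```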

4. Measures of Dispersion

4.1 Variance, Standard Deviation & Coefficient of Variation

SD = sqrt(Variance); CV = (SD/Mean) x 100 — most tested dispersion formulas in NDA

⚡ Dispersion — Complete Formula Set

MEAN DEVIATION about Mean:
MD = Sum|x_i - mean| / n   (ungrouped)
MD = Sum f_i|x_i - mean| / N   (grouped)

VARIANCE (population):
sigma^2 = Sum(x_i - mean)^2 / n   (ungrouped)
SHORTCUT: sigma^2 = (Sum x_i^2)/n - mean^2   ["mean of squares minus square of mean"]
Grouped: sigma^2 = Sum f_i(x_i - mean)^2 / N = (Sum f_i x_i^2)/N - mean^2

STANDARD DEVIATION:
sigma = sqrt(sigma^2)   (always non-negative)

STEP DEVIATION VARIANCE (class width h):
u_i = (x_i - A)/h
sigma^2 = h^2 * [(Sum f_i u_i^2)/N - ((Sum f_i u_i)/N)^2]

COEFFICIENT OF VARIATION:
CV = (sigma / mean) x 100   (as percentage)
Lower CV = more homogeneous/consistent data

KEY PROPERTIES:
Adding constant k to all values: sigma unchanged, sigma^2 unchanged
Multiplying all by k: new sigma = |k|*sigma, new sigma^2 = k^2*sigma^2
All values equal: sigma = 0
First n natural numbers: sigma = sqrt((n^2-1)/12)

Shortcut formula sigma^2 = (Sum x^2)/n - mean^2 is almost always faster than the definition. Memorise it as "mean of squares minus square of mean". For grouped data, replace Sum x^2 with Sum f*x^2.

Worked Example — Variance & SD

Data: 2, 4, 6, 8, 10. Find variance and SD.

Mean = 30/5 = 6. Sum x² = 4+16+36+64+100 = 220.

Variance = 220/5 − 36 = 44 − 36 = 8. SD = √8 = 2√2 ≈ 2.83.
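
To see that the shortcut and the definition agree, here is a minimal Python sketch computing the population variance both ways for the data above. The function names are illustrative.

```python
import math

def variance_definition(xs):
    """Population variance: Sum((x - mean)^2) / n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def variance_shortcut(xs):
    """Population variance: mean of squares minus square of mean."""
    m = sum(xs) / len(xs)
    return sum(x * x for x in xs) / len(xs) - m ** 2

data = [2, 4, 6, 8, 10]
var_def = variance_definition(data)
var_short = variance_shortcut(data)
sd = math.sqrt(var_short)
print(var_def, var_short, sd)
```

Both functions divide by n, matching the population formulas used throughout this chapter.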

Worked Example — Coefficient of Variation

Factory A: mean=5000, SD=500. Factory B: mean=4000, SD=360. More consistent?

CV(A)=(500/5000)×100=10%. CV(B)=(360/4000)×100=9%.

CV(B) < CV(A) → Factory B is more consistent.
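
The comparison above reduces to one division per dataset, as this short Python sketch shows. The helper name `coefficient_of_variation` is an illustrative assumption.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (sd / mean) * 100."""
    return sd / mean * 100

cv_a = coefficient_of_variation(sd=500, mean=5000)  # Factory A
cv_b = coefficient_of_variation(sd=360, mean=4000)  # Factory B

# The dataset with the lower CV is the more consistent one
print("A" if cv_a < cv_b else "B", cv_a, cv_b)
```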

📝 TOPIC-WISE PYQ

Dispersion — NDA-Pattern Questions

Q7. If each observation is divided by 5, the SD of the new set is:

(a) 5σ (b) σ/5 (c) σ (d) σ−5

Answer: (b) σ/5
Dividing by 5 = multiplying by 1/5. New SD = σ/5.

Q8. Variance of 1,2,3,4,5 is:

(a) 1 (b) 2 (c) √2 (d) 4

Answer: (b) 2
Mean=3. Sum x²=55. Variance=55/5−9=11−9=2.

Q9. σ²=16, mean=8. Coefficient of variation is:

(a) 50% (b) 25% (c) 2% (d) 12.5%

Answer: (a) 50%
σ=4. CV=(4/8)×100=50%.

🔥 TRICKY QUESTIONS

Dispersion — Classic NDA Traps

🤯 T3. Two sets of 50 obs each: same mean 16, SD 4 and 6. Find variance of combined 100 obs.

For each set: Sum x^2 = n(sigma^2 + mean^2).
Set 1: 50(16+256)=50*272=13600. Set 2: 50(36+256)=50*292=14600.
Combined Sum x^2 = 28200. Combined variance = 28200/100 - 256 = 282-256 = 26.
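
The pooling trick used in T3 generalises to groups with different means and sizes. The Python sketch below is one way to write it; `combined_variance` is a hypothetical helper name, and it returns the population variance of the pooled data.

```python
def combined_variance(groups):
    """groups: list of (n, mean, sd) triples; returns pooled population variance."""
    total_n = sum(n for n, _, _ in groups)
    total_sum = sum(n * m for n, m, _ in groups)
    # For each group, Sum x^2 = n * (sigma^2 + mean^2)
    total_sq = sum(n * (sd ** 2 + m ** 2) for n, m, sd in groups)
    pooled_mean = total_sum / total_n
    return total_sq / total_n - pooled_mean ** 2

# T3: two sets of 50 observations, both with mean 16, SDs 4 and 6
pooled = combined_variance([(50, 16, 4), (50, 16, 6)])
print(pooled)
```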

🤯 T4. SD of first 11 natural numbers using formula SD = sqrt((n^2-1)/12).

SD = sqrt((121-1)/12) = sqrt(120/12) = sqrt(10) = 3.162...
Verify: mean=6, Sum x^2=506, variance=506/11-36=46-36=10, SD=sqrt(10) ✓.
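
The closed-form result can also be compared against a direct computation for any n. A minimal Python sketch, using the standard `statistics.pstdev` (population SD) as the reference:

```python
import math
import statistics

def sd_first_n_naturals(n):
    """SD of 1, 2, ..., n via the closed form sqrt((n^2 - 1) / 12)."""
    return math.sqrt((n * n - 1) / 12)

n = 11
formula_sd = sd_first_n_naturals(n)
direct_sd = statistics.pstdev(range(1, n + 1))
print(formula_sd, direct_sd)
```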

5. Correlation & Regression (Basics)

5.1 Karl Pearson’s r & Lines of Regression

NDA tests range, sign and interpretation of r, and the relation r² = b_yx × b_xy

⚡ Correlation & Regression

KARL PEARSON CORRELATION COEFFICIENT (r):
r = Sum[(x_i - x-bar)(y_i - y-bar)] / sqrt[Sum(x_i-x-bar)^2 * Sum(y_i-y-bar)^2]
Shortcut: r = [n*Sum(xy) - Sum(x)*Sum(y)] / sqrt{[n*Sum(x^2) - (Sum x)^2] * [n*Sum(y^2) - (Sum y)^2]}

RANGE: -1 <= r <= 1
r = +1: perfect positive; r = -1: perfect negative; r = 0: no linear correlation

SPEARMAN RANK CORRELATION:
r_s = 1 - [6 * Sum(d^2)] / [n(n^2 - 1)]
d = difference between ranks of corresponding pairs

REGRESSION LINES:
y on x: (y - y-bar) = b_yx (x - x-bar), where b_yx = r * (sigma_y / sigma_x)
x on y: (x - x-bar) = b_xy (y - y-bar), where b_xy = r * (sigma_x / sigma_y)

RELATION BETWEEN REGRESSION COEFFICIENTS:
r^2 = b_yx * b_xy => r = sqrt(b_yx * b_xy)   [with sign of b values]
Both regression lines pass through the point (x-bar, y-bar).

r² = b_yx × b_xy is the most tested fact. b_yx can exceed 1 in magnitude; only r must stay in [-1,1]. Both regression lines always intersect at the point of means (x-bar, y-bar).
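
The relation r² = b_yx × b_xy can be verified numerically. The Python sketch below uses the shortcut formula for r; the five (x, y) pairs are made-up illustrative data, not taken from any NDA paper, and the function names are hypothetical.

```python
import math

def pearson_and_slopes(xs, ys):
    """Return (r, b_yx, b_xy) using the n*Sum(xy) - Sum(x)*Sum(y) shortcut."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = n * sum(x * y for x, y in zip(xs, ys)) - sx * sy
    sxx = n * sum(x * x for x in xs) - sx ** 2
    syy = n * sum(y * y for y in ys) - sy ** 2
    r = sxy / math.sqrt(sxx * syy)
    b_yx = sxy / sxx  # equals r * sigma_y / sigma_x
    b_xy = sxy / syy  # equals r * sigma_x / sigma_y
    return r, b_yx, b_xy

def r_from_coefficients(b_yx, b_xy):
    """r = sqrt(b_yx * b_xy), carrying the common sign of the coefficients."""
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)

xs = [1, 2, 3, 4, 5]  # hypothetical sample data
ys = [2, 4, 5, 4, 5]
r, b_yx, b_xy = pearson_and_slopes(xs, ys)
print(r, b_yx, b_xy, r ** 2, b_yx * b_xy)
```

For this sample b_xy comes out as 1.0 and b_yx as 0.6, so the product 0.6 equals r². Note that `r_from_coefficients` relies on both coefficients sharing a sign, which is always true for valid regression coefficients.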

Interpreting r

0.9 to 1.0: very high positive
0.7 to 0.9: high positive
0.4 to 0.7: moderate positive
0 to 0.4: weak positive
Negative r: inverse relationship
r = 0: no linear correlation

Key Regression Facts

Both lines pass through (x-bar, y-bar)
r² = b_yx × b_xy
Both b coefficients have same sign
b_yx may be > 1 in magnitude; |r| must be ≤ 1
Angle between lines → 0 as |r| → 1

📝 TOPIC-WISE PYQ

Correlation & Regression — NDA-Pattern Questions

Q10. b_yx=0.8, b_xy=0.2. Find r.

(a) 0.4 (b) 0.16 (c) 0.04 (d) 1.0

Answer: (a) 0.4
r=√(0.8×0.2)=√0.16=0.4 (positive, since both coefficients are positive).

Q11. r=0.7. Coefficient of determination is:

(a) 0.49 (b) 0.7 (c) 0.3 (d) 0.51

Answer: (a) 0.49
Coefficient of determination = r² = 0.7² = 0.49.

🔥 TRICKY QUESTIONS

Correlation & Regression — Conceptual Traps

🤯 T5. Two regression coefficients are 1.6 and 0.4. Is this valid? Find r.

r^2 = 1.6*0.4 = 0.64, so r = +0.8 (both coefficients are positive). Valid since |r| = 0.8 <= 1.
A coefficient of 1.6 > 1 is perfectly allowed. Only r must be in [-1,1].
r = 0.8 indicates a strong positive correlation.
Common trap: assuming each regression coefficient must be <= 1. Wrong: one may exceed 1 as long as the product of the two does not.

🤯 T6. Spearman rank: 5 students, d values 1,-2,3,-1,-1. Find r_s.

Sum d^2 = 1+4+9+1+1 = 16. n=5.
r_s = 1 - 6*16/(5*24) = 1 - 96/120 = 1 - 0.8 = 0.2. Weak positive.
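
Spearman's formula is a one-liner once the rank differences are known. The Python sketch below assumes the d values are already computed and that there are no tied ranks (the simple formula is exact only in that case); `spearman_from_d` is an illustrative name.

```python
def spearman_from_d(d_values):
    """r_s = 1 - 6 * Sum(d^2) / (n * (n^2 - 1)), assuming no tied ranks."""
    n = len(d_values)
    return 1 - 6 * sum(d * d for d in d_values) / (n * (n * n - 1))

d = [1, -2, 3, -1, -1]  # rank differences from T6
r_s = spearman_from_d(d)
print(sum(d), r_s)  # the d values of a valid ranking always sum to 0
```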

📝 Master Formula Sheet — MN24 Statistics

All critical formulae for rapid pre-exam revision.

● Mean (Grouped)

Direct: mean = Sum(fx)/N
Assumed mean: A + Sum(fd)/N; d = x-A
Step deviation: A + h*Sum(fu)/N; u=(x-A)/h
Combined: (n1*m1+n2*m2)/(n1+n2)
Adding k: new mean = mean+k; multiplying: k*mean

▲ Median & Mode

Median = L+[(N/2-CF)/f]*h
Mode = L+[(f1-f0)/(2f1-f0-f2)]*h
Mode = 3 Median - 2 Mean
Ungrouped odd n: middle value
Ungrouped even n: avg of two middles

σ Variance & SD

Var = Sum(x-mean)^2/n = Sum(x^2)/n - mean^2
Grouped: Sum f(x-mean)^2/N
SD = sqrt(variance); always non-negative
Add k: SD unchanged; multiply by k: new SD = |k|*SD
First n naturals: SD=sqrt((n^2-1)/12)

📈 CV & Mean Deviation

CV = (sigma/mean) x 100 %
Less CV = more consistent data
MD(mean) = Sum|x-mean|/n (ungrouped)
MD(mean) = Sum f|x-mean|/N (grouped)

∞ Correlation

r = [n*Sum(xy) - Sum(x)*Sum(y)] / sqrt[...]
-1 <= r <= 1 always
r^2 = b_yx * b_xy
Spearman: r_s = 1 - 6*Sum(d^2)/[n(n^2-1)]

📊 Regression

y on x: y-y-bar = b_yx(x-x-bar)
x on y: x-x-bar = b_xy(y-y-bar)
Both lines pass through (x-bar, y-bar)
r = ±sqrt(b_yx * b_xy), taking the common sign of the two coefficients

⚡ Quick Revision Booster — MN24 Statistics

● Mean Quick

Step deviation: fastest for grouped
Choose A = middle class mark
Mean = A + h*(Sum fu/N)
Combined: weighted average
Adding k shifts mean by k

▲ Median Steps

Find N, compute N/2
Build cumulative frequency (CF)
Median class: CF first exceeds N/2
Apply L+[(N/2-CF)/f]*h
Ogive: read off at N/2 on y-axis

▲ Mode & Empirical

Modal class = highest frequency bar
Mode=L+[(f1-f0)/(2f1-f0-f2)]*h
Mode = 3 Median - 2 Mean
Given any two, find the third
Normal distribution: Mean=Median=Mode

σ Variance

Var = Sum(x^2)/n - mean^2 (shortcut)
SD = sqrt(variance)
Add constant: SD unchanged
Multiply by k: SD becomes |k|*SD
CV = sigma/mean x 100; lower = more consistent

∞ Correlation

-1 <= r <= 1 always
r^2 = b_yx * b_xy (product rule)
Both regression lines through (mean_x, mean_y)
Spearman: 1 - 6*Sum(d^2)/[n(n^2-1)]
b_yx can exceed 1; r cannot

🚨 Critical Traps

CF in median formula = PREVIOUS class CF
Mode = 3Median - 2Mean (not reversed)
Adding constant does NOT change SD/variance
CV compares datasets; less CV = more consistent
r^2 not r equals product of regression coefficients
Median class: CF first EXCEEDS N/2 (not equals)
