What is statistics?

Statistics is the study of collecting, organizing, analyzing, interpreting, and presenting data. It involves techniques for summarizing and making inferences from data to understand patterns, trends, and relationships.

What are the main branches of statistics?

Descriptive statistics: Methods for summarizing and describing data. Inferential statistics: Techniques for making predictions or inferences about a population based on sample data.

What are measures of central tendency?

Measures of central tendency, such as mean, median, and mode, provide a single value that represents the center or midpoint of a data set.

What is the difference between population and sample?

Population refers to the entire group of individuals or items that are of interest to a study. A sample is a portion of the population chosen for observation and analysis.

How do descriptive and inferential statistics differ?

Descriptive statistics summarize and describe the main features of a data set. Inferential statistics involve making predictions or inferences about a population based on sample data.

What are the types of data in statistics?

Nominal, ordinal, interval, and ratio are the main types of data, each with different levels of measurement and characteristics.

What are some common statistical tests?

Common statistical tests include t-tests, chi-square tests, ANOVA, regression analysis, and correlation analysis, among others.

How do you interpret p-values?

The p-value indicates the probability of obtaining the observed results (or more extreme results) if the null hypothesis is true. A smaller p-value suggests stronger evidence against the null hypothesis.

Statistics

Statistics is a mathematical branch that deals with collecting, analyzing, interpreting, presenting, and organizing data. It involves techniques for making inferences and decisions based on data, helping us understand patterns, trends, and relationships in various fields such as science, economics, and social sciences.

1.0 Introduction to Statistics

An introduction to statistics covers the principles, methods, and applications of the subject: techniques for collecting, organizing, analyzing, interpreting, and presenting data, and for making informed decisions based on the results. Its aim is to give learners the fundamental concepts and skills needed to apply statistical methods across disciplines and in real-world situations.

Statistics, in essence, entails the examination and manipulation of data. As previously discussed, it involves the analysis and computation of numerical data. Let's now explore additional interpretations of statistics provided by various authors.

As per the Merriam-Webster dictionary, statistics are defined as the arrangement of classified facts representing the conditions of a population, particularly those that can be expressed numerically or organized in tables.

Statistician Sir Arthur Lyon Bowley defines statistics as numerical statements of facts from any area of investigation placed in relation to one another.

2.0 Central Tendency in Statistics

Central tendency is a concept used to identify the central or average value within a data set. It provides a single value that represents the typical or central value of the data distribution. The three main measures of central tendency are the mean, median, and mode.

1. Mean: Also known as the average, the mean is calculated by summing all the values in the data set and dividing the sum by the total number of values. It is sensitive to extreme values, so it is most useful for roughly symmetric distributions without outliers.

2. Median: The median is determined as the central value within a dataset once the values have been arranged in ascending or descending order. If there is an even number of values, the median is the average of the two middle values. It is less affected by extreme values and is useful for skewed distributions.

3. Mode: The mode is the value that occurs most frequently in the data set. It is useful for identifying the most common value or values in a distribution, regardless of whether the data is numerical or categorical.

These measures provide valuable insights into the central tendencies of a data set, helping researchers and analysts understand its characteristics and make informed decisions.

Arithmetic Mean

For Ungrouped Distribution

The arithmetic mean, often simply called the mean, is a measure of central tendency that represents the average of a set of numerical values. It is calculated by adding up all the values in the data set and then dividing the sum by the total number of values.

The formula for calculating the arithmetic mean $\overline{x}$ of a data set x₁, x₂, ..., x_n with n observations is:

$\overline{x} = \frac{x_{1} + x_{2} + \dots + x_{n}}{n}$

In simpler terms, you add up all the values and then divide by how many values there are. The mean provides a single value that summarizes the central tendency of the data set. It is widely used in various fields for summarizing and analyzing data. However, it can be influenced by extreme values (outliers) in the data set, so it's important to consider the context of the data when interpreting the mean.
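A minimal Python sketch of the formula above. The function name `arithmetic_mean` and the sample values are illustrative assumptions, not part of the original text.

```python
def arithmetic_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    # Sum of all observations divided by the number of observations.
    return sum(values) / len(values)


# Illustrative data set (an assumption for this sketch).
data = [5, 10, 15, 20, 25]
mean = arithmetic_mean(data)  # (5 + 10 + 15 + 20 + 25) / 5 = 15.0
```

Replacing the last value with a much larger one (say 250) pulls the mean upward sharply, which is the outlier sensitivity mentioned above.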

For Ungrouped Frequency Distribution

For ungrouped frequency distributions, where each data point has a corresponding frequency (how often it occurs), the arithmetic mean is calculated slightly differently.

Let's say we have a set of n data points x₁, x₂, ..., x_n with corresponding frequencies f₁, f₂, ..., f_n. To find the mean $\overline{x}$ , you follow these steps:

1. Multiply each data point by its frequency.

2. Sum up all these products.

3. Divide the sum by the total frequency (sum of all frequencies).

The formula for calculating the mean for ungrouped frequency distributions is:

$\overline{x} = \frac{f_{1} x_{1} + f_{2} x_{2} + \dots + f_{n} x_{n}}{f_{1} + f_{2} + \dots + f_{n}}$

$\overline{x} = \frac{\sum_{i=1}^{n} f_{i} x_{i}}{\sum_{i=1}^{n} f_{i}}$ or $\overline{x} = \frac{\sum_{i=1}^{n} f_{i} x_{i}}{N}$

where N = $\sum_{i=1}^{n} f_{i}$

In other words, you are finding the weighted average of the data points, where each data point is weighted by its frequency. This accounts for the fact that some values may occur more frequently than others in the data set.
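The three steps can be sketched in Python as follows. The function name `frequency_mean` is a hypothetical helper; the data are the values and frequencies used in Example 2 of the solved examples.

```python
def frequency_mean(values, frequencies):
    """Weighted mean of values, each weighted by its frequency."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total_frequency = sum(frequencies)  # N
    if total_frequency == 0:
        raise ValueError("total frequency must be positive")
    # Steps 1 and 2: multiply each value by its frequency and add the products.
    weighted_sum = sum(f * x for x, f in zip(values, frequencies))
    # Step 3: divide by the total frequency.
    return weighted_sum / total_frequency


x = [5, 8, 11, 14, 17]
f = [4, 5, 6, 10, 20]
mean = frequency_mean(x, f)  # 606 / 45, approximately 13.47
```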

Median

The median serves as an additional measure of central tendency, similar to the mean, but it represents the middle value of a data set when arranged in ascending or descending order. To find the median:

1. Arrange the data set in either ascending or descending order.

2. When the number of observations (n) is odd, the median corresponds to the value positioned at the center of the ordered list.

3. If the total number of observations (n) is even, the median is calculated as the arithmetic mean of the two middle values.

If n is the number of observations:

For Ungrouped Distribution and Ungrouped Frequency Distribution

For odd n: Median = value of the $\left(\frac{n + 1}{2}\right)^{\text{th}}$ observation.
For even n: Median = $\frac{\text{value of the } \left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \text{value of the } \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ observation}}{2}$.

The median is often preferred over the mean when the data set contains outliers or is skewed, as it is less affected by extreme values. It provides a better representation of the typical value in such cases.
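The odd and even cases can be sketched in Python as below. The function name `median` and the two short data sets are illustrative assumptions; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # Step 1: arrange in ascending order.
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation, index mid when 0-indexed.
        return ordered[mid]
    # Even n: mean of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 3])      # ordered: 1, 3, 7 -> 3
even_case = median([7, 1, 3, 5])  # ordered: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```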

For Grouped Frequency Distribution

When dealing with a grouped frequency distribution, where data is organized into intervals or classes along with their corresponding frequencies, finding the median involves a slightly different approach.

Here's how to find the median for a grouped frequency distribution:

Find the Median Class: Determine the class interval containing the median. This is the first class whose cumulative frequency reaches or exceeds half of the total frequency (N/2).
Calculate Median: Use the formula for finding the median within a class interval:

Median = $ℓ + \frac{\left( \frac{N}{2} - F \right)}{f} \times h$

where:

$ℓ$ = Lower boundary of the median class

N = Total frequency

F = Cumulative frequency of the class before the median class

f = Frequency of the class interval that contains the median.

h = Width of the class interval

This formula calculates the median by interpolation within the median class, considering the width of the class interval and the cumulative frequencies.

If the classes are not continuous (for example, 10–19, 20–29), first convert the class limits to class boundaries (9.5–19.5, 19.5–29.5) and then apply the same interpolation formula.

Once you find the median, it represents the middle value of the data set. It's a useful measure of central tendency, especially for data sets with continuous values.
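A minimal Python sketch of the interpolation formula, assuming continuous classes of equal width. The function name `grouped_median` and its parameter layout are hypothetical, and the median class is taken to be the first class whose cumulative frequency reaches N/2. The data are those of Example 4 in the solved examples.

```python
def grouped_median(lower_bounds, width, frequencies):
    """Interpolated median for continuous, equal-width classes."""
    total = sum(frequencies)  # N
    half = total / 2          # N / 2
    cumulative = 0            # F: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, frequencies):
        if f > 0 and cumulative + f >= half:
            # Median = l + ((N / 2 - F) / f) * h
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


# Classes 0-5, 5-10, 10-15, 15-20, 20-25 with their frequencies.
result = grouped_median([0, 5, 10, 15, 20], 5, [6, 10, 8, 9, 7])
# N / 2 = 20, median class 10-15, F = 16, f = 8 -> 10 + (4 / 8) * 5 = 12.5
```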

Mode

The mode is the value of the variate that has the maximum frequency.

(i) For ungrouped distribution

The mode is the value of the variable that is repeated the maximum number of times.

(ii) For ungrouped frequency distribution

The mode is the value of the variate that has the maximum frequency.

(iii) For grouped frequency distribution

First, find the class with the maximum frequency. This is the modal class. Then

Mode = $ℓ + \frac{( f _{0} - f _{1} )}{( 2 f _{0} - f _{1} - f _{2} )} \times h$

Where

$ℓ$ = lower limit of modal class.

f₀ = frequency of modal class.

f₁ = frequency of the class preceding modal class.

f₂ = frequency of the class succeeding modal class.

h = width of the modal class.
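The grouped-mode formula can be sketched in Python as follows. The function name `grouped_mode` is hypothetical, and two conventions are assumptions of this sketch rather than statements from the text: a missing neighbouring class is treated as having frequency 0, and if several classes tie for the maximum frequency the first one is used. The data are those of Example 5 in the solved examples.

```python
def grouped_mode(lower_bounds, width, frequencies):
    """Mode of a grouped frequency distribution with equal-width classes."""
    # Index of the modal class (first class with the maximum frequency).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    f0 = frequencies[i]
    # Assumption: a class with no neighbour on one side uses frequency 0 there.
    f1 = frequencies[i - 1] if i > 0 else 0
    f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    denominator = 2 * f0 - f1 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when 2*f0 - f1 - f2 equals 0")
    # Mode = l + ((f0 - f1) / (2*f0 - f1 - f2)) * h
    return lower_bounds[i] + (f0 - f1) / denominator * width


# Classes 0-10, 10-20, 20-30, 30-40, 40-50 with their frequencies.
result = grouped_mode([0, 10, 20, 30, 40], 10, [8, 30, 40, 10, 12])
# Modal class 20-30: 20 + (10 / 40) * 10 = 22.5
```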

3.0 Relation between Mean, Median and Mode

In a moderately asymmetrical (skewed) distribution, the mean, median, and mode approximately satisfy the following relation, known as the empirical formula:

Mode = 3 Median – 2 Mean

4.0 Range

The range is a simple measure of variability in a data set, representing the difference between the maximum and minimum values. It provides a quick understanding of the spread or dispersion of data points.

The formula for calculating the range is straightforward:

Range = Maximum Value – Minimum Value

For example, if you have a data set {5, 10, 15, 20, 25}, the range would be:

Range = 25 – 5 = 20

The range in this case is 20, indicating that the data values span a range of 20 units.

While the range gives a sense of the spread of data, it can be greatly influenced by outliers and may not fully represent the variability within the data set.
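The same calculation as a short Python sketch, using the data set from the example above. The function name `value_range` is an illustrative choice.

```python
def value_range(values):
    """Range of a non-empty data set: maximum value minus minimum value."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)


data = [5, 10, 15, 20, 25]
spread = value_range(data)  # 25 - 5 = 20
```

Because only the two extreme values enter the calculation, changing a single observation to an outlier changes the range completely, while the values in between have no effect.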

5.0 Variance and Standard Deviation

Variance

The variance is the mean of the squared deviations of the variates from their mean. It is denoted by σ².

Standard Deviation (S.D.)

The positive square root of the variance is called the standard deviation. It is denoted by σ.

i.e., $\sigma = \text{S.D.} = +\sqrt{\text{variance}}$

Formulae for variance

(i) For ungrouped distribution

$\sigma^{2} = \frac{\sum (x_{i} - \overline{x})^{2}}{n}$

(ii) For frequency distribution

$\sigma^{2} = \frac{\sum f_{i} (x_{i} - \overline{x})^{2}}{N}$
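Both variance formulas can be sketched in Python as below. The function names are hypothetical and the sample data are illustrative. Note that these formulas divide by n (or N), which corresponds to `statistics.pvariance` in Python's standard library, not to the sample variance that divides by n − 1.

```python
import math


def population_variance(values):
    """Mean of squared deviations from the mean (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def frequency_variance(values, frequencies):
    """Variance of a frequency distribution (divides by N = sum of f_i)."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    mean = sum(f * x for x, f in zip(values, frequencies)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total


# Illustrative data set (an assumption for this sketch).
data = [5, 10, 15, 20, 25]
variance = population_variance(data)  # (100 + 25 + 0 + 25 + 100) / 5 = 50.0
std_dev = math.sqrt(variance)         # positive square root, about 7.07
```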

6.0 Statistics Formulas

Here are some commonly used statistics formulas:

Statistics	Formulae
Mean (arithmetic mean)	$\overline{x} = \frac{x_{1} + x_{2} + \dots + x_{n}}{n}$
Median	For odd n: Median = value of the $\left(\frac{n + 1}{2}\right)^{\text{th}}$ observation.
	For even n: Median = $\frac{\text{value of the } \left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \text{value of the } \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ observation}}{2}$
Mode (grouped data)	Mode $= ℓ + \frac{(f_{0} - f_{1})}{(2 f_{0} - f_{1} - f_{2})} \times h$
Range	Range = Maximum Value − Minimum Value
Variance	$\sigma^{2} = \frac{\sum f_{i} (x_{i} - \overline{x})^{2}}{N}$
Standard Deviation	$\text{S.D.} = +\sqrt{\text{variance}}$

7.0 Importance of Statistics

Statistics plays a crucial role in many facets of research, decision-making, and everyday life. Here are some key reasons why it is important:

Data-driven Decision Making: Statistics provides a systematic framework for collecting, organizing, analyzing, and interpreting data. By applying statistical methods, individuals and organizations can base their decisions on empirical evidence rather than on intuition or guesswork.
Prediction and Forecasting: By examining past data and recognizing trends and patterns, statistical methods make it possible to predict future events. Stock market analysis, weather forecasting, and economic forecasting are examples of areas that rely on such predictions.
Risk Management: Statistics play a crucial role in risk assessment and management. By analyzing probabilities and distributions, individuals and organizations can assess and mitigate risks in various contexts, including insurance, finance, and healthcare.
Healthcare and Medicine: Statistics play a critical role in healthcare and medicine by analyzing clinical data, evaluating treatment effectiveness, and conducting epidemiological studies. Medical research relies on statistical analyses to identify risk factors, assess outcomes, and develop new treatments.
Democracy and Citizenship: Access to reliable and accurate statistical information is essential for informed citizenship and democratic participation. Statistics provide transparency, accountability, and evidence-based information for policymakers, stakeholders, and the general public.

8.0 Limitations of Statistics

Here are some limitations of statistics:

Limited Scope: Statistics can only provide insights into quantitative aspects of a phenomenon. It may not capture qualitative or contextual information, leading to a partial understanding of the subject matter.
Sampling Bias: Statistical analyses are often based on samples rather than entire populations. If the sample is not representative of the population, the results may not be generalizable.
Assumptions: Many statistical methods rely on certain assumptions about the data, such as normality or independence. Failure to adhere to these assumptions can result in inaccurate outcomes.

9.0 Solved Examples on Statistics

Example 1: In a group of 30 observations, the mean of the first 10 is 12 and the mean of the last 20 is 9. Find the mean of the whole distribution.

Solution:

Sum of the first 10 observations = 10 × 12 = 120

Sum of the last 20 observations = 20 × 9 = 180

Sum of all 30 observations = 120 + 180 = 300

Mean = $\frac{\sum x _{i}}{n} = \frac{300}{30} = 10$
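The same reasoning generalizes to any number of groups: the overall mean is the total of the group sums divided by the total count. A short Python sketch, with the hypothetical helper name `combined_mean`:

```python
def combined_mean(group_sizes, group_means):
    """Mean of the pooled data, given each group's size and mean."""
    total_count = sum(group_sizes)
    if total_count == 0:
        raise ValueError("at least one group must be non-empty")
    # Each group's sum is its size times its mean.
    total_sum = sum(n * m for n, m in zip(group_sizes, group_means))
    return total_sum / total_count


overall = combined_mean([10, 20], [12, 9])  # (120 + 180) / 30 = 10.0
```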

Example 2: Find the A.M. of the following frequency distribution.

$x_{i}$	$f_{i}$
5	4
8	5
11	6
14	10
17	20

Solution:

Here $N = \sum f_{i} = 4 + 5 + 6 + 10 + 20 = 45$

$\sum f_{i} x_{i} = (5 \times 4) + (8 \times 5) + (11 \times 6) + (14 \times 10) + (17 \times 20) = 606$

$\therefore \overline{x} = \frac{\sum f_{i} x_{i}}{N} = \frac{606}{45} \approx 13.47$

Example 3: Find the median of the following frequency distribution.

$x_{i}$	$f_{i}$
42	3
52	6
62	11
72	9
47	8
57	8
67	5

Solution:

$x_{i}$	42	47	52	57	62	67	72
$f_{i}$	3	8	6	8	11	5	9
C.f.	3	11	17	25	36	41	50

Here N=50 (even)

Median = $\frac{\left(\frac{N}{2}\right)^{\text{th}} \text{ term} + \left(\frac{N}{2} + 1\right)^{\text{th}} \text{ term}}{2}$

$= \frac{25^{\text{th}} \text{ term} + 26^{\text{th}} \text{ term}}{2}$

$= \frac{57 + 62}{2} = \frac{119}{2} = 59.5$
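The cumulative-frequency lookup used in this solution can be sketched in Python. The function name `frequency_median` and the inner helper `term` are hypothetical; the data are the values and frequencies of Example 3, in the order given in the question.

```python
def frequency_median(values, frequencies):
    """Median of an ungrouped frequency distribution."""
    pairs = sorted(zip(values, frequencies))  # arrange values in ascending order
    total = sum(f for _, f in pairs)          # N
    if total == 0:
        raise ValueError("total frequency must be positive")

    def term(k):
        """Return the k-th observation (1-indexed) in ascending order."""
        cumulative = 0
        for x, f in pairs:
            cumulative += f
            if cumulative >= k:
                return x

    if total % 2 == 1:
        return term((total + 1) // 2)
    return (term(total // 2) + term(total // 2 + 1)) / 2


x = [42, 52, 62, 72, 47, 57, 67]
f = [3, 6, 11, 9, 8, 8, 5]
result = frequency_median(x, f)  # 25th term is 57, 26th term is 62 -> 59.5
```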

Example 4: Find the median of the following frequency distribution.

Class	$f_{i}$
0 – 5	6
5 – 10	10
10 – 15	8
15 – 20	9
20 – 25	7

Solution:

Class	0 – 5	5 – 10	10 – 15	15 – 20	20 – 25
$f_{i}$	6	10	8	9	7
C.f.	6	16	24	33	40

Here

$\frac{N}{2} = \frac{40}{2} = 20$

Median class is 10 – 15

∴ Median = $10 + \frac{(20 - 16)}{8} \times 5$

= 10 + 2.5 = 12.5

Example 5: Find the mode of the following frequency distribution.

Class	$f_{i}$
0 – 10	8
10 – 20	30
20 – 30	40
30 – 40	10
40 – 50	12

Solution:

Here the modal class is 20 – 30 (which has the maximum frequency).

∴ Mode = $ℓ + \frac{(f_{0} - f_{1})}{(2 f_{0} - f_{1} - f_{2})} \times h$

= 20 + $\frac{(40 - 30)}{(2 \times 40 - 30 - 10)} \times 10$

= 20 + 2.5

= 22.5

Example 6: Find the variance of the first n even natural numbers.

Solution:

$x_{i}$: 2, 4, 6, ..., 2n

$\overline{x} = \frac{\sum x_{i}}{n} = \frac{n(n + 1)}{n} = n + 1$

Variance $(x_{i}) = \frac{\sum x_{i}^{2}}{n} - (\overline{x})^{2}$

$= \frac{2^{2} + 4^{2} + \dots + (2n)^{2}}{n} - (n + 1)^{2}$

$= 4 \times \frac{n(n + 1)(2n + 1)}{6n} - (n + 1)^{2}$

$= \frac{4}{6}(n + 1)(2n + 1) - (n + 1)^{2} = \frac{n^{2} - 1}{3}$
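As a quick numerical check of this closed form, the sketch below computes the variance directly for small n using exact fractions and compares it with (n² − 1)/3. The function name `variance_of_first_even_numbers` is an illustrative choice, and checking n = 1 to 10 is only a sanity check, not a proof.

```python
from fractions import Fraction


def variance_of_first_even_numbers(n):
    """Exact variance of 2, 4, ..., 2n using mean of squares minus squared mean."""
    values = [2 * k for k in range(1, n + 1)]
    mean = Fraction(sum(values), n)
    mean_of_squares = Fraction(sum(v * v for v in values), n)
    return mean_of_squares - mean ** 2


# Compare the direct computation with the closed form for n = 1, ..., 10.
for n in range(1, 11):
    assert variance_of_first_even_numbers(n) == Fraction(n * n - 1, 3)
```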

Example 7: If the mean and variance of the data a, b, 8, 5, and 10 are 6 and 6.8 respectively, find the values of a and b.

Solution:

Mean = 6

$\frac{a + b + 8 + 5 + 10}{5} = 6$

a + b = 7 ...(i)

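and $\text{var}(x_{i}) = 6.8$

$\frac{a^{2} + b^{2} + 8^{2} + 5^{2} + 10^{2}}{5} - (6)^{2} = 6.8$

a² + b² = 25 ...(ii)

From (i) and (ii), 2ab = (a + b)² − (a² + b²) = 49 − 25 = 24, so ab = 12. Hence a and b are the roots of t² − 7t + 12 = 0:

a = 3, b = 4 or a = 4, b = 3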
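The answer can be checked with a small Python search over exact fractions. The helper name `mean_and_variance` is hypothetical, and restricting the search to integers from 0 to 7 is an assumption made only to keep the sketch short.

```python
from fractions import Fraction


def mean_and_variance(values):
    """Return (mean, variance) exactly, with variance dividing by n."""
    n = len(values)
    mean = Fraction(sum(values), n)
    variance = Fraction(sum(v * v for v in values), n) - mean ** 2
    return mean, variance


# Target: mean 6 and variance 6.8 (= 34/5) for the data a, b, 8, 5, 10.
target = (Fraction(6), Fraction(34, 5))
solutions = [
    (a, b)
    for a in range(0, 8)  # assumption: search small non-negative integers only
    for b in range(0, 8)
    if mean_and_variance([a, b, 8, 5, 10]) == target
]
# solutions contains exactly the pairs (3, 4) and (4, 3)
```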

10.0 Statistics Practice Questions

1. The mean of discrete observations y₁, y₂,......,y_n is given by

(A) $\frac{\sum _{i = 1}^{n} y _{i}}{n}$

(B) $\frac{\sum _{i = 1}^{n} y _{i}}{\sum _{i = 1}^{n} i}$

(C) $\frac{\sum _{i = 1}^{n} y _{i} f _{i}}{n}$

(D) $\frac{\sum _{i = 1}^{n} y _{i} f _{i}}{\sum _{i = 1}^{n} f _{i}}$

2. The range for the given set of observations 2, 3, 5, 9, 8, 7, 6, 5, 7, 4, 3 is

(A) 11 (B) 7 (C) 5.5 (D) 6

3. What is the standard deviation of the set of observations 32, 28, 29, 30, 31?

(A) 1.6 (B) $\sqrt{2}$

(C) 2 (D) None of these

4. The variance of the data 2, 4, 6, 8, 10 is

(A) 8 (B) 9 (C) 6 (D) 7
