# Descriptive Statistics

Rahul's Noteblog: Notes on Biostatistics

## Simple Random Samples:

• This is the simplest kind of probability sample.

• Every member of the population has an equal probability of being included, much like drawing a card from a well-shuffled deck or tossing a fair coin, where every outcome is equally likely.

• This sample can be representative if it closely resembles the population from which it is drawn.
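
As a minimal sketch of the idea above, the following draws a simple random sample with Python's standard `random` module. The population of 100 patient IDs, the sample size of 10, and the seed are all made up for illustration.

```python
import random

# Hypothetical population: patient IDs 1..100 (made up for illustration).
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# sample() draws without replacement; each member is equally likely to be picked.
simple_random_sample = rng.sample(population, k=10)
print(simple_random_sample)
```

Because the draw is without replacement, no patient can appear in the sample twice.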

## Stratified Random Sample:

• The population is first divided into groups, or strata, that are relatively homogeneous internally. Random samples are then drawn from each stratum.

• A stratified sample is used to ensure that various segments of the population are represented in the sample. For example, if people with diabetes were being selected at random to test the effects of a new drug, and you wanted to be sure that people in various age groups were included, you could separate the population into three strata: people 25 and under, people 26-50, and people over 50.
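
The two steps (divide into strata, then sample randomly within each) can be sketched as follows. The ages are randomly generated stand-ins, and the stratum cut-offs, the `age_stratum` helper, and the per-stratum sample size are assumptions chosen only to mirror the age-group example.

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical population: (patient_id, age) pairs with made-up ages.
population = [(pid, rng.randint(18, 80)) for pid in range(1, 301)]

def age_stratum(age):
    """Assign an age to one of three strata (cut-offs are illustrative)."""
    if age <= 25:
        return "25 and under"
    if age <= 50:
        return "26-50"
    return "over 50"

# Step 1: divide the population into strata.
strata = {}
for pid, age in population:
    strata.setdefault(age_stratum(age), []).append((pid, age))

# Step 2: draw a simple random sample from each stratum.
per_stratum = 5
stratified_sample = {
    name: rng.sample(members, k=min(per_stratum, len(members)))
    for name, members in strata.items()
}

for name, members in stratified_sample.items():
    print(name, members)
```

Sampling the same number from each stratum is only one option; sampling in proportion to stratum size is another common choice.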

## Cluster Sampling:

• Used when simple random or stratified random samples are too expensive to obtain.

• A cluster is a naturally occurring group. If the tests were done at 3 different clinics, each clinic would be a cluster. Note that each clinic would have people of all ages. Use this type of sampling when all clusters have similar characteristics.
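
A rough sketch of cluster sampling with the clinic example: whole clinics are chosen at random first, and patients are then sampled within the chosen clinics. The clinic names, patient IDs, and sample sizes are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed for a reproducible draw

# Hypothetical clinics (clusters), each with made-up patient IDs.
clinics = {
    "Clinic A": list(range(100, 140)),
    "Clinic B": list(range(200, 230)),
    "Clinic C": list(range(300, 350)),
}

# Stage 1: randomly select whole clusters (here 2 of the 3 clinics).
chosen_clinics = rng.sample(sorted(clinics), k=2)

# Stage 2: randomly select patients within each chosen clinic.
patients_per_clinic = 10
cluster_sample = {
    name: rng.sample(clinics[name], k=patients_per_clinic)
    for name in chosen_clinics
}
print(cluster_sample)
```

The cost saving comes from stage 1: only the chosen clinics need to be visited.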

## Stratified Sampling vs. Cluster Sampling:

• These two are often confused. In stratified sampling, the population is divided artificially, based on an important factor such as age, and there is only one level of division. In cluster sampling, the divisions occur naturally (such as clinics), and there are two levels of division: clusters are selected first, and then members are sampled within the chosen clusters.

## Systematic Sample:

• Every kth member of the population is chosen. For example, with k = 5, every 5th member of the population is selected.

• Provides an equivalent of random sampling without actually using randomization.
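
A minimal sketch of taking every kth member from an ordered list. The population of 50 IDs is made up, and the random starting point is an added assumption (a common refinement); the notes themselves only specify "every kth member".

```python
import random

# Hypothetical ordered population list (IDs 1..50).
population = list(range(1, 51))
k = 5  # sampling interval: take every 5th member

# Assumption: pick a random start among the first k members.
rng = random.Random(7)
start = rng.randrange(k)

# Slice with step k to take every kth member from the start position.
systematic_sample = population[start::k]
print(systematic_sample)
```

If the list has a repeating pattern with the same period as k, the sample can be badly unrepresentative, so the ordering of the list matters.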

## Probability:

### Addition rule:

• The probability of any one of several particular events occurring is equal to the sum of their individual probabilities, provided that the events are mutually exclusive.
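
As a worked example (the card scenario is chosen here for illustration), a single card drawn from a standard 52-card deck cannot be both an ace and a king, so the two events are mutually exclusive and their probabilities add.

```python
from fractions import Fraction

# A standard deck has 4 aces and 4 kings among 52 cards.
p_ace = Fraction(4, 52)
p_king = Fraction(4, 52)

# Mutually exclusive events: P(ace or king) = P(ace) + P(king).
p_ace_or_king = p_ace + p_king
print(p_ace_or_king)  # 2/13
```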

### Multiplication rule:

• The probability of two or more statistically independent events all occurring together is equal to the product of their individual probabilities.
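
As a worked example (again an illustrative scenario), tossing a fair coin and rolling a fair die are independent, so the probability of getting heads and a six together is the product of the two probabilities.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)  # fair coin
p_six = Fraction(1, 6)    # fair six-sided die

# Independent events: P(heads and six) = P(heads) * P(six).
p_heads_and_six = p_heads * p_six
print(p_heads_and_six)  # 1/12
```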

## Types of Data:

### Nominal:

• Data fall into qualitative categories with no inherent order (Male/Female, Black/White, etc.).

### Ordinal:

• Scaled data with a meaningful order, although the gaps between ranks need not be equal (student rank of 1st/2nd/3rd in their class).

### Interval:

• Data have a meaningful order and equal intervals between values, and are usually measured quantities. There is no absolute zero (Celsius scale).

### Ratio:

• Same as interval, except that there is an absolute zero (biomedical variables such as weight in grams or pounds).

### Discrete:

• Only certain values and none in between.

• Ex: The number of patients in a hospital may be 178 or 179, but cannot be in between these two. Another example is the number of syringes used in a clinic on any given day.

### Continuous or Quantitative:

• Can take any value within a range.

• Ex: A person's weight, height, age, blood pressure.

• You can calculate a group mean, such as the mean age of a class of medical students.

### Categorical or Qualitative Data:

• Categories of things, people, animals, trees, etc.

## Normal (Bell Curve) Distribution:

• Frequency distributions take on many different shapes, but many naturally occur as a symmetrical, bell-shaped curve.

• Approximately 68% of the distribution falls within ±1 standard deviation of the mean.

• Approximately 95% of the distribution falls within ±2 standard deviations of the mean.

• Approximately 99.7% of the distribution falls within ±3 standard deviations of the mean.

• Because these proportions hold true for every normal distribution, they should be memorized.
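
These proportions can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) with a standard normal distribution (mean 0, standard deviation 1); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion within +-k standard deviations = cdf(k) - cdf(-k).
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, proportion in within.items():
    print(f"within +-{k} SD: {proportion:.4f}")
# within +-1 SD: 0.6827
# within +-2 SD: 0.9545
# within +-3 SD: 0.9973
```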

## Descriptions of Mode, Mean, and Median:

### Mean:

• The arithmetic average of a distribution: the sum of all values divided by the number of values.

### Median:

• The midpoint of a distribution: half of the values lie above it and half below.

### Mode:

• The most common value of a distribution.
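
A short sketch computing all three measures with Python's standard `statistics` module. The scores are made up for illustration.

```python
import statistics

# Hypothetical scores (made up for illustration).
scores = [2, 3, 3, 5, 7, 10]

mean_score = statistics.mean(scores)      # sum / count = 30 / 6
median_score = statistics.median(scores)  # average of the two middle values, 3 and 5
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```

Here the mean is 5, the median is 4, and the mode is 3; the three differ because the scores are not symmetrically distributed.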

## Variance:

• The sum of the squared deviations of the scores from the mean, divided by the sample size minus 1 (n - 1).

## Standard Deviation:

• The square root of the variance.
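
The two definitions can be followed step by step, as in this sketch. The scores are made up, and the result is compared against `statistics.variance`, which also uses the n - 1 denominator.

```python
import math
import statistics

# Hypothetical scores (made up for illustration).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean_score = sum(scores) / n  # 40 / 8 = 5.0

# Sum of squared deviations from the mean: 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32.
sum_sq_dev = sum((x - mean_score) ** 2 for x in scores)

variance = sum_sq_dev / (n - 1)  # sample variance: 32 / 7
std_dev = math.sqrt(variance)    # standard deviation

print(round(variance, 3), round(std_dev, 3))

# Cross-check against the standard library's sample variance.
assert math.isclose(variance, statistics.variance(scores))
```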

## Z-Scores

• The location of any element in a normal distribution can be expressed in terms of how many standard deviations the element (value) lies above or below the mean. To do this, the element (value) is standardized into a z-score using the following formula. A positive z-score means the element lies above the mean, and a negative z-score means it lies below the mean.

• Z-scores allow us to specify the probability of a randomly picked element being above or below a particular score.

• Z-Score = (element value - population mean) / standard deviation.
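
A minimal sketch of standardizing a single value and turning the z-score into a probability. The value 130, mean 100, and standard deviation 15 are hypothetical numbers, and the distribution is assumed to be normal.

```python
from statistics import NormalDist

# Hypothetical numbers, chosen for illustration only.
value = 130
population_mean = 100
standard_deviation = 15

# Number of standard deviations the value lies above (+) or below (-) the mean.
z = (value - population_mean) / standard_deviation
print(z)  # 2.0

# Probability that a randomly picked element falls below / above this value,
# assuming the distribution is normal.
p_below = NormalDist().cdf(z)
p_above = 1 - p_below
print(round(p_below, 4), round(p_above, 4))
```

A z-score of 2 puts the value two standard deviations above the mean, so only about 2% of elements are expected to be higher.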