data processing and analysis of data

March 18, 2017 | Author: ankita3031 | Category: N/A

A Talk On ‘Data Processing and Analysis of Data’ (Research Methodology)

Introduction
• The data have to be processed and analyzed in accordance with the research plan.
• This is essential for a scientific study and for making comparisons.
• Processing implies:
– Editing
– Coding
– Classification
– Tabulation
• Analysis implies:
– Computation of certain measures
– Searching for patterns of relationships that exist among data groups.

Processing Operations
1. Editing
– The process of examining the collected raw data to detect errors and omissions and to correct them where possible.
– It involves scrutiny of the completed questionnaires and/or schedules.
– There are two variations of editing:
• Field editing
• Central editing

• Field editing
– Consists of a review of the reporting forms by the investigator, to complete (rewrite) what was written in abbreviated form at the time of recording the responses.
– This editing should be done as soon as possible after the interview.
– While doing field editing, the investigator should not try to correct errors or omissions by simply guessing the suitable option.

• Central editing
– Takes place when all forms or schedules have been completed and returned to the office.
– All the forms should be edited by a single editor in a small study, or by a team of editors in a large inquiry.
– Corrections are allowed in this editing.

– Editors should keep the following points in view while performing their work:
a) They should be familiar with the instructions given to the interviewers and coders.
b) A single line should be drawn to cross out any information.
c) Entries should be made in some distinctive color and in a standardized form.
d) They should initial all answers that they change or supply.
e) The editor’s initials and the date of editing should be placed on each completed form or schedule.

2. Coding
– Refers to the process of assigning numerals or other symbols to answers so that the responses can be put into a limited number of categories.
– It is necessary for efficient analysis.
– Coding decisions are usually taken at the design stage of the questionnaire.
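As a minimal sketch of coding, assume a hypothetical codebook for one closed question, fixed at the design stage. Each answer is mapped to a numeral. Blank or unrecognised answers receive a reserved code (9 here, which is an assumption rather than a rule from the talk). The categories and codes are illustrative only.

```python
# Hypothetical codebook for one questionnaire item (illustrative only).
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING_CODE = 9  # assumed reserved code for blank/unrecognised answers


def code_responses(responses):
    """Return the numeric code assigned to each raw answer."""
    codes = []
    for answer in responses:
        # Normalise case and surrounding spaces; treat None as blank.
        key = answer.strip().lower() if answer else ""
        codes.append(CODEBOOK.get(key, MISSING_CODE))
    return codes


print(code_responses(["Agree", "neutral ", "", "Strongly Agree"]))  # [4, 3, 9, 5]
```

Fixing the codebook before fieldwork is what lets every response fall into one of a limited set of categories.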

3. Classification
– Individual data should be reduced into homogeneous groups to bring out meaningful relationships.
– Classification is the process of arranging data in groups or classes on the basis of some common characteristics.

• Broadly, there are two types of classification, based on the nature of the phenomena involved:
a) Classification according to attributes.
b) Classification according to class-interval.

• Classification according to attributes:
– Data are classified on the basis of common characteristics, which can be either descriptive or numerical.
– Descriptive characteristics refer to qualitative phenomena that cannot be measured quantitatively.
– Data obtained in this way are known as statistics of attributes.

– This classification can be either simple or manifold.
– In simple classification, we consider only one attribute and make two classes: one possessing the attribute and the other devoid of it.
– In manifold classification, two or more attributes are considered and the data are divided into a number of classes.
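The two kinds of classification by attributes can be sketched as follows. The sketch assumes each record carries yes/no attributes, and the attribute names `literate` and `employed` are hypothetical. Simple classification gives two classes. Manifold classification on k attributes gives 2**k classes.

```python
from itertools import product


def simple_classification(items, attribute):
    """Two classes: items possessing the attribute and items devoid of it."""
    having = [x for x in items if x[attribute]]
    lacking = [x for x in items if not x[attribute]]
    return {"with " + attribute: having, "without " + attribute: lacking}


def manifold_classification(items, attributes):
    """Cross-classify on several yes/no attributes (2**k classes for k attributes)."""
    classes = {combo: [] for combo in product([True, False], repeat=len(attributes))}
    for x in items:
        key = tuple(bool(x[a]) for a in attributes)
        classes[key].append(x)
    return classes


# Hypothetical respondents.
people = [
    {"id": 1, "literate": True, "employed": True},
    {"id": 2, "literate": True, "employed": False},
    {"id": 3, "literate": False, "employed": False},
    {"id": 4, "literate": True, "employed": True},
]

print({k: len(v) for k, v in simple_classification(people, "literate").items()})
print({k: len(v) for k, v in manifold_classification(people, ["literate", "employed"]).items()})
```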

• Classification according to class-interval:
– Data relating to income, production, age, etc. are known as statistics of variables and are classified on the basis of class intervals.
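A minimal sketch of classification by class intervals, building a frequency distribution from hypothetical ages. It assumes equal-width intervals that include the lower limit and exclude the upper one, so a value of 30 falls in "30-40". Other conventions are equally common.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Count values in each interval [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - lower) // width)
        if not 0 <= i < n_classes:
            raise ValueError(f"value {v} lies outside the chosen class intervals")
        counts[i] += 1
    labels = [f"{lower + i * width}-{lower + (i + 1) * width}" for i in range(n_classes)]
    return dict(zip(labels, counts))


ages = [23, 35, 41, 19, 28, 33, 47, 30, 25, 38]  # hypothetical data
print(frequency_distribution(ages, lower=10, width=10, n_classes=4))
```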

4. Tabulation
– Tabulation refers to the process of summarizing raw data and displaying them in a compact form.
– It is essential because:
• It conserves space and reduces explanatory statements to a minimum.
• It facilitates the process of comparison.
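As an illustration of tabulation, the sketch below summarises coded answers to one question as a compact table of counts and percentages with a total row. The codes and labels are hypothetical, and every code is assumed to appear in the label map.

```python
from collections import Counter


def tabulate(codes, labels):
    """Return rows of (label, count, percent) plus a total row."""
    counts = Counter(codes)
    total = len(codes)
    rows = []
    for code, label in labels.items():
        n = counts.get(code, 0)
        rows.append((label, n, round(100 * n / total, 1)))
    rows.append(("Total", total, 100.0))
    return rows


codes = [1, 2, 2, 3, 1, 2, 3, 2]                 # hypothetical coded responses
labels = {1: "Yes", 2: "No", 3: "No answer"}      # hypothetical codebook

for label, n, pct in tabulate(codes, labels):
    print(f"{label:<10}{n:>5}{pct:>8.1f}")
```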

Elements/Types of Analysis
• In the case of survey or experimental data, analysis involves:
– Estimating the values of unknown parameters of the population.
– Testing hypotheses to draw inferences.

• Categories of analysis:
a) Descriptive
b) Inferential

• Correlation analysis:
– Studies the joint variation of two or more variables to determine the amount of correlation between them.

• Causal analysis:
– Studies how one or more variables affect changes in another variable.

• Multivariate analysis:
– “All statistical methods which simultaneously analyze more than two variables on a sample of observations.”
– It involves:
a) Multiple regression analysis
b) Multiple discriminant analysis
c) Multivariate analysis of variance
d) Canonical analysis

STATISTICS IN RESEARCH
• Statistics in research functions as a tool in designing research, analyzing its data and drawing conclusions therefrom.
• The important statistical measures used to summarize survey/research data are:
1) Measures of central tendency or statistical averages
2) Measures of dispersion
3) Measures of asymmetry (skewness)
4) Measures of relationship
5) Other measures

Measure of Central Tendency
– It indicates the point about which items have a tendency to cluster.
– Mean, median and mode are the most popular averages.
– The mean is also known as the arithmetic average.
– The median is the value of the middle item of a series when it is arranged in ascending or descending order.
– The mode is the most commonly or frequently occurring value in a series.
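The three averages can be computed with Python's standard `statistics` module. The series below is hypothetical, and for it the mean, median and mode all equal 15. In a skewed series such as [1, 2, 2, 3, 10] they separate: the mean is 3.6, while the median and mode are both 2.

```python
import statistics


def central_tendency(series):
    """Return the mean, median and mode of a series."""
    return {
        "mean": statistics.mean(series),      # arithmetic average
        "median": statistics.median(series),  # middle item after sorting
        "mode": statistics.mode(series),      # most frequent value
    }


scores = [12, 15, 11, 15, 18, 20, 15, 14]  # hypothetical series
print(central_tendency(scores))
```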

Measure of Dispersion
– It gives an idea of the scatter of the values of items of a variable in the series around the true value of the average.
– Important measures of dispersion are:
a) Range
b) Mean deviation
c) Standard deviation

• Range
– It is the simplest possible measure of dispersion.
– It is defined as the difference between the values of the extreme items of a series.

• Mean deviation
– It is the average of the deviations of the values of items from some average of the series (such as the mean or median), with the signs of the deviations ignored.

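• Standard deviation
– The most widely used measure of dispersion.
– Denoted by the symbol σ.
– Standard deviation is defined as the square root of the average of the squares of the deviations taken from the arithmetic mean:

$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$

where $X_i$ is the value of the i-th item, $\bar{X}$ is the arithmetic mean and $n$ is the number of items.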
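The three measures of dispersion are sketched below on a made-up series. The talk allows deviations from "some average". This sketch takes them from the arithmetic mean, and σ divides by n, which is the population form matching "average of squares of deviations".

```python
import math


def dispersion(values):
    """Return (range, mean deviation about the mean, standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)                  # extreme items
    mean_deviation = sum(abs(x - mean) for x in values) / n  # signs ignored
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return value_range, mean_deviation, sd


print(dispersion([2, 4, 4, 4, 5, 5, 7, 9]))  # (7, 1.5, 2.0)
```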

Measure of Asymmetry
– When the distribution of the items in a series is perfectly symmetrical, we get a symmetrical, bell-shaped curve. Technically, such a curve is described as a normal curve.
– If the curve is distorted (to the right or to the left), the distribution is said to be asymmetrical, which indicates the presence of skewness.
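The talk does not show a formula for skewness at this point. One common measure, assumed here, is Karl Pearson's coefficient of skewness, (X̄ − Z)/σ, where Z is the mode. It is zero for a symmetrical unimodal series, positive when the curve is drawn out to the right, and negative when it is drawn out to the left. The sketch requires a series with a single clear mode.

```python
import statistics


def pearson_skewness(values):
    """Karl Pearson's coefficient of skewness: (mean - mode) / standard deviation."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return (mean - mode) / sd


print(pearson_skewness([1, 2, 3, 3, 3, 4, 5]))         # symmetrical: 0.0
print(pearson_skewness([2, 3, 3, 3, 4, 5, 6, 8, 10]))  # long right tail: positive
```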

Measures of Relationship
– In the context of bivariate and multivariate populations, we often need to know how two or more variables in the data relate to one another.
– Such association (correlation) and cause-and-effect relationships are studied using the techniques of correlation and regression respectively.

• In the case of a bivariate population:
– Correlation can be studied through:
a) Cross tabulation
b) Charles Spearman’s coefficient of correlation
c) Karl Pearson’s coefficient of correlation
– Cause-and-effect relationships can be studied through the simple regression technique.

1. Cross tabulation:
– Useful when the data are in nominal form.
– Each variable is classified into two or more categories, and the variables are then cross-classified in these categories.
– The relationship between the variables can be:
• Symmetrical: the two variables vary together.
• Reciprocal: the two variables mutually influence or reinforce each other.
• Asymmetrical: one variable (the independent variable) is responsible for another (the dependent variable).
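A minimal cross-tabulation sketch for two nominal variables. The variables and responses (gender and drink preference) are hypothetical. Each pair of categories becomes one cell of the table.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Cross-classify (row_category, column_category) pairs into a count table."""
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    counts = Counter(pairs)  # missing cells count as 0
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


responses = [("male", "tea"), ("female", "coffee"), ("female", "tea"),
             ("male", "tea"), ("female", "coffee"), ("male", "coffee")]

rows, cols, table = cross_tabulate(responses)
print("".ljust(8) + "".join(c.rjust(8) for c in cols))
for r, counts in zip(rows, table):
    print(r.ljust(8) + "".join(str(n).rjust(8) for n in counts))
```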

2. Charles Spearman’s coefficient of correlation:
– This technique deals with ordinal data, where ranks are given to the different values of the variables.
– The objective is to determine the extent to which the two sets of rankings are similar or dissimilar.
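Spearman's coefficient is commonly computed as ρ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each item. That formula holds when there are no tied ranks. The sketch assumes the ranks have already been assigned, and the two judges' rankings are hypothetical.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for two rankings without ties."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rankings of the same length (n >= 2)")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))  # 0.8: the rankings are largely similar
```

A value of +1 means identical rankings and −1 means exactly reversed rankings.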

3. Karl Pearson’s coefficient of correlation: – Most widely used method to measure the degree of relationship between two variables.
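Karl Pearson's coefficient is r = Σ(X − X̄)(Y − Ȳ) / √(Σ(X − X̄)² · Σ(Y − Ȳ)²). It ranges from −1 to +1. A direct sketch on made-up paired observations follows. Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```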

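• Simple regression analysis:
– Regression is the determination of a statistical relationship between two or more variables. In simple regression there are only two variables, and one of them (the independent variable) is taken as the cause of the behavior of the other (the dependent variable).
– If X is the independent variable and Y is the dependent variable, the regression equation of Y on X is:

$\hat{Y} = a + bX$

where $a$ is the intercept and $b$ is the regression coefficient (the slope of the line).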
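The talk does not say how the constants of the regression line Ŷ = a + bX are obtained. The usual method, assumed here, is ordinary least squares. The slope is b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)², and the intercept is a = Ȳ − bX̄. The data are hypothetical.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (a, b) for the regression of Y on X: Y-hat = a + b*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary to fit a regression line")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b


a, b = fit_simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y-hat = {a:.1f} + {b:.1f} X")  # Y-hat = 2.2 + 0.6 X
```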

• In the case of a multivariate population:
– Correlation can be studied through:
a) The coefficient of multiple correlation
b) The coefficient of partial correlation

– Cause-and-effect relationship can be studied through multiple regression equations.

1. Multiple Correlation and Regression
– When there are two or more independent variables, the analysis of the relationship is known as multiple correlation.
– The equation describing such a relationship is known as the multiple regression equation.

• In the context of two independent variables and one dependent variable, the equation can be given as:

$\hat{Y} = a + b_1 X_1 + b_2 X_2$

where $X_1$ and $X_2$ are the independent variables, $a$ is the intercept, and $b_1$, $b_2$ are the partial regression coefficients.
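A sketch of fitting Ŷ = a + b₁X₁ + b₂X₂ by least squares, which is an assumption since the talk names no fitting method. The two normal equations are written in deviation form and solved with Cramer's rule. The data are synthetic, generated exactly from Y = 1 + 2X₁ + 3X₂, so the fit should recover a = 1, b₁ = 2 and b₂ = 3.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Least-squares (a, b1, b2) for Y-hat = a + b1*X1 + b2*X2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # Normal equations: s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y.
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]  # exactly 1 + 2*X1 + 3*X2
print(fit_two_predictor_regression(x1, x2, y))
```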

• Partial correlation:
– Partial correlation measures the relationship between two variables while the effect of the other related variables is eliminated.
– In other words, it aims to measure the relation between the dependent variable and a particular independent variable while holding all other variables constant.
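For one dependent variable (1), one independent variable of interest (2) and one variable held constant (3), the standard first-order partial correlation is r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / √((1 − r₁₃²)(1 − r₂₃²)). The sketch below takes the three simple correlations as inputs. The numeric values in the example are hypothetical.

```python
import math


def partial_correlation(r12, r13, r23):
    """First-order partial correlation r12.3 (variable 3 held constant)."""
    denom = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if denom == 0:
        raise ValueError("r13 or r23 equals +/-1; partial correlation is undefined")
    return (r12 - r13 * r23) / denom


print(round(partial_correlation(0.8, 0.6, 0.7), 4))  # 0.6651
```

If variable 3 is uncorrelated with both others (r₁₃ = r₂₃ = 0), the partial correlation equals the simple correlation r₁₂.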

Other Measures
1. Index numbers:
– Used when the series to be compared are expressed in different units.
– In such a scenario, each series is converted into a series of index numbers.
– For example, the given figures can be expressed as percentages of the figure for a chosen base period.
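A minimal sketch of fixed-base index numbers, where each figure is expressed as a percentage of the base-period figure (base = 100). Both series are hypothetical, one in tonnes and one in currency units. After conversion they can be compared directly.

```python
def to_index_numbers(series, base_index=0):
    """Express each value as a percentage of the value at base_index."""
    base = series[base_index]
    if base == 0:
        raise ValueError("base value must be non-zero")
    return [100 * value / base for value in series]


production_tonnes = [250, 275, 300, 325]  # hypothetical
price_per_unit = [40, 44, 50, 52]         # hypothetical, different unit

print(to_index_numbers(production_tonnes))  # [100.0, 110.0, 120.0, 130.0]
print(to_index_numbers(price_per_unit))     # [100.0, 110.0, 125.0, 130.0]
```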

2. Time-Series Analysis:
– When the data collected relate to some time period for a given phenomenon, particularly in economic and business contexts, such data are labeled a ‘time series’.
– The factors affecting such a series are:
I. Secular trend (T): changes taking place over a long duration of time.
II. Short-time oscillations: changes taking place over a short duration of time.
– Short-time oscillations are affected by the following factors:
a) Cyclic fluctuations (C): fluctuations resulting from business cycles.
b) Seasonal fluctuations (S): fluctuations of short duration occurring in a regular sequence at specific intervals of time.
c) Irregular fluctuations (I): fluctuations that take place in a completely unpredictable fashion.

• For analyzing a time series there are two models:
a) Multiplicative model
b) Additive model

The multiplicative model assumes that the various components interact in a multiplicative manner to produce the given values of the overall time series, and can be stated as:

$Y = T \times C \times S \times I$

The additive model treats the given values of the overall time series as the total of the various components, and can be stated as:

$Y = T + C + S + I$

where $Y$ is the observed value of the time series.
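The talk does not say how the components are separated. A common first step in classical decomposition, assumed here, estimates the trend T with a centred moving average. That trend is then removed by subtraction under the additive model Y = T + C + S + I, leaving C + S + I. Under the multiplicative model Y = T × C × S × I it is removed by division, leaving C × S × I. The odd window length of 3 and the sales figures are hypothetical choices for illustration. Quarterly or monthly data would normally use a window matching the seasonal period.

```python
def moving_average_trend(series, window=3):
    """Centred moving average of odd length; None where it cannot be computed."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend


def remove_trend(series, trend, model="additive"):
    """Y - T (additive model) or Y / T (multiplicative model), point by point."""
    if model not in ("additive", "multiplicative"):
        raise ValueError("model must be 'additive' or 'multiplicative'")
    out = []
    for y, t in zip(series, trend):
        if t is None:
            out.append(None)
        elif model == "additive":
            out.append(y - t)  # estimate of C + S + I
        else:
            out.append(y / t)  # estimate of C x S x I
    return out


sales = [10, 12, 14, 13, 15, 17]  # hypothetical series
trend = moving_average_trend(sales)
print(trend)                                       # [None, 12.0, 13.0, 14.0, 15.0, None]
print(remove_trend(sales, trend, "additive"))      # [None, 0.0, 1.0, -1.0, 0.0, None]
print(remove_trend(sales, trend, "multiplicative"))
```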


Copyright ©2017 KUPDF Inc.