IS4241 - Revision

March 12, 2017 | Author: jiebo | Category: N/A

QUICK REFERENCE GUIDE

GRAPH CONCEPTS

Name: Definition

Graph: A set of nodes and edges. Can be directed or undirected. A graph is connected if there is a path between any two of its vertices; otherwise it consists of several connected components.
Multigraph: A graph that allows loops and multiple edges.
Weighted Graph: A graph with weighted edges.
Labelled Graph: A graph whose nodes or edges have properties (attributes).
Distance between 2 nodes: Length of the shortest path between the 2 nodes. Example: Distance between A & D is 2.
Simple Path: A path in which the nodes are unique.
Length of a path: Number of edges in the path.
Diameter of graph: Maximum distance in the graph.
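The distance and diameter definitions can be checked on a small graph with a breadth-first search. This is a minimal sketch: the graph below is a hypothetical example (not the figure from the original notes), and the diameter function assumes the graph is connected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (edge counts) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def diameter(adj):
    """Maximum distance over all pairs of nodes; assumes a connected graph."""
    return max(max(bfs_distances(adj, node).values()) for node in adj)

# Hypothetical undirected graph with edges A-B, A-C, B-C, C-D.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

print(bfs_distances(graph, "A")["D"])  # distance between A and D
print(diameter(graph))                 # longest shortest path in the graph
```

In this hypothetical graph the shortest A-to-D route is A-C-D, so the distance is 2, which is also the diameter.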

NETWORK CHARACTERISTICS

A Full Network contains all entities and the connections among them.
Ego: Node in focus.
Alter: Neighbor of the ego.
Egocentric Network: An ego and its connections.

Unimodal Network: Only one type of vertex.
Multimodal Network: Vertices have ≥ 2 types, e.g. person, document.
Multiplex Network: Edges of ≥ 2 types, e.g. people and modes of communication.

Graph Level Metrics


Size of network: The number of nodes in the network, or the number of edges in the network.
Density of network: Number of ties in the network over the number of ALL possible ties. Used to compute the connectedness of the network.
- Directed network of size n: no. of possible ties = n × (n − 1)
- Undirected network of size n: no. of possible ties = n × (n − 1) / 2
Reachability: The ability to get from one vertex to another within a graph.

Vertex Metrics (Centrality)

Degree Centrality: Count of the total number of connections linked to a vertex.
Note: for directed graphs, use In-degree Centrality and Out-degree Centrality.
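The density definition (actual ties over all possible ties) is simple arithmetic. The sketch below uses illustrative function names and made-up tie counts; it is not taken from the original notes.

```python
def possible_ties(n, directed):
    """Number of possible ties in a network of n nodes (no loops)."""
    ties = n * (n - 1)
    return ties if directed else ties // 2

def density(n, actual_ties, directed=False):
    """Actual ties divided by all possible ties."""
    return actual_ties / possible_ties(n, directed)

# Hypothetical example: 5 nodes with 6 undirected ties.
print(possible_ties(5, directed=True))   # 5 x 4 possible directed ties
print(possible_ties(5, directed=False))  # 5 x 4 / 2 possible undirected ties
print(density(5, 6))                     # 6 actual ties out of 10 possible
```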

Closeness Centrality
$\text{Closeness Centrality} = \left( \sum \text{shortest distance to all other vertices} \right)^{-1}$
OR Average distance to all other vertices
OR (Average distance to all other vertices)^{-1}

Example, using geodesic (shortest) distance:
Node A = 1 / (1 + 1 + 1 + 1) = 0.25
Node B = 1 / (1 + 2 + 2 + 2) = 0.14
Node C = 1 / (1 + 2 + 1 + 2) = 0.17
Node D = 1 / (1 + 2 + 1 + 1) = 0.2
Node E = 1 / (1 + 2 + 2 + 1) = 0.17

Betweenness Centrality
Measure of how often a given vertex lies on the shortest path between two other vertices.
$\text{Betweenness Centrality}(v) = \sum \frac{\text{Number of shortest paths passing through } v}{\text{Number of shortest paths}}$
Example: for Node B, the pair A-C has one shortest path (A-B-C), which passes through B, and the pair A-E has two shortest paths (A-B-E and A-D-E), one of which passes through B. Betweenness = 1 + 0.5 = 1.5
Note: Betweenness centrality of all nodes = 0 when network density = 1.

Node   Betweenness   Eigenvector
A      0.5           0.162
B      1.5           0.241
C      0.0           0.194
D      0.5           0.162
E      1.5           0.241

Eigenvector Centrality
Depends on both the number and the quality of a node's connections.

β centrality
- Small value: Analysis weighted towards the local structure surrounding the ego
- Large value: Weighs towards the wider network structure
- Positive Beta: Good for ego to be connected to highly central people
- Negative Beta: Ego's disadvantage to be connected to others who are themselves well-connected

Cut Vertex: A vertex whose removal disconnects a graph.
Bridge: An edge whose removal disconnects a graph.
Note: See Structural Balance.

Vertex Characteristics (Pivotal, Gatekeeper)

Pivotal Node: A node X is Pivotal for a pair of distinct nodes Y and Z if X lies on every shortest path between Y and Z. Example: B is pivotal for pairs A & C, and A & D.
Gatekeeper Node: A node X is a Gatekeeper if, for a pair of nodes Y and Z, every path from Y to Z passes through X. Example: Node A is a gatekeeper.
Local Gatekeeper: A node V is a Local Gatekeeper if there are two neighbors of V, Y and Z, that are not connected by an edge. Example: Node D is a local gatekeeper, but not a gatekeeper.
Gatekeeper ⇒ Pivotal
Gatekeeper/Pivotal ⇒ Local Gatekeeper

Comparison
Generally, the 3 centrality types will be positively correlated. When they are not, it probably tells you something interesting about the network.

High Degree, Low Closeness: Embedded in a cluster that is far from the rest of the network.
High Degree, Low Betweenness: Ego's connections are redundant; communication bypasses him/her. Alters connect to each other.
High Closeness, Low Degree: Key player tied to important/active alters. Alter is super important, connected to a big chunk of the network.
High Closeness, Low Betweenness: Probably multiple paths in the network; ego is near many people, but so are many others.
High Betweenness, Low Degree: Ego's few ties are crucial for network flow.
High Betweenness, Low Closeness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.
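The closeness centrality definition (inverse of the sum of shortest distances to all other vertices) can be sketched with a breadth-first search. The edge list below is an assumption: it was chosen only because it reproduces the distance sums in the worked example for Nodes A to E, since the original figure is not available. It is not the network behind the betweenness and eigenvector table.

```python
from collections import deque

def shortest_distances(adj, source):
    """Geodesic distances (edge counts) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closeness(adj, node):
    """Inverse of the sum of shortest distances to all other vertices."""
    dist = shortest_distances(adj, node)
    total = sum(d for other, d in dist.items() if other != node)
    return 1 / total

# Assumed edge list, consistent with the distance sums in the worked example.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = {node: round(closeness(adj, node), 2) for node in sorted(adj)}
print(scores)  # {'A': 0.25, 'B': 0.14, 'C': 0.17, 'D': 0.2, 'E': 0.17}
```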


SOCIAL GROUPS

Dyads (2 nodes)
- Undirected: tie Yes/No
- Directed: No tie, 1-way (which way), 2-way
- Reciprocity: Ratio of reciprocated relationships to all dyads, OR ratio of reciprocated relationships to all connected dyads
Example: Total mutual = 2, Total connected = 6, Total dyads = 10, so Reciprocity = 2/10 OR 2/6

Triads (3 nodes)
- Undirected: 0, 1, 2, or 3 ties
- Directed: 16 possible relationships (See below)
- Transitivity: if x → y and y → z, then x → z
- Triadic Closure (see Structural Balance)

Cliques
Every member of a clique knows everybody else, i.e. Density = 1.
⇒ Any subset of nodes from a clique also forms a clique.

N-Clique
Members within an N-clique are at most N distance away from each other.
Example: { A, C, E } forms a 2-clique, BUT { A, B, C, E } is not a 2-clique because B-E is 3 distance away.
Note: Nodes in an N-clique can depend on non-clique nodes to form the N-path.

Clans
An N-clan is an N-clique where every pair has distance ≤ N within the clan, i.e. an N-clan cannot use nodes outside the clan.
Example: { A, B, C } is a 2-clan; { A, C, E } is not a 2-clan.

Clustering

Clustering Coefficient
The clustering coefficient of an ego is defined as:
- How well the alters are connected among themselves, i.e. actual ties / max ties
- Density in the 1.5 degree egocentric network
The clustering coefficient of the entire network is the average of the clustering coefficients of ALL the nodes.

Clustering Algorithm
- Agglomerative: Bottom up. Start from singletons and merge.
- Divisive: Top down. Start from a single cluster and split.
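The clustering coefficient definition (actual ties among the alters over the maximum possible ties among them) can be sketched as follows. The graph is a hypothetical example, and returning 0.0 for an ego with fewer than two alters is an assumed convention that the notes do not specify.

```python
from itertools import combinations

def clustering_coefficient(adj, ego):
    """Actual ties among ego's alters divided by the maximum possible ties."""
    alters = sorted(adj[ego])
    k = len(alters)
    if k < 2:
        return 0.0  # assumed convention: no pair of alters to connect
    actual = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
    max_ties = k * (k - 1) / 2
    return actual / max_ties

def average_clustering(adj):
    """Network clustering coefficient: average over ALL nodes."""
    return sum(clustering_coefficient(adj, node) for node in adj) / len(adj)

# Hypothetical undirected graph: triangle A-B-C plus a pendant node D on A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

print(clustering_coefficient(graph, "A"))  # 1 tie (B-C) out of 3 possible
print(average_clustering(graph))
```

Here A has three alters (B, C, D) with only the B-C tie among them, so its coefficient is 1/3, while B and C each score 1.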


STRUCTURAL BALANCE

Triadic Closure is the idea that if two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.

Strong Triadic Closure Property: if a node A has edges to nodes B and C, then the B-C edge is likely to form if A-B and A-C are both strong ties.
Example: Node A violates the Strong Triadic Closure Property as there is no edge between B and C at all.

Bridge: An edge whose removal disconnects a graph.
Local Bridge: An edge A-B whose removal leaves its endpoints, A & B, at a distance > 2 from each other, i.e. A & B have no common friends.
⇒ If A-B is a strong tie bridge, A/B cannot have a strong tie to another node, or it violates the Strong Triadic Closure Property.

Graded Measure for a local bridge:
$\frac{\text{number of nodes who are neighbors of both } A \text{ and } B}{\text{number of nodes who are neighbors of at least one of } A \text{ and } B}$
Example: AF = 1/6, AE = 2/5
∴ AF is more of a local bridge (the smaller the ratio, the closer the edge is to a local bridge; a true local bridge scores 0).

Structural Balance Property: For every set of three nodes, considering the three edges connecting them, either all three are labeled +, or else exactly one of them is labeled +.

Weak Structural Balance Property: There is no set of three nodes such that the edges among them consist of exactly 2 +ve and 1 −ve.
⇒ If a graph is weakly balanced, its nodes can be divided into groups where every 2 nodes in the same group are friends and every 2 nodes in different groups are enemies.

Balance Theorem: If a labeled complete graph is balanced, then either all pairs of nodes are friends, or the nodes can be divided into two groups, X and Y, such that
1) every pair of nodes in X like each other,
2) every pair of nodes in Y like each other, and
3) everyone in X is the enemy of everyone in Y,
i.e. if a complete graph has 2 sets of mutual friends, with complete mutual antagonism between the two, it is balanced.
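The two balance properties can be checked by counting + labels on every triangle of a complete signed graph. The sketch below is illustrative only: the node names, the sign dictionary keyed by `frozenset` pairs, and both example graphs are assumptions, not data from the notes.

```python
from itertools import combinations

def positive_count(signs, a, b, c):
    """Number of '+' edges among the three edges of triangle a-b-c."""
    pairs = ((a, b), (a, c), (b, c))
    return sum(signs[frozenset(pair)] == "+" for pair in pairs)

def is_balanced(nodes, signs):
    """Structural Balance: every triangle has three '+' or exactly one '+'."""
    return all(positive_count(signs, *t) in (1, 3) for t in combinations(nodes, 3))

def is_weakly_balanced(nodes, signs):
    """Weak balance: no triangle has exactly two '+' and one '-'."""
    return all(positive_count(signs, *t) != 2 for t in combinations(nodes, 3))

# Hypothetical complete graph: X = {A, B} and Y = {C, D} are mutual friends,
# with complete antagonism between the two groups.
nodes = ["A", "B", "C", "D"]
signs = {frozenset(p): "-" for p in combinations(nodes, 2)}
signs[frozenset(("A", "B"))] = "+"
signs[frozenset(("C", "D"))] = "+"

# Hypothetical three-way feud: everyone is everyone else's enemy.
feud_nodes = ["P", "Q", "R"]
feud_signs = {frozenset(p): "-" for p in combinations(feud_nodes, 2)}

print(is_balanced(nodes, signs))                  # two-camp graph
print(is_balanced(feud_nodes, feud_signs))        # all-negative triangle
print(is_weakly_balanced(feud_nodes, feud_signs))
```

The two-camp graph satisfies the Balance Theorem's structure, while the all-negative triangle fails the Structural Balance Property but still satisfies the weak version.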


INFORMATION FLOW

Finding Max Flow (Bookkeeping Algorithm):
1. Find any path from source to sink that has a positive flow capacity remaining. If there are no more such paths, exit.
2. Determine f, the maximum flow along this path, which is equal to the smallest flow capacity on any arc in the path (the bottleneck arc).
3. Subtract f from the remaining flow capacity in the forward direction, and add f to the remaining flow capacity in the backwards direction, for each arc (if needed).
4. Go to Step 1.

Max Flow/Min Cut (Flow, Capacity)
Node 2: 1-2: 2, 1-3: 3, 1-4: 4(2), 1-5: 2(1), 1-6: 4(2), 2-3: 0, 2-4: 3, 2-5: 1, 2-6: 3, 3-4: 1, 3-5: 1, 3-6: 2, 4-5: 0, 4-6: 2, 5-6: 3
Node 3: 1-2: 2, 1-3: 3, 1-4: 4(1), 1-5: 2(1), 1-6: 4(2), 2-3: 0, 2-4: 3, 2-5: 1, 2-6: 3, 3-4: 1, 3-5: 1, 3-6: 2, 4-5: 0, 4-6: 2, 5-6: 3

Finding Min Cut
A cut is any set of directed arcs containing at least one arc in every path from the source to the sink. The cut value is the sum of the flow capacities in the source-to-sink direction of all the arcs in the cut.
By the max-flow min-cut theorem, the cut value of the min cut is equal to the max flow.
UCINET: Network > Cohesion > Max Flow

Flow Betweenness
Let $m_{jk}$ be the amount of flow between vertex j and vertex k which must pass through i for any maximum flow. The flow betweenness of vertex i is the sum of all $m_{jk}$ where i, j and k are distinct and j < k.
UCINET: Network > Centrality and Power > Flow Betweenness …

Freeman's formula for Network Centralization
$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]}{(g-1)(g-2)}$
where
- $C_D$ is the centralization of the network
- $C_D(n_i)$ is the degree centrality of node i
- $C_D(n^*)$ is the degree centrality of the highest centrality node
- $g$ is the number of nodes in the network
Centralization shows the degree of inequality or variance in the network as a percentage of that of a perfect star network of the same size.
Note: The star network is the most unequal network, with $C_D = 1$.
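The bookkeeping steps for finding the max flow can be sketched as below. Two things are assumptions: the notes say "any path", and this sketch picks paths with a breadth-first search; the small network and its capacities are hypothetical, not the example from the original figure.

```python
from collections import deque

def find_path(residual, source, sink):
    """Step 1: find a source-to-sink path with remaining capacity (BFS)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while parent[node] is not None:
                path.append((parent[node], node))
                node = parent[node]
            return path[::-1]
        for nxt, cap in residual[node].items():
            if cap > 0 and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no path with positive remaining capacity

def max_flow(capacity, source, sink):
    # Copy the capacities and give every arc a backward entry (initially 0).
    residual = {u: dict(arcs) for u, arcs in capacity.items()}
    for u, arcs in capacity.items():
        for v in arcs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        path = find_path(residual, source, sink)
        if path is None:
            return total
        f = min(residual[u][v] for u, v in path)  # Step 2: bottleneck arc
        for u, v in path:                         # Step 3: update capacities
            residual[u][v] -= f                   # forward direction
            residual[v][u] += f                   # backwards direction
        total += f                                # Step 4: go to Step 1

# Hypothetical capacities: s->a 3, s->b 2, a->b 1, a->t 2, b->t 3.
capacity = {
    "s": {"a": 3, "b": 2},
    "a": {"b": 1, "t": 2},
    "b": {"t": 3},
}

print(max_flow(capacity, "s", "t"))
```

In this hypothetical network the cut made of the two arcs leaving s has value 3 + 2 = 5, which matches the max flow, as the max-flow min-cut theorem requires.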

Information Cascade


Conditional Probability (Bayes' rule):
$P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{P(B)}$

An information cascade occurs when a person observes the actions of others and then, despite possible contradictions in his/her own private information signals, engages in the same acts.

There are four key conditions in an information cascade model:
1. Agents make decisions sequentially
2. Agents make decisions rationally based on the information they have
3. Agents do not have access to the private information of others
4. A limited action space exists (e.g. an adopt/reject decision)
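Bayes' rule, P(A|B) = P(A) · P(B|A) / P(B), is what a rational agent in a cascade model uses to update a belief after a signal. The sketch below is illustrative: the function name, the 0.5 prior, and the 0.7 signal accuracy are all assumed values, not figures from the notes.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) by Bayes' rule, with P(B) expanded over A and not-A."""
    p_b = prior_a * p_b_given_a + (1 - prior_a) * p_b_given_not_a
    return prior_a * p_b_given_a / p_b

# Assumed setup: A = "adopting is a good idea", B = "private signal says adopt".
# The prior is 0.5 and a signal is correct 70% of the time.
after_one_signal = posterior(0.5, 0.7, 0.3)
after_two_signals = posterior(after_one_signal, 0.7, 0.3)

print(after_one_signal)   # belief after one favourable signal
print(after_two_signals)  # belief after a second favourable signal
```

Each favourable signal pushes the belief further towards adopting, which is why a run of observed "adopt" actions can outweigh one contradictory private signal.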

STUDY DESIGN

1. Basics: Measurements & Data

Variable: Characteristic or property
Scales: Nominal, Ordinal, Interval, Ratio

Nominal: Categorical; Qualitative. e.g. Male, Female; North, South, East, West
Ordinal: No concept of gap size: a > b > c. e.g. first, second, third; primary, secondary, JC
Interval: Gaps measured in continuous units. Can perform +, − operations. e.g. Celsius
Ratio: Ratios can be compared. Can perform +, −, ×, ÷ operations. e.g. dollars

What type of scale to use?
- Pivotal/Non-pivotal: Categorical
- Survey Ratings: Ratio
- Edge (Yes/No): Categorical
- Weighted edge (e.g., 1…10): Ratio
- Degree Centrality: Ratio
- Betweenness Centrality: Ratio

2. Data collection

Asking Respondents
1) Simple Questions (e.g. age, education)
2) Survey Type Questions
3) Open-ended questions
4) Roster choice method, i.e., respondents are given a list (roster) of people and asked questions about them, e.g. which of the following would you regard as a friend

Experiments
- Measure variables

Web Access
- Web crawling
- Blogs, forums, social media

Secondary Data
1) Datasets on the internet (context)
2) Reports
3) Email Records
4) Company transaction records

3. Steps in doing a social network study

Decide what to study
- What to study? The Hypothesis. See Notes for examples.
- Variables: Identify variables, consider independent variables, e.g. node properties, edge properties
- Level of Detail: e.g. team email: sender, receiver, etc.

Choose relevant population
- Sampling: Identify the population the study is interested in
  - Roles/positions (directors/politicians)
  - Relationships (friends of …)
  - Events (participation/communication)
  - Time
  - Location
- Complete Population (Census) VS Random (ego) + snowball (alters)

Collect data
- Refer to 2. Data Collection

Analyse
- Mixture of qualitative, descriptive statistics, and statistical tests

Deduce Findings
- Statistics, and compare with prior studies

Report
- Clear, meaningful and obvious graphs
- Introduction → Literature Review → Objective (Hypothesis) → Methodology → Analysis → Findings

UCINET CHEAT SHEET

Display dataset: Data > Display (Ctrl-D)
Input: .##h file. Outputs the matrix in the selected data file.

NetDraw: Visualize > NetDraw
To open a dataset: File > Open > Ucinet dataset > Network

Separate files with multiple matrices: Data > Unpack

Prepare Data: Data > Data Editors > Matrix Editor (Ctrl-S)

Produce matrix from attributes: Data > Attribute to matrix
Note: Refer to Notes

Display Univariate Statistics: Tools > Univariate Statistics (Ctrl-U)

Compute Network Metrics: Network > Centrality and power > Multiple measures (Ctrl-M)
Note: Refer to Notes

Test observed mean/density against a fixed value (find p-value against a fixed value)
Network > Compare densities > Against theoretical parameter
Tests whether the density of the selected network is close to the expected density. The actual density is as shown in the UCINET output in Display dataset.
Example: the z-score is -3.7943, i.e. 3.79 s.d. to the left of the expected density ⇒ the observed density is significantly smaller than the expected density of 1.0, as p-value = 0.0002.

Test of density difference between 2 networks (more than mean, takes variability into account)
Network > Compare densities > Paired (same nodes)
Used to compare the density difference between two networks. Good for testing the time difference of the same network.
Example: t-statistic = 2.4089, p-value = 0.0052 (compare against 0.05).

Correlation between 2 networks with same actors
Tools > Testing Hypothesis > Dyadic (QAP) > QAP Correlation (Ctrl-Q)
Compares Matrix VS Matrix.
Example UCINET output ⇒ correlation is not significant.

Regression (you have control over the independent variable)
Tools > Testing Hypothesis > Dyadic (QAP) > QAP Regression > MR-QAP Linear Regression
- Double Dekker: if no missing values
- Semi-partialling: missing values
Compares Matrix VS Matrix.
Look at R-sq first to see if the model is a good fit. Then look at the individual variables.

T-test of 2 group means (find p-value of 2 groups divided on node attributes)
Tools > Testing Hypothesis > Node-level > T-Test
Compares Column VS Column.
The t-test is used to test if there are differences between the means of two groups, in this case, whether the govt or non-govt groups have different out-degree centrality (col 1). Is one group bigger than the other?
Result: No difference across groups, all p-values are > 0.05.

ANOVA for 2 or more groups
Tools > Testing Hypothesis > Node-level > Anova
Compares Column VS Column.
Look at the F-statistic and significance. Significance is the same as that of a two-tailed test.
Note: Refer to Notes
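The z-score reading in the density test (how many standard deviations the observed density lies from the expected one) can be turned into a rough p-value with a normal approximation. This is only a sketch for intuition: the function name is illustrative, and UCINET derives its own p-value by a different procedure, so the two figures need not match exactly.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z-score under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

p = two_sided_p_value(-3.7943)
print(p)
print(p < 0.05)  # significant at the 5% level
```

For the z-score of -3.7943 from the example, the approximation gives a p-value well below 0.05, consistent with the conclusion that the observed density is significantly smaller than expected.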

Triad Undirected

Triad naming: X1 X2 X3, followed by a letter where needed
X1: No. of mutual dyads
X2: No. of asymmetric dyads
X3: No. of null dyads
D: Down, U: Up, T: Transitive, C: Cyclic
