Finite State Machine based Flow Analysis for WebRTC Applications

Sergej Alekseev, Christian von Harscher, Marco Schindler
Fachhochschule Frankfurt am Main, University of Applied Sciences
Nibelungenplatz 1, 60318 Frankfurt am Main, Germany
Abstract: This paper presents an approach for analysing the behaviour of WebRTC-based applications, which are typically used for direct audio or video browser-to-browser communication. The approach is based on finite-state machines derived from the WebRTC specification. A state change of a WebRTC peer involved in the communication process generates an event which is collected and analysed. We present algorithms for analysing the collected events and generating various statistics about a WebRTC session. Finally, we present some experimental results based on the library named WebRTCStateAnalyser. This self-developed library is an open source project available under the Apache License.

Index Terms: WebRTC, Finite State Machine, Pattern Matching

I. INTRODUCTION

WebRTC (Web Real-Time Communication) is a communication standard developed by the W3C [1] in close cooperation with the RTCWeb standard developed by the IETF [2]. WebRTC offers developers the ability to create web multimedia applications for real-time communication without plugins, downloads or installations.

In this paper we present an approach that represents the state of a WebRTC-based application by collecting and analysing its state events. Possible fields of application include error handling, collecting statistics, detecting fraud attempts and analysing user behaviour. The main idea of the proposed approach is to collect the events generated by WebRTC peers whenever a state change occurs. To model the possible WebRTC states and flows, we derive finite-state machines from the WebRTC specification [1]. The state change events are generated asynchronously by the WebRTC peers, and this process does not influence the overlying application. The collected events are validated by the first proposed algorithm to detect state changes that are illegal according to the WebRTC specification [1]. The second algorithm recognises predefined sequences of state change events in order to create various statistics. To evaluate our approach, the open source library WebRTCStateAnalyser has been implemented and tested with various WebRTC applications. The library can easily be integrated into any WebRTC application and is available as an open source project. We present some experimental results and statistics based on this implementation at the end of the paper.

II. RELATED WORKS

The idea of using finite-state machines to model network protocols and distributed systems has been known for a long time. Communication protocols are often modelled as a network of two finite-state machines that communicate by exchanging messages over unbounded channels [3], [4], [5]. Many publications deal with finite-state-model-based testing [6], [7], which is partially related to our approach. Closer to our approach are publications on diagnostics and fault localisation based on finite-state machines [8], [9]; some recent publications are [10] and [11]. However, our approach focuses on collecting and examining statistics for WebRTC-based applications.

The second algorithm in this paper (Event Sequence Pattern Matching) uses structural comparison to recognise an event sequence. Many algorithms for text pattern matching are known; the Boyer-Moore [18], Quick-search [19] and Horspool [20] algorithms and their variants are widely used in the software industry. They are in general close to our algorithm, but not suitable for performing the structural comparison. For our approach, we adapted the path profiling algorithm published by Ball and Larus [17], originally developed for program profiling. The essential idea behind the algorithm is to identify sets of potential event sequences which are encoded as integer values. This reduces the comparison costs to a minimum: they are limited to calculating an integer value for the event sequence to be analysed.
978-1-4799-4233-6/14/$31.00 ©2014 IEEE
III. WEBRTC FINITE STATE MACHINES

A finite-state machine (also called a finite-state automaton) provides a simple computational model for modelling software systems. A finite-state machine (FSM) is formally defined as a 3-tuple $(V, \sigma, E)$ consisting of a finite set of states $V$, a finite set of input symbols $\sigma$ and a transition function $E: V \times \sigma \to V$. An FSM is usually represented as a directed graph $G = (V, E)$, where $V$ is a set of nodes and $E$ is a set of edges. The input symbols $\sigma$ (the corresponding actions) are assigned to the edges of the graph.

From the WebRTC specification [1] we derive the following finite-state machines:
• Signaling state machine $S = (V_s, E_s)$ (section III-A).
• Gathering state machine $G = (V_g, E_g)$ (section III-B).
• Connection state machine $C = (V_c, E_c)$ (section III-C).

The specification defines the set of states for each finite-state machine as an enum structure. The sets of transitions (edges) are derived from the methods or event handlers.

$$Z_s = (s, g, c) \qquad (1)$$

The state of the entire system $Z_s$ is defined by equation (1), where $s \in V_s$, $g \in V_g$ and $c \in V_c$. When the WebRTC peer within a browser has been created [1], it has a signaling state, an ICE gathering state [14] and an ICE connection state [14]. The WebRTC peer has two associated stream sets: a local streams set, representing the streams that are currently sent, and a remote streams set, representing the streams that are currently received. Once the WebRTC peer has been initialised, the methods createOffer, setLocalDescription, createAnswer and setRemoteDescription are executed to initialise the connection. The following steps are performed:
• The WebRTC peer starts gathering ICE addresses and sets the ICE gathering state to gathering.
• If one or more candidate pairs have been found, the ICE connection state is changed to connected.
• When the WebRTC peer has finished checking all candidate pairs, the ICE connection state is changed to completed.
• If the ICE connection state is connected or completed and both the local and the remote session descriptions have received a valid SDP offer/answer pair, the signaling state is set to stable.

The next subsections describe the formalisation of the signaling, gathering and connection state machines.

A. Signaling state machine

To model the signaling state machine, the following state definitions (fig. 1) from [1] are used:

    enum RTCSignalingState {
      "stable",
      "have-local-offer",
      "have-remote-offer",
      "have-local-pranswer",
      "have-remote-pranswer",
      "closed"
    };

Fig. 1. Signaling states and transitions.

The corresponding signaling state machine S is represented in fig. 2. The state stable is the initial state, in which the local and remote descriptions are empty and no offer/answer exchange is in progress. Executing the methods createOffer and setLocalDescription changes the state from stable to have-local-offer (fig. 2: transition a). The methods createAnswer and setRemoteDescription change the state from have-local-offer back to stable (fig. 2: transition b). Calling the close method (fig. 2: transition g) closes the connection and changes the state to closed. Further transitions are:
• setRemoteDescription(pranswer) (fig. 2: transition c)
• setLocalDescription(answer) (fig. 2: transition d)
• setRemoteDescription(offer) (fig. 2: transition e)
• setLocalDescription(pranswer) (fig. 2: transition f)

The type pranswer indicates that a description should be treated as a non-final answer.

Fig. 2. Signaling state machine.

B. Gathering state machine

The gathering state definitions from [1] are represented in fig. 3:

    enum RTCIceGatheringState {
      "new",
      "gathering",
      "complete"
    };

Fig. 3. Gathering states.

The state new represents the state of the WebRTC peer before any networking actions have been executed. The transition from the state new to the state gathering is executed when the WebRTC peer starts gathering candidates to set up a connection. The WebRTC peer changes the state to complete once it has finished gathering. Events such as adding a new interface cause the state to go back to gathering. The corresponding gathering state machine G is shown in fig. 4.

Fig. 4. Gathering state machine.
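The transition function $E: V \times \sigma \to V$ of the signaling machine S can be made concrete as a lookup table keyed by (state, input symbol). The following is a minimal sketch, not the WebRTCStateAnalyser implementation: the names `SIGNALING_E`, `step` and `run` are hypothetical, the action names follow the W3C RTCPeerConnection API, and the edge set is an assumption reconstructed from transitions a to g listed above and the W3C state diagram (fig. 2 itself is not reproduced here; self-loops such as a repeated pranswer are omitted).

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the signaling FSM S = (V_s, E_s) as the transition
# function E: V x sigma -> V. Keys are (state, input symbol); the comment on
# each entry is the fig. 2 transition label used in the text (assumed mapping).
SIGNALING_E: Dict[Tuple[str, str], str] = {
    ("stable", "setLocalDescription(offer)"): "have-local-offer",                    # a
    ("have-local-offer", "setRemoteDescription(answer)"): "stable",                  # b
    ("have-remote-pranswer", "setRemoteDescription(answer)"): "stable",              # b
    ("have-local-offer", "setRemoteDescription(pranswer)"): "have-remote-pranswer",  # c
    ("have-remote-offer", "setLocalDescription(answer)"): "stable",                  # d
    ("have-local-pranswer", "setLocalDescription(answer)"): "stable",                # d
    ("stable", "setRemoteDescription(offer)"): "have-remote-offer",                  # e
    ("have-remote-offer", "setLocalDescription(pranswer)"): "have-local-pranswer",   # f
}


def step(state: str, action: str) -> str:
    """Apply one input symbol; close() leads to 'closed' from any state (g)."""
    if action == "close":
        return "closed"
    try:
        return SIGNALING_E[(state, action)]
    except KeyError:
        raise ValueError(f"illegal transition: {action} in state {state}") from None


def run(actions: List[str], start: str = "stable") -> List[str]:
    """Return every state visited, beginning with the initial state."""
    states = [start]
    for action in actions:
        states.append(step(states[-1], action))
    return states
```

For example, `run(["setLocalDescription(offer)", "setRemoteDescription(answer)", "close"])` visits stable, have-local-offer, stable and closed (transitions a, b, g), which is the signaling flow recorded for peer 0x01 in fig. 8. The gathering machine G and the connection machine C can be encoded in the same way.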
C. Connection state machine

The state definitions of the connection state machine from [1] are represented in fig. 5:

    enum RTCIceConnectionState {
      "new",
      "checking",
      "connected",
      "completed",
      "failed",
      "disconnected",
      "closed"
    };

Fig. 5. Definition of connection states.

The corresponding connection state machine C is presented in fig. 6. The state new represents the state in which a WebRTC peer waits until the gathering process is completed and all connection candidates are determined. The state new changes to the state checking when the WebRTC peer has received the remote candidates and is able to check candidate pairs (fig. 6: transition b). The state connected is reached after the gathering process is fully completed and a connection for at least one component has been found (fig. 6: transition c). The state completed is reached after the WebRTC peer has determined connections for all components (fig. 6: transition d). If the WebRTC peer is not able to determine any connection, the state changes from checking to failed (fig. 6: transition e). The state disconnected is reached if the established connection is lost for one or more components (fig. 6: transition f). The state closed is reached from any state when the WebRTC peer is shut down (fig. 6: transition g). A restart of the WebRTC peer causes the model to change back to the state new from any state (fig. 6: transition a).

Fig. 6. Connection state machine.

IV. COLLECTING WEBRTC EVENTS

During the communication process, WebRTC peers generate events asynchronously whenever the state of a peer has changed. The idea of the WebRTC event based state analyser framework is to implement event handlers which log these events, store them in a database and then analyse and validate them algorithmically. Fig. 7 shows the architecture of the WebRTC event based state analyser framework. The event handler interface [1] represents a callback method and event types as defined in [15]. The event handler interface is integrated by including the JavaScript on the HTML page of the WebRTC application; the code of the WebRTC application itself does not have to be modified.

Fig. 7. WebRTC call state analyser.

An event is defined by a unique id, a peer id pid, a session id sid, a time stamp t and the state of the entire system $Z_s$ according to equation (1). In tuple notation, an event can be formalised as:

$$e = (\mathit{id}, \mathit{pid}, \mathit{sid}, t, Z_s) \qquad (2)$$

where the state $Z_s$ is an element of the combination of signaling, gathering and connection states, $Z_s \in V_s \times V_g \times V_c$, as described in section III. The sequences of events are derived programmatically from the collected raw events. A sequence of events Q is defined as an ordered set of elements:

$$Q = \{\ldots, e_n, e_{n+1}, \ldots\}, \quad \forall e_n, e_{n+1} \in Q: \mathit{sid}_n = \mathit{sid}_{n+1} \wedge t_n < t_{n+1} \qquad (3)$$

All events in the sequence Q belong to the same session and are sorted in topological order by the time stamp. The session id is created at the start of the session, and all peers involved in the same communication process share the same session id. An example of an event sequence is presented in fig. 8.

V. ANALYSING WEBRTC EVENTS

The analysis of the collected WebRTC events is carried out in two steps. In the first step, the sequence of events is validated for each particular peer involved in the communication process (subsection V-A). In the second step, application-specific patterns are recognised by the pattern matching algorithm (subsection V-B).

A. Validation Algorithm
The sequence of events Q is divided into subsets of events $P_i$ corresponding to each peer as follows:

$$Q = \bigcup_{i=1}^{n} P_i, \quad \forall p_n, p_{n+1} \in P_i: \mathit{pid}_n = \mathit{pid}_{n+1} \qquad (4)$$

Each subset of events P contains the events generated by a single peer, sorted in topological order by the time stamp:

$$\forall p_n, p_{n+1} \in P: t_n < t_{n+1} \qquad (5)$$
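The partition of equations (4) and (5), combined with the per-peer check that the procedure VALIDATE_EVENT_SEQUENCE (fig. 9) performs, can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `Event`, `partition_by_peer` and `validate_event_sequence` are hypothetical names; time stamps are simplified to seconds after 13:27:00; the edge sets are reconstructed from the prose of section III; and a component that does not change between two consecutive events is accepted, which the data of fig. 8 requires because each event changes only one of the three machines.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Event(NamedTuple):
    """Event e = (id, pid, sid, t, Z_s) of equation (2), with Z_s = (s, g, c)."""
    id: int
    pid: int
    sid: int
    t: float                      # seconds after 13:27:00 on 03-19-2014 (simplified)
    state: Tuple[str, str, str]


SID = 0x421FF0D
HLO, HRO = "have-local-offer", "have-remote-offer"
# Raw events of fig. 8 as (id, pid, t, (signaling, gathering, connection)).
ROWS = [
    (23, 1, 18.276, ("stable", "new", "new")),
    (24, 2, 18.345, ("stable", "new", "new")),
    (25, 1, 18.678, (HLO, "new", "new")),
    (26, 2, 18.701, (HRO, "new", "new")),
    (27, 2, 18.723, (HRO, "gathering", "new")),
    (28, 1, 19.755, (HLO, "gathering", "new")),
    (33, 1, 20.765, ("stable", "complete", "new")),
    (34, 2, 20.821, ("stable", "complete", "new")),
    (35, 1, 20.987, ("stable", "complete", "checking")),
    (36, 2, 20.998, ("stable", "complete", "checking")),
    (37, 1, 21.067, ("stable", "complete", "connected")),
    (38, 2, 22.126, ("stable", "complete", "connected")),
    (44, 1, 32.567, ("closed", "complete", "closed")),
    (45, 2, 33.663, ("closed", "complete", "closed")),
]
Q = [Event(i, p, SID, t, z) for i, p, t, z in ROWS]


def partition_by_peer(q: List[Event]) -> Dict[int, List[Event]]:
    """Split Q into per-peer subsets P_i (eq. 4), each ordered by t (eq. 5)."""
    subsets: Dict[int, List[Event]] = {}
    for e in sorted(q, key=lambda ev: ev.t):
        subsets.setdefault(e.pid, []).append(e)
    return subsets


# Edge sets (src, dst) per machine, plus the states reachable from any state
# (close -> closed, restart -> new). Assumed reconstruction, see lead-in.
SIGNALING = {("stable", HLO), (HLO, "stable"), (HLO, "have-remote-pranswer"),
             ("have-remote-pranswer", "stable"), ("stable", HRO), (HRO, "stable"),
             (HRO, "have-local-pranswer"), ("have-local-pranswer", "stable")}
GATHERING = {("new", "gathering"), ("gathering", "complete"), ("complete", "gathering")}
CONNECTION = {("new", "checking"), ("checking", "connected"), ("connected", "completed"),
              ("checking", "failed"), ("connected", "disconnected"),
              ("completed", "disconnected")}
MACHINES: List[Tuple[Set[Tuple[str, str]], Set[str]]] = [
    (SIGNALING, {"closed"}), (GATHERING, set()), (CONNECTION, {"closed", "new"})]


def validate_event_sequence(p: List[Event]) -> bool:
    """Reject the subset as soon as one component makes an illegal transition."""
    last: Optional[Tuple[str, str, str]] = None
    for e in p:
        if last is not None:
            for (edges, from_any), prev, cur in zip(MACHINES, last, e.state):
                # Unchanged components are accepted (assumption, see lead-in).
                if cur != prev and cur not in from_any and (prev, cur) not in edges:
                    return False
        last = e.state
    return True
```

Running `partition_by_peer(Q)` yields the subsets $P_1$ and $P_2$ listed below for fig. 8, and both pass the check. The combination check of fig. 10 would be one additional matrix lookup per event and is not shown.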
In fig. 8, the first subset $P_1$ contains the events with the ids {23, 25, 28, 33, 35, 37, 44} and the subset $P_2$ contains the events {24, 26, 27, 34, 36, 38, 45}. For each subset P, the procedure VALIDATE_EVENT_SEQUENCE validates the sequences of signaling, gathering and connection states. Additionally, the combination of states of each event is validated by the combination validation algorithm.

    id  pid   sid        time stamp                  state = {Vs, Vg, Vc}
    23  0x01  0x421ff0d  '03-19-2014 13:27:18.276'   {stable, new, new}
    24  0x02  0x421ff0d  '03-19-2014 13:27:18.345'   {stable, new, new}
    25  0x01  0x421ff0d  '03-19-2014 13:27:18.678'   {have-local-offer, new, new}
    26  0x02  0x421ff0d  '03-19-2014 13:27:18.701'   {have-remote-offer, new, new}
    27  0x02  0x421ff0d  '03-19-2014 13:27:18.723'   {have-remote-offer, gathering, new}
    28  0x01  0x421ff0d  '03-19-2014 13:27:19.755'   {have-local-offer, gathering, new}
    33  0x01  0x421ff0d  '03-19-2014 13:27:20.765'   {stable, complete, new}
    34  0x02  0x421ff0d  '03-19-2014 13:27:20.821'   {stable, complete, new}
    35  0x01  0x421ff0d  '03-19-2014 13:27:20.987'   {stable, complete, checking}
    36  0x02  0x421ff0d  '03-19-2014 13:27:20.998'   {stable, complete, checking}
    37  0x01  0x421ff0d  '03-19-2014 13:27:21.067'   {stable, complete, connected}
    38  0x02  0x421ff0d  '03-19-2014 13:27:22.126'   {stable, complete, connected}
    44  0x01  0x421ff0d  '03-19-2014 13:27:32.567'   {closed, complete, closed}
    45  0x02  0x421ff0d  '03-19-2014 13:27:33.663'   {closed, complete, closed}

Fig. 8. Raw events example.

1) Validating states for a single peer: The following algorithm (fig. 9) validates the events generated by a particular peer. The method otherend() returns the state at the other end of the given transition (edge).

    VALIDATE_EVENT_SEQUENCE(Q, FSM)
      last = NULL;
      for (each e ∈ Q in topological order) {
        if (last != NULL) {
          if (!FIND_STATE(FSM, e.state, last)) {
            return false;
          }
        }
        last = e.state;
      }
      return true;

    FIND_STATE(FSM, state, last)
      for (each incoming edge of state ∈ FSM) {
        if (otherend(edge, state) == last) {
          return true;
        }
      }
      return false;

Fig. 9. Algorithm for validating states for a single peer.

The runtime complexity of the procedure VALIDATE_EVENT_SEQUENCE is $O(|Q||E|)$, where $|Q|$ is the number of processed events and $|E|$ the number of edges (transitions) of the relevant finite-state machine. Since the number of transitions in all machines is constant, the runtime complexity is $O(|Q|)$. Both sequences $P_1$ and $P_2$ from fig. 8 are valid.

2) Validating combinations of states: To validate the combination of states of single events, we created a static three-dimensional matrix (fig. 10) representing all combinations of signaling, gathering and connection states. The values of the matrix define whether a combination of states is valid (value = 1) or invalid (value = 0). The valid combinations of states are derived from [1]; all the events from fig. 8 are valid. The verification step consists of finding the appropriate combination and reading the corresponding value from the matrix. The runtime complexity of this step is $O(|Q|)$ and depends only on the number of stored events, because the size of the matrix is constant.

    ...
    [stable],           [new],       [new]    = 1
    [have-local-offer], [gathering], [new]    = 1
    [have-local-offer], [new],       [closed] = 0
    [closed],           [gathering], [new]    = 0
    ...

Fig. 10. An excerpt from the combination matrix.

B. Event Sequence Pattern Matching Algorithm

Pattern matching is routinely used in various computer applications, for example in editors and in the retrieval of information (from text, images or sound). The proposed matching algorithm matches the sequence of events against a set of predefined patterns. Each sequence pattern consists of three state sequences: $Q_s$, the sequence of signaling states, $Q_g$, the sequence of gathering states, and $Q_c$, the sequence of connection states. Some typical patterns of WebRTC events are presented in the following list:
• Successful session:
  $Q_s$ = {have-local-offer → stable → closed}
  $Q_g$ = {new → gathering → complete}
  $Q_c$ = {new → checking → connected → completed → closed}
• Session failed:
  $Q_s$ = {have-local-offer → closed}
  $Q_g$ = {new → gathering}
  $Q_c$ = {new → checking → failed → closed}

The direct way to match a pattern is to compare the sequence of events with the pattern in the forward direction, element by element. This process is repeated for each predefined pattern until a pattern is matched or all patterns have been tried. This approach is commonly known as the brute-force method. The proposed algorithm is based on structural comparison and minimises the number of calculation steps. The essential idea behind the algorithm is to identify sets of potential paths through the states, which are encoded as integers. For this purpose, the path profiling algorithm ASSIGN_LABELS from Ball and Larus [17] (fig. 11), originally developed for program profiling, is used. The path profiling algorithm works efficiently for directed acyclic graphs (DAG). Therefore, the finite-state machine has to be transformed into a DAG (fig. 12: line 1). The Ball-Larus algorithm labels the edges of a DAG with integer values such that each path from the entry to the exit of the DAG produces a unique sum of the edge values along that path (fig. 12: line 2). Each sequence pattern is represented by an integer value as the sum of three path values:

$$\mathit{Val}(Q_s) + \mathit{Val}(Q_g) + \mathit{Val}(Q_c)$$

    ASSIGN_LABELS(FSM)
      for (each vertex v ∈ V in reverse topological order) {
        if (v is a leaf vertex) {
          NumPaths(v) = 1;
        } else {
          NumPaths(v) = 0;
          for (each edge e = v → w) {
            Val(e) = NumPaths(v);
            NumPaths(v) = NumPaths(v) + NumPaths(w);
          }
        }
      }

Fig. 11. Ball-Larus algorithm for assigning values to edges in a DAG [17].

$$\begin{aligned}
\mathit{Val}(\text{new} \to \text{EXIT}) &= 0\\
\mathit{Val}(\text{new} \to \text{closed} \to \text{EXIT}) &= 1 + 0 = 1\\
\mathit{Val}(\text{new} \to \text{checking} \to \text{EXIT}) &= 2 + 0 = 2\\
\mathit{Val}(\text{new} \to \text{checking} \to \text{closed} \to \text{EXIT}) &= 2 + 1 + 0 = 3\\
\mathit{Val}(\text{new} \to \text{checking} \to \text{failed} \to \text{EXIT}) &= 2 + 2 + 0 = 4\\
&\ \ \vdots\\
\mathit{Val}(\text{new} \to \text{checking} \to \text{connected} \ldots) &= 2 + 6 + \ldots = 15
\end{aligned}$$

Fig. 13. Transformed connection FSM with values computed by the algorithm ASSIGN_LABELS in fig. 11.

The number of possible paths (sequence patterns) is calculated as 16 × 3 × 17 = 816.

2) Calculating the path value for an event sequence: The path value of an event sequence is calculated by following the corresponding transitions and adding the assigned values to the path value. Because the FSMs have been transformed into DAGs, cycles have to be removed from the sequence. This limitation may cause two different sequence patterns to be reduced to the same template. The limitation is nevertheless acceptable, because there are only a few use cases in which this situation may occur.
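ASSIGN_LABELS (fig. 11) and the path-value calculation described above can be sketched in Python as follows. The sketch rests on several assumptions: the DAG is a hypothetical, reduced version of the transformed connection FSM (the full graph of fig. 13 is not reproduced); successors are visited in the listed order, since Ball-Larus values depend on that order; removing cycles is modelled only as dropping immediate repetitions of a state; and the helper names `assign_labels`, `collapse` and `path_value` are illustrative. With this edge order, the first path values coincide with those listed for fig. 13 (0, 1, 2, 3, 4).

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]


def reverse_topological_order(dag: Graph) -> List[str]:
    """Depth-first post-order: every vertex is emitted after all its successors."""
    seen: Set[str] = set()
    order: List[str] = []

    def visit(v: str) -> None:
        if v in seen:
            return
        seen.add(v)
        for w in dag.get(v, []):
            visit(w)
        order.append(v)

    for v in list(dag):
        visit(v)
    return order


def assign_labels(dag: Graph) -> Tuple[Dict[str, int], Dict[Tuple[str, str], int]]:
    """ASSIGN_LABELS of fig. 11: returns NumPaths(v) and Val(e) for every edge."""
    num_paths: Dict[str, int] = {}
    val: Dict[Tuple[str, str], int] = {}
    for v in reverse_topological_order(dag):
        successors = dag.get(v, [])
        if not successors:                 # leaf vertex, e.g. EXIT
            num_paths[v] = 1
            continue
        num_paths[v] = 0
        for w in successors:
            val[(v, w)] = num_paths[v]     # Val(e) = NumPaths(v) before the update
            num_paths[v] += num_paths[w]
    return num_paths, val


def collapse(states: List[str]) -> List[str]:
    """Drop immediate repetitions such as new, new, checking (assumed cycle removal)."""
    out: List[str] = []
    for s in states:
        if not out or out[-1] != s:
            out.append(s)
    return out


def path_value(path: List[str], val: Dict[Tuple[str, str], int]) -> int:
    """Sum the edge values along a path that ends in EXIT."""
    return sum(val[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical reduced DAG of the connection FSM (not the full fig. 13 graph).
DAG: Graph = {
    "new": ["EXIT", "closed", "checking"],
    "closed": ["EXIT"],
    "checking": ["EXIT", "closed", "failed", "connected"],
    "failed": ["EXIT", "closed"],
    "connected": ["EXIT", "closed"],
}
```

For example, the connection states of peer 0x01 in fig. 8 (new, new, new, new, checking, connected, closed) collapse to new → checking → connected → closed, whose value in this reduced DAG is 2 + 4 + 1 + 0 = 7. Because every entry-to-exit path has a distinct sum between 0 and NumPaths(new) - 1 = 7, a predefined pattern can then be recognised with a single dictionary lookup on that integer instead of an element-by-element comparison.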