
Exploring the Capabilities of Graph Neural Networks for Directed Multigraphs

  • Writer: Peter Johnson
  • Dec 31, 2023
  • 11 min read

I recently participated in a Kaggle competition that used a GNN model architecture, which prompted me to read more GNN research and articles. Separately, for a school project, I was looking for datasets related to anti-money laundering, and on Kaggle I came across IBM's Transactions for Anti Money Laundering (AML) dataset. I found the accompanying paper particularly insightful, and, to my surprise, it used a GNN — which sparked my interest in writing this article. In June 2023, IBM's research team published a paper titled Provably Powerful Graph Neural Networks for Directed Multigraphs, which proposes new direction-aware mechanisms to strengthen existing GNN models and validates them on several financial transaction datasets, showing that the theory and methods are sound and very promising. This article aims to walk through the paper's key points and main goals in accessible terms; for anything not covered here, please refer to the original paper. Earlier feedback indicated some confusion about the terminology I had been using, so a clarification: "article" refers to this post, and "paper" refers to the paper under discussion. I will use these terms consistently from now on. Thanks for the feedback~ To summarize the paper: it explores how to convert a standard GNN into a powerful directed GNN. The authors propose three adaptations — Reverse Message Passing, Multigraph Port Numbering, and Ego IDs — and demonstrate that their combination can detect any directed subgraph pattern.
They demonstrate the effectiveness of these adaptations through experiments on both synthetic subgraph detection tasks and real-world financial crime analysis tasks. The results show significant improvements in detecting money laundering transactions and phishing accounts, and the paper provides a thorough analysis of the proposed adaptations and their practical impact on subgraph detection tasks, supported by both theoretical and experimental results.

2. Background Introduction

Before getting into the method, it helps to understand the problem this paper aims to address. The authors' goal is a key question in financial crime detection: spotting patterns of money laundering. Prior work has already identified some common money laundering patterns (shown in Figure 1), but these patterns are fairly generic and also appear in many completely legitimate transactions, so detecting any single pattern is not enough to identify financial crime. The model needs to learn various combinations of patterns, and this is where neural networks come in. The authors point out, however, that traditional GNNs can be ineffective at detecting these patterns: financial transactions are directional, and traditional GNNs cannot capture this property. They therefore introduce a directed GNN, allowing the neural network to detect these particular patterns more effectively.
Specifically, we are given a directed graph G with nodes v ∈ V(G) representing individual accounts and directed edges e = (u,v) ∈ E(G) representing transactions from u to v. Each node u has a set of account features h(0)(u), including account number, bank ID, and account balance. Each transaction e = (u,v) likewise has a set of features h(0)(u,v), including transaction amount, currency type, and timestamp. The sets of incoming and outgoing neighbors of u are denoted N_in(u) and N_out(u), respectively. Multiple transactions may exist between the same two accounts, making G a multigraph. In the prediction task, each node and edge carries a binary label indicating whether the account or transaction is legal or illegal. In the subgraph detection task, the goal is to determine, for each node in the graph, whether it belongs to a subgraph matching a given pattern H (some common money laundering pattern). In other words, given a node v ∈ V(G) in a graph G, we ask whether there exists a subgraph G′ containing v with the same structure as H. MPNNs, a collective term for a family of GNNs, operate by propagating each node's information along the edges, aggregating information from neighboring nodes, and finally updating each node's own state (h(v) in the figure below) according to the received message (a(v) in the figure below). This iterative process lets the network collect and integrate information from different regions of the graph, enabling deeper and more comprehensive analysis and learning. Mathematically speaking, it looks like this: This is the typical formulation for GNNs that ignore directionality. For directed graphs such as financial transaction networks, however, we need to distinguish which of two accounts is transferring money to which. In other words, for each node (account), we must consider whether its neighboring nodes and edges are outgoing (money going out) or incoming (money coming in).
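As a concrete illustration of the notation above, here is a minimal pure-Python sketch — the node names and transactions are made up for illustration — that stores a directed multigraph as an edge list and computes N_in(u) and N_out(u):

```python
from collections import defaultdict

# A tiny directed multigraph: each entry is one transaction (u, v, features).
# The two b -> c entries are parallel edges, which is what makes it a multigraph.
edges = [
    ("a", "b", {"amount": 100.0, "currency": "USD", "timestamp": 1}),
    ("a", "c", {"amount": 250.0, "currency": "USD", "timestamp": 2}),
    ("b", "c", {"amount": 80.0,  "currency": "EUR", "timestamp": 3}),
    ("b", "c", {"amount": 120.0, "currency": "EUR", "timestamp": 4}),
]

def neighbors(edges):
    """Return N_in and N_out as dicts mapping node -> set of neighbor nodes.
    Parallel edges collapse to one neighbor entry, but the edge list keeps them."""
    n_in, n_out = defaultdict(set), defaultdict(set)
    for u, v, _feat in edges:
        n_out[u].add(v)
        n_in[v].add(u)
    return n_in, n_out

n_in, n_out = neighbors(edges)
print(sorted(n_in["c"]))   # -> ['a', 'b']  (in-neighbors of c)
print(sorted(n_out["a"]))  # -> ['b', 'c']  (out-neighbors of a)
```

Note how c has two in-neighbors even though three transactions point to it — the neighbor sets and the transaction multiset are distinct objects, which is exactly why parallel edges need special treatment later.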
In an MPNN, information is passed along directed edges in the indicated direction, so the aggregation step only considers in-neighbors, as in the following equation: Taking the edge features into account as well, the equation becomes:

3. Methods

Having seen how directed GNNs work conceptually, let's look at how these ideas are implemented. Continuing from the previous section, suppose we have the following subgraph: If we only consider in-neighbors, i.e. the equation in Figure 3, nodes a and b cannot be distinguished: both have only out-neighbors, so their states are never updated with any information from the graph. If instead we ignore direction entirely, i.e. Figure 2, then a,d and b,c become indistinguishable. A simpler explanation: if the initial state of (a,b,c,d) is (0,0,0,0) and the update rule is to add 1 for each neighbor considered, then under the in-only equation of Figure 3, (a,b,c,d) becomes (0,0,1,2) — a and b are identical — while under the undirected equation of Figure 2, (a,b,c,d) becomes (2,1,1,2). I hope this helps. To solve this problem, a different way of representing edge direction is needed. The authors propose handling in- and out-neighbors separately, i.e. adding reverse message passing. The aggregation and update mechanism then becomes: each node x first updates its local information, then takes its outgoing edge e and updates the nodes z below it, and finally sends a reverse message to update the node y above it. Building on the example above, this gives a new way of handling neighbor information in different directions.
For each node, each in-neighbor contributes +1 and each out-neighbor contributes −1. The values of (a,b,c,d) then become (−2,−1,1,2), showing how each node is influenced by its respective neighbors. The Directed Multigraph Port Numbering method assigns a unique local number to each port. In transaction networks, multiple transactions between the same accounts are often represented as parallel edges. To detect fan-in patterns (multiple sources pointing to the same target) or fan-out patterns (one source pointing to multiple targets), the model needs to differentiate edges from the same neighbor versus different neighbors. A unique account ID (node ID) does this naturally, but relying on account IDs generalizes poorly: during training, the model can simply memorize fraudulent account numbers instead of learning to identify fraud patterns, and such a model is useless for unseen accounts. The authors therefore apply the concept of port numbering to directed multigraphs. Port numbering assigns a local identifier to each neighboring node, allowing a node to recognize messages from the same neighbor across successive message-passing steps. To apply port numbering to a graph, the authors assign each directed edge both an in-port and an out-port number, and edges coming from (or pointing to) the same node receive the same in-port (or out-port) number. Notably, unlike previous work, port numbers are not only attached to received messages but are attached in both directions: a node sees not only the port numbers it assigns to its neighbors, but also the port numbers its neighbors assign to it.
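The three counting schemes discussed above — in-only, undirected, and the signed in/out variant — can be reproduced in a few lines of pure Python. The edge list a→c, a→d, b→d is one graph consistent with the counts quoted in the example (an assumption for illustration, since the figure itself is an image):

```python
# Toy graph assumed to match the counts in the example above.
edges = [("a", "c"), ("a", "d"), ("b", "d")]
nodes = ["a", "b", "c", "d"]

def update(nodes, edges, mode):
    """One toy 'message passing' step: start every node at 0 and count neighbors."""
    h = {v: 0 for v in nodes}
    for u, v in edges:
        if mode == "in_only":        # Figure 3: only in-neighbors are seen
            h[v] += 1
        elif mode == "undirected":   # Figure 2: direction ignored
            h[u] += 1
            h[v] += 1
        elif mode == "signed":       # reverse message passing: in = +1, out = -1
            h[v] += 1
            h[u] -= 1
    return [h[v] for v in nodes]

print(update(nodes, edges, "in_only"))     # -> [0, 0, 1, 2]   a and b collide
print(update(nodes, edges, "undirected"))  # -> [2, 1, 1, 2]   a/d and b/c collide
print(update(nodes, edges, "signed"))      # -> [-2, -1, 1, 2] all four distinct
```

Only the signed variant gives all four nodes distinct states, which is the intuition behind adding reverse message passing.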
The authors note that, in general, the port numbers assigned around a node are arbitrary: a node with d incoming neighbors can be assigned in-port numbers in d! different ways. To break this symmetry, they use transaction timestamps to order the incoming (or outgoing) neighbors; for parallel edges, the earliest timestamp determines the neighbor's position in the order. This ordering is well motivated because transaction timestamps carry real-world significance in financial crime detection: the authors emphasize that the same subgraph pattern with different timestamps may have a different meaning. The figure below shows an example of port numbers. Ego IDs are a unique binary feature assigned to a designated center node. Although the reverse message passing and multigraph port numbering described above help capture directed edges, challenges remain when dealing with cycles in the graph (e.g. pattern g in Figure 1). To tackle this, the authors introduce Ego IDs, aimed specifically at detecting cycles: by giving a center node a unique binary feature, the node can recognize when a sequence of messages has returned to it, thereby detecting a cycle through itself. Ego IDs alone offer some help in detecting short cycles but have little effect on longer ones. This can also be explained theoretically: if the graph has no self-loops (edges from a node to itself), any path of length 2 or 3 that returns to the start node must be a genuine cycle, since there is no room for a repeated intermediate node.
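To make the timestamp-ordering idea concrete, here is a small sketch of how in-port numbers at one node might be assigned — a simplified reading of the scheme described above, with made-up transactions and a helper name of my own:

```python
from collections import defaultdict

# Transactions: (source, target, timestamp). The two b -> d entries are
# parallel edges; the earlier of their timestamps decides b's ordering.
edges = [
    ("a", "d", 5),
    ("b", "d", 1),
    ("b", "d", 7),
    ("c", "d", 3),
]

def in_port_numbers(edges, node):
    """Assign in-port numbers at `node`: one port per in-neighbor, ordered by
    that neighbor's earliest transaction timestamp. All parallel edges from
    the same neighbor share that neighbor's single port number."""
    earliest = defaultdict(lambda: float("inf"))
    for u, v, t in edges:
        if v == node:
            earliest[u] = min(earliest[u], t)
    order = sorted(earliest, key=lambda u: earliest[u])
    return {u: port for port, u in enumerate(order)}

ports = in_port_numbers(edges, "d")
print(ports)  # -> {'b': 0, 'c': 1, 'a': 2}: b transacted first (t=1)
```

Because ports are ordered by timestamp rather than by account ID, the model can still distinguish repeated senders without ever memorizing who those senders are.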
By combining reverse message passing, port numbering, and ego IDs, it is possible to detect all the suspicious patterns, such as the loop (g in Figure 1), the fan-out-aggregate pattern (d in Figure 1), and the bi-uniform subgraph (c in Figure 1), thus completing the list in Figure 1. In fact, the paper shows that a sufficiently powerful MPNN combined with these methods can discriminate between any two structurally different graphs, and, when port numbers are assigned consistently, it will not mistakenly treat two identical graphs as different.

5. Experimental Results

The paper contains many proofs for the methods above, which I will not cover here; please refer to the original paper for details. The team tested these methods on several tasks, including synthetic financial transaction datasets, which I find quite interesting and would love to share. GIN, GAT, and PNA were used as baseline models for comparison. As mentioned several times already, the first task was to detect the common money laundering patterns from Figure 1. Note that in the results, the "Multi-" prefix denotes models combining the three new adaptations: reverse message passing, port numbering, and ego IDs. From the compositional subgraph detection results in Table 9, the deg-out result (43.58) shows that traditional GNNs (columns 1–4) cannot differentiate out-oriented subgraphs. In contrast, all GNNs with reverse message passing (columns 7–9) scored above 98%, and the combination of reverse message passing and port numbering (column 8) was needed to push the fan-out score over 99%. Combining reverse message passing, port numbering, and ego IDs (column 9) yielded high scores on all sub-tasks, with only 6-cycle detection scoring below 90%.
The results in columns 8 and 9 show a considerable improvement in detecting cyclic patterns once ego IDs are added, indicating that the combination of all three approaches significantly aids detection on directed graphs.

Synthetic anti-money laundering dataset

Due to strict privacy regulations around banking data, real-world financial crime data is not publicly available. The authors therefore used the simulated money laundering data that IBM recently released on Kaggle. The underlying generator simulates agents (banks, companies, and individuals) in a virtual world to produce a financial transaction network, and injects known money laundering patterns to make the laundering transactions more realistic. I find this data generator quite interesting: when building such models for crime prevention, a major problem is data imbalance and scarcity, so simulating data in a sensible way matters a great deal. I may cover how this dataset is constructed in a future article. The authors used two small datasets, one with a high illicit ratio (HI) and one with a low illicit ratio (LI), as well as a medium-sized one. Let's look at the outcomes. On the Small HI dataset, the methods improved the GIN model's F1 score from 28.7% to 59.2%, a gain of more than 30 percentage points. The largest single improvement came from reverse message passing, which raised the F1 score from 28.7% to 46.8%; ego IDs seemed to have no significant effect here. Results on the other AML datasets show a similar trend, with overall gains of 16.5% and 14.0% for the GIN model, and again the gains gradually diminish as more of the new methods are added. The largest individual gain came from ports, but all three adaptations together were usually needed for the highest scores.
The two rows that add ports, GIN+Ports (column 8) and +Ports (column 10), show that using port numbers on top of reverse message passing gives better results. Ego IDs sometimes even decrease the GIN model's performance, and have little effect on the score when added on top of reverse message passing and ports. This is perhaps not surprising, since ego IDs are meant to tag and distinguish central nodes and their neighborhoods; in principle, the same goal could be achieved through appropriately modified input features. Notably, across the three AML datasets, Multi-PNA outperformed LightGBM + GFs, which had been the state-of-the-art approach for financial applications until now.

Ethereum phishing dataset

This dataset comes from the Ethereum transaction network and was released on Kaggle. It contains transactions on the Ethereum blockchain, with some nodes labeled as phishing accounts. Since banks do not publicly disclose their data, the authors turned to the cryptocurrency domain to obtain a real-world dataset. As the last column of the table above shows, and similar to the AML datasets, the final score keeps rising as more enhancements are added: the F1 score increases from 26.9% with no new methods to 42.9% with reverse message passing, port numbering, and ego IDs combined. Again, the largest single improvement comes from reverse message passing. On this dataset, the improved GIN does not surpass all baseline models (PNA and LightGBM+GFs perform better), but the adaptations also clearly improve PNA, making it more than 13% better than all baselines. In conclusion, we observed numerous fascinating possibilities.
This paper proposes a series of simple, intuitive methods to transform general GNNs into directed multigraph neural networks, and demonstrates that these methods can theoretically detect any directed subgraph pattern. Experimental results further verify their effectiveness on tasks such as detecting financial crime transactions and phishing accounts: the combined adaptations achieve significant improvements across various subgraph detection tasks and perform excellently on real-world financial crime datasets. The study thus opens new possibilities for GNN applications and offers a valuable reference for future research on directed multigraph analysis. This concludes the article; please don't hesitate to point out any mistakes or share additional feedback. Thank you for reading. Note: all images are taken from the original paper.

 
 
 
