Federated Learning for Bank Statement Parsing
As the prevalence of Machine Learning (ML) systems in our day-to-day products rises, so does the need for ensuring data privacy. There is some data that we may not feel comfortable sending to an external server. Another problem is that the service provider itself may not want to bear the responsibility that comes with keeping this data. At the same time, we still want to be able to use this data to train ML models.
To address this privacy problem, we can use a technique called federated learning. The idea is simple: the client doesn't send its data to the server. Instead, the model comes to the client!
We used this method to extract transactions from German bank statements. Users pass in their scanned bank statements and get the transactions back, for example as a CSV file. This is highly sensitive data that should not be handed over lightly.
Because the parser is part of a desktop application, running the model on the user's device has a second benefit: the functionality is also available offline.
The Role of Federated Learning in Bank Statement Parsing
In this case study, federated learning enabled us to offer a secure and efficient solution without compromising on usability or privacy. Traditional approaches to bank statement parsing typically involve uploading the scanned document to a centralized server, where the machine learning model processes it. With federated learning, the model itself is sent to the user's device, where it is trained locally. This ensures that no sensitive financial data ever leaves the user's device.
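To make the "model comes to the client" idea concrete, here is a minimal sketch of one common federated learning scheme, federated averaging. It is an illustration under assumptions, not the implementation used in this project: the linear model, the function names (`local_update`, `federated_average`, `run_round`) and the learning rate are all hypothetical, and a real statement parser would use a far richer model. In this scheme only model weights travel between server and clients, while the training examples stay in each client's own list.

```python
from typing import List, Tuple

Weights = List[float]
Example = Tuple[List[float], float]  # (features, target)


def local_update(weights: Weights, data: List[Example],
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Train a copy of the model on one client's private data.

    Hypothetical stand-in model: linear regression fitted with SGD.
    """
    w = list(weights)  # work on a copy of the global model
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # Gradient step for squared error on a single example.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Combine client models, weighting each by its number of examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_round(global_weights: Weights,
              clients: List[List[Example]]) -> Weights:
    """One round: every client trains locally, then weights are averaged."""
    updates = [local_update(global_weights, data) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)


if __name__ == "__main__":
    # Two simulated clients whose private data follows y = 2 * x.
    clients = [
        [([1.0], 2.0), ([2.0], 4.0)],
        [([3.0], 6.0)],
    ]
    weights = [0.0]
    for _ in range(20):
        weights = run_round(weights, clients)
    print(weights)  # the single weight should end up close to 2.0
```

Weighting each client by its number of examples keeps a client with very little data from pulling the shared model as strongly as one with a lot. Note that in this sketch the trained weights do leave the device even though the raw data does not; whether and how updates are shared is a design decision that the sketch does not settle.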