The Listing Checker

The Challenge

A company that posts its customers' business listings on websites such as Yelp and Foursquare hired me to build an automated accuracy checker for the sites it posts to. Before RDD got involved, each listing would be posted to over 80 individual websites, and every posting then had to be checked for accuracy one at a time. This was both time-consuming and unreliable.

The Solution

RDD created a simple web application in which the company first entered its client's listing information, along with the list of websites it was posted to, through a form in a controlled and standardised way. They could then verify the accuracy of each individual website. The application used the parallel nature of the cloud to run discrete workers that crawled each listing site and compared the results with the form-entered data. As a result, listings could be checked in a matter of seconds, all in parallel rather than one at a time.
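The fan-out described above can be sketched in a few lines of TypeScript. This is a minimal illustration rather than the project's actual code: the `Listing` fields, the `Worker` signature and the `checkAll` helper are all assumptions made for the example, and the worker is injected so the sketch runs without any network access.

```typescript
// Hypothetical shape of a listing; the field names are assumptions for illustration.
interface Listing {
  name: string;
  address: string;
  phone: string;
}

interface CheckResult {
  site: string;
  ok: boolean;
  mismatches: string[]; // names of fields that differ, or ["error"] if the worker failed
}

// A worker takes a site identifier and resolves to whatever that site displays.
type Worker = (site: string) => Promise<Listing>;

// Light normalisation so that case and spacing differences do not count as mismatches.
const normalise = (value: string): string =>
  value.trim().replace(/\s+/g, " ").toLowerCase();

// Returns the names of the fields whose normalised values differ.
function diff(expected: Listing, actual: Listing): string[] {
  const fields: (keyof Listing)[] = ["name", "address", "phone"];
  return fields.filter((f) => normalise(expected[f]) !== normalise(actual[f]));
}

// Start every site check at once and wait for all of them to finish.
// A failing worker is reported as a failed check instead of aborting the batch.
async function checkAll(
  expected: Listing,
  sites: string[],
  worker: Worker
): Promise<CheckResult[]> {
  const checks = sites.map(async (site): Promise<CheckResult> => {
    try {
      const actual = await worker(site);
      const mismatches = diff(expected, actual);
      return { site, ok: mismatches.length === 0, mismatches };
    } catch {
      return { site, ok: false, mismatches: ["error"] };
    }
  });
  return Promise.all(checks);
}
```

Because each check is started before any of them is awaited, the total time is roughly that of the slowest site rather than the sum of all of them.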

CloudFormation

CloudFormation was used exclusively, as it is with all RDD projects, because it makes for a consistent deployment to the client's own environment. The entire solution is built within the RDD development environment; then, once the client is satisfied, the same stack is deployed in the client's own AWS account to ensure continuity.

Beyond consistency, an additional benefit is that the CloudFormation templates are version controlled, which promotes exploratory work at low risk. If a change breaks the application or infrastructure, the stack can easily be rolled back using a previous version of the template pulled from source control.

Lambda

AWS Lambda was the real workhorse here. As every listing site (Yelp, Foursquare, and so on) has its own way of displaying data, each domain got its own Lambda function. In practice, as each site was triggered for inspection, Lambda would reach out and screen-scrape the targeted listing, parse it with CSS selectors (using Cheerio), and finally return structured data to the browser in the same shape as it is stored in the database, for easy comparison.
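To illustrate how site-specific selectors can still produce one common data shape, here is a condensed sketch. It is not the project's code: in the project each domain had its own Lambda function, whereas this example folds the per-site differences into a single selector table for brevity. The domains, selector strings, field names and the `parseListing` helper are all invented for the example. The selector lookup is passed in as a function, so a Cheerio document could be plugged in at that point while the sketch itself stays dependency-free.

```typescript
// Common shape every per-site parser returns. Field names are hypothetical.
interface ScrapedListing {
  name?: string;
  phone?: string;
  openingHours?: string; // not every listing site exposes this
}

// CSS selectors for one site, keyed by the field they populate.
type SelectorMap = { [K in keyof ScrapedListing]?: string };

// Invented example domains and selectors; real sites would each need their own.
const SELECTORS: Record<string, SelectorMap> = {
  "example-listings.test": {
    name: "h1.biz-name",
    phone: "span.phone",
    openingHours: "table.hours",
  },
  "example-directory.test": {
    name: ".title",
    phone: ".contact .tel",
  },
};

// Returns the text matched by a CSS selector, or undefined if nothing matched.
// Assumption: in a real function this would wrap an HTML parser such as Cheerio.
type SelectText = (selector: string) => string | undefined;

function parseListing(domain: string, selectText: SelectText): ScrapedListing {
  const selectors = SELECTORS[domain];
  if (!selectors) {
    throw new Error(`No selectors configured for ${domain}`);
  }
  const result: ScrapedListing = {};
  for (const field of Object.keys(selectors) as (keyof ScrapedListing)[]) {
    const selector = selectors[field];
    if (!selector) continue;
    const text = selectText(selector)?.trim();
    if (text) result[field] = text; // fields with no match are simply left out
  }
  return result;
}
```

Returning the same shape from every site is what makes the later comparison with the stored, form-entered data a simple field-by-field check.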

DynamoDB

To store the data for quick read and write access, I used DynamoDB. Its flexibility allowed me to store the unstructured data coming back from the various listing sites, which did not always list the same information. For example, some sites allowed businesses to enter opening hours, while others did not. DynamoDB allowed me to store non-uniform JSON structures with a high-performance index and IO for very fast access to the data.

DynamoDB can be tricky, especially with indexing, so it was important to come up with an index design that met the needs of the application.
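As an illustration of the kind of key design this involves, the sketch below models one item per client listing and listing site. It is an assumption, not the schema used in the project: the `pk`/`sk` attribute names, the `LISTING#`/`SITE#` prefixes and the helper functions are hypothetical. It relies only on two DynamoDB properties: items in one table may carry different non-key attributes, and a query on the partition key returns every item that shares it.

```typescript
// One item per (client listing, listing site). Attribute names are hypothetical.
interface ListingItem {
  pk: string; // partition key: groups every site's copy of one client listing
  sk: string; // sort key: identifies the listing site within that group
  name: string;
  phone?: string;
  openingHours?: Record<string, string>; // only present when the site supports it
}

// The non-key attributes supplied by the form or by a site crawl.
type ListingData = Omit<ListingItem, "pk" | "sk">;

// Builds the composite key. Querying on pk alone would return all sites for a listing.
function makeKey(listingId: string, site: string): { pk: string; sk: string } {
  return { pk: `LISTING#${listingId}`, sk: `SITE#${site}` };
}

// Builds the item to store, leaving out attributes the site did not provide
// so that they are absent from the item rather than stored as empty values.
function toItem(listingId: string, site: string, data: ListingData): ListingItem {
  const item: ListingItem = { ...makeKey(listingId, site), name: data.name };
  if (data.phone !== undefined) item.phone = data.phone;
  if (data.openingHours !== undefined) item.openingHours = data.openingHours;
  return item;
}
```

With a layout like this, fetching everything needed to compare one client listing across all of its sites is a single query on the partition key.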

S3

S3 is much more than just a storage platform. It provides a very low-cost, highly available and performant front end for static content. As the site was built entirely using Angular (HTML/CSS) and client-side JavaScript libraries, the entire application can be hosted from S3 without the need to maintain servers or HTTP server configurations.
