{"id":1892,"date":"2025-06-03T09:07:29","date_gmt":"2025-06-03T09:07:29","guid":{"rendered":"https:\/\/violethoward.com\/new\/how-sp-is-using-deep-web-scraping-ensemble-learning-and-snowflake-architecture-to-collect-5x-more-data-on-smes\/"},"modified":"2025-06-03T09:07:29","modified_gmt":"2025-06-03T09:07:29","slug":"how-sp-is-using-deep-web-scraping-ensemble-learning-and-snowflake-architecture-to-collect-5x-more-data-on-smes","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/how-sp-is-using-deep-web-scraping-ensemble-learning-and-snowflake-architecture-to-collect-5x-more-data-on-smes\/","title":{"rendered":"How S&P is using deep web scraping, ensemble learning and Snowflake architecture to collect 5X more data on SMEs"},"content":{"rendered":" \r\n
\n\t\t\t\t
\n

<\/em><\/p>\n\n\n\n


\n<\/div>

The investing world has a significant problem when it comes to data about small and medium-sized enterprises (SMEs). The issue is not data quality or accuracy; it\u2019s the lack of any data at all.<\/p>\n\n\n\n

Assessing SME creditworthiness has been notoriously challenging because small enterprise financial data is not public, and therefore very difficult to access.<\/p>\n\n\n\n

S&P Global Market Intelligence, a division of S&P Global and a leading provider of credit ratings and benchmarks, claims to have solved this longstanding problem. Its technical team built RiskGauge, an AI-powered platform that gathers otherwise elusive data by crawling more than 200 million websites, processes that data through numerous algorithms and generates risk scores.<\/p>\n\n\n\n
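The article says only that RiskGauge runs crawled data through numerous algorithms, and the headline mentions ensemble learning; it does not describe the models themselves. As a minimal sketch of what an averaging ensemble looks like, assuming three hypothetical rule-based scorers (size_model, age_model, web_model) with made-up feature names and thresholds, the code below averages their default-risk probabilities into a single score and maps it to a band. It is an illustration of the technique, not the S&P implementation.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical feature record for one SME; field names are illustrative, not S&P's.
Features = Dict[str, float]


def size_model(f: Features) -> float:
    # Hypothetical scorer: a very small headcount is treated as riskier.
    return 0.6 if f['employees'] < 10 else 0.3


def age_model(f: Features) -> float:
    # Hypothetical scorer: firms under three years old are treated as riskier.
    return 0.5 if f['years_in_business'] < 3 else 0.2


def web_model(f: Features) -> float:
    # Hypothetical scorer: fewer observed web signals (0 to 10) means higher risk.
    return max(0.0, 1.0 - f['web_signals'] / 10)


def ensemble_score(f: Features, models: List[Callable[[Features], float]]) -> float:
    # Unweighted averaging ensemble: the mean of the base-model probabilities.
    return mean(m(f) for m in models)


def risk_band(p: float) -> str:
    # Map a probability to a coarse band; the thresholds are arbitrary.
    if p < 0.3:
        return 'low'
    if p < 0.5:
        return 'medium'
    return 'high'


if __name__ == '__main__':
    sme = {'employees': 5, 'years_in_business': 6, 'web_signals': 8}
    score = ensemble_score(sme, [size_model, age_model, web_model])
    print(round(score, 3), risk_band(score))
```

Averaging is the simplest way to combine models; weighted averages or a stacked meta-model trained on the base-model outputs are common alternatives with the same overall shape.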

Built on Snowflake architecture, the platform has expanded S&P\u2019s SME coverage fivefold.<\/p>\n\n\n\n

\u201cOur objective was expansion and efficiency,\u201d explained Moody Hadi, head of new product development for risk solutions at S&P Global. \u201cThe project has improved the accuracy and coverage of the data, benefiting clients.\u201d<\/p>\n\n\n\n

RiskGauge\u2019s underlying architecture<\/h2>\n\n\n\n

Counterparty credit management assesses a company\u2019s creditworthiness and risk based on several factors, including financials, probability of default and risk appetite. S&P Global Market Intelligence provides these insights to institutional investors, banks, insurance companies, wealth managers and others.<\/p>\n\n\n\n

\u201cLarge and financial corporate entities lend to suppliers, but they need to know how much to lend, how frequently to monitor them, what the duration of the loan would be,\u201d Hadi explained. \u201cThey rely on third parties to come up with a trustworthy credit score.\u201d\u00a0<\/p>\n\n\n\n

But there has long been a gap in SME coverage. Hadi pointed out that while large public companies such as IBM, Microsoft, Amazon and Google are required to disclose their quarterly financials, SMEs have no such obligation, which limits financial transparency. From an investor\u2019s perspective, the gap is large: there are about 10 million SMEs in the U.S., compared with roughly 60,000 public companies.<\/p>\n\n\n\n

S&P Global Market Intelligence claims it now covers all of them: the firm previously had data on only about 2 million, and RiskGauge expanded that to 10 million.<\/p>\n\n\n\n

The platform, which went into production in January, is based on a system built by Hadi\u2019s team that pulls firmographic data from unstructured web content, combines it with anonymized third-party datasets, and applies machine learning (ML) and advanced algorithms to generate credit scores.\u00a0<\/p>\n\n\n\n
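To make the extract-and-combine step concrete, here is a minimal sketch assuming the input is plain page text. The regular expressions, the field names (founded_year, employees), the company name and the sample third-party record are all hypothetical; extraction at the scale described would need far more robust parsing. The merge prefers the web-extracted value and falls back to the third-party value only when the web field is missing.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns; real pages would need far more robust extraction.
FOUNDED_RE = re.compile(r'(?:founded|established|since)[^0-9]{0,10}(19[0-9]{2}|20[0-9]{2})', re.IGNORECASE)
EMPLOYEES_RE = re.compile(r'([0-9][0-9,]*) +(?:employees|staff)', re.IGNORECASE)

Record = Dict[str, Optional[int]]


def extract_firmographics(page_text: str) -> Record:
    # Pull a founding year and an employee count from unstructured text, if present.
    founded = FOUNDED_RE.search(page_text)
    employees = EMPLOYEES_RE.search(page_text)
    return {
        'founded_year': int(founded.group(1)) if founded else None,
        'employees': int(employees.group(1).replace(',', '')) if employees else None,
    }


def combine(web: Record, third_party: Record) -> Record:
    # Prefer the web-extracted value; fall back to the (hypothetical) third-party record.
    keys = set(web) | set(third_party)
    return {k: web[k] if web.get(k) is not None else third_party.get(k) for k in keys}


if __name__ == '__main__':
    page = 'Acme Tools was founded in 2009 and now has 1,250 employees.'
    print(combine(extract_firmographics(page), {'employees': 1200, 'credit_lines': 2}))
```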

The company uses Snowflake to mine company pages and process them into firmographic drivers (market segmenters) that are then fed into RiskGauge.<\/p>\n\n\n\n
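The article does not say how mined pages become firmographic drivers. One plausible shape, sketched here purely as an assumption, is a function that buckets extracted fields into categorical segments. The size bands loosely follow the EU headcount thresholds for micro (under 10), small (under 50) and medium (under 250) enterprises; the age cut-off and all names are hypothetical, and according to the article this processing happens in Snowflake rather than in Python.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Firmographics:
    # Hypothetical output of the page-mining step.
    employees: Optional[int]
    founded_year: Optional[int]


def size_segment(employees: Optional[int]) -> str:
    # Headcount bands loosely following EU SME thresholds (an assumption, not S&P's cut-offs).
    if employees is None:
        return 'unknown'
    if employees < 10:
        return 'micro'
    if employees < 50:
        return 'small'
    if employees < 250:
        return 'medium'
    return 'large'


def age_segment(founded_year: Optional[int], current_year: int) -> str:
    # Hypothetical age cut-off: under three years counts as a startup.
    if founded_year is None:
        return 'unknown'
    return 'startup' if current_year - founded_year < 3 else 'established'


def segment_drivers(f: Firmographics, current_year: int) -> Dict[str, str]:
    # Categorical drivers that a downstream scoring model could consume.
    return {
        'size_segment': size_segment(f.employees),
        'age_segment': age_segment(f.founded_year, current_year),
    }


if __name__ == '__main__':
    print(segment_drivers(Firmographics(employees=42, founded_year=2023), current_year=2025))
    # prints {'size_segment': 'small', 'age_segment': 'startup'}
```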

The platform\u2019s data pipeline consists of:<\/p>\n\n\n\n