Core Data (Market Insights Data) Overview

This document explains the infrastructure and workflow for Core Data (also known as Market Insights Data), which powers our analytics and search features.


Infrastructure

  • Core Data Infra:

    • Defined in infra/core-data/core-data.ts.
    • Uses AWS OpenSearch for storing and searching records.
  • Production Playground:

    • Located in us-west-2 on the prod account.
    • Contains all records and acts as a sandbox for production-like data.
  • Developer Clusters:

    • Each developer has their own OpenSearch instance.
    • Indices are synced from the production playground using queries defined in the mmv3 config.
    • This allows each dev to work with a relevant subset of data in isolation.

Data Syncing

  • Index Syncing:
    • Each index in a dev cluster is populated by running a query against the production playground.
    • The query is specified in your mmv3 config, allowing you to control which data is synced for development.
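
As a rough illustration of the per-index sync described above, the sketch below maps one config entry to the body of an OpenSearch `_reindex` request that reads from a remote cluster. The config shape (`IndexSyncConfig`), the host URL, the index name, and the `updated_at` field are assumptions for illustration only; the actual mmv3 config format and sync mechanism are not shown in this document and may differ.

```typescript
// Hypothetical shape of one per-index entry in a sync config.
// The real mmv3 config format is not documented here.
interface IndexSyncConfig {
  index: string;                   // index name, assumed identical on both clusters
  query: Record<string, unknown>;  // OpenSearch query DSL selecting the subset to copy
}

// Body of an OpenSearch `_reindex` request that copies from a remote cluster.
interface ReindexBody {
  source: {
    remote: { host: string };
    index: string;
    query: Record<string, unknown>;
  };
  dest: { index: string };
}

// Build the reindex body for one index. `playgroundHost` stands in for the
// production playground endpoint (placeholder value in the example below).
function buildReindexBody(cfg: IndexSyncConfig, playgroundHost: string): ReindexBody {
  return {
    source: {
      remote: { host: playgroundHost },
      index: cfg.index,
      query: cfg.query,
    },
    dest: { index: cfg.index },
  };
}

// Example: copy only documents of a hypothetical "listings" index that were
// updated in the last 30 days.
const exampleConfig: IndexSyncConfig = {
  index: "listings",
  query: { range: { updated_at: { gte: "now-30d/d" } } },
};

const reindexBody = buildReindexBody(exampleConfig, "https://playground.example.com:443");
console.log(JSON.stringify(reindexBody, null, 2));
```

Keeping the query in config rather than in code means a developer can narrow or widen their data subset without touching the sync logic itself.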

Event Topics (SNS)

  • SNS Topics:
    • There are topics for upserts (published as records are added or updated) and for job completions (published when a pipeline finishes processing).
    • Because the pipeline is event-driven, a new use case can be supported by subscribing a new consumer to the relevant topic, without changing the existing pipeline.
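
To make the extension point concrete, here is a minimal sketch of a consumer that routes incoming SNS records to an upsert handler or a job-completion handler. The `Records[].Sns.TopicArn` and `Records[].Sns.Message` fields follow the event that AWS Lambda receives from an SNS subscription; the topic name suffixes and the message payload shapes are hypothetical, since the real schemas are not documented here.

```typescript
// Minimal subset of the event AWS Lambda receives from an SNS subscription.
interface SnsRecord {
  Sns: { TopicArn: string; Message: string };
}
interface SnsEvent {
  Records: SnsRecord[];
}

// Hypothetical payloads; the real message schemas may differ.
interface UpsertMessage { index: string; id: string }
interface JobCompleteMessage { jobId: string }

type Handlers = {
  onUpsert: (msg: UpsertMessage) => void;
  onJobComplete: (msg: JobCompleteMessage) => void;
};

// Placeholder topic name suffixes used to tell the two topics apart.
const UPSERT_SUFFIX = ":core-data-upserts";
const JOB_COMPLETE_SUFFIX = ":core-data-job-completions";

// Dispatch each record to the handler matching its topic. Returns the number
// of records handled so callers can detect records from unrecognised topics.
function dispatchSnsEvent(event: SnsEvent, handlers: Handlers): number {
  let handled = 0;
  for (const record of event.Records) {
    const { TopicArn, Message } = record.Sns;
    if (TopicArn.endsWith(UPSERT_SUFFIX)) {
      handlers.onUpsert(JSON.parse(Message) as UpsertMessage);
      handled++;
    } else if (TopicArn.endsWith(JOB_COMPLETE_SUFFIX)) {
      handlers.onJobComplete(JSON.parse(Message) as JobCompleteMessage);
      handled++;
    }
  }
  return handled;
}

// Example: a new consumer that simply records what it sees.
const seenUpserts: string[] = [];
const seenJobs: string[] = [];
const sampleEvent: SnsEvent = {
  Records: [
    {
      Sns: {
        TopicArn: "arn:aws:sns:us-west-2:123456789012:core-data-upserts",
        Message: JSON.stringify({ index: "listings", id: "abc" }),
      },
    },
    {
      Sns: {
        TopicArn: "arn:aws:sns:us-west-2:123456789012:core-data-job-completions",
        Message: JSON.stringify({ jobId: "job-1" }),
      },
    },
  ],
};

const handledCount = dispatchSnsEvent(sampleEvent, {
  onUpsert: (m) => seenUpserts.push(`${m.index}/${m.id}`),
  onJobComplete: (m) => seenJobs.push(m.jobId),
});
```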

Meilisearch

  • Meilisearch:
    • A second search instance over the same data, alongside OpenSearch.
    • Can also be synced for developers, providing fast, full-text search capabilities on your dev data subset.
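
One way a dev data subset could be pushed into Meilisearch is sketched below: documents are split into batches, and each batch becomes one request to the Meilisearch "add or replace documents" endpoint (`POST /indexes/{index_uid}/documents`). The host, API key, index name, `id` primary key, and batch size are placeholders, and the function only describes the requests rather than sending them; how the dev sync is actually implemented is not covered in this document.

```typescript
// Generic record; an `id` field is assumed to be the primary key.
type Doc = Record<string, unknown> & { id: string };

// Plain description of an HTTP request, so the sketch stays side-effect free.
interface MeiliRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Split documents into batches and describe one document-addition request
// per batch. Nothing is sent; pass each entry to an HTTP client to execute it.
function buildMeiliRequests(
  host: string,
  apiKey: string,
  indexUid: string,
  docs: Doc[],
  batchSize: number = 1000,
): MeiliRequest[] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const requests: MeiliRequest[] = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    requests.push({
      method: "POST",
      url: `${host}/indexes/${encodeURIComponent(indexUid)}/documents?primaryKey=id`,
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(docs.slice(i, i + batchSize)),
    });
  }
  return requests;
}

// Example: three documents in batches of two yield two requests.
const meiliRequests = buildMeiliRequests(
  "http://localhost:7700", // Meilisearch default port; placeholder host
  "dev-key",               // placeholder API key
  "listings",              // hypothetical index name
  [
    { id: "1", title: "A" },
    { id: "2", title: "B" },
    { id: "3", title: "C" },
  ],
  2,
);
```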

Summary

  • Production playground holds all data and is the source of truth.
  • Devs get isolated clusters with synced subsets for safe, realistic development.
  • Event-driven pipelines (via SNS) make it easy to extend and react to data changes.
  • Meilisearch provides additional search capabilities and can be used in dev environments as well.