Core Data (Market Insights Data) Overview
This document explains the infrastructure and workflow for Core Data (also known as Market Insights Data), which powers our analytics and search features.
Infrastructure
- Core Data Infra:
  - Defined in `infra/core-data/core-data.ts`.
  - Uses AWS OpenSearch for storing and searching records.
- Production Playground:
  - Located in `us-west-2` on the prod account.
  - Contains all records and acts as a sandbox for production-like data.
- Developer Clusters:
  - Each developer has their own OpenSearch instance.
  - Indices are synced from the production playground using queries defined in the `mmv3` config.
  - This allows each developer to work with a relevant subset of data in isolation.
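The read/write split above can be sketched as follows: sync always reads from the production playground and writes only into the requesting developer's own OpenSearch instance. This is a minimal TypeScript sketch under stated assumptions. `SyncEnvironment`, `resolveSyncTargets`, and the example endpoints are hypothetical names, not taken from `infra/core-data/core-data.ts`, and the guard against writing into the playground is a safety choice of this sketch rather than documented behavior.

```typescript
// Hypothetical sketch: these names are illustrative, not from infra/core-data/core-data.ts.
type ClusterRole = "playground" | "dev";

interface ClusterTarget {
  role: ClusterRole;
  endpoint: string; // OpenSearch HTTPS endpoint
}

// Assumed inputs; real values would come from infra outputs or the mmv3 config.
interface SyncEnvironment {
  playgroundEndpoint: string;        // production playground (us-west-2, prod account)
  devEndpoints: Map<string, string>; // developer name -> that developer's own cluster
}

// Sync reads from the playground and writes only into one developer's cluster.
function resolveSyncTargets(
  env: SyncEnvironment,
  developer: string,
): { source: ClusterTarget; destination: ClusterTarget } {
  const devEndpoint = env.devEndpoints.get(developer);
  if (devEndpoint === undefined) {
    throw new Error(`No dev cluster registered for "${developer}"`);
  }
  if (devEndpoint === env.playgroundEndpoint) {
    // Guard (assumption of this sketch): never write back into the source of truth.
    throw new Error("Destination must not be the production playground");
  }
  return {
    source: { role: "playground", endpoint: env.playgroundEndpoint },
    destination: { role: "dev", endpoint: devEndpoint },
  };
}

// Usage with placeholder endpoints.
const env: SyncEnvironment = {
  playgroundEndpoint: "https://playground.example.com",
  devEndpoints: new Map([["alice", "https://alice-dev.example.com"]]),
};
const { source, destination } = resolveSyncTargets(env, "alice");
console.log(`${source.endpoint} -> ${destination.endpoint}`);
```

Keeping the source fixed and the destination per developer is what gives each developer an isolated copy: no developer's sync can touch another developer's cluster or the playground.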
Data Syncing
- Index Syncing:
  - Each index in a dev cluster is populated by running a query against the production playground.
  - The query is specified in your `mmv3` config, so you control which data is synced for development.
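One index sync can be pictured as two requests: a `_search` against the playground using the configured query, then a `_bulk` write of the returned hits into the dev cluster. The sketch below only builds those request bodies. The `IndexSyncConfig` shape, its field names, and the `companies` index are hypothetical stand-ins for whatever the `mmv3` config actually contains; the `_search` body and the newline-delimited `_bulk` format follow the standard OpenSearch APIs.

```typescript
// Hypothetical shape of one index entry in the mmv3 config; the real format may differ.
interface IndexSyncConfig {
  index: string;                  // same index name in the playground and the dev cluster
  query: Record<string, unknown>; // OpenSearch query DSL run against the playground
  maxDocs: number;                // upper bound on documents copied into the dev cluster
}

interface SearchHit {
  _id: string;
  _source: Record<string, unknown>;
}

// Body for POST /<index>/_search on the production playground.
// `size` is capped by index.max_result_window (10,000 by default);
// larger syncs would need search_after or the scroll API instead.
function buildSearchBody(cfg: IndexSyncConfig): Record<string, unknown> {
  return { query: cfg.query, size: cfg.maxDocs };
}

// NDJSON body for POST /_bulk on the dev cluster: an action line, then the document.
// Reusing the playground _id makes a re-sync overwrite documents instead of duplicating them.
function buildBulkBody(index: string, hits: SearchHit[]): string {
  if (hits.length === 0) return ""; // nothing to send; the caller should skip the bulk call
  const lines: string[] = [];
  for (const hit of hits) {
    lines.push(JSON.stringify({ index: { _index: index, _id: hit._id } }));
    lines.push(JSON.stringify(hit._source));
  }
  return lines.join("\n") + "\n"; // the bulk API requires a trailing newline
}

// Usage with a hypothetical "companies" index.
const cfg: IndexSyncConfig = { index: "companies", query: { term: { region: "EMEA" } }, maxDocs: 500 };
console.log(JSON.stringify(buildSearchBody(cfg)));
console.log(buildBulkBody(cfg.index, [{ _id: "c1", _source: { name: "Acme", region: "EMEA" } }]));
```

Narrowing `query` (for example to one region or customer segment) is how you keep a dev cluster small while still working with realistic records.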
Event Topics (SNS)
- SNS Topics:
  - There are separate topics for upserts (published as records are added or updated) and for job completions (published when a pipeline finishes processing).
  - Because producers publish events instead of calling consumers directly, a new use case can subscribe to a topic without changes to the pipeline that publishes to it.
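As an illustration of the fan-out, here is a minimal sketch of a consumer subscribed to both topics that routes each message by topic name. The event shape (`Records[].Sns.TopicArn` and `Records[].Sns.Message`) is the subset of the SNS-to-Lambda event used here, defined locally to stay self-contained. The topic names `core-data-upserts` and `core-data-job-complete`, the `subscribe`/`dispatch` helpers, and the assumption that message bodies are JSON are all hypothetical; in AWS, a new use case would more typically get its own SNS subscription, which this in-process registry only mimics.

```typescript
// Minimal subset of the SNS -> Lambda event shape, defined locally.
interface SnsRecord {
  Sns: { TopicArn: string; Message: string };
}
interface SnsEvent {
  Records: SnsRecord[];
}

type Handler = (payload: unknown) => void;

// Hypothetical registry: topic name -> handlers for that topic.
const handlers = new Map<string, Handler[]>();

function subscribe(topicName: string, handler: Handler): void {
  const list = handlers.get(topicName) ?? [];
  list.push(handler);
  handlers.set(topicName, list);
}

// Topic ARNs look like arn:aws:sns:<region>:<account-id>:<topic-name>.
function topicNameFromArn(arn: string): string {
  return arn.slice(arn.lastIndexOf(":") + 1);
}

// Delivers each record to every handler registered for its topic; returns the delivery count.
function dispatch(event: SnsEvent): number {
  let delivered = 0;
  for (const record of event.Records) {
    const topic = topicNameFromArn(record.Sns.TopicArn);
    const payload: unknown = JSON.parse(record.Sns.Message); // assumes JSON message bodies
    for (const handler of handlers.get(topic) ?? []) {
      handler(payload);
      delivered++;
    }
  }
  return delivered;
}

// Usage: a new use case only adds a subscription; the publisher is untouched.
subscribe("core-data-job-complete", (p) => console.log("pipeline finished", p));
```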
Meilisearch
- Meilisearch:
  - A second search instance, alongside OpenSearch, that indexes this data.
  - Can also be synced for developers, providing fast full-text search over your dev data subset.
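Syncing Meilisearch from the same dev subset mostly means reshaping OpenSearch hits into Meilisearch documents. A minimal sketch, assuming the OpenSearch `_id` is reused as the Meilisearch primary key `id`: Meilisearch document ids may only contain `a-z`, `A-Z`, `0-9`, `-` and `_` (at most 511 bytes), so the sketch rejects anything else instead of silently rewriting it. `toMeiliDocuments`, `buildAddDocumentsRequest`, and the `companies` index are hypothetical; the `POST /indexes/{index_uid}/documents` endpoint, the `primaryKey` query parameter, and Bearer authentication are standard Meilisearch API features.

```typescript
interface SearchHit {
  _id: string;
  _source: Record<string, unknown>;
}

type MeiliDocument = Record<string, unknown> & { id: string };

// Meilisearch document ids: alphanumerics, "-" and "_" only, at most 511 bytes.
const VALID_MEILI_ID = /^[A-Za-z0-9_-]{1,511}$/;

// Reuses the OpenSearch _id as the primary key. Any `id` field in _source is overwritten.
function toMeiliDocuments(hits: SearchHit[]): MeiliDocument[] {
  return hits.map((hit) => {
    if (!VALID_MEILI_ID.test(hit._id)) {
      throw new Error(`OpenSearch id "${hit._id}" is not a valid Meilisearch id`);
    }
    return { ...hit._source, id: hit._id };
  });
}

// Builds (but does not send) an "add or replace documents" request.
function buildAddDocumentsRequest(host: string, indexUid: string, apiKey: string, docs: MeiliDocument[]) {
  return {
    url: `${host}/indexes/${encodeURIComponent(indexUid)}/documents?primaryKey=id`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify(docs),
    },
  };
}

// Usage against a local dev instance (7700 is Meilisearch's default port).
const docs = toMeiliDocuments([{ _id: "c1", _source: { name: "Acme" } }]);
const req = buildAddDocumentsRequest("http://localhost:7700", "companies", "dev-key", docs);
console.log(req.url, req.init.body);
```

Sending it is then a single `fetch(req.url, req.init)`; Meilisearch indexes documents asynchronously and responds with a task you can poll.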
Summary
- Production playground holds all data and is the source of truth.
- Devs get isolated clusters with synced subsets for safe, realistic development.
- Event-driven pipelines (via SNS) make it easy to extend and react to data changes.
- Meilisearch provides additional search capabilities and can be used in dev environments as well.