Methodology

How we find signals, score them, and turn them into articles.

The Content Pipeline

1. Collect: RSS feeds + HTML scraping from 16 Japanese government and industry sources. robots.txt respected. Rate-limited.
2. Extract: Body text extracted via trafilatura (HTML) or pypdf (PDFs). Min. 200 chars to proceed.
3. Classify: Keyword matching against 7 categories and 16 investment themes. LLM assist for low-confidence items.
4. Score: 7-axis scoring (0–5 each, max 35). Only items scoring ≥15 proceed.
5. Generate: Claude API with journalist-voice prompt. Deterministic humanize pass applied automatically.
6. Review: Human editor reviews draft. [VERIFY] markers block publish until resolved.
7. Publish: Manual publish command moves draft to published/. Translations generated for all 4 locales.
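
As an illustration of the Collect step, here is a minimal sketch of how robots.txt checks and rate limiting could be combined using only the Python standard library. The user-agent string, the function and class names, and the fixed minimum interval are all assumptions made for this example; the production collector may work differently.

```python
import time
import urllib.robotparser

USER_AGENT = "signal-collector"  # hypothetical user-agent name


def build_robots(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls):
    """Keep only the URLs that robots.txt permits for our user agent."""
    return [url for url in urls if parser.can_fetch(USER_AGENT, url)]


class RateLimiter:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = time.monotonic() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay:
            time.sleep(delay)
        self._last = time.monotonic()
        return delay
```

A collector loop would call `limiter.wait()` before each fetch and skip any URL that `allowed_urls` filters out.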

Signal Scoring

Only signals scoring 15 or above out of 35 proceed to the article generation stage.

Investment Relevance — Does this signal have a direct connection to an investment or business decision?

Foreign Utility — Is this information specifically useful to a non-Japanese reader?

Novelty — Is this fresh — not yet covered by mainstream financial press?

Market Scale — How large is the market or policy budget involved?

Niche Depth — Does this cover a Japan-specific angle unavailable elsewhere?

SEO Potential — Is there demonstrated search demand for this topic?

Shareability — Does this contain a number, ranking, or angle that spreads?
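
The threshold rule can be sketched as follows. This assumes integer scores and an unweighted sum, which is consistent with the stated 0–5 range per axis and the maximum of 35; the class and field names are hypothetical.

```python
from dataclasses import dataclass, fields

MAX_PER_AXIS = 5   # each axis is scored 0-5
THRESHOLD = 15     # minimum total (out of 35) to proceed to generation


@dataclass(frozen=True)
class SignalScore:
    investment_relevance: int
    foreign_utility: int
    novelty: int
    market_scale: int
    niche_depth: int
    seo_potential: int
    shareability: int

    def __post_init__(self) -> None:
        # Reject any axis value outside the documented 0-5 range.
        for axis in fields(self):
            value = getattr(self, axis.name)
            if not 0 <= value <= MAX_PER_AXIS:
                raise ValueError(f"{axis.name} must be 0-{MAX_PER_AXIS}, got {value}")

    @property
    def total(self) -> int:
        """Unweighted sum of the seven axes (assumed; max 35)."""
        return sum(getattr(self, axis.name) for axis in fields(self))

    @property
    def qualifies(self) -> bool:
        """True when the signal meets the 15-point threshold."""
        return self.total >= THRESHOLD
```

For example, a signal scoring 3, 3, 2, 2, 2, 2, 1 totals exactly 15 and proceeds, while a signal scoring 2 on every axis totals 14 and is dropped.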

Article Generation

Qualifying signals are drafted using a custom Claude API prompt designed around journalist voice — not consulting language. The prompt explicitly bans the patterns that make AI writing detectable: uniform sentence length, filler transitions, vague quantifiers, and marketing copy. A deterministic post-processing pass then replaces any remaining AI-typical phrases using a curated list of 80+ substitutions.
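
A deterministic substitution pass of this kind might look like the sketch below. The three entries shown are hypothetical placeholders, not items from the actual curated list, and the whole-phrase, case-insensitive matching rule is an assumption.

```python
import re

# Hypothetical entries for illustration; the real list has 80+ substitutions.
SUBSTITUTIONS = {
    "in order to": "to",
    "utilize": "use",
    "a wide range of": "many",
}

# Try longer phrases first so a short phrase never shadows a longer one.
_PATTERN = re.compile(
    "|".join(
        rf"\b{re.escape(phrase)}\b"
        for phrase in sorted(SUBSTITUTIONS, key=len, reverse=True)
    ),
    re.IGNORECASE,
)


def humanize(text: str) -> str:
    """Replace listed phrases; the same input always yields the same output."""

    def swap(match: re.Match) -> str:
        found = match.group(0)
        replacement = SUBSTITUTIONS[found.lower()]
        # Keep a leading capital when the phrase started a sentence.
        if found[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return _PATTERN.sub(swap, text)
```

Because the pass is a pure lookup with no model call, rerunning it on the same draft cannot introduce new wording.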

Human Review Gate

Every draft is saved to a staging directory and reviewed by a human editor before publication. Drafts containing unverified factual claims (marked [VERIFY] by the generation system) are blocked from publishing until the editor resolves them. No article is ever auto-published.
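
The gate described above could be enforced in the publish command roughly as follows. This is a sketch under assumptions: the function names, the exception type, and the file layout are hypothetical, and only the [VERIFY] marker and the move into a published directory come from the description.

```python
import shutil
from pathlib import Path

VERIFY_MARKER = "[VERIFY]"


class PublishBlocked(Exception):
    """Raised when a draft still contains unresolved [VERIFY] markers."""


def unresolved_markers(text: str) -> list:
    """Return (line_number, line) pairs for every line containing the marker."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if VERIFY_MARKER in line
    ]


def publish(draft: Path, published_dir: Path) -> Path:
    """Move a draft into published_dir, refusing if any marker remains."""
    markers = unresolved_markers(draft.read_text(encoding="utf-8"))
    if markers:
        lines = ", ".join(str(number) for number, _ in markers)
        raise PublishBlocked(f"{draft.name}: unresolved {VERIFY_MARKER} on line(s) {lines}")
    published_dir.mkdir(parents=True, exist_ok=True)
    target = published_dir / draft.name
    shutil.move(str(draft), str(target))
    return target
```

Checking inside the publish function itself, rather than in a separate lint step, means there is no code path that moves a draft without the marker scan having run.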

Translations

English articles are the authoritative source. Translations into Hindi, French, and Simplified Chinese are generated via the Claude API with explicit instructions to preserve company names, yen figures, and proper nouns unchanged. Translation quality is spot-checked for each language.
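
One way to support the spot-check is an automated comparison that flags protected terms missing from a translation. The sketch below is an assumption about how such a check could work, not a description of the existing tooling; the yen pattern covers only figures written with a leading ¥ sign, and the company name in the usage example is made up.

```python
import re

# Assumed format: a yen sign followed by digits, with optional commas/decimals.
YEN_FIGURE = re.compile(r"[¥￥]\d[\d,]*(?:\.\d+)?")


def missing_protected_terms(source: str, translation: str, protected_names) -> list:
    """List yen figures and protected names present in source but absent
    (as exact strings) from the translation."""
    expected = set(YEN_FIGURE.findall(source))
    expected.update(name for name in protected_names if name in source)
    return sorted(term for term in expected if term not in translation)


# Usage with a made-up company name:
english = "Example Motors will invest ¥1,500 in batteries."
french_ok = "Example Motors investira ¥1,500 dans les batteries."
french_bad = "Exemple Moteurs investira 1 500 yens dans les batteries."

print(missing_protected_terms(english, french_ok, ["Example Motors"]))
print(missing_protected_terms(english, french_bad, ["Example Motors"]))
```

An empty list means every protected term survived verbatim; a non-empty list gives the reviewer specific strings to look for.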