Data in Context

In the enterprise world, data isn’t just a by-product—it’s the connective tissue linking BSS, OSS, and ERP. BSS is usually where data first appears, whether it's a customer signing up, an order being placed, or a billing event triggered. OSS kicks in to provision the service, monitor its health, and keep things ticking, while ERP quietly reads from both and handles the heavy lifting: revenue, workforce, stock, and compliance. Between them, they touch every part of the CRUD lifecycle—create, read, update, delete—often in that exact order. But here’s the kicker: without clear data lineage, you’re just hoping everything adds up. Lineage tells you who changed what, when, and why—a breadcrumb trail that makes the difference between confidence and chaos when someone asks, “Where did this number come from?”
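
To make that concrete, here is a minimal sketch of what a lineage-friendly audit log might look like, assuming a shared change log that BSS, OSS, and ERP all write to; the table and column names are illustrative, not from any particular product:

```sql
-- Hypothetical change log: one row per create/update/delete, whoever made it.
CREATE TABLE data_change_log (
    change_id     BIGINT PRIMARY KEY,
    source_system VARCHAR(10),    -- 'BSS', 'OSS', or 'ERP'
    table_name    VARCHAR(100),   -- where the change landed
    record_key    VARCHAR(100),   -- which row was touched
    operation     CHAR(1),        -- 'C', 'U', or 'D' (reads are usually logged elsewhere)
    changed_by    VARCHAR(50),    -- who
    changed_at    TIMESTAMP,      -- when
    reason        VARCHAR(200)    -- why, e.g. the order or ticket that triggered it
);
```

With a trail like that in place, “Where did this number come from?” becomes a query rather than an archaeology project.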

And that’s the point. Understanding the data is how you understand the soup to nuts of an organisation. If the data’s been captured, any question can be answered—either directly at the level of detail or by combining it, aggregating it, or surmising something new. That’s where AI starts earning its keep. But if the data hasn’t been captured, the question can’t be answered—not without some serious enhancement or guesswork. In short, data isn’t just a reflection of the business—it is the business. And knowing its lineage is how you stay in control of the story it tells.

Which brings us to security and control. If you want to manage data properly—never mind securing it—you need to know where it lives, where it’s working, and where it’s on the move. That means understanding data at rest (in databases, files, and lakes), in use (in memory, being computed), and in flight (moving between systems or modules). You need to know where it’s persisted, where it’s just hanging around in RAM, and where it’s crossing boundaries—especially if those boundaries span clouds, partners, or regulatory zones. Otherwise, you’re not really managing data—you’re crossing your fingers and calling it governance.

In fact, it might be more accurate to see this as a continuum: at one end, a data-centric (DC) approach where decisions, systems, and safeguards are built on a clear understanding of the data’s lifecycle and flow; at the other, a narrative-driven (ND) approach, where assumptions stand in for facts, and structure gives way to improvisation. Most organisations wobble somewhere in between, but only the DC end of the spectrum holds up under scrutiny, scale, or regulation.

 

Positioning Statement:

🗄️ "Think of a star schema like the well-organised filing cabinet. Not one of those bottomless pits full of dog-eared manila folders, but one where the files are sorted by topic, clearly labelled, and easy to grab. You only get value from it when you can actually find what you’re looking for — without rummaging around like a Victorian detective. 

That’s the point — good structure makes useful answers possible.

It’s not about picking sides between Data-Centric (DC) and Narrative-Driven (ND) methods like they’re rival soap opera characters. They’re not opposites; they’re more like dance partners.
1) First you organise the data (Star Schema 🧩)**
2) Then you put it to work (Procedural Logic ⚙️).

 

** Semi-structured data lends itself to flattening for integration, while metadata from unstructured sources can be harnessed to introduce structure and improve discoverability.
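
As a sketch of that flattening idea, here is what it might look like in Snowflake-style SQL, assuming a hypothetical raw_orders table with a VARIANT column called payload holding a JSON array of items:

```sql
-- Flatten a nested JSON array into relational rows, one per item.
SELECT
    o.order_id,
    f.value:sku::string AS sku,
    f.value:qty::int    AS quantity
FROM raw_orders o,
     LATERAL FLATTEN(input => o.payload:items) f;
```

Other engines offer equivalents (OPENJSON in SQL Server, UNNEST in BigQuery), so treat this as the pattern rather than the syntax.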

 

TL;DR: 🧭 Star Schema gives you the organisation; procedure gives you the motion. Move before you’re organised and you won’t arrive at your destination effectively or efficiently.

Useful definitions for platforms involved with data lineage

What It Stands For | What It Actually Does | Where You’ll See It in the Wild
📊 Business Support System (BSS) | Handles the bits the customer sees: orders, billing, product setup, CRM. Basically, everything that makes money. | Taking an order online, billing it, changing a phone plan: all BSS territory.
⚙️ Operations Support System (OSS) | Deals with the plumbing: provisioning, fault management, network operations. Keeps the lights on. | Think service activation, monitoring outages, and network diagnostics. If it’s behind the scenes and blinking, it’s OSS.
🧱 Enterprise Resource Planning (ERP) | The digital HQ: finance, HR, supply chain, procurement. One giant database trying to keep the company from eating itself. | Payroll, expense approvals, stock levels, invoice runs. Usually backed by SAP, Oracle, or a similar enterprise beast.
🔄 Enterprise Application Integration (EAI) | The glue. Connects systems that were never meant to speak to each other, but now have to. | Syncing data between ERP and CRM, integrating legacy systems, handling API chaos. If systems are chatting nicely, thank EAI.

Data states

📦 State | 🧠 What It Means | 🔐 Why It Matters
🛏️ Data at Rest | Stored data that’s not moving: sitting in a database, data lake, disk, or archive. | Must be encrypted and access-controlled. It’s a sitting duck if not locked down.
⚙️ Data in Use | Data that’s being processed: held in memory, calculated, edited, or queried. | Vulnerable in RAM. Needs runtime protections like masking or secure enclaves.
🚚 Data in Flight | Data moving between systems: across networks, APIs, or between components. | Needs strong encryption in transit (TLS, VPN). Otherwise, it’s open to interception.

Quick Summary:

  • At Rest – The data’s asleep. That doesn’t mean it’s safe—someone could still walk off with the hard drive.
  • In Use – The data’s awake and working—visible in memory, spreadsheets, or dashboards. Sensitive? Handle with gloves.
  • In Flight – The data’s on the move. You’d be mad to send it unbuckled. Strap it in with TLS or secure tunnels.
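
For a flavour of what those protections look like in practice, here is a minimal SQL Server-style sketch covering at rest and in use; the database, table, and column names are hypothetical, and in-flight protection (TLS, secure tunnels) is configured at the connection or network layer rather than in SQL:

```sql
-- Data at rest: switch on transparent data encryption
-- (assumes a database encryption key has already been created).
ALTER DATABASE SalesDB SET ENCRYPTION ON;

-- Data in use: dynamic data masking hides the real value from
-- users who lack the UNMASK permission.
ALTER TABLE dbo.Customer
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
```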

ETL & ELT Tools – Comparison Table

Tool | Style | Strengths | Watch Outs | Best Fit
Informatica | ETL | Enterprise-grade; strong governance, lineage, scheduling | Heavyweight; steep learning curve; pricey | Legacy systems, regulated sectors, hybrid cloud
Azure Data Factory | ELT | Works well with the Azure stack; flexible hybrid pipelines | UI can be awkward; lacks native data profiling | Azure-first orgs, cloud migrations, MS tech shops
Matillion | ELT | Great for Snowflake/Redshift; visual, fast to deploy | Debugging is fiddly; less useful beyond cloud DWs | Snowflake/BigQuery-focused data teams
Fivetran | ELT | Low-maintenance; fast SaaS ingestion | Limited transform layer; can get costly at scale | MVPs, quick BI enablement, SaaS integrations
Talend | ETL | Open-source option; scripting-friendly | Java-heavy; needs tuning for performance | Custom ETL, open-source-friendly orgs
dbt | ELT | Top-tier SQL transformations; Git/versioned | Only handles the ‘T’ in ELT; no ingestion/load | Modelling in the warehouse or lakehouse
Apache NiFi | Stream ETL | Real-time flows; visual routing; flexible control | Not built for BI; steeper learning curve | IoT, event streaming, on-prem/cloud hybrids
Databricks Workflows | ELT | Unified batch/stream/ML; built for scale | Can be overkill for simple pipelines | AI/ML pipelines, streaming-first use cases
Airbyte | ELT | Open-source; flexible ingestion connectors | Still maturing; more dev hands-on | Teams wanting custom pipelines, no lock-in
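
Since dbt only handles the ‘T’, here is a hedged sketch of what one of its models looks like: a plain SELECT saved as a .sql file, with {{ ref() }} wiring up dependencies. The model and source names are invented for illustration:

```sql
-- models/fct_purchase_requisition.sql (hypothetical dbt model)
{{ config(materialized='table') }}

select
    pr.requisition_id,
    emp.employee_key,
    pr.cost_center_code,
    pr.line_amount
from {{ ref('stg_purchase_requisitions') }} pr
join {{ ref('dim_employee') }} emp
    on emp.employee_id = pr.employee_id
```

Because dbt builds its dependency graph from those ref() calls, it runs models in the right order and version-controls the lot in Git.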

Star Schema Structure

Every transaction (stuff that gets done) becomes meaningful when seen through the right dimensions (perspectives, because marketing and sales always agree with accounts!). That’s why I use star schema thinking on every assignment—clarity, context, and perspectives built in to understand and question the facts!
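
To ground that, here is a minimal sketch of a purchase-requisition star, which the later examples reuse; the table and column names are illustrative:

```sql
-- Dimensions: the perspectives.
CREATE TABLE dim_employee (
    employee_key INT PRIMARY KEY,    -- surrogate key
    employee_id  VARCHAR(20),        -- business key
    full_name    VARCHAR(100),
    company_code VARCHAR(10),
    is_active    CHAR(1)             -- 'Y'/'N'; supports entitlement rules later
);

CREATE TABLE dim_cost_center (
    cost_center_key  INT PRIMARY KEY,
    cost_center_code VARCHAR(10),
    department       VARCHAR(50)
);

CREATE TABLE dim_date (
    date_key  INT PRIMARY KEY,       -- e.g. 20250131
    full_date DATE,
    year      INT,
    month     INT
);

-- Fact: the stuff that gets done, one row per requisition line.
CREATE TABLE fact_purchase_requisition (
    requisition_key INT PRIMARY KEY,
    employee_key    INT REFERENCES dim_employee (employee_key),
    cost_center_key INT REFERENCES dim_cost_center (cost_center_key),
    date_key        INT REFERENCES dim_date (date_key),
    line_amount     DECIMAL(18,2),   -- the measure
    line_count      INT
);
```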

Aspect | Kimball’s Motivation (I like Kimball)
User-Centric Design | Edgar F. (Ted) Codd invented the relational model and formalised both relational algebra and relational calculus, laying the foundation for modern data systems. While the term OLAP (Online Analytical Processing) was popularised later, Codd’s influence reached deep into analytical processing. Ralph Kimball, together with Margy Ross and the Kimball Group, focused on making relational databases more effective for analytical (OLAP-style) workloads. They championed dimensional modelling, especially the star schema, which intentionally trades strict normalisation for performance, usability, and alignment with how business users naturally think. My conclusion is that their bottom-up methodology brought analytics closer to real-world decision-making. In short, Kimball, Ross, and their team translated complex data into something usable, scalable, and intuitive, and did so brilliantly. So why not build your data understanding on a foundation of star schema thinking?
Simplicity and Ease of Use (increased autonomy) | The star schema gives a clear, intuitive structure that mirrors user understanding, enabling self-service analysis with less reliance on technical intermediaries.
Query Performance | Optimised for performance, star schemas respond quickly to users’ queries, supporting real-time insights and fluid exploration of business questions.
Data Consistency | Conformed dimensions ensure users see consistent definitions and metrics across reports, building trust and reducing confusion.
Flexibility | A single star schema can support diverse reporting needs and dashboards, adapting to users’ evolving analytical questions.
Scalability | Business processes can be added incrementally, allowing data warehouse growth to follow user demand and organisational priorities.
Separation of Concerns | Keeping facts and dimensions distinct reflects how users separate events from descriptive context, aiding cognitive clarity.
Historical Tracking | Slowly Changing Dimensions (SCDs) model how context changes over time, crucial for trend, audit, and ownership analysis (see the sketch after this table).
Agility in Development | Star schemas are easy to sketch, build, and refine, keeping pace with users’ shifting needs in agile business environments. Excel supports star schemas, which is useful for early proof-of-concept activities.
Star schemas can be used in conjunction with other data modelling techniques | Star schemas are complementary to other data modelling approaches.
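
As promised against Historical Tracking above, here is a minimal SCD Type 2 sketch (names illustrative): each change to an employee closes the current row and opens a new dated version, so facts keyed to the old version keep reporting the world as it was then.

```sql
-- SCD Type 2: one row per version of each employee.
CREATE TABLE dim_employee_history (
    employee_key INT PRIMARY KEY,  -- surrogate key, new for every version
    employee_id  VARCHAR(20),      -- business key, stable across versions
    company_code VARCHAR(10),      -- an attribute whose history we keep
    valid_from   DATE,
    valid_to     DATE,             -- open rows use a far-future date, e.g. '9999-12-31'
    is_current   CHAR(1)           -- 'Y' only on the live version
);
```

Because each fact row stores the surrogate key of the version in force when the event happened, historical reports need no special-case logic.
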
Aspect | Best Practice | Concern if Absent
Use of Standard Models | Star Schemas, Snowflake Schemas, 3NF, Data Vault, etc. are well-documented, proven, and widely adopted. | Invented models may lack scalability, interoperability, and clarity, making governance and analytics harder.
Transparency | Clear, auditable structures allow for validation, collaboration, and easy onboarding. | Custom diagrams may obscure assumptions, hide poor design choices, or create consultant dependency.
Alignment with Tools | BI tools (like Power BI, Tableau, Looker) are optimised for star-schema-type models. | Non-standard models often require complex, inefficient queries and diminish performance.
Client Empowerment | Good models enable in-house teams to maintain, enhance, and use data confidently. | Vague or unique diagrams can make the client overly reliant on the consultancy.

Star schema is effective across all stages of a project: 

STAGE | ROLE OF STAR SCHEMA | BENEFITS
Requirements Elicitation | Helps identify key business entities (e.g., Employees, Companies, Requisitions) and measurable events (facts) | Clarifies scope early by focusing on facts and dimensions that align with user needs
Process Modelling | Maps business processes (e.g., purchase request approval) to fact-dimension relationships | Visualises how data flows through systems; supports stakeholder understanding
Solution Design | Guides the logical and physical design of data tables using a central fact table with surrounding dimensions | Promotes consistency, modularity, and scalability of the solution
Solution Build | Turns that design into physical fact and dimension tables and the loads that populate them | Promotes consistency, modularity, and scalability of the solution
Integration Testing | Enables validation of data consistency and correctness through test cases across joined fact and dimension tables | Easier to test due to predictable structures and deterministic joins
User Acceptance Testing (UAT) | Users validate that facts and dimensions reflect the business logic and expected outputs | Increases confidence and traceability, as data is easy to verify by business users
Analytics | Supports fast, aggregated queries for trends, anomalies, and patterns using metrics from the fact table | Enables powerful, performant analytics over large datasets
Business Intelligence (BI) | Star schema is the foundation for semantic models in tools like Power BI, Tableau, or Looker | Provides intuitive structure for drag-and-drop exploration and dashboarding
Reporting | Powers tabular and visual reports (e.g., PR totals by department, employee, date); see the query sketch after this table | Reports remain performant and readable due to dimensional clarity and optimised joins
Historic Data Selection (a function of owner and time) | A star schema is ideal for repeatable, transparent reporting, especially in self-service BI tools like Power BI. Its structured fact-and-dimension model supports reusable filters, easy slicing across dimensions, and systematic handling of changes over time. | Procedural logic is better suited for one-off data extractions or complex filtering that’s hard to model. While it offers precise control, it lacks transparency and reusability, making it less efficient for evolving or repeatable reporting needs.
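
As promised against the Reporting row, here is what that kind of report looks like against the star sketched earlier (hypothetical names):

```sql
-- PR totals by department and month: slice the fact by two dimensions.
SELECT
    d.year,
    d.month,
    cc.department,
    SUM(f.line_amount) AS pr_total
FROM fact_purchase_requisition f
JOIN dim_date        d  ON d.date_key         = f.date_key
JOIN dim_cost_center cc ON cc.cost_center_key = f.cost_center_key
GROUP BY d.year, d.month, cc.department
ORDER BY d.year, d.month, cc.department;
```

Swapping department for employee, or month for quarter, is a change of join and grouping, not a rewrite. That reusability is the point.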

Prepping Star Schema Data for AI

Right then — if you want to get anything useful out of AI and machine learning, the first job is clearing up the mess. That’s where the star schema earns its keep.

AI doesn’t come with built-in knowledge of your business. It has no clue what a “customer” is, doesn’t care what a “product” does, and doesn’t naturally link a sales figure to anything meaningful. All it sees is columns of numbers and the vague hope of a pattern. So, it’s on us to lay the data out properly.

Think of a star schema as a neat table setting before the guests arrive. In the middle: the main course (facts like transactions, sales amounts, etc.). Around it: all the trimmings — customers, products, dates — the context that makes the numbers make sense.

Why bother?

  • It’s tidier to work with — no tangled joins or cryptic table mazes.
  • It’s quicker for AI to process — because you’ve done the heavy lifting up front.
  • And crucially, the patterns it finds actually mean something — not just statistical gibberish.

We’re not trying to teach the machine what a “dimension” is. We just feed it clean, well-structured data it can chew through and learn from — no need for a data science sermon.

Bottom line: we do the groundwork with a star schema so the AI can get on with being clever. Simple as that.
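
In practice, laying the table often means collapsing the star into one wide, flat view that a model can consume directly. A minimal sketch, reusing the hypothetical purchase-requisition star from earlier:

```sql
-- One row per fact, with the dimensional context denormalised alongside it.
CREATE VIEW ai_training_set AS
SELECT
    f.line_amount,
    f.line_count,
    e.company_code,
    e.is_active,
    cc.department,
    d.year,
    d.month
FROM fact_purchase_requisition f
JOIN dim_employee    e  ON e.employee_key     = f.employee_key
JOIN dim_cost_center cc ON cc.cost_center_key = f.cost_center_key
JOIN dim_date        d  ON d.date_key         = f.date_key;
```

The model never sees a “dimension”; it just gets clean columns whose meaning we already trust.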

Data centricity upstages procedural centricity

Aspect | Option 1: Star Schema Structure | Option 2: Procedural Logic
Definition | A data model where a central fact table (e.g., Purchase Requisition) is surrounded by dimension tables (e.g., Employee, Company Code, Cost Center). | A set of step-by-step procedural rules written in code (e.g., SQL stored procedures, Python logic) to determine entitlement dynamically.
Lookup or Legwork | Fact table: Purchase Requisition. Dimensions: Employee, Company, Cost_Center, Material, Plant. A keyed lookup indicates whether the selection criteria have been met; the lookup stays efficient even when more than one occurrence is possible. | IF employee.company_code IN (112233, 334455) AND employee.company_code NOT IN (445566, 667788) THEN allow entitlement ELSE reject. The logic becomes inefficient when more than one occurrence is possible.
Reusability | High: the data model is reusable for different queries and reporting | Low: logic must be rewritten or refactored for other use cases
Performance | Optimised with joins and indexed tables; faster for large-scale analytics | Slower for large datasets; logic must scan and evaluate each transaction
Transparency | Easy to understand data relationships; business users can query | Logic is often hidden in code; requires technical knowledge
Auditability | Clear lineage and logs via tables | Harder to trace logic changes over time
Scalability | Scales better with large enterprise systems | Procedural logic becomes complex and error-prone as data volume grows
Entitlement Flexibility | Entitlement rules can be modelled through relationships in dimensions (e.g., linking only active employees in Newco); see the sketch after this table | Requires logic to handle every exception and state manually
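
To make the contrast concrete, here is the data-centric version of the entitlement check from the Lookup or Legwork row, with the allowed companies held as data rather than hard-coded (all names hypothetical):

```sql
-- The rule lives in a table, so changing it is an INSERT or DELETE, not a release.
CREATE TABLE entitled_company (
    company_code VARCHAR(10) PRIMARY KEY   -- e.g. '112233', '334455'
);

-- Set-based check: one keyed lookup per requisition, auditable and reusable.
SELECT f.requisition_key
FROM fact_purchase_requisition f
JOIN dim_employee     e  ON e.employee_key  = f.employee_key
JOIN entitled_company ec ON ec.company_code = e.company_code
WHERE e.is_active = 'Y';
```

The procedural IF/ELSE version bakes those company codes into code, so every change to the rule becomes a change to the program.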