Data in Context
In the enterprise world, data isn’t just a by-product—it’s the connective tissue linking BSS, OSS, and ERP. BSS is usually where data first appears, whether it's a customer signing up, an order being placed, or a billing event triggered. OSS kicks in to provision the service, monitor its health, and keep things ticking, while ERP quietly reads from both and handles the heavy lifting: revenue, workforce, stock, and compliance. Between them, they touch every part of the CRUD lifecycle—create, read, update, delete—often in that exact order. But here’s the kicker: without clear data lineage, you’re just hoping everything adds up. Lineage tells you who changed what, when, and why—a breadcrumb trail that makes the difference between confidence and chaos when someone asks, “Where did this number come from?”
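To make that concrete, here is a minimal sketch (Python, with hypothetical field names) of what a lineage breadcrumb might capture: nothing more than who, what, when, why, and which system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageEvent:
    """One breadcrumb in the lineage trail: who changed what, when, and why."""
    entity: str         # e.g. "invoice_totals" -- the data object touched
    action: str         # one of the CRUD verbs: create / read / update / delete
    actor: str          # the user or service making the change
    reason: str         # why it happened -- the piece audits usually lack
    source_system: str  # BSS, OSS, or ERP
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# A billing change captured at source, so "Where did this number come from?"
# has an answer instead of a shrug.
event = LineageEvent(
    entity="invoice_totals",
    action="update",
    actor="billing-service",
    reason="rerate after plan change",
    source_system="BSS",
)
print(event)
```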
And that’s the point. Understanding the data is how you understand an organisation from soup to nuts. If the data’s been captured, any question can be answered—either directly at the level of detail captured, or by combining it, aggregating it, or inferring something new. That’s where AI starts earning its keep. But if the data hasn’t been captured, the question can’t be answered—not without some serious enhancement or guesswork. In short, data isn’t just a reflection of the business—it is the business. And knowing its lineage is how you stay in control of the story it tells.
Which brings us to security and control. If you want to manage data properly—never mind securing it—you need to know where it lives, where it’s working, and where it’s on the move. That means understanding data at rest (in databases, files, and lakes), in use (in memory, being computed), and in flight (moving between systems or modules). You need to know where it’s persisted, where it’s just hanging around in RAM, and where it’s crossing boundaries—especially if those boundaries span clouds, partners, or regulatory zones. Otherwise, you’re not really managing data—you’re crossing your fingers and calling it governance.
In fact, it might be more accurate to see this as a continuum: at one end, a data-centric (DC) approach, where decisions, systems, and safeguards are built on a clear understanding of the data’s lifecycle and flow; at the other, a narrative-driven (ND) approach, where assumptions stand in for facts and structure gives way to improvisation. Most organisations wobble somewhere in between—but only the DC end of the spectrum holds up under scrutiny, scale, or regulation.
Positioning Statement:
🗄️ "Think of a star schema like the well-organised filing cabinet. Not one of those bottomless pits full of dog-eared manila folders, but one where the files are sorted by topic, clearly labelled, and easy to grab. You only get value from it when you can actually find what you’re looking for — without rummaging around like a Victorian detective.
That’s the point — good structure makes useful answers possible. ✅
It’s not about picking sides between Data-Centric (DC) and Narrative-Driven (ND) methods like they’re rival soap opera characters. They’re not opposites — they’re more like dance partners.
1) First you organise the data (Star Schema 🧩)**
2) Then you put it to work (Procedural Logic ⚙️).
** Semi-structured data lends itself to flattening for integration, while metadata from unstructured sources can be harnessed to introduce structure and improve discoverability (see the flattening sketch after the TL;DR).
TL;DR: 🧭 Star schema gives you the organisation; procedure gives you the motion. Moving before you’re organised means you won’t arrive at your destination effectively or efficiently.
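As the footnote above suggests, flattening semi-structured data is often the first organising move. A minimal sketch, assuming pandas and an invented order payload:

```python
import pandas as pd

# A semi-structured order record, as it might arrive from a BSS API (illustrative shape).
orders = [
    {
        "order_id": 1001,
        "customer": {"id": "C42", "segment": "SME"},
        "lines": [
            {"sku": "BROADBAND-100", "qty": 1},
            {"sku": "ROUTER-STD", "qty": 2},
        ],
    }
]

# Flatten: one row per order line, with nested customer fields promoted to columns.
flat = pd.json_normalize(
    orders,
    record_path="lines",
    meta=["order_id", ["customer", "id"], ["customer", "segment"]],
)
print(flat)
# sku, qty, order_id, customer.id, customer.segment -- ready to join into a star schema
```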
Useful definitions for platforms involved with data lineage
What It Stands For | What It Actually Does | Where You’ll See It in the Wild |
---|---|---|
📊 Business Support System (BSS) | Handles the bits the customer sees — orders, billing, product setup, CRM. Basically, everything that makes money. | Taking an order online, billing it, changing a phone plan — all BSS territory. |
⚙️ Operations Support System (OSS) | Deals with the plumbing — provisioning, fault management, network operations. Keeps the lights on. | Think service activation, monitoring outages, and network diagnostics. If it’s behind the scenes and blinking, it’s OSS. |
🧱 Enterprise Resource Planning (ERP) | The digital HQ — finance, HR, supply chain, procurement. One giant database trying to keep the company from eating itself. | Payroll, expense approvals, stock levels, invoice runs. Usually backed by SAP, Oracle, or similar enterprise beast. |
🔄 Enterprise Application Integration (EAI) | The glue. Connects systems that were never meant to speak to each other, but now have to. | Syncing data between ERP and CRM, integrating legacy systems, handling API chaos. If systems are chatting nicely, thank EAI. |
Data states
📦 State | 🧠 What It Means | 🔐 Why It Matters |
---|---|---|
🛏️ Data at Rest | Stored data that’s not moving—sitting in a database, data lake, disk, or archive. | Must be encrypted and access-controlled. It’s a sitting duck if not locked down. |
⚙️ Data in Use | Data that’s being processed—held in memory, calculated, edited, or queried. | Vulnerable in RAM. Needs runtime protections like masking or secure enclaves. |
🚚 Data in Flight | Data moving between systems—across networks, APIs, or between components. | Needs strong encryption in transit (TLS, VPN). Otherwise, it’s open to interception. |
Quick Summary:
- At Rest – The data’s asleep. That doesn’t mean it’s safe—someone could still walk off with the hard drive.
- In Use – The data’s awake and working—visible in memory, spreadsheets, or dashboards. Sensitive? Handle with gloves.
- In Flight – The data’s on the move. You’d be mad to send it unbuckled. Strap it in with TLS or secure tunnels.
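For illustration, a hedged sketch of what protecting each state can look like in Python, with Fernet (from the `cryptography` package) standing in for at-rest encryption and the standard `ssl` module providing the TLS seatbelt. Key management and secure enclaves are deliberately out of scope here.

```python
import ssl
from cryptography.fernet import Fernet  # pip install cryptography

# --- Data at rest: encrypt before it hits the disk, decrypt on the way back. ---
key = Fernet.generate_key()   # in practice this lives in a key vault, not the code
box = Fernet(key)
ciphertext = box.encrypt(b"customer_id=C42, plan=BROADBAND-100")
open("record.bin", "wb").write(ciphertext)   # the sitting duck is now locked down

# --- Data in use: decrypt only for the moment it is needed, then let it go. ---
plaintext = box.decrypt(open("record.bin", "rb").read())
# ... compute on `plaintext` here; avoid logging it or caching it longer than needed ...

# --- Data in flight: never send it unbuckled. Wrap the socket in TLS. ---
context = ssl.create_default_context()   # verifies certificates by default
# with socket.create_connection((host, 443)) as sock:
#     with context.wrap_socket(sock, server_hostname=host) as tls_sock:
#         tls_sock.sendall(ciphertext)
```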
ETL & ELT Tools – Comparison Table
Tool | Style | Strengths | Watch Outs | Best Fit |
---|---|---|---|---|
Informatica | ETL | Enterprise-grade; strong governance, lineage, scheduling | Heavyweight; steep learning curve; pricey | Legacy systems, regulated sectors, hybrid cloud |
Azure Data Factory | ELT | Works well with Azure stack; flexible hybrid pipelines | UI can be awkward; lacks native data profiling | Azure-first orgs, cloud migrations, MS tech shops |
Matillion | ELT | Great for Snowflake/Redshift; visual, fast to deploy | Debugging is fiddly; less useful beyond cloud DWs | Snowflake/BigQuery-focused data teams |
Fivetran | ELT | Low-maintenance; fast SaaS ingestion | Limited transform layer; can get costly at scale | MVPs, quick BI enablement, SaaS integrations |
Talend | ETL | Open-source option; scripting-friendly | Java-heavy; needs tuning for performance | Custom ETL, open-source-friendly orgs |
dbt | ELT | Top-tier SQL transformations; Git/versioned | Only handles 'T' in ELT; no ingestion/load | Modelling in the warehouse or lakehouse |
Apache NiFi | Stream ETL | Real-time flows; visual routing; flexible control | Not built for BI; steeper learning curve | IoT, event streaming, on-prem/cloud hybrids |
Databricks Workflows | ELT | Unified batch/stream/ML; built for scale | Can be overkill for simple pipelines | AI/ML pipelines, streaming-first use cases |
Airbyte | ELT | Open-source; flexible ingestion connectors | Still maturing; more dev hands-on | Teams wanting custom pipelines, no lock-in |
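To make the ELT shape concrete, here is a small illustrative sketch using Python's built-in sqlite3 as a stand-in warehouse: load the raw data first, then transform it in place with SQL (the step dbt would own in a real stack). Table and column names are invented.

```python
import sqlite3

# sqlite3 stands in for the warehouse; in a real ELT stack the "L" would be
# Fivetran/Airbyte-style ingestion and the "T" a dbt-style SQL model.
con = sqlite3.connect(":memory:")

# L -- load the raw data as-is, warts and all.
con.execute("CREATE TABLE raw_orders (order_id INTEGER, amount TEXT, region TEXT)")
con.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "100.50", "EMEA"), (2, "75.00", "EMEA"), (3, "210.25", "APAC")],
)

# T -- transform inside the warehouse with plain SQL.
con.execute("""
    CREATE TABLE orders_clean AS
    SELECT order_id, CAST(amount AS REAL) AS amount, UPPER(region) AS region
    FROM raw_orders
""")

for row in con.execute("SELECT region, SUM(amount) FROM orders_clean GROUP BY region"):
    print(row)   # ('APAC', 210.25), ('EMEA', 175.5)
```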
Star Schema Structure
Every transaction (stuff that gets done) becomes meaningful when seen through the right dimensions (perspectives, because marketing and sales always agree with accounts!). That’s why I use star schema thinking on every assignment—clarity, context, and perspectives built in to understand and question the facts!
Aspect | Kimball's Motivation (I like Kimball) |
---|---|
User-Centric Design | Edgar F. ("Ted") Codd invented the relational model and formalised both relational algebra and relational calculus—laying the foundation for modern data systems. Codd himself later coined the term OLAP (Online Analytical Processing), and his influence reached deep into analytical processing. Ralph Kimball, together with Margy Ross and the Kimball Group, focused on making relational databases more effective for analytical (OLAP-style) workloads. They championed dimensional modelling—especially the star schema—which intentionally trades strict normalisation for performance, usability, and alignment with how business users naturally think. My conclusion is that their bottom-up methodology brought analytics closer to real-world decision-making. In short, Kimball, Ross, and their team translated complex data into something usable, scalable, and intuitive—and did so brilliantly. So why not build your data understanding on a foundation of star schema thinking? |
Simplicity and Ease of Use (increased autonomy) | The star schema gives a clear, intuitive structure that mirrors user understanding, enabling self-service analysis with less reliance on technical intermediaries. |
Query Performance | Optimized for performance, star schemas respond quickly to users’ queries, supporting real-time insights and fluid exploration of business questions. |
Data Consistency | Conformed dimensions ensure users see consistent definitions and metrics across reports, building trust and reducing confusion. |
Flexibility | A single star schema can support diverse reporting needs and dashboards, adapting to users’ evolving analytical questions. |
Scalability | Business processes can be added incrementally, allowing data warehouse growth to follow user demand and organizational priorities. |
Separation of Concerns | Keeping facts and dimensions distinct reflects how users separate events from descriptive context, aiding cognitive clarity. |
Historical Tracking | Slowly Changing Dimensions (SCDs) model how context changes over time, crucial for trend, audit and ownership analysis. |
Agility in Development | Star schemas are easy to sketch, build and refine, keeping pace with users’ shifting needs in agile business environments. Even Excel can host a star schema, which is useful for early proof-of-concept work. |
Star schemas can be used in conjunction with other data modelling techniques | Star schemas complement, rather than replace, other data modelling approaches. |
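Here is a small, self-contained sketch of star schema thinking in pandas: a fact table of purchase requisitions joined out to illustrative dimensions (all names and values are invented for the example).

```python
import pandas as pd

# Dimensions: the descriptive context (who, where).
dim_employee = pd.DataFrame({
    "employee_key": [1, 2],
    "name": ["Asha", "Bram"],
    "department": ["Procurement", "Finance"],
})
dim_company = pd.DataFrame({
    "company_key": [10, 11],
    "company_code": ["112233", "334455"],
})

# Fact: the measurable events, carrying only keys and measures.
fact_requisition = pd.DataFrame({
    "employee_key": [1, 1, 2],
    "company_key": [10, 11, 10],
    "amount": [250.0, 90.0, 1200.0],
})

# The star in action: keyed joins out to the dimensions, then slice any way you like.
star = (fact_requisition
        .merge(dim_employee, on="employee_key")
        .merge(dim_company, on="company_key"))
print(star.groupby("department")["amount"].sum())   # facts, seen through a dimension
```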
Aspect | Best Practice | Concern if Absent |
---|---|---|
Use of Standard Models | Star Schemas, Snowflake Schemas, 3NF, Data Vault, etc. are well-documented, proven, and widely adopted. | Invented models may lack scalability, interoperability, and clarity—making governance and analytics harder. |
Transparency | Clear, auditable structures allow for validation, collaboration, and easy onboarding. | Custom diagrams may obscure assumptions, hide poor design choices, or create consultant dependency. |
Alignment with Tools | BI tools (like Power BI, Tableau, Looker) are optimized for star schema-type models. | Non-standard models often require complex, inefficient queries and diminish performance. |
Client Empowerment | Good models enable in-house teams to maintain, enhance, and use data confidently. | Vague or unique diagrams can make the client overly reliant on the consultancy. |
Star schema is effective across all stages of a project:
STAGE | ROLE OF STAR SCHEMA | BENEFITS |
---|---|---|
Requirements Elicitation | Helps identify key business entities (e.g., Employees, Companies, Requisitions) and measurable events (facts) | Clarifies scope early by focusing on facts and dimensions that align with user needs |
Process Modelling | Maps business processes (e.g., purchase request approval) to fact-dimension relationships | Visualizes how data flows through systems; supports stakeholder understanding |
Solution Design | Guides the logical and physical design of data tables using a central fact table with surrounding dimensions | Promotes consistency, modularity, and scalability of the solution |
Solution Build | Drives the implementation of the fact and dimension tables specified in the design, keeping the build faithful to the dimensional blueprint | Promotes consistency, modularity, and scalability of the solution |
Integration Testing | Enables validation of data consistency and correctness through test cases across joined fact and dimension tables | Easier to test due to predictable structures and deterministic joins |
User Acceptance Testing (UAT) | Users validate that facts and dimensions reflect the business logic and expected outputs | Increases confidence and traceability, as data is easy to verify by business users |
Analytics | Supports fast, aggregated queries for trends, anomalies, and patterns using metrics from the fact table | Enables powerful, performant analytics over large datasets |
Business Intelligence (BI) | Star schema is the foundation for semantic models in tools like Power BI, Tableau, or Looker | Provides intuitive structure for drag-and-drop exploration and dashboarding |
Reporting | Powers tabular and visual reports (e.g., PR totals by department, employee, date) | Reports remain performant and readable due to dimensional clarity and optimized joins |
Historic Data Selection (a function of owner and time) | A star schema is ideal for repeatable, transparent reporting, especially in self-service BI tools like Power BI. Its structured fact-and-dimension model supports reusable filters, easy slicing across dimensions, and systematic handling of changes over time. | Procedural logic is better suited for one-off data extractions or complex filtering that’s hard to model. While it offers precise control, it lacks transparency and reusability, making it less efficient for evolving or repeatable reporting needs. |
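Since historic selection is a function of owner and time, here is a minimal type-2 Slowly Changing Dimension sketch in pandas. Each change closes one dimension row and opens another, so point-in-time questions stay answerable (data and names are illustrative).

```python
import pandas as pd

# A type-2 SCD: each change closes one row and opens another, so
# "who owned this cost centre on a given date?" always has an answer.
dim_cost_center = pd.DataFrame({
    "cost_center": ["CC-100", "CC-100"],
    "owner": ["Asha", "Bram"],
    "valid_from": pd.to_datetime(["2023-01-01", "2024-06-01"]),
    "valid_to": pd.to_datetime(["2024-05-31", "2099-12-31"]),  # open-ended current row
})

def owner_as_of(dim: pd.DataFrame, cost_center: str, as_of: str) -> str:
    """Select the historic row by owner and time."""
    ts = pd.Timestamp(as_of)
    row = dim[(dim["cost_center"] == cost_center)
              & (dim["valid_from"] <= ts) & (ts <= dim["valid_to"])]
    return row["owner"].iloc[0]

print(owner_as_of(dim_cost_center, "CC-100", "2023-09-15"))  # Asha
print(owner_as_of(dim_cost_center, "CC-100", "2024-07-01"))  # Bram
```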
Prepping Star Schema Data for AI
Right then — if you want to get anything useful out of AI and machine learning, the first job is clearing up the mess. That’s where the star schema earns its keep.
AI doesn’t come with built-in knowledge of your business. It has no clue what a “customer” is, doesn’t care what a “product” does, and doesn’t naturally link a sales figure to anything meaningful. All it sees is columns of numbers and the vague hope of a pattern. So, it’s on us to lay the data out properly.
Think of a star schema as a neat table setting before the guests arrive. In the middle: the main course (facts like transactions, sales amounts, etc.). Around it: all the trimmings — customers, products, dates — the context that makes the numbers make sense.
Why bother?
- It’s tidier to work with — no tangled joins or cryptic table mazes.
- It’s quicker for AI to process — because you’ve done the heavy lifting up front.
- And crucially, the patterns it finds actually mean something — not just statistical gibberish.
We’re not trying to teach the machine what a “dimension” is. We just feed it clean, well-structured data it can chew through and learn from — no need for a data science sermon.
Bottom line: we do the groundwork with a star schema so the AI can get on with being clever. Simple as that.
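A minimal sketch of that groundwork in pandas: flatten the star into one wide feature table, encode the descriptive columns, and hand the model clean numbers (all data invented for the example).

```python
import pandas as pd

# The "table setting": facts in the middle, dimensions as the trimmings.
fact_sales = pd.DataFrame({
    "customer_key": [1, 2, 1],
    "product_key": [10, 10, 11],
    "amount": [25.0, 40.0, 12.5],
})
dim_customer = pd.DataFrame({"customer_key": [1, 2], "segment": ["SME", "Enterprise"]})
dim_product = pd.DataFrame({"product_key": [10, 11], "category": ["Broadband", "Mobile"]})

# Flatten the star into one wide table: the clean, joined view a model can chew through.
features = (fact_sales
            .merge(dim_customer, on="customer_key")
            .merge(dim_product, on="product_key"))

# Encode the descriptive columns numerically; the model never needs to know
# what a "dimension" is. It just gets well-laid-out numbers.
X = pd.get_dummies(features[["segment", "category"]])
y = features["amount"]
print(X.head())
```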
Data centricity upstages procedural centricity
Aspect | Option 1: Star Schema Structure | Option 2: Procedural Logic |
---|---|---|
Definition | A data model where a central fact table (e.g., Purchase Requisition) is surrounded by dimension tables (e.g., Employee, Company Code, Cost Center). | A set of step-by-step procedural rules written in code (e.g., SQL stored procedures, Python logic) to determine entitlement dynamically. |
Lookup or Legwork | The algorithm stays efficient even when more than one occurrence is possible. Fact table: Purchase Requisition. Dimensions: Employee, Company, Cost_Center, Material, Plant. A keyed lookup indicates whether the selection criteria have been met. | The algorithm becomes inefficient when more than one occurrence is possible: IF employee.company_code IN (112233, 334455) AND employee.company_code NOT IN (445566, 667788) THEN allow entitlement ELSE reject. |
Reusability | High – data model is reusable for different queries and reporting | Low – logic must be rewritten or refactored for other use cases |
Performance | Optimized with joins and indexed tables; faster for large-scale analytics | Slower for large datasets; logic must scan and evaluate for each transaction |
Transparency | Easy to understand data relationships; business users can query | Logic is often hidden in code; requires technical knowledge |
Auditability | Clear lineage and logs via tables | Harder to trace logic changes over time |
Scalability | Scales better with large enterprise systems | Procedural logic becomes complex and error prone as data volume grows |
Entitlement Flexibility | Entitlement rules can be modelled through relationships in dimensions (e.g., linking only active employees in Newco) | Requires logic to handle every exception and state manually |
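To see the difference in miniature, here is an illustrative Python sketch of both options, reusing the company codes from the table: entitlement as a keyed lookup against a dimension versus the same rule hard-coded procedurally.

```python
import pandas as pd

# Option 1 -- data-centric: entitlement lives in a dimension, checked by a keyed lookup.
dim_entitled_company = pd.DataFrame({"company_code": ["112233", "334455"]})
requisitions = pd.DataFrame({
    "req_id": [1, 2, 3],
    "company_code": ["112233", "445566", "334455"],
})
entitled = requisitions.merge(dim_entitled_company, on="company_code", how="inner")
print(entitled["req_id"].tolist())   # [1, 3] -- one join, reusable for any query

# Option 2 -- procedural: the same rule buried in code, re-evaluated row by row.
def allow(company_code: str) -> bool:
    if company_code in ("112233", "334455") and company_code not in ("445566", "667788"):
        return True
    return False

print([r for r, c in zip(requisitions["req_id"], requisitions["company_code"]) if allow(c)])
```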