How much do you pay for a line of code?
Isn't it odd that we spend so much time, effort, and money designing, architecting, and building applications yet cannot
answer basic questions about our operations?
Here are a few more questions that your IT organization should be able to answer:
Questions
- What are the characteristics of a successful project?
- Which component (API, ML, Data) do we spend the most time and money on?
- What is the optimal number of software engineers for a project?
- What is the impact of adding more software engineers to a project?
- Is it true that reducing the number of meetings increases productivity?
- Does the number of lines of code in pull requests/code reviews impact the number of bugs?
- How long will it take to complete a project, and how can we estimate cost and duration more accurately?
- Which software engineer has the most commits to the repository?
I have worked as an IT professional for 20+ years as a consultant, contractor, and employee at Fortune 500
companies, yet I have not encountered an organization that could answer these questions.
The key to answering the questions above, learning from the past, and improving as a team is to collect
operational data and use it to understand what we do and how we do it.
Here is an actual application and process I architected and built for a customer, an investment firm.
Application development teams seek to deliver quality software on time and within budget.
Project Overview
The customer was an investment firm that built in-house data products. Their portfolio was around 50
applications. The firm employed approximately 100 software engineers. Each application development team had a
project manager, 5-10 software engineers, and a product owner as the core team.
They used Jira as their issue-tracking system and GitHub as their source control system. They wanted to limit
the scope to issues and source code and possibly add more systems later.
Architecture Requirements
- Event-driven architecture.
- Serverless
Application Design
- Identify operational systems
- source control
- issue tracking
- meeting data
- ci-cd platform
- project data (budget, duration, roles, hours)
- team members, skills, rates
- Build an application to harvest and store the data
- Build analytic processes to find insights into the data
- Create reports and dashboards that publish the insights and make them actionable
- Store results and use them for training/enhancing/validating the actions
The Jira and Git ingestion pipelines have the same architecture, which uses webhooks to post the data to an API.
- Events (Git events, Jira events) are forwarded to the AWS API Gateway endpoint via webhook.
- API Gateway triggers a Lambda function that evaluates the payload and routes it to the matching SQS queue
(commit queue, PR queue, etc.). This is an integration point that allows other processes to consume the events.
- Another set of Lambdas is aligned to each SQS queue.
- Lambdas evaluate, format, and store the events in the Postgres DB.
- Lambdas increment counters in DynamoDB.
- When event counters reach a threshold, compaction processes run to aggregate and consolidate records.
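The routing step described above can be sketched roughly as follows. This is a minimal illustration and not the firm's actual code: the queue names, the `pick_queue` and `route_request` helpers, and the injected `send` callable are all assumptions. It relies on GitHub sending the event type in the `X-GitHub-Event` header and on Jira webhooks carrying a `webhookEvent` field in the payload. In a real deployment, `send` would likely wrap an SQS `send_message` call, and a Lambda handler would call `route_request` with the API Gateway proxy event.

```python
import json
from typing import Callable, Optional

# Hypothetical queue names; the article only mentions a commit queue and a PR queue.
GITHUB_QUEUES = {"push": "commit-queue", "pull_request": "pr-queue"}
JIRA_QUEUE = "jira-issue-queue"


def pick_queue(headers: dict, payload: dict) -> Optional[str]:
    """Return the target queue name for a webhook, or None if unrecognized."""
    # Header names can arrive in any case, so normalize before looking them up.
    lowered = {key.lower(): value for key, value in headers.items()}
    github_event = lowered.get("x-github-event")
    if github_event is not None:
        return GITHUB_QUEUES.get(github_event)
    # Jira issue webhooks use event names such as "jira:issue_updated".
    if str(payload.get("webhookEvent", "")).startswith("jira:issue_"):
        return JIRA_QUEUE
    return None


def route_request(event: dict, send: Callable[[str, str], None]) -> dict:
    """Route one API Gateway proxy event; `send(queue_name, body)` is injected."""
    body = event.get("body") or "{}"
    queue = pick_queue(event.get("headers") or {}, json.loads(body))
    if queue is None:
        return {"statusCode": 400, "body": "unrecognized event"}
    send(queue, body)  # forward the raw payload so downstream consumers see it unchanged
    return {"statusCode": 202, "body": queue}
```

Keeping the queue selection in a pure function makes the router easy to test without any AWS resources, and adding a new event type only means adding one mapping entry.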
Counters are the cornerstone of the process, so let's talk about them. Each event will increment many counters. The
number and type of counters needed are driven by the questions to answer. Here are a few:
- user_commit_count
- project_commit_count
- component_commit_count
- user_line_of_code_per_commit
- project_line_of_code_per_commit
- component_line_of_code_per_commit
- user_jira_duration_minutes
- project_jira_duration_minutes
- component_jira_duration_minutes
In other words, for every Jira ticket and Git commit, update a counter at each level of the organization
(user, project, component). This might seem excessive, but it lets you build a highly performant real-time
dashboard and reuse the same data in reports.
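To make the counter idea concrete, here is a small in-memory sketch. It is an illustration under stated assumptions, not the production design: the `CommitEvent` fields, the `name#owner` key format, and the choice to store running line totals and derive lines per commit at read time are all hypothetical. In DynamoDB, each increment would typically become an atomic update on the counter item rather than a change to a local dictionary.

```python
from collections import Counter
from dataclasses import dataclass

LEVELS = ("user", "project", "component")


@dataclass
class CommitEvent:
    # Hypothetical shape of a formatted commit event.
    user: str
    project: str
    component: str
    lines_changed: int


def counter_updates(event: CommitEvent) -> dict:
    """Return {counter_key: increment} for a single commit, one pair per level."""
    updates = {}
    for level in LEVELS:
        owner = getattr(event, level)
        updates[f"{level}_commit_count#{owner}"] = 1
        updates[f"{level}_lines_of_code#{owner}"] = event.lines_changed
    return updates


def record_commit(store: Counter, event: CommitEvent) -> None:
    """Apply every increment for this commit to the counter store."""
    store.update(counter_updates(event))  # Counter.update adds to existing counts


def lines_per_commit(store: Counter, level: str, owner: str) -> float:
    """Derive the average lines per commit from the two running totals."""
    commits = store[f"{level}_commit_count#{owner}"]
    lines = store[f"{level}_lines_of_code#{owner}"]
    return lines / commits if commits else 0.0


store = Counter()
record_commit(store, CommitEvent("ana", "pricing", "API", 40))
record_commit(store, CommitEvent("ana", "pricing", "Data", 20))
print(lines_per_commit(store, "user", "ana"))       # 30.0
print(lines_per_commit(store, "component", "API"))  # 40.0
```

Because a dashboard only reads precomputed totals, it never has to scan the raw events in Postgres, which is what keeps it fast.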
Dashboard
Dashboards show real-time information about the state of projects. Projects are displayed on the dashboard
with basic stats:
- green trending yellow
- yellow trending red
- total issues vs. issues by status
- chart of commits, issues, etc., over the past 2 weeks
- issues/tickets that have been rolled over more than 2 times
- many more
Managers can click on any stat and drill down into details of that product, stat, etc.
Reports
- The firm uses a reporting platform for BI reports.
- Business analysts create and publish reports answering the questions above.
- Managers, Leads, Project Managers, etc., receive emails daily with detailed statistics on projects in their
portfolios.
Conclusion
So, how much do you pay for a line of code?
- Cost per line = total cost of app dev / total lines of code
- This firm spent $13.15 per line of code.
- This company has since expanded the process to include more operational systems, reports, and dashboards.
- The data collected has genuinely transformed how they do business.
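The cost-per-line formula above is simple enough to express in a few lines. The sketch below uses invented inputs purely for illustration; they are not the firm's figures, and `cost_per_line` is a hypothetical helper name.

```python
def cost_per_line(total_cost: float, total_lines: int) -> float:
    """Cost per line = total cost of app dev / total lines of code."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return total_cost / total_lines


# Made-up example inputs: $250,000 of development cost and 20,000 lines of code.
print(f"${cost_per_line(250_000, 20_000):.2f} per line")  # $12.50 per line
```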