Below is an ultra-rough draft from a sub-section of my upcoming book, Beginning CI/CD. I’ve only started the chapter, so it’s pretty messy–still, though, I would appreciate feedback!

TL;DR? Just read the “useful links”.

Useful links:

  • https://jfrog.com/help/r/jfrog-artifactory-documentation/modules-and-path-patterns-used-by-repository-layouts
  • https://stackoverflow.com/questions/41954891/ci-what-to-do-with-old-versions
  • https://stackoverflow.com/questions/33821137/build-versioning-in-continuous-delivery/33821876#33821876
  • https://stackoverflow.com/questions/24523055/continuous-delivery-for-multi-component-project
  • https://stackoverflow.com/questions/55359931/when-to-create-a-branch-from-tag

Artifacts, Docker, and versioning

What are artifacts?

Artifacts are anything generated by the build process: files, folders, applications, executables, documentation, Docker images, and so on. Because the term can refer to many different things, it is important to clarify the context when someone says “artifacts”. In practice, it normally refers to the final applications and does not usually include Docker images (which, once running, are called containers). Docker images are still outcomes of the build and deployment process, so in theory they are artifacts too; it depends on the context in which the term is used.

Artifacts can be considered individually or grouped together, i.e., packaged (for example, in a tar file). When a pipeline runs, it generates many artifacts: some are just by-products of the build process, while others are required for the application to run. One artifact can contain many sub-artifacts.
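
For example, here is a minimal sketch of packaging several build outputs into one artifact; the project layout and version number are purely illustrative.

```bash
# Package several build outputs into a single versioned artifact (a tarball).
# All paths and the version number below are illustrative.
VERSION=1.2.3
mkdir -p dist
cp build/myapp dist/            # the executable
cp config/app.conf dist/        # runtime configuration
cp -r docs dist/docs            # generated documentation
tar -czf "myapp-${VERSION}.tar.gz" -C dist .
```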

Artifacts can be inputs or outputs of a build system, depending on the context. For example, if application “A” depends on application “B”, then application “B”’s artifacts can be considered inputs to application “A”’s build. In this context, this is normally described as a dependency on a set of artifacts.

Importance of versioning

Versioning is necessary because it is important to keep track of which version of an application is published. When developers test the application, or need to find and report bugs, they must do so against a specific version. This makes it possible to reproduce the bug and create a fix for it. Another version of the application might not have that bug, so a developer would not be able to recreate it. Or, another version might have a different variant of the bug with different source code, and so the fix for one version might differ substantially.

Versioning also allows documentation and marketing material to be associated with a specific version, and makes it possible to keep track of which features were developed and when they were released. This is especially important for large, complex features where the work may be divided among multiple teams; otherwise it might not be clear whether the feature is done or how long it has existed in production.

Another reason is auditing and compliance. Imagine that there was a security issue in your application. Do you know how long the security issue has existed? If you keep track of versions of your software, you are able to go back in time and run the same tests to check whether previous versions are vulnerable to the same exploits.

Depending on how your application is structured, you may have multiple versions of the application that you support.

Ensure you version your application accurately. Proper versioning allows tracking and differentiation of multiple application versions. Whether addressing issues, making sales pitches, or managing deployments, unique version identifiers or tags are crucial. This prevents confusion among developers and stakeholders, ensuring everyone knows which version is in use, even amidst deployment challenges or staff absences.

Role of Dependency Managers

Some package repositories can help manage your artifacts. The reason to use a package repository is to make sure that the artifacts you publish are in one known place and are available to all developers. Having a central source of truth reduces confusion when (potentially) multiple copies of the same version refer to different builds of the application (which differ in the underlying code). Artifacts are meant to be stored immutably; Git repositories, in contrast, can theoretically rewrite history, which works against immutability.

The dependency manager is software that normally runs on your computer and is usually specific to the type of application you are building. Its responsibility is to determine the correct versions of dependencies for your application. It reads a dependency manifest (sometimes called a package file), and in many ecosystems a lock file, that the programmer creates to indicate which versions of which dependencies the application needs.

When your program needs certain versions of the dependency, then the dependency manager is able to retrieve those versions easily. This is because it has internal logic which is able to resolve the dependency tree and automatically download the right dependencies for use.
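
As a quick illustration, assuming npm as the dependency manager and lodash as a stand-in dependency, the manager resolves a version range to a concrete version and records it:

```bash
# Ask for any 4.17.x version; npm resolves the range to a concrete version
# and records the exact choice in package-lock.json.
npm install lodash@^4.17.0

# Show which version was actually resolved in the dependency tree.
npm ls lodash
```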

A good dependency manager, paired with somewhere to host your artifacts, abstracts away most of this complexity: the artifact repository stores your artifacts, and the dependency manager (run locally) resolves potential dependency conflicts. The artifact manager is essentially a server that hosts the artifacts and their metadata, and it might also enforce ACLs.

Imagine if distributing software were not standardized (in the past it was a bit more complex, but I will leave out the history). Someone (usually a developer) would have to read specific instructions on how to install the software, make sure that only that version is installed, and reference it correctly. This approach is tedious, error-prone, and time-consuming. The instructions provided by the package maintainer might be incomplete or might not support installing multiple versions of the software on the same system. The package maintainer can’t possibly know all of the software installed on the customer’s computer and thus cannot know about every potential packaging conflict. Figuring out which versions of two different pieces of software are compatible with each other may require manually managing the versions and trying different combinations. For example, v1 of application “A” only works with v2 of application “B”, but application “B”’s v2 is stored somewhere else and might in turn require application “C”.

When artifacts are published to an artifact server, the server usually has functionality catered specifically to the distribution of artifacts. This may include allowing developers to connect to an artifact repository that integrates seamlessly with other tools. It is also often required by package managers, which rely on a specific format for package manifests and metadata to list available versions and perform dependency resolution. It also allows finer-grained access control on the artifacts, whereas if everything is stored in a Git repository, all of the files likely share the same access control.

Package repositories can also generate the package manifests and metadata that allow other developers to easily consume your artifacts or packages as part of their build processes. For example, JFrog Artifactory can calculate and serve the repository metadata that package managers such as Maven or npm expect, based on its repository layouts.

Package repositories usually ensure file integrity through the use of checksums. This helps protect packages from corruption, and repositories may also provide capabilities to host artifacts on multiple servers or to provide backups. Backups are abstracted away by the artifact manager, which allows you to focus on coding.
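
For instance, here is a minimal sketch of a checksum check, assuming a hypothetical artifact name and a checksum published alongside it:

```bash
# Compute the artifact's checksum locally.
sha256sum myapp-1.2.3.tar.gz

# Compare it against the checksum published by the repository (placeholder shown);
# sha256sum exits non-zero if the file does not match.
echo "<published-sha256>  myapp-1.2.3.tar.gz" | sha256sum --check
```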

Version numbers: difference between internal builds, release builds, and customer builds

What are internal builds? They are builds generated from the pipeline that aren’t intended for release to customers. They might be generated many times a day (for example, as part of a branch or PR pipeline). Normally, these builds aren’t retained for long and are never shipped. What are release builds? These are builds which are destined to be released to customers but are not ready for customer consumption yet. Customer builds, then, are the builds that customers actually receive and run.

Versioning Strategies

It’s good to have a clear versioning strategy (and clear versions). Imagine, for example, that versions were just strings of 32 letters (perhaps yours are) and you had to use a spreadsheet (i.e., a lookup table) to convert each one into something meaningful. This would make day-to-day operations difficult, because a version number needs to be usable for many purposes.

Many people are interested in the status of the artifacts, albeit indirectly. For example, if the PM needs to know if “feature X” is being released, then they need to know where it is in the process and what the status is. Developers would also want to know so that they are able to meet the deadlines.

There are some human aspects to versioning as well. When you are using a “manual” versioning strategy such as SemVer, it is possible that bumping the version is forgotten during code review. Therefore, it can be helpful to create PR templates which remind people to make sure that a release is created. If not, it is possible to push an empty commit which can “tag” a version in Git. This approach might be preferred if releases are manual, and it can help keep things consistent.

In this case, the empty commit exists purely so the CI/CD pipeline can tag the associated commit; the commit itself contains no changes and simply refers to the tag (or the snapshot). It is important not to make any changes to the code in this commit (except maybe the version number), because any small change made just to generate the commit might impact behavior, especially if the code has already been tested. It also makes it unclear which changes are “fake” and which are real, and it adds noise to the commit log.
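
A minimal sketch of that flow, assuming a default branch named main and an illustrative version number:

```bash
# An empty commit whose only purpose is to mark the release point.
git commit --allow-empty -m "Release v1.4.0"

# Tag that commit so the CI/CD pipeline can react to it, then push both.
git tag -a v1.4.0 -m "Release v1.4.0"
git push origin main --follow-tags
```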

The act of “cutting” a version, or tagging, can be a bit complicated because it sits at the intersection of many processes: build triggers, the versioning strategy in use, pushing multiple commits, and a slow feedback loop with the CI runner, with lots of trial and error using unfamiliar syntax and complex branch regex rules. Setting up automated tagging in Git requires knowing how all of these pieces fit together, and somewhat in depth, which can make it difficult.

Before making a release, a developer would have to manually type in a version number corresponding to the SemVer that should be applied to the release. This depends on which type of versioning system you are using, however. Some versioning systems, such as incremental or evergreen versioning, use the date, which can be completely automated. The downside of the latter approach is that major changes can be introduced without it being clear to consumers that this has occurred. This is less of an issue when developing end-user applications, as customers are unlikely to care which version they are running. However, if you are creating a library or software intended to be used by other developers, backwards-incompatible changes can cause breakage, which makes it difficult for consumers to use the library correctly. Versioning is, at its core, an act of communication.
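
For example, a date-based (evergreen) version can be computed entirely by the pipeline; the format here is just one possibility:

```bash
# Date-based version, e.g. 2024.05.17; fully automatable in CI.
DATE_VERSION=$(date +%Y.%m.%d)

# If you release more than once a day, append something unique,
# such as the short commit hash.
BUILD_VERSION="${DATE_VERSION}-$(git rev-parse --short HEAD)"
echo "$BUILD_VERSION"
```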

What would be an example of a poor versioning strategy?

As a hypothetical example, suppose the version number is a string of 64 characters, 32 of which are just “a”. This would make it difficult to distinguish two different versions of the application, because you would have to visually inspect them and figure out how many “a”s to ignore.

It would also be very long, and difficult to display. Some artifact managers or repository managers might not accept a version number that is that long.

Mixing commonly confused characters, like I, l, and 1, together. This makes it visually difficult to distinguish two versions. It is possible to check whether the versions are different using a diff algorithm, but this would be complicated.

Using special characters in the version number. The version number might appear in many places, and those places might not accept special characters. For example, Git tags and Docker tags cannot contain certain special characters and have length restrictions.

Don’t put private information in the version number. It is likely that it will be public in some way or another.

Make sure that the customer is able to view the version number through a simple method or procedure (for example, an “About” dialog).

Trying to follow SemVer, but making too many exceptions.

Therefore, it is important to make sure that version numbers can be compared. Are two versions different? Is one greater than the other, and does that matter? Which one do we have to release to customers and which one is in testing?

You can change your versioning formats in the future if your needs change. But try to be consistent. Don’t change it too often because this will cause a lot of confusion. Change it and be done with it.

Version numbers are a bit more flexible, but a build number should refer only to a specific checksum or build of the application and should be immutable. For example, “iOS 17” refers to the latest copy of Apple’s iOS operating system; it could be any of the versions 17.0.1, 17.0.2, and so on. Internally, Apple might produce multiple builds per day that are never released to the public, plus some developer beta versions.

Versions in software are sort of like serial numbers for products. They allow traceability back all the way through the entire software development process, normally to find bugs or errors and for auditing purposes. It is also useful to know which version is deployed in production so that the product can be marketed correctly and developers know which version of the product contains bug(s), or the ability to know if a release was successful.

Your versioning strategy should be able to trace the artifact back to the source code, and the versions of the build tools (optional but still useful.) Make sure that the environment is part of the version. There are several ways to version your application, depending on your type of application. SemVer is popular for libraries that have potentially API-breaking changes that consumers should know about. Consumers can specify (in their manifests) which versions of your library they choose to consume, and can do so safely because they know that SemVer will not be violated. In some cases, you might have an evergreen version of an application, or an application that is intended for the end-user (such as a website.) In this case, the API doesn’t really have any breaking changes and SemVer might not apply. Therefore, consider using a date-based versioning strategy or a version that just increments. This will help you differentiate between releases.

Do you segment your customers based on different platforms or levels of service? Make sure to include that in the version. For example, do you have a macOS application and a Windows one? Then make sure that the platform is in the version to make sure that they are differentiated. Is one intended for enterprise customers? Then add that.

  1. Semantic Versioning
    • Clear communication on changes
    • Requires discipline in adhering to the rules (SemVer)
    • Popularized and widely adopted
    • May result in rapid version number inflation for unstable software
    • Differentiates between major, minor, and patch releases
    • Not ideal for projects where public API doesn’t change often but internals do
    • Easily integrated with dependency management tools
  2. Date-based Versioning
    • Easily identifies when a release was made
    • Doesn’t communicate the nature or impact of changes (e.g., YYYY.MM.DD)
    • Neutral in terms of software changes – it doesn’t imply severity or size
    • Can be confusing if releases are made more than once a day
    • Can be combined with other versioning methods for clarity
    • Not as widely adopted as other strategies
  3. Sequential Versioning
    • Simple and straightforward
    • Doesn’t provide insights into the nature or impact of changes (e.g., 1, 2, 3…)
    • Continuously increments with each release
    • May give the impression of major changes even for minor updates
    • Users can easily identify newer versions
  4. Release Trains
    • Predictable release schedule
    • Doesn’t provide specifics about changes within each release
    • Can help ensure regular updates and feature drops
    • Can lead to rushed or half-baked features if sticking strictly to the train schedule
    • Useful for larger organizations or projects with a lot of interdependencies
    • If a feature misses a “train”, it might have to wait for the next scheduled release
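
One practical difference between these strategies is how easily the resulting versions can be compared and sorted. A small sketch using GNU coreutils’ version sort (the version strings are made up):

```bash
# SemVer-style and sequential versions compare correctly with version sort
# (plain lexicographic sort would wrongly place 1.2.10 before 1.2.2).
printf '%s\n' 1.2.10 1.2.2 1.10.0 | sort -V

# Zero-padded date-based versions (YYYY.MM.DD) sort correctly either way.
printf '%s\n' 2023.11.02 2023.02.14 | sort -V
```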

Programming-language specific versioning strategy quirks

Understanding Maven Version Numbers (oracle.com)

Storing artifacts and artifact retention

How long to retain an artifact depends on the type of artifact. If it was distributed to customers, then in general it is retained longer than an artifact created by CI as part of the build process (e.g., during a pipeline run) but never released. Unreleased artifacts from the CI or CD pipeline can be created tens or hundreds of times a day, and it may not be worthwhile to keep them: they are by-products of the build process, mostly there to show that the build process is still sane, and are essentially temporary files.

Make sure that you only store the files necessary to build and run your application as part of its artifacts. Including too many files uses up unnecessary space and can be a potential security issue if it is unclear what those files contain (e.g., passwords, credentials, etc.) and whether they should be shipped to customers.

It is important to have a link that traces the inputs (source code) to the outputs (artifacts). You can use Git tagging to create this link. This allows more reproducibility later on and can help you fix issues (or backport fixes) to prior versions should they have an issue. This depends on your versioning strategy, of course. For example, evergreen webapps do not normally have a user-facing version (going to Facebook always gives you the latest version), but an enterprise desktop application might have many versions that have to be supported at one time.

There are several things that would inhibit storing artifacts forever. One is the cost of storage, as depending on your application, artifacts may contain multiple dependencies and thus might be large.

Some artifacts might be tricky to delete: if they are required as dependencies by other versions of software, it can be difficult to untangle the dependencies. Therefore, keeping them around longer is sometimes the safer approach.

You may want to consider cold storage options for artifacts that have to be retained for a long time. This will allow you to save on storage costs.

Not all artifacts have to be stored forever. Some are generated whenever a pipeline runs, and they are sometimes called “snapshots” or “revisions”. These are usually temporary artifacts that could, in theory, be published. In many cases they never are, and thus they can be safely deleted. They should still be generated and retained for a while, however, because this allows you to easily make a deployment should one be needed (or to revert to an older version of the software).

Think about the utility of storing these files against the cons. If I store the entire node_modules folder, what do I gain that I don’t have if I were to just store the revisions of the package.json and package-lock.json file? If NPM is down, consider using another registry instead of committing node_modules.
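
To illustrate (assuming an npm project): the lockfile alone is enough to reproduce the dependency tree, so there is little value in storing node_modules itself.

```bash
# Recreate node_modules exactly as described by package-lock.json.
rm -rf node_modules
npm ci
```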

It is difficult, and usually not possible, to delete items from Git history. Artifact managers, on the other hand, can deprecate or remove old versions of the software or make them unavailable to package consumers.

My pipeline runs a lot, why is that? Do I need to retain artifacts at each run?

A pipeline runs when a trigger has been hit (i.e., a PR was created, push to a branch, a new tag, etc.), or it was run manually. This is important because each run of the pipeline can generate artifacts, and the pipeline also should pass at various stages to prevent committing code that is not ready to be merged.

Commit to a Branch: When a commit is pushed to a branch, the pipeline can provide developers with early feedback. While this early-stage feedback is invaluable, it is most beneficial if developers actively utilize it. Typically, artifacts from these runs need not be stored long-term, as they often represent work-in-progress features.

PR Creation: It’s imperative to run the pipeline when a PR is created and updated. This ensures the code meets the required checks before merging. Artifacts from this phase, much like the previous touchpoint, aren’t usually stored long since multiple updates might be pushed before finalization. The pipeline must pass before the PR is merged.

New Tag Addition: If a new tag is pushed, it often signifies a deployment phase, and the pipeline might be geared towards initiating a release. In such cases, retaining the artifacts is crucial.

Post-PR Merge: The next time it might run is after the PR is merged. This appears strange, because it already ran on the PR, right? Things get a bit complicated and might depend on your CI/CD software’s implementation.

When you create a PR, the CI/CD pipeline typically runs on the result of merging your branch with the target branch. This does not actually update your branch with the target branch.

If the pipeline was successful, that result might be considered valid for a limited time (for example, around 12 hours on some platforms). This means that pushes to the target branch do not necessarily cause the PR pipeline to be re-run.

Since the target branch can update independently of the PR pipeline running, there is a possibility of conflicting changes (semantic conflicts, not merge conflicts). This means the pipeline has to be re-run on the target branch once the PR is merged to ensure that there are no issues. If, however, the branches cannot be merged due to a merge conflict, the merge will not be allowed in the first place.

I’m assuming that this is the case because if multiple team members are pushing at once, re-running every open PR’s pipeline on each merge to the target branch would quickly cause a bottleneck. That would be fairly wasteful and would reduce throughput significantly. However, it does carry the risk of conflicting changes slipping through.

Treat internal dependencies as external dependencies.

If you pin your dependencies and an upgrade breaks things (i.e., you open a PR to upgrade to the latest version, which runs the CI pipeline and fails), then you can simply not merge the PR. Always floating on the latest version automatically, by contrast, means that previously passing code might start to fail and it might not be clear why.

Another reason to retain artifacts is that you want a reproducible build, even if you’re generating a new one. To know whether something is broken (i.e., to narrow it down), you need a stable environment so you can rule out other factors. Dependencies can also change functionality, so if everything is changing at once it is difficult to test, and if you try to roll back changes (e.g., by reverting and rolling forward), it might not actually fix the problem because new dependencies are being pulled in instead of the old ones. There would be no opportunity to test the dependency upgrades in isolation (e.g., via a PR).

Do not change artifacts that are in the artifact repository without changing their version number first.

From Build Automation to Continuous Integration (oracle.com)

[TheNEXUS A Community Project (sonatype.com)](https://books.sonatype.com/mvnex-book/reference/simple-project-sect-summary.html)

Updating the artifact’s manifest when you bump the version number is important so that the artifact can be identified (stamped). Otherwise, two artifacts with different version numbers could theoretically have identical file content, which would be confusing, and it would be difficult for the customer (or you) to know which version they are running.

How do I version code and Docker images?

A container is a managed execution environment that isolates its contents from the host, meaning the container doesn’t know about other applications on the host. It shares the host’s kernel and resources.

A CI/CD server shares similarities with a container. It offers a stateless execution environment, often with some pre-installed dependencies. This environment is discarded post-run, ensuring a clean build environment every time.

Once you’ve successfully built your program, you can test building it inside a Docker container, like “ubuntu-latest” for Linux builds. This mimics a CI/CD environment, which typically starts with a minimal setup, devoid of your application’s specific dependencies or your codebase. You’ll need to add these dependencies and your code to the container to build it.
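
A rough sketch of that idea, assuming a Node.js project and the official node:20 image (substitute your own toolchain image and build commands):

```bash
# Build and test inside a throwaway container that starts with none of your
# project's dependencies installed, much like a fresh CI runner.
docker run --rm \
  -v "$PWD":/src \
  -w /src \
  node:20 \
  sh -c "npm ci && npm test"
```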

Note: When creating a tag for your CI/CD pipeline, you’ll need to have a merged PR. Use git commit --allow-empty -m "Commit message here" to create a commit with no content that the tag can point to.

Note: if you are planning on using tags to support multiple versions of your software simultaneously, and are using trunk-based development, then this might be a bad idea. Tags only refer to a single commit, which makes it difficult to change something at one point in history without also changing everything after it. In that case, you might be interested in different branching strategies (e.g., release branches). However, if the history is linear, you’re using a rolling versioning strategy (e.g., today’s date), and previous versions are never supported, then tagging provides a linear history, which should be suitable for most applications.

All tags do is add an alias to a commit hash. It makes it easy to retrieve a particular version, as you can just view the tag and find the associated commit hash.

Git’s git tag command lets you label specific commits. Here’s how:

  1. Lightweight Tags: A simple pointer to a specific commit.

    git tag v1.0
    
  2. Annotated Tags: These are full objects in Git’s database. They contain metadata like the tagger’s name, email, date, and a tagging message.
    git tag -a v1.0 -m "First stable release"
    
  3. Tagging Earlier Commits: To tag a non-recent commit, use the commit’s hash.

    git tag v0.9 9fceb02
    
  4. Pushing Tags to Remote: Explicitly push tags to a remote repo.

    git push origin v1.0
    

    Note: you may have to make a commit first (see the earlier note about empty commits). Some CI/CD setups restrict direct pushes, so the only way to get the commit you want to tag onto the master branch is via a PR, and the PR must have at least one commit.

  5. Deleting Tags: To remove a tag:

    git tag -d v1.0
    

CI/CD tools may differ in their tagging setups. While Git allows for release tagging, some teams use third-party tools like Azure DevOps. If you need deep project management software integration, consider using built-in CI/CD offerings. Should you tag in Git? Weigh the benefits against potential confusion from mismatched tags and releases.

Ok, so I have tagged a container. How do I tag the associated source code?

Git Tags don’t do anything on their own; they are not capable of creating a release. The CI or CD runner has to look at the tags and do useful work. This normally occurs when a new tag is pushed.

Git Tags are one way of tracking releases and offer a provider-agnostic way to check out the code at a specific version. There are many ways to track releases, and sometimes tracking must occur at multiple steps. In this case, there must be tracking at the source code level to make sure that one understands which source code is being released. There may be tracking at the user story or task level to understand which task(s) were part of the release, or QA test plans.

When creating a new release, if some task(s) are not yet done, move them under the next version that has not yet been released.

Tags do not change the source code on their own. For example, if your application displays its version in its “About” dialog, this won’t change if you tag the release. Therefore, you may want to change the version number in the application before or when you tag the release. This can usually be done via automation and the version number for the application might exist in one of the application’s configuration files.
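
Here is a minimal sketch of stamping the version at build time; git describe is real, but the build script and flag below are placeholders for whatever your build tool provides.

```bash
# Derive a human-readable version from the nearest tag, e.g. v1.4.0 or v1.4.0-3-gab12cd3.
APP_VERSION=$(git describe --tags --always --dirty)

# Inject it into the build so the "About" dialog or --version output can display it.
./build.sh --app-version "$APP_VERSION"   # hypothetical build script and flag
```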

How do I know when a tag has been pushed or created? I would like to run a script in this case (or kick off another pipeline.)

This depends on whether the tag is annotated, as the commands differ. I recommend adding a manual override, as there will be situations where you need to delete or rewrite existing tags because of mistakes or exceptions to the procedure.

If you are using a monotonically increasing or random version for each release (e.g., evergreen), then I recommend automating this process. If you are using SemVer, then you may want to consider doing releases manually. Either way, it should be very easy to do.

It can be complex to manage and create tags when releasing software because it may require knowledge of Bash scripting, which people might not be familiar with and which follows different programming paradigms than typical application code.
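
As a taste of what that scripting looks like, here is a minimal sketch that reacts to a tag push inside a CI job. It assumes GitHub Actions’ GITHUB_REF environment variable; other CI systems expose the triggering ref differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

# On a tag-triggered run, GITHUB_REF looks like refs/tags/v1.4.0.
if [[ "${GITHUB_REF:-}" == refs/tags/* ]]; then
  TAG="${GITHUB_REF#refs/tags/}"
  echo "Tag push detected: ${TAG}, starting release steps"
  # ./release.sh "${TAG}"   # hypothetical release script
else
  echo "Not a tag push, skipping release"
fi
```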

Normally, you’d want to kick off a release when a new tag is pushed and the tagged commit has been merged. There is a long tail of exceptional situations, such as two tags being pushed at the same time, tags being deleted, merges, and so on, that makes things more complex.

First, you’d want to figure out your release strategy before your tagging strategy. Tagging strategy is just a technical implementation of your release strategy.

Some software allows creating a release manually.

The issue is: if I am using SemVer, for example, how do I automate the tagging process? There are tools that can indicate whether a change is backwards-incompatible, but SemVer usually requires human intervention because what counts as a “major” change is subjective. In this case, releases would still be manually initiated, but the process itself would be automated. There are some tools that automatically flag API breakage, but this depends on the type of library you are building and whether such a tool exists for it; such tools normally only detect changes to the public API, not all changes.

What does it mean when a container is generated every time I merge code?

Depending on the CI setup, CI might be linked to CD, which means that a deployment is automatically made on every push. In that case, the CI system might generate the Docker image using the Dockerfile included in the repository, and the image is usually pushed to a registry after it has been created. It is useful to know that each build has a version, even if it is never released to the public (for example, intermediate versions); assigning these manually would be tedious. (See also: “How to use Docker to make releases?” and “When to create a branch from tag?” on Stack Overflow.)

Integrating Artifact Repositories with CI/CD pipelines

Package Manager Dependency: Your choice of artifact repository depends on the package manager you’re using. For example, C# uses the NuGet package manager, so the pipeline would need a “restore” step to fetch the packages from the repository.
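
For instance, a restore step might look like the following; the private feed URL is a placeholder.

```bash
# Restore NuGet packages from a private feed as well as nuget.org.
dotnet restore \
  --source "https://pkgs.example.com/nuget/v3/index.json" \
  --source "https://api.nuget.org/v3/index.json"
```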

Authorization: Connect to your package repository, often using API keys, credentials, a service connection, or an identity. If you are using your CI/CD providers package manager, it will usually have steps on how to connect to it.

Local Testing: Before using CI, test the setup locally, potentially using your IDE for assistance.

Note! Theoretically, artifacts can also be re-generated which would mean that there isn’t a need for artifact repositories (i.e., just build the code again.) However, this process is time-consuming and error-prone because the build-tools are usually not version controlled, which means that a small difference in the build tools will cause the outputs to be different. If the output changes by one bit, it does not necessarily mean that program behavior is impacted. However, this means that the artifact is no longer the same, thus opening up the door for potential exploits/vulnerabilities/security issues.

Artifact tracking and naming

Another issue is: when you have an artifact, how do you trace it throughout its entire lifecycle? For example, say it is in QA. How do you know it is in QA? Artifacts are generated at several points during the build process, are generated during non-customer pipeline runs, and during testing. How do I track which one(s) are being used by the customer?

How do I name artifacts?

A common convention is organization / module name / revision; see the JFrog Artifactory documentation on repository layouts (linked above) for the path patterns it uses.

When is a version assigned to an artifact?

Sometimes, the CI or CD runner will assign build numbers to the artifacts.

An artifact might have a lot of metadata associated with it, such as build numbers, versions, revisions, dates, etc. Usually, versions are assigned when doing a release, or they are assigned automatically through the build process, and whichever build is released has its version recorded in the release log. A version might also exist as a floating version: the marketing material might say “Version 5” when, in fact, there are many updates to that version, such as 5.0.1, 5.2, etc. It is also possible to give other developers evergreen versions of the artifact if it is injected at runtime. For example, if the artifact is not bundled with the application and is instead fetched from a remote server (say, a JavaScript payload), this can ease distribution to multiple clients. In that case, make sure to capture sufficient telemetry and record which version(s) are currently in use so that you can associate errors with specific versions.

Artifact maintenance

When you try to maintain artifacts, several issues arise. It might be unclear which versions of the application you are trying to keep, and you might keep too many versions. This can cause confusion, especially if you use manual dependency management (although dependency managers can usually choose the necessary version automatically). It is also a financial cost, and a potential liability, if there are too many copies of your application stored everywhere: storage costs increase for artifacts that never needed to be stored and will never be used. Recall that artifacts should contain only the essential information your application needs to run. By default, artifacts are typically stored for around 30 days on most providers.

The other issue is: given that we have artifacts that have been deployed to customers, when do we delete them? After seven years? We might need them again depending on the level of support. If unsure, I would recommend keeping them, because it could be very complex to recreate the artifact from scratch. There are usually ways to specify retention policies with your artifact manager.

When the artifacts are no longer useful, then they can be decommissioned. This is where the artifact managers come into play. They are able to track downloads over time, and might be able to track it down to specific pipelines. This helps you understand where the artifacts are being used. You can also selectively deprecate different versions, which will make it so that application developers cannot use that specific version (unless, of course, it is cached on their machine.)

When an artifact is deprecated, it might be possible to mark it as deprecated in the dependency manager, and in some cases not make it available for download. You should send sufficient communication to relevant stakeholders regarding its deprecation, including its replacement, when it will be removed, ramifications of what will happen after it is removed, its impact, and who to contact if there are questions. Sometimes, if the artifact is not being used much, you might be able to deprecate it without notifying others.

Tag management can be made more complex if operations have to be performed on those tags, for example, incrementing a tag or determining whether one tag comes before another (is “v1.0-beta” before “v1.0-dev”?). Incrementing a tag requires knowing what the last tag was and then adding to it.
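
For example, a rough sketch of bumping the patch component of the most recent tag, assuming tags of the form vMAJOR.MINOR.PATCH:

```bash
# Most recent tag by version order (e.g. v1.4.2).
LATEST=$(git tag --list 'v*' --sort=-v:refname | head -n 1)

# Split off the patch component and increment it.
BASE="${LATEST%.*}"             # v1.4
PATCH="${LATEST##*.}"           # 2
NEXT="${BASE}.$((PATCH + 1))"   # v1.4.3
echo "Next tag: ${NEXT}"
```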

If you want to tag your releases with branch names, or to associate them with branch names, consider slugifying the branch name, because Docker image names and tags have a restricted character set.
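
A sketch of slugifying a branch name for use as a Docker tag; the branch and registry names are made up.

```bash
BRANCH="feature/Add-OAuth2 support"

# Lowercase, replace disallowed characters with '-', trim leading '.'/'-' and
# trailing '-', and cap the length (Docker tags allow at most 128 characters).
SLUG=$(echo "$BRANCH" \
  | tr '[:upper:]' '[:lower:]' \
  | sed -E 's/[^a-z0-9._-]+/-/g; s/^[.-]+//; s/-+$//' \
  | cut -c1-128)

echo "$SLUG"   # feature-add-oauth2-support
docker tag myapp:latest "registry.example.com/myapp:${SLUG}"
```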

In order to have a reproducible build environment, you have to have enough information about the environment to make it reproducible, such as versions, inputs, their checksums, hardware, etc. Any small change in any part of the software chain can cause the artifacts to be non-reproducible because the tooling is very complex, and has dependencies on other parts of the build process. One way to do this is through Dockerfiles, which are a set of instructions that contain the specific versions of tools that you use to build your application. Because it runs in an isolated environment, this means that you can run multiple conflicting copies of other dependencies on your machine and it will not interfere with the Docker container.

Containerization

Docker packages software applications into deployable units called images. When running, these images are referred to as containers. With Docker, tags reference specific image versions.

Tags in Docker allow easily referring to a specific image. If you want the container to be pushed to a specific registry, then the Docker’s tag has to contain part of the registry URL. Tags can be thought of both as an identifier and the desired location for the image. This is most likely because the Docker push command does not take a registry as an argument and thus relies on the container tag to disambiguate the context. Say I create an image with “docker build .”. I get an image, but there’s no repository and no tag. This makes it difficult to determine what version or what thing I am looking at.

    alex@DESKTOP-7M8V9ET:/dev/shm/getting-started-app$ docker images
    REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
    <none>       <none>    d49c4d85c3ea   24 minutes ago   269MB

I don’t know what d49c4d85c3ea is. It could contain anything. Therefore, we can use tags to keep track of the images.

Tagging images in Docker is a vital part of managing and organizing your images, especially when collaborating or deploying applications. Here’s a step-by-step guide on how to tag Docker images, push them to registries, and pull them based on their tags.

  1. Building and Tagging a Docker Image: When you’re building an image using a Dockerfile, you can tag it right away:

    docker build -t [username/imagename]:[tag] .

    • [username/imagename]: The name of the Docker image (often prefixed with a username or organization name).
    • [tag]: The tag for the Docker image (e.g., latest, v1.0, development, etc.)

    For example:

    docker build -t myuser/myapp:v1.0 .

  2. Tagging an Existing Image: If you have an existing image that you’d like to tag or retag, use the docker tag command:

    docker tag [source_image]:[source_tag] [username/imagename]:[new_tag]

    For example, to retag an existing myapp:latest image to myapp:v1.0:

    docker tag myapp:latest myuser/myapp:v1.0

  3. Pushing a Tagged Image to Docker Hub: Before pushing, ensure you’re logged into Docker Hub (or another Docker registry):

    docker login

    Then, push your tagged image:

    docker push [username/imagename]:[tag]

    For example:

    docker push myuser/myapp:v1.0

  4. Pushing to Other Registries: If you’re not using Docker Hub but another registry, like Google Container Registry (GCR) or Amazon Elastic Container Registry (ECR), your image name (and tag) will usually include the registry URL:

    docker tag myapp:latest registry-url/myuser/myapp:v1.0
    docker push registry-url/myuser/myapp:v1.0

  5. Pulling a Tagged Image: To pull an image based on a specific tag:

    docker pull [username/imagename]:[tag]

    For example:

    docker pull myuser/myapp:v1.0

    If you don’t specify a tag, Docker will usually default to the latest tag:

    docker pull myuser/myapp

Tips:

  • It’s good practice to use meaningful tags. Common tags include version numbers (v1.0, v1.1), development stages (dev, prod), or even Git commit hashes for granularity.
  • While the latest tag might sound like it represents the most recent version of your image, Docker does not enforce this. The latest tag is simply the default used when no tag is specified, so it’s recommended to be explicit with your tags to avoid confusion.
  • Each time you change and retag an image, you’ll need to push the newly tagged image to your registry if you want to share or deploy it.

Let’s tag our image with “docker tag d49c4d85c3ea my-app:v1.0”. The resulting images list now shows our image, with our version:

    alex@DESKTOP-7M8V9ET:/dev/shm/getting-started-app$ docker images
    REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
    my-app       v1.0      d49c4d85c3ea   25 minutes ago   269MB

If I make some changes and rebuild the image with “docker build .”, then I get another untagged image:

    alex@DESKTOP-7M8V9ET:/dev/shm/getting-started-app$ docker images
    REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
    <none>       <none>    0e3996fbe4ca   3 seconds ago    269MB
    my-app       v1.0      d49c4d85c3ea   26 minutes ago   269MB

Instead, I can pass the “-t” argument and tag it immediately. This is helpful because you may have multiple images, and having many untagged images at once can cause a bit of confusion. It is also more efficient and makes sure that you don’t forget to tag it.

Now we have an image that contains the things your application needs to run. How do we push it to a container registry, where it can be built and published using a CI pipeline?

NOTE: when you are building images locally, they might contain cached layers. This is a useful property which makes building containers faster. However, some commands may not be idempotent. For example, running apt-get install curl may install whatever the latest version of curl in your package repositories happens to be. Depending on how your Dockerfile is set up, a step might be served from a cached layer that is outdated. Also, CI runners are unlikely to use cached layers, which is why you might get different results when building locally. Therefore, consider doing occasional uncached builds, or make sure that the steps cannot change by pinning specific versions of the software they install.

Let’s go back to publishing the image to a container registry. Azure Container Registry (ACR) is a managed Docker container registry service used for storing private Docker container images. To publish your Docker image (myapp:v1) to ACR, follow these steps:

  1. Prerequisites: Ensure you have the azure-cli (Azure Command-Line Interface) and Docker installed.

  2. Authenticate with Azure: Log in to your Azure account:

    az login

    A browser window will open asking you to sign in to your Azure account.

  3. Create an Azure Container Registry (if you haven’t already): Replace myregistry with a unique name for your registry, and myresourcegroup with the name of your Azure resource group:

    az acr create --resource-group myresourcegroup --name myregistry --sku Basic

    You can choose different SKUs (Basic, Standard, or Premium) based on your needs.

  4. Log in to ACR: Before you can push an image, you need to authenticate Docker to the Azure Container Registry:

    az acr login --name myregistry

  5. Tag Your Image with the Full ACR Login Server Name: To push an image to ACR, it needs to be tagged with the full ACR login server name. First, retrieve the login server name:

    az acr list --resource-group myresourcegroup --query "[].{acrLoginServer:loginServer}" --output table

    Once you have the login server name (something like myregistry.azurecr.io), tag your image:

    docker tag myapp:v1 myregistry.azurecr.io/myapp:v1

  6. Push the Image to ACR: Now you can push the image to your Azure Container Registry:

    docker push myregistry.azurecr.io/myapp:v1

  7. Verify: You can verify that your image was successfully pushed by listing the images in your ACR:

    az acr repository list --name myregistry --output table

    And to see the tags for a specific image:

    az acr repository show-tags --name myregistry --repository myapp --output table

    You should see v1 in the list of tags for myapp.

  8. Optional - Log out: After you’ve finished working with the registry, it’s good practice to log out of it:

    docker logout myregistry.azurecr.io

That’s it! Your myapp:v1 image is now published to your Azure Container Registry. Whenever you want to deploy or run this image from the registry, you’ll pull from myregistry.azurecr.io/myapp:v1.