C-Style ADTs & Strings in C++

What is an Abstract Data Type (ADT)?

One of the things that was super confusing when I started programming was "What does abstract mean in this context?" Well, the definition of the word abstract is "existing in thought or as an idea but not having a physical or concrete existence." Okay, but what does that mean for code? How do you have code that doesn't have a 'concrete existence'? This is a great question, but in order to answer that, let's first understand the definition of an Abstract Data Type (ADT), then show an example, at which point it all becomes a LOT clearer.

An Abstract Data Type defines what you can do with data without specifying how it's actually implemented. It's essentially a contract that describes the operations available on a data type, separate from the underlying implementation details. ADTs serve as a blueprint for creating structures (that word is important for later) that bundle together both data and the operations that can be performed on that data. There are varying levels of complexity to this, but with C, it's a lot more fun & very simple to understand.

C-Style ADTs

In C, we can create ADTs using the keyword struct (structures!). A struct is a user-defined data type that allows us to group related variables together under a single name. This is super useful for organizing data and creating more complex data types. Here's an example of a simple struct that represents a Person:
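A minimal sketch of what such a struct might look like. The field names and values are assumptions for illustration (only height and nickname are mentioned elsewhere in this post), and make_murphy() is a hypothetical helper that just shows the declaration in action:

```cpp
#include <string>

// Hypothetical Person: the exact fields are an assumption for this sketch.
struct Person {
    std::string name;
    std::string nickname;
    int age;
    float height;  // in feet
};

// Declares a Person the same way you would declare any other variable:
// [type] [variable] = [value]
Person make_murphy() {
    Person murphy = {"Murphy", "Murph", 30, 6.0f};
    return murphy;
}
```

Members are read and written with the dot operator, for example murphy.height or murphy.age.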

The syntax is painfully easy. A struct, for all intents & purposes, is literally a new type, so you declare it using the same syntax as you would other types: [type] [variable] = [value]. This is what makes it an ABSTRACT data type. You define, not a Person, but what a Person is. You define what constitutes a Person, aka... an abstract concept :).

Syntax with pointers

The syntax for using structs with pointers is slightly different, but still very easy to understand. When you have a pointer to a struct, you use the -> operator to access its members instead of the dot . operator.
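Here's a small sketch of the two operators side by side. The Person fields and both helper functions are made up for illustration:

```cpp
// Trimmed-down, hypothetical Person for this sketch.
struct Person {
    int age;
    float height;  // in feet
};

// Through a pointer, use the arrow: ptr->age is shorthand for (*ptr).age.
int age_via_pointer(const Person* ptr) {
    return ptr->age;
}

// Through an object (or a reference), use the dot.
int age_via_object(const Person& person) {
    return person.age;
}
```

Given Person murphy and Person* murphy_ptr = &murphy, both murphy.age and murphy_ptr->age name the same piece of memory.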

For this next example, I'm going to show you some cleaner code by doing something called respecting the interface. This is a programming concept where you only interact with a data structure through its defined methods (functions) rather than by accessing its internal data directly.

So, for example, if we have a Person, we COULD modify the object directly via murphy.height += 0.2f;, but that's not really smart for long term maintenance. What if we want to validate that height is between 0 and 9 feet? Sure, we could add an if statement right before that, but what if we need to modify it multiple times throughout our program? We'd have to duplicate that validation logic, which is tedious & error-prone.

A better approach is to use dedicated functions: set_height(murphy_ptr, 6.2f); instead of murphy.height = 6.2f;. This way, our validation logic lives in one place. This idea of encapsulation—hiding implementation details and controlling access through functions—is a key principle in software design.

Each function follows the Single Responsibility Principle (SRP): set_height() validates and sets height, set_nickname() validates and sets nickname, etc. A core tenet of being a good programmer is being lazy in the right way—delegate responsibility whenever possible and avoid repeating yourself. This is formalized as the Don't Repeat Yourself (DRY) principle.
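A rough sketch of what these pointer-based functions could look like. The 0 to 9 foot range comes from the example above; the bool return values, the 20-character nickname cap, and get_height() are assumptions added for illustration:

```cpp
#include <string>

// Trimmed-down, hypothetical Person for this sketch.
struct Person {
    std::string nickname;
    float height = 0.0f;  // in feet
};

// Validates and sets height. Returns false and changes nothing on bad input.
bool set_height(Person* person, float height) {
    if (person == nullptr) return false;
    if (height < 0.0f || height > 9.0f) return false;
    person->height = height;
    return true;
}

// Validates and sets nickname. The length cap is an assumed rule.
bool set_nickname(Person* person, const std::string& nickname) {
    if (person == nullptr) return false;
    if (nickname.empty() || nickname.size() > 20) return false;
    person->nickname = nickname;
    return true;
}

// Reads height through the interface; returns 0 for a null pointer.
float get_height(const Person* person) {
    if (person == nullptr) return 0.0f;
    return person->height;
}
```

With this in place, a call like set_height(murphy_ptr, 12.0f) is rejected in one spot instead of every caller having to remember the range check.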

You fucking liar.. "Don't repeat yourself", sure bro. What's with all of the if (person == nullptr) checks??? Can we cancel this dude???

I PROMISE, I suffered more writing that code than you did reading it. My brain itches when I see that code as well, and my first instinct was to abstract the checks away. I left it this way because I couldn't think of a better way to illustrate the pointer syntax without making the example more complex than necessary, or introducing a new example that would detract from the main point of this post.

In real-world code, you'd probably want to use references instead of pointers. A reference can't be null; it must always refer to a valid object, which eliminates the need for null checks.
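As a sketch, here is the same kind of setter taking a reference. The Person layout and the bool return value are assumptions for illustration:

```cpp
// Trimmed-down, hypothetical Person for this sketch.
struct Person {
    float height = 0.0f;  // in feet
};

// No null check needed: a reference always refers to an existing Person.
bool set_height(Person& person, float height) {
    if (height < 0.0f || height > 9.0f) return false;
    person.height = height;  // dot, not arrow, because person is not a pointer
    return true;
}
```

The call site changes from set_height(&murphy, 6.2f) to set_height(murphy, 6.2f).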

You could also have a validation function that checks all fields of the struct at once, but that would still violate the single responsibility principle. There is such a thing as over-abstraction / over-engineering. The key takeaway here is understanding how to use structs with pointers and the importance of encapsulation and single responsibility in function design.

C-Style Strings

In C (and by extension, C++), strings aren't a built-in type like int or float. Instead, they're represented as arrays of characters with a special marker at the end of the array, which is a null character '\0'. The null character indicates the end of the string.

This is fundamentally different from std::string in C++, which is an actual class that handles all the messy details for you. With C-style strings, you're working directly with raw memory. For example:
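A small sketch of what that raw memory looks like. The variable names and my_strlen() are made up for illustration; my_strlen() only mimics what the standard std::strlen does:

```cpp
#include <cstddef>
#include <cstring>

// Two equivalent ways to spell the same 6-byte array.
const char greeting[] = "Hello";                             // compiler appends '\0'
const char spelled_out[] = {'H', 'e', 'l', 'l', 'o', '\0'};  // terminator written by hand

// Counts characters up to (but not including) the terminator.
std::size_t my_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

Here sizeof(greeting) is 6 while my_strlen(greeting) is 5: the array has room for five visible characters plus the terminator.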

That '\0' at the end is invisible when you print the string, but it's absolutely critical. Without it, functions that work with strings wouldn't know where to stop reading. They'd just keep going through memory until they happen to hit a zero byte, which could be... literally anywhere. Reading past the end like this is a buffer over-read, which is undefined behavior and a close cousin of the buffer overflow vulnerabilities covered below.

Where is this actually applicable? If you read my post on Streams & I/O, you'll remember that argv is actually an array of C-style strings:
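A minimal sketch of walking over argv. The helper total_arg_length() is hypothetical; its parameters simply mirror the argc and argv that main receives:

```cpp
#include <cstddef>
#include <cstring>

// Adds up the lengths of all arguments.
std::size_t total_arg_length(int argc, char* argv[]) {
    std::size_t total = 0;
    for (int i = 0; i < argc; ++i) {
        total += std::strlen(argv[i]);  // each argv[i] is a '\0'-terminated char*
    }
    return total;
}
```

From int main(int argc, char* argv[]) you would call it as total_arg_length(argc, argv).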

Otherwise, just use std::string in C++. But why, you ask?

Why do people hate C-style strings?

Because C-style strings are just arrays, they inherit all the limitations of arrays. You can't resize them. You can't concatenate them with +. You can't meaningfully compare them with ==, because that compares the addresses of the two arrays rather than their contents. All of that convenient syntax you might expect simply doesn't exist.

Instead, you have to use library functions from <cstring> (or <string.h> in pure C):
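A short sketch using four of the most common ones: std::strcpy, std::strcat, std::strcmp, and std::strlen. The helper names same_text() and join_names() are made up for illustration, and join_names() trusts the caller to pass a big enough buffer:

```cpp
#include <cstddef>
#include <cstring>

// Compares contents character by character; strcmp returns 0 when they match.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Writes first + " " + last into out. The caller must make sure out holds at
// least strlen(first) + strlen(last) + 2 bytes; nothing here checks that.
std::size_t join_names(char* out, const char* first, const char* last) {
    std::strcpy(out, first);  // copies first, including its terminator
    std::strcat(out, " ");    // finds the terminator, then appends
    std::strcat(out, last);
    return std::strlen(out);  // walks the array until it finds '\0'
}
```

For example, with char full[32], calling join_names(full, "Ada", "Lovelace") leaves "Ada Lovelace" in full and returns 12.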

As you can see, the biggest issue with C-style strings is that they don't know their own size. These functions all rely on that null terminator to know when to stop. If you forget to include it, these functions will happily read past the end of your string into whatever garbage happens to be in memory next.

This can actually be pretty dangerous, too. There's no built-in protection against writing too much data into a string buffer. This is called a buffer overflow, and it's one of the most common sources of security vulnerabilities in C/C++ programs.

That second line will write way beyond the end of buffer, corrupting whatever data happens to be there. In the best case, your program crashes. In the worst case, an attacker exploits this to run malicious code.

So why do they exist?

C-style strings are a relic from when C was designed in the early 1970s. Back then, memory was extremely limited and every byte counted. The design was simple and efficient: strings are just arrays of characters, nothing more. No fancy classes, no safety checks, just raw pointers and memory.

Today, we still use them for a few reasons:

  • Legacy code: Tons of existing C code uses them
  • C library compatibility: Many system APIs expect char* pointers
  • Performance: In very specific cases, they can be faster than std::string
  • Low-level programming: Embedded systems and OS code often work with raw memory
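To make the C library compatibility point concrete, here is a small sketch of handing a std::string to a C function through c_str(). std::strtol is a real C library function that expects a const char*; the wrapper parse_long() and its fallback parameter are hypothetical:

```cpp
#include <cstdlib>
#include <string>

// Parses a decimal integer from the start of text using the C library.
// Returns fallback if text does not begin with a number.
long parse_long(const std::string& text, long fallback) {
    const char* raw = text.c_str();  // '\0'-terminated view, owned by text
    char* end = nullptr;
    long value = std::strtol(raw, &end, 10);
    return (end == raw) ? fallback : value;  // end == raw means nothing was parsed
}
```

The pointer returned by c_str() is only valid while the std::string is alive and unmodified, so it's best used immediately like this.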

The modern alternative

Like I mentioned earlier, in C++, you should just use std::string instead:
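A quick sketch of the same kinds of operations with std::string. The function names are made up for illustration:

```cpp
#include <string>

// Concatenation with + and += just works; the string grows as needed.
std::string make_greeting(const std::string& name) {
    std::string greeting = "Hello, ";
    greeting += name;
    return greeting + "!";
}

// == compares the contents, not the addresses.
bool same_text(const std::string& a, const std::string& b) {
    return a == b;
}
```

For example, make_greeting("Murphy") returns "Hello, Murphy!", and calling size() on the result gives 14 without scanning for a terminator.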

std::string handles the memory management and resizing for you. It knows its own length, it can grow and shrink as needed, and it offers bounds-checked access through at() when you want it.

C-style strings are still around because C++ maintains backward compatibility with C, but in modern C++ code, you rarely need to touch them directly unless you're interfacing with older C libraries or doing very low-level work.