Introduction
The main goal of this article is to learn more about streaming data management using the file concept. It will be considered from the operating system and program context points of view. I will keep the discussion general enough to be universal, practical, and portable. To be useful, all examples are prepared using the C# programming language and are gathered in the public GitHub repository Programming in Practice Repository. Clone the repository on your computer and open the solution ExDataManagement in MS Visual Studio to follow the examples.
Check out the examples directly using embedded links. Hopefully, you will join the GitHub community. To follow any activity in the repository, switch on the watch functionality. If you find the project interesting, please "star" the repository. Starring a repository also shows appreciation to the maintainers for their work. You can star repositories to keep track of projects you find interesting and discover related content in your news feed. When you star the repository, it gets added to your list of starred repositories, which you can access from your GitHub profile page.
Files Management
If we write a program to automate information processing, we inevitably have to operate on data representing this process. Generally, we can distinguish operations related to reading input data, permanently preserving intermediate data, transferring data between individual applications, and saving the final data somewhere after the entire processing is complete. All these requirements can be met using the concept of a file. Even sending data between applications can be done using a file server, a distributed file system, Google Drive, OneDrive, or a pen drive, to name only the most popular ones.
This is where the term file system comes into play. Without going into details about the architecture of the computer and the operating system, we can state that the file system is a resource available in every modern computer.
The file concept is integral to an operating system (OS). An operating system plays a crucial role in managing files on a computer. The operating system uses a file system to organize and store files. Files comprise content and metadata describing the content. The file content is a bitstream because a computer is a binary device. File metadata is information that provides additional details about the file itself. For example, metadata might reveal the author of a file, previous revisions, or personalized comments associated with the file. It's like a file description, encompassing various attributes that help organize and manage files effectively.
By design, the measures offered by modern operating systems to manage files can be grouped as follows:
- Graphical (GUI) or Text-based (TUI) interfaces
- Application Programming Interface (API)
Operating System
By design, users who manage files through an operating system depend on the graphical (GUI) or text-based (TUI) interface it offers. Operating systems make file management intuitive, allowing users to manage files without extensive technical knowledge. Unfortunately, this approach limits the scope of file management to the functionality embedded by the operating system vendors.
In other words, the operating system may be recognized as a bridge between GUI/TUI and the underlying file system, enabling efficient file management.
Program
Concerning the program context, we typically use object-oriented programming. This means that we must deal with reference types at compile time and with objects in the computer's working memory (RAM) at runtime. Let me remind you that RAM stands for Random Access Memory. Here, random means that each word in memory has an address, i.e. a unique identifier, and that this word can be read or written independently of the others. Let me stress that we are talking about freedom of access, not probability. The RAM address plays a role similar to that of a URL.
On the other hand, we have the streaming world where the data is organized in the form of bitstreams managed, for example, using a file system.
To overcome the flexibility problems and limits imposed by the functionality embedded in operating systems, files can be managed using custom applications. Typically, the number of applications is huge, so the problem of duplicated functionality arises. A typical solution is for the operating system to expose useful operations through an application programming interface (API). With this approach, users interact with files through the API, which improves flexibility. These custom applications provide the user with operations using metadata.
In other words, the operating system may also be recognized as a bridge between a custom program using API and the underlying file system, enabling customization of file management.
Operating System Context
An important feature of a file concept is that it contains content in addition to metadata. The content includes data representing information to be processed. Metadata provides additional details about the file itself.
Files include metadata, i.e. data describing data. One such description is the file identifier. It plays two roles. One is that it unambiguously distinguishes a file from all other ones. In this role, it is recognized as a Uniform Resource Identifier (URI) - a unique identifier of a file, among others. It also indicates the location where the file may be found by the file system. In this role, it is a Uniform Resource Locator (URL). We also have other metadata such as date of creation, author, length, and many others.
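The metadata mentioned above can also be read from a program. The following is a minimal sketch, assuming the .NET FileInfo class is used; the class name MetadataSketch and the helper CreateAndDescribe are hypothetical and are not part of the repository.

```csharp
using System;
using System.IO;

public static class MetadataSketch
{
    // Hypothetical helper: creates a small temporary file and returns its metadata.
    public static FileInfo CreateAndDescribe(string content)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        File.WriteAllText(path, content);
        return new FileInfo(path);
    }

    public static void Main()
    {
        FileInfo info = CreateAndDescribe("Hello");
        // The full path both identifies the file and tells where to find it.
        Console.WriteLine("Identifier (full path): " + info.FullName);
        Console.WriteLine("Length in bytes:        " + info.Length);
        Console.WriteLine("Created (UTC):          " + info.CreationTimeUtc);
        Console.WriteLine("Last modified (UTC):    " + info.LastWriteTimeUtc);
        info.Delete(); // remove the temporary file
    }
}
```

Note that none of these members exposes the content of the file; they only describe it.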
To make the discussion more practical, let's use File Explorer to look at the .Media folder containing a few files in the repository used as illustrations in the examples:
We have different files there, but similar descriptive data, i.e. metadata, defined for all of them. Among these data, Name, Date, Type, Size, Date created, Date modified, and much more may be useful for daily use, but the content is not visible directly here. For example, the content can be accessed by double-clicking on the selected file. After that, an image appears.
Here, we may ask a question: How do we describe this behavior? Well, a program was launched. This program must have been written by some software developers. The program opens the file as input data, so the programmer is aware of how to use this file. The data contained in the file makes it possible to show the content graphically on the computer screen. This is the first example of graphical representation.
Program Context
From the program context perspective, files may be managed directly using the API of the operating system or indirectly using libraries embedded in the selected programming development environment.
Managing files directly using an operating system’s API involves interacting with the system’s kernel to perform file operations like creating, reading, writing, and deleting files. System calls are the primary way a program interacts with the operating system. They provide an interface for requesting services from the operating system kernel.
Alternatively, we can use libraries offering similar functionality, which removes the need to deal directly with the operating system API. In this approach, all the powerful features of the selected programming language may be used. Another advantage is that the final solution is portable. Hence, the examples that follow use only this approach.
File Example
Using a code snippet located in the FileExample class, the differences between a file and a stream may be explained from the program context point of view.
From this example, we may learn that File is a static class that represents the available file system and provides typical operations against this file system.
The content of the file is a bitstream, represented by the abstract Stream class. This class represents basic operations on a data stream (on the bitstream), which allows mapping the behavior of various media that can be used to store or transmit data as a bitstream. From this perspective, it follows that file content is always a bitstream (a stream of bytes).
Opening a File
File Class
Let's try the FileExample class. This class is referred to by the FileStreamUnitTest unit test. After executing the test, we can see that it succeeds.
Let us examine the behavior of files using the previously mentioned FileExample class, which contains the CreateTextFile method. The main responsibility of this method is to save text with the label 'Today is' and the current date to a file. To accomplish this requirement, a file is needed. The word File appears at the very beginning of the method.
File.Delete(name);
The F12 key will take us to the definition. From the definition, we can learn that this class is static. So there are no instances of it; we cannot create objects of this class. It is just an organizational container. So, this class cannot represent an individual file. It can only represent all files. It provides operations related to files, and I used one of them here: the operation that deletes the file whose name is passed as a parameter.
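Because File is static, every operation is invoked on the class itself, and the file concerned is selected by the name passed as an argument. The sketch below illustrates this; the class FileClassSketch, its method, and the file name are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.IO;

public static class FileClassSketch
{
    // Returns the file existence observed before creation, after creation, and after deletion.
    public static bool[] CreateAndDelete(string path)
    {
        File.Delete(path);                    // does not throw if the file is absent
        bool before = File.Exists(path);
        File.WriteAllText(path, "sample");    // creates the file
        bool created = File.Exists(path);
        File.Delete(path);                    // removes it again
        bool after = File.Exists(path);
        return new[] { before, created, after };
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "pip-file-class-sketch.txt");
        Console.WriteLine(string.Join(", ", CreateAndDelete(path))); // prints False, True, False
    }
}
```

No object representing the individual file is created anywhere in this sketch; the path string is the only link to it.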
Another interesting thing about this example is the Open operation. The question is, why perform the open operation on a file, and what is this operation used for? We want to save the text, yet we perform an open operation. Here, the answer is provided by a parameter of the FileAccess type. It is an enumeration type that provides all the options that can be used. I selected the write operation because I want to write to this file. This operation is fundamental to the use of files because it causes the file that is being created, or opened if it already exists, to become a critical section. What does it mean? It means that no other concurrent activity can operate on this file after we have opened it.
So, if this file were to be used or shared by multiple applications, a lock placed by the operating system would prevent this and allow only one concurrent activity to write to the file. This can have crucial consequences in a situation where, for example, a hospital keeps patient data in a file that doctors use in various places and that the reception also needs to access to add further names. After someone opens the file for writing, as in this example, no one else can use the file.
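The locking behavior can be observed with a small experiment. This is only a sketch: the class ExclusiveAccessSketch and the file name are hypothetical, and the result of the first attempt depends on the platform and on whether file locking is enabled in the runtime, so no particular output is claimed for it.

```csharp
using System;
using System.IO;

public static class ExclusiveAccessSketch
{
    // Tries to open the file for writing; returns false when the attempt is rejected.
    public static bool TryOpenForWrite(string path)
    {
        try
        {
            using (Stream second = File.Open(path, FileMode.OpenOrCreate, FileAccess.Write))
                return true;
        }
        catch (IOException)
        {
            return false; // typically reported when the file is in use elsewhere
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "pip-exclusive-access-sketch.txt");
        using (Stream first = File.Open(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            // The first stream is still open here, so this attempt is expected to be rejected.
            Console.WriteLine("Second open while the first is active: " + TryOpenForWrite(path));
        }
        // The first stream is closed, so the file is available again.
        Console.WriteLine("Open after the first stream is closed: " + TryOpenForWrite(path));
        File.Delete(path);
    }
}
```

When sharing is required, File.Open also has an overload that accepts a FileShare argument, which lets the caller state what other concurrent activities may do with the file.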
Stream Class
So, what's important to emphasize here is that the File class does not represent a file in the operating system context. This class represents a whole file system. It contains operations that can be performed on any file available to the computer.
The Open operation available in the File class creates an object (instance) of the Stream class type, as follows:
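using (Stream _stream = File.Open(name, FileMode.OpenOrCreate, FileAccess.Write))
{
    // Compose the text to be saved
    FileContent = String.Format(CultureInfo.InvariantCulture, "Today is {0}", DateTime.Now);
    // Convert the text to bytes; ASCII covers only the basic Latin characters
    byte[] _content = Encoding.ASCII.GetBytes(FileContent);
    // Write the whole byte array to the stream
    _stream.Write(_content, 0, _content.Length);
}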
Use the go-to definition menu entry to visit the definition of the Stream class. Let me stress that it is an abstract class. It means that it can represent not only files but also other resources. Thanks to its various implementations, we can ensure the polymorphic behavior of the various objects it represents. In simpler terms:
- If an instance of the Stream class represents a file in the file system, these operations will be performed by the operating system on behalf of a file system,
- If an instance of the Stream class represents, for example, a computer network, then the operations are performed on resources related to the computer network.
We will come back to this topic by discussing various examples in which the Stream class is inherited and its responsibility overridden by classes that represent different behaviors, i.e. polymorphic behaviors of various resources that we can use to store and manage data.
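The polymorphic behavior described above can be sketched with a method that depends only on the abstract Stream type. The class StreamPolymorphismSketch, the method WriteGreeting, and the file name are hypothetical; MemoryStream is used here merely as a second, easily available Stream implementation.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamPolymorphismSketch
{
    // Depends only on the abstract Stream type, not on any concrete medium.
    public static void WriteGreeting(Stream stream, string text)
    {
        byte[] content = Encoding.ASCII.GetBytes(text);
        stream.Write(content, 0, content.Length);
    }

    public static void Main()
    {
        // The same method writes to working memory ...
        using (MemoryStream memory = new MemoryStream())
        {
            WriteGreeting(memory, "Today is");
            Console.WriteLine("Bytes held in memory: " + memory.Length);
        }
        // ... and to a file managed by the file system.
        string path = Path.Combine(Path.GetTempPath(), "pip-stream-polymorphism-sketch.txt");
        using (Stream file = File.Open(path, FileMode.Create, FileAccess.Write))
        {
            WriteGreeting(file, "Today is");
        }
        Console.WriteLine("Bytes stored in the file: " + new FileInfo(path).Length);
        File.Delete(path);
    }
}
```

WriteGreeting does not know which resource receives the bytes; the concrete class behind the Stream reference decides what the Write operation actually does.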
The next line of code does not add much to the considerations regarding the use of files to store data processed by the program. This line is where the final formatting of the string of characters to be saved takes place. In the next lines of the program, we write to the file.
_stream.Write(_content, 0, _content.Length);
The file is represented by the Stream type. To write to it, we must prepare data. This means that a bitstream must be generated based on the text to be written to the file content.
Encoding
We must be aware of how the data can be prepared. Let's look at the definition of the Stream type. Analyzing the members that may be used to write to a variable of the Stream type, we see that all Write operations take bytes as their parameter. A byte is a sequence of eight bits. Hence, the data must be formatted as a bitstream when we use a stream.
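To make the byte-oriented nature of the write operations tangible, the following sketch converts a short text into the bytes that would be handed to a stream and prints them in hexadecimal form. The class BytesSketch and the method ToHex are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class BytesSketch
{
    // Renders each byte of the ASCII-encoded text as two hexadecimal digits.
    public static string ToHex(string text)
    {
        byte[] content = Encoding.ASCII.GetBytes(text);
        return BitConverter.ToString(content);
    }

    public static void Main()
    {
        Console.WriteLine(ToHex("Today")); // prints 54-6F-64-61-79
    }
}
```

Each pair of hexadecimal digits stands for one byte, i.e. eight bits, so the printed sequence is a compact notation of the bitstream.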
Since a stream of characters must be specially prepared in some way to be saved in a file, there must be a relationship between the stream of characters, i.e. text, and the binary content of the file. Let me remind you that at the very beginning, it was stated that any program is also a text. Let's look at this example, which starts with these two characters.
I have files of different types here, which indicates that they hold data for various programs. For example, if we double-click the FileExample.cs file, it is opened in a text editor. But I can also view this file as hexadecimal code. This means that a file is a sequence of bytes. Because each byte is a sequence of bits, we can conclude that the content of a file is a sequence of bits.
If we open this file in a program that allows us to analyze the content at the binary level, we will see that this file does not start with "\" as is presented on the screen. These first two characters appear in the content, but later. This, among other things, indicates that there is some kind of ambiguity between the text that is displayed on the computer screen, i.e. here, the first characters, and the binary content of the file. We call this relationship between the text and the bitstream encoding.
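One possible cause of such a difference, offered here as an assumption rather than as a statement about the file discussed above, is a byte order mark that some encodings place in front of the text. The sketch below writes an arbitrary two-character sample with and without this prefix; the class PreambleSketch and the method RawBytes are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleSketch
{
    // Writes the text with the given encoding and returns the raw bytes stored in the file.
    public static byte[] RawBytes(string text, Encoding encoding)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        File.WriteAllText(path, text, encoding);
        byte[] raw = File.ReadAllBytes(path);
        File.Delete(path);
        return raw;
    }

    public static void Main()
    {
        // Encoding.UTF8 emits a three-byte prefix before the text.
        Console.WriteLine(BitConverter.ToString(RawBytes("ab", Encoding.UTF8)));            // prints EF-BB-BF-61-62
        // UTF8Encoding(false) is the same encoding without the prefix.
        Console.WriteLine(BitConverter.ToString(RawBytes("ab", new UTF8Encoding(false)))); // prints 61-62
    }
}
```

A text editor hides the prefix and shows only the two characters, whereas a binary viewer shows all five bytes.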
We have different standards for converting bits to characters and characters to bits. One of them is the ASCII standard. It is a widely known standard that defines a table telling how to represent characters in binary form. The table is fixed; therefore, the number of characters is strictly defined.
To observe how to convert bits to characters and characters to bits, we must return to the FileExample class. This class is referred to by the FileStreamUnitTest unit test. After executing the test, we saw that it was successful.
But let's try to replace the Today is label with the Polish translation dziś jest and execute the test again. Unfortunately, the result shows that the test hasn't passed. The behavior of our program is different because we introduced Polish letters. The main reason for this problem is that I used an encoding that doesn't contain Polish letters. More precisely, the set of characters it represents doesn't contain Polish letters.
If we apply an encoding that supports Polish letters, the test is green - it means that it passed. This means that the file's content corresponds to the stream of characters containing the national letters. Hence, it can be concluded that the bitstream becomes text after directly or indirectly applying an encoding. The set of valid characters in the stream depends on the selected encoding.
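The effect of the selected encoding can be sketched as a round trip from text to bytes and back. The class EncodingRoundTripSketch and its method are hypothetical, and UTF-8 is used here as one example of an encoding that supports Polish letters; the article does not state which encoding the repository uses. The letter ś is written as the escape \u015B so that the sketch does not depend on the encoding of the source file.

```csharp
using System;
using System.Text;

public static class EncodingRoundTripSketch
{
    // Encodes the text to bytes and decodes it back using the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] content = encoding.GetBytes(text);
        return encoding.GetString(content);
    }

    public static void Main()
    {
        string text = "dzi\u015B jest"; // "dziś jest"
        // ASCII has no code for the Polish letter, so the original text cannot be recovered.
        Console.WriteLine(RoundTrip(text, Encoding.ASCII) == text); // prints False
        // UTF-8 can represent the letter, so the round trip preserves the text.
        Console.WriteLine(RoundTrip(text, Encoding.UTF8) == text);  // prints True
        // The same text occupies a different number of bytes in each encoding.
        Console.WriteLine(Encoding.ASCII.GetByteCount(text));       // prints 9
        Console.WriteLine(Encoding.UTF8.GetByteCount(text));        // prints 10
    }
}
```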
To recognize text as a document, semantic rules must be associated with the bitstream, which allows meaning to be assigned to it. As a result, a text-based domain-specific language (DSL) is defined. A DSL is a text-based language dedicated to expressing concepts and data within a specific area. Apart from programming languages like Java, C#, and Python, examples of well-known and widely accepted domain-specific languages are the JSON, XML, and YAML formats, to name only the most important.
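The path from a bitstream through text to a document can be sketched in two steps: an encoding turns the bytes into text, and the rules of a format turn the text into something with meaning. The sketch assumes the System.Text.Json library is available; the class DocumentSketch, the method ReadLabel, and the "label" property are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class DocumentSketch
{
    // Bitstream -> text (encoding) -> document (JSON rules).
    public static string ReadLabel(byte[] content)
    {
        string text = Encoding.UTF8.GetString(content);
        using (JsonDocument document = JsonDocument.Parse(text))
        {
            return document.RootElement.GetProperty("label").GetString();
        }
    }

    public static void Main()
    {
        byte[] content = Encoding.UTF8.GetBytes("{\"label\":\"Today is\"}");
        Console.WriteLine(ReadLabel(content)); // prints Today is
    }
}
```

Without the JSON rules, the same bytes would be just a string of characters; with them, the program can ask for the value assigned to a named property.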
Closing a File
The last thing that remains to be explained is the close operation, which we must perform on the stream after finishing working with this file. Since the open operation appeared at the beginning, the closing operation must appear at the end. It is, again, fundamentally important because it closes the file, which means that the critical section is no longer needed. So, from now on, others will also be able to use this file - they will be able to open this file. Therefore, it should appear immediately after finishing working with this file. This means that we will not be able to perform further operations on this file within the program.
The question is what will happen when, for example, an exception occurs in the program between opening a file and closing it. The throw statement breaks the sequence of statements to be executed, which puts the close operation at risk of being skipped. As a result, the Close operation may never be executed.
A modern execution environment will eventually close this file at some later point in time. However, this will not happen immediately, and the file will be locked longer than needed. To close the file immediately, we can take advantage of the fact that Stream implements the IDisposable interface, which allows the use of the using statement. The using statement causes the Dispose operation to be executed against the Stream variable as the last method invocation before exiting the using visibility scope. If the block of statements that is part of the using statement is interrupted, the Dispose operation is still executed. Thanks to this, we can ensure that the file will be closed immediately when the next program statements no longer have access to the Stream variable because it goes out of the visibility scope.
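The guarantee given by the using statement can be checked with a sketch that deliberately throws an exception inside the block. The class UsingSketch, its method, and the file name are hypothetical; the check relies on the fact that a closed stream reports that it can no longer be written to.

```csharp
using System;
using System.IO;

public static class UsingSketch
{
    // Returns true when the stream was closed even though an exception interrupted the block.
    public static bool ClosedDespiteException(string path)
    {
        Stream captured = null;
        try
        {
            using (Stream stream = File.Open(path, FileMode.OpenOrCreate, FileAccess.Write))
            {
                captured = stream; // kept only to inspect the stream after the block
                throw new InvalidOperationException("Simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Dispose has already been executed when the exception arrives here.
        }
        return !captured.CanWrite; // a closed stream reports CanWrite == false
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "pip-using-sketch.txt");
        Console.WriteLine(ClosedDespiteException(path)); // prints True
        File.Delete(path); // possible because the file is no longer open
    }
}
```

The using statement behaves like a try block with a finally clause that calls Dispose, which is why the file is released regardless of how the block is left.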
Worth To Remember
- Users who manage files through an operating system depend on the graphical (GUI) or text-based (TUI) interface it offers.
- Operating systems make file management intuitive, allowing users to manage files without extensive technical knowledge.
- Operating systems may be recognized as a bridge between GUI/TUI and the underlying file system.
- Concerning program context, typically, we utilize object-oriented programming. Hence, we must deal with reference types at compile-time and objects in a computer's working memory (RAM) at runtime. On the other hand, we have the streaming world where the data is organized in the form of bitstreams managed, for example, using a file system.
- An important feature of a file concept is that it encompasses content and metadata.
- From the program context perspective, files may be managed directly using the API of the operating system or indirectly using libraries embedded in the selected programming development environment.
- File is a static class that represents the available file system and provides typical operations against this file system.
- An open operation causes the file to become a critical section.
- If an instance of the Stream class represents a file in the file system, these operations will be performed by the operating system on behalf of a file system.
- If an instance of the Stream class represents a computer network, the operations are performed on resources related to the computer network.
- Analyzing the members that may be used to write to a variable of the Stream type, we see that all Write operations take bytes as their parameter.
- The set of valid characters in the stream depends on the selected encoding.
- We have different standards for converting bits to characters and characters to bits. One of them is the ASCII standard.
- To recognize text as a document, semantic rules must be associated with the bitstream, which allows us to understand the meaning of the bitstream.
- A close operation causes the file critical section to be released.
See Also
To learn more, consider checking out the following additional resources.
- Postol Mariusz; Programming in Practice (PiP) Discipline, C# Corner, Feb 27, 2024.
- Postol Mariusz; External Streaming Data - Bitstream Format, C# Corner, Feb 14, 2024.
- Postol Mariusz; Information Computation Mastery: Serialization, C# Corner, Apr 01, 2024.
- Postol Mariusz; PiP - External Streaming Data - Useful Concepts - Part 1, C# Corner, Mar 27, 2024.
- Programming in Practice; GitBook eBook - The content of this eBook is auto-generated using the Markdown files collected in this repository. It is distributed online according to the open access rules.
- Programming in Practice - GitHub repository
- Programming in Practice - Discussion panel
- ASCII. (2024, August 25). In Wikipedia.
- Join me on LinkedIn
- Postol M, profile on GitHub.com
- Postół M, profile on ResearchGate
- Postół M, profile on YouTube
- Postół M, profile on NuGet
- Postół M, profile on ORCID