^M line endings

From De Vliegende Brigade

A nice case concerning line endings - 2023.12.21

Note that every line in the file ends with ^M. According to Vim's command :set fileencoding, the character encoding is UTF-8.

The problem

The ^M characters you're seeing in Vim are carriage return characters (CR). In ASCII, and therefore also in UTF-8, a carriage return is the single byte 0x0D (decimal 13). Vim displays it in caret notation as ^M because it corresponds to Ctrl+M.
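To see this outside Vim, here is a small POSIX shell sketch (the sample text is made up) that prints the byte value of a carriage return and shows how cat -v renders it in the same caret notation Vim uses:

```sh
#!/bin/sh
# A carriage return is the single byte 0x0D (decimal 13).
printf '\r' | od -An -tx1             # prints the byte in hex: 0d

# cat -v shows control characters in caret notation, like Vim does:
# the CR (Ctrl+M) becomes ^M, while the LF that ends the line is kept.
printf 'User-agent: *\r\n' | cat -v   # prints: User-agent: *^M
```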

Files created or edited on Windows typically end each line with a carriage return followed by a line feed (\r\n). Unix-based systems such as Linux and macOS use the line feed (\n) alone.

When a file created on Windows is opened in an editor on a Unix-based system, the carriage return (\r, shown as ^M) may appear at the end of each line. Unix tools treat only the line feed as the line terminator, so the carriage return before it is displayed as part of the line's text, whereas Windows editors treat the \r\n pair as a single line break.
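To check how many lines of a file actually end in a carriage return, a minimal POSIX shell sketch could look like this. The function name count_cr_lines and the demo file contents are illustrative assumptions, not part of any tool:

```sh
#!/bin/sh
# count_cr_lines FILE: print how many lines of FILE end in a carriage return,
# i.e. how many lines have Windows-style (\r\n) line endings.
count_cr_lines() {
    cr=$(printf '\r')        # a literal CR character, portable to POSIX sh
    grep -c "${cr}\$" "$1"   # grep -c prints the number of matching lines
}

# Demo on a temporary file with two CRLF lines and one LF-only line.
tmp=$(mktemp)
printf 'line one\r\nline two\r\nline three\n' > "$tmp"
count_cr_lines "$tmp"        # prints 2
rm -f "$tmp"
```

A result equal to the file's total line count (as reported by wc -l) means every line has Windows line endings.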

A problem for robots.txt?

In general, Windows line endings (\r\n) in robots.txt are unlikely to break the file. Search engine crawlers such as Googlebot read and interpret its content, and the Robots Exclusion Protocol standard (RFC 9309) explicitly accepts LF, CR and CRLF as line endings.

Still, it is common practice to use Unix-style line endings (\n) for web-related files like robots.txt, mainly because web servers such as Apache usually run in a Unix-based environment.

Crawlers are generally designed to handle different line endings, but other software that processes the file may expect one specific format and behave unexpectedly otherwise. It is therefore advisable to stay consistent and use Unix-style line endings for files served by Unix-based web servers.

Solutions

How to change all \r\n line endings in a file to \n line endings?

Vim

In Vim, you can remove all carriage returns with the following command and then save the file with :w:

:%s/\r//g

Sublime Text

Sublime Text doesn't seem very useful for detecting a file's encoding: it handles various encodings by itself and, AFAIK, doesn't display the encoding anywhere. It is very handy for detecting and changing line endings, though (see screenshot).

File » Save with encoding » UTF-8
Setting file encoding in Sublime Text: File » Save with Encoding » ...
Viewing and setting line endings in Sublime Text: View » Line Endings

sed

Supposedly (I haven't tried it), you can change line endings with sed:

sed -i 's/\r$//' yourfile.txt
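The -i flag without a suffix and the \r escape in the pattern are GNU sed features; BSD sed, as shipped with macOS, requires an argument after -i and may not understand \r. As a swapped-in alternative, here is a portable sketch based on tr, which deletes every carriage return (not only those at the end of a line) and writes the result to a new file. The function name crlf_to_lf is hypothetical:

```sh
#!/bin/sh
# crlf_to_lf SRC DST: write a copy of SRC with all carriage returns removed.
# tr -d '\r' relies only on POSIX features, so it also works without GNU sed.
crlf_to_lf() {
    tr -d '\r' < "$1" > "$2"
}

# Demo: convert a small CRLF file and inspect the result with od -c.
src=$(mktemp); dst=$(mktemp)
printf 'User-agent: *\r\nDisallow:\r\n' > "$src"
crlf_to_lf "$src" "$dst"
od -c "$dst"                 # only \n remains at the line ends, no \r
rm -f "$src" "$dst"
```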

dos2unix

Alternatively, if you prefer a dedicated command, dos2unix is an appropriate tool. On Debian or Ubuntu, install it with sudo apt install dos2unix; then dos2unix yourfile.txt converts the file in place.
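For scripts that should also run on machines without dos2unix, one possible sketch is a wrapper that uses dos2unix when it is installed and falls back to tr otherwise. The function name to_unix is hypothetical, and the fallback is a substitute technique, not part of dos2unix:

```sh
#!/bin/sh
# to_unix FILE...: convert each FILE to Unix line endings in place.
# Uses dos2unix when available (-q suppresses its progress messages);
# otherwise falls back to tr via a temporary file. Note that the tr
# fallback removes every CR, even one that is not followed by LF.
to_unix() {
    for f in "$@"; do
        if command -v dos2unix >/dev/null 2>&1; then
            dos2unix -q "$f"
        else
            # Copy back with cat (instead of mv) to keep FILE's permissions.
            tmp=$(mktemp) && tr -d '\r' < "$f" > "$tmp" &&
                cat "$tmp" > "$f" && rm -f "$tmp"
        fi
    done
}

# Demo on a temporary file.
demo=$(mktemp)
printf 'Disallow: /tmp\r\n' > "$demo"
to_unix "$demo"
cat -v "$demo"               # prints: Disallow: /tmp
rm -f "$demo"
```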