
Add String.lines/1 #14493

Open: wants to merge 2 commits into main
lib/elixir/lib/string.ex: 32 additions, 0 deletions
@@ -574,6 +574,38 @@ defmodule String do
    end
  end

  @doc ~S"""
  Returns the list of lines in a string, preserving their line endings.

  If you would like the lines without their line endings, use
  `String.split(string, ["\r\n", "\n"])`.

  ## Examples

      iex> String.lines("foo\r\nbar\r\nbaz")
      ["foo\r\n", "bar\r\n", "baz"]

      iex> String.lines("foo\nbar\nbaz")
      ["foo\n", "bar\n", "baz"]

      iex> String.lines("")
      [""]

  """
  @doc since: "1.19.0"
  def lines(string) do
    lines(string, <<>>)
  end

  defp lines(<<?\n, rest::binary>>, acc),
    do: [<<acc::binary, ?\n>> | lines(rest, <<>>)]

  defp lines(<<char, rest::binary>>, acc),
Contributor:

Shouldn't we handle `\r` as well? The Python version does:

"asd\radf".splitlines(True)
['asd\r', 'adf']

Side note: LSP document synchronisation treats `\r` as a legit newline as well.
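A minimal sketch of the reviewer's suggestion, assuming a hypothetical module `LinesWithCR` (not part of Elixir or of this PR): it keeps the clause-by-clause style of the proposed `String.lines/1` and additionally ends a line at a lone `\r`.

```elixir
defmodule LinesWithCR do
  # Hypothetical module, not part of Elixir: treats "\r\n", "\n" and a
  # lone "\r" as line endings, keeping each ending on the line it closes.
  def lines(string), do: lines(string, <<>>)

  # "\r\n" is matched before a lone "\r" so the pair stays one ending.
  defp lines(<<?\r, ?\n, rest::binary>>, acc),
    do: [<<acc::binary, ?\r, ?\n>> | lines(rest, <<>>)]

  defp lines(<<?\r, rest::binary>>, acc),
    do: [<<acc::binary, ?\r>> | lines(rest, <<>>)]

  defp lines(<<?\n, rest::binary>>, acc),
    do: [<<acc::binary, ?\n>> | lines(rest, <<>>)]

  # Any other byte is accumulated into the current line.
  defp lines(<<char, rest::binary>>, acc),
    do: lines(rest, <<acc::binary, char>>)

  defp lines(<<>>, acc), do: [acc]
end

LinesWithCR.lines("asd\radf")
# => ["asd\r", "adf"]
```

The clause order carries the design: because the `\r\n` clause precedes the lone `\r` clause, Windows endings are never split into two lines.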

Member Author:

The Ruby one does not, and I am skeptical about doing so because `\r` is clearly not a newline on terminals:

$ iex
Erlang/OTP 27 [erts-15.1.2] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]

Interactive Elixir (1.19.0-dev) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> IO.puts "foo\rbar"
bar
:ok

So treating it as a newline looks very Windows centric. :)

Contributor:

Perhaps support the Unicode Line Separator (U+2028)? That would seem consistent with String.split/1, which uses the Unicode definition of whitespace.
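A sketch of that suggestion under the same caveats: `LinesWithLS` is a hypothetical module, and only U+2028 is added on top of the PR's `\n` rule. `\r\n` endings still come out whole because `\r` is simply accumulated before the `\n` closes the line.

```elixir
defmodule LinesWithLS do
  # Hypothetical module, not part of Elixir: the PR's "\n" rule plus
  # U+2028 LINE SEPARATOR, each kept at the end of the line it closes.
  def lines(string), do: lines(string, "")

  defp lines("\n" <> rest, acc), do: [acc <> "\n" | lines(rest, "")]
  defp lines("\u2028" <> rest, acc), do: [acc <> "\u2028" | lines(rest, "")]
  # Any other byte (including "\r") is accumulated into the current line.
  defp lines(<<byte, rest::binary>>, acc), do: lines(rest, <<acc::binary, byte>>)
  defp lines("", acc), do: [acc]
end

LinesWithLS.lines("asd\u2028adf")
# => ["asd\u2028", "adf"]

LinesWithLS.lines("a\r\nb")
# => ["a\r\n", "b"]
```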

Contributor:

> So treating it as a newline looks very Windows centric. :)

Wasn't it classic Mac OS?

Contributor:

> Perhaps support the Unicode Line Separator? That would seem consistent with String.split/1 which uses the Unicode definition of whitespace.

Python does support it:

"asd\u2028adf".splitlines(True)
['asd\u2028', 'adf']

Contributor:

I see your point, which is why I can totally understand not including VT and FF (haven't used those since punch cards and line printers!!!!). And PS (Paragraph Separator, U+2029), I suppose.

A bit sad that Zed doesn't interpret `\u2028`, though, and perhaps that's worth raising as an issue there (which I'm happy to do). I don't think that's noted in the Unicode security guide, but I'll check that first.

@lukaszsamson (Contributor), May 14, 2025:

Python has its own rules (https://docs.python.org/3/library/stdtypes.html#str.splitlines), based on https://peps.python.org/pep-0278/ and https://peps.python.org/pep-3116/.

And they quite recently added `\v` and `\f` to the list of line boundaries (in 3.2).
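Since each language picks its own boundary set, one way to compare them is a splitter that takes the separators as an argument. `KeepSplit.lines/2` is a hypothetical helper, not an Elixir API; the separator list in the usage lines is transcribed from the Python `str.splitlines` documentation linked above.

```elixir
defmodule KeepSplit do
  # Hypothetical helper, not an Elixir API: splits `string` after every
  # occurrence of any separator in `seps`, keeping it on the line it ends.
  def lines(string, seps) do
    # Try longer separators first so "\r\n" wins over "\r".
    seps = Enum.sort_by(seps, &byte_size/1, :desc)
    do_lines(string, seps, "")
  end

  defp do_lines("", _seps, acc), do: [acc]

  defp do_lines(string, seps, acc) do
    case Enum.find(seps, &String.starts_with?(string, &1)) do
      nil ->
        # No separator here: move one byte into the current line.
        <<char, rest::binary>> = string
        do_lines(rest, seps, <<acc::binary, char>>)

      sep ->
        # Close the current line with the separator and start a new one.
        size = byte_size(sep)
        rest = binary_part(string, size, byte_size(string) - size)
        [acc <> sep | do_lines(rest, seps, "")]
    end
  end
end

python_seps = ["\r\n", "\n", "\r", "\v", "\f", "\x1c", "\x1d", "\x1e", "\u0085", "\u2028", "\u2029"]

KeepSplit.lines("a\vb\fc", python_seps)
# => ["a\v", "b\f", "c"]
```

Unlike Python's `splitlines`, which returns `[]` for an empty string and adds no extra line after a terminal line break, this sketch, like the PR's clauses, returns `[""]` for `""` and a trailing `""` after a final separator.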

Member Author:

And then you have Erlang's interpretation of newlines, which also only considers `\r\n` and `\n`:

~/OSS/elixir[jv-string-lines *%]$ cat example | elixir -e "IO.stream() |> Enum.each(&IO.inspect/1)"
"LINE\n"
"LINE\rLINE\n"
<<76, 73, 78, 69, 12, 76, 73, 78, 69, 11, 76, 73, 78, 69, 194, 133, 76, 73, 78,
  69, 226, 128, 168, 76, 73, 78, 69, 226, 128, 169, 76, 73, 78, 69>>

So I am thinking the best option is to find a new home for this function, perhaps the Code module indeed.
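The last value printed above is shown as raw bytes. Decoding it by hand (an editorial sketch, not output from the thread) shows that the file's final line mixes form feed, vertical tab, NEL (U+0085), LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029), none of which split the stream into separate lines.

```elixir
# The bytes of the third value printed by IO.inspect/1 above.
chunk =
  <<76, 73, 78, 69, 12, 76, 73, 78, 69, 11, 76, 73, 78, 69, 194, 133, 76, 73, 78,
    69, 226, 128, 168, 76, 73, 78, 69, 226, 128, 169, 76, 73, 78, 69>>

# \f = FF (12), \v = VT (11); U+0085, U+2028 and U+2029 are multi-byte in UTF-8.
chunk == "LINE\fLINE\vLINE\u0085LINE\u2028LINE\u2029LINE"
# => true

# The separators suggested in the PR's docs find nothing to split on either.
String.split(chunk, ["\r\n", "\n"])
# => a one-element list containing chunk
```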

Contributor:

I still use TextMate often and it shows something very similar to Zed.

[image]

Member Author:

That's even more conservative, as it only breaks lines on `\n`, but I assume there is a configuration somewhere for it to read files Windows-style.

    do: lines(rest, <<acc::binary, char>>)

  defp lines(<<>>, acc),
    do: [acc]

  @doc """
  Returns an enumerable that splits a string on demand.

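To try the proposed behavior regardless of whether the installed Elixir ships `String.lines/1`, the clauses from the diff can be copied into a module of their own. `PRLines` is a hypothetical name; the function bodies are the ones in the diff, unchanged. The last lines show a case the doctests do not cover: an input ending in a newline yields a trailing empty string, just as the suggested `String.split/2` alternative does.

```elixir
defmodule PRLines do
  # Hypothetical module name; the clauses are copied from the PR's diff.
  def lines(string), do: lines(string, <<>>)

  # "\n" closes the current line; a preceding "\r" is already in acc.
  defp lines(<<?\n, rest::binary>>, acc),
    do: [<<acc::binary, ?\n>> | lines(rest, <<>>)]

  defp lines(<<char, rest::binary>>, acc),
    do: lines(rest, <<acc::binary, char>>)

  defp lines(<<>>, acc),
    do: [acc]
end

PRLines.lines("foo\r\nbar\r\nbaz")
# => ["foo\r\n", "bar\r\n", "baz"]

# A trailing newline produces a final empty string, mirroring String.split/2:
PRLines.lines("foo\n")
# => ["foo\n", ""]
String.split("foo\n", ["\r\n", "\n"])
# => ["foo", ""]
```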