Skip to content

Conversation

@kaorukobo
Copy link
Contributor

While the "R" (Reference) type rarely appears in real-world PHP serialization, certain applications may use it. (at least I met it and saw Unable to unserialize type 'R' error.)

Serialization — PHP Internals Book

$a[1] =& $a[0];

a:2:{i:0;s:3:"foo";i:1;R:2;}

This pull request introduces support for the "R" type by adding the PHP::Reference object,
which can be re-serialized to what it was before unserializing.

@jqr
Copy link
Owner

jqr commented Feb 24, 2024

In PHP unserializing that object results in an Array with two elements, both which are the the same object. Modifications to either element modify both elements.

php > $u = unserialize('a:2:{i:0;s:3:"foo";i:1;R:2;}');
php > print_r($u);
Array
(
    [0] => foo
    [1] => foo
)

php > $u[0] .= "bar";
php > print_r($u);
Array
(
    [0] => foobar
    [1] => foobar
)

Shouldn't the Ruby version do the same instead of making a PHP::Reference object?

ruby> u = PHP.unserialize('a:2:{i:0;s:3:"foo";i:1;R:2;}');
   => ["foo", "foo"]

ruby> u[0] << "bar"
   => "foobar"

ruby> u
   => ["foobar", "foobar"]

@kaorukobo
Copy link
Contributor Author

kaorukobo commented Feb 25, 2024

Reading your reply, I also noticed the level of support is crucially insufficient in production.
As shown in the example below:

  • it lacks support for "r" (represents repeating the same object, not a PHP reference = "R")
  • breaks a reference easily when we change the position of a referred object before re-serializing the entire data.
deserialized = PHP.unserialize(
    # $a = (object)[]; echo serialize([$a,$a]);
    'a:2:{i:0;O:8:"stdClass":0:{}i:1;r:2;}'     # <== needs support for "r"
)
deserialized.unshift "foo" # <== breaks `nth` of "r:nth"

I would try another PR or improve this branch.

@kaorukobo kaorukobo force-pushed the support-r-type branch 2 times, most recently from fcd0f49 to 9ba7966 Compare February 25, 2024 08:34
@kaorukobo
Copy link
Contributor Author

I have improved the patch. Could you take a look?

  • The first commit is a refactoring:
    • This patch needs more values in the state retained during serialization/deserialization. It will make the code very unreadable. So firstly I refactored the code by introducing a state object that is passed thru along recursion calls.
  • then the second commit is an essential change:
    • on PHP.unserialize:
      • Instead of introducing PHP::Reference, simply dereference R/r to the deserialized value before in the same session.
    • on PHP.serialize:
      • Generate an object reference ('r') for a recurring object instead of serializing it again.
      • The behavior is limited to Struct or objects having #to_assoc method. A simple value like String or a data structure like Array/Hash are dereferenced to a plain value. (that is, it does not generate R type.)
        • IMO, I cannot see advantages in holding non-object references inside a PHP array/object using =&... much less in retaining them in a serialized form.

@kaorukobo
Copy link
Contributor Author

Please be noted that the improved patch enables "Modifications to either element modify both elements."

def test_reference_of_object
  # ...
  assert_same unserialized[0], unserialized[1]

@kaorukobo
Copy link
Contributor Author

Rebased on v1.4.1

@nonnenmacher
Copy link

do we have an update on this PR if it pass 'tests' ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants