Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mismatched cultivar_id / species_id in traits_and_yields_view #3

Open
2 tasks
dlebauer opened this issue Sep 13, 2018 · 0 comments
Open
2 tasks

Mismatched cultivar_id / species_id in traits_and_yields_view #3

dlebauer opened this issue Sep 13, 2018 · 0 comments

Comments

@dlebauer
Copy link
Member

@infotroph commented on Tue Jan 16 2018

[Task LIst, added 3-14-2018:]

[TERRA-REF data is OK]


select distinct cultivar_id, cultivar, scientificname, species_id from traits_and_yields_view  where cultivar_id=10;
 cultivar_id |   cultivar   |  scientificname  | species_id 
-------------+--------------+------------------+------------
          10 | Cave-In-Rock | Acer rubrum      |         30
          10 | Cave-In-Rock | Panicum virgatum |        938
(2 rows)

Looks like the weird one is id 49920:

select distinct id,cultivar_id, cultivar, scientificname, species_id from traits_and_yields_view  where cultivar_id=10 and species_id!=938;
  id   | cultivar_id |   cultivar   | scientificname | species_id 
-------+-------------+--------------+----------------+------------
 49920 |          10 | Cave-In-Rock | Acer rubrum    |         30

@infotroph commented on Tue Jan 16 2018

Deleted some less useful comments. Here's a query that should help identify the mismatched records. Note that "correct" just means "as reported by the cultivars table".

SELECT 	traits_and_yields_view.id,
		cultivar_id, 
		species_id AS traitview_species_id,
		traits_and_yields_view.scientificname AS traitview_scientificname,
		specie_id AS correct_species_id,
		species.scientificname AS correct_scientificname
	FROM traits_and_yields_view
		JOIN cultivars ON cultivars.id = traits_and_yields_view.cultivar_id
		JOIN species ON cultivars.specie_id = species.id
	WHERE traits_and_yields_view.species_id != cultivars.specie_id;

First 10 of 618 results on my machine:

  id   | cultivar_id | traitview_species_id |             traitview_scientificname             | correct_species_id |              correct_scientificname              
-------+-------------+----------------------+--------------------------------------------------+--------------------+--------------------------------------------------
 22941 |         324 |                 2079 | Salix eriocephala                                |               1243 | Salix
 22945 |         339 |                 2871 | Salix miyabeana                                  |               1243 | Salix
 22946 |         327 |                 2079 | Salix eriocephala                                |               1243 | Salix
 22953 |         324 |                 2079 | Salix eriocephala                                |               1243 | Salix
 22957 |         339 |                 2871 | Salix miyabeana                                  |               1243 | Salix
 22958 |         327 |                 2079 | Salix eriocephala                                |               1243 | Salix
 22965 |         324 |                 2079 | Salix eriocephala                                |               1243 | Salix
 22969 |         339 |                 2871 | Salix miyabeana                                  |               1243 | Salix
 22970 |         327 |                 2079 | Salix eriocephala                                |               1243 | Salix
 22977 |         324 |                 2079 | Salix eriocephala                                |               1243 | Salix

@dlebauer commented on Tue Jan 16 2018

@gsrohde did you ever implement or draft a constraint that required [id, specie_id] in the cultivars table to match the [cultivar_id, specie_id] in the traits table?


@gsrohde commented on Tue Jan 16 2018

I don't recall if I ever drafted a constraint for this, though I noted the need for one here: https://www.overleaf.com/2086241dwjyrd#/5297403/ (section 3.26 notes the need for this constraint in the traits table, and section 3.31 notes it for the yields table). Also, there is a Redmine issue noting the problem here: https://ebi-forecast.igb.illinois.edu/redmine/issues/2389, and Github issues #248 and #253 mention the need for species-cultivar consistency in the trait and yields tables (respectively) among the many other constraints to be implemented.

If I did ever draft a constraint, it's likely that the task of cleaning up existing data stood in the way of implementing it.


@gsrohde commented on Mon Mar 12 2018

@dlebauer I remember now. There is a constraint to this effect (a trigger function, actually). I implemented it when I added constraints to associate cultivars with sites. But existing bad data is "grandfathered in".

You can see the constraint in effect if you try to do the following update:

update traits set specie_id = 30 where id = 49920;

(The specie_id in this row already is 30, but nevertheless, this returns an error, even though nothing would actually change (except maybe updated_at) if the update succeeded.)

Updating to the proper value should succeed:

update traits set specie_id = 938 where id = 49920;

So the issue is a matter of cleaning up existing data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant