MusicBrainz Summit / 11 / Session Notes

Attendees

  • Kuno Woudt (warp)
  • Pavan Chander (navap)
  • Rob Kaye (ruaok)
  • Nikki
  • Oliver Charles (ocharles)
  • Jamie McDonald (jdamcd)
  • Nicolás Tamargo (reosarevok)
  • CatCat
  • Per Øyvind Øygard (Wizzcat)
  • Paul Taylor (ijabz)
  • Mathias Kunter (mathiaskunter)
  • Hilbert Woudt (monedula)

Sponsor representatives

  • musiXmatch: Valerio Paolini
  • Last.fm: Adrian Woodhead (massdosage)
  • Google/Freebase: Micah Saul (micahsaul)
  • Zvooq: Andrey Popp (andreypopp)
  • BBC: Dave Evans (djce)

Customer introductions

Last.fm

  • They have had a lot of personnel changes over the last few years, but would like to re-establish a relationship with MB
  • Are looking to switch to NGS schema by the end of the year
  • They would like to use MBIDs internally to make communication between incoming data sets easier
  • Will consider sharing partial label feed/data
  • Might actually solve their artist disambiguation issue soon-ish

Zvooq

  • They are a Spotify competitor in Russia that focuses on music released worldwide

musiXmatch

  • They are a lyrics database
  • Worldwide license from Sony, Universal, EMI, Warner, BMG, and Kobalt

Freebase

  • Freebase is a big data repository of various data sets covering movies, music, sports, people, locations, and others
  • http://freebase.com

BBC

  • They are looking to finally make the switch to NGS
  • Their music news website now uses ws/2
  • They outsource their album reviews and MB data entry to Unique Broadcasting Company

Discussions

Friday (Oct 14)

Single sign on & password security

Goals

  • Not storing plaintext passwords
  • Not having knowable (i.e. reversible) passwords
  • Not transmitting passwords in the clear
  • Single sign on

Questions

  • What specific password issues are we trying to solve?

Discussed proposals

  • Implement OpenID
  • Using digest authentication (still requires storing and transferring the clear text password)
  • Using SSL (requires updating web service libraries)
  • Using a separate LDAP server (password no longer in MB database and stored elsewhere, also allows for possible single sign on integration)

Conclusion: Use LDAP and phase in SSL to increase password security. Bonus: LDAP makes single sign on possible.
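
The first two goals above (no plaintext, no reversible passwords) can be illustrated with a salted one-way hash. This is only a minimal sketch using Python's standard library; the iteration count and function names are assumptions for illustration, and the summit's actual conclusion was to move passwords into LDAP rather than to adopt this code.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor, chosen for illustration only (not from the notes).
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values would be stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Because only the salt and digest are kept, the stored value cannot be turned back into the password; a login attempt is checked by hashing it again with the same salt.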

Saturday (Oct 15)

Cover art archive

  • Universal is considering handing over their entire cover art archive to us
  • Labels actually don't own copyright on cover art
  • There are potential messy legal issues to using cover art
  • The Internet Archive functions as a library and can act as a 'cover art shelter' for us
  • Possible process:
    • If you know a release's MBID, you can do a GET and receive a cover art image
    • Track cover art uploads by user and also use regular voting process
    • Images will be provided in a hi-res version (~15 MB) and a low-res version (500 px)
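
The MBID-based lookup in the process above could look roughly like the following from a client's point of view. The host name, the URL layout, and the `size` values are all hypothetical; the notes only say that a GET with a release MBID returns an image in one of two resolutions.

```python
import uuid

# Hypothetical base URL; the notes do not name a host or a path layout.
BASE_URL = "https://coverart.example.org"

# Hypothetical size names for the hi-res and low-res variants.
SIZES = ("low", "high")


def cover_art_url(release_mbid: str, size: str = "low") -> str:
    """Build the URL a client would GET for a release's cover art."""
    # MBIDs are UUIDs; uuid.UUID raises ValueError on malformed input
    # and str() gives the canonical lowercase form.
    mbid = str(uuid.UUID(release_mbid))
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    return f"{BASE_URL}/release/{mbid}/{size}"
```

Validating the MBID before building the URL keeps obviously malformed requests from ever reaching the image server.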

Questions

  • Does the user have to upload JPEG or can the server transcode?
  • What status code will we return when a 'darkened' image exists but we're not allowed to display it?
  • If we get cover art from Universal, how do we match each image up with a release?
  • How do we handle a release group with many releases (i.e., do we use the same image?)
  • How do we handle multiple images (e.g. front, back, obi, liner notes, CD faces, etc.)?

Cover_Art_Archive

Cover_Art_Wishlist

Data quality

  • See Sunday

Edit system

Goals

  • Allow grouping edits together and bulk submitting them
  • Allow editing an edit and resubmitting it without impacting the edit queue
  • Allow editing via the web service, eventually

Bookbrainz

  • Oliver's pet project and testing ground for future MB framework changes
  • BB emulates Git
    • It allows building a stack of changes and then submitting all of them together in one 'commit'
    • It takes a snapshot of the data at the time. We don't have that with historical edits, so migrating old edits is a problem

Further reading regarding

Web service

  • Roll out 3scale and move all commercial users over to a pay2play system with different packages
  • Non-commercial users would use the free2play rate-limited system with the option of paying for better access

Audio fingerprinting

  • We all hate PUIDs and we need to move forward
  • Acoustid looks very promising: it's open source, file oriented, and has strong ties with MB
  • http://acoustid.org/
  • May be possible to bulk fingerprint some data sources

Concert support

  • Do we go with one provider or several?
    • Start with Songkick, but stay open to the option of different providers - especially to gain global coverage
  • Do we concentrate on future events or archived events?
    • Initially link to Songkick for future events
    • Create a new setlist entity for past events
    • Create a new venue entity
  • Need to consider a Location entity; it would be useful for artists as well as for events

Tracks vs recordings (vs works)

  • Similar to the remaster issue
  • Do we add further levels of abstraction?
    • No. We're already saturated with entities. We need better definitions
    • ...and we still haven't totally defined works
  • Do we count silence as a divergence point?

Service segregation

  • Announce the closing of Trac (and all its tickets) and the deprecation of Subversion
  • svn.musicbrainz.org will remain as an interface for the search server
  • Consider replacing gitweb with GitHub in a more official capacity

Genres

  • A new field that is to be used specifically for genres
  • Features: autocomplete, canonical names
  • Micah is offering genre data based on Wikipedia

Product offering

This is not a complete or final model and not official!
  • "Drug dealer" model - free the first time, get addicted, pay for easy further access
  • Data dumps (twice a week)
    • Public $100 (suggested)
    • CC-NC $250 (paying for commercial use of NC-licensed data)
  • Live data feed ($/mth)
    • Twice-weekly $500
    • Daily $1500
    • Hourly $2500
  • Web service calls (flat fee)
    • 10K $10
    • 25K $20
    • 50K $30
    • 100K $50
  • Virtual machine
    • VM + Data $300
    • VM + Data + Search $400
  • Tagger Affiliate Program
    • TBD: Clarification of the scope of the program
    • TBD: Web service referral kickbacks

Sunday (Oct 16)

3rd party data set integration

  • Lyrics from musiXmatch
    • daily updates, but will start with weekly ones
    • updates will include all MB/mXm matched lyrics
    • lyrics can also be added from the edit interface
    • How do we best use their lyrics data?
      • Solution: Link to mXm via a lyrics icon in the tracklist and a proper link on the recording page
  • See also Monday

Tracklist/medium overhaul with video support

  • Videos are becoming increasingly common as a music release medium (e.g. iTunes)
  • Will require major schema changes and looking at the long term goals of MusicBrainz
  • Solution: Table the discussion for now, reopen in a different setting with developers

Group multiple release events (country+date) together

  • There is a need to group multiple releases together when they are exactly the same and were only released in different countries
  • Due to tradition, different countries/regions issue releases on different days of the week
  • Solution: Allow multiple release events per release when the label, barcode, and tracklist are the same

Date improvements

  • Unknown end date (dead/disbanded, but we don't exactly know when)
    • Solution: Add a column to the date table to specifically state that the entity is dead/disbanded, but we don't know when
  • Fuzzy dates (16th century composer edge cases)
    • Solution: Use a 'century' column
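
The two proposed columns can be pictured as extra fields on a date record. The sketch below is an assumption about how they might combine; the class name, field names, and display strings are hypothetical and not taken from the MusicBrainz schema.

```python
from dataclasses import dataclass
from typing import Optional


def ordinal(n: int) -> str:
    """Return 1 -> '1st', 2 -> '2nd', 11 -> '11th', 16 -> '16th', ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


@dataclass
class EndDate:
    year: Optional[int] = None     # exact year, when known
    century: Optional[int] = None  # fuzzy fallback, e.g. 16 for the 16th century
    ended: bool = False            # dead/disbanded, even if no date is known

    def describe(self) -> str:
        """Prefer the most precise piece of information available."""
        if self.year is not None:
            return str(self.year)
        if self.century is not None:
            return f"{ordinal(self.century)} century"
        if self.ended:
            return "ended (date unknown)"
        return "unknown"
```

Keeping `ended` separate from the date fields lets an entity be marked as dead or disbanded without inventing a date for it.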

Data quality

  • User:Wizzcat/Data_Quality_Extension
  • http://wiki.xabbu.net/Data_quality
  • Current implementation of data quality has a bad name, is poorly defined, and isn't used
  • What do we want to solve?
    • Explicitly state that a release has been reviewed/verified
      • Solution: +1 / -1 votes that decay in weight over some function of time
    • Protect against ignorance (The White Album vs The Beatles)
      • Solution: Add a 'Protected' flag (i.e. edits expire by default)
    • Measure of completeness
      • Solution: "Completed as per liner notes" checkbox that is accessible via the WS
  • Conclusion: High quality is the protected flag, default quality is default, low quality goes away
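
The notes leave the decay function for the +1 / -1 votes open ("some function of time"). One possible choice is exponential decay with a fixed half-life; the sketch below assumes that choice, and the one-year half-life and function names are illustrative only.

```python
from typing import Iterable, Tuple

# Assumed half-life: a vote counts half as much after one year.
HALF_LIFE_DAYS = 365.0


def vote_weight(age_days: float) -> float:
    """Weight of a single vote that was cast age_days ago."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def review_score(votes: Iterable[Tuple[int, float]]) -> float:
    """Sum of (+1 / -1) votes, each scaled by its age-based weight.

    votes is an iterable of (value, age_days) pairs.
    """
    return sum(value * vote_weight(age) for value, age in votes)
```

With this shape, a recent verification outweighs an old objection, and a release that nobody has reviewed for a long time drifts back towards a neutral score.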

Release group attributes

  • Currently, 'remix' and 'soundtrack' are at the same meta level as 'album' or 'lp'
  • Conclusion: Postponed till a proposal can be drawn up

Reports

  • Improve the explanation that is shown at the top of each report's page
  • Improve report flow (e.g. ability to hide items from reports)
  • Allow marking an entry as 'done'
  • Default report list should filter out all entries marked as done with more than X votes
  • Allow viewing the report with the filtered out entries

Site notifications + subscriptions

  • List all emails in a site inbox
  • Create a dynamic list of subscribed artists with open edits

Testing

  • As finances improve, employ a dedicated person that will lead the testing

Pagination

  • Filter on release group properties
  • Use infinite scroll
  • Be able to reorder, add, remove, and sort columns

Medium attributes (12" vinyl, 8 cm CD)

  • Switch from a hierarchical tree to attributes

Music dashboard

Instrument tree

  • Change from a tree to a graph
    • Flatten the graph into a tree and allow an instrument to have multiple parents
  • Add model support to the instrument tree
  • Importing Freebase data
    • How often do we sync the data?
    • How do we reconcile differences in data?
    • How often do deletes/merges/changes happen?
  • Going forward, if we need a new instrument we would add it to Freebase
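
Allowing an instrument to have multiple parents turns the tree into a directed graph. A minimal sketch of such a structure follows; the class, its methods, and the instrument names in the usage note are hypothetical and do not reflect the actual MusicBrainz or Freebase data model.

```python
from typing import Dict, Set


class InstrumentGraph:
    """Instruments with any number of parent instruments or categories."""

    def __init__(self) -> None:
        self._parents: Dict[str, Set[str]] = {}

    def add(self, instrument: str, *parents: str) -> None:
        """Register an instrument and link it to zero or more parents."""
        self._parents.setdefault(instrument, set()).update(parents)
        for parent in parents:
            self._parents.setdefault(parent, set())

    def ancestors(self, instrument: str) -> Set[str]:
        """All nodes reachable by following parent links upwards."""
        seen: Set[str] = set()
        stack = list(self._parents.get(instrument, ()))
        while stack:
            node = stack.pop()
            if node not in seen:  # also guards against accidental cycles
                seen.add(node)
                stack.extend(self._parents.get(node, ()))
        return seen
```

For example, after `g.add("violin", "strings")` and `g.add("electric violin", "violin", "electronic instruments")`, the call `g.ancestors("electric violin")` contains both branches, which a strict tree could not express.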

Universal Music Group International

  • "I am very happy to declare Universal's support for MusicBrainz and its community" - Innovation Manager at Universal Music Group International

Release editor

  • Default tracklist page shows the advanced view
  • For new releases you see the add disc dialog
  • The track parser moves into the add disc dialog
  • There needs to be a way to reparse from the advanced view

Wiki

  • Remove unneeded extensions
  • Update to Ubuntu's MediaWiki package
  • Get the API working
  • Install wiki at /wiki/Article and then redirect to /Article
  • Write a wiki test suite

Monday (Oct 17)

Initial dates on release group

  • Last.fm would like to create 'best of the decade' lists and filter out data such as the 2009 re-release of The Beatles
  • Currently, release group dates match the date of the earliest release in that group, but in the case of re-releases we often only have data on the modern release and are missing (for example) the original '70s vinyl release
  • Solution: Add an editable initial date field at the release group level
    • The date field will default to empty because anyone who wants the group date can guess via its earliest release (like MB does now)
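
The fallback described above (use the editable initial date when it is set, otherwise guess from the earliest release) can be sketched as follows. The class and field names are hypothetical, and dates are assumed to be ISO-style strings so that plain string comparison orders them correctly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReleaseGroup:
    # Dates of the releases we have in the group, as "YYYY-MM-DD" strings.
    release_dates: List[str] = field(default_factory=list)
    # Editable initial date; empty (None) by default, as proposed.
    initial_date: Optional[str] = None

    def effective_date(self) -> Optional[str]:
        """Initial date if an editor set one, else the earliest release date."""
        if self.initial_date:
            return self.initial_date
        return min(self.release_dates, default=None)
```

A consumer building a 'best of the decade' list would then filter on `effective_date()`, so a group that only contains a modern re-release is still placed in the right decade once an editor fills in the initial date.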

musiXmatch

  • short description of musiXmatch expectations
  • feedback from MB Editors on musiXmatch contributions
  • Editors' willingness to help musiXmatch (IRC channel)
  • musiXmatch will report unexpected Edit Interface behaviour (for example Split Artists while adding a Release)
  • change usernames to make them easily identifiable (add customer name to username)
  • provide guidelines for interactions between MB Editors and external Editors


3rd party data set integration

  • How do we properly link to different data sets? (e.g., musiXmatch, soundunwound, last.fm, etc.)
    • Solution: Build a generic framework that allows us to import any external data set and reconcile it with the data we have
    • Use a second "integration database" that contains all raw data from external sources (label feeds, partners, etc.)
    • Import data into the main database with a de-duplication script, but do not remove any of the original raw data (this allows further parsing in the future)
    • Also look into Google Refine for manual reconciliation: http://code.google.com/p/google-refine/
  • A long term goal is to create an editing API that we can gradually open up to our data partners and the ecosystem
    • This will allow partners like Zvooq to edit data on their website, but feed the changes back to the rest of the MB ecosystem

Feature prioritization

Feature (votes):

  1. Edit system (9)
  2. Group multiple release events together (6)
  3. Data quality (6)
  4. 3rd party data set integration (5)
  5. Single sign on & password security (5)
  6. Instrument tree (4)
  7. Genres (4)
  8. Medium attributes (4)
  9. Release group attributes (3)
  10. Music dashboard (2)
  11. Tracklist/medium overhaul with video support (1)
  12. Pagination (1)
  13. Site notifications (1)
  14. Report improvements (1)
  15. Date improvements (0)
  16. Auto-editor elections (0)
  17. Full classical support (0)